Filter out low-complexity paired- or single-end reads with PRINSEQ.
This App filters low-complexity reads from single- or paired-end read libraries using PRINSEQ. There are two filtering methods to choose from: (i) Dust and (ii) Entropy. Note that the Entropy filter becomes stricter as its threshold value increases, whereas the Dust filter becomes stricter as its threshold value decreases. The Filtering Method selected determines which threshold value field is used.
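The opposite directions of the two thresholds can be illustrated with a minimal sketch. It assumes both scores are computed from overlapping trinucleotides and scaled from 0 to 100, as the PRINSEQ documentation describes. The formulas below are simplified stand-ins and not PRINSEQ's own code, and the names `dust_score`, `entropy_score`, and `keep_read` are hypothetical.

```python
import math
from collections import Counter


def _trinucleotide_counts(seq):
    """Count overlapping trinucleotides in a read (case-insensitive)."""
    seq = seq.upper()
    tris = [seq[i:i + 3] for i in range(len(seq) - 2)]
    if len(tris) < 2:
        raise ValueError("read must be at least 4 bases long")
    return Counter(tris), len(tris)


def dust_score(seq):
    """Simplified DUST-style score in [0, 100]; higher means LOWER complexity."""
    counts, n = _trinucleotide_counts(seq)
    raw = sum(c * (c - 1) / 2 for c in counts.values()) / (n - 1)
    max_raw = n / 2  # value reached when every trinucleotide is identical
    return 100.0 * raw / max_raw


def entropy_score(seq):
    """Trinucleotide Shannon entropy in [0, 100]; lower means LOWER complexity."""
    counts, n = _trinucleotide_counts(seq)
    h = sum((c / n) * math.log(n / c) for c in counts.values())
    max_h = math.log(min(n, 64))  # at most 64 distinct trinucleotides exist
    return 100.0 * h / max_h


def keep_read(seq, method, threshold):
    """Return True if the read passes the chosen low-complexity filter."""
    if method == "dust":
        # Lower threshold rejects more reads, so the filter is stricter.
        return dust_score(seq) <= threshold
    if method == "entropy":
        # Higher threshold rejects more reads, so the filter is stricter.
        return entropy_score(seq) >= threshold
    raise ValueError(f"unknown method: {method!r}")
```

Under these assumptions a homopolymer read scores at the low-complexity extreme of both scales, so it is rejected by either method, while a read whose trinucleotides are all distinct passes both. The threshold values used in any example call are illustrative only and are not the App's defaults.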
If the input reads object is a SingleEndLibrary, the resulting object will be a filtered SingleEndLibrary object. If the input reads object is a PairedEndLibrary, however, up to 3 objects may be created:
- A PairedEndLibrary object in which both the forward (FWD) and reverse (REV) reads passed the complexity filters.
- A FWD SingleEndLibrary containing the FWD reads whose paired REV reads failed complexity filtering.
- A REV SingleEndLibrary containing the REV reads whose paired FWD reads failed complexity filtering.
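The three-way split described above can be sketched as follows. This is a minimal illustration, not the App's implementation: `partition_pairs` is a hypothetical helper, and `passes` stands for whatever complexity test is in use. Pairs in which both reads fail are assumed to be dropped.

```python
def partition_pairs(pairs, passes):
    """Split (fwd, rev) read pairs by which mates pass the filter.

    `passes` is any callable returning True for a read that should be kept.
    Returns (paired, fwd_singletons, rev_singletons).
    """
    paired, fwd_singletons, rev_singletons = [], [], []
    for fwd, rev in pairs:
        fwd_ok, rev_ok = passes(fwd), passes(rev)
        if fwd_ok and rev_ok:
            paired.append((fwd, rev))      # goes to the PairedEndLibrary
        elif fwd_ok:
            fwd_singletons.append(fwd)     # REV mate failed
        elif rev_ok:
            rev_singletons.append(rev)     # FWD mate failed
        # if both mates fail, the pair is discarded in this sketch
    return paired, fwd_singletons, rev_singletons


# Toy predicate for demonstration only: reject single-base homopolymers.
def toy_passes(read):
    return len(set(read)) > 1


pairs = [("ACGT", "TGCA"), ("ACGT", "AAAA"), ("TTTT", "GATC"), ("CCCC", "GGGG")]
paired, fwd_only, rev_only = partition_pairs(pairs, toy_passes)
```

An output list that ends up empty would presumably correspond to an object that is not created, which is why the PairedEndLibrary case yields up to three objects rather than always three.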
Output Report:
The output first lists the objects created within KBase as a result of running PRINSEQ.
The next section of the output details summary statistics from PRINSEQ. Note that if the input was a PairedEndLibrary, PRINSEQ internally splits that object into two separate input files, one for each direction. As a result, more statistical information is generated for PairedEndLibraries. The summary statistics can include:
- Input sequences (from 1 or 2 files)
- Input bases (from 1 or 2 files)
- Input mean length (from 1 or 2 files)
- Good sequences (Uncategorized if SingleEndLibrary input, Pairs or Singletons for PairedEndLibrary input)
- Good bases (Uncategorized if SingleEndLibrary input, Pairs or Singletons for PairedEndLibrary input)
- Good mean length (Uncategorized if SingleEndLibrary input, Pairs or Singletons for PairedEndLibrary input)
- Bad sequences (Uncategorized if SingleEndLibrary input, File identifier for PairedEndLibrary input)
- Bad bases (Uncategorized if SingleEndLibrary input, File identifier for PairedEndLibrary input)
- Bad mean length (Uncategorized if SingleEndLibrary input, File identifier for PairedEndLibrary input)
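Each group of statistics above reports the same three quantities: sequence count, base count, and mean length. A minimal sketch of how these relate, assuming reads are held as plain strings (`summarize` is a hypothetical helper, not part of PRINSEQ or the App):

```python
def summarize(reads):
    """Return sequence count, total bases, and mean read length."""
    n = len(reads)
    bases = sum(len(r) for r in reads)
    mean_length = bases / n if n else 0.0  # avoid division by zero
    return {"sequences": n, "bases": bases, "mean_length": mean_length}


stats = summarize(["ACGT", "AC"])
```

Applying the same function to the input, good, and bad read sets would give the three groups in the report. For paired input it would be applied per file or per category (Pairs, Singletons).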
Related Publications
- Schmieder R, Edwards R. Quality control and preprocessing of metagenomic datasets. Bioinformatics. 2011;27: 863-864. doi:10.1093/bioinformatics/btr026, http://www.ncbi.nlm.nih.gov/pubmed/21278185
- PRINSEQ source: http://prinseq.sourceforge.net/
App Specification:
https://github.com/kbaseapps/kb_PRINSEQ//tree/8d686578cddaee4a9b7b5e3a642cf27f380fc00f/ui/narrative/methods/execReadLibraryPRINSEQ
Module Commit: 8d686578cddaee4a9b7b5e3a642cf27f380fc00f