I'm starting to use cutadapt instead of sickle, and I'm not sure I understand exactly how it works. I ran cutadapt, sickle and BBDuk on the same input file with a minimum read length of 20 bp and a minimum quality score of 20, and the results differ widely between the three programs.
This is cutadapt output:
    cutadapt -q 20 -m 20 -o output_q20_m20.fastq input.fastq
    This is cutadapt 1.13 with Python 2.7.9
    Command line parameters: -q 20 -m 20 -o output_q20_m20.fastq input.fastq
    Trimming 0 adapters with at most 10.0% errors in single-end mode ...
    Finished in 12.03 s (3 us/read; 17.89 M reads/minute).

    === Summary ===

    Total reads processed:               3,587,045
    Reads with adapters:                         0 (0.0%)
    Reads that were too short:             571,747 (15.9%)
    Reads written (passing filters):     3,015,298 (84.1%)

    Total basepairs processed:   125,546,575 bp
    Quality-trimmed:              21,189,744 bp (16.9%)
    Total written (filtered):     98,543,830 bp (78.5%)
This is sickle output:
    sickle se --fastq-file input.fastq --qual-type sanger --qual-threshold 20 --length-threshold 20 --output-file output_sickle_q20_m20.fastq

    SE input file: input.fastq

    Total FastQ records: 3587045
    FastQ records kept: 2449731
    FastQ records discarded: 1137314
This is BBDuk output:
    ./bbduk.sh -Xmx1g in=input.fastq out=output_bbduk.fq qtrim=r trimq=20 ml=20 overwrite=true
    java -Djava.library.path=/home/rioualen/Desktop/bbmap/jni/ -ea -Xmx1g -Xms1g -cp /home/rioualen/Desktop/bbmap/current/ jgi.BBDukF -Xmx1g in=input.fastq out=output_bbduk.fq qtrim=r trimq=20 ml=20 overwrite=true
    Executing jgi.BBDukF [-Xmx1g, in=input.fastq, out=output_bbduk.fq, qtrim=r, trimq=20, ml=20, overwrite=true]

    BBDuk version 37.10
    Initial:
    Memory: max=1029m, free=993m, used=36m

    Input is being processed as unpaired
    Started output streams:   0.020 seconds.
    Processing time:          2.012 seconds.

    Input:            3587045 reads            125546575 bases.
    QTrimmed:         3156968 reads (88.01%)   68828955 bases (54.82%)
    Total Removed:    1592853 reads (44.41%)   68828955 bases (54.82%)
    Result:           1994192 reads (55.59%)   56717620 bases (45.18%)

    Time:               2.059 seconds.
    Reads Processed:    3587k    1742.23k reads/sec
    Bases Processed:     125m    60.98m bases/sec
How can there be such a big difference between them, e.g. 15.9%, 31.7% and 44.4% of reads filtered out by cutadapt, sickle and BBDuk respectively?
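For reference, the percentages quoted above come straight from the read counts in the three reports. Here is a minimal sketch of that arithmetic; the variable and function names are just illustrative, and the only inputs are the numbers printed by the tools.

```python
# Counts copied from the three tool reports above.
TOTAL_READS = 3587045
TOTAL_BASES = 125546575

removed = {
    "cutadapt": 571747,   # "Reads that were too short"
    "sickle": 1137314,    # "FastQ records discarded"
    "bbduk": 1592853,     # "Total Removed"
}


def percent_removed(n_removed, n_total):
    """Share of reads removed, as a percentage rounded to one decimal."""
    return round(100.0 * n_removed / n_total, 1)


percentages = {tool: percent_removed(n, TOTAL_READS) for tool, n in removed.items()}

# Mean input read length, from the totals reported by cutadapt and BBDuk.
mean_read_length = TOTAL_BASES / TOTAL_READS

for tool, pct in percentages.items():
    print(f"{tool}: {pct}% of reads removed")
print(f"mean input read length: {mean_read_length} bp")
```

The totals also imply a mean input read length of 35 bp, so a read falls under the 20 bp minimum as soon as more than 15 bases are trimmed from it.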
Is the cutadapt message "Trimming 0 adapters with at most 10.0% errors in single-end mode" a bug? I ask because I checked the FASTQ files and it is properly removing bases with a score <20.
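For anyone who wants to repeat that kind of check on the trimmed files, here is a minimal sketch. It assumes plain four-line FASTQ records and Phred+33 (Sanger) quality encoding, as passed to sickle above; the function name and the two example reads are hypothetical and only serve to illustrate the idea.

```python
import io

PHRED_OFFSET = 33  # assumption: Sanger / Phred+33 encoding


def trailing_quality_summary(handle, threshold=20, min_length=20):
    """Count reads, reads shorter than min_length, and reads whose
    last (3') base has a quality score below threshold."""
    n_reads = n_short = n_low_tail = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        n_reads += 1
        if len(seq) < min_length:
            n_short += 1
        if qual and ord(qual[-1]) - PHRED_OFFSET < threshold:
            n_low_tail += 1
    return {
        "reads": n_reads,
        "too_short": n_short,
        "low_quality_3prime_end": n_low_tail,
    }


# Hypothetical data: one 24 bp read at Q40 throughout, and one 8 bp read
# whose last two bases are Q2 ('#').
example = io.StringIO(
    "@read1\n" + "ACGT" * 6 + "\n+\n" + "I" * 24 + "\n"
    "@read2\nACGTACGT\n+\nIIIIII##\n"
)
summary = trailing_quality_summary(example)
print(summary)
```

On a real file, the same function can be called with `open("output_q20_m20.fastq")` in place of the `io.StringIO` object. It only inspects read length and the 3' end, so it says nothing about low-quality bases in the middle of a read.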
Below are links to the FastQC results.