Hi everyone,
I'm a wet-lab biologist who now has 100+ RNA-seq samples to analyze, so it's been a steep learning curve. Any help would be super appreciated!
I have paired-end (PE) 125 bp FASTQ files from an Illumina HiSeq, and my FastQC analysis shows Illumina Universal Adapter contamination. I used Trimmomatic to try to remove it, with the default settings I saw in the Trimmomatic manual, even though I don't really need quality trimming (all bases are high quality according to FastQC).
# ILLUMINACLIP:<adapter file>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>
java -jar $EBROOTTRIMMOMATIC/trimmomatic-0.36.jar PE \
    R1_001.fastq.gz R2_002.fastq.gz \
    R1_paired.fastq.gz R1_unpaired.fastq.gz \
    R2_paired.fastq.gz R2_unpaired.fastq.gz \
    ILLUMINACLIP:$EBROOTTRIMMOMATIC/adapters/TruSeq3-PE.fa:2:30:10 \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25
After trimming, the adapter contamination is much lower, but it's not all removed. Also, I go from ~17 million reads (all 125 bp long) to ~14 million reads (almost all 124 bp long). That doesn't seem like it's working properly. Below I've attached a MultiQC report from before and after trimming (paired files only).
Image: MultiQC adapter contamination of trimmed vs. untrimmed reads
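To double-check numbers like these outside MultiQC, one option is to stream each gzipped FASTQ once and tally the read count, the read-length distribution, and how many reads still contain the start of the Illumina Universal Adapter (AGATCGGAAGAG, the sequence FastQC reports under that name). This is a minimal sketch assuming standard 4-line FASTQ records and that `gzip` and `awk` are available; the function name `fastq_stats` is hypothetical. Comparing, e.g., `fastq_stats R1_001.fastq.gz` with `fastq_stats R1_paired.fastq.gz` shows how many reads were dropped, how lengths changed, and how much adapter remains.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise one gzipped FASTQ file.
# Assumes plain 4-line FASTQ records (header, sequence, '+', qualities).
fastq_stats() {
    local fq="$1"
    # Illumina Universal Adapter prefix, as listed by FastQC.
    local adapter="AGATCGGAAGAG"
    gzip -dc "$fq" | awk -v ad="$adapter" '
        NR % 4 == 2 {                      # sequence line of each record
            n++
            len[length($0)]++              # read-length histogram
            if (index($0, ad) > 0) hits++  # reads still containing the adapter prefix
        }
        END {
            printf "reads\t%d\n", n
            printf "with_adapter\t%d\n", hits + 0
            # length lines come out in no particular order
            for (l in len) printf "length_%d\t%d\n", l, len[l]
        }'
}

# Example: fastq_stats R1_001.fastq.gz; fastq_stats R1_paired.fastq.gz
```

Because only the sequence line of each record is inspected, the same function works on raw and trimmed files alike.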
Useful for future reference: How to add images to a Biostars post. I have done it for you this time.
I am going to suggest that you try bbduk.sh from the BBMap suite instead of trimmomatic. A guide on how to use it can be found here. Look at this thread for help on how to write a bash loop to process those 100+ samples efficiently: Bash Script Loop Help
Ah, thank you very much, both for the answer and for the extra help!
I will try bbduk.sh and see how it goes. I used trimmomatic because it was already available on the cloud cluster I'm using (Compute Canada). However, is there a specific reason to use bbduk.sh instead of trimmomatic, other than "if one tool doesn't work, try another"?
I am not a regular trimmomatic user, so I won't comment on why it seems to be missing an obvious set of adapter sequences. Are you using the right adapter sequence file with trimmomatic for the kind of library you have? bbduk.sh uses a single adapters.fa file, found in the resources directory, that contains all commonly used commercial adapter sequences. While a bit of over-trimming is possible, it is easier to use a single file, and the options are easier to understand and use. It is just a matter of what one gets used to; in theory, both programs should work the same.
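For the 100+ samples, a loop in the spirit of the bash-loop thread might look like the sketch below. It pairs each `*_R1_001.fastq.gz` with its `*_R2_001.fastq.gz` mate and runs bbduk.sh against the single `resources/adapters.fa` file. The file-naming pattern, the function name `trim_pairs`, the output file names, and the `BBMAP_DIR` and `DRY_RUN` variables are all assumptions; adjust them to your files and to wherever the cluster module installs BBMap. The options `ktrim=r k=23 mink=11 hdist=1 tpe tbo` are the commonly used adapter-trimming settings from the BBDuk guide, not something tuned for this dataset.

```shell
#!/usr/bin/env bash
# Sketch: adapter-trim every R1/R2 pair in a directory with bbduk.sh.
# ASSUMPTIONS: files are named <sample>_R1_001.fastq.gz / <sample>_R2_001.fastq.gz,
# and BBMAP_DIR points at the BBMap install (the folder that contains resources/).
BBMAP_DIR="${BBMAP_DIR:-/path/to/bbmap}"  # hypothetical default; set this for your cluster
DRY_RUN="${DRY_RUN:-0}"                   # 1 = only print the commands, run nothing

trim_pairs() {
    local indir="$1" outdir="$2"
    local adapters="$BBMAP_DIR/resources/adapters.fa"  # single adapter file shipped with BBMap
    local r1 r2 sample
    local -a cmd
    [ "$DRY_RUN" = 1 ] || mkdir -p "$outdir"
    for r1 in "$indir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue                          # glob matched nothing
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"       # expected mate file
        sample="$(basename "${r1%_R1_001.fastq.gz}")"
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: mate $r2 not found" >&2
            continue
        fi
        cmd=(bbduk.sh
            in="$r1" in2="$r2"
            out="$outdir/${sample}_R1.trim.fastq.gz"
            out2="$outdir/${sample}_R2.trim.fastq.gz"
            ref="$adapters"
            ktrim=r k=23 mink=11 hdist=1 tpe tbo)
        if [ "$DRY_RUN" = 1 ]; then
            echo "${cmd[*]}"   # show what would run
        else
            "${cmd[@]}"
        fi
    done
}

# Example: DRY_RUN=1 trim_pairs raw_fastq trimmed   (check the pairing first)
#          trim_pairs raw_fastq trimmed             (then run for real)
```

Deriving the R2 name from the R1 name, rather than globbing both separately, keeps each mate matched to its partner and makes a missing file show up as a warning instead of a silent mismatch. Running the trimmed files back through FastQC/MultiQC is still the way to confirm the adapters are gone.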