Hi,
I'm working on putting together exercises in sequence data processing and analysis for graduate and undergraduate students, most of whom have relatively little experience in this area. One of the first things I'd like to have them do is remove adapter sequences and quality trim, and I've been focusing on Trimmomatic as the tool to have them use for this. However, in my trials, I have been unable to get Trimmomatic to recognize and remove the adapter sequences.
Some background: This is 100bp, paired-end Illumina data (not sure about insert size, but no read-through is expected). The sequence data I'm working with came with the adapters pre-trimmed by the sequencing facility. However, I think it's valuable for students to see what adapters look like in raw data files, and to have to remove them, as this is frequently an essential step in data processing. To that end, I wrote a little script to add the TruSeq3-PE forward and reverse adapter sequences (with qual scores of at least 30 on each base in phred64) back onto the relevant sequence reads. Since I just copied the adapter sequences directly from the TruSeq3-PE.fa adapter file, I am sure that the adapters on the reads match those in the adapters file. Despite this, I have been unable to remove the adapters by running Trimmomatic.
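The script itself is not reproduced here. A minimal Python sketch of the idea, assuming plain four-line FASTQ records and the alternating "^_" quality pattern seen on the reads below (phred64 scores 30 and 31), might look like this. The function names are hypothetical and only for illustration; the adapter sequences are the prefix pair Trimmomatic reports in its output.

```python
import io

# Prefix pair as reported by Trimmomatic from TruSeq3-PE.fa
FWD_ADAPTER = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"
REV_ADAPTER = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
PHRED_OFFSET = 64  # phred64 encoding: quality char = chr(score + 64)


def adapter_quality(length, scores=(30, 31)):
    """Build a quality string that cycles through the given phred scores."""
    return "".join(chr(PHRED_OFFSET + scores[i % len(scores)]) for i in range(length))


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) from a four-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def prepend_adapter(records, adapter):
    """Add the adapter, with matching quality scores, to the 5' end of each read."""
    qual_prefix = adapter_quality(len(adapter))
    for header, seq, plus, qual in records:
        yield header, adapter + seq, plus, qual_prefix + qual


# Usage on a made-up single-read FASTQ (hypothetical data)
example = io.StringIO("@read1/1\nACGTACGT\n+\nhhhhhhhh\n")
for record in prepend_adapter(read_fastq(example), FWD_ADAPTER):
    print("\n".join(record))
```

Swapping `adapter + seq` for `seq + adapter` (and the quality strings likewise) would place the adapter at the 3' end instead.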
I really appreciate any insight anyone can offer as to why this is the case. Below are the contents of files I've used, the command I executed, and the outputs.
Input files: for workflow testing purposes, these are very small subsets of the data I'm working with: 6 paired forward and reverse reads in fastq format with the adapters added to the front of each read. For brevity's sake, just the first two reads are shown:
Forward:
@FCC3J6UACXX:2:1101:1139:2030#/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCTNGGGAGCTTTCTTGGCTAGAATTACCATAGTGTATAGTTCTATAGCTTTTTTCTACATTGANNGANCCAACTCAAAGATCATTACTATAGTATGAACTAC
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_BP\ccceegggggiiiihhhhhihfehfhfggdfhehggfhhhiadhhiiiighfgdgfbfBBLTBLTZcddeefdceeeeddcddcdcccdddbdcccb
@FCC3J6UACXX:2:1101:20936:33723#/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCTATGAATCATCATATACAATAGACTAAAAATCCACCGCTACGCGCTTCCGGGCACTACCCTTGTTGACTCTATTCATCAATCGCTACGCGCTTCCGGGCAC
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__b_ceeeccggggiiiiihicgfdghhhh]eefafdgfaghiiiihihhh[geeeeeccddbcccccccbc`bcdbbccccccccacc[acBBBBBBBBB
Reverse:
@FCC3J6UACXX:2:1101:1139:2030#/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAAGAGGCAATGCACAGAAGCTTAGAGGATCATGGTCGATACCATGGATATAACAAGTCATACCAACGTGACTACAGTGCACATGGACCACAAGATGACA
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__a_eeeeegegggiihihiihiihhhiifffhhhighhhififhfhiffheffhfiihhfhhhhhhfggggeeeeedbdddddcccccccccccbccccc
@FCC3J6UACXX:2:1101:20936:33723#/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTCCCGAAATTGATAAATCAGTACATATAGCGTAAGACGATTTCACATTTCGAGGTCGGAATGGGATCGGGTGTTTTTATGTCTTACCATAGTGCCCGGAA
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__abeeeeeefggffiifiie]ggggeghgghhhhidghh[cP^fcfghhf`affgh]fgbgggeebecca\ZW^bccacceddcbbb]`bcbbccccaca
Command executed and stdout from execution (most params from Trimmomatic manual examples):
TrimmomaticPE: Started with arguments: -threads 1 -phred64 -trimlog /home/bmorgan/fg_tests/trimm/small_with_adapters_trimlog.txt /home/bmorgan/small_r1_with_adapters.fq /home/bmorgan/small_r2_with_adapters.fq /home/bmorgan/fg_tests/trimm/read1_paired.fq /home/bmorgan/fg_tests/trimm/read1_unpaired.fq /home/bmorgan/fg_tests/trimm/read2_paired.fq /home/bmorgan/fg_tests/trimm/read2_unpaired.fq ILLUMINACLIP:/opt/Software/Trimmomatic-0.32/adapters/TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 6 Both Surviving: 6 (100.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 0 (0.00%)
TrimmomaticPE: Completed successfully
You can see that the prefix pair was read successfully from the provided adapter fasta, that the forward and reverse adapters exactly match the starts of the corresponding reads, and that Trimmomatic completed error free and did not discard any sequences. However, while it did quality trim using the LEADING and TRAILING params, it did not remove the adapters.
Forward output:
@FCC3J6UACXX:2:1101:1139:2030#/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCTNGGGAGCTTTCTTGGCTAGAATTACCATAGTGTATAGTTCTATAGCTTTTTTCTACATTGA
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_BP\ccceegggggiiiihhhhhihfehfhfggdfhehggfhhhiadhhiiiighfgdgfbf
@FCC3J6UACXX:2:1101:20936:33723#/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCTATGAATCATCATATACAATAGACTAAAAATCCACCGCTACGCGCTTCCGGGCACTACCCTTGTTGACTCTATTCATCAATCGCTACGCGCT
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__b_ceeeccggggiiiiihicgfdghhhh]eefafdgfaghiiiihihhh[geeeeeccddbcccccccbc`bcdbbccccccccacc[ac
Reverse output:
@FCC3J6UACXX:2:1101:1139:2030#/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAAGAGGCAATGCACAGAAGCTTAGAGGATCATGGTCGATACCATGGATATAACAAGTCATACCAACGTGACTACAGTGCACATGGACCACAAGATGACA
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__a_eeeeegegggiihihiihiihhhiifffhhhighhhififhfhiffheffhfiihhfhhhhhhfggggeeeeedbdddddcccccccccccbccccc
@FCC3J6UACXX:2:1101:20936:33723#/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTCCCGAAATTGATAAATCAGTACATATAGCGTAAGACGATTTCACATTTCGAGGTCGGAATGGGATCGGGTGTTTTTATGTCTTACCATAGTGCCCGGAA
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__abeeeeeefggffiifiie]ggggeghgghhhhidghh[cP^fcfghhf`affgh]fgbgggeebecca\ZW^bccacceddcbbb]`bcbbccccaca
I've tried this with quite a few different parameters for ILLUMINACLIP (mostly with very low clipping thresholds); none have resulted in adapter removal. I have tried removing the /1 and /2 from the adapter prefix names in the adapter file to force it to test both adapters against all reads in simple mode (there is no read-through, so no need for palindrome clipping) with identical ILLUMINACLIP parameters, and this resulted in all reads, forward and reverse, being dropped.
I am at a loss as to why this isn't working and how to move forward. Any and all suggestions are very much appreciated and will be tested. Please let me know if I can provide any more information to help diagnose this problem and I will provide it ASAP.
Thanks in advance,
Ben Morgan
Trimmomatic can be finicky in the tacit assumptions it makes.
I am guessing that in case one, when it runs in palindrome mode, there is no palindrome that Trimmomatic could actually detect, since the adapters are at the beginning of the reads, so everything passes.
In case two, when run in simple mode, all reads should be dropped, no? They all match at least one of the adapters, so that seems to be the correct behavior.
Hi Istvan,
Thanks for the quick reply (I'm obviously waiting with bated breath for responses).
Maybe I'm totally misunderstanding how Trimmomatic works, but that does not make sense to me. Why should an entire read be dropped because there is an adapter match?
My understanding based on the manual is that the ILLUMINACLIP action should be the first step taken, and it should remove ONLY the adapter sequence, as specified in the adapters file, so long as the number of mismatches between the adapter on the read and the one specified in the adapter file is less than the ILLUMINACLIP seed mismatches parameter. Since I added the adapters in silico, I know there are zero mismatches, so the output fastq files should be the entire read, minus the adapter and any low quality bases. Reads should only be dropped if they fall below the MINLEN threshold after adapter removal, end clipping, and quality trimming, which none of my testing reads do.
Am I way off base here?
Thanks again.
5' and 3' adapter clipping are treated differently.
The default and most common process is trimming off adapters found at the end of the reads (3' adapters). Once an adapter is seen, it means that the sequencing ran out of useful DNA and ran into the adapter. Everything from the adapter onward is considered useless and is removed. When the adapter sits at the very start of the read, that is the entire read, so the read is dropped.
The Trimmomatic paper has more information with graphical representations.
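To make the 3' clipping behaviour described above concrete, here is a deliberately simplified Python sketch. It assumes exact string matching only and is not Trimmomatic's actual algorithm, which uses seed matching and alignment scores. The function name is hypothetical; the default minimum length mirrors the MINLEN:36 setting from the command in the question.

```python
ADAPTER = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # forward prefix from TruSeq3-PE.fa


def clip_3prime_adapter(read, adapter, min_len=36):
    """Cut the read at the first exact adapter match and discard everything after.

    Returns the clipped read, or None if fewer than min_len bases remain.
    Simplification: exact matching only, no mismatches or partial adapters.
    """
    pos = read.find(adapter)
    kept = read if pos == -1 else read[:pos]
    return kept if len(kept) >= min_len else None


insert = "ACGT" * 25  # 100 bp of made-up insert sequence

# Adapter at the 3' end: the insert survives and only the adapter is removed.
print(clip_3prime_adapter(insert + ADAPTER, ADAPTER) == insert)

# Adapter at the 5' end: the match is at position 0, nothing is kept,
# and the read falls below the minimum length, so it is dropped.
print(clip_3prime_adapter(ADAPTER + insert, ADAPTER))
```

Under these assumptions, the test reads in the question (adapter prepended to the front) all behave like the second case, which is consistent with every read being dropped in simple mode.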