Question: Trimmomatic not removing TruSeq3 adapters
Asked 4.0 years ago by benjaminmorgan2016 (United States):

Hi,

I'm working on putting together exercises in sequence data processing and analysis for graduate and undergraduate students, most of whom have relatively little experience in this area.  One of the first things I'd like to have them do is remove adapter sequences and quality trim, and I've been focusing on Trimmomatic as the tool to have them use for this.  However, in my trials, I have been unable to get Trimmomatic to recognize and remove the adapter sequences.

Some background: this is 100 bp, paired-end Illumina data (I'm not sure about insert size, but no read-through is expected). The sequence data I'm working with came with the adapters pre-trimmed by the sequencing facility. However, I think it's valuable for students to see what adapters look like in raw data files, and to have to remove them, as this is frequently an essential step in data processing. To that end, I wrote a little script to add the TruSeq3-PE forward and reverse adapter sequences (with a quality score of at least 30 on each base, phred64-encoded) back onto the relevant sequence reads. Since I copied the adapter sequences directly from the TruSeq3-PE.fa adapter file, I am sure that the adapters on the reads match those in the adapters file. Despite this, running Trimmomatic has not removed the adapters.
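The script itself isn't shown here. As a rough idea of what such a script might do, here is a minimal Python sketch, assuming the adapter bases get alternating phred64 qualities of 30 ('^') and 31 ('_'), the pattern visible at the start of the example reads. The names `add_adapter` and `phred64_scores` are hypothetical.

```python
# Hypothetical helper: prepend an adapter to one FASTQ record, giving the
# adapter bases alternating phred64 qualities of 30 ('^') and 31 ('_').

PHRED64_OFFSET = 64
PREFIX_PE_1 = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # from TruSeq3-PE.fa


def add_adapter(seq: str, qual: str, adapter: str) -> tuple[str, str]:
    """Return (sequence, quality) with `adapter` added to the 5' end."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    adapter_qual = "".join("^_"[i % 2] for i in range(len(adapter)))
    return adapter + seq, adapter_qual + qual


def phred64_scores(qual: str) -> list[int]:
    """Decode a phred64 quality string into integer scores."""
    return [ord(c) - PHRED64_OFFSET for c in qual]


if __name__ == "__main__":
    seq, qual = add_adapter("NGGGAGCTTT", "BP\\ccceegg", PREFIX_PE_1)
    print(seq)
    print(qual)
    print(min(phred64_scores(qual[: len(PREFIX_PE_1)])))  # prints 30
```

Decoding the added qualities confirms that every adapter base is Q30 or Q31 in phred64.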

I really appreciate any insight anyone can offer as to why this is the case.  Below are the contents of files I've used, the command I executed, and the outputs.  

Input files: for workflow testing, these are very small subsets of the data I'm working with: six forward/reverse read pairs in FASTQ format, with the adapters added to the front (5' end) of each read. For brevity, only the first two reads of each file are shown:

Forward:

@FCC3J6UACXX:2:1101:1139:2030#/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCTNGGGAGCTTTCTTGGCTAGAATTACCATAGTGTATAGTTCTATAGCTTTTTTCTACATTGANNGANCCAACTCAAAGATCATTACTATAGTATGAACTAC
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_BP\ccceegggggiiiihhhhhihfehfhfggdfhehggfhhhiadhhiiiighfgdgfbfBBLTBLTZcddeefdceeeeddcddcdcccdddbdcccb
@FCC3J6UACXX:2:1101:20936:33723#/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCTATGAATCATCATATACAATAGACTAAAAATCCACCGCTACGCGCTTCCGGGCACTACCCTTGTTGACTCTATTCATCAATCGCTACGCGCTTCCGGGCAC
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__b_ceeeccggggiiiiihicgfdghhhh]eefafdgfaghiiiihihhh[geeeeeccddbcccccccbc`bcdbbccccccccacc[acBBBBBBBBB

Reverse:

@FCC3J6UACXX:2:1101:1139:2030#/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAAGAGGCAATGCACAGAAGCTTAGAGGATCATGGTCGATACCATGGATATAACAAGTCATACCAACGTGACTACAGTGCACATGGACCACAAGATGACA
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__a_eeeeegegggiihihiihiihhhiifffhhhighhhififhfhiffheffhfiihhfhhhhhhfggggeeeeedbdddddcccccccccccbccccc
@FCC3J6UACXX:2:1101:20936:33723#/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTCCCGAAATTGATAAATCAGTACATATAGCGTAAGACGATTTCACATTTCGAGGTCGGAATGGGATCGGGTGTTTTTATGTCTTACCATAGTGCCCGGAA
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__abeeeeeefggffiifiie]ggggeghgghhhhidghh[cP^fcfghhf`affgh]fgbgggeebecca\ZW^bccacceddcbbb]`bcbbccccaca

Command executed and its stdout (most parameters taken from the Trimmomatic manual examples):

TrimmomaticPE: Started with arguments: -threads 1 -phred64 -trimlog /home/bmorgan/fg_tests/trimm/small_with_adapters_trimlog.txt /home/bmorgan/small_r1_with_adapters.fq /home/bmorgan/small_r2_with_adapters.fq /home/bmorgan/fg_tests/trimm/read1_paired.fq /home/bmorgan/fg_tests/trimm/read1_unpaired.fq /home/bmorgan/fg_tests/trimm/read2_paired.fq /home/bmorgan/fg_tests/trimm/read2_unpaired.fq ILLUMINACLIP:/opt/Software/Trimmomatic-0.32/adapters/TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 6 Both Surviving: 6 (100.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 0 (0.00%)
TrimmomaticPE: Completed successfully

You can see that the prefix pair was read successfully from the provided adapter fasta, that the forward and reverse adapters exactly match the starts of the corresponding reads, and that Trimmomatic completed error free and did not discard any sequences.  However, while it did quality trim using the LEADING and TRAILING params, it did not remove the adapters.
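For students new to these steps, here is a simplified Python sketch of what LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15 and MINLEN:36 do to a phred64 read, following the step descriptions in the Trimmomatic manual. It is not Trimmomatic's code; the real SLIDINGWINDOW step may place the cut point slightly differently, so output lengths can differ.

```python
# Simplified sketch of LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
# on a phred64 read. Not Trimmomatic's implementation.


def phred64(qual: str) -> list[int]:
    return [ord(c) - 64 for c in qual]


def quality_trim(seq: str, qual: str, leading: int = 3, trailing: int = 3,
                 window: int = 4, min_avg: float = 15, min_len: int = 36):
    """Return the trimmed (seq, qual), or None if the read would be dropped."""
    q = phred64(qual)
    start, end = 0, len(q)
    # LEADING: drop 5' bases below the threshold.
    while start < end and q[start] < leading:
        start += 1
    # TRAILING: drop 3' bases below the threshold.
    while end > start and q[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: scan 5' -> 3', cut at the first window whose mean is too low.
    for i in range(start, end - window + 1):
        if sum(q[i:i + window]) / window < min_avg:
            end = i
            break
    # MINLEN: drop reads that ended up too short.
    if end - start < min_len:
        return None
    return seq[start:end], qual[start:end]
```

None of these steps looks at sequence content, which is why the quality trimming worked while the adapters stayed in place. Note that runs like "BBLT" near the 3' end of the first example read decode to Q2, Q2, Q12, Q20 and average below 15.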

Forward output:

@FCC3J6UACXX:2:1101:1139:2030#/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCTNGGGAGCTTTCTTGGCTAGAATTACCATAGTGTATAGTTCTATAGCTTTTTTCTACATTGA
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_BP\ccceegggggiiiihhhhhihfehfhfggdfhehggfhhhiadhhiiiighfgdgfbf
@FCC3J6UACXX:2:1101:20936:33723#/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCTATGAATCATCATATACAATAGACTAAAAATCCACCGCTACGCGCTTCCGGGCACTACCCTTGTTGACTCTATTCATCAATCGCTACGCGCT
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__b_ceeeccggggiiiiihicgfdghhhh]eefafdgfaghiiiihihhh[geeeeeccddbcccccccbc`bcdbbccccccccacc[ac

Reverse output:

@FCC3J6UACXX:2:1101:1139:2030#/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAAGAGGCAATGCACAGAAGCTTAGAGGATCATGGTCGATACCATGGATATAACAAGTCATACCAACGTGACTACAGTGCACATGGACCACAAGATGACA
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__a_eeeeegegggiihihiihiihhhiifffhhhighhhififhfhiffheffhfiihhfhhhhhhfggggeeeeedbdddddcccccccccccbccccc
@FCC3J6UACXX:2:1101:20936:33723#/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTCCCGAAATTGATAAATCAGTACATATAGCGTAAGACGATTTCACATTTCGAGGTCGGAATGGGATCGGGTGTTTTTATGTCTTACCATAGTGCCCGGAA
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__abeeeeeefggffiifiie]ggggeghgghhhhidghh[cP^fcfghhf`affgh]fgbgggeebecca\ZW^bccacceddcbbb]`bcbbccccaca

I've tried this with quite a few different parameters for ILLUMINACLIP (mostly with very low clipping thresholds); none have resulted in adapter removal.  I have tried removing the /1 and /2 from the adapter prefix names in the adapter file to force it to test both adapters against all reads in simple mode (there is no read-through, so no need for palindrome clipping) with identical ILLUMINACLIP parameters, and this resulted in all reads, forward and reverse, being dropped.  

I am at a loss as to why this isn't working and how to move forward.  Any and all suggestions are very much appreciated and will be tested.  Please let me know if I can provide any more information to help diagnose this problem and I will provide it ASAP.

Thanks in advance,

Ben Morgan

Trimmomatic can be finicky in the tacit assumptions it makes.

I am guessing that in case one (palindrome mode) there is no palindrome that Trimmomatic could actually detect, because the adapters are at the beginning of the reads, so everything passes.

In case two, when run in simple mode, all reads should be dropped, no? They all match at least one of the adapters, so that seems to be the correct behavior.

(reply by Istvan Albert, 4.0 years ago)
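A toy Python model of the simple-mode behavior described in the reply above, assuming exact matching (Trimmomatic actually scores mismatches and also detects partial adapters at the 3' end): clipping keeps only the bases 5' of the adapter match, so an adapter sitting at position 0 leaves nothing. `simple_clip` is a hypothetical name.

```python
# Toy model of simple-mode clipping: keep only the bases 5' of an exact
# adapter match. Real Trimmomatic scores mismatches and partial matches.

PREFIX_PE_1 = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"


def simple_clip(seq: str, adapter: str) -> str:
    pos = seq.find(adapter)
    return seq if pos == -1 else seq[:pos]


read = PREFIX_PE_1 + "ATGAATCATCATATACAATAGACT"  # adapter at the 5' end
clipped = simple_clip(read, PREFIX_PE_1)
print(len(clipped))  # prints 0: an empty read fails MINLEN:36 and is dropped
```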

Hi Istvan,

Thanks for the quick reply (I'm obviously waiting with bated breath for responses).  

Maybe I'm totally misunderstanding how Trimmomatic works, but that does not make sense to me.  Why should an entire read be dropped because there is an adapter match?  

My understanding based on the manual is that the ILLUMINACLIP step should be the first step taken, and it should remove ONLY the adapter sequence, as specified in the adapters file, so long as the number of mismatches between the adapter on the read and the one in the adapter file is less than the ILLUMINACLIP seed mismatches parameter. Since I added the adapters in silico, I know there are zero mismatches, so the output FASTQ files should contain the entire read, minus the adapter and any low-quality bases. Reads should only be dropped if they fall below the MINLEN threshold after adapter removal, end clipping, and quality trimming, which none of my test reads do.

Am I way off base here?

Thanks again.

(reply by benjaminmorgan2016, 4.0 years ago)
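For contrast, a hypothetical Python sketch of the 5' removal expected in the reply above: if the read starts with the adapter, remove just those bases and keep the rest. ILLUMINACLIP does not work this way; `strip_5prime` and its `max_mismatches` parameter are illustrative assumptions only.

```python
# Hypothetical 5' adapter removal (NOT what ILLUMINACLIP does): if the read
# starts with the adapter, remove just those bases and keep the rest.

PREFIX_PE_1 = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"


def strip_5prime(seq: str, qual: str, adapter: str, max_mismatches: int = 2):
    """Return (seq, qual) with a leading adapter removed, if one is found."""
    n = len(adapter)
    if len(seq) < n:
        return seq, qual
    # Count mismatches between the read's first n bases and the adapter.
    mismatches = sum(a != b for a, b in zip(seq[:n], adapter))
    if mismatches <= max_mismatches:
        return seq[n:], qual[n:]
    return seq, qual
```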

There is a difference between 5' and 3' adapter clipping. These are treated differently.

The default and most common process is trimming off adapters ligated at the end of the reads (3' adapters). Once an adapter is seen, it means that the sequencing ran out of useful DNA and ran into the adapter. Everything after the adapter is considered useless information.

The Trimmomatic paper has more information with graphical representations:

http://bioinformatics.oxfordjournals.org/content/early/2014/04/01/bioinformatics.btu170

(reply by Istvan Albert, 4.0 years ago)
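A rough Python illustration of 3' clipping, including the case where a read runs into only the first few bases of the adapter: a full match cuts at the match; otherwise the longest read suffix that equals an adapter prefix is removed. The adapter used is the reverse complement of PrefixPE/2 from TruSeq3-PE.fa; the `min_overlap` value is an arbitrary assumption, and real trimmers score alignments rather than requiring exact equality.

```python
# Sketch of 3' adapter clipping with partial-overlap detection (exact matches only).

ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # reverse complement of PrefixPE/2


def clip_3prime(seq: str, adapter: str, min_overlap: int = 3) -> str:
    # A full adapter anywhere: discard it and everything 3' of it.
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Otherwise look for the read running into just the start of the adapter,
    # trying the longest possible overlap first (never shorter than 1 base).
    for k in range(min(len(adapter), len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


insert = "ATGAATCATCATATACAATAGACT"
print(clip_3prime(insert + ADAPTER[:10], ADAPTER))  # prints the insert only
```

Very short overlaps match by chance (here any read ending in AGA would lose three bases), which is one reason trimmers require a minimum score or overlap. Note also that an adapter at position 0 clips the whole read, the same outcome as in the question.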

Answer by rtliu (New Zealand), 4.0 years ago:

Using TruSeq3-PE-2.fa with Trimmomatic will get rid of TruSeq3 adapters in any position.

For adapter read-through, you need to append the reverse complement of the PrefixPE/1 adapter to R1.fq, i.e. replace the last 34 bp in R1.fq with 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA', and append the reverse complement of the PrefixPE/2 adapter to R2.fq, i.e. replace the first 34 bp in R2.fq with 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'. Then using TruSeq3-PE-2.fa will correctly trim the last 34 bp off R1.
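The reverse complements quoted in the answer above can be checked, and the R1 step sketched, in a few lines of Python. `revcomp` and `add_readthrough` are hypothetical helpers; the adapter strings are the PrefixPE/1 and PrefixPE/2 sequences reported by Trimmomatic in the question.

```python
# Reverse complements of the TruSeq3 prefixes, plus a hypothetical helper that
# simulates read-through by overwriting a read's last bases.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

PREFIX_PE_1 = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"
PREFIX_PE_2 = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def add_readthrough(seq: str, qual: str, tail: str, tail_q: str = "^"):
    """Replace the last len(tail) bases of a read with `tail` (length unchanged)."""
    n = len(tail)
    if n == 0:
        return seq, qual
    if len(seq) < n:
        raise ValueError("read is shorter than the adapter tail")
    return seq[:-n] + tail, qual[:-n] + tail_q * n


print(revcomp(PREFIX_PE_1))  # AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
print(revcomp(PREFIX_PE_2))  # AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
```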

Answer by Brian Bushnell (Walnut Creek, USA), 4.0 years ago:

Try using the reverse-complements instead of the forward sequences.  When you read through into adapter sequence, you see the reverse-complement; Trimmomatic, by default, only looks for one orientation.  BBDuk, incidentally, defaults to looking for both forward and reverse-complements of all sequences.
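A minimal Python sketch of searching for an adapter in both orientations, the behavior the answer above attributes to BBDuk by default. It only illustrates the idea with exact string matching; it is not how BBDuk or Trimmomatic actually match sequences, and `find_adapter_any_orientation` is a hypothetical name.

```python
# Look for an adapter in its given orientation and as its reverse complement.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
PREFIX_PE_1 = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_adapter_any_orientation(seq: str, adapter: str):
    """Return (position, orientation) of the first exact match, or None."""
    for label, probe in (("forward", adapter), ("revcomp", revcomp(adapter))):
        pos = seq.find(probe)
        if pos != -1:
            return pos, label
    return None
```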

Let me add the URL for BBDuk (part of BBMap): http://sourceforge.net/projects/bbmap/

(reply by Istvan Albert, 4.0 years ago)

Hi Brian,

Thanks for the reply. I'm going to go ahead and try with the reverse complement, but I don't think it's likely to solve the problem (you'll hear from me real fast if it does), since I am not anticipating any read-through: the forward and reverse reads are separated by a large enough insert (size unknown to me) that they aren't likely to overlap at all, much less read through. At this stage, I am primarily interested in removing the sense-oriented adapters from the 5' end of reads.

Thanks again.

(reply by benjaminmorgan2016, 4.0 years ago)

Maybe I am misinterpreting something...  You are just trying to detect the adapters that you place yourself, at the beginning of the reads, right?  So the actual insert sizes should have no bearing on being able to detect them.

(reply by Brian Bushnell, 4.0 years ago)

Correct. My point was just that the sequenced fragments should all be long enough that a read from either end never reaches the reverse-complemented adapter at the opposite end of the fragment. I am trying to get Trimmomatic to recognize the adapters that I know to be at the 5' end of reads in the sense orientation, remove them in their entirety, then perform the end clipping and sliding-window quality trimming, and output the resulting sequences for downstream processing.

I just tried using an adapters file with the RC of the adapters on the sequence reads: no sequences dropped, but no adapters removed.

Cheers.

(reply by benjaminmorgan2016, 4.0 years ago)

Answer by simon.rayner (Norway/Oslo/UiO), 3.4 years ago:

Bit late coming to the discussion, but I had a similar problem with adapters not being trimmed on a SE dataset, despite previously having had no problem. Trimming worked again after I added the -phred64 flag (Trimmomatic had decided the quality was encoded as phred33).
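Why the encoding matters: the same character decodes to different scores under the two offsets (for example '^' is Q30 in phred64 but Q61 in phred33). A common heuristic, sketched below in Python, looks at the lowest quality character seen; this is an assumption for illustration, not Trimmomatic's actual detection logic. `guess_phred_offset` is a hypothetical name.

```python
# Common heuristic (assumed, not Trimmomatic's exact logic): phred64 scores
# start at '@' (ASCII 64, Q0), so any lower character implies phred33.


def guess_phred_offset(quality_strings) -> int:
    lowest = min(min(q) for q in quality_strings if q)
    return 33 if ord(lowest) < 64 else 64


print(guess_phred_offset(["^_^_BP\\ccc"]))  # prints 64
```

The heuristic is ambiguous for high-quality phred33 data in which no base falls below Q31, since every character is then '@' or higher; passing -phred33 or -phred64 explicitly avoids the guess entirely.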

Also, since I have a small RNA dataset, I often may not have the 16 nt seed region specified in the manual, so I need to lower the <SimpleClipThreshold> parameter for the ILLUMINACLIP step; otherwise I will have many full-length reads remaining that contain adapter sequence:

ILLUMINACLIP:Trimmomatic-0.33/adapters/TruSeq3-SE.fa:2:30:7
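The link between the simple clip threshold and match length can be worked out from the Trimmomatic manual's description of simple-mode scoring: each matching base adds just over 0.6 and each mismatch subtracts Q/10. The Python sketch below takes the per-match gain as log10(4), an assumption; the function names are hypothetical.

```python
import math

MATCH_SCORE = math.log10(4)  # "just over 0.6" per matching base; exact value assumed


def clip_score(matches: int, mismatch_quals: list[int]) -> float:
    """Approximate simple-mode score: gain per match, minus Q/10 per mismatch."""
    return matches * MATCH_SCORE - sum(q / 10 for q in mismatch_quals)


def min_perfect_match_length(threshold: float) -> int:
    """Fewest perfectly matching bases whose score exceeds the threshold."""
    return math.floor(threshold / MATCH_SCORE) + 1


for t in (7, 10, 15):
    print(t, min_perfect_match_length(t))
```

Under these assumptions, lowering the threshold from 10 to 7 lets a perfect match of about 12 bases trigger clipping instead of about 17, which suits short small RNA inserts followed by a short stretch of adapter.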
