7.6 years ago

Hi,

I'm working on putting together exercises in sequence data processing and analysis for graduate and undergraduate students, most of whom have relatively little experience in this area. One of the first things I'd like to have them do is remove adapter sequences and quality trim, and I've been focusing on Trimmomatic as the tool to have them use for this. However, in my trials, I have been unable to get Trimmomatic to recognize and remove the adapter sequences.

I really appreciate any insight anyone can offer as to why this is the case. Below are the contents of files I've used, the command I executed, and the outputs.

Input files: for workflow testing purposes, these are very small subsets of the data I'm working with: 6 paired forward and reverse reads in fastq format with the adapters added to the front of each read. For brevity's sake, just the first two reads are shown:

Forward:

@FCC3J6UACXX:2:1101:1139:2030#/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCTNGGGAGCTTTCTTGGCTAGAATTACCATAGTGTATAGTTCTATAGCTTTTTTCTACATTGANNGANCCAACTCAAAGATCATTACTATAGTATGAACTAC
+
@FCC3J6UACXX:2:1101:20936:33723#/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCTATGAATCATCATATACAATAGACTAAAAATCCACCGCTACGCGCTTCCGGGCACTACCCTTGTTGACTCTATTCATCAATCGCTACGCGCTTCCGGGCAC
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__b_ceeeccggggiiiiihicgfdghhhh]eefafdgfaghiiiihihhh[geeeeeccddbcccccccbcbcdbbccccccccacc[acBBBBBBBBB


Reverse:

@FCC3J6UACXX:2:1101:1139:2030#/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAAGAGGCAATGCACAGAAGCTTAGAGGATCATGGTCGATACCATGGATATAACAAGTCATACCAACGTGACTACAGTGCACATGGACCACAAGATGACA
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__a_eeeeegegggiihihiihiihhhiifffhhhighhhififhfhiffheffhfiihhfhhhhhhfggggeeeeedbdddddcccccccccccbccccc
@FCC3J6UACXX:2:1101:20936:33723#/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTCCCGAAATTGATAAATCAGTACATATAGCGTAAGACGATTTCACATTTCGAGGTCGGAATGGGATCGGGTGTTTTTATGTCTTACCATAGTGCCCGGAA
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__abeeeeeefggffiifiie]ggggeghgghhhhidghh[cP^fcfghhfaffgh]fgbgggeebecca\ZW^bccacceddcbbb]bcbbccccaca


Command executed and stdout from execution (most params from Trimmomatic manual examples):

TrimmomaticPE: Started with arguments: -threads 1 -phred64 -trimlog /home/bmorgan/fg_tests/trimm/small_with_adapters_trimlog.txt /home/bmorgan/small_r1_with_adapters.fq /home/bmorgan/small_r2_with_adapters.fq /home/bmorgan/fg_tests/trimm/read1_paired.fq /home/bmorgan/fg_tests/trimm/read1_unpaired.fq /home/bmorgan/fg_tests/trimm/read2_paired.fq /home/bmorgan/fg_tests/trimm/read2_unpaired.fq ILLUMINACLIP:/opt/Software/Trimmomatic-0.32/adapters/TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 6 Both Surviving: 6 (100.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 0 (0.00%)
TrimmomaticPE: Completed successfully


You can see that the prefix pair was read successfully from the provided adapter FASTA, that the forward and reverse adapters exactly match the starts of the corresponding reads, and that Trimmomatic completed without errors and did not discard any sequences. However, while it did quality-trim using the LEADING and TRAILING parameters, it did not remove the adapters.
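
The exact-match claim can be checked independently of Trimmomatic. Below is a minimal Python sketch that uses the prefix pair from the log and the first 40 bases of the first forward and reverse reads shown above. The helper name `starts_with_adapter` is made up for illustration.

```python
# Prefix pair as reported by Trimmomatic in the log above.
PREFIX_PE_1 = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"
PREFIX_PE_2 = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"


def starts_with_adapter(read: str, adapter: str) -> bool:
    """Return True if the read begins with the full adapter sequence."""
    return read.upper().startswith(adapter.upper())


# First 40 bases of the first forward and first reverse read from the post.
forward_start = "TACACTCTTTCCCTACACGACGCTCTTCCGATCTNGGGAG"
reverse_start = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAAGAG"

print(starts_with_adapter(forward_start, PREFIX_PE_1))
print(starts_with_adapter(reverse_start, PREFIX_PE_2))
```

Both checks should report a match, which confirms that the adapters sit in the sense orientation at the 5' end of each read.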

Forward output:

@FCC3J6UACXX:2:1101:1139:2030#/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCTNGGGAGCTTTCTTGGCTAGAATTACCATAGTGTATAGTTCTATAGCTTTTTTCTACATTGA
+
@FCC3J6UACXX:2:1101:20936:33723#/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCTATGAATCATCATATACAATAGACTAAAAATCCACCGCTACGCGCTTCCGGGCACTACCCTTGTTGACTCTATTCATCAATCGCTACGCGCT
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__b_ceeeccggggiiiiihicgfdghhhh]eefafdgfaghiiiihihhh[geeeeeccddbcccccccbcbcdbbccccccccacc[ac


Reverse output:

@FCC3J6UACXX:2:1101:1139:2030#/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAAGAGGCAATGCACAGAAGCTTAGAGGATCATGGTCGATACCATGGATATAACAAGTCATACCAACGTGACTACAGTGCACATGGACCACAAGATGACA
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__a_eeeeegegggiihihiihiihhhiifffhhhighhhififhfhiffheffhfiihhfhhhhhhfggggeeeeedbdddddcccccccccccbccccc
@FCC3J6UACXX:2:1101:20936:33723#/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTCCCGAAATTGATAAATCAGTACATATAGCGTAAGACGATTTCACATTTCGAGGTCGGAATGGGATCGGGTGTTTTTATGTCTTACCATAGTGCCCGGAA
+
^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^_^__abeeeeeefggffiifiie]ggggeghgghhhhidghh[cP^fcfghhfaffgh]fgbgggeebecca\ZW^bccacceddcbbb]bcbbccccaca


I've tried this with quite a few different parameters for ILLUMINACLIP (mostly with very low clipping thresholds); none have resulted in adapter removal. I have tried removing the /1 and /2 from the adapter prefix names in the adapter file to force it to test both adapters against all reads in simple mode (there is no read-through, so no need for palindrome clipping) with identical ILLUMINACLIP parameters, and this resulted in all reads, forward and reverse, being dropped.

I am at a loss as to why this isn't working and how to move forward. Any and all suggestions are very much appreciated and will be tested. Please let me know if I can provide any more information to help diagnose this problem and I will provide it ASAP.

Ben Morgan

0

Trimmomatic can be finicky in the tacit assumptions it makes.

My guess is that in the first case, when it runs in palindrome mode, there is no palindrome for Trimmomatic to detect because the adapters sit at the beginning of the reads, so everything passes.

In the second case, when run in simple mode, all reads should be dropped, no? Each of them matches at least one of the adapters, so that seems to be the correct behavior.

1

Hi Istvan,

Thanks for the quick reply (I'm obviously waiting with bated breath for responses).

Maybe I'm totally misunderstanding how Trimmomatic works, but that does not make sense to me. Why should an entire read be dropped because there is an adapter match?

Am I way off base here?

Thanks again.

0

There is a difference between 5' and 3' adapter clipping. These will be treated differently.

The default and most common process is trimming off adapters ligated at the end of the reads (3' adapters). Once an adapter is seen, it means the sequencer ran out of useful DNA and read into the adapter, so everything from the adapter onward is considered useless.
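
To make the consequence concrete: under 3'-style clipping, an adapter found at position 0 leaves nothing in front of it, so the whole read is clipped away and a length filter such as MINLEN:36 then drops it. The sketch below is a deliberately simplified illustration that uses exact string matching. Trimmomatic itself uses scored alignment, and the function name `clip_3prime` and the insert sequence are invented here.

```python
from typing import Optional


def clip_3prime(read: str, adapter: str, min_len: int = 36) -> Optional[str]:
    """Keep only the bases before the first adapter hit; drop short results."""
    hit = read.find(adapter)
    kept = read if hit == -1 else read[:hit]
    # Mimic a MINLEN-style filter: reads shorter than min_len are discarded.
    return kept if len(kept) >= min_len else None


adapter = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"
insert = "ACGT" * 12  # 48 bp of made-up insert sequence

# Adapter at the 3' end: the 48 bp insert survives.
print(clip_3prime(insert + adapter, adapter))
# Adapter at the 5' end: nothing precedes it, so the read is dropped (None).
print(clip_3prime(adapter + insert, adapter))
```

This mirrors what was reported for the simple-mode run: adapters at the very start of every read, and every read dropped.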

2
7.6 years ago
rtliu ★ 2.1k

Using TruSeq3-PE-2.fa with Trimmomatic will get rid of TruSeq3 adapters in any position.

To simulate adapter read-through, you need to append the reverse complement of the PrefixPE/1 adapter to R1.fq, i.e. replace the last 34 bp in R1.fq with 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA', and append the reverse complement of the PrefixPE/2 adapter to R2.fq, i.e. replace the first 34 bp in R2.fq with 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'. Using TruSeq3-PE-2.fa will then correctly trim the last 34 bp off R1.
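
The two reverse-complement strings quoted above can be reproduced from the prefix pair in the original log. This is a small Python sketch, and `reverse_complement` is just an illustrative helper, not part of Trimmomatic.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


prefix_pe_1 = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"
prefix_pe_2 = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

print(reverse_complement(prefix_pe_1))  # AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
print(reverse_complement(prefix_pe_2))  # AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
```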

2
7.1 years ago
simon.rayner ▴ 20

Bit late coming to the discussion, but I had a similar problem with adapters not being trimmed on an SE dataset, despite previously having no problem. The trimming came back after I added the -phred64 flag (Trimmomatic had decided the quality was encoded as phred33).

Also, since I have a small RNA dataset, I often don't have the 16 nt seed region specified in the manual, so I need to lower the <SimpleClipThreshold> parameter for the ILLUMINACLIP step; otherwise I will have many full-length reads remaining that contain adapter sequence:
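
If you are unsure which encoding a file uses, a quick look at the lowest quality character usually settles it. The heuristic below is a common rule of thumb, not Trimmomatic's own detection logic, and `guess_phred_offset` is a hypothetical helper. Characters below ';' (ASCII 59) can only occur with a +33 offset, while a file whose lowest character is '@' (ASCII 64) or higher is probably +64. Very short or uniformly high-quality phred33 files can fool it.

```python
def guess_phred_offset(quality_lines):
    """Guess the quality offset (33 or 64) from FASTQ quality strings.

    Returns None when the input is empty or the evidence is ambiguous.
    """
    lowest = min(
        (ord(ch) for line in quality_lines for ch in line),
        default=None,
    )
    if lowest is None:
        return None
    if lowest < 59:   # characters this low never appear with a +64 offset
        return 33
    if lowest >= 64:  # nothing below '@': consistent with a +64 offset
        return 64
    return None       # 59..63: could be old Solexa+64 or phred33


# Fragments in the style of the quality strings posted above ('B' is ASCII 66).
print(guess_phred_offset(["^_^_^_^_b_ceeecc", "cccacBBBB"]))  # 64
# A made-up phred33-style string ('#' is ASCII 35).
print(guess_phred_offset(["IIIIHHGG##"]))  # 33
```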

ILLUMINACLIP:Trimmomatic-0.33/adapters/TruSeqE-SE.fa:2:30:7
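
As a rough guide to why lowering the last value helps with short matches: as I read the Trimmomatic manual, each matching base adds about 0.6 (log10 of 4) to the alignment score. Assuming perfect matches only and ignoring mismatch penalties, the shortest match that can reach a given threshold can be estimated like this (`min_perfect_match_length` is a name made up for this sketch):

```python
import math

# Assumed score contribution of one matching base (log10(4), about 0.602).
SCORE_PER_MATCH = math.log10(4)


def min_perfect_match_length(threshold: float) -> int:
    """Smallest number of perfectly matching bases whose score reaches threshold."""
    return math.ceil(threshold / SCORE_PER_MATCH)


for threshold in (7, 10, 30):
    print(threshold, min_perfect_match_length(threshold))
```

Under these assumptions a simple clip threshold of 7 needs about 12 matching bases, 10 needs about 17, and a palindrome threshold of 30 needs about 50.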

0
7.6 years ago

Try using the reverse-complements instead of the forward sequences. When you read through into adapter sequence, you see the reverse-complement; Trimmomatic, by default, only looks for one orientation. BBDuk, incidentally, defaults to looking for both forward and reverse-complements of all sequences.

0

Let me add the URL for it: http://sourceforge.net/projects/bbmap/

0

Hi Brian,

Thanks for the reply. I'm going to go ahead and try with the rc, but I don't think it's likely to solve the problem (you'll hear from me real fast if it does), since I am not anticipating any read-through: the forward and reverse reads are separated by a large enough insert (size unknown to me) that they aren't likely to have any overlap at all, much less read through. At this stage, I am primarily interested in removing the sense-oriented adapters from the 5' end of reads.

Thanks again.

0

Maybe I am misinterpreting something... You are just trying to detect the adapters that you place yourself, at the beginning of the reads, right? So the actual insert sizes should have no bearing on being able to detect them.

0

Correct. My point was just that the fragments being sequenced should all be long enough that a read from either direction should never read through into the RC adapter sequence at the opposite end of the fragment. I am trying to get Trimmomatic to recognize the adapters that I know to be at the 5' end of reads in the sense orientation, remove them in their entirety, then perform the end clipping and sliding-window quality trimming, and output the resulting sequences for downstream processing.
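
For clarity, the 5' behaviour described here can be written outside Trimmomatic as a minimal Python sketch. It handles exact prefix matches only, and `strip_5prime_adapter` is a hypothetical helper, not a Trimmomatic feature. The example read is made up.

```python
def strip_5prime_adapter(seq: str, qual: str, adapter: str):
    """Remove the adapter from the start of a read, trimming quality in step."""
    if seq.startswith(adapter):
        return seq[len(adapter):], qual[len(adapter):]
    # No adapter at the 5' end: return the read unchanged.
    return seq, qual


adapter = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"
seq = adapter + "ACGTACGT"  # made-up 8 bp insert after the adapter
qual = "h" * len(seq)       # made-up uniform quality string

trimmed_seq, trimmed_qual = strip_5prime_adapter(seq, qual, adapter)
print(trimmed_seq, trimmed_qual)
```

Unlike the 3' case, the bases after the adapter are kept, and quality trimming would then run on what remains.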

I just tried using an adapters file with the RC of the adapters on the sequence reads: no sequences dropped, but no adapters removed.

Cheers.