Which Trimmomatic adapter file cuts the "Illumina Universal Adapter"?
3.4 years ago
cg1440 ▴ 60

Hi.

I ran a FastQC quality check on my FASTQ sequencing file. It seems that I have the "Illumina Universal Adapter" in my reads, so I need to trim the adapter sequence using Trimmomatic. However, I'm not sure which adapter file I should use.

EDIT: I have paired end reads

trimmomatic fastqc NGS • 13k views
3.4 years ago

Hi,

Trimmomatic ships with a folder of FASTA files containing Illumina adapter sequences, and you can use one of them to trim the adapters from your reads. Which file to provide depends on the library prep used for your data. The Trimmomatic manual says this:

Using one of the supplied Fasta Files

Illumina adapter and other technical sequences are copyrighted by Illumina, but we have been granted permission to distribute them with Trimmomatic. Suggested adapter sequences are provided for TruSeq2 (as used in GAII machines) and TruSeq3 (as used by HiSeq and MiSeq machines), for both single-end and paired-end mode. These sequences have not been extensively tested, and depending on specific issues which may occur in library preparation, other sequences may work better for a given dataset. As a rule of thumb newer libraries will use TruSeq3, but this really depends on your service provider.

So, if your data come from a TruSeq2 paired-end library, you should use the TruSeq2-PE.fa file. Which library prep did the sequencing facility use for your data? (See the Trimmomatic documentation for how to pass the adapter file to the tool.)
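To make the "how to pass the adapter file" part concrete, here is a minimal sketch that only assembles a paired-end Trimmomatic command line, without running it. The jar path, the input and output file names, and the `build_trimmomatic_pe_command` helper are assumptions for illustration; the `ILLUMINACLIP:<fasta>:2:30:10` and `MINLEN:36` values are the ones used in the example in the Trimmomatic manual and may need tuning for your data.

```python
from typing import List


def build_trimmomatic_pe_command(
    r1: str,
    r2: str,
    adapters: str = "TruSeq3-PE-2.fa",
    out_prefix: str = "trimmed",      # hypothetical output prefix
    jar: str = "trimmomatic.jar",     # hypothetical path to the Trimmomatic jar
) -> List[str]:
    """Return the argument list for a paired-end Trimmomatic run."""
    # ILLUMINACLIP:<adapter fasta>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>
    illuminaclip = f"ILLUMINACLIP:{adapters}:2:30:10"
    return [
        "java", "-jar", jar, "PE",
        r1, r2,
        # Output order expected in PE mode: forward paired, forward unpaired,
        # reverse paired, reverse unpaired.
        f"{out_prefix}_1P.fastq.gz", f"{out_prefix}_1U.fastq.gz",
        f"{out_prefix}_2P.fastq.gz", f"{out_prefix}_2U.fastq.gz",
        illuminaclip,
        "MINLEN:36",
    ]


if __name__ == "__main__":
    cmd = build_trimmomatic_pe_command("sample_R1.fastq.gz", "sample_R2.fastq.gz")
    print(" ".join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.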

Alternatively, since you have already identified the overrepresented sequence in the FastQC report, you can give Trimmomatic the exact sequence you want to trim, as a FASTA file (see the third-to-last page of the manual linked above).
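A custom adapter file is just a small FASTA file. The sketch below writes one from a name-to-sequence mapping; the `write_adapter_fasta` helper, the file name, and the `my_adapter` entry are hypothetical, and the sequence shown is only a placeholder for whatever FastQC reports for your data.

```python
from pathlib import Path
from typing import Dict


def write_adapter_fasta(adapters: Dict[str, str], path: str) -> Path:
    """Write name -> sequence pairs as a FASTA file for use with ILLUMINACLIP."""
    valid = set("ACGTN")
    lines = []
    for name, seq in adapters.items():
        seq = seq.strip().upper()
        # Reject empty sequences and anything that is not a plain nucleotide string.
        if not seq or set(seq) - valid:
            raise ValueError(f"unexpected characters in sequence for {name!r}")
        lines.append(f">{name}")  # no space after '>' so the record gets a proper ID
        lines.append(seq)
    out = Path(path)
    out.write_text("\n".join(lines) + "\n")
    return out


if __name__ == "__main__":
    # Placeholder sequence: replace it with the overrepresented sequence from your report.
    write_adapter_fasta({"my_adapter": "AGATCGGAAGAGC"}, "custom_adapters.fa")
```

The resulting file is then passed to Trimmomatic in place of one of the bundled adapter files.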

Another option is trim_galore (GitHub repo), which by default tries to detect the adapters automatically and trim them off (see the manual). You can also tell trim_galore which adapter sequence to trim.

I hope this answers your question.

António


I'm not really sure which library prep was used. I only have the FASTQ files, and I need to build a pipeline to ultimately obtain variants (part of a functional genomics course). Also, I know about the adapter files that come with Trimmomatic; however, I don't know which one should actually be used in this case. I tried all the adapter files and found that TruSeq3-PE-2.fa gave the best results, BUT I still have some adapters left in the trimmed files. Is this enough? Or should the trimmed files be completely free of adapters?

Thank you.

[Image: adapter content after trimming, using TruSeq3-PE-2.fa]


I have seen that sometimes. I think that it is "normal". With standard trimmomatic settings, reads with a few nt compatible with adapters are left untouched, because there is no way to tell whether this is contamination or just that the end of a genuine genomic/transcriptomic sequence looks by chance like an adapter.

IMHO, this is really enough trimming for most applications. The exception would be transcriptome assembly, because you need really clean reads there.


I think the trimming is good enough: you have adapters in less than 5% of your reads. I don't know whether those would disappear if you used trim_galore, but you might want to try that.
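If you want a rough number for the remaining adapter content outside of FastQC, a small sketch like the one below counts the reads that contain the start of the adapter. It assumes plain 4-line FASTQ records and uses `AGATCGGAAGAG`, the first 12 nt of the reverse-complemented universal adapter, as the search string; the `adapter_fraction` helper is hypothetical, and the result will not match FastQC's plot exactly because FastQC reports cumulative content per read position.

```python
from typing import Iterable


def adapter_fraction(fastq_lines: Iterable[str], adapter_start: str = "AGATCGGAAGAG") -> float:
    """Fraction of reads whose sequence contains `adapter_start` anywhere."""
    total = 0
    hits = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each 4-line FASTQ record is the sequence
            total += 1
            if adapter_start in line:
                hits += 1
    # Avoid dividing by zero on an empty file.
    return hits / total if total else 0.0
```

For a gzipped file, open it in text mode and pass the handle directly, for example `with gzip.open("trimmed_1P.fastq.gz", "rt") as fh: print(adapter_fraction(fh))` (the file name is an assumption).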

Is there any chance you could contact the sequencing company to find out which library prep and adapters were used?

3.4 years ago

To complement the answer provided by antonioggsousa, you can find all the contaminant sequences used by FastQC here: https://github.com/csf-ngs/fastqc/blob/master/Contaminants/contaminant_list.txt

If you want to build your own contaminant fasta file for trimmomatic, you need to use the reverse complement of the above sequences. So in your case, that would be:

>TruSeq Universal Adapter (reverse complemented)
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
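
For other entries in that list, the reverse complement can be computed with a few lines of Python. This is a minimal sketch; the `reverse_complement` helper is mine and only handles A, C, G and T (other characters pass through unchanged).

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T, either case)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    # Complement every base, then reverse the string.
    return seq.translate(complement)[::-1]


if __name__ == "__main__":
    # TruSeq Universal Adapter as posted later in this thread.
    universal = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
    print(">TruSeq_Universal_Adapter_rc")
    print(reverse_complement(universal))
```

Running it on the universal adapter reproduces the sequence given above.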

Thank you, I will try it

UPDATE: It did not work. I guess I'll stick to TruSeq3-PE-2.fa.


What did not work? Did the code crash, or were the reads not properly trimmed?


I still got the same adapter content as the untrimmed fastq file


Weird, it usually works for me. By the way, the end of the original (not reverse-complemented) universal adapter is actually the same as

>PE1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT

in the TruSeq3-PE-2.fa file. It is possible that in your case, due to a difference in library prep, you need the reverse complement of that, which is also included in the TruSeq3-PE-2.fa file. So to use the full universal adapter, that would be

>TruSeq Universal Adapter (original)
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
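
The overlap between the two sequences is easy to check. A small sketch, using only the two sequences quoted in this thread (the variable names are mine):

```python
universal = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
pe1 = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # PE1 / PrefixPE/1 entry from TruSeq3-PE-2.fa

# PE1 is the 3' end of the full universal adapter.
is_suffix = universal.endswith(pe1)

# The part of the universal adapter that PE1 does not cover.
extra_prefix = universal[: len(universal) - len(pe1)]

print(is_suffix, extra_prefix)
```

So the full adapter only adds the first 24 nt on top of what TruSeq3-PE-2.fa already contains.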