Question: Which truseq trimmomatic adapters file to use when removing truseq adapters?
salamandra wrote (9 months ago):

1 - I'm analysing RNA-seq data from a publication that says the adapters used are TruSeq. I want to trim adapters from this data with trimmomatic, but in the 'adapters' folder in trimmomatic there are several files with 'TruSeq' in the name: 'TruSeq2-PE.fa', 'TruSeq2-SE.fa', 'TruSeq3-PE-2.fa', 'TruSeq3-PE.fa' and 'TruSeq3-SE.fa'. Which of those files should be used? All of them?

2 - Also, why does the TruSeq Index Adapter sequence in trimmomatic's 'TruSeq3-SE.fa' file have an extra 'A' nucleotide at the beginning of the sequence:

>TruSeq3_IndexedAdapter
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

when compared with the same adapter in the adapter sequences PDF provided by Illuminia (page 25): https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/experiment-design/illumina-adapter-sequences-1000000002694-06.pdf ?

5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCAC....

3 - Why is the TruSeq Universal Adapter in trimmomatic:

>TruSeq3_UniversalAdapter
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA

the reverse complement of the 3' part of the same adapter in the PDF provided by Illuminia (page 25)?

TruSeq Universal Adapter
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

4 - What are the different underlined index sequences in the adapter sequences provided by Illuminia?


Maybe ask those detailed questions to one of the Trimmomatic devs (Tony Bolger is usually quite helpful -- you can find his details on the same webpage that also has the Trimmomatic documentation). If you do, please don't forget to share your newfound knowledge here.

Also, slightly off topic: It is Illumina, not Illuminia. ;)

ADD REPLYlink written 9 months ago by cschu1811.6k

Thanks for noticing, otherwise I would have kept saying Illuminia.

ADD REPLYlink written 9 months ago by salamandra200

So Tony replied:

"1 - It depends mostly on which TruSeq protocol was used (V2 - which is old at this stage and usually data from the GAII, or V3, which is everything from the HiSeq or later machines), and whether the data is single-ended or paired ended (SE or PE). The only exception is TruSeq-3-PE which has two sets - TruSeq-3-PE.fa works fine for high quality libraries, but TruSeq-3-PE-2.fa contains some additional sequences which find partial adapters in unusual location/orientation.

2 - This reflects the A added during A-tailing.

3 - Because, AFAIK, that is the orientation which the adapter will have if it is included in the read. Naturally you can add it, or any other sequence you find and don't like, to the adapter file if it works better for you. "
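Points 2 and 3 can be checked directly against the sequences quoted in the question. The following is only a minimal Python sketch; the helper `reverse_complement` and the variable names are written here for illustration and are not part of Trimmomatic.

```python
# Sequences copied from the question above.
ILLUMINA_INDEX_START = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
TRIMMOMATIC_INDEXED = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
ILLUMINA_UNIVERSAL = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
TRIMMOMATIC_UNIVERSAL = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


# Point 2: the Trimmomatic entry is the Illumina sequence with one leading A.
has_extra_a = TRIMMOMATIC_INDEXED == "A" + ILLUMINA_INDEX_START

# Point 3: the Trimmomatic entry is the start of the reverse complement,
# i.e. the reverse complement of the 3' end of the Universal Adapter.
is_revcomp_of_3prime_end = reverse_complement(ILLUMINA_UNIVERSAL).startswith(
    TRIMMOMATIC_UNIVERSAL
)

print(has_extra_a, is_revcomp_of_3prime_end)
```

Both checks compare plain strings only; they say nothing about why the sequences are stored that way, which is what Tony's reply explains.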

ADD REPLYlink written 9 months ago by salamandra200

There is a core sequence common to Illumina adapters. Once trimming programs find that sequence everything to the right of that core is generally trimmed.
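As a rough illustration, the shared core can be seen by comparing the two TruSeq3 sequences quoted in the question, and the trimming idea can be sketched in a few lines of Python. This is a simplified sketch, not how Trimmomatic or any other trimmer is implemented: real tools also tolerate mismatches and partial adapter matches at the read end. The read and the function name `trim_at_core` are hypothetical.

```python
import os

# TruSeq3 sequences as quoted from Trimmomatic's files earlier in this thread.
INDEXED = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
UNIVERSAL = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"

# os.path.commonprefix compares character by character, so it works on any strings.
core = os.path.commonprefix([INDEXED, UNIVERSAL])


def trim_at_core(read: str, core_seq: str) -> str:
    """Drop the first exact occurrence of core_seq and everything 3' of it."""
    pos = read.find(core_seq)
    return read if pos == -1 else read[:pos]


# Hypothetical read: 20 nt of insert followed by read-through into the adapter.
insert = "ACGTTGCAACGTTGCAACGT"
read = insert + INDEXED[:16]

print(core)
print(trim_at_core(read, core))
```

Because the two adapters share this prefix, a trimmer that matches on it catches read-through into either adapter.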

ADD REPLYlink modified 9 months ago • written 9 months ago by genomax65k

So, where do I find that sequence to provide it to trimmomatic?

ADD REPLYlink written 9 months ago by salamandra200

The best way to know for sure is to ask the sequencing facility. They should provide you with that information as a customer service. Often the standard adapters will work, but sometimes the facility might have used its own modifications. I do this whenever we receive a sample. The second-best option is to check for known sequences using FastQC or another program. SE = single-end, PE = paired-end. The protocol version should be given in the Materials and Methods section of the paper.

ADD REPLYlink modified 9 months ago • written 9 months ago by Michael Dondrup46k

It's single-end. My question is whether I should use the TruSeq2 or TruSeq3 files, or both... and I'm not a customer of theirs.

ADD REPLYlink modified 9 months ago • written 9 months ago by salamandra200

Start with TruSeq3-SE.fa. If that does not seem to trim anything, then try TruSeq2-SE.fa.
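A hedged sketch of what those two attempts could look like, written as a small Python helper that only builds the single-end Trimmomatic command lines without running them. The jar name, FASTQ file names and the function `trimmomatic_se_command` are hypothetical placeholders; the `2:30:10` values (seed mismatches, palindrome clip threshold, simple clip threshold) are the example settings from the Trimmomatic manual and may need tuning.

```python
import shlex


def trimmomatic_se_command(jar, fastq_in, fastq_out, adapters_fa):
    """Build a single-end Trimmomatic call that only clips adapters."""
    return [
        "java", "-jar", jar, "SE", "-phred33",
        fastq_in, fastq_out,
        # ILLUMINACLIP:<adapter fasta>:<seed mismatches>:<palindrome>:<simple>
        f"ILLUMINACLIP:{adapters_fa}:2:30:10",
    ]


# Hypothetical file names; try TruSeq3 first, then TruSeq2.
commands = [
    trimmomatic_se_command(
        "trimmomatic-0.39.jar",
        "reads.fastq.gz",
        f"trimmed_{label}.fastq.gz",
        f"{label}-SE.fa",
    )
    for label in ("TruSeq3", "TruSeq2")
]

for cmd in commands:
    print(shlex.join(cmd))
```

The printed lines can be pasted into a shell, or a list can be passed to `subprocess.run`. The adapter file path must point to the copy in Trimmomatic's 'adapters' folder.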

ADD REPLYlink written 9 months ago by genomax65k

How can we check that it trimmed? Are adapters always at the beginning of the reads, and if trimmed do they disappear? I'm sorry if I'm asking dumb questions, I'm just starting to learn bioinformatics on my own.

ADD REPLYlink written 9 months ago by salamandra200

Adapter sequence should never be at the beginning of reads. If that is the case then you may have an adapter dimer without an insert. You will not see any adapter sequence (and hence nothing may be trimmed) if your insert sizes are longer than the number of cycles of sequencing. Only if you have short inserts (that are smaller than the length of sequencing) then you will see adapter sequences towards 3'-end of the reads. I am not a regular trimmomatic user but I assume it should produce a log of what got trimmed (if any).
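The relationship between insert length and number of cycles can be illustrated with a toy simulation in Python. This is a deliberately simplified sketch: the inserts are made-up sequences, the function names are hypothetical, and whatever lies beyond the adapter on a real flow cell is ignored.

```python
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # TruSeq3 indexed adapter quoted above


def simulate_read(insert: str, cycles: int) -> str:
    """Read `cycles` bases from the insert's 5' end, running into the
    adapter when the insert is shorter than the read length."""
    return (insert + ADAPTER)[:cycles]


def adapter_start(read: str, min_len: int = 10) -> int:
    """Index where the adapter begins in the read, or -1 if fewer than
    min_len adapter bases are present (exact matching only)."""
    return read.find(ADAPTER[:min_len])


examples = {
    "long insert (80 nt)": "ACGT" * 20,
    "short insert (32 nt)": "ACGT" * 8,
    "adapter dimer (no insert)": "",
}

for name, insert in examples.items():
    read = simulate_read(insert, cycles=50)
    print(name, adapter_start(read))
```

With 50 cycles, the long insert shows no adapter at all, the short insert shows it starting at position 32 (toward the 3' end), and only the dimer has adapter at position 0.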

If you are willing, I suggest you give bbduk.sh from the BBMap suite a try instead of, or in addition to, trimmomatic. Its options are easy to use and understand. Here is a guide to get you started.

ADD REPLYlink modified 9 months ago • written 9 months ago by genomax65k

It makes sense for adapter sequences to be only at the end of reads. Nevertheless, bbduk.sh has an option for removing 5' adapters ("ktrim=r" is for right-trimming (3' adapters), and "ktrim=l" is for left-trimming (5' adapters)), so that is probably for removing the dimers, I guess.

ADD REPLYlink written 9 months ago by salamandra200

bbduk.sh can trim any type of sequences (not only adapters). One can even provide sequences to scan/trim by using literal=seq1,seq2 etc. (with real sequences in place of seq1 and seq2).

ADD REPLYlink modified 9 months ago • written 9 months ago by genomax65k

"There is a core sequence common to Illumina adapters"

True, but mind that different types of adapters have different cores, e.g. TruSeq vs. Nextera (becomes important once you analyze stuff like ATAC-seq).

ADD REPLYlink written 9 months ago by ATpoint15k

One option is to use adapters.fa included with BBMap suite in the resources directory. It contains all commonly used commercial adapter kit sequences. There may be some additional trimming of the data (by using a common file) but that should not greatly affect the end result, especially when you have millions of reads to work with.

ADD REPLYlink modified 9 months ago • written 9 months ago by genomax65k