Question: Adapter trimming - Nextera XT dual indexes
3.4 years ago, Pinki wrote:

Hi,

I am working on RNA-seq data produced with a Nextera XT tagmentation-based library preparation. The FastQC report shows Nextera transposase sequence in the adapter content. It seems Nextera XT dual indexes were used as adapters. How can I trim the reads for these dual indexes?

Thanks,

Cheers, G

modified 4 weeks ago by DriesB • written 3.4 years ago by Pinki

3.4 years ago, Brian Bushnell (Walnut Creek, USA) wrote:

If you download BBMap, the Nextera XT adapters are in the text file /bbmap/resources/adapters.fa. You can use them for trimming as you would with normal adapters:

bbduk.sh -Xmx1g in=reads.fq out=clean.fq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo

written 3.4 years ago by Brian Bushnell

Thank you for your reply, Brian. I see that adapters.fa has the Nextera XT adapters. However, the indexes used for this sequencing are, for example, TAAGGCGA-GCGTAAGA, and I have several different dual-index pairs like this. Is it common to trim with the Nextera XT adapters listed in adapters.fa, or would it be more appropriate to look for the specific indexes used here? Thanks.

written 3.4 years ago by Pinki

Those are both in the adapters file:

I7_Primer_Nextera_XT_and_Nextera_Enrichment_N701

CCGAGCCCACGAGAC TAAGGCGA ATCTCGTATGCCGTCTTCTGCTTG

I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]517

GACGCTGCCGACGA TCTTACGC GTGTAGATCTCGGTGGTCGCCGTATCATT

Note that the second one is reverse-complemented. BBDuk doesn't care whether adapter sequences are reverse-complemented, though.
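The reverse-complement relationship can be checked by hand or with a few lines of Python. This is only a small sketch: the function name is made up for illustration, and the sequences are the index pair from the question and the index portions of the two adapters.fa entries quoted above.

```python
# Translation table for complementing DNA bases (assumes uppercase A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Index pair reported in the question.
i7_index = "TAAGGCGA"
i5_index = "GCGTAAGA"

# Index portions of the two adapters.fa entries quoted above.
i7_in_file = "TAAGGCGA"
i5_in_file = "TCTTACGC"

print(i7_index == i7_in_file)                      # True: i7 appears as given
print(reverse_complement(i5_index) == i5_in_file)  # True: i5 is reverse-complemented
```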

modified 3.4 years ago • written 3.4 years ago by Brian Bushnell

4 weeks ago, DriesB (Leiden, The Netherlands) wrote:

Removing adapters containing dual indexes is not trivial, as every adapter is specific to a sample.

Illumina provides sequences for adapter trimming (thank you candida.vaz!), but for Nextera XT, for example, these are not very specific because they are short.

So I agree that BBMap's adapters.fa currently gives the best overview. Moreover, BBMap's resources directory contains more specific collections of adapters, and BBDuk's stats option (stats=<file>) keeps track of which contaminants were detected.

I do find it strange that Illumina and its competitors don't supply complete adapter sequences as part of their services...
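For scripted runs, the trimming command from the accepted answer can be assembled with a stats file added. This is only a sketch: BBDuk takes its options as key=value pairs, the helper name and all file names are placeholders, and the in1/in2/out1/out2 form assumes paired reads in separate files.

```python
def build_bbduk_command(in1, in2, out1, out2, adapters, stats_file):
    """Assemble a bbduk.sh argument list for paired-end adapter trimming.

    Trimming parameters are copied from the command earlier in this thread;
    stats=<file> asks BBDuk to report which reference sequences were matched.
    """
    return [
        "bbduk.sh", "-Xmx1g",
        f"in1={in1}", f"in2={in2}",
        f"out1={out1}", f"out2={out2}",
        f"ref={adapters}",
        "ktrim=r", "k=23", "mink=11", "hdist=1", "tpe", "tbo",
        f"stats={stats_file}",
    ]


# Placeholder file names, for illustration only.
cmd = build_bbduk_command(
    "reads_R1.fq", "reads_R2.fq",
    "clean_R1.fq", "clean_R2.fq",
    "adapters.fa", "adapter_stats.txt",
)
print(" ".join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a machine where bbduk.sh is on the PATH.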

written 4 weeks ago by DriesB

I know that this question is old, but I wanted to include a more in-depth answer for future reference.

written 4 weeks ago by DriesB

Illumina actually supplies full sequences of their adapters. They are here.

You are also not linking to the correct BBMap repository. Your link is actually to someone's copy of the real BBMap repo.

modified 4 weeks ago • written 4 weeks ago by genomax

Yes, Illumina supplies these adapter sequences, but how do you reconstruct the complete 'artifact' sequence from that? I've responded to ATpoint's comment about this (below).

I know that I'm not linking to the correct repository, but on SourceForge you can only download the entire bundle of BBMap, not look through the contents. So I think sharing this link makes a discussion easier.

modified 29 days ago • written 29 days ago by DriesB

but how do you reconstruct the complete 'artifact' sequence from that?

I am not sure what you are referring to here?

written 29 days ago by genomax

As I explain below, by artifact I mean adapter+ID+P5/P7. I think there's a risk that we're now repeating the comment thread started by ATpoint.

modified 29 days ago • written 29 days ago by DriesB

There is no need to construct artifact sequences. Once a trimming program finds the adapter sequence, it will remove it and all sequence 3' of it, to the end of the read.

modified 29 days ago • written 29 days ago by genomax
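A bare-bones sketch of that behaviour, using the Nextera transposase sequence CTGTCTCTTATACACATCT. The function name and the toy read are made up, and real trimmers also handle mismatches and partial adapters at the read end, which this exact-match version does not.

```python
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"  # Nextera transposase sequence


def trim_adapter(read: str, adapter: str = NEXTERA_ADAPTER) -> str:
    """Cut the read at the first exact adapter match, dropping everything 3' of it."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


# Toy read: a short insert, then the adapter, then primer and index bases.
insert = "ACGTTGCAAGGCTTAGCCATGA"
read = insert + NEXTERA_ADAPTER + "CCGAGCCCACGAGAC" + "TAAGGCGA"

print(trim_adapter(read) == insert)    # True: index bases go with the adapter
print(trim_adapter(insert) == insert)  # True: reads without adapter are untouched
```

Whatever index follows the adapter is removed along with it, so the index never has to be part of the search sequence.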

You seem to be mixing up index and adapter sequences. The adapter itself (the part right next to the actual DNA sequence that you're interested in) is the same for all samples, but the primers used to amplify the fragments can have different indices. In any case, the adapter sequence is the same, and that is what you trim. Illumina provides the full adapter sequences necessary to properly trim your reads; see the manuals of e.g. the Nextera or TruSeq kits.

See here, the violet part is what you trim, and this is identical while the indices differ based on the multiplexing strategy.

enter image description here

modified 4 weeks ago • written 4 weeks ago by ATpoint

Thank you for your comment! I may indeed be using the incorrect terminology here. By adapter, I meant the entire artifact, including index and P5/P7. We're trimming away the entire artifact, right? (Although the adapter alone would already suffice.)

Nextera's adapter sequence CTGTCTCTTATACACATCT (source) should already be enough for trimming, but BBMap's adapters.fa also gives the sequences for indexes, which provide more sequences to recognize. Is that a correct understanding?

written 29 days ago by DriesB

I have never used BBMap; I trim Nextera reads using the sequence you indicate. In my experience this is sufficient. Unless your fragments are quite short and the reads quite long, you will not reach the index sequence during sequencing anyway, only parts of the adapter sequence itself.
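To put rough numbers on that, here is a small sketch. It assumes the read-through consists of the 19-nt transposase sequence followed directly by the i7 primer segment quoted from adapters.fa above (15 nt, then the 8-nt index). That layout and the function name are assumptions for illustration.

```python
ADAPTER = "CTGTCTCTTATACACATCT"  # 19-nt transposase sequence
SPACER = "CCGAGCCCACGAGAC"       # 15 nt preceding the i7 index in the entry quoted above
INDEX_LEN = 8                    # length of the Nextera XT indexes in this thread


def read_through(read_len: int, insert_len: int) -> dict:
    """Count adapter and index bases sequenced once the read runs past the insert."""
    overhang = max(0, read_len - insert_len)
    index_start = len(ADAPTER) + len(SPACER)  # 34 bases past the end of the insert
    return {
        "adapter_bases": min(overhang, len(ADAPTER)),
        "index_bases": min(max(0, overhang - index_start), INDEX_LEN),
    }


print(read_through(75, 200))   # {'adapter_bases': 0, 'index_bases': 0}
print(read_through(150, 140))  # {'adapter_bases': 10, 'index_bases': 0}
print(read_through(150, 100))  # {'adapter_bases': 19, 'index_bases': 8}
```

Under these assumptions a read only touches the index when it is at least 35 bases longer than the fragment.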

modified 29 days ago • written 29 days ago by ATpoint

DriesB: It is enough to find the core sequence at the beginning of these adapters (which is common to all indexes). Once this is done, trimming programs will generally remove all sequence 3' of that to the end of the read.

written 29 days ago by genomax

Okay, I was thinking of BBDuk's Usage Examples; the first one uses k=23 and mink=11. The adapter sequence above (19 nt) is shorter than k=23, so it is too short to find adapters within the read, but what if the read continues past the adapter, all the way into the index? Then mink is not useful either.

... Perhaps this is a bit too theoretical? Thanks for your time anyway!
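The scenario can be tried out with a toy model of right-side k-mer trimming. This is a deliberately simplified sketch and not BBDuk's actual algorithm: it uses exact matches only, ignores hdist, reverse complements and overlap trimming, and all names are made up. The longer reference is the transposase sequence joined to the N701 primer entry quoted earlier in the thread, which is an assumed layout.

```python
def ktrim_right(read, ref, k, mink=None):
    """Toy right-trim: cut at the first read k-mer that also occurs in ref.

    If mink is given, matches shorter than k (down to mink) are accepted,
    but only between the 3' end of the read and the start of ref.
    """
    # Full-length k-mers from the reference; empty when ref is shorter than k.
    ref_kmers = {ref[i:i + k] for i in range(len(ref) - k + 1)}
    for pos in range(len(read) - k + 1):
        if read[pos:pos + k] in ref_kmers:
            return read[:pos]
    if mink is not None:
        for length in range(min(k - 1, len(read), len(ref)), mink - 1, -1):
            if read.endswith(ref[:length]):
                return read[:-length]
    return read


adapter = "CTGTCTCTTATACACATCT"  # 19 nt
full_ref = adapter + "CCGAGCCCACGAGAC" + "TAAGGCGA" + "ATCTCGTATGCCGTCTTCTGCTTG"
insert = "ACGTTGCAAGGCTTAGCCATGA"

into_index = insert + full_ref[:45]  # read runs through the adapter into the index
partial = insert + adapter[:12]      # read ends partway through the adapter

print(ktrim_right(into_index, adapter, k=23, mink=11) == into_index)  # True: not trimmed
print(ktrim_right(into_index, full_ref, k=23, mink=11) == insert)     # True: trimmed
print(ktrim_right(into_index, adapter, k=19) == insert)               # True: trimmed
print(ktrim_right(partial, adapter, k=23, mink=11) == insert)         # True: mink rescues it
```

In this model the 19-nt sequence alone with k=23 misses the read that continues into the index, while either a longer reference such as the full adapters.fa entry or a k no larger than the adapter length catches it. Whether BBDuk itself behaves the same way for references shorter than k is not something the sketch establishes.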

modified 29 days ago • written 29 days ago by DriesB