Question

Adapter trimming - Nextera XT dual indexes

0

7.9 years ago

Pinki • 0

Hi,

I am working on RNA-seq data that was produced using Nextera XT tagmentation-based library preparation. The FastQC report shows Nextera transposase sequence in the adapter content. It seems Nextera XT dual indexes were used as adapters. How can I trim the reads for these dual indexes?

Thanks,

Cheers, G

RNA-Seq adapter trimming Nextera XT dual indexes • 8.1k views

ADD COMMENT • link updated 4.6 years ago by DriesB ▴ 110 • written 7.9 years ago by Pinki • 0

score 1 · Answer 1 · 2016-06-06

1

7.9 years ago

Brian Bushnell 20k

If you download BBMap, the Nextera XT adapters are in the text file /bbmap/resources/adapters.fa. You can use them for trimming as you would with normal adapters:

bbduk.sh -Xmx1g in=reads.fq out=clean.fq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo

ADD COMMENT • link 7.9 years ago by Brian Bushnell 20k

0

Thank you for your reply, Brian. I see that adapters.fa has the Nextera XT adapters. However, the indexes used for this sequencing run are, for example, TAAGGCGA-GCGTAAGA, and I have several such dual-index pairs. Is it common to trim with the Nextera XT adapters listed in adapters.fa, or would it be more appropriate to look for the specific indexes used here? Thanks.

ADD REPLY • link 7.9 years ago by Pinki • 0

0

Those are both in the adapters file:

I7_Primer_Nextera_XT_and_Nextera_Enrichment_N701

CCGAGCCCACGAGAC TAAGGCGA ATCTCGTATGCCGTCTTCTGCTTG

I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]517

GACGCTGCCGACGA TCTTACGC GTGTAGATCTCGGTGGTCGCCGTATCATT

Note that the second one is reverse-complemented. BBDuk doesn't care whether adapter sequences are reverse-complemented, though.
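The reverse-complement relationship mentioned above can be checked with a few lines of shell. This is only a sketch: the helper names revcomp and find_index and the scratch file adapters_excerpt.fa are made up for illustration, and the two FASTA entries are the ones quoted in this thread with the highlighting spaces removed, not the full adapters.fa.

```shell
# Reverse-complement a DNA string (uppercase A/C/G/T only).
revcomp() {
  printf '%s\n' "$1" |
    awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' |
    tr ACGT TGCA
}

# Scratch file holding the two entries quoted in this thread (hypothetical file name).
cat > adapters_excerpt.fa <<'EOF'
>I7_Primer_Nextera_XT_and_Nextera_Enrichment_N701
CCGAGCCCACGAGACTAAGGCGAATCTCGTATGCCGTCTTCTGCTTG
>I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]517
GACGCTGCCGACGATCTTACGCGTGTAGATCTCGGTGGTCGCCGTATCATT
EOF

# Print the header of every entry that contains the index in either orientation.
find_index() {
  grep -B1 -e "$1" -e "$(revcomp "$1")" adapters_excerpt.fa | grep '^>'
}

revcomp GCGTAAGA      # prints TCTTACGC, the form stored in the I5 entry
find_index TAAGGCGA   # matches the N701 entry (forward orientation)
find_index GCGTAAGA   # matches the 517 entry (reverse-complemented)
```

Searching both orientations is what makes the second index findable; a plain grep for GCGTAAGA alone would miss it.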

ADD REPLY • link 7.9 years ago by Brian Bushnell 20k

score 0 · Answer 2 · 2019-09-23

0

4.6 years ago

DriesB ▴ 110

Removing adapters containing dual indexes is not trivial, as every adapter is specific for a sample.

Illumina provides sequences for adapter trimming (thank you candida.vaz!), but in this example, Nextera XT, they are short and therefore not very specific.

So I agree that BBMap's adapters.fa currently gives the best overview. Moreover, BBMap's resources directory contains more specific collections of adapters, and BBDuk's stats option (stats=<file>) keeps track of which contaminants were detected.

I do find it strange that Illumina and its competitors don't supply complete adapter sequences as part of their services...

ADD COMMENT • link 4.6 years ago by DriesB ▴ 110

0

I know that this question is old, but I wanted to include a more in-depth answer for future reference.

ADD REPLY • link 4.6 years ago by DriesB ▴ 110

0

Illumina actually supplies full sequences of their adapters. They are here.

You are also not linking to the correct BBMap repository. Your link points to someone's copy of the real BBMap repo.

ADD REPLY • link 4.6 years ago by GenoMax 141k

0

Yes, Illumina supplies these adapter sequences, but how do you reconstruct the complete 'artifact' sequence from that? I've responded to ATpoint's comment about this (below).

I know that I'm not linking to the correct repository, but on SourceForge you can only download the entire bundle of BBMap, not look through the contents. So I think sharing this link makes a discussion easier.

ADD REPLY • link 4.6 years ago by DriesB ▴ 110

0

but how do you reconstruct the complete 'artifact' sequence from that?

I am not sure what you are referring to here?

ADD REPLY • link 4.6 years ago by GenoMax 141k

0

As I explain below, by artifact I mean adapter+ID+P5/P7. I think there's a risk that we're now repeating the comment thread started by ATpoint.

ADD REPLY • link 4.6 years ago by DriesB ▴ 110

0

There is no need to construct artifact sequences. Once a trimming program finds the adapter sequence, it removes everything from that point to the 3' end of the read.

ADD REPLY • link 4.6 years ago by GenoMax 141k

0

You seem to be mixing up index and adapter sequences. The adapter itself (the part right next to the actual DNA sequence that you're interested in) is the same for all samples, but the primers used to amplify the fragments can carry different indices. In any case, the adapter sequence is the same, and that is what you trim. Illumina provides the full adapter sequences necessary to properly trim your reads; see the manuals of e.g. the Nextera or TruSeq kits.

See here, the violet part is what you trim, and this is identical while the indices differ based on the multiplexing strategy.

enter image description here

ADD REPLY • link 4.6 years ago by ATpoint 81k

0

Thank you for your comment! I may indeed be using the incorrect terminology here. By adapter, I meant the entire artifact, including index and P5/P7. We're trimming away the entire artifact, right (although the adapter alone would already suffice)?

Nextera's adapter sequence CTGTCTCTTATACACATCT (source) should already be enough for trimming, but BBMap's adapters.fa also gives the sequences for indexes, which provide more sequences to recognize. Is that a correct understanding?

ADD REPLY • link 4.6 years ago by DriesB ▴ 110

1

I have never used BBMap; I trim Nextera by the sequence you indicate. In my experience this is sufficient. Unless your fragments are quite short and the reads quite long, you will not reach the index sequence during sequencing anyway, only parts of the adapter sequence itself.

ADD REPLY • link 4.6 years ago by ATpoint 81k

0

DriesB: It is enough to find the core sequence at the beginning of these adapters (which is common to all indexes). Once this is done, trimming programs will generally remove all sequence 3' of that, to the end of the read.
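The behaviour described here, find the core sequence and drop everything 3' of it, can be illustrated with a toy awk filter. This is a minimal sketch and not how BBDuk or any other trimmer is implemented: trim_adapter is a hypothetical helper, it assumes 4-line FASTQ records, and it only handles an exact, complete match of the adapter, whereas real trimmers also catch mismatches and partial adapters at the read end.

```shell
# Trim each 4-line FASTQ record at the first exact occurrence of the adapter,
# removing the adapter and everything 3' of it from sequence and qualities.
trim_adapter() {
  awk -v adapter="$1" '
    NR % 4 == 2 {                      # sequence line
      pos = index($0, adapter)         # 1-based position, 0 if absent
      if (pos > 0) $0 = substr($0, 1, pos - 1)
    }
    NR % 4 == 0 {                      # quality line of the same record
      if (pos > 0) $0 = substr($0, 1, pos - 1)
    }
    { print }
  '
}

# Toy read: 10 nt of insert, the 19-nt Nextera core sequence, then 5 more bases.
cat > toy.fq <<'EOF'
@r1
ACGTACGTACCTGTCTCTTATACACATCTCCGAG
+
FFFFFFFFFFIIIIIIIIIIIIIIIIIII#####
EOF

trim_adapter CTGTCTCTTATACACATCT < toy.fq
```

Only the first 10 bases and their qualities remain for r1, whatever follows the core sequence (index, P5/P7) is removed along with it, which is why the index never needs to be specified.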

ADD REPLY • link 4.6 years ago by GenoMax 141k

0

Okay, I was thinking of BBDuk's Usage Examples; the first one uses k=23 and mink=11. The adapter sequence above is too short to find adapters within the sequence, but what if the read extends beyond the adapter, all the way into the index? Then mink is not useful either.
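The arithmetic behind "too short" can be spelled out. The sketch below only counts how many full-length k-mers a sequence of a given length contains (length - k + 1, or 0 if the sequence is shorter than k); kmer_count is a hypothetical helper, and the snippet makes no claim about how BBDuk itself handles references shorter than k.

```shell
# Number of full-length k-mers in a sequence of length $1 for k = $2.
kmer_count() {
  n=$(( $1 - $2 + 1 ))
  if [ "$n" -lt 0 ]; then n=0; fi
  echo "$n"
}

core=CTGTCTCTTATACACATCT                                   # sequence quoted in this thread
n701=CCGAGCCCACGAGACTAAGGCGAATCTCGTATGCCGTCTTCTGCTTG       # N701 entry quoted earlier

echo "core length: ${#core}"                               # 19
echo "core, k=23:  $(kmer_count "${#core}" 23) k-mers"     # 0, since 19 < 23
echo "core, k=11:  $(kmer_count "${#core}" 11) k-mers"
echo "N701, k=23:  $(kmer_count "${#n701}" 23) k-mers"
```

With k=23 the 19-nt core contributes no full-length k-mers on its own, while the longer adapters.fa entries do, which is one way to read the difference between the two reference choices discussed above.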

... Perhaps this is a bit too theoretical? Thanks for your time anyway!

ADD REPLY • link 4.6 years ago by DriesB ▴ 110