Question: Trimming out primer sequences in the middle of reads
3.5 years ago by
Berkeley, USA
s.kyungyong6430 wrote:


I have PacBio reads that need to be assembled. These reads have Illumina primers at both ends as well as in the middle. The problem is that the primer sequences vary, so standard trimming cannot remove all of them. My lab wants the best-quality assembly possible, so I might have to write a script to detect the primers in the middle of reads. I am currently thinking of removing any sequence that is 80-100% similar to a primer sequence, but I am worried that this would also remove some informative genomic sequence.

How do you guys deal with such situations?

Thank you in advance!

genome • 2.4k views
modified 3.5 years ago by Brian Bushnell17k • written 3.5 years ago by s.kyungyong6430
3.5 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

I wrote a tool for removing internal PacBio adapter sequences, in the BBMap package:

removesmartbell.sh in=reads.fq out=clean.fq split=t adapter=ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT

By default it uses the standard PacBio SMRTbell adapters, but in this case you can specify an Illumina adapter instead. It uses an indel-aware alignment designed to model PacBio's indel and substitution error rates, and it has a very low false-positive rate. I don't remember the exact rate, but I think it was around 1 per 5 megabases of PacBio sequence, or something like that. So it should not cause any problems downstream.
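
To illustrate the general idea of indel-aware adapter detection with splitting, here is a minimal Python sketch. It is not BBMap's implementation: it uses a plain unit-cost edit distance (semi-global alignment, where the adapter may start anywhere in the read) rather than a PacBio-specific error model, and the function names, the 20% error threshold, and the toy sequences are all assumptions made for the example.

```python
def best_adapter_hit(read, adapter, max_error_rate=0.2):
    """Find the best approximate occurrence of `adapter` inside `read`.

    Returns (start, end, edits) so that read[start:end] is the matched
    region, or None if no occurrence has edits <= max_error_rate * len(adapter).
    Substitutions, insertions and deletions each cost 1 (an assumption,
    not a PacBio-specific error model).
    """
    m = len(adapter)
    # prev[i] = (edits, start) for aligning adapter[:i] so that it ends at
    # the previous read position; before any read base is consumed, the
    # only option is to skip i adapter bases.
    prev = [(i, 0) for i in range(m + 1)]
    best = None
    for j, base in enumerate(read, start=1):
        # An empty adapter prefix matches for free at any read position,
        # which is what lets the adapter start in the middle of the read.
        cur = [(0, j)]
        for i in range(1, m + 1):
            diag_edits, diag_start = prev[i - 1]
            match = (diag_edits + (adapter[i - 1] != base), diag_start)
            extra_read_base = (prev[i][0] + 1, prev[i][1])
            missing_read_base = (cur[i - 1][0] + 1, cur[i - 1][1])
            cur.append(min(match, extra_read_base, missing_read_base))
        edits, start = cur[m]
        if edits <= max_error_rate * m and (best is None or edits < best[2]):
            best = (start, j, edits)
        prev = cur
    return best


def split_at_adapters(read, adapter, max_error_rate=0.2):
    """Cut the read at every detected adapter and return the flanking pieces."""
    hit = best_adapter_hit(read, adapter, max_error_rate)
    if hit is None:
        return [read] if read else []
    start, end, _ = hit
    return (split_at_adapters(read[:start], adapter, max_error_rate)
            + split_at_adapters(read[end:], adapter, max_error_rate))


if __name__ == "__main__":
    toy_adapter = "TTAGGCAT"  # toy sequence, not a real primer
    toy_read = "ACGTACGT" + "TTAGCAT" + "GGCCGGCC"  # adapter with one base deleted
    print(best_adapter_hit(toy_read, toy_adapter))
    print(split_at_adapters(toy_read, toy_adapter))
```

This runs in time proportional to read length times adapter length per search, so it is only meant to show the logic on a few reads; for a full dataset a dedicated tool is the practical choice.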


Hello, does your script also remove the reverse complement? Can I find it among the BBMap scripts?

written 6 months ago by ricardoguerreiro212160

You can include the reverse-complement (RC) sequence in the adapter file or on the command line, as above:

removesmartbell.sh in=reads.fq out=clean.fq split=t adapter=ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT,RC_Sequence
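
If you need to generate the RC sequence to paste in place of RC_Sequence, a small Python helper along these lines should do. The function name is made up for this example, and the printed string simply follows the comma-separated form of the command above.

```python
# Complement table for unambiguous bases plus N, in both cases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


adapter = "ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT"
# Forward and reverse-complement sequences, joined with a comma.
print(f"adapter={adapter},{reverse_complement(adapter)}")
```

Other IUPAC ambiguity codes are left unchanged by this table, so extend it if your primers contain them.
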
written 6 months ago by genomax87k
3.5 years ago by
WCIP | Glasgow | UK
dariober11k wrote:

I don't have direct experience with the situation you describe, but cutadapt is very flexible in how it lets you detect, remove, or mask one or more adapters. See for example the paragraph

If the adapter sequence you provide as input is long enough, say > 15 nt, it's unlikely that you will throw away informative sequence (roughly speaking, of course).
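
As a rough back-of-the-envelope check of that rule of thumb, the sketch below estimates how often a given k-mer would turn up by chance. It assumes a random sequence with equal base frequencies, counts substitutions only (no indels), and uses a made-up function name and an arbitrary 100 Mb example size, so treat the numbers as order-of-magnitude guidance rather than a false-positive rate for any particular tool.

```python
from math import comb


def expected_chance_hits(seq_len, k, max_mismatches=0, both_strands=True):
    """Expected number of chance occurrences of one specific k-mer.

    Assumes independent, uniformly distributed bases. A position counts
    as a hit if it differs from the k-mer at no more than
    `max_mismatches` positions (substitutions only, no indels).
    """
    # Number of distinct k-mers within the allowed Hamming distance.
    neighbours = sum(comb(k, i) * 3**i for i in range(max_mismatches + 1))
    p_hit = neighbours / 4**k
    positions = max(seq_len - k + 1, 0)
    strands = 2 if both_strands else 1
    return positions * p_hit * strands


# Example: a 15-nt adapter against 100 Mb of sequence (arbitrary size).
for mismatches in (0, 1, 2, 3):
    hits = expected_chance_hits(100_000_000, 15, max_mismatches=mismatches)
    print(f"k=15, up to {mismatches} mismatches: {hits:.3g} expected chance hits")
```

The expected count grows quickly as more mismatches are tolerated, which is why longer adapters and stricter thresholds lower the risk of trimming real genomic sequence.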
