Question: Trimming out primer sequences in the middle of reads
Asked 2.8 years ago by s.kyungyong64 (Berkeley, USA):

Hi!

I have PacBio reads that need to be assembled. These reads have Illumina primers at both ends as well as in the middle. The problem is that the primer sequences vary, so standard trimming cannot remove all of them. My lab wants the best-quality assembly possible, so I might have to write a script to detect the primers in the middle of the reads. I am currently thinking of removing sequences that are 80 to 100% similar to the primer sequences, but I am worried that this would also remove some informative genomic sequence.

How do you guys deal with such situations?

Thank you in advance!

Tags: genome

Answer by Brian Bushnell (Walnut Creek, USA), 2.8 years ago:

I wrote a tool for removing internal PacBio adapter sequences, in the BBMap package:

removesmartbell.sh in=reads.fq out=clean.fq split=t adapter=ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT

By default it uses the standard PacBio SMRTbell adapters, but in this case you can specify an Illumina adapter instead. It uses indel-aware alignment designed to model PacBio's indel and substitution error rates, and it has a very low false-positive rate. I don't remember the exact rate, but I think it was around 1 in 5 megabases of PacBio sequence, so it should not cause any problems downstream.
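To make the idea of indel-aware adapter detection concrete, here is a minimal Python sketch. It is not how removesmartbell is implemented; it is a plain edit-distance scan (substitutions, insertions and deletions all cost 1), and the function names `find_adapter_hits` and `split_read` as well as the 20% error threshold are assumptions chosen for illustration. It only checks the forward strand and is far too slow for real data sets, so treat it as a way to see what "split at internal adapters" means rather than as a replacement for a real tool.

```python
def find_adapter_hits(read, adapter, max_error_rate=0.2):
    """Return non-overlapping (start, end, edits) hits of `adapter` in `read`.

    Semi-global alignment: the adapter must be aligned end to end, but it
    may start and end anywhere in the read at no cost.
    """
    m = len(adapter)
    max_edits = int(max_error_rate * m)
    # prev[i] = (edits, start) for adapter[:i] against read[start:j]
    prev = [(i, 0) for i in range(m + 1)]
    hits = []
    for j in range(1, len(read) + 1):
        cur = [(0, j)]  # an empty adapter prefix can start anywhere for free
        for i in range(1, m + 1):
            mismatch = adapter[i - 1] != read[j - 1]
            match_or_sub = (prev[i - 1][0] + mismatch, prev[i - 1][1])
            extra_read_base = (prev[i][0] + 1, prev[i][1])
            missing_read_base = (cur[i - 1][0] + 1, cur[i - 1][1])
            cur.append(min(match_or_sub, extra_read_base, missing_read_base))
        if cur[m][0] <= max_edits:
            hits.append((cur[m][1], j, cur[m][0]))
        prev = cur
    # Neighbouring end positions report the same adapter copy several times:
    # keep the lowest-edit hit first, then anything that does not overlap it.
    chosen = []
    for start, end, edits in sorted(hits, key=lambda h: h[2]):
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, edits))
    return sorted(chosen)


def split_read(read, adapter, max_error_rate=0.2):
    """Cut the read at every adapter hit and return the remaining pieces."""
    pieces, pos = [], 0
    for start, end, _edits in find_adapter_hits(read, adapter, max_error_rate):
        if start > pos:
            pieces.append(read[pos:start])
        pos = end
    if pos < len(read):
        pieces.append(read[pos:])
    return pieces


if __name__ == "__main__":
    adapter = "ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT"
    # Toy read: two short made-up flanks around one exact adapter copy.
    read = "GATTACAGGCATCAGTA" + adapter + "CCGTAGGCTTAAGCGTC"
    print(find_adapter_hits(read, adapter))
    print(split_read(read, adapter))
```

The threshold is the knob the original question worries about: a looser `max_error_rate` catches more degraded adapter copies but also raises the chance of cutting real genomic sequence, which is why a long adapter sequence helps.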

Answer by dariober (WCIP | Glasgow | UK), 2.8 years ago:

I don't have direct experience with the situation you describe, but cutadapt is very flexible in how it detects, removes, or masks one or more adapters. See, for example, the section on multiple adapter occurrences within a single read: https://cutadapt.readthedocs.io/en/stable/guide.html#multiple-adapter-occurrences-within-a-single-read

If the adapter sequence you provide is long enough, say > 15 nt, you are unlikely to throw away informative sequence (roughly speaking, of course).
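A back-of-the-envelope way to check the "long enough" intuition is to estimate how many times an adapter would match by chance. The sketch below is an assumption-laden approximation, not anything cutadapt computes: it treats the target as uniformly random bases, counts substitutions only (no indels), looks at one strand, and the function name `expected_chance_hits` is made up for this example.

```python
from math import comb


def expected_chance_hits(adapter_len, max_mismatches, target_len):
    """Expected number of chance matches on one strand of random sequence.

    Assumes each base is A, C, G or T with equal probability and allows
    up to `max_mismatches` substitutions (indels are ignored).
    """
    # Number of distinct sequences within the allowed Hamming distance.
    neighbours = sum(
        comb(adapter_len, i) * 3**i for i in range(max_mismatches + 1)
    )
    p_per_position = neighbours / 4**adapter_len
    positions = max(target_len - adapter_len + 1, 0)
    return positions * p_per_position


if __name__ == "__main__":
    genome = 100_000_000  # hypothetical 100 Mb of sequence
    for length, mismatches in [(15, 0), (15, 3), (45, 0), (45, 9)]:
        hits = expected_chance_hits(length, mismatches, genome)
        print(f"{length} nt, <= {mismatches} mismatches: {hits:.3g}")
```

The pattern to look for is that tolerating about 20% mismatches on a short adapter inflates the expected number of chance matches by several orders of magnitude, while the same tolerance on a much longer adapter stays negligible. Real genomes are not random, so low-complexity or repetitive adapters can behave worse than this estimate suggests.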
