Question: Removing vector sequences from PacBio BAM files
2.0 years ago, bnorris823 wrote:

Hello,

I have PacBio BAM files with 10 kb+ long reads and a vector sequence that is about 8 kb. I want to remove any part of a read that matches the vector sequence above a certain percent identity.

I have looked into BBDuk, but I can't seem to find a way to remove any matching sequences, only filter out reads with matching sequences.

Please let me know if there is a different approach that I should be taking.

Thanks.

bbduk pac bio sequence • 680 views
written 2.0 years ago by bnorris823 (0)

You didn't tell us what you want to do afterwards. If it's reference genome alignment I would just include the vector sequence as a chromosome and let the aligner sort it out.

written 2.0 years ago by WouterDeCoster (44k)

way to remove any matching sequences, only filter out reads with matching sequences.

What does that mean?

You may be able to use bbsplit.sh with the vector (and reference) sequence to bin reads containing the vector.

modified 2.0 years ago • written 2.0 years ago by genomax (90k)

Related side question: is BBDuk suitable for PacBio data anyway?

written 2.0 years ago by lieven.sterck (8.6k)

I want to trim out the vector sequence from the middle of the read.

written 2.0 years ago by bnorris823 (0)

I am not sure if any of the standard trimming programs are set up to do this, since most are meant for short reads and expect the adapter to be at one end (or the other) of the read. Your best bet may be to filter out the reads containing the vector and then deal with them separately.
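
As a rough illustration of the "filter the reads containing the vector" step, here is a minimal pure-Python sketch of k-mer screening. It is not how bbsplit.sh or BBDuk work internally; the function names, the k-mer size and the `min_fraction` threshold are all assumptions chosen for the example, and noisy long reads would likely need a short k or an alignment-based approach instead.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(comp)[::-1]


def kmer_set(seq, k):
    """Collect all k-mers of seq and of its reverse complement."""
    seq = seq.upper()
    kmers = set()
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - k + 1):
            kmers.add(strand[i:i + k])
    return kmers


def vector_fraction(read, vector_kmers, k):
    """Fraction of the read's k-mers that also occur in the vector."""
    read = read.upper()
    total = len(read) - k + 1
    if total <= 0:
        return 0.0
    hits = sum(1 for i in range(total) if read[i:i + k] in vector_kmers)
    return hits / total


def bin_reads(reads, vector, k=15, min_fraction=0.05):
    """Split a {name: sequence} dict into (with_vector, clean) dicts.

    k and min_fraction are hypothetical defaults, not tuned values.
    """
    vector_kmers = kmer_set(vector, k)
    with_vector, clean = {}, {}
    for name, seq in reads.items():
        if vector_fraction(seq, vector_kmers, k) >= min_fraction:
            with_vector[name] = seq
        else:
            clean[name] = seq
    return with_vector, clean
```

Reads that land in `with_vector` can then be handled separately, while `clean` reads go straight to downstream analysis.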

modified 2.0 years ago • written 2.0 years ago by genomax (90k)

Would BBDuk with ktrim=r, with the vector provided as the adapter file, not give the desired behavior?

Ah, and you'll first need to convert the BAM file to a FASTQ file.

modified 2.0 years ago • written 2.0 years ago by lieven.sterck (8.6k)

Wouldn't that remove everything to the right of where the vector is found? What if the vector is in the middle of a read?
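
To make the concern concrete, here is a toy Python sketch contrasting right-trimming (drop the match and everything after it) with excising only the matched interval. The helper names and the tiny sequences are made up for illustration; real vector matches would come from a k-mer or alignment search rather than an exact `find`.

```python
def trim_right(read, match_start):
    """Mimic right-trimming: keep only the bases before the match."""
    return read[:match_start]


def excise(read, match_start, match_end):
    """Remove only the matched interval and join the two flanks."""
    return read[:match_start] + read[match_end:]


# Toy read: left flank + stand-in "vector" + right flank.
toy_vector = "CCCCC"
read = "AAAAAAAAAA" + toy_vector + "GGGGGGGGGG"

start = read.find(toy_vector)      # 0-based start of the match
end = start + len(toy_vector)      # end is exclusive

right_trimmed = trim_right(read, start)  # right flank is lost
excised = excise(read, start, end)       # both flanks are kept
```

With right-trimming the downstream flank disappears along with the vector, which is the behavior being questioned here.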

written 2.0 years ago by bnorris823 (0)

Yes, indeed. But I thought that was the goal, my bad.

As pointed out by genomax, I don't think there is an off-the-shelf tool that will do that for you.

I'm also a bit puzzled why you want to do that, or rather how you end up with that kind of situation in your PacBio reads. Can the vector also be at the extremities, or do you expect it to always be in the middle?

modified 2.0 years ago • written 2.0 years ago by lieven.sterck (8.6k)

Yeah, it could be anywhere in the reads. I think I'm going to try trimming right and left and then merging the reads back together in a Python script.
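
One possible shape for such a script, sketched under some assumptions that are not from this thread: the reads have already been aligned to the vector with a tool that writes PAF (for example minimap2), and "percent matching" is approximated as matching bases divided by alignment block length (PAF columns 10 and 11). The function names and the 0.8 default are hypothetical.

```python
from collections import defaultdict


def vector_hits(paf_lines, min_identity=0.8):
    """Collect (start, end) read intervals that align to the vector.

    PAF query coordinates are 0-based with an exclusive end.
    """
    hits = defaultdict(list)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated lines
        name = fields[0]
        q_start, q_end = int(fields[2]), int(fields[3])
        matches, block_len = int(fields[9]), int(fields[10])
        if block_len > 0 and matches / block_len >= min_identity:
            hits[name].append((q_start, q_end))
    return hits


def merge_intervals(intervals):
    """Merge overlapping or touching intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def remove_vector(seq, intervals, join=True):
    """Cut the given intervals out of seq.

    join=True returns the flanks merged into one string;
    join=False returns the flanking pieces as a list.
    """
    pieces = []
    pos = 0
    for start, end in merge_intervals(intervals):
        if start > pos:
            pieces.append(seq[pos:start])
        pos = max(pos, end)
    if pos < len(seq):
        pieces.append(seq[pos:])
    return "".join(pieces) if join else pieces
```

If the reads are kept as FASTQ, the same intervals would have to be cut from the quality string as well. Whether to join the flanks or keep them as separate pieces depends on what happens downstream, since joined flanks are not contiguous in the original molecule.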

written 2.0 years ago by bnorris823 (0)

That would be one way to do this. Filter the reads out using bbsplit.sh.

written 2.0 years ago by genomax (90k)