Removing targeted sequences from contigs
6.7 years ago
mz1101 • 0

Hi,

Can anybody suggest a tool that aligns a target sequence (10 kb) against contigs/scaffolds or long reads and then removes that sequence from the contig, splitting the contig in two where the unwanted sequence is flanked by other sequence on both sides?

I could script this using alignment coordinates from BLAST or BWA-MEM, but I'd rather not reinvent the wheel if a tool already does this. Most contaminant (adapter) trimming tools are designed for short stretches of sequence.

Thanks

genome alignment • 1.4k views
6.7 years ago

You might try BBMap's BBMask, which can mask a sequence using a SAM file, converting all covered bases to N or lowercase. It can additionally split the result into contiguous sequences of unmasked bases only and discard the masked regions, which sounds like what you are looking for.

bbmask.sh in=sequence.fa sam=mapped.sam masklowentropy=f split=t out=split.fa
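
If you do end up scripting it from alignment coordinates instead, the split-and-discard step is small. Below is a minimal Python sketch of that idea only, not of how BBMask works internally. The function name `split_on_masked` and the `min_len` parameter are made up for illustration, and it assumes you have already converted your hits to 0-based, half-open (start, end) intervals on the contig (BLAST tabular output is 1-based and inclusive, and minus-strand hits are reported with start > end, so convert those first).

```python
def split_on_masked(seq, intervals, min_len=1):
    """Cut the given intervals out of seq and return what is left.

    seq       -- contig sequence as a string
    intervals -- iterable of 0-based, half-open (start, end) pairs to remove
                 (assumed already converted from the aligner's coordinates)
    min_len   -- hypothetical cutoff: drop leftover pieces shorter than this

    Returns a list of (offset, piece) tuples, where offset is the position
    of the piece in the original contig.
    """
    pieces = []
    pos = 0  # first base not yet consumed by a kept piece or a removed hit
    for start, end in sorted(intervals):
        # Clamp to the contig so out-of-range coordinates cannot break slicing.
        start = min(max(start, 0), len(seq))
        end = min(max(end, 0), len(seq))
        # Keep the unmasked stretch before this hit, if it is long enough.
        if start - pos >= min_len:
            pieces.append((pos, seq[pos:start]))
        # Skip past the hit; max() handles overlapping or nested hits.
        pos = max(pos, end)
    # Keep whatever follows the last hit.
    if len(seq) - pos >= min_len:
        pieces.append((pos, seq[pos:]))
    return pieces


if __name__ == "__main__":
    contig = "AAAACCCCGGGGTTTT"
    # Remove the CCCC block, which is flanked on both sides.
    print(split_on_masked(contig, [(4, 8)]))
    # prints [(0, 'AAAA'), (8, 'GGGGTTTT')]
```

A hit in the middle of a contig yields two pieces, a hit at either end simply trims the contig, and overlapping hits are merged implicitly because the cursor only ever moves forward.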