bamtools filtering syntax - reads containing more than X mismatches and soft-clipped reads
13 months ago
joelepaul • 0

Hi @ll!

I'm extremely new to the field of bioinformatics, though I have some programming knowledge. Many tools have quite nice manuals, which help me learn how to use them properly. However, when it comes to bamtools, I am stuck.

My aims are twofold - starting from nice *.bam datasets

1) to remove reads containing more than 4 mismatches. However, I can only find the syntax to filter reads that contain 0 mismatches.

bamtools filter -tag XM:0 -in reads.bam -out reads.noMismatch.bam

However, I cannot find any documentation about the syntax for BWA-specific tags such as XM, so I do not know how to adapt this command correctly to express a "more than" condition.

2) to remove soft-clipped reads. My supervisor told me to use bamtools for that, but guides such as "Remove Soft Clipped Bases" all refer to other software or scripts. The bamtools manuals also do not say anything about how to do that. Does anyone know how to do this specifically with bamtools?

Thank you for your time!

Cheers Joe

bwa bamtools • 416 views
13 months ago

Not with bamtools, but using samjdk:

java -jar dist/samjdk.jar -e 'Integer xm = record.getIntegerAttribute("XM"); if(xm!=null && xm > 4) return false; return record.getCigar()==null || !record.getCigar().isClipped();' input.bam
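
For readers who want to see the logic of that expression spelled out, here is a minimal sketch of the same two tests applied to plain SAM text lines, using only the Python standard library. It is an illustration, not part of samjdk or bamtools: the function names `keep_record` and `filter_sam` are hypothetical, and it assumes that the XM tag is stored as an integer (`XM:i:N`) and that "clipped" means any S or H operation in the CIGAR string, which is how I read `isClipped()`.

```python
import re

MAX_MISMATCHES = 4  # threshold from the question: drop reads with more than 4


def keep_record(sam_line, max_mismatches=MAX_MISMATCHES):
    """Return True if one SAM alignment line passes both filters."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar = fields[5]  # column 6 of a SAM record is the CIGAR string

    # Optional tags start at column 12 and look like TAG:TYPE:VALUE.
    # Assumption: XM is an integer tag, i.e. "XM:i:<count>".
    for tag in fields[11:]:
        if tag.startswith("XM:i:") and int(tag[5:]) > max_mismatches:
            return False

    # A missing CIGAR ("*") is kept, mirroring the null check above.
    if cigar == "*":
        return True

    # Assumption: S (soft clip) or H (hard clip) both count as clipped.
    return re.search(r"[SH]", cigar) is None


def filter_sam(lines):
    """Yield header lines unchanged and only the alignments that pass."""
    for line in lines:
        if line.startswith("@") or keep_record(line):
            yield line
```

One way to use it would be to feed it the output of `samtools view -h reads.bam`, write the lines produced by `filter_sam(sys.stdin)` to standard output, and convert the result back to BAM with `samtools view -b`. Reads without an XM tag are kept, as in the samjdk expression.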
