bamtools filtering syntax - reads containing more than X mismatches and soft-clipped reads
13 months ago
joelepaul • 0

Hi @ll!

I'm extremely new to the field of bioinformatics, though I have some programming knowledge. A lot of tools have quite nice manuals, which helps me learn how to use them properly. However, when it comes to bamtools, I am stuck.

I have two aims, starting from clean *.bam datasets:

1) to remove reads containing more than 4 mismatches. However, I can only find the syntax to keep reads that contain exactly 0 mismatches:

bamtools filter -tag XM:0 -in reads.bam -out reads.noMismatch.bam


However, I cannot find any documentation on the filter syntax for BWA-specific tags such as XM, so I do not know how to adapt this command to express a "more than" condition.

2) to remove soft-clipped reads. My supervisor told me to use bamtools for that, but all the guides I found, such as "Remove Soft Clipped Bases", refer to other software or scripts. The bamtools manuals also do not say anything about how to do that. Does anyone know how to do this specifically with bamtools?

Cheers Joe

bwa bamtools
13 months ago

Not with bamtools, but you can do both filters in one pass with samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

java -jar dist/samjdk.jar -e 'Integer xm = record.getIntegerAttribute("XM"); if (xm != null && xm > 4) return false; return record.getCigar() == null || !record.getCigar().isClipped();' input.bam
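
If it helps to see the logic spelled out, here is a rough Python sketch of the same two tests applied to plain SAM text lines. It is not part of samjdk or bamtools; the function names (`has_soft_clip`, `get_int_tag`, `keep_sam_line`, `filter_sam`) are made up for illustration. It assumes the standard tab-separated SAM layout (CIGAR in column 6, optional TAG:TYPE:VALUE fields from column 12 onward) and that the mismatch count is stored as an integer `XM` tag, as in the question. Reads without an `XM` tag are kept, and only soft clips (`S`) are checked.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_soft_clip(cigar):
    """Return True if the CIGAR string contains a soft-clip (S) operation."""
    if cigar == "*":  # CIGAR not available for this record
        return False
    # To also drop hard-clipped reads, test for op in "SH" instead.
    return any(op == "S" for _, op in CIGAR_OP.findall(cigar))


def get_int_tag(fields, tag):
    """Return the value of an integer optional field (TAG:i:VALUE), or None."""
    prefix = tag + ":i:"
    for field in fields[11:]:  # optional fields start at column 12
        if field.startswith(prefix):
            return int(field[len(prefix):])
    return None


def keep_sam_line(line, max_mismatches=4):
    """Decide whether one SAM line passes both filters."""
    if line.startswith("@"):  # always keep header lines
        return True
    fields = line.rstrip("\n").split("\t")
    xm = get_int_tag(fields, "XM")
    if xm is not None and xm > max_mismatches:
        return False  # more mismatches than allowed
    return not has_soft_clip(fields[5])  # column 6 is the CIGAR string


def filter_sam(lines, max_mismatches=4):
    """Yield only the SAM lines that pass keep_sam_line."""
    for line in lines:
        if keep_sam_line(line, max_mismatches):
            yield line
```

One way to use it would be to loop over `filter_sam(sys.stdin)` in a small script, write each kept line to stdout, and put that script between `samtools view -h input.bam` and `samtools view -b -o filtered.bam -`. For large files the samjdk command above will likely be faster.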