bamtools filtering syntax - reads containing more than X mismatches and soft-clipped reads
13 months ago
joelepaul • 0

Hi @ll!

I'm extremely new to the field of bioinformatics, though I have some programming knowledge. A lot of tools have quite nice manuals, which help me learn how to use them properly. However, when it came to bamtools, I got stuck.

My aims are twofold - starting from nice *.bam datasets

1) to remove reads containing more than 4 mismatches. However, I can only find the syntax to filter reads that contain 0 mismatches.

bamtools filter -tag XM:0 -in reads.bam -out reads.noMismatch.bam


However, I cannot find any documentation about the syntax for BWA-specific tags such as XM, so I do not know how to adapt this command correctly to express "more than".

2) to remove soft-clipped reads. My supervisor told me to use bamtools for that, but all the guides I can find, such as "Remove Soft Clipped Bases", refer to other software/scripts. The bamtools manuals also do not say anything about how to do that. Does anyone know how to use bamtools specifically for that purpose?

Cheers, Joe

bwa bamtools • 416 views
13 months ago

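Not with bamtools, but with samjdk (http://lindenb.github.io/jvarkit/SamJdk.html). The expression below drops reads whose XM tag is greater than 4 and reads whose CIGAR is clipped: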

java -jar dist/samjdk.jar -e 'Integer xm = record.getIntegerAttribute("XM"); if(xm!=null && xm > 4) return false; return record.getCigar()==null   || !record.getCigar().isClipped();' input.bam
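
If it helps to see that logic spelled out, here is a minimal sketch of the same two tests in plain Python, working on SAM text lines rather than on the BAM file directly. It is not part of samjdk or bamtools. The function names `keep_read` and `filter_sam` are made up for illustration, and it assumes the mismatch count is stored as an integer `XM:i:` tag. It also only looks for soft clips (`S`); as far as I know htsjdk's `isClipped()` also counts hard clips, so extend the pattern to `[SH]` if you want that behaviour.

```python
import re

# Matches a soft-clip operation such as "5S" anywhere in a CIGAR string.
SOFT_CLIP = re.compile(r"\d+S")


def keep_read(sam_line, max_mismatches=4):
    """Return True if one SAM alignment line passes both filters."""
    fields = sam_line.rstrip("\n").split("\t")

    # Optional tags start at column 12. Reads without an XM tag are kept,
    # mirroring the null check in the samjdk expression.
    for tag in fields[11:]:
        if tag.startswith("XM:i:") and int(tag[5:]) > max_mismatches:
            return False

    # Column 6 is the CIGAR; "*" means no CIGAR is available, so keep the read.
    cigar = fields[5]
    return cigar == "*" or SOFT_CLIP.search(cigar) is None


def filter_sam(lines, max_mismatches=4):
    """Yield header lines unchanged, plus the alignment lines that pass."""
    for line in lines:
        if line.startswith("@") or keep_read(line, max_mismatches):
            yield line
```

To run it on a BAM file, one option is to add `import sys` and `sys.stdout.writelines(filter_sam(sys.stdin))` at the bottom, save it under a name of your choice (say `filter_reads.py`), and pipe through samtools: `samtools view -h input.bam | python filter_reads.py | samtools view -b -o filtered.bam -`.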