Samtools filtering based on PNEXT
1
0
Entering edit mode
5 months ago
predeus ★ 1.9k

Hi all,

I was wondering if I'm missing something obvious: samtools can filter your BAM file based on many criteria (such as flags, tags, qlen etc) - but what is the correct way to get rid of the chimeric mappings (at least the type where R1/R2 map to different chromosomes?). I can come up with a one-liner (decode to text/check 7th column/encode back to binary), but I think this should be in samtools - am I wrong?

I am working with hisat2 BAMS, and no flags are set from what I can tell. This is one of the mappings:

SRR12813830.114509898   65      NZ_JAHRBL010000049.1    1405    1       46M     NZ_WJPN01000041.1       1609    0       TCCTGGTGGTGCCCTTCCGTCAATTCCTTTAAGTTTCAGCTTTGCA  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  AS:i:0  ZS:i:0  XN:i:0  XM:i:0      XO:i:0  XG:i:0  NM:i:0  MD:Z:46 YT:Z:UP NH:i:18
CRAM filtering samtools SAM BAM • 885 views
ADD COMMENT
0
Entering edit mode

That example read is missing the 'mapped in proper pair' flag, which is generally not set when the mate is on a different chromosome. You should be able to filter out those reads with samtools view -f2 infile.bam.

ADD REPLY
0
Entering edit mode

I wanted to retain single-end mappings though. I am very surprised that there are no specific options to filter chimeric reads from samtools (I guess there's a flag that needs to be set, so technically that's the mapper's fault, but still)

ADD REPLY
0
Entering edit mode

What is the relevance of the CRAM and SAM tags here?

ADD REPLY
0
Entering edit mode

No special relevance, but the question can be applied to all 3 formats, so I decided to add them just for the hell of it.

ADD REPLY
1
Entering edit mode
5 months ago

The "samtools" way of specifying primary alignments is to remove supplementary and secondary alignments:

samtools view -F 265 -F 2048

but as for your specific question, if I recall correctly, bwa for example has a special behavior where when a template is matching to different chromosomes it will have all POS field set but the flag will contain the unmapped flag as set. It is the weird, unmapped read, that has a position...

This is bwa specific behavior I believe.

Many aligners do not mark the chimeric and supplementary reads, so in those cases you are out of luck from - from flags alone you can't filter these, and you would need to write some processing code yourself.

Or perhaps one of the tools in the BAM file cottage industry has a prebuilt feature that suits you, try:

  • samtools, picard, sambamba, samblaster, samblast, samsift, bamtools, biobambam
ADD COMMENT
0
Entering edit mode

Thank you! I'm a bit shocked that samtools does not have this option by default. As you can see from the example above, all of those flags are not set in the alignment I obtained from hisat2.

I'll ask in the samtools mailing list. Another interesting thing I realised samtools is not capable of is to "cap" a BAM file at a certain coverage. Seems like such as basic and useful task, I'm really surprised it's not one of the options.

ADD REPLY
0
Entering edit mode

if you look a the latest version of samtools you will we see that it has a staggering number of features gaining new ones on regular basis, the level of attention, support and care is second to none.

then we have the "cottage industry" that I also mention

it all shows just how many valid ways we might want to filter alignments and that it really never ends, so we can't entirely be surprised if a feature is not there,

as for your question specifically writing a short custom filtering program would not be hard, for example with PySam would be a few lines of code.

I believe you could also do it at the command line like so

(samtools -H align.bam && samtools view align.bam | awk '$3 != $7 {print}') | samtools view -b > filtered.bam
ADD REPLY

Login before adding your answer.

Traffic: 1644 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6