Detection Of Alignment Algorithm Based On BAM Files
7.9 years ago

Is it possible to determine from a BAM file which mapping algorithm (TopHat, BWA, ...) was used to align the reads?

alignment bam • 2.6k views
7.9 years ago
lomereiter ▴ 470

Aligners usually record themselves in an @PG entry in the header, which you can view with samtools view -H, e.g.

$ samtools view -H ~/bam/NA12043.chrom20.LS454.ssaha2.CEU.low_coverage.20101123.bam.v1 | grep @PG

<...2 GATK-related @PG records...>

@PG ID:ssaha2 VN:2.5 <-------- shows that the aligner was ssaha2
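When a header carries several @PG records, it can help to print just the program fields side by side. The following is a minimal sketch, not part of samtools: list_pg is a hypothetical helper name, and it assumes a tab-separated SAM header on stdin. It prints the ID, PN and VN of each @PG record, with "." for a missing field.

```shell
# list_pg (hypothetical helper): read a SAM header on stdin and print
# the ID, PN and VN fields of every @PG record, tab-separated.
# A missing field is printed as ".".
list_pg() {
  awk -F'\t' '
    $1 == "@PG" {
      id = "."; pn = "."; vn = "."
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ID:/)      id = substr($i, 4)
        else if ($i ~ /^PN:/) pn = substr($i, 4)
        else if ($i ~ /^VN:/) vn = substr($i, 4)
      }
      print id "\t" pn "\t" vn
    }'
}
```

Typical use would be samtools view -H file.bam | list_pg. Records added by downstream tools (such as the GATK ones above) show up too, so the aligner is usually, but not necessarily, the earliest entry in the chain.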

The headers of some BAM files do not include a @PG record. For instance, with samtools view -H file.bam, the output contains only lines with SN and VN tags (that is, @SQ and @HD records). What should I do with such a file?

In this case, the last resort is to look at the optional tags on the reads themselves. The meaning of tags starting with X, Y or Z is not fixed by the standard, and different aligners use them in different ways. For instance, TopHat stores the strand in the XS tag, while BWA uses the same tag for the suboptimal alignment score. So for TopHat it is XS:A:+ or XS:A:-, while for BWA it is XS:i:<an integer>. I haven't seen any comprehensive table of such differences, but you can read the documentation of each candidate aligner and check whether its definitions of these tags match what you see in the file.
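The XS check can be done over many reads at once rather than by eye. This is a rough sketch under stated assumptions: xs_types is a hypothetical helper name, and it expects tab-separated SAM alignment records on stdin. It counts how often the XS tag appears with each type letter.

```shell
# xs_types (hypothetical helper): read SAM alignment records on stdin and
# count occurrences of the XS tag by its type letter (A = character, i = integer).
xs_types() {
  awk -F'\t' '
    /^@/ { next }                      # skip header lines if present
    {
      for (i = 12; i <= NF; i++)       # optional tags start at column 12
        if ($i ~ /^XS:/) { split($i, f, ":"); n[f[2]]++ }
    }
    END {
      for (t in n) print "XS:" t "\t" n[t]
    }' | LC_ALL=C sort
}
```

Typical use would be samtools view file.bam | head -n 10000 | xs_types. Mostly XS:A points towards TopHat-style strand tags and mostly XS:i towards BWA-style scores, but this only narrows the candidates, since other aligners may use XS as well.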