I wanted to know if it is possible to find which mapping algorithm (TopHat, BWA, ...) has been used to map the reads from the BAM files?
Aligners usually add a @PG entry to the header, which you can see using samtools view -H, e.g.
$ samtools view -H ~/bam/NA12043.chrom20.LS454.ssaha2.CEU.low_coverage.20101123.bam.v1 | grep @PG
<...2 GATK-related @PG records...>
@PG ID:ssaha2 VN:2.5 <-------- tells that the aligner was ssaha2
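If the header has many @PG records, a small filter can make them easier to scan. This is only a sketch: pg_summary is a made-up helper name, it assumes the header arrives on stdin as tab-separated SAM text, and it prints "." for any ID, PN or VN field that is missing.

```shell
# Hypothetical helper: summarize @PG records from a SAM header read on stdin.
# Prints one line per record: ID, PN (program name), VN (version), tab-separated.
pg_summary() {
  awk -F'\t' '
    $1 == "@PG" {
      id = pn = vn = "."            # placeholder for absent fields
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ID:/)      id = substr($i, 4)
        else if ($i ~ /^PN:/) pn = substr($i, 4)
        else if ($i ~ /^VN:/) vn = substr($i, 4)
      }
      print id "\t" pn "\t" vn
    }'
}

# Usage (file.bam is a placeholder path):
# samtools view -H file.bam | pg_summary
```

Records added by downstream tools (such as the GATK ones above) will show up too, so you still have to pick out the aligner by name.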
The headers of some BAM files do not include a @PG record. For instance, using samtools view -H file.bam, the output includes only @HD and @SQ lines (with VN and SN tags). What should I do with such a file?
In this case, the last resort is to look at the tags present on the reads. The meaning of tags starting with X, Y or Z is not fixed by the SAM standard, and different aligners use them in different ways. For instance, TopHat stores the strand in the XS tag, while BWA uses that tag to store the suboptimal alignment score, so for TopHat it's XS:A:+/-, while for BWA it's XS:i:<an integer>. I haven't seen any comprehensive table of such differences, but you can read the documentation for each candidate aligner and check whether its definitions of these tags match what you see in the file.
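As a rough way to apply the XS check, you can count how often each type of XS tag occurs in the first few thousand alignments. Again this is a sketch under assumptions: xs_tag_types is a hypothetical helper name, and it expects tab-separated SAM records on stdin, with optional tags starting at column 12.

```shell
# Hypothetical helper: count XS tags by type in SAM records read on stdin.
# XS:A (character) is the TopHat-style strand tag; XS:i (integer) is the
# BWA-style suboptimal alignment score.
xs_tag_types() {
  awk -F'\t' '
    !/^@/ {                         # skip header lines if present
      for (i = 12; i <= NF; i++)    # optional tags start at column 12
        if ($i ~ /^XS:[Ai]:/) count[substr($i, 1, 4)]++
    }
    END {
      printf "XS:A\t%d\nXS:i\t%d\n", count["XS:A"], count["XS:i"]
    }'
}

# Usage (file.bam is a placeholder path):
# samtools view file.bam | head -n 10000 | xs_tag_types
```

Treat the result as a hint rather than proof: other aligners may use the same tag types, and some reads carry no XS tag at all.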