I need to create summaries for a bunch of SAM files produced by Tophat (RNA-Seq spliced mapping). Are there any tools or scripts out there that will give me stats for unique vs non-unique matches and spliced vs non-spliced mappings, and finally break the alignments down by number of mismatches?
EDIT I collected all flag values (second column) from my SAM file and fed them to explain_sam_flags.py (you can get it from the Picard source: http://picard.svn.sourceforge.net/viewvc/picard/trunk/src/scripts/).
None of the flags in Tophat's accepted_hits.sam has the "not primary alignment" bit set. It looks like Tophat reports only unique matches, which is OK for me. Can somebody confirm this?
EDIT 2 "Tophat reports only unique matches" cannot be true. Sequences like the one below have a "0" flag: "CAACAACAGCAACAACAACAGCAACAGCAACAGCAACAGCAACAGCAACAACAA". Puzzling.
EDIT 3 (SAM example)
8_96_444_1622   73      scaffold00005   155754  255     54M     *       0       0       ATGTAAAGTATTTCCATGGTACACAGCTTGGTCGTAATGTGATTGCTGAGCCAG  BC@B5)5CBBCCBCCCBC@@7C>CBCCBCCC;57)8(@B@B>ABBCBC7BCC=>  NM:i:0
8_80_1315_464   81      scaffold00005   155760  255     54M     =       154948  0       AGTACCTCCCTGGTACACAGCTTGGTAAAAATGTGATTGCTGAGCCAGACCTTC  B?@?BA=>@>>7;ABA?BB@BAA;@BBBBBBAABABBBCABAB?BABA?BBBAB  NM:i:0
8_17_1222_1577  73      scaffold00005   155783  255     40M1116N10M     *       0       0       GGTAAAAATGTGATTGCTGAGCCAGACCTTCATCATGCAGTGAGAGACGC      BB@BA??>CCBA2AAABBBBBBB8A3@BABA;@A:>B=,;@B=A:BAAAA      NM:i:0  XS:A:+  NS:i:0
8_43_1211_347   73      scaffold00005   155800  255     23M1116N27M     *       0       0       TGAGCCAGACCTTCATCATGCAGTGAGAGACGCAAACATGCTGGTATTTG      #>8<=<@6/:@9';@7A@@BAAA@BABBBABBB@=<A@BBBBBBBBCCBB      NM:i:2  XS:A:+  NS:i:0
8_32_1091_284   161     scaffold00005   156946  255     54M     =       157071  0       CGCAAACATGCTGGTAGCTGTGACACCACATCAACAGCTTGACTATGTTTGTAA  BBBBB@AABACBCA8BBBBBABBBB@BBBBBBA@BBBBBBBBBA@:B@AA@=@@  NM:i:0
Two reads, 8_17_1222_1577 and 8_43_1211_347, are spliced: their CIGAR strings (40M1116N10M and 23M1116N27M) contain an N operation.
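For reference, here is a minimal Python sketch of the kind of summary asked for above. It is not an existing tool, and it rests on several assumptions: the file is tab-delimited SAM text, every reported location of a read gets its own line (so a read/mate seen once is counted as "unique"), a spliced alignment is one whose CIGAR contains N, and the NM:i tag is used as the mismatch count even though NM is an edit distance that also counts indels. The function name summarize_sam and the trimmed demo records (SEQ and QUAL replaced by "*") are made up for illustration.

```python
from collections import Counter

def summarize_sam(lines):
    """Summarize SAM alignment lines: uniqueness, splicing, NM distribution."""
    hits = Counter()           # (read name, mate) -> number of alignment lines
    splicing = Counter()       # "spliced" / "unspliced" -> number of alignments
    edit_distance = Counter()  # NM value -> number of alignments

    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        name, flag, cigar = fields[0], int(fields[1]), fields[5]
        if flag & 0x4:
            continue  # read is unmapped
        # Keep the two mates of a pair apart when counting hits.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        hits[(name, mate)] += 1
        # An N operation in the CIGAR marks a skipped region (intron).
        splicing["spliced" if "N" in cigar else "unspliced"] += 1
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                edit_distance[int(tag[5:])] += 1

    unique = sum(1 for n in hits.values() if n == 1)
    return {
        "unique": unique,
        "multi": len(hits) - unique,
        "spliced": splicing["spliced"],
        "unspliced": splicing["unspliced"],
        "edit_distance": dict(edit_distance),
    }

# Trimmed versions of the five records above (SEQ and QUAL replaced by "*").
demo = [
    ["8_96_444_1622", "73", "scaffold00005", "155754", "255", "54M", "*", "0", "0", "*", "*", "NM:i:0"],
    ["8_80_1315_464", "81", "scaffold00005", "155760", "255", "54M", "=", "154948", "0", "*", "*", "NM:i:0"],
    ["8_17_1222_1577", "73", "scaffold00005", "155783", "255", "40M1116N10M", "*", "0", "0", "*", "*", "NM:i:0", "XS:A:+", "NS:i:0"],
    ["8_43_1211_347", "73", "scaffold00005", "155800", "255", "23M1116N27M", "*", "0", "0", "*", "*", "NM:i:2", "XS:A:+", "NS:i:0"],
    ["8_32_1091_284", "161", "scaffold00005", "156946", "255", "54M", "=", "157071", "0", "*", "*", "NM:i:0"],
]
print(summarize_sam("\t".join(rec) for rec in demo))
```

For a real file, something like `summarize_sam(open("accepted_hits.sam"))` should work, since the function only iterates over lines. If the aligner writes an NH:i tag (number of reported hits), that would be a more direct source for the unique vs non-unique split than counting lines.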
The flag values in my second column are: 65 73 81 83 97 99 113 115 129 137 145 147 161 163 177
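To show what these values encode, here is a small Python sketch that decodes a SAM FLAG into its bits, roughly what a flag-explaining script does. The bit meanings come from the SAM specification; the names FLAG_BITS and explain_flag are made up for this example. Because every value in the list is below 256, the "not primary alignment" bit (0x100) cannot be set in any of them, but that alone does not tell whether a read was reported at more than one location.

```python
# Bit meanings of the SAM FLAG field, in ascending bit order.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "fails quality checks",
    0x400: "duplicate",
}

def explain_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]

# The flag values listed above.
flags = [65, 73, 81, 83, 97, 99, 113, 115, 129, 137, 145, 147, 161, 163, 177]

for flag in flags:
    print(flag, ", ".join(explain_flag(flag)))

# True if any of the flags has the "not primary alignment" bit set.
print(any(flag & 0x100 for flag in flags))
```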
I just stumbled over the difference between BAM (binary) and SAM (text) formats. Can you give a shortened example of your file?