I am trying to compute mismatch and indel statistics from a SAM/BAM file. The SAM file was generated with BWA. The statistics should include the %mismatch and %indel for each aligned read. Are there any good tools I could use?
Is there an executable? I am getting an error while compiling.
Yes, there are statically compiled binaries available here. We have previously run it on Nanopore, Illumina, and PacBio reads, but if you experience any problems, please let me know.
It worked fine, except on the BAM files generated by LAST.
Looks nice. Did you announce it in the Tools section?
Thanks. I have not created a Tools page on Biostars, but there is a fairly extensive README on GitHub.
metrics.tsv should contain all the statistics, right? I am having a hard time reading that file. Would it be possible to make it a text file laid out like the following:
It is a tab-delimited text file. You can use datamash to convert it to row-format:
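For readers without datamash, the same column-to-row conversion can be sketched in a few lines of Python. This is only an illustration: the column names and values in the example are made up and are not Alfred's actual metrics, and the function simply transposes any rectangular tab-delimited table.

```python
import csv
import io


def transpose_tsv(text):
    """Transpose a rectangular tab-delimited table given as a string."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    # zip(*rows) regroups the fields so that each column becomes a row.
    return "\n".join("\t".join(fields) for fields in zip(*rows))


# Hypothetical metrics table: one header line and one value line.
example = "Sample\tMismatchRate\tErrorRate\nsample1\t0.05\t0.11\n"
print(transpose_tsv(example))
```

Each output line then holds one metric name followed by its value, which is easier to read by eye than one very long header line.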
The column-format is useful if you want to compare statistics across multiple samples because you can just concatenate the metrics files.
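As a rough sketch of that concatenation, assuming every metrics file has one identical header line followed by value lines (an assumption about the layout, not something taken from the Alfred documentation), the header can be kept once and the value lines stacked below it. The function name `concat_metrics` is hypothetical.

```python
def concat_metrics(tables):
    """Stack column-format tables that share a header, keeping the header once."""
    header = None
    out = []
    for text in tables:
        lines = text.strip("\n").split("\n")
        if header is None:
            # The first table supplies the header line.
            header = lines[0]
            out.append(header)
        elif lines[0] != header:
            # Refuse to mix tables whose columns do not line up.
            raise ValueError("metrics files have different headers")
        out.extend(lines[1:])
    return "\n".join(out)


# Two hypothetical single-sample tables with made-up values.
a = "Sample\tMismatchRate\nsample1\t0.05\n"
b = "Sample\tMismatchRate\nsample2\t0.07\n"
print(concat_metrics([a, b]))
```

The result has one row per sample, so the samples can be compared column by column.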
Thanks for your Alfred tool. I have computed the mismatch rate and error rate of my long-read alignment. The mismatch rate seems to be the number of mismatches / number of aligned bases. How do you define the error rate? I have an error rate of 11.4%; what does that mean? And how do you define the insertion and deletion rates? Is one wrong insertion counted as one, and is the total then divided by the number of aligned bases? If there are two consecutive wrongly inserted bases, do you count that as two errors? Thanks.
The InDel size doesn't matter, as discussed in this Alfred issue: an insertion or deletion is counted once regardless of its length.
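To make the difference between counting InDel events and counting InDel bases concrete, here is a minimal Python sketch that works from a read's CIGAR string and NM tag. It is not Alfred's implementation. The function names are hypothetical, and treating M/=/X bases as the "aligned bases" denominator is an assumption, so check the README or the linked issue for the exact definitions.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_counts(cigar):
    """Count aligned bases plus insertion/deletion events and bases in a CIGAR string."""
    counts = {"aligned": 0, "ins_events": 0, "ins_bases": 0,
              "del_events": 0, "del_bases": 0}
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            counts["aligned"] += n      # assumed denominator: M, = and X bases
        elif op == "I":
            counts["ins_events"] += 1   # one event, whatever its length
            counts["ins_bases"] += n
        elif op == "D":
            counts["del_events"] += 1
            counts["del_bases"] += n
    return counts


def mismatches_from_nm(nm, counts):
    """NM is the edit distance, so it includes inserted and deleted bases; subtract them."""
    return nm - counts["ins_bases"] - counts["del_bases"]


# Hypothetical read: a 2-base insertion, a 1-base deletion, and NM:i:5.
c = indel_counts("50M2I48M1D10M")
print(c)
print("mismatches:", mismatches_from_nm(5, c))
print("insertion events per aligned base:", c["ins_events"] / c["aligned"])
```

In this example the 2-base insertion contributes one insertion event but two inserted bases, which is exactly the distinction the question about consecutive inserted bases is asking about.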