Question: Getting number of mismatches from bam record
noah wrote, 4.4 years ago:

I'm trying to estimate the sequencing error rates (mismatches and indels) from a bam file.

I can get the number and length of insertions and deletions from the CIGAR string by counting the "I" and "D" operations and summing their lengths.
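
For illustration, the CIGAR counting described above might look like the sketch below. It works on the plain CIGAR string (which pysam exposes as `read.cigarstring`), so it needs no BAM file to try out. The function name `indel_stats` is made up for this example.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def indel_stats(cigar):
    """Return (n_insertions, inserted_bases, n_deletions, deleted_bases)."""
    n_ins = ins_bases = n_del = del_bases = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op == "I":
            n_ins += 1
            ins_bases += int(length)
        elif op == "D":
            n_del += 1
            del_bases += int(length)
    return n_ins, ins_bases, n_del, del_bases

# Two insertions (2 + 3 bases) and one single-base deletion.
print(indel_stats("10M2I5M1D20M3I4M"))
```

With pysam, the same loop can run over `read.cigartuples` instead, where operation code 1 is an insertion and 2 is a deletion.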

How do I calculate the number of mismatches without going to the reference fasta file? We can assume the MD tag is present, but I haven't figured out how to actually parse it properly.
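
A minimal sketch of one way to parse the MD tag, assuming it follows the SAM format `[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*`: numbers are runs of matching bases, a bare letter is a mismatched reference base, and `^` followed by letters is a deletion from the reference. The function name `count_md_mismatches` is made up for this example.

```python
import re

# An MD string alternates between match-run lengths, deletions (^ACGT...)
# and single mismatched reference bases.
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")

def count_md_mismatches(md):
    """Count mismatched bases in an MD string, ignoring deletions."""
    mismatches = 0
    for match_run, deletion, mismatch in MD_TOKEN.findall(md):
        if mismatch:  # a bare letter, not part of a ^deletion
            mismatches += 1
    return mismatches

# 10 matches, mismatch at A, 5 matches, deletion of AC, 6 matches,
# mismatch at T, 3 matches.
print(count_md_mismatches("10A5^AC6T3"))
```

With pysam the string would come from `read.get_tag("MD")`. Note that insertions never appear in MD, so those still have to come from the CIGAR string.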

(I'm using python with pysam, if someone has example code somewhere.)

Tags: bam, python
Zev.Kronenberg wrote, 4.4 years ago:

Simply use the NM tag. 


This is, I believe, the edit distance, and therefore dependent on the scoring scheme used by the aligner. Is there some way to generally back out the number of mismatches from this? Also, bwa appears to include only mismatches, but bowtie includes insertions and deletions in its NM.
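
One hedged way to back mismatches out of NM: the SAM optional-fields specification describes NM as the edit distance to the reference, which counts mismatches plus inserted and deleted bases. If an aligner follows that definition, subtracting the indel bases found in the CIGAR string leaves the mismatch count. The sketch below assumes exactly that and raises an error when the numbers do not fit, which would suggest the aligner counts NM differently. The function name `mismatches_from_nm` is made up for this example.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def mismatches_from_nm(nm, cigar):
    """Estimate mismatches as NM minus inserted and deleted bases.

    Assumes NM = mismatches + inserted bases + deleted bases.
    """
    indel_bases = sum(
        int(length) for length, op in CIGAR_OP.findall(cigar) if op in "ID"
    )
    mismatches = nm - indel_bases
    if mismatches < 0:
        # NM is smaller than the indel total, so it cannot include indels.
        raise ValueError("NM does not appear to include indel bases")
    return mismatches

# NM of 5 with a 2-base insertion and a 1-base deletion leaves 2 mismatches.
print(mismatches_from_nm(5, "10M2I5M1D20M"))
```

Comparing the result against a count taken from the MD tag on a few reads is a cheap check on whether a given aligner's NM behaves this way.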

written 4.4 years ago by noah