MD tags with diff ref sequence
0
0
Entering edit mode
2.6 years ago
radwa.raed ▴ 40

Hi, I came across this puzzling finding. I have two different reads in a bamfile, with the same exact sequence. For one of them, when I look at the MD aligned pairs, it seems like the last position (position 42) matches to the reference sequence (the T is written in capital letter case).

TTCTGAATTAGCTGTATCGTCAAGGCACTCTTGCCTACGCCAT
(42, 25398283, 'T')

But for the other read, the MD aligned pairs show that the last position (position 42) is a mismatch and the reference sequence should be a C at that position (C is written in small letter case).

TTCTGAATTAGCTGTATCGTCAAGGCACTCTTGCCTACGCCAT
(42, 25398283, 'c')

In reality, the reference sequence at that position (chr12:25398283 in hg19) is a C but I don't understand why there is a difference in the MD aligned pairs between these two different reads that have the same exact sequence. Any ideas?

Thanks so much!

Note: this numbering is 0-based indexing in pysam. So looking up the position in a 1-based indexing genome browser would be off by 1 (25398284 instead of 25398283)

bam • 456 views
ADD COMMENT

Login before adding your answer.

Traffic: 3230 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6