Hi everyone!
I'm currently parsing a SAM file to extract methylation sites from a nanopore sequencing run. According to the SAM documentation, this is possible thanks to the MM:Z and ML:B:C tags.
Below you can find a read, together with the MM:Z and ML:B:C tags extracted from my SAM file:
TGATCGCGCGGACCTGTTCTACCAGGTAGGTCACCGGGTCAAATGATATTTTGATGGTGTTGGACACCACCGTCTGGCTGGCGCTCAGGGTGCCGGAGTTCAGAGCGTAGATGAATGTCTCAAACGCGGAGGATTTCTCGCCTCCCAGCATGTAAATTGGCCACTGCAGGGCGCTGCTCTTGTCAGTATAGCGGAAATGTATGGGGAGCGGCATATTTCGTTAAGGACGGTTGCAATGGCTACCCCAGAATCTTGGCTGCTGTTGCCTTCGACCGCCGCGTTCACGCGCTCAATTGTGGGGTGGAGCACAGCGATCGCTGAAGCGGCGCACGAAAAACGCGAAAGCGTTTCACGATAAATGCGAAAACCACAAAGACACCGACAACTTTCTTCGAGCAAATTCACCTACGCCAGCAACTGAACGAAGTAC
MM:Z:C+m?,1,2,1,0,1,1,0,1,0,0,0,1,2,0,9,0,0,0,0,0,10,2,1,7,4,8,3,0,3,4,4,4,9,6,0,0;
ML:B:C,234,246,49,84,5,0,11,228,254,0,0,2,8,1,3,1,1,0,146,0,10,0,1,115,19,0,167,19,0,121,21,9,188,112,93,6
The sequence is 430 bases long. As I understand it, the MM:Z tag here gives information about the presence of 5-methylcytosine (C+m). The modification status of the first cytosine is unknown, the second is called, the third and the fourth are not called, and so on. So the MM:Z tag covers SUM(numeric values in MM:Z) + number of numeric values = 84 + 36 = 120 cytosines. The ML:B:C tag gives information about the same cytosines. However, my read contains only 110 cytosines. How can this difference arise?
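To make the arithmetic above easy to check, here is a minimal Python sketch of how the count can be reproduced from the two tags. The function names (parse_mm, parse_ml, bases_covered, count_c_and_g) are my own and do not come from any library. The G count is included only because I assume, without having confirmed it, that strand orientation might matter when comparing against SEQ.

```python
def parse_mm(mm_tag):
    """Return a list of (modification code, skip counts) from an MM:Z tag."""
    value = mm_tag[len("MM:Z:"):] if mm_tag.startswith("MM:Z:") else mm_tag
    groups = []
    for group in value.split(";"):
        if not group:  # the trailing ';' leaves an empty last element
            continue
        fields = group.split(",")
        groups.append((fields[0], [int(x) for x in fields[1:]]))
    return groups


def parse_ml(ml_tag):
    """Return the list of 0-255 values from an ML:B:C tag."""
    value = ml_tag[len("ML:B:C,"):] if ml_tag.startswith("ML:B:C,") else ml_tag
    return [int(x) for x in value.split(",") if x]


def bases_covered(skips):
    """Each entry skips n bases, then points at one more base."""
    return sum(skips) + len(skips)


def count_c_and_g(seq):
    """Count C and G in SEQ (G included under the strand assumption above)."""
    return {"C": seq.count("C"), "G": seq.count("G")}


mm = ("MM:Z:C+m?,1,2,1,0,1,1,0,1,0,0,0,1,2,0,9,0,0,0,0,0,10,2,1,7,4,8,"
      "3,0,3,4,4,4,9,6,0,0;")
ml = ("ML:B:C,234,246,49,84,5,0,11,228,254,0,0,2,8,1,3,1,1,0,146,0,10,0,"
      "1,115,19,0,167,19,0,121,21,9,188,112,93,6")

code, skips = parse_mm(mm)[0]
probs = parse_ml(ml)
print(code, len(skips), len(probs), bases_covered(skips))
# prints: C+m? 36 36 120
```

Calling count_c_and_g on the read above gives the two numbers to compare against the 120 positions covered by the MM:Z tag.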
Best regards.
Antoine
Hi, did you find an answer to this question? I have the exact same problem and just don't understand why the tags refer to more Cs than the read contains.