Hi all,
I am trying to decode the SAM MM and ML tags read by read and build a table of the methylation positions on each read. For example, given this record from a SAM file:
Nanopore_Sequence_Example 4 * 0 0 * * 0 0
GTTATGTAACCTACTTGGTTCCATTACGTATTGCTGGTGCTGAAGATTGTAGGTGTCTTTGTGCAGAGTGTATGATATACACGGCGGTGCTGAAGAAAGTTATTGCGGGTGTATTTGTGCAGAAGTATATGATGTGCGCGGGCGGAGGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGCAGAAGTATATGATGGCGAGGTGTTGAAGAAAGTTGTCGGTGTCTTTGTGCAGAAGTATATGATGTGCGCGGGCGGATCCGCCCGCGCATCCTTCTGCGCAAT
"&+*+*)$#$%'&&')%&'(,)))*1555:>B@7777BBCD10.*.%%&)$$$*14//0.,.-..'(%(%&''..211--'$$%&+))56<;;998:44892.-,+)('&*)'&&((,++/0064:385566;>A>=6@<@AA?;:::=>>?==7=?@=<>>;BA@??@?=:;;;7011+,,).-,++++-&$$%)(,)*,)$%'(((&&'(&&%%%&+0///20877656??BAA@@ABBEFGC>=57793222532110,50-++$$$%&),,,+())
rl:i:0
MM:Z:C+h?,5,5,0,1,1,0,0,1,2,0,2,0,0,1,2,0,4;C+m?,5,5,0,1,1,0,0,1,2,0,2,0,0,1,2,0,4;
ML:B:C,159,6,135,2,7,9,3,4,13,11,6,22,6,1,2,2,218,0,4,19,1,2,7,4,1,0,1,4,15,11,2,1,0,0
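As I read the SAMtags specification, each MM entry (e.g. `C+h?`) gives a base, a strand and a modification code, followed by skip counts: "skip this many C's in SEQ, then the next C carries a call". The ML array holds one 0-255 value per call, in the same order as the MM entries, so here the first 17 ML values belong to `C+h` (5hmC) and the last 17 to `C+m` (5mC). The `?` means C's that are not listed have unknown status rather than being implied unmodified. The sketch below walks the tags that way. It assumes SEQ is in the original sequencing orientation (true for this unmapped, FLAG 4 record; a FLAG 0x10 record would need SEQ reverse-complemented first), does not handle `-` strand entries, and converts ML value N to the midpoint `(N + 0.5) / 256` of the range the spec assigns to it. `parse_ml` and `decode_mm_ml` are hypothetical helper names, not a library API.

```python
import re
from typing import Dict, List, Tuple

# Header of one MM entry, e.g. "C+h?", "C+mh." or "A+a":
# fundamental base, strand, modification code(s), optional '.'/'?' mode flag.
_MM_HEADER = re.compile(r"([ACGTUN])([+-])([a-z]+|\d+)([.?]?)")


def parse_ml(ml_tag: str) -> List[int]:
    """Turn 'ML:B:C,159,6,...' into [159, 6, ...] (uint8 values 0-255)."""
    array_type, *values = ml_tag.split(":", 2)[2].split(",")
    if array_type != "C":
        raise ValueError("expected ML to be a B:C (uint8) array")
    return [int(v) for v in values if v]


def decode_mm_ml(seq: str, mm_tag: str, ml_tag: str) -> Dict[str, List[Tuple[int, float]]]:
    """Return {"C+h": [(read_pos, prob), ...], "C+m": [...], ...} with 0-based read positions.

    Assumes SEQ is in the original sequencing orientation (unmapped or
    forward-strand records); reverse-complement SEQ first if FLAG has 0x10.
    """
    ml = parse_ml(ml_tag)
    ml_idx = 0
    calls: Dict[str, List[Tuple[int, float]]] = {}
    for entry in mm_tag.split(":", 2)[2].split(";"):
        if not entry:
            continue  # the trailing ';' leaves an empty entry
        header, *skips = entry.split(",")
        match = _MM_HEADER.fullmatch(header)
        if match is None:
            raise ValueError(f"unrecognised MM entry header: {header!r}")
        base, strand, code_str, _mode = match.groups()  # _mode: '?' = unknown, '.' = unmodified
        if strand != "+":
            raise NotImplementedError("'-' strand entries are not handled in this sketch")
        # Single-letter codes may be stacked ("mh"); a numeric ChEBI id is one code.
        codes = [code_str] if code_str.isdigit() else list(code_str)
        # Read coordinates of every occurrence of the fundamental base ('N' = any base).
        base_pos = [i for i, b in enumerate(seq.upper()) if base == "N" or b == base]
        cursor = 0
        for skip in skips:
            cursor += int(skip)  # skip this many bases of this type...
            if cursor >= len(base_pos):
                raise ValueError("MM skips run past the end of SEQ")
            for code in codes:  # ...then one ML value per code at this base
                # ML value N covers probabilities [N/256, (N+1)/256); take the midpoint.
                prob = (ml[ml_idx] + 0.5) / 256
                calls.setdefault(f"{base}{strand}{code}", []).append((base_pos[cursor], prob))
                ml_idx += 1
            cursor += 1  # the called base itself has now been consumed
    if ml_idx != len(ml):
        raise ValueError("ML length does not match the number of MM calls")
    return calls
```

Calling `decode_mm_ml(seq, "MM:Z:C+h?,5,5,...;C+m?,5,5,...;", "ML:B:C,159,6,...")` with the SEQ above should give 17 `C+h` and 17 `C+m` calls at the same 17 read positions, since both entries use identical skip lists.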
I'm trying to create a table/list that would look like this:
Read: Nanopore_Sequence_Example
Methylation position 1: 5mC Methylated
Methylation position 2: Unmethylated
Methylation position 3: 5hmC Methylated
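To get from per-position probabilities to labels like the ones above, one option is to treat 5mC, 5hmC and unmodified C as competing states at each called C. The unmodified probability is then whatever is left over (1 - p(5mC) - p(5hmC)), and each position is labelled with the most likely state. This is a sketch of that idea only: `call_states`, `format_table`, the `min_prob` cut-off and the "No call" label are hypothetical choices, not part of the SAM spec. Because the tags here use `?`, only the listed C's get a label; unlisted C's are unknown, not unmethylated.

```python
from typing import List, Tuple

Call = Tuple[int, float]  # (0-based read position, probability)


def call_states(h_calls: List[Call], m_calls: List[Call],
                min_prob: float = 0.5) -> List[Tuple[int, str, float]]:
    """Label each called C as 5mC, 5hmC or unmethylated (hypothetical helper)."""
    p_h, p_m = dict(h_calls), dict(m_calls)
    rows = []
    for pos in sorted(set(p_h) | set(p_m)):
        ph, pm = p_h.get(pos, 0.0), p_m.get(pos, 0.0)
        pc = max(0.0, 1.0 - ph - pm)  # leftover probability of an unmodified C
        label, best = max(
            [("5mC Methylated", pm), ("5hmC Methylated", ph), ("Unmethylated", pc)],
            key=lambda t: t[1],
        )
        if best < min_prob:  # assumed cut-off: too uncertain to call
            label = "No call"
        rows.append((pos, label, best))
    return rows


def format_table(read_name: str, rows: List[Tuple[int, str, float]]) -> List[str]:
    """Render rows in the 'Methylation position N: label' layout."""
    lines = [f"Read: {read_name}"]
    for i, (_pos, label, _p) in enumerate(rows, start=1):
        lines.append(f"Methylation position {i}: {label}")
    return lines
```

The `(position, probability)` lists could come from any MM/ML decoder; keeping the read position in each row makes it easy to add a coordinate column later.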
I am new to methylation calling; any help would be amazing.
I changed how I decode m6A events and updated the links in the answer above. The Python code is in a separate repository; the JavaScript code is a bit more fleshed out. Hopefully this is useful to others.