I have a bismark aligned bam file, and I would like to extract the number of methylated CpGs and the number of unmethylated CpGs for each read in the bam file.
I know how to extract methylation calls using bismark. When you use bismark to call methylation, it will give you the number of methylated and unmethylated reads overlapping each CpG site in the genome. But that's not my question.
I want to know all the methylated and unmethylated sites on each read.
For example, the first read in the bam file might overlap 4 CpG sites. I would like to know how many of those CpG sites were methylated and how many were unmethylated.
I believe I found the answer to my question here.
counting number of CpGs that are methylated and unmethylated in a read of whole genome bisulfite data
Bismark calls methylation values and stores the calls in the XM tag of each read in the BAM file. You can use MethylDackel do this, or you can count the number of 'z' and 'Z' characters for each read (pair) in the BAM
Thank you! This is very helpful!
Please read the manual for extraction the methylation calls.
Yes, I know how to do this, but I am not trying to extract methylation calls. See my other comment.