Genes do not have methylation levels, but you can certainly summarize the methylation around a gene (average, median, min, max; 2kb upstream, gene body, first intron). You will need to determine how best to summarize your own data, though (or if summarizing even makes sense for your questions).
There are a couple of ways to do this, none of which is perfect.
1. Average (or take the median of) all of the methylation percentages in the gene.
   - You'll likely want to filter out low-coverage Cs.
2. Use the methylation counts (i.e., the number of fragments supporting methylated/unmethylated Cs at a position), where you just sum everything and take the percentage.
   - Again, you probably want to filter out low-coverage Cs.
   - This only makes sense if you have even coverage or similarly biased coverage between samples.
3. Forget trying to come up with a single number and just compare sites with some minimum coverage.
I actually prefer (3), but it seems that most other people prefer to average methylation percentages. To each their own.
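To make the difference between the first two options concrete, here is a minimal sketch. The counts, the `MIN_COV` threshold, and the function names are made up for illustration; they do not come from any particular tool, and the right threshold depends on your data.

```python
from statistics import mean

# Hypothetical per-C counts within one gene: (methylated, unmethylated) fragments.
sites = [(9, 1), (2, 2), (0, 20), (1, 0)]
MIN_COV = 4  # assumed minimum coverage; pick a value that suits your data


def filter_sites(sites, min_cov=MIN_COV):
    """Keep only positions covered by at least min_cov fragments."""
    return [(m, u) for m, u in sites if m + u >= min_cov]


def mean_of_percentages(sites, min_cov=MIN_COV):
    """Option 1: average the per-site methylation percentages."""
    kept = filter_sites(sites, min_cov)
    if not kept:
        return None  # no usable sites in this gene
    return mean(100.0 * m / (m + u) for m, u in kept)


def pooled_percentage(sites, min_cov=MIN_COV):
    """Option 2: sum the counts over all sites, then take one percentage."""
    kept = filter_sites(sites, min_cov)
    if not kept:
        return None
    total_m = sum(m for m, _ in kept)
    total = sum(m + u for m, u in kept)
    return 100.0 * total_m / total


option1 = mean_of_percentages(sites)
option2 = pooled_percentage(sites)
print(f"mean of percentages: {option1:.1f}%")
print(f"pooled counts:       {option2:.1f}%")
```

The two numbers differ because the pooled version lets deeply covered positions dominate, which is why it only makes sense when coverage is even or similarly biased between samples.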
Like Yinzl2007, I would like to summarize the methylation level in each gene in order to correlate these values with gene expression levels in different tissues of a non-model species. I have WGBS data for four individuals from the same population, three of them at 6-7x coverage and the fourth at lower coverage. The gene expression profiles (obtained in a different study) come from a different individual in the same population.
One of my questions is whether to pool the methylomes of all the individuals to get the methylation values for each gene. I will filter out Cs with too low as well as too high coverage. Do you think this approach is adequate?
I plan to average the methylation percentages in each gene (approach 1), although I understand that this is a rough approach that implies every methylated C "weighs" the same. I was considering summarizing the methylation only in the promoter region, but I wonder whether using all the methylated Cs in promoter + gene body, or promoter + first intron, would be a better option, or even some other alternative. I have not found any consensus on this.
Should I consider only genes containing a minimum number of methylated Cs and discard the rest? I was thinking of setting a minimum of 20 Cs with a methylation level over 30%.
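To be concrete about the filter, this is roughly what I have in mind. It is only a sketch: the gene names and per-C percentages are invented, `passes_filter` is a hypothetical helper, and the percentages are assumed to come from Cs that already passed the coverage filter.

```python
def passes_filter(site_percentages, min_sites=20, min_level=30.0):
    """True if the gene has at least min_sites Cs methylated above min_level (%)."""
    n_methylated = sum(1 for p in site_percentages if p > min_level)
    return n_methylated >= min_sites


# Hypothetical per-C methylation percentages for two genes.
genes = {
    "geneA": [80.0] * 25 + [5.0] * 10,  # 25 Cs above 30% -> kept
    "geneB": [80.0] * 5 + [5.0] * 40,   # only 5 Cs above 30% -> discarded
}

kept = {name: values for name, values in genes.items() if passes_filter(values)}
print(sorted(kept))
```

Note that a filter like this removes unmethylated genes entirely, so I am not sure whether it would bias the correlation with expression.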
Any suggestion is more than welcome.
Thanks a lot in advance,