I'm using the methylKit package from Bioconductor to perform statistical analysis of methylation data, but when I use the getMethylationStats() function, I can't understand the information shown in the histogram, no matter whether I use the help() command or read the package documentation. Can anyone please explain what I'm looking at in this histogram?
You need to read up on what your data actually means. Below I'm copying from methylKit's vignette to point you to the most important concepts behind the data you are probably looking at.
I assume you started with a table that looks something like this:
## chrBase chr base strand coverage freqC freqT
## 1 chr21.9764539 chr21 9764539 R 12 25.00 75.00
## 2 chr21.9764513 chr21 9764513 R 12 0.00 100.00
## 3 chr21.9820622 chr21 9820622 F 13 0.00 100.00
## 4 chr21.9837545 chr21 9837545 F 11 0.00 100.00
## 5 chr21.9849022 chr21 9849022 F 124 72.58 27.42
The last two columns, freqC and freqT, are the ones that matter for the histogram you showed (or, as the methylKit vignette puts it: "we can check the basic stats about the methylation data such as coverage and percent methylation"). freqC and freqT mean "frequency of C's" and "frequency of T's", i.e. the percentage of reads covering the locus (identified by the positional information in the first four columns) that showed a "C" and the percentage that showed a "T"; the coverage column gives the total number of reads at that locus. The methylKit function you used simply generates a histogram of these per-locus values: freqC is the fraction of sequencing reads in which the locus still contained a cytosine after bisulfite conversion, and since only methylated cytosines survive that conversion, it is the percent methylation at that locus. As a convenience, the function labels the x-axis with what freqC actually means, i.e. "% methylation", rather than with the column name.
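To make the binning concrete, here is a minimal sketch in plain Python (not methylKit code, and not how the package implements it internally) that counts how many loci from the five example rows fall into each percent-methylation bin. The function name percent_methylation_histogram and the 10% bin width are my own choices for illustration; the bins in the real plot may differ.

```python
# Rows copied from the example table above:
# (chr, base, strand, coverage, freqC, freqT)
rows = [
    ("chr21", 9764539, "R", 12, 25.00, 75.00),
    ("chr21", 9764513, "R", 12, 0.00, 100.00),
    ("chr21", 9820622, "F", 13, 0.00, 100.00),
    ("chr21", 9837545, "F", 11, 0.00, 100.00),
    ("chr21", 9849022, "F", 124, 72.58, 27.42),
]


def percent_methylation_histogram(rows, bin_width=10):
    """Count loci per percent-methylation bin (hypothetical helper).

    Each locus contributes one count, placed according to its freqC value.
    The bin width of 10 is an assumption made for this illustration.
    """
    n_bins = 100 // bin_width
    counts = [0] * n_bins
    for _chr, _base, _strand, _coverage, freq_c, _freq_t in rows:
        # A value of exactly 100% goes into the last bin.
        idx = min(int(freq_c // bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


hist = percent_methylation_histogram(rows)

# Crude text rendering: one '#' per locus in each bin.
for i, n in enumerate(hist):
    print(f"{i * 10:3d}-{(i + 1) * 10:3d}% : {'#' * n} ({n})")
```

Each bar counts loci, not reads: the three loci with freqC = 0 all land in the lowest bin, while the locus with 124 reads and freqC = 72.58 adds just one count to the 70-80% bin.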
Does this help or are there any specific issues you have trouble wrapping your head around?
This really helps me, thank you so much, Friederike!