Hi everyone, sorry about the long post.
Because I don't have a programming background, I've been using MethHC (http://methhc.mbc.nctu.edu.tw/php/search.php?opt=gene) to look at the correlation between gene expression and methylation in breast cancer. In MethHC, the user enters a gene and two correlation graphs are returned: one restricted to differentially methylated/expressed values and one that is not. I've been testing genes for which the literature reports that promoter hypermethylation leads to loss of expression in the specific cancer.
There is an option to select specific probes, and the probes are "annotated probes into 8 gene regions (promoter, enhancer, TSS1500, TSS200, 5'UTR, 1st exon, gene body and 3'UTR)". I noticed that choosing different probes can change the correlation from negative to positive. Choosing just one probe at a time, I was wondering whether I could (1) compare whether methylation of CpG islands in the promoter region versus methylation of CpG islands in the gene body changes the correlation (r squared) value. I saw some papers where gene body methylation of oncogenes has an effect on expression, much as promoter methylation does for tumour suppressor genes. (2) Alternatively, does it make sense to test how the distance of the methylated CpG islands from the TSS affects the correlation (regional DNA methylation vs. expression)?
Note: My aim is to conduct an extremely simple analysis of methylation and expression with independent and dependent variables, and the above is the only idea I've had so far. Basically, I am trying to explore how gene region affects the correlation between methylation and expression (not trying to make novel discoveries, but simply trying to show what's known about DNA methylation and cancer).
I would really appreciate any insights anyone can offer. It would be even more helpful if someone could explain what I can do with correlation values from databases like MethHC and Wanderer (http://maplab.imppc.org/wanderer/), for example how they can be tested for biological significance using statistical calculations. Thank you very much!
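As one possible starting point for the statistical part, a reported correlation can be checked against zero with the Fisher z-transformation, which needs only r and the number of samples n. The sketch below is an approximation under the usual assumptions (roughly bivariate normal data, independent samples); the function name `fisher_z_test` and the example values r = -0.45, n = 100 are hypothetical. It addresses statistical significance only; whether an effect is biologically meaningful is a separate question.

```python
import math
from statistics import NormalDist


def fisher_z_test(r, n, alpha=0.05):
    """Approximate two-sided test of H0: rho = 0 via Fisher's z-transformation.

    Returns (p_value, (ci_low, ci_high)), where the interval is the
    (1 - alpha) confidence interval for the correlation coefficient.
    """
    if n <= 3:
        raise ValueError("need more than 3 samples")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")

    z = math.atanh(r)              # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error of z
    normal = NormalDist()

    p_value = 2.0 * (1.0 - normal.cdf(abs(z) / se))
    crit = normal.inv_cdf(1.0 - alpha / 2.0)
    ci = (math.tanh(z - crit * se), math.tanh(z + crit * se))
    return p_value, ci


# Hypothetical example: r = -0.45 read off a plot built from 100 samples.
p_value, (ci_low, ci_high) = fisher_z_test(r=-0.45, n=100)
print(f"p = {p_value:.2g}, 95% CI for r = ({ci_low:.3f}, {ci_high:.3f})")
```

Note that correlations for two probes measured on the same samples are not independent, so comparing a promoter r with a gene body r would call for a test designed for dependent correlations rather than two separate runs of this function.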