How To Compare Gene Expression And Methylation Level Of A Gene
7.0 years ago
Chip ▴ 110

I am analyzing data (from the TCGA project) of patients affected by Glioblastoma Multiforme and, specifically, I want to compare Gene Expression values with Methylation levels.

Methylation levels were obtained with the Illumina Infinium HumanMethylation27 BeadChip*, which measures methylation at ~27k CpG sites. I downloaded its product support file**, which annotates each probe.

Here is the issue: for many genes there are several probes (hence, several CpG sites) that regulate the same gene. I was wondering what the best way would be to treat them as a single entity, so as to obtain one methylation level per gene.

I was thinking of taking the average of all the probes that control a given gene, but that assumes "all CpGs have the same importance as gene expression regulators", and I don't know whether I can justify it.
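For reference, the averaging idea could be sketched roughly as below. This is only an illustration: the probe IDs, gene names, and beta values are made up, and `mean_beta_per_gene` is a hypothetical helper, not part of any Illumina or TCGA tool. It weights every probe equally, which is exactly the assumption questioned above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical probe-to-gene annotation (illustrative IDs, not real probes).
probe_to_gene = {"cg_a": "GENE1", "cg_b": "GENE1", "cg_c": "GENE2"}

# Hypothetical beta values: probe -> one value per sample.
betas = {
    "cg_a": [0.10, 0.20, 0.30],
    "cg_b": [0.30, 0.40, 0.50],
    "cg_c": [0.80, 0.70, 0.90],
}

def mean_beta_per_gene(betas, probe_to_gene):
    """Average the beta values of all probes mapped to a gene, sample by sample."""
    grouped = defaultdict(list)
    for probe, values in betas.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # skip probes without a gene annotation
            grouped[gene].append(values)
    # zip(*rows) walks the samples; each column holds one value per probe.
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}

gene_betas = mean_beta_per_gene(betas, probe_to_gene)
```

Here `gene_betas["GENE1"]` holds one averaged value per sample, so the result has the same "genes x samples" shape as an expression matrix.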


* https://support.illumina.com/array/array_kits/infinium_humanmethylation27_beadchip_kit.ilmn

** http://support.illumina.com/downloads/humanmethylation27_product_support_files.ilmn

dna methylation gene expression • 5.3k views

This is an excellent question. How to summarise methylation probes to gene level is an issue that is routinely ignored or glossed over in publications on this topic. I call it the 'genes x samples' problem, because statistics papers always talk about "matrices of genes x samples" with no indication of how they were derived.


Thanks. Though not a definitive answer, it provides very useful insight. I will proceed by taking one probe per gene.


Hey there! Did you find a method to do this job? I recently ran into the same problem. Thanks a lot! Wayne


As suggested in Neilfws's comment, I decided to choose, for each gene, the probe with the highest variance across samples.
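A minimal sketch of that selection, assuming beta values are stored per probe with one value per sample. The probe IDs, gene names, and numbers are invented for illustration, and `most_variable_probe` is a hypothetical helper name.

```python
from statistics import pvariance

# Hypothetical probe-to-gene annotation and beta values (illustrative only).
probe_to_gene = {"cg_a": "GENE1", "cg_b": "GENE1", "cg_c": "GENE2"}
betas = {
    "cg_a": [0.10, 0.12, 0.11],  # nearly constant across samples
    "cg_b": [0.20, 0.60, 0.90],  # varies a lot across samples
    "cg_c": [0.50, 0.40, 0.60],
}

def most_variable_probe(betas, probe_to_gene):
    """For each gene, keep the probe whose beta values have the highest variance."""
    best = {}  # gene -> (probe, variance)
    for probe, values in betas.items():
        gene = probe_to_gene.get(probe)
        if gene is None:  # skip probes without a gene annotation
            continue
        var = pvariance(values)
        if gene not in best or var > best[gene][1]:
            best[gene] = (probe, var)
    return {gene: probe for gene, (probe, _) in best.items()}

selected = most_variable_probe(betas, probe_to_gene)
```

In this toy data `cg_b` is kept for `GENE1` because `cg_a` barely changes between samples. Note that high variance does not by itself mean the probe is related to expression.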

7.0 years ago
B. Arman Aksoy ★ 1.2k

Selecting the probes is a hard problem. To assign a methylation score to each gene, TCGA correlates the values of all probes in the proximity of the gene with the gene's expression and picks the probe with the strongest negative correlation. This of course has a particular phenotype in mind: it works only if you want methylation probes that help explain gene expression levels across many patients. If you have another phenotype in mind, I think you can apply the same idea, but correlate the probe values with a different measure instead of gene expression.
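To make the idea concrete, here is a rough sketch for a single gene. It is not the actual TCGA pipeline: it assumes the probes near the gene have already been chosen, uses a plain Pearson correlation (the choice of correlation measure is my assumption), and all names and numbers are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def most_anticorrelated_probe(probe_betas, expression):
    """Return the probe whose beta values correlate most negatively with expression."""
    return min(probe_betas, key=lambda p: pearson(probe_betas[p], expression))

# Hypothetical data for one gene: expression and beta values over the same 4 patients.
expression = [5.0, 3.0, 1.0, 4.0]
probe_betas = {
    "cg_x": [0.2, 0.5, 0.9, 0.3],  # high methylation where expression is low
    "cg_y": [0.6, 0.4, 0.2, 0.5],  # moves in the same direction as expression
}

best_probe = most_anticorrelated_probe(probe_betas, expression)
```

For another phenotype, the `expression` list would be replaced by whatever per-patient measure you care about, with the same patient order as the beta values.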

Have a look at this question and my reply to it; I think you will find it useful: A: Interpreting Fractional Methylation Data

Also you can learn more about the TCGA way from this web site: https://confluence.broadinstitute.org/display/GDAC/Methylation+Preprocessor
