Question: Correlation between methylation (450K) and gene expression (RNA-Seq)
5.3 years ago, Nicolas Rosewick (Belgium, Brussels) wrote:


I wonder how to correlate 450K methylation data with RNA-Seq data. The major issue for me is that I have several probes per gene (in the promoter, the gene body, and the 5' and 3' UTRs) but only one expression value (from the RNA-Seq data).

For example, I have 10 probes for one gene (ELF3). Several probes have a significant adjusted p-value, others do not, and this gene is differentially expressed in my RNA-Seq data. Do I have to group the probes by ucscRefGene_GROUP (TSS1500, TSS200, 1stExon, gene body, 5UTR, 3UTR, ...) and compute a mean p-value (which seems a little odd)?

Any advice, comments, or ideas?


Tags: rna-seq, 450k
modified 2.3 years ago by Charles Warden • written 5.3 years ago by Nicolas Rosewick

Hi, did you ever find a solution to this? Please help!

written 2.3 years ago by Will

I don't know - what is the hypothesis? Could methylation at a promoter be sufficient to reduce expression of the gene, irrespective of methylation at other sites of the same gene? I think that you could build gene-to-probes models, in which you regress the expression of the gene on the methylation values of its probes.

lm(GeneExpression ~ probe1 + probe2 + probe3)
written 2.3 years ago by Kevin Blighe
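To make the gene-to-probes idea above concrete, here is a minimal sketch on simulated data. The probe names, sample size, and effect sizes are all made up for illustration, and beta values are used directly as predictors; this is one possible way to set up the model, not a recommended pipeline.

```r
# Simulated data for one gene: 40 samples, three probes (all values made up).
set.seed(1)
n <- 40
probes <- data.frame(
  probe1 = runif(n),  # beta values in [0, 1]
  probe2 = runif(n),
  probe3 = runif(n)
)
# For the demo, expression is driven by probe1 only (an assumption).
GeneExpression <- 8 - 3 * probes$probe1 + rnorm(n, sd = 0.5)

# Regress expression on all probes of the gene at once.
fit <- lm(GeneExpression ~ probe1 + probe2 + probe3, data = probes)
coefs <- summary(fit)$coefficients  # estimate, std. error, t value, p-value
print(round(coefs, 4))
```

Each row of the coefficient table then describes one probe's association with expression, adjusted for the other probes in the model.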
2.3 years ago, Charles Warden (Duarte, CA) wrote:

It took me a second to realize the update was a comment (so, I don't know if the original user is still having an issue). However, for the sake of discussion, I'll throw some ideas out there:

1) I would like to see consistent signal from multiple probes / sites. So, if you have a way to determine that there is a region with a consistent methylation trend at multiple nearby sites, I would use some sort of measurement for that region (although precisely what to use can vary somewhat between projects).

2) In the case of suggestion #1, you would have one value per region (rather than per probe / site). However, if you are familiar with R and you have a lot of sites / probes to compare, you may want to see if you can implement something based on C++ (to speed up the separate tests).
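As a rough sketch of the "one value per region" idea, the example below averages the beta values of the probes in each region and correlates that average with expression. The annotation table, probe IDs, simulated values, and the choice of a mean plus Spearman correlation are assumptions for illustration only; as noted above, the right region-level measurement can vary between projects.

```r
set.seed(2)
n <- 30  # samples

# Hypothetical annotation for six probes of one gene.
annot <- data.frame(
  probe = paste0("cg", 1:6),
  group = c("TSS200", "TSS200", "TSS200", "Body", "Body", "3UTR"),
  stringsAsFactors = FALSE
)

# Simulated beta values (probes in rows, samples in columns): the three
# TSS200 probes share one trend, the other probes are pure noise.
trend <- runif(n, 0.1, 0.9)
tss <- t(sapply(1:3, function(i) pmin(pmax(trend + rnorm(n, sd = 0.03), 0), 1)))
beta <- rbind(tss, matrix(runif(3 * n), nrow = 3))
rownames(beta) <- annot$probe
expr <- 10 - 4 * trend + rnorm(n, sd = 0.5)

# One value per region and sample: mean beta over the probes in that region.
by_region <- split(annot$probe, annot$group)
region_mean <- t(sapply(by_region, function(p) colMeans(beta[p, , drop = FALSE])))

# Correlate each region's value with expression (Spearman, as one option).
res <- do.call(rbind, lapply(rownames(region_mean), function(r) {
  ct <- cor.test(region_mean[r, ], expr, method = "spearman")
  data.frame(region = r, rho = unname(ct$estimate), p = ct$p.value,
             stringsAsFactors = FALSE)
}))
res$p_adj <- p.adjust(res$p, method = "BH")  # adjust across regions
print(res)
```

This gives one test per region instead of one per probe, which sidesteps the question of averaging probe-level p-values.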

As one possible example, you can take a look at the COHCAP code for possible ideas:

This includes the fastLmPure() function in RcppArmadillo, which doesn't really involve knowing Rcpp/C++ (you just have to write some sort of wrapper in R).

Outside of COHCAP, it also looks like you can use the fastLm() function in much the same way as the lm() function:
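As a hedged illustration of the kind of R wrapper mentioned above, the sketch below calls fastLmPure() once per region and derives a slope p-value from the returned coefficients and standard errors. The wrapper name lm_pvalue, the simulated data, and the base-R fallback (used only when RcppArmadillo is not installed) are assumptions for this example; this is not the COHCAP code.

```r
# Use fastLmPure() when RcppArmadillo is installed; otherwise fall back to a
# base-R stand-in with the same return shape (fallback is for illustration).
fit_fun <- if (requireNamespace("RcppArmadillo", quietly = TRUE)) {
  RcppArmadillo::fastLmPure
} else {
  function(X, y) {
    f <- lm.fit(X, y)
    sigma2 <- sum(f$residuals^2) / f$df.residual
    list(coefficients = f$coefficients,
         stderr = sqrt(sigma2 * diag(solve(crossprod(X)))),
         df.residual = f$df.residual)
  }
}

# Hypothetical wrapper: two-sided p-value for the slope of y ~ x.
lm_pvalue <- function(x, y) {
  X <- cbind(1, x)  # design matrix: intercept + predictor
  fit <- fit_fun(X, y)
  t_stat <- as.numeric(fit$coefficients[2] / fit$stderr[2])
  2 * pt(-abs(t_stat), df = fit$df.residual)
}

# Simulated data: 1000 regions (rows) x 20 samples (columns).
set.seed(3)
meth <- matrix(runif(1000 * 20), nrow = 1000)
expr <- rnorm(20)

# One regression per region: expression ~ methylation.
pvals <- apply(meth, 1, lm_pvalue, y = expr)
head(pvals)
```

The p-values should agree with summary(lm(...)) for the same model; the gain is in avoiding lm()'s per-call overhead when there are many regions to test.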

To be clear, in COHCAP, the gene expression tests are for a limited number of regions; I don't do this for comparing expression and methylation. However, if this sounds helpful, then you can get some more information about the alt.pvalue parameter in the documentation for the and functions:

modified 2.3 years ago • written 2.3 years ago by Charles Warden

"It took me a second to realize the update was a comment"

Yeah, it was revived by Will, who probably found this via Google. It's good to provide answers to old threads nevertheless, as otherwise the Biostar bot will bump the unanswered question to the top of the pile.

written 2.3 years ago by Kevin Blighe

Good point - hopefully, this will be useful to Will :)

written 2.3 years ago by Charles Warden