Question: Microarray | RNA Seq | Methylation Arrays - Correlations?
6.2 years ago by
andrew.j.skelton73 (6.0k) wrote:


I've compared RNA-Seq data to microarray data (downstream analysis, same tissue cohort) by taking the mean expression of each gene on the microarray and plotting it against its respective FPKM value (XY scatter). Is this an accepted method? Are there any other generally accepted methods that people know of?

Also, if anyone has suggestions on how to compare methylation data to microarray data (again, same tissue type / cohort) to show a correlation between methylation and gene expression, it'd be very much appreciated!


6.2 years ago by
mikhail.shugay (3.4k), Czech Republic, Brno, CEITEC, wrote:


I used to calculate the correlation between log2 microarray probe signal intensity (in relative fluorescence units, RFU) and log2 FPKM values. Note that the result will depend on several factors:

  • Are you comparing gene-wise or isoform-wise expression? RNA-Seq can in theory capture all isoforms, while a microarray is limited by design to a few isoforms per gene. I recommend grouping all isoforms by gene and selecting the isoform/probeset with the maximal signal as the reference one.
  • FPKM values, defined as fragments per kilobase of transcript per million mapped reads, are actually calculated in different ways by different tools. For example, Cufflinks (typically run on TopHat alignments) can apply various corrections (e.g. GC/sequence bias correction) to FPKM values. You should check whether FPKM or straightforward count/RPKM values give a better correlation with your microarray data. See the thread "Correlation Of Fpkm And Length Normalized Transcript Mapped Read Count".
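
The gene-level comparison described above can be sketched roughly as follows. This is only an illustration, not the code I actually used: the function names, the toy values and the pseudocount of 1 added before the log2 transform (to avoid log2(0)) are all assumptions made for the example.

```python
import math

def collapse_to_gene_max(records):
    """For each gene keep the maximal signal among its isoforms/probesets."""
    best = {}
    for gene, value in records:
        best[gene] = max(best.get(gene, value), value)
    return best

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def log2_correlation(array_records, rnaseq_records, pseudocount=1.0):
    """Correlate log2 array intensity with log2 FPKM over shared genes."""
    array = collapse_to_gene_max(array_records)
    rnaseq = collapse_to_gene_max(rnaseq_records)
    genes = sorted(set(array) & set(rnaseq))  # genes measured on both platforms
    x = [math.log2(array[g] + pseudocount) for g in genes]
    y = [math.log2(rnaseq[g] + pseudocount) for g in genes]
    return genes, pearson(x, y)

# Hypothetical toy data: (gene, probeset intensity) and (gene, isoform FPKM).
array_records = [("GENE_A", 120.0), ("GENE_A", 950.0), ("GENE_B", 60.0),
                 ("GENE_C", 4000.0), ("GENE_D", 300.0)]
rnaseq_records = [("GENE_A", 35.0), ("GENE_B", 1.5), ("GENE_B", 0.4),
                  ("GENE_C", 210.0), ("GENE_E", 12.0)]

genes, r = log2_correlation(array_records, rnaseq_records)
print(genes, round(r, 3))
```

Running the same function once with FPKM and once with count/RPKM values as the RNA-Seq input is one way to check which measure agrees better with the array.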

As for me, I was able to blindly identify tumor RNA-Seq samples from breast and colon cancer by comparing them to a fairly complete panel of reference tissue datasets obtained from GEO (have a look here for GEO accessions). I got correlations in the range 0.4-0.8 for all datasets. Microarray datasets that corresponded to the tumor's tissue of origin gave a significantly higher correlation with the tumor RNA-Seq data (in the range 0.6-0.8) than datasets from other tissues.

As for methylation data, there are many ways to show that. You can split your gene set into high-, mid- and low-expressed genes, split your promoter regions into methylated and unmethylated, build a contingency table and perform a statistical test for dependence. You can also compare the promoter methylation level distributions of high- and low-expressed genes with something like the Kolmogorov-Smirnov test, and vice versa, the expression distributions of genes with methylated and unmethylated promoters. As long as your data is biologically consistent, the result should not depend much on the statistical test you use and will be robust, so just try various approaches. Are you trying to do it for the whole transcriptome or for a single gene/set of genes?
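
A minimal sketch of both ideas in plain Python follows. The counts and methylation values are invented, and the function names are hypothetical. The p-value shortcut exp(-chi2/2) is exact only for 2 degrees of freedom, which is what a 3x2 table (high/mid/low expression by methylated/unmethylated promoter) has. The second function returns only the Kolmogorov-Smirnov statistic D, not a p-value.

```python
import math
from bisect import bisect_right

def chi2_independence_3x2(table):
    """Pearson chi-square test of independence for a 3x2 table of counts.

    A 3x2 table has (3-1)*(2-1) = 2 degrees of freedom, and for df = 2
    the chi-square survival function is exactly exp(-x / 2).
    """
    assert len(table) == 3 and all(len(row) == 2 for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2, math.exp(-chi2 / 2)

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        fa = bisect_right(a, v) / len(a)  # fraction of a that is <= v
        fb = bisect_right(b, v) / len(b)  # fraction of b that is <= v
        d = max(d, abs(fa - fb))
    return d

# Hypothetical counts. Rows: high-, mid-, low-expressed genes.
# Columns: methylated promoter, unmethylated promoter.
counts = [[10, 90],
          [40, 60],
          [80, 20]]
chi2, p = chi2_independence_3x2(counts)
print(f"chi2 = {chi2:.2f}, p = {p:.3g}")

# Hypothetical promoter methylation levels (beta values) for two gene groups.
high_expr = [0.05, 0.10, 0.12, 0.20, 0.35]
low_expr = [0.40, 0.65, 0.70, 0.85, 0.90]
print("KS D =", ks_statistic(high_expr, low_expr))
```

For real data, `scipy.stats.chi2_contingency` and `scipy.stats.ks_2samp` cover the same tests for tables of any shape and also return p-values for the KS test.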

6.2 years ago by
Irsan (7.2k) wrote:
For the first part of your question: when talking about differential gene expression, in the end people are interested in (log) fold changes and their p-values, so those are the quantities you should use when comparing RNA-Seq results with array results. You can calculate the Pearson correlation coefficient between the RNA-Seq and array logFCs or -log(p-values). In parallel you should do a linear regression and get the slope estimate. When the Pearson correlation coefficient and the slope are both 1, you have a perfect fit. To quantify the influence of methylation on mRNA expression I would use Pearson correlations and linear regression as well. Also have a look at the SIM Bioconductor package for integration of various omics data sets.
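
As a rough sketch of the logFC comparison, assuming the logFCs have already been matched gene by gene: the values below are made up, the function name is hypothetical, and regressing the RNA-Seq logFC on the array logFC (rather than the other way round) is an arbitrary choice for the example.

```python
import math

def pearson_and_slope(x, y):
    """Pearson r plus the least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, slope, intercept

# Hypothetical log2 fold changes for the same five genes on both platforms.
array_logfc = [-2.0, -1.0, 0.0, 1.0, 2.0]
rnaseq_logfc = [-3.1, -1.4, 0.1, 1.6, 2.9]

r, slope, intercept = pearson_and_slope(array_logfc, rnaseq_logfc)
print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

In this toy case r is close to 1 while the slope is about 1.5, which is why both numbers are worth reporting: the platforms agree on direction and ranking but not on the size of the fold changes.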

Thanks for your points, all very useful!

written 6.2 years ago by andrew.j.skelton73 (6.0k)