Question

Combining Microarray and RNAseq data


8.4 years ago

Ron ★ 1.2k

Hi,

I want to combine RNA-seq and microarray data from my samples and perform a clustering analysis.

I don't want to compare the two techniques; I want to merge the data and analyze them together. I came across this post, but it is about comparing the techniques: Combined Rnaseq + Microarray Transcriptomics Datasets

Because the two data types are on different scales, I can't cluster them directly. I have also tried batch correction with ComBat, but it does not work. Any suggestions?

Thanks

Ron

Tags: RNA-Seq, normalization, R, microarray



I would not try to normalize them together or make them comparable; they never will be. Instead, maybe a non-negative matrix factorization (NMF) approach or a similar algorithm could help?
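One way to follow this suggestion without forcing the platforms onto a common scale is a joint NMF in which each platform keeps its own gene-loading matrix W, while all platforms share one sample-factor matrix H. The NumPy sketch below uses Lee-Seung multiplicative updates for squared error. It assumes the same samples were profiled on both platforms (same columns, same order) and that all values are non-negative (e.g. log2 expression); `joint_nmf` and its parameters are hypothetical, not taken from a package.

```python
import numpy as np

def joint_nmf(blocks, k, n_iter=500, seed=0, eps=1e-10):
    """Joint NMF: each block X_i (genes_i x samples) is approximated by W_i @ H.

    Assumption: every block has the SAME samples in the same column order,
    and all entries are non-negative. H (k x samples) is shared by all blocks.
    """
    rng = np.random.default_rng(seed)
    # Rescale each platform to unit Frobenius norm so neither dominates the fit.
    blocks = [X / np.linalg.norm(X) for X in blocks]
    n_samples = blocks[0].shape[1]
    Ws = [rng.random((X.shape[0], k)) for X in blocks]
    H = rng.random((k, n_samples))
    for _ in range(n_iter):
        # Multiplicative update of each platform-specific loading matrix.
        Ws = [W * (X @ H.T) / (W @ H @ H.T + eps) for X, W in zip(blocks, Ws)]
        # Multiplicative update of the shared sample factors, pooling all platforms.
        numerator = sum(W.T @ X for X, W in zip(blocks, Ws))
        denominator = sum(W.T @ W @ H for W in Ws) + eps
        H = H * numerator / denominator
    return Ws, H
```

Each sample can then be assigned to the factor with the largest weight in its column of H, e.g. `H.argmax(axis=0)`; the number of factors `k` has to be chosen by the user. If the two platforms cover different samples, this shared-H formulation does not apply.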

Reply written 8.4 years ago by Jason Chen


Dear Ron,

I am in a similar situation: I want to merge RNA-seq and microarray datasets (some of my samples were profiled by RNA-seq and others by microarray). Did you find a way to do this, and what were the results?

Many thanks and best regards, Ilyes

Reply written 6.7 years ago by baali.ilyes

score 4 · Answer 1 · 2016-08-01

I don't think clustering or other subgroup-discovery methods are really appropriate on a combined data set. You could try applying limma's voom transformation to the RNA-seq data: it corrects for total library size, log-transforms the counts, and models the mean-variance relationship. Then, for each data set separately, z-scale each gene's expression values. This might put both data sets on a comparable scale while accounting for the mean-variance relationship in the RNA-seq data. Perhaps limit the analysis to genes above some detection threshold in the RNA-seq data (more than 10 raw reads in 50% of samples, or something similar). You could then try clustering or subgroup discovery; however, if your clustering solution consistently aligns with the two platforms, some platform bias still remains in the data. Alternatively, perform a separate clustering analysis for each data set.
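A minimal NumPy sketch of the preprocessing described above, under these assumptions: the RNA-seq input is a genes × samples matrix of raw counts, the microarray input is already log2-normalized (e.g. by RMA), and genes are matched by identifier. `voom_log_cpm` reproduces only the log-CPM transformation that voom computes, log2((count + 0.5) / (library size + 1) × 10^6), not voom's precision weights. All function names are hypothetical.

```python
import numpy as np

def voom_log_cpm(counts):
    """log2-CPM as computed by limma::voom (voom's precision weights are NOT included).

    counts: genes x samples array of raw RNA-seq read counts.
    """
    lib_size = counts.sum(axis=0)  # total reads per sample
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)

def expressed_mask(counts, min_reads=10, min_frac=0.5):
    """True for genes with more than `min_reads` reads in at least `min_frac` of samples."""
    return (counts > min_reads).mean(axis=1) >= min_frac

def zscore_rows(x):
    """Scale each gene (row) to mean 0 and SD 1 within a single data set."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes: avoid dividing by zero
    return (x - mu) / sd

def combine_platforms(rnaseq_counts, rnaseq_genes, array_log_expr, array_genes,
                      min_reads=10, min_frac=0.5):
    """Return (shared_genes, merged), merged being genes x (RNA-seq + array samples)."""
    keep = expressed_mask(rnaseq_counts, min_reads, min_frac)
    # log-CPM on the full count matrix so library sizes include every gene
    rna = zscore_rows(voom_log_cpm(rnaseq_counts)[keep])
    rna_genes = [g for g, k in zip(rnaseq_genes, keep) if k]
    arr = zscore_rows(array_log_expr)  # each data set is z-scaled separately
    arr_row = {g: i for i, g in enumerate(array_genes)}
    rna_row = {g: i for i, g in enumerate(rna_genes)}
    shared = [g for g in rna_genes if g in arr_row]
    merged = np.hstack([rna[[rna_row[g] for g in shared]],
                        arr[[arr_row[g] for g in shared]]])
    return shared, merged
```

After clustering the merged matrix, cross-tabulating the cluster labels against the platform of each sample is a quick check for the residual platform bias mentioned above.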

If you're interested in finding differentially expressed genes, one acceptable approach might be to model each data set separately, using methods appropriate for each data type, and then combine the resulting test statistics with a meta-analytic method. On the other hand, a per-gene meta-analysis might require the microarray and RNA-seq test statistics (e.g. p-value and effect size) to come from the same statistical test. In that case, you might consider normalizing the RNA-seq data with the limma voom approach; this supposedly renders the data suitable for parametric analyses, so it may be appropriate to use the same statistical model as for the microarray data, which facilitates meta-analysis.
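As one example of a meta-analytic combination, here is a sketch of a per-gene inverse-variance (fixed-effect) meta-analysis. It assumes each platform's analysis (e.g. limma on the microarray data, voom plus limma on the RNA-seq data) reports a log2 fold change and its standard error for the gene. This is a standard technique swapped in for illustration, not the method of any particular source; it also assumes the fold changes are on comparable scales across platforms, which may not hold exactly. `fixed_effect_meta` is a hypothetical helper.

```python
import math

def fixed_effect_meta(effects, std_errors):
    """Inverse-variance fixed-effect meta-analysis for a single gene.

    effects: per-platform log2 fold changes; std_errors: their standard errors.
    Returns (pooled_effect, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in std_errors]  # precision of each estimate
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value from the normal distribution
    return pooled, pooled_se, z, p
```

For example, `fixed_effect_meta([0.8, 1.2], [0.3, 0.4])` pools a microarray and an RNA-seq estimate for one gene; running it over all shared genes and then adjusting the p-values for multiple testing would complete the analysis.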

Other approaches to finding differentially expressed genes with data from both platforms are outlined in these sources:
https://support.bioconductor.org/p/75246/
https://peerj.com/articles/1621/
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0016345

score 1 · Answer 2 · 2016-08-02


7.7 years ago

chris86 ▴ 400

I think you could z-score normalize each row (gene) and then cluster all samples together, possibly using a technique like consensus clustering to improve robustness.
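A minimal sketch of consensus clustering in the style of Monti et al., assuming the merged matrix has already been z-scored per gene and transposed to samples × genes. It repeatedly subsamples 80% of the samples, runs plain k-means (with farthest-first initialization) on each subsample, and records how often each pair of samples lands in the same cluster. `kmeans` and `consensus_matrix` are hypothetical NumPy helpers, and the 80% fraction and 100 resamples are arbitrary defaults.

```python
import numpy as np

def kmeans(x, k, rng, n_iter=50):
    """Lloyd's k-means on the rows of x (samples x features); returns labels."""
    # Farthest-first initialization: a random sample, then the sample farthest
    # from all centres chosen so far.
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave a centre unchanged if its cluster is empty
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def consensus_matrix(x, k, n_resamples=100, frac=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair of samples co-clustered."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    m = max(k, int(round(frac * n)))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)  # subsample without replacement
        labels = kmeans(x[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)
```

Values near 1 mean two samples almost always cluster together. Clustering 1 − C (for example hierarchically) then gives the final groups, and a consensus matrix with sharp blocks suggests a stable choice of k.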


score 1 · Answer 3 · 2017-07-27

I found this article https://peerj.com/articles/1621/ quite interesting. They get good results with quantile normalization (in its targeted form: a target dataset, RNA-seq, is adapted to a reference dataset, microarray) and with TDM. The code they used is in the Supplementary Information.
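A minimal NumPy sketch of the targeted quantile normalization described above: the reference distribution is the mean of the sorted microarray columns, and each RNA-seq sample is mapped onto it by rank, with tied values receiving the average of their assigned quantiles. It assumes both matrices are genes × samples, already restricted to shared genes and on a log scale (e.g. log2(count + 1) for RNA-seq); the function names are hypothetical. In R, preprocessCore's `normalize.quantiles.determine.target` and `normalize.quantiles.use.target` cover the same idea.

```python
import numpy as np

def reference_quantiles(reference):
    """Mean of the sorted columns of the reference (microarray) matrix."""
    return np.sort(reference, axis=0).mean(axis=1)

def quantile_normalize_to_target(target, ref_q):
    """Map each column of `target` (RNA-seq) onto the reference distribution `ref_q`."""
    n_genes, n_samples = target.shape
    # Interpolate the reference quantiles in case the gene counts differ.
    pos = np.linspace(0.0, 1.0, n_genes)
    ref_pos = np.linspace(0.0, 1.0, len(ref_q))
    values = np.interp(pos, ref_pos, ref_q)  # reference value for each rank
    out = np.empty((n_genes, n_samples), dtype=float)
    for j in range(n_samples):
        col = target[:, j]
        mapped = np.empty(n_genes, dtype=float)
        mapped[np.argsort(col, kind="mergesort")] = values  # rank-based substitution
        # Tied input values (e.g. many zeros) get the mean of their assigned quantiles.
        _, inv = np.unique(col, return_inverse=True)
        mapped = (np.bincount(inv, weights=mapped) / np.bincount(inv))[inv]
        out[:, j] = mapped
    return out
```

After this step each RNA-seq sample has (up to ties) the same value distribution as the average microarray sample, so the two matrices can be concatenated column-wise.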

R packages: preprocessCore, TDM

score 1 · Answer 4 · 2017-07-29


6.7 years ago

theobroma22 ★ 1.2k

You can use SVA on both datasets and then compare them. There are many other ways to do this as well, but SVA may be the most straightforward. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3617154/


score 0 · Answer 5 · 2023-10-18

Just in case someone else stumbles upon this issue: there is a package for integrating RNA-seq and microarray data, GEDI (https://www.biorxiv.org/content/10.1101/2021.11.11.468093v1), that can be used to process and integrate both data sets. The GEDI package uses SVA (as mentioned by @theobroma22).

Apart from this, the following publication is also useful: A joint Bayesian modeling for integrating microarray and RNA-seq transcriptomic data (https://matianzhou.github.io/files/preprints/CBM.pdf).