Question

microarray, RNAseq, CEL, edgeR, etc. for DGE analysis


6.2 years ago

moxu

I am working on a project that involves both Affymetrix microarray gene expression datasets and RNAseq datasets. My most recent experience is with RNAseq using edgeR. I have some experience with microarray DGE analysis, but that was quite a while ago, and I am not sure how the field has progressed since. Searching the internet mostly turned up information posted more than 10 years ago, so I am not sure whether it is still valid today. Please bear with me if the questions below are naive or have already been answered elsewhere.

How to extract expression level from a .CEL file?
How to collapse probe expression levels into gene expression levels?
What's the best way to compare microarray samples with RNAseq samples? I understand it's not advised to do so, but what I have is a set of control samples in microarray, and a set of treatment samples in RNAseq.
Can I use edgeR for the microarray datasets after some preprocessing (e.g. normalization)? I have developed a whole pipeline for DEG analysis based on edgeR (e.g. volcano plot, MDS plot, heatmaps), and it would be nice if the microarrays can be fed into edgeR.

Thanks a lot in advance!

Tags: RNA-Seq, microarray, gene


score 3 · Accepted Answer · 2018-01-26

How to extract expression level from a .CEL file?

Reading the fluorescence intensities in the CEL files depends, initially, on the array manufacturer and version. If you let me know which array you are using, then I can guide you further. Once you have read the CEL files into R (e.g. as an AffyBatch with affy, or a FeatureSet with oligo), the subsequent steps are fairly standard for most Affymetrix (oligonucleotide) arrays.

---------------------------------------

How to collapse probe expression levels into gene expression levels?

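library(oligo)  # Bioconductor package for Affymetrix Gene ST / Exon ST arrays
project <- read.celfiles(list.celfiles("path/to/CEL", full.names = TRUE))  # hypothetical directory of CEL files
project.bgcorrect.norm.avg <- rma(project, background=TRUE, normalize=TRUE, target="core")            # gene (transcript cluster) level
project.bgcorrect.norm.avg.Exons <- rma(project, background=TRUE, normalize=TRUE, target="probeset")  # probeset / exon level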

Usually this is done as part of normalisation, using rma() or gcrma(). Both perform background correction, quantile normalisation, log2 transformation, and median-polish summarisation of the individual probes into probeset-level values; gcrma() additionally uses probe sequence GC content in its background adjustment. With oligo on Gene ST / Exon ST arrays, the target argument sets the summarisation level:

Summarise by gene: rma(..., background=TRUE, normalize=TRUE, target="core")
Summarise by probe / exon: rma(..., background=TRUE, normalize=TRUE, target="probeset")
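Even after rma(), several probesets can map to the same gene symbol (this is common on older 3' arrays), so a further collapse step is often needed. Below is a minimal base-R sketch, assuming you already have a log2 expression matrix and a probeset-to-gene lookup; the helper collapse_to_genes() and the toy data are hypothetical, and averaging is only one common choice (keeping the probeset with the highest mean expression is another).

```r
# Hypothetical helper: average all probesets that map to the same gene.
# expr:       log2 expression matrix, probesets in rows, samples in columns
# probe2gene: named character vector, probeset ID -> gene symbol
collapse_to_genes <- function(expr, probe2gene) {
  genes <- probe2gene[rownames(expr)]        # gene symbol for each row
  keep  <- !is.na(genes) & genes != ""       # drop unannotated probesets
  expr  <- expr[keep, , drop = FALSE]
  genes <- genes[keep]
  sums   <- rowsum(expr, group = genes)      # per-gene sums, rows sorted by gene
  counts <- as.vector(table(genes)[rownames(sums)])  # probesets per gene
  sums / counts                              # divides row i by counts[i]
}

# Toy input, for illustration only
expr <- matrix(c(8, 9, 10, 5,
                 7, 9, 11, 6), nrow = 4,
               dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2")))
probe2gene <- c(p1 = "TP53", p2 = "TP53", p3 = "MYC", p4 = NA)
gene_expr <- collapse_to_genes(expr, probe2gene)  # TP53 = mean of p1 and p2; p4 dropped
```

If limma is installed, its avereps() function performs the same kind of per-ID averaging.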

-----------------------------------------

What's the best way to compare microarray samples with RNAseq samples? I understand it's not advised to do so, but what I have is a set of control samples in microarray, and a set of treatment samples in RNAseq.

Why do you want to do this? The best that you can do is to normalise each dataset according to its own recommended guidelines, and then get both onto the same data distribution (e.g. log2 expression values). After that, you could convert each to the Z scale and then perform a simple manual merge. If a transcript exists in one dataset but not the other, there's nothing that you can do: it has to be eliminated.
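The merge described above could be sketched in base R as follows, assuming both inputs are already normalised log2 matrices (genes in rows, samples in columns) with matching gene identifiers. The helper merge_on_z_scale() and the toy data are hypothetical. Each sample (column) is standardised across the shared genes; standardising each gene within its platform instead would erase any control-versus-treatment difference, because each group sits on its own platform. Either way, this does not remove the platform confound.

```r
# Hypothetical helper: keep genes measured on both platforms, Z-score each
# sample (column) across those genes, then merge the two matrices.
merge_on_z_scale <- function(array_log2, rnaseq_log2) {
  shared <- intersect(rownames(array_log2), rownames(rnaseq_log2))
  # Genes present on only one platform are dropped here
  z_array  <- scale(array_log2[shared, , drop = FALSE])   # per-column Z scores
  z_rnaseq <- scale(rnaseq_log2[shared, , drop = FALSE])
  cbind(z_array, z_rnaseq)
}

# Toy input, for illustration only
array_log2  <- matrix(c(6, 8, 10, 12), ncol = 1,
                      dimnames = list(c("A", "B", "C", "D"), "ctrl1"))
rnaseq_log2 <- matrix(c(3, 5, 9, 1), ncol = 1,
                      dimnames = list(c("B", "C", "D", "E"), "trt1"))
merged <- merge_on_z_scale(array_log2, rnaseq_log2)  # genes B, C, D only
```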

-------------------------------------------

Can I use edgeR for the microarray datasets after some preprocessing (e.g. normalization)? I have developed a whole pipeline for DEG analysis based on edgeR (e.g. volcano plot, MDS plot, heatmaps), and it would be nice if the microarrays can be fed into edgeR.

You should make your functions as reusable as possible. MA, volcano, box plots, etc. apply to many different types of data, so take the opportunity to adapt your functions for general use, so that you can reuse them again and again. As a start, here's some code for a simple volcano plot (converting it to an MA plot is easy): A: Volcano Plot from DESeq2
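One way to make a volcano plot tool-agnostic is to have it accept plain vectors rather than an edgeR or limma result object. The base-R function below is a hypothetical sketch of that idea; the name volcano_plot() and its cut-off defaults are illustrative assumptions, not recommendations.

```r
# Hypothetical tool-agnostic volcano plot: takes plain vectors, so it works for
# edgeR (columns logFC, PValue) and limma (columns logFC, P.Value) alike.
volcano_plot <- function(logFC, pvalue, fc_cut = 1, p_cut = 0.05,
                         main = "Volcano plot", draw = TRUE) {
  # Classify each gene as up, down or not significant ("ns")
  status <- ifelse(pvalue < p_cut & logFC >= fc_cut, "up",
            ifelse(pvalue < p_cut & logFC <= -fc_cut, "down", "ns"))
  if (draw) {
    cols <- c(up = "firebrick", down = "steelblue", ns = "grey60")
    plot(logFC, -log10(pvalue), col = cols[status], pch = 20,
         xlab = "log2 fold change", ylab = "-log10 p-value", main = main)
    abline(v = c(-fc_cut, fc_cut), h = -log10(p_cut), lty = 2)  # cut-off lines
  }
  invisible(status)  # return classifications for reuse (e.g. heatmap selection)
}

# Toy call without drawing, for illustration only
status <- volcano_plot(c(2.5, -3, 0.2), c(0.001, 0.01, 0.5), draw = FALSE)
```

With real results you would pass, for example, res$logFC and res$PValue from edgeR's topTags(...)$table, or res$logFC and res$P.Value from limma's topTable(...); substitute adjusted p-values if you prefer to threshold on FDR.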

Here are some other ideas: A: Hierarchical Clustering in single-channel Agilent microarray experiment

Kevin

score 1 · Accepted Answer · 2018-01-26

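You can do everything with a limma-based workflow: reading CEL files (with affy or oligo, as shown in the limma User Guide), analysing RNAseq with limma-voom, and producing your plots. edgeR models raw counts, so normalised microarray intensities are not appropriate input for it, whereas limma handles both data types. In fact, edgeR itself depends on limma, and several functions you use in edgeR workflows come from limma. Carefully read the limma User Guide, and then ask again if you still have questions.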

What's the best way to compare microarray samples with RNAseq samples? I understand it's not advised to do so, but what I have is a set of control samples in microarray, and a set of treatment samples in RNAseq.

If no samples are shared between the two platforms (every control is on microarray and every treated sample is on RNAseq), platform is completely confounded with treatment: you can't disentangle technical variance from biological variance, so you can't know the biological significance of any gene you find as differentially expressed.
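The confounding can be seen directly in a design matrix. In this hypothetical six-sample layout (three microarray controls, three RNAseq treated samples), the platform and treatment columns are identical, so the model has one fewer estimable coefficient than it asks for, and no linear model can separate the two effects.

```r
# Hypothetical layout: all controls on microarray, all treated samples on RNAseq
platform  <- factor(c("array", "array", "array", "rnaseq", "rnaseq", "rnaseq"))
condition <- factor(c("control", "control", "control",
                      "treated", "treated", "treated"))

design <- model.matrix(~ platform + condition)
n_coef <- ncol(design)      # intercept, platform effect, treatment effect
n_est  <- qr(design)$rank   # number of coefficients that can actually be estimated
# n_est < n_coef: platform and treatment effects are aliased
```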