Question: Difference between RMA analysis of CEL files and data from GEOquery of array data
J.F.Jiang wrote (7.6 years ago):

Hi all,

This is just for discussion.

For microarray data, there are generally two ways to obtain expression values for each probe across the samples:

1) download the original CEL files, then use ReadAffy and rma to get the matrix, or use justRMA directly

2) use GEOquery to obtain the matrix directly

However, I found small differences between the values from these two methods, and I do not know why.
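One way to put a number on "minimal" is to line up both matrices by probe ID and look at the largest absolute difference and the overall correlation. (Some drift is expected: RMA normalizes across whichever set of arrays is processed together, so a different batch or software version can shift values slightly.) Below is a minimal stdlib sketch, assuming both matrices are already log2 values stored as `probe ID -> list of per-sample values` with samples in the same order; `compare_matrices` and the numbers are hypothetical, made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def compare_matrices(own, geo):
    """Compare two probe -> [per-sample values] matrices on their shared probes.

    Assumes both hold log2 values with samples in the same order.
    Returns (number of shared probes, max absolute difference, Pearson r).
    """
    shared = sorted(set(own) & set(geo))
    xs, ys = [], []
    for probe in shared:
        xs.extend(own[probe])
        ys.extend(geo[probe])
    max_diff = max(abs(x - y) for x, y in zip(xs, ys))
    return len(shared), max_diff, pearson(xs, ys)

# Made-up values: "own" from rma on the CEL files, "geo" from the GEO series matrix
own = {"1007_s_at": [9.81, 9.75], "1053_at": [6.02, 6.40], "117_at": [5.10, 5.33]}
geo = {"1007_s_at": [9.80, 9.77], "1053_at": [6.05, 6.38], "121_at": [7.00, 7.10]}
n, max_diff, r = compare_matrices(own, geo)
print(n, max_diff, r)  # 2 shared probes; small max difference; r close to 1
```

Probes present in only one matrix are simply ignored here; in practice it is worth checking how many there are before trusting the comparison.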

Second, can I use the matrix from GEOquery directly for differential expression analysis, in the same way as the output of rma?

And which is better for DE analysis: 1) probe level or 2) gene level? One gene may map to several probes, and one step of DE analysis is adjusting the p-values with p.adjust. Since an array may have 50K probes but only 20K genes, the two choices may give quite different results.
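To see why the number of tests matters, here is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment (what R's `p.adjust(p, method = "BH")`, alias `"fdr"`, computes). The function name `bh_adjust` and the p-values are hypothetical; the point is only that the same raw p-value is adjusted more harshly among 50,000 tests than among 20,000.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each sorted p-value p_(i) becomes p_(i) * m / i; a running minimum taken
    from the largest p-value downwards keeps the result monotone, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical: one strong hit (raw p = 1e-6) among otherwise null features
genes = [1e-6] + [0.5] * 19_999    # gene-level: 20,000 tests
probes = [1e-6] + [0.5] * 49_999   # probe-level: 50,000 tests
print(bh_adjust(genes)[0])   # about 0.02
print(bh_adjust(probes)[0])  # about 0.05
```

With real data the effect is less clear-cut, because probe-level tests for the same gene are correlated, but the scaling by the number of tests is the same.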

Can anyone answer these questions?



I always use the expression matrix directly. The difference between the two methods can be ignored; array data are not that accurate anyway. I don't know how to choose between probe level and gene level.

(Reply by jlshi.nudt, 7.6 years ago)

Maybe, though I do think that for gene expression analysis, arrays seem more accurate than RNA-seq using VST or RPKM values. The great advantage of RNA-seq, I think, is its ability to capture all genes, especially lowly transcribed ones.

Please correct me if I am misunderstanding.

(Reply by J.F.Jiang, 7.6 years ago)
Neilfws (Sydney, Australia) wrote (7.6 years ago):
  1. Where raw data (CEL files) are available, you should use them, simply because you can never fully trust data that has been processed by someone else unless what they did is absolutely explicit.

  2. You can expect "minimal" differences in RMA values between different implementations. If raw data are not available for you to normalize yourself and you are comfortable with the available processed data matrix, by all means use it. By "comfortable" I mean that you understand what kind of values it contains and how they were derived, and that they "look sensible" (for example, they are not in the hundreds or thousands if the data were supposedly log2-transformed).

  3. Neither probeset-level nor gene-level data are "better" for DE analysis: it all depends on what you are trying to achieve. Using multiple probesets per gene can be informative if you are interested in splice variants or in evaluating how good probesets are as measures of expression; some may be more "responsive" than others.
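The "look sensible" check in point 2 can be automated roughly. A minimal stdlib sketch, assuming log2 array values usually fall somewhere around 0 to 16, so a 99th percentile far above that suggests the matrix is still on the linear scale. The name `looks_log2` and the cutoff of 100 are hypothetical choices, not a standard rule.

```python
import statistics

def looks_log2(values, upper=100.0):
    """Rough heuristic: True if the 99th percentile of the values is <= `upper`.

    Assumption: log2 expression values rarely exceed ~16, whereas
    linear-scale intensities routinely reach the thousands.
    """
    cut_points = statistics.quantiles(values, n=100, method="inclusive")
    return cut_points[98] <= upper  # cut_points[98] is the 99th percentile

print(looks_log2([3.1, 5.2, 7.3, 9.4, 11.5, 13.6, 15.0]))              # True
print(looks_log2([2.0 ** x for x in (3, 5, 7, 9, 11, 13, 15)]))        # False
```

This only catches the obvious case; it says nothing about how the values were normalized, which is the other half of being "comfortable" with a processed matrix.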

In general, the most differentially-expressed genes in a gene-level analysis will also have the most differentially-expressed probesets, simply because gene-level values are a rather crude summary, most often obtained by taking the median of the (core) probesets for a gene.
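The median summary described above can be sketched as follows, assuming a probeset-to-gene mapping is already available (for example from an annotation package) and that any filtering to "core" probesets has been done beforehand. `collapse_to_genes`, the probeset IDs and the gene names are hypothetical.

```python
import statistics
from collections import defaultdict

def collapse_to_genes(expr, probe_to_gene):
    """Collapse probeset-level values to gene level by taking the median.

    expr: probeset ID -> list of per-sample values (same sample order throughout)
    probe_to_gene: probeset ID -> gene symbol; unmapped probesets are dropped
    """
    by_gene = defaultdict(list)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            by_gene[gene].append(values)
    # median across the gene's probesets, computed separately for each sample
    return {
        gene: [statistics.median(column) for column in zip(*rows)]
        for gene, rows in by_gene.items()
    }

expr = {"p1": [8.0, 9.0], "p2": [6.0, 7.5], "p3": [7.0, 8.0], "p4": [5.0, 5.0]}
mapping = {"p1": "GENEA", "p2": "GENEA", "p3": "GENEA", "p4": "GENEB"}
print(collapse_to_genes(expr, mapping))
# {'GENEA': [7.0, 8.0], 'GENEB': [5.0, 5.0]}
```

Note how the summary hides the fact that p1 and p2 differ by two log2 units; that per-probeset detail is exactly what a probeset-level analysis keeps.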


Great comments. I later tried both rma and justRMA, and there is no difference between them. Regarding splice variants, probe-level analysis is indeed important; however, the p-value adjustment may introduce bias, and it is still unclear to me how the p-values should be adjusted. Should I use FDR control to get q-values?

For CEL file analysis, we are usually recommended to use RMA to reduce bias among arrays. Here is another question: MAS5 normalization scales all arrays to the same target level (e.g., 200 for Affymetrix arrays), and we then log2-transform the matrix, so both methods seem appropriate for DE analysis.
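For comparison with RMA, the global scaling step mentioned above can be sketched like this: each array is multiplied by a factor so that its trimmed mean (here dropping 2% at each end, which is the usual description of MAS5 scaling) hits a common target, such as the 200 mentioned above, and the result is log2-transformed. This is only the scaling step, not the full MAS5 signal calculation; the function names and the toy data are hypothetical.

```python
import math

def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    data = sorted(values)
    k = int(len(data) * trim)
    kept = data[k:len(data) - k] if k > 0 else data
    return sum(kept) / len(kept)

def scale_and_log2(arrays, target=200.0, trim=0.02):
    """Scale each array so its trimmed mean equals `target`, then take log2.

    arrays: list of arrays, each a list of positive signal values.
    """
    result = []
    for signals in arrays:
        factor = target / trimmed_mean(signals, trim)
        result.append([math.log2(s * factor) for s in signals])
    return result

# Toy example: the second array is the first one at half the overall brightness
out = scale_and_log2([[100.0, 200.0, 300.0], [50.0, 100.0, 150.0]])
print(out[0] == out[1])  # True: the global brightness difference is removed
```

A single factor per array only removes overall brightness differences; unlike quantile normalization it does not force the arrays to share the same distribution.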

Finally, when using RMA or MAS5, the probe-level matrix is normalized: after log2 transformation, we quantile-normalize the probes and then map them to genes. So which should be carried out first: normalization -> gene mapping, or gene mapping -> normalization?
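For reference, RMA as usually described quantile-normalizes the probe-level intensities first and only then summarizes probes into probesets; mapping probesets to genes is a separate, later step. The quantile normalization itself is simple enough to sketch in plain Python. `quantile_normalize` is a hypothetical name, and ties are broken by position rather than averaged, which real implementations may handle differently.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of values per array).

    Each value is replaced by the mean, across arrays, of the values sharing
    its rank, so every array ends up with the same distribution.
    Ties are broken by position in this simple sketch.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # reference distribution: mean of the k-th smallest value across arrays
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices from smallest value up
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = reference[rank]
        result.append(normalized)
    return result

print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Because each value only depends on its rank within its own array, every probe has to be present when this runs; collapsing probes to genes beforehand would change which values are being ranked.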

(Reply by J.F.Jiang, 7.6 years ago)

