Question: Microarray analysis of CEL files with Log-transformation instead of GCRMA or RMA
Bioinformatist Newbie (240) wrote:

Basic problem: When I analyse microarray data (treated vs. control) with GEO2R, I get about 1000 genes with a log fold change (lfc) above 1, but when I run the same analysis using either GCRMA or RMA followed by limma, I get only 3 genes above lfc=1.

I want to do differential gene expression analysis of multiple drug-treated vs. control cases. I was wondering whether it is possible to read the CEL files and, instead of using RMA or GCRMA, apply a log transformation as GEO2R does. I am having trouble building an expression set from the CEL files without using RMA or GCRMA.

If anybody has tried it this way, please share your experience. Thanks.

Note: The dataset I have doesn't contain a series matrix file; otherwise I could simply have used the GEO2R approach. So the only thing I have is the CEL files.

microarray geo2r limma R • 2.3k views
modified 4.3 years ago by andrew (480) • written 4.3 years ago by Bioinformatist Newbie (240)

Do you have replicates (i.e., more than one sample per group when doing differential expression)?

written 4.2 years ago by Sean Davis (25k)

Yes, for every experiment I have at least 3 treatment and 3 control samples. I am analyzing Build 2 of the Connectivity Map.

written 4.2 years ago by Bioinformatist Newbie (240)

When making comparisons with replicates available, I'd suggest focusing on FDR rather than (or at least in addition to) LFC.  If the FDRs are near 1, then your experiment may simply not have detectable differentially-expressed genes.  
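To make the FDR suggestion concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment, written in plain Python purely for illustration. The p-values are invented and the function name `benjamini_hochberg` is hypothetical; in R, `p.adjust(p, method = "BH")` computes the same adjustment, and limma's `topTable` reports it in the `adj.P.Val` column.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values for five genes (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
fdr = benjamini_hochberg(pvals)
n_significant = sum(1 for q in fdr if q < 0.05)
print(fdr, n_significant)
```

With these toy values, four genes have a raw p-value below 0.05 but only two pass an FDR cutoff of 0.05, which is why the adjusted values are the more useful quantity when comparing methods.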

written 4.2 years ago by Sean Davis (25k)
andrew (480) wrote:

RMA and GCRMA are used for normalization. Supposedly, GEO2R performs RMA, but in my experience it does not appear to perform any kind of QC, which is incredibly problematic because you have no idea if there is a bad file/sample. Although GEO2R does allow one to inspect the distribution of each CEL file, it provides no objective tool to identify outliers. Log transformation is usually performed only after QC and normalization have been performed.

The company I work for offers these capabilities for most major Affy platforms for human, mouse, and rat. The application is called iPathwayGuide. It accepts raw CEL files, performs QC and normalization (GCRMA) automatically, provides statistics on the acceptable CEL files, and then performs DEG analysis, including prediction of miRNA activity, GO analysis, pathway analysis, disease analysis, and meta-analysis comparing various contrasts.

The best part is that it's 100% free to use. You only pay if you want to keep your results beyond 72 hours.

Give it a shot.

www.iPathwayGuide.com

Here are a few screenshots of the QC/normalization process.

 

modified 4.3 years ago • written 4.3 years ago by andrew (480)

In my case it will be of no use, because GCRMA does not give me more than 3 genes above lfc=1, and the application you are sharing uses GCRMA for normalization by default.

written 4.3 years ago by Bioinformatist Newbie (240)

GEO2R does not perform RMA.  It uses the values as provided by the submitter (modulo log2 transformation), I believe.

written 4.2 years ago by Sean Davis (25k)

Yes, you are right, and I already knew that. But my questions are:

1- Why do I get more than 1000 genes above lfc=1 when I use GEO2R, but only 3 genes above lfc=1 when I use RMA or GCRMA followed by limma?

2- Is it possible to read the CEL files and, instead of applying RMA or GCRMA, just apply a log2 transformation and then limma to find DEGs? I think in that case I would get exactly the same number of genes above lfc=1 as with the GEO2R analysis.

Problem: I have tried the second method. I read the CEL files into an AffyBatch and took exprs(affybatch), but in that case I lose the gene identifiers (row names), and the expression set has ~550000 rows (while the GPL96 platform has only 22283 genes). In contrast, when I read the CEL files and apply RMA or GCRMA, I get an expression set with 22283 rows together with the gene identifiers. How can I solve this issue?

Kindly read it thoroughly and reply comprehensively. Thanks
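On the row-count mismatch described above: `exprs()` on an AffyBatch returns one row per cell (probe) on the chip rather than one row per probe set, which is why the matrix has hundreds of thousands of rows and no gene identifiers. RMA and GCRMA collapse the probes belonging to each probe set into a single value using the chip's CDF mapping. The sketch below, in plain Python with invented intensities and hypothetical probe-set names, shows only that grouping step, using a plain median of log2 values. It is not RMA (no background correction, no quantile normalization, no median polish), and it is not a recommendation to skip normalization.

```python
import math
from statistics import median

# Invented raw intensities, one value per cell (row) on a single array.
cell_intensity = [120.0, 135.0, 110.0, 980.0, 1020.0, 1005.0, 64.0, 70.0]

# Hypothetical probe-set map; on a real chip this comes from the CDF.
probeset_to_cells = {
    "probeset_A": [0, 1, 2],
    "probeset_B": [3, 4, 5],
    "probeset_C": [6, 7],
}


def summarize(intensities, mapping):
    """Collapse cell-level intensities to one log2 value per probe set."""
    return {
        probeset: median(math.log2(intensities[i]) for i in cells)
        for probeset, cells in mapping.items()
    }


summary = summarize(cell_intensity, probeset_to_cells)
for probeset, value in summary.items():
    print(probeset, round(value, 3))
```

Eight cell-level rows become three probe-set rows that carry identifiers, which is the same kind of reduction that turns the cell-level matrix into the 22283-row expression set.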

modified 4.2 years ago • written 4.2 years ago by Bioinformatist Newbie (240)
  1. GEO2R has to assume that all of the normalization done previously was appropriate.
  2. Sure, but the results will be meaningless.

Use affy with RMA or GCRMA normalization. The results from that will be more reliable than what GEO2R could reasonably produce.
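One way such a gap between the two analyses can arise (an illustration, not a diagnosis of this dataset): if arrays differ in overall brightness and that difference is not removed, every gene picks up the same spurious log fold change. The sketch below uses invented intensities and simple per-array median centering as a stand-in for normalization. RMA itself uses background correction, quantile normalization, and median-polish summarization, none of which is reproduced here.

```python
import math
from statistics import median

# Invented raw intensities for 7 genes on one control and one treated array.
control = [100, 200, 400, 800, 1600, 3200, 6400]
# The treated array is 2.5x brighter overall; only the gene at index 5
# truly changes (a further 4x increase).
treated = [250, 500, 1000, 2000, 4000, 32000, 16000]


def log2_values(intensities):
    """Log2-transform a list of raw intensities."""
    return [math.log2(x) for x in intensities]


def median_center(log_values):
    """Subtract the array's median so arrays share a common centre."""
    centre = median(log_values)
    return [v - centre for v in log_values]


def count_hits(a, b, threshold=1.0):
    """Count genes whose absolute log2 fold change (b - a) reaches threshold."""
    return sum(1 for x, y in zip(a, b) if abs(y - x) >= threshold)


raw_hits = count_hits(log2_values(control), log2_values(treated))
norm_hits = count_hits(
    median_center(log2_values(control)),
    median_center(log2_values(treated)),
)
print(raw_hits, norm_hits)  # prints "7 1"
```

Without centering, all seven genes exceed lfc=1 purely because of the brightness difference; after centering, only the gene that truly changes remains.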

written 4.2 years ago by Devon Ryan (92k)

But if I use affy with RMA or GCRMA, I usually get fewer than 4 genes above lfc=1 for most of the experiments. What should I do in this case: try another normalization such as MAS5, or lower the lfc threshold? For some experiments the same approach (affy + RMA/GCRMA) yields ~1500 genes above lfc=1, but in the current study it gives no more than 3 genes above this threshold. Does this mean that the samples I am using have very similar expression values?

written 4.2 years ago by Bioinformatist Newbie (240)

You seem to be assuming that your data actually have detectable differentially expressed genes. That is simply not always the case, unfortunately. The number of differentially expressed genes is strongly affected by the experiment, not just the analysis approach.

You are free to try other normalization methods and use different cutoffs for LFC (you should probably be using FDR, instead, for comparing across methods), but you'll need to determine the effect that this has on your false positive rate.

modified 7 days ago by RamRS (24k) • written 4.2 years ago by Sean Davis (25k)