Question

A lot of differentially expressed genes

9.8 years ago

mahdijalili ▴ 20

Hi,

More than 17,000 of 48,000 genes (probes) are differentially expressed (adjusted p-value < 0.05) in a microarray analysis (using the limma R package; 22 case samples vs. 8 controls). Is this result correct and acceptable?

Thanks

DEG Microarray • 3.3k views

updated 2.8 years ago by Ram 45k • written 9.8 years ago by mahdijalili ▴ 20

Ram · Answer 1 · 2015-09-08

9.8 years ago

svlachavas ▴ 790

Dear Mahdijalili,

What is the experimental design of your analysis? For instance, are the case samples cancer samples? If so, you can expect a lot of differential expression between cancer and control samples. Or do the case samples represent some drug treatment? In any case, it would help if you could give us more information about your procedure (the specific microarray platform, whether you performed any non-specific filtering prior to DE testing, etc.).

Generally, I believe there is no universal definition of "correct" or "acceptable", as it depends on the biology of your system and on your samples and analysis. Also, limma is capable of taking into account the imbalance between the case and control samples.
Finally, note that you should generally not define DEGs by an adjusted p-value threshold alone; use that cutoff only to get a first pool of DEG candidates. You can use the functions treat and topTreat from limma to test differential expression against a minimum log-fold-change cutoff. This will reduce the number of DEGs and will probably also give a more biologically meaningful subset of genes for your analysis.
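
As a rough illustration of the treat/topTreat suggestion, here is a minimal sketch on simulated data. The expression matrix, the group sizes (8 controls, 22 cases, mirroring the question), the 1.5-fold threshold and all object names are assumptions made for the example, not the original poster's data or settings.

```r
library(limma)

set.seed(1)

# Simulated log2 expression values: 1000 hypothetical probes, 30 arrays
group <- factor(c(rep("control", 8), rep("case", 22)),
                levels = c("control", "case"))
n_probes <- 1000
expr <- matrix(rnorm(n_probes * length(group), mean = 8, sd = 0.5),
               nrow = n_probes,
               dimnames = list(paste0("probe", seq_len(n_probes)), NULL))

# Shift the first 100 probes up by 2 log2 units in the case arrays
expr[1:100, group == "case"] <- expr[1:100, group == "case"] + 2

design <- model.matrix(~ group)   # columns: (Intercept), groupcase
fit <- lmFit(expr, design)

# Usual route: eBayes() + topTable(), adjusted p-value cutoff only
fit_eb <- eBayes(fit)
tab_eb <- topTable(fit_eb, coef = "groupcase", number = Inf)
n_padj <- sum(tab_eb$adj.P.Val < 0.05)

# treat() route: test against a minimum log2 fold change (here 1.5-fold)
fit_tr <- treat(fit, lfc = log2(1.5))
tab_tr <- topTreat(fit_tr, coef = "groupcase", number = Inf)
n_treat <- sum(tab_tr$adj.P.Val < 0.05)

cat("adj.P.Val < 0.05 (eBayes):", n_padj, "\n")
cat("adj.P.Val < 0.05 (treat): ", n_treat, "\n")
```

Note that topTreat() defaults to coef = 1, which in this design is the intercept, so the coefficient of interest has to be named explicitly.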

Best,
Efstathios

updated 2.8 years ago by Ram 45k • written 9.8 years ago by svlachavas ▴ 790

Thanks Efstathios,

Yes, the cases are untreated APL (M3 leukemia) samples vs. normal BM samples. Thanks for your final suggestion. Also, are there any similar functions in packages like RankProd (rank product analysis)?

updated 2.8 years ago by Ram 45k • written 9.8 years ago by mahdijalili ▴ 20

In this case, the large number of DEGs (based on the adjusted p-value criterion only) can be explained by the generally large differences between your leukemia samples and the normal ones. Regarding RankProd, I'm familiar with it but I would not recommend it, as it is typically used with only a few samples and, in simple terms, just ranks the log fold changes. Limma is far more powerful and has many more capabilities for handling issues such as small sample sizes or unbalanced studies.

In fact, you could use treat() after lmFit(), in place of the eBayes() step, and then use topTreat() in the same way as topTable() to return the subset of DEG candidates. See ?treat in the limma package. For instance, lfc=0.5 tests for genes whose log2 fold change exceeds 0.5 in absolute value, i.e. a fold change larger than about 1.41; for a 1.5-fold threshold, use lfc=log2(1.5) (about 0.58).
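
The relation between an lfc threshold and the fold change it corresponds to can be checked with a couple of lines of base R. This assumes the usual log2 scale of limma's logFC values; the variable names are purely illustrative.

```r
# An lfc threshold is on the log2 scale, so the fold change is 2^lfc
lfc <- 0.5
fold_change <- 2^lfc          # fold change implied by lfc = 0.5

# Going the other way: the lfc needed for a 1.5-fold threshold
target_fc <- 1.5
lfc_needed <- log2(target_fc)

cat("lfc = 0.5 corresponds to a fold change of", round(fold_change, 3), "\n")
cat("a 1.5-fold change corresponds to lfc =", round(lfc_needed, 3), "\n")
```

So lfc = 0.5 is a threshold of roughly 1.41-fold, and a 1.5-fold threshold needs lfc = log2(1.5), roughly 0.58.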

Finally, I would also consider non-specific intensity filtering to remove probes with low expression in most samples, which would be uninformative for any further analysis.
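
One possible way to sketch such a filter in base R is shown below. The simulated matrix, the intensity cutoff of 6 and the rule "expressed in at least as many arrays as the smallest group" are assumptions chosen for illustration; a sensible cutoff depends on the platform and normalization. The filter does not use the group labels, only the intensities.

```r
set.seed(1)

# Simulated log2 intensities: 200 hypothetical probes x 30 arrays
expr <- matrix(rnorm(200 * 30, mean = 7, sd = 1.5), nrow = 200)

cutoff <- 6        # hypothetical log2 intensity threshold
min_samples <- 8   # assumed: size of the smaller group (8 controls)

# Keep probes that exceed the cutoff in at least min_samples arrays
keep <- rowSums(expr > cutoff) >= min_samples
expr_filtered <- expr[keep, , drop = FALSE]

cat("probes kept:", nrow(expr_filtered), "of", nrow(expr), "\n")
```

The filtered matrix would then go into lmFit() in place of the full one.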

updated 2.8 years ago by Ram 45k • written 9.8 years ago by svlachavas ▴ 790