Question: Identify Expressed Genes From Combined Microarray Data Sets
7.0 years ago by
Jessica60 wrote:

Hi all,

I have to combine two datasets obtained on two different platforms (Illumina and Affymetrix). The combined dataset contains gene expression for 11 cell types. For my purpose, I do not need the genes that are differentially expressed between one cell type and the others; I need the upregulated genes of each cell type. To do this, I ranked the ~20000 genes within each sample and selected the genes that ranked within the top 20% of the ~20000 genes in 80% of the replicates of each cell type (all the cell types have >=5 replicates). However, I am not sure how to estimate the statistical significance (e.g., FDR) of my selected genes. Any advice is appreciated. Also, does anybody know of any methods that suit my purpose?
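For readers who want to see the selection rule described above in concrete form, here is a minimal sketch in plain Python. It is an illustration only, not the code actually used for the analysis: the function names, the dictionary-based input format, and the reading of "80% of the replicates" as "at least 80%" are all assumptions, and ties are broken arbitrarily.

```python
import math

def rank_fraction(sample):
    """Map each gene to its rank fraction within one sample:
    0.0 for the highest expression value, approaching 1.0 for the lowest."""
    ordered = sorted(sample, key=sample.get, reverse=True)
    return {gene: i / len(ordered) for i, gene in enumerate(ordered)}

def consistently_top_ranked(replicates, top_fraction=0.2, min_replicate_fraction=0.8):
    """Return genes ranked within the top `top_fraction` of genes in at least
    `min_replicate_fraction` of the replicates (hypothetical helper).

    `replicates` is a list of dicts, one per replicate of a single cell type,
    each mapping gene name -> expression value.
    """
    needed = math.ceil(min_replicate_fraction * len(replicates))
    counts = {}
    for sample in replicates:
        for gene, frac in rank_fraction(sample).items():
            if frac < top_fraction:
                counts[gene] = counts.get(gene, 0) + 1
    return sorted(gene for gene, count in counts.items() if count >= needed)

# Toy data (made up): 5 genes, 5 replicates of one cell type.
# With 5 genes, the top 20% is a single gene per replicate.
replicates = [{"A": 10, "B": 5, "C": 3, "D": 2, "E": 1}] * 4
replicates.append({"A": 4, "B": 9, "C": 3, "D": 2, "E": 1})
print(consistently_top_ranked(replicates))
```

Note that this rule only produces a list of genes; it does not by itself yield a p-value or FDR, which is the open part of the question.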

Thank you very much.


data microarray • 2.1k views
modified 4.5 years ago by Biostar ♦♦ 20 • written 7.0 years ago by Jessica60

Up-regulated relative to what?

written 7.0 years ago by brentp22k

Maybe you can use ComBat from the sva package on Bioconductor. It can help merge data sets from different batches with different conditions, and it also contains functions for p-value calculation. The problem is that you might find it difficult to map the probe IDs to generate the required data structure.

written 4.5 years ago by Sam2.2k
7.0 years ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

Gene expression measurements on a microarray are not absolute (that is, a gene that has a high expression value may or may not have more RNA in the cell than another gene with a lower expression value), so ranking genes by their expression measures does not make much sense. Also, I would not be surprised if the top-ranked genes from your described method overlap considerably between cell types.

Without knowing your biological question, it is hard to tell you what to do, but I'd suggest that you look for cell-type-specific genes. For that, one can typically use hypothesis testing methods across samples; with multiple classes (cell types), this is often done using an F-statistic. The two-platform thing limits what can be done, but I think in the end across-sample, within-gene hypothesis testing is a more established way to go.
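As a rough illustration of across-sample, within-gene testing, the sketch below computes a one-way ANOVA F-statistic for a single gene across cell types and then adjusts a set of per-gene p-values with the Benjamini-Hochberg procedure to control the FDR. Several things here are assumptions made for the example: the function names are hypothetical, the p-value comes from a simple label-permutation test (swapped in so the example needs only the standard library, rather than looking up the F distribution), and the platform effect is ignored entirely, which a real analysis of Illumina plus Affymetrix data would have to address.

```python
import random
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F-statistic for one gene.
    `groups` is a list of lists: expression values per cell type."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_p_value(groups, n_perm=1000, seed=0):
    """Permutation p-value: shuffle cell-type labels and count how often
    the shuffled F-statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    sizes = [len(g) for g in groups]
    pooled = [v for g in groups for v in g]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy data (made up): one gene measured in three cell types, 3 replicates each.
gene = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(f_statistic(gene))
print(permutation_p_value(gene, n_perm=200))
```

In practice one would compute a p-value for every gene and pass the full list to the adjustment step; with ~20000 genes the permutation count would need to be much larger, or a parametric p-value used instead.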

written 7.0 years ago by Sean Davis25k

Hi Sean, is it possible to do a Wilcoxon signed-rank test for the expressed and non-expressed genes and apply a correction for multiple testing across cell lines?

written 7.0 years ago by Gjain5.2k

Hi Sean, I am trying to find the ligand and receptor genes expressed by each cell type, out of a ligand/receptor database. The selected ligand and receptor genes do not have to be cell type specific. Given this aim, do you have any further comments? Would you mind elaborating on across-sample, within-gene hypothesis testing? What sort of statistical methods should I look into? Thank you.

written 7.0 years ago by Jessica60

Hi Sean, I am wondering: for gene expression measurements, why might a gene with a high expression value not have more RNA?

written 7.0 years ago by Jessica60

For a given probe or probeset, characteristics like binding affinity, cross-hybridization with non-target molecules, potentially mRNA secondary structure, and many other factors may affect the operating characteristics of the probe. Unfortunately, those effects are not identical for all different probes on the array, making comparisons between probes problematic. See this figure, for example:

written 7.0 years ago by Sean Davis25k

Note that there are methods for determining which genes are expressed in a sample.

written 7.0 years ago by Sean Davis25k

Thanks a lot Sean.

written 7.0 years ago by Jessica60

