Question: How to handle several measurements of the same gene on one DNA microarray?
12 months ago, froessler wrote:

Hi everybody,

I'm quite new in bioinformatics and struggle with the following problem:

I want to detect significantly differentially expressed genes in a microarray data set downloaded from GEO. The Affymetrix microarray they used (Human Genome U133 Plus 2.0 Array) measures some genes several times (with different target oligonucleotides). I'm now wondering how to handle this data in the final statistical analysis. Do I have to:

1) keep all measurements until after the statistical analysis (t-test + multiple-comparison correction)? If so, how do I handle genes that are measured several times but are significantly differentially expressed in only one of those measurements? And if all measurements show a significant difference, which fold change do I choose for further analysis?


2) remove all measurements for one gene except one? How do I decide which one to keep and which ones to remove?

Thanks a lot in advance for your help on this matter. Also let me know if something needs further clarification.

Best, Fabienne

modified 12 months ago by Kevin Blighe • written 12 months ago by froessler

Hi Fabienne, I deal with a similar issue: multiple expression measurements for a single gene in microarray data from GEO. I usually handle it by grouping the rows by gene symbol, calculating the mean of each row, and keeping only the row with the highest mean for each gene.
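
A minimal sketch of that max-mean selection, written in plain Python rather than R so that it has no package dependencies. The function name, the probe IDs, the gene symbols, and the expression values are all hypothetical and only illustrate the idea; a real analysis would read the annotation from the platform file.

```python
from statistics import mean

def keep_max_mean_probe(expr, probe_to_gene):
    """Pick, for each gene, the probe with the highest mean expression.

    expr: dict mapping probe ID -> list of expression values (one per sample).
    probe_to_gene: dict mapping probe ID -> gene symbol.
    Returns a dict mapping gene symbol -> selected probe ID.
    """
    best = {}  # gene -> (probe ID, mean expression of that probe)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # assumption: probes without a gene symbol are dropped
        avg = mean(values)
        # Strict '>' means the first probe seen wins when means are tied.
        if gene not in best or avg > best[gene][1]:
            best[gene] = (probe, avg)
    return {gene: probe for gene, (probe, _) in best.items()}

# Hypothetical log2 expression values for three probes across three samples.
expr = {
    "probe_1": [7.0, 7.5, 8.0],
    "probe_2": [9.0, 9.5, 10.0],
    "probe_3": [5.0, 5.5, 6.0],
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

selected = keep_max_mean_probe(expr, probe_to_gene)
print(selected)
```

Note that this keeps the most highly expressed probe, which is not necessarily the most specific one, so the choice should be stated in the methods.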

modified 12 months ago • written 12 months ago by patelk26130

Hi, thanks a lot for your comment, I will consider it. :)

written 12 months ago by froessler
12 months ago, Kevin Blighe wrote:

Hey, you can summarise and/or filter the data as you please - there are no norms for this. All that you must ensure is that you report the methodology in full. You have not provided any example genes for which this problem exists, so I cannot elaborate much further. It is therefore your job to ensure that you know why these duplicates exist - I can assist, but only if you provide examples.

With data from the GEO, it is better (in my opinion) to download the raw data CEL files so that you have full control over the data processing. In line with this, please take a look at my answer, here: C: Human Exon array probeset to gene-level expression


written 12 months ago by Kevin Blighe

Dear Kevin,

Thanks a lot for your answer. As I'm currently working not only with microarrays from Affymetrix but also with ones from Agilent and Illumina, the reason why there are duplicates is always a bit different. I was just hoping that there was an overall norm for all the different platforms (that's why I only mentioned Affymetrix in my original post), but you're entirely right: I have to think about what the duplicates mean and summarise/filter them accordingly.

Thanks again.

Best, Fabienne

modified 12 months ago • written 12 months ago by froessler

I see. Yes, there are differences between the different vendors (Affymetrix, Illumina, and Agilent) and also within the same vendor, in terms of the design of the 'chip' / microarray. You just have to take it on a 'case by case' basis.

For Affymetrix microarrays, as I explain in the other thread ( C: Human Exon array probeset to gene-level expression ), summarisation can be controlled via the target parameter that is passed to rma() or gcrma(). For other arrays, summarisation can be achieved via limma::avereps - averaging the replicate measurements for each gene is common.
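
To make the idea of averaging replicates concrete, here is a small sketch in plain Python. It is not limma and does not reproduce avereps itself; it only shows the per-sample mean over all probes that map to the same gene. The function name, probe IDs, gene symbols, and values are hypothetical.

```python
from collections import defaultdict

def average_by_gene(expr, probe_to_gene):
    """Average replicate probes per gene, separately for each sample.

    expr: dict mapping probe ID -> list of expression values (one per sample).
    probe_to_gene: dict mapping probe ID -> gene symbol (every probe annotated).
    Returns a dict mapping gene symbol -> list of per-sample means.
    """
    rows_by_gene = defaultdict(list)
    for probe, values in expr.items():
        rows_by_gene[probe_to_gene[probe]].append(values)
    # zip(*rows) walks the samples (columns) across all probes of one gene.
    return {
        gene: [sum(column) / len(column) for column in zip(*rows)]
        for gene, rows in rows_by_gene.items()
    }

# Hypothetical log2 expression values for three probes across two samples.
expr = {
    "probe_1": [6.0, 8.0],
    "probe_2": [8.0, 10.0],
    "probe_3": [5.0, 5.0],
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

gene_level = average_by_gene(expr, probe_to_gene)
print(gene_level)
```

Averaging keeps one row per gene but can dilute a real signal if only one of the probes responds, which is another reason to check why the duplicates exist first.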

written 12 months ago by Kevin Blighe