How to handle several measurements of the same gene on one DNA microarray?
2.1 years ago
froessler • 0

Hi everybody,

I'm quite new to bioinformatics and am struggling with the following problem:

I want to detect significantly differentially expressed genes in a microarray data set downloaded from GEO. The Affymetrix microarray used (Human Genome U133 Plus 2.0 Array) measures some genes several times (with different target oligonucleotides). I'm now wondering how to handle this data in the final statistical analysis. Do I have to:

1) keep all measurements until after the statistical analysis (t-test + multiple comparison correction)? If so, how do I handle genes that are measured several times but are significantly differentially expressed in only one of those measurements? And if all measurements show a significant difference, which fold change do I choose for further analysis?

OR

2) remove all but one measurement for each gene? How do I decide which one to keep and which ones to remove?

Thanks a lot in advance for your help on this matter. Also let me know if something needs further clarification.

Best, Fabienne

DNA microarray genes Affymetrix analysis • 465 views

Hi Fabienne, I deal with a similar issue: multiple expression measurements for a single gene in microarray data from GEO. I usually handle it by grouping the rows by gene symbol, calculating the mean of each row across samples, and keeping only the row with the highest mean for each gene.
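A minimal sketch of that "keep the row with the maximum mean" rule, in plain Python. The probe IDs, gene symbols, and values are invented for illustration, and the function name `keep_max_mean_probe` is hypothetical; it is not taken from any package.

```python
from statistics import mean

def keep_max_mean_probe(expr, probe_to_gene):
    """For each gene, keep the probe whose mean across samples is highest.

    expr:          dict of probe ID -> list of expression values (one per sample)
    probe_to_gene: dict of probe ID -> gene symbol
    Returns a dict of gene symbol -> (probe ID, values).
    """
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # skip probes without a gene annotation
        row_mean = mean(values)
        # Replace the stored probe only if this one has a higher row mean.
        if gene not in best or row_mean > best[gene][0]:
            best[gene] = (row_mean, probe, values)
    return {gene: (probe, values) for gene, (_, probe, values) in best.items()}

# Toy data (invented): two probes for GENE1, one for GENE2.
expr = {
    "probe_A": [7.0, 7.2, 6.8],
    "probe_B": [9.1, 9.3, 8.9],
    "probe_C": [5.0, 5.5, 5.2],
}
probe_to_gene = {"probe_A": "GENE1", "probe_B": "GENE1", "probe_C": "GENE2"}

selected = keep_max_mean_probe(expr, probe_to_gene)
print(selected["GENE1"][0])
```

Because the rule looks only at overall intensity and not at the group labels, the choice of probe does not depend on which samples are cases or controls.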


Hi, thanks a lot for your comment, I will consider it. :)

2.1 years ago

Hey, you can summarise and / or filter the data as you please - there are no norms for this. All that you must ensure is that you report the methodology in full. You have not provided any example genes for which this problem exists, so I cannot elaborate much further. It is up to you to understand why these duplicates exist - I can assist, but only if you provide examples.

With data from the GEO, it is better (in my opinion) to download the raw data CEL files so that you have full control over the data processing. In line with this, please take a look at my answer, here: C: Human Exon array probeset to gene-level expression

Kevin


Dear Kevin,

Thanks a lot for your answer. As I'm currently working not only with microarrays from Affymetrix but also with ones from Agilent and Illumina, the reason for the duplicates is a bit different each time. I was just hoping that there is an overall norm for all the different platforms (that's why I only mentioned Affymetrix in my original post), but you're entirely right: I have to think about what the duplicates mean and summarise/filter them accordingly.

Thanks again.

Best, Fabienne


I see. Yes, there are differences between the different vendors (Affymetrix, Illumina, and Agilent) and also within the same vendor, in terms of the design of the 'chip' / microarray. You just have to take it on a 'case by case' basis.

For Affymetrix microarrays, as I explain in the other thread ( C: Human Exon array probeset to gene-level expression ), summarisation can be controlled via the target parameter that is passed to rma() or gcrma(). For other arrays, summarisation can be achieved via limma::avereps - averaging the probes that map to the same gene is common.
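To make the averaging idea concrete, here is a rough plain-Python sketch of "summarisation by mean": all probes annotated to the same gene are averaged sample by sample. It is only an illustration of the concept, not limma's implementation; the function name `average_by_gene` and the toy values are invented.

```python
from collections import defaultdict

def average_by_gene(expr, probe_to_gene):
    """Average, per sample, all probes that map to the same gene.

    expr:          dict of probe ID -> list of expression values (one per sample)
    probe_to_gene: dict of probe ID -> gene symbol
    Returns a dict of gene symbol -> list of per-sample means.
    """
    rows_by_gene = defaultdict(list)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # ignore unannotated probes
            rows_by_gene[gene].append(values)

    averaged = {}
    for gene, rows in rows_by_gene.items():
        # zip(*rows) walks the samples column by column.
        averaged[gene] = [sum(col) / len(col) for col in zip(*rows)]
    return averaged

# Toy data (invented): two probes for GENE1, one for GENE2, two samples.
expr = {
    "probe_A": [6.0, 8.0],
    "probe_B": [8.0, 10.0],
    "probe_C": [5.0, 5.5],
}
probe_to_gene = {"probe_A": "GENE1", "probe_B": "GENE1", "probe_C": "GENE2"}

gene_level = average_by_gene(expr, probe_to_gene)
print(gene_level["GENE1"])
```

Averaging assumes the replicate probes measure the same thing; if they target different transcripts of a gene, the mean can hide real differences, which is why it is worth checking why the duplicates exist first.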