Question: How to handle several measurements of the same gene on one DNA microarray?
froessler wrote (5 weeks ago):

Hi everybody,

I'm quite new to bioinformatics and am struggling with the following problem:

I want to detect significantly differentially expressed genes in a microarray data set downloaded from GEO. The Affymetrix microarray that was used (Human Genome U133 Plus 2.0 Array) measures some genes several times (with different target oligonucleotides). I'm now wondering how to handle this data in the final statistical analysis. Do I have to:

1) keep all measurements until after the statistical analysis (t-test + multiple comparison correction)? If so, how do I handle genes that are measured several times but are significantly differentially expressed in only one of those measurements? And if all measurements show a significant difference, which fold change do I choose for further analysis?

OR

2) remove all measurements for one gene except one? How do I decide which one to keep and which ones to remove?

Thanks a lot in advance for your help on this matter. Also let me know if something needs further clarification.

Best, Fabienne


Hi Fabienne, I deal with a similar issue: multiple expression measurements for a single gene in microarray data from GEO. I usually handle it by grouping the rows by gene symbol, calculating the mean of each row, and keeping only the row with the highest mean for each gene.
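
A minimal sketch of that idea in plain Python, in case it helps. The probe IDs, gene symbols, expression values and the function name `keep_max_mean_probe` are all made up for illustration; a real analysis would take them from the platform annotation and the normalised expression matrix.

```python
from statistics import mean

def keep_max_mean_probe(expr, probe_to_gene):
    """For each gene, keep the probe whose mean expression across samples is highest.

    expr: dict mapping probe ID -> list of expression values (one per sample)
    probe_to_gene: dict mapping probe ID -> gene symbol
    Returns a dict mapping gene symbol -> (probe ID, values).
    """
    best = {}
    for probe_id, values in expr.items():
        gene = probe_to_gene.get(probe_id)
        if gene is None:
            continue  # assumption: probes without a gene annotation are dropped
        row_mean = mean(values)
        # Replace the stored probe only if this one has a strictly higher mean.
        if gene not in best or row_mean > best[gene][0]:
            best[gene] = (row_mean, probe_id, values)
    return {gene: (probe_id, values) for gene, (_, probe_id, values) in best.items()}

# Hypothetical toy data: two probes for GENE1, one probe for GENE2.
expr = {
    "probe_a": [7.1, 7.3, 6.9, 7.0],
    "probe_b": [9.8, 10.1, 9.9, 10.2],
    "probe_c": [5.0, 5.2, 4.9, 5.1],
}
probe_to_gene = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}

selected = keep_max_mean_probe(expr, probe_to_gene)
```

Here `selected["GENE1"]` holds `probe_b`, because it has the higher mean of the two GENE1 probes. Note that this picks the most highly expressed probe, which is not necessarily the most specific one.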

modified 5 weeks ago • written 5 weeks ago by patelk26

Hi, thanks a lot for your comment, I will consider it. :)

written 4 weeks ago by froessler
Kevin Blighe wrote (5 weeks ago):

Hey, you can summarise and/or filter the data as you please; there are no norms for this. All that you must ensure is that you report the methodology in full. You have not provided any example genes for which this problem exists, so I cannot elaborate much further. It is therefore your job to ensure that you know why these duplicates exist. I can assist, but only if you provide examples.

With data from GEO, it is better (in my opinion) to download the raw CEL files so that you have full control over the data processing. In line with this, please take a look at my answer here: C: Human Exon array probeset to gene-level expression

Kevin


Dear Kevin,

Thanks a lot for your answer. As I'm currently working not only with microarrays from Affymetrix but also with ones from Agilent and Illumina, the reason for the duplicates is a bit different each time. I was just hoping that there is an overall norm for all the different platforms (that's why I only mentioned Affymetrix in my original post), but you're entirely right: I have to think about what the duplicates mean and summarise/filter them accordingly.

Thanks again.

Best, Fabienne

modified 4 weeks ago • written 4 weeks ago by froessler

I see. Yes, there are differences between the different vendors (Affymetrix, Illumina, and Agilent) and also within the same vendor, in terms of the design of the 'chip' / microarray. You just have to take it on a 'case by case' basis.

For Affymetrix microarrays, as I explain in the other thread ( C: Human Exon array probeset to gene-level expression ), summarisation can be controlled via the target parameter that is passed to rma() or gcrma(). For other arrays, summarisation can be achieved via limma::avereps - summarisation by mean across genes is common.
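
To make "summarisation by mean across genes" concrete, here is a rough plain-Python sketch of the idea. It is only analogous in spirit to what limma::avereps does in R and is not its implementation; the probe IDs, gene symbols, values and the function name `average_by_gene` are hypothetical.

```python
from collections import defaultdict

def average_by_gene(expr, probe_to_gene):
    """Average all probes that map to the same gene, sample by sample.

    expr: dict mapping probe ID -> list of expression values (one per sample)
    probe_to_gene: dict mapping probe ID -> gene symbol
    Returns a dict mapping gene symbol -> list of per-sample mean values.
    """
    groups = defaultdict(list)
    for probe_id, values in expr.items():
        gene = probe_to_gene.get(probe_id)
        if gene is not None:  # assumption: unannotated probes are skipped
            groups[gene].append(values)

    averaged = {}
    for gene, rows in groups.items():
        # zip(*rows) walks the samples column by column across the gene's probes.
        averaged[gene] = [sum(col) / len(col) for col in zip(*rows)]
    return averaged

# Hypothetical toy data: two probes for GENE1, one probe for GENE2, three samples.
expr = {
    "probe_a": [2.0, 4.0, 6.0],
    "probe_b": [4.0, 6.0, 8.0],
    "probe_c": [1.0, 1.0, 1.0],
}
probe_to_gene = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}

gene_level = average_by_gene(expr, probe_to_gene)
```

With this toy input, GENE1 becomes the per-sample mean of its two probes and GENE2 is left unchanged. Averaging is usually done on log-scale values, and it can dilute a real signal if one of the probes is non-specific or targets a different transcript.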

written 4 weeks ago by Kevin Blighe