Microarray: How To Select One Of Multiple Probes Corresponding To A Gene
12.1 years ago
Nasir ▴ 270

Hi All

I know similar questions have been asked before but, having read the answers, I am still unclear about the best solution to the following problem:

We have done a custom one-colour Agilent oligonucleotide microarray (with essentially genome-wide coverage) on 24 disease and 24 control human brain samples. In some cases, there are multiple probes which correspond to the same gene. How do I calculate the fold change for a gene mapped to by multiple probes? Here are some of the options I have come across:

  1. use the probe with the highest normalized intensity averaged over all samples
  2. use the probe with the highest absolute value of differential expression
  3. use the probe with the highest signal variation
  4. use the probe with maximum inter quartile expression range value (this method is implemented in Agilent's GeneSpring for the Gene Set Enrichment Analysis function)
  5. for each gene, select a single RefSeq entry, primarily the one annotated by TaqMan assays. If multiple probes match the same RefSeq entry, only the probe closest to the 3′ end is used (this method is adopted in this MicroArray Quality Control project paper)
  6. select the probe least likely to cross-hybridise, i.e., the probe with the least similarity to other areas of the genome based on a BLAT search using the UCSC Genome Browser
  7. take the median fold change of all probes
  8. select the probe with the lowest p-value

Which option would you use & why? (Apologies about the long question!)
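
To make two of the selection rules above concrete, here is a minimal sketch of option 1 (highest mean intensity) and option 4 (largest interquartile range). The probe names, gene names, intensity values, and the helper `pick_probe_per_gene` are all hypothetical, and the values are assumed to be already normalized and on a log2 scale. This is not how GeneSpring implements it, only an illustration of the idea.

```python
from statistics import mean, quantiles

# Hypothetical normalized log2 intensities: probe -> values across samples.
intensities = {
    "probe_A": [12.0, 12.1, 11.9, 12.0],  # bright but nearly constant
    "probe_B": [8.0, 10.0, 7.5, 10.5],    # dimmer but highly variable
    "probe_C": [6.0, 6.1, 5.9, 6.2],
}
# Hypothetical probe-to-gene annotation.
probe_to_gene = {"probe_A": "GENE1", "probe_B": "GENE1", "probe_C": "GENE2"}


def iqr(values):
    """Interquartile range (Q3 - Q1) of a list of values."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def pick_probe_per_gene(intensities, probe_to_gene, score):
    """Keep, for each gene, the probe with the highest score(values)."""
    best = {}
    for probe, values in intensities.items():
        gene = probe_to_gene[probe]
        s = score(values)
        if gene not in best or s > best[gene][1]:
            best[gene] = (probe, s)
    return {gene: probe for gene, (probe, _) in best.items()}


by_mean = pick_probe_per_gene(intensities, probe_to_gene, mean)  # option 1
by_iqr = pick_probe_per_gene(intensities, probe_to_gene, iqr)    # option 4
print(by_mean)
print(by_iqr)
```

In this toy data the two rules disagree for GENE1: the mean-intensity rule keeps the bright, flat probe, while the IQR rule keeps the variable one, which is exactly why the choice of rule matters.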

microarray agilent • 21k views
12.1 years ago

Before you do any of that, see if you can associate your probes with transcripts (ENST or otherwise) instead of genes. You might find that some of these changes are limited to one isoform, which averaging would mask.

12.1 years ago
Davy ▴ 410

This option is not on your list, but I would check whether the probes carry information about multiple transcripts from the same gene. I don't know of a way to do this programmatically, but if you have a list of top-hit probes, you could do it for those, then revisit even the non-significant probes from within the same gene, if there are any. If you are calculating the fold change, then you presumably want to show the biggest difference, since this is likely for a list of top hits or something similar, so I would go for option 2 or 8.


Dear Davy,

I understand that it has been a long time since your suggestion, but in my opinion option 8 might be considered cherry-picking from the data, since you keep only the probes with the lowest p-value. That might not reflect the underlying biology, especially in the case of drug treatments for a particular condition. Any thoughts?

12.1 years ago

From my own experience, and as Davy and Jeremy stated, if multiple probes targeting the same gene show different expression levels, this might be an indicator of alternative transcription and should be investigated. There might be many more such cases than you first thought.
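
As a rough way to find candidates for that investigation, one could flag genes whose probes disagree by more than some threshold. The sketch below assumes per-probe log2 fold changes are already computed; the probe names, values, the function `discordant_genes`, and the `max_spread` cutoff of 1.0 are all hypothetical choices, not an established criterion.

```python
from collections import defaultdict

# Hypothetical per-probe log2 fold changes (disease vs control).
probe_log2fc = {
    "probe_A": 1.2,
    "probe_B": -0.9,
    "probe_C": 0.4,
    "probe_D": 0.5,
}
# Hypothetical probe-to-gene annotation.
probe_to_gene = {
    "probe_A": "GENE1",
    "probe_B": "GENE1",
    "probe_C": "GENE2",
    "probe_D": "GENE2",
}


def discordant_genes(probe_log2fc, probe_to_gene, max_spread=1.0):
    """Return genes whose probes differ in log2 fold change by more than max_spread."""
    by_gene = defaultdict(list)
    for probe, lfc in probe_log2fc.items():
        by_gene[probe_to_gene[probe]].append(lfc)
    return sorted(
        gene for gene, lfcs in by_gene.items()
        if max(lfcs) - min(lfcs) > max_spread
    )


flagged = discordant_genes(probe_log2fc, probe_to_gene)
print(flagged)
```

Flagged genes are only candidates: a large spread can also come from a poorly performing or cross-hybridising probe, so the probe-to-transcript mapping still needs checking by hand.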

For probes targeting the same transcript and showing similar expression levels, I would take the median (or mean) fold change; there is no need to get too fancy here.
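
A minimal sketch of the median approach, assuming normalized log2 intensities so that a log2 fold change is simply the difference of group means. The sample values, probe names, and the function `median_log2fc_per_gene` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical normalized log2 intensities per probe, split by group.
disease = {
    "probe_A": [9.0, 9.2, 8.8],
    "probe_B": [9.5, 9.7, 9.3],
    "probe_C": [7.0, 7.1, 6.9],
}
control = {
    "probe_A": [8.0, 8.2, 7.8],
    "probe_B": [8.0, 8.1, 7.9],
    "probe_C": [7.0, 7.0, 7.0],
}
# Hypothetical probe-to-gene annotation.
probe_to_gene = {"probe_A": "GENE1", "probe_B": "GENE1", "probe_C": "GENE2"}


def median_log2fc_per_gene(disease, control, probe_to_gene):
    """Median across probes of (mean disease - mean control) on the log2 scale."""
    by_gene = defaultdict(list)
    for probe, gene in probe_to_gene.items():
        lfc = mean(disease[probe]) - mean(control[probe])
        by_gene[gene].append(lfc)
    return {gene: median(lfcs) for gene, lfcs in by_gene.items()}


gene_log2fc = median_log2fc_per_gene(disease, control, probe_to_gene)
print(gene_log2fc)
```

With only two probes per gene the median and the mean coincide; the median only starts to protect against a single outlying probe when there are three or more.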

But do check for alternative transcripts first.

4.9 years ago
asalimih ▴ 60

Although this question is old, I found this answer from one of the Reddit bioinformatics forums very useful: link

