How to map probeset-associated statistics to gene statistics in microarray differential expression analysis?
3.7 years ago
moxu

For microarray data, differential expression analysis is done for each probeset. The problem is that one gene is typically mapped to multiple probesets. Since, for most if not all practical purposes, we are interested in differential expression at the gene level rather than the probeset level, I am wondering what the best way is to map probeset-level results to gene-level results. For instance, a gene may have multiple probesets, each with its own p-value, fold change, etc. When we map the probesets to the corresponding gene, should we take the probeset with the smallest p-value and use its statistics for the gene? Or the median p-value? The mean? ...?

Thanks in advance!

gene R RNA-Seq
3.7 years ago

NB - added July 31, 2020: see also Human Exon array probeset to gene-level expression


For microarray analysis, the RMA normalisation function rma() has one key parameter that relates to your question: target

summarise probe-level expression to genes (or exons)

library(oligo)  # rma() with the 'target' argument comes from the oligo package
rma(MyCELfiles, background=TRUE, normalize=TRUE, target="core")

What this does depends on the array type. If you have a 'Gene' array, expression is summarised to genes; if you have an 'Exon' array, it will be summarised to exons.

summarise at probe-set level

rma(MyCELfiles, background=TRUE, normalize=TRUE, target="probeset")


Two further options are available for 'Exon' arrays:

  • target = "full"
  • target = "extended"

If you still cannot obtain the correct level of summarisation with these, then just summarise by mean via the aggregate() function.
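As a sketch of that last step, the snippet below averages the probesets of each gene with aggregate(). The expression values, probeset IDs and gene symbols are made up for illustration, and the probeset-to-gene mapping (here a hypothetical named vector called symbol) would in practice come from the array's annotation; that source is an assumption, not something shown above.

```r
# Hypothetical toy data: 4 probesets x 3 samples of log2 expression
expr <- matrix(c(7.1, 7.3, 6.9,
                 8.0, 8.2, 7.8,
                 5.5, 5.9, 5.7,
                 9.1, 9.0, 9.3),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("1000_at", "1001_at", "1002_at", "1003_at"),
                               c("S1", "S2", "S3")))

# Hypothetical probeset-to-gene mapping; 1003_at has no annotated gene
symbol <- c("1000_at" = "GENEA", "1001_at" = "GENEA",
            "1002_at" = "GENEB", "1003_at" = NA)

# Drop probesets without a gene symbol, then average the remaining
# probesets of each gene, sample by sample
genes <- symbol[rownames(expr)]
keep <- !is.na(genes)
gene_expr <- aggregate(expr[keep, , drop = FALSE],
                       by = list(gene = genes[keep]),
                       FUN = mean)
gene_expr
```

The result has one row per gene and one column per sample. limma's avereps(expr, ID = genes) does the same averaging and returns a matrix instead of a data frame, if limma is already part of the workflow.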



I used GEO2R, provided by the GEO database, and it generates p-values, logFC, etc.
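Since GEO2R already reports per-probeset statistics, one common way to collapse them (the first option raised in the question) is to keep, for each gene, the probeset with the smallest p-value. A minimal base-R sketch, assuming the GEO2R table has been read into a data frame with columns named ID, P.Value, adj.P.Val, logFC and Gene.symbol; the rows below are invented for illustration.

```r
# Hypothetical excerpt of a GEO2R results table (values are made up;
# column names assumed to match GEO2R's export)
tt <- data.frame(
  ID          = c("201_at", "202_at", "203_at", "204_at", "205_at"),
  P.Value     = c(0.002, 0.040, 0.300, 0.0005, 0.010),
  adj.P.Val   = c(0.020, 0.150, 0.600, 0.0100, 0.080),
  logFC       = c(1.2, 0.6, -0.1, -2.0, 0.9),
  Gene.symbol = c("GENEA", "GENEA", "GENEB", "GENEB", ""),
  stringsAsFactors = FALSE
)

# Drop probesets that have no gene symbol
tt <- tt[!is.na(tt$Gene.symbol) & tt$Gene.symbol != "", ]

# Sort by p-value so each gene's most significant probeset comes first,
# then keep only the first row seen for each gene symbol
tt <- tt[order(tt$P.Value), ]
gene_tt <- tt[!duplicated(tt$Gene.symbol), ]
gene_tt
```

Keep in mind that the adjusted p-values were computed across probesets, and picking the best probeset per gene tends to overstate significance for genes with many probesets. Probesets annotated to several genes (GEO annotations often separate these with `///`) would need to be split or dropped before this step.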


