How to map probeset associated statistics to gene statistics in microarray differential expression analysis?
3.7 years ago
moxu ▴ 500

For microarray data, differential expression analysis is done per probeset. The problem is that one gene typically maps to multiple probesets. Since, for most if not all practical purposes, we are interested in differential expression at the gene level rather than the probeset level, I am wondering what the best way is to map probeset-level results to gene-level results. For instance, a gene may have multiple probesets, each with its own p-value, fold change, etc. When we map the probesets to the corresponding gene, should we take the probeset with the smallest p-value and use its statistics for the gene? Or the median p-value? The mean? Something else?
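
To make the first option concrete, here is a minimal R sketch. It assumes a results table with hypothetical columns probeset, gene, logFC and p_value (toy values, not real data) and keeps, for each gene, the probeset with the smallest p-value. It only illustrates one of the options listed above and is not a recommendation.

```r
# Toy probeset-level results; all names and numbers are made up for illustration.
res <- data.frame(
  probeset = c("ps1", "ps2", "ps3", "ps4", "ps5"),
  gene     = c("GeneA", "GeneA", "GeneB", "GeneB", "GeneB"),
  logFC    = c(1.2, 0.8, -0.5, -1.1, -0.9),
  p_value  = c(0.001, 0.04, 0.20, 0.003, 0.01),
  stringsAsFactors = FALSE
)

# Sort by p-value so the most significant probeset of each gene comes first.
res_sorted <- res[order(res$p_value), ]

# Keep only the first row seen for each gene (its smallest p-value probeset).
per_gene <- res_sorted[!duplicated(res_sorted$gene), ]
```

Because the probeset is chosen on the basis of its p-value, the resulting gene-level p-values are selected values and should be interpreted with that in mind.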

gene R RNA-Seq • 1.4k views
3.7 years ago


For microarray analysis, one key parameter of the rma() function used for RMA normalisation relates to your question: target

# rma() with a 'target' argument is provided by the oligo package
library(oligo)

# summarise probe-level expression to genes (or exons)
rma(MyCELfiles, background=TRUE, normalize=TRUE, target="core")


What this does depends on the array type. If you have a 'Gene' array, expression is summarised to genes. If you have an 'Exon' array, it is summarised to exons.

# summarise at probe-set level

rma(MyCELfiles, background=TRUE, normalize=TRUE, target="probeset")



Two further options are available for 'Exon' arrays:

• target="full"
• target="extended"

If you still cannot obtain the correct level of summarisation with these, then just average the probesets that map to each gene with the aggregate() function.
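
As a rough illustration of that fallback, here is a minimal sketch in base R. It assumes a probeset-by-sample expression matrix and a probeset-to-gene annotation vector in the same row order. The object names (expr, gene, gene_expr) and all values are hypothetical toy data, not output from a real array.

```r
# Toy probeset-level expression matrix (rows = probesets, columns = samples).
# All identifiers and values are made up for illustration.
expr <- matrix(
  c(2, 4, 6,
    4, 6, 8,
    1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ps1", "ps2", "ps3"), c("s1", "s2", "s3"))
)

# Hypothetical probeset-to-gene annotation, in the same row order as expr.
gene <- c("GeneA", "GeneA", "GeneB")

# Average all probesets that map to the same gene, sample by sample.
gene_expr <- aggregate(as.data.frame(expr), by = list(gene = gene), FUN = mean)
```

The result has one row per gene and one column per sample. Swapping FUN = mean for FUN = median would give a median summary instead.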

Kevin


I used GEO2R, the tool offered by the GEO database, and it generates p-values, logFC, etc.

Thanks!