Question: Human Exon array probeset to gene-level expression
4 months ago, mforde84 wrote:


I'm working with some Affy HuEx arrays.

I'm able to do all of the necessary background correction, normalization, and summarization steps for the arrays. However, I'm a little stuck on what's a reasonable way to collapse transcript-level probesets to gene-level estimates. Should I do something as simple as aggregating by mean or median? Or is there a more robust way to estimate the overall abundance of a gene from these arrays?


Tags: microarray, affy

So anyone have any ideas?

written 4 months ago by mforde84

Hey Marty,

I've analysed data from this chip recently (2016). Are you using the limma and oligo packages in R in order to process these?

When you get to the rma() function, you can specify there how you want to summarise the expression values:

Summarised per gene (transcript-cluster level):

rma(project, background=TRUE, normalize=TRUE, target="core")

Summarised per exon (probeset level):

rma(project, background=TRUE, normalize=TRUE, target="probeset")
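For context, the two calls above might sit in a workflow like the sketch below. It assumes the CEL files are in a directory passed as `cel_dir` and that the matching platform design package (for HuEx 1.0 ST, probably pd.huex.1.0.st.v2) is installed. `summarise_huex` is a hypothetical helper name, not part of oligo.

```r
# Hypothetical helper (not part of oligo): read HuEx CEL files and run RMA
# at the chosen summarisation level. Assumes 'oligo' and the matching pd.*
# platform design package are installed.
summarise_huex <- function(cel_dir, level = c("core", "probeset")) {
  level <- match.arg(level)
  if (!requireNamespace("oligo", quietly = TRUE)) {
    stop("the Bioconductor package 'oligo' is required")
  }
  # Collect the CEL files with base R (case-insensitive extension match).
  cel_files <- list.files(cel_dir, pattern = "\\.CEL$",
                          ignore.case = TRUE, full.names = TRUE)
  raw <- oligo::read.celfiles(cel_files)
  # target = "core"     -> one value per transcript cluster (gene level)
  # target = "probeset" -> one value per probeset (exon level)
  eset <- oligo::rma(raw, background = TRUE, normalize = TRUE, target = level)
  Biobase::exprs(eset)  # matrix: rows = features, columns = arrays
}
```

Calling `summarise_huex("path/to/cels", "core")` would then return a matrix with one row per transcript cluster; the path here is only a placeholder.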

Does that help?


written 4 months ago by Kevin Blighe

Thanks, Kevin. What would you suggest for collapsing multimapped probesets? Aggregate by mean or median, or something different?

written 4 months ago by mforde84

I may have missed exactly what you mean by 'multimapped probesets'. During the RMA summarisation step, Tukey's median polish is applied to the probes within each probeset (or within each transcript cluster when target="core"), so each of those already gets a single value per array. Are you just referring to summarising transcript isoforms into a single expression value for each gene? In the past, I have always used the mean in this case, and results have been as expected.
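To make the median polish step concrete, here is a minimal sketch using base R's `medpolish()` on made-up log2 intensities for a single probeset. It only illustrates the idea: in RMA the polish is applied to background-corrected, quantile-normalised log2 intensities, and the probe and array names below are invented.

```r
# Toy log2 intensities: 4 probes (rows) from one probeset, 3 arrays (columns).
# All values are made up for illustration.
probes <- matrix(
  c(7.1, 7.4,  8.0,
    6.8, 7.0,  7.9,
    9.5, 9.9, 10.4,
    7.0, 7.3,  8.1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("probe", 1:4), paste0("array", 1:3))
)

# Tukey's median polish decomposes the matrix into
# overall + probe (row) effect + array (column) effect + residual.
fit <- medpolish(probes, trace.iter = FALSE)

# One summarised value per array: overall level plus that array's effect.
probeset_expr <- fit$overall + fit$col
print(round(probeset_expr, 2))
```

Because the fit uses medians rather than means, a single badly behaved probe (such as the brighter third probe here) has limited influence on the per-array summary.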

written 4 months ago by Kevin Blighe

My mistake, but yes, just summarizing transcripts to gene level. I checked the results for both the core and probeset summarization options. While I'm using SCAN.UPC instead, both appear to generate estimates for transcripts. So to convert to Entrez IDs, we simply select using the matching key types in the transcriptcluster.db, then aggregate by Entrez ID to get gene-level estimates.
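The select-then-aggregate step described above could look roughly like this. It is a sketch on toy data: the transcript cluster IDs, Entrez IDs, and expression values are invented, and the annotation call in the comment (package huex10sttranscriptcluster.db, key type "PROBEID") is an assumption to check against your own annotation package.

```r
# Toy transcript-cluster expression matrix (log2 scale); IDs and values are made up.
expr <- matrix(
  c(8.0, 8.2,
    7.0, 7.4,
    5.5, 5.1,
    6.0, 6.3),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("tc1", "tc2", "tc3", "tc4"), c("sampleA", "sampleB"))
)

# Toy mapping table. With real data this might come from something like
#   AnnotationDbi::select(huex10sttranscriptcluster.db, keys = rownames(expr),
#                         columns = "ENTREZID", keytype = "PROBEID")
# (assumed package and key type; check keytypes() for your annotation package).
map <- data.frame(
  PROBEID  = c("tc1", "tc2", "tc3", "tc4"),
  ENTREZID = c("101", "101", "202", NA),
  stringsAsFactors = FALSE
)

# Drop clusters without an Entrez ID, then align the matrix rows to the mapping.
map <- map[!is.na(map$ENTREZID), ]
expr_mapped <- expr[map$PROBEID, , drop = FALSE]

# Collapse to one row per gene; swap `mean` for `median` if preferred.
gene_expr <- aggregate(expr_mapped, by = list(ENTREZID = map$ENTREZID), FUN = mean)
print(gene_expr)
```

A transcript cluster that maps to several Entrez IDs would simply contribute to each of those genes here; whether to keep or discard such clusters is a separate decision that this sketch does not make.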

thanks for the help.

written 4 months ago by mforde84