Question: Human Exon array probeset to gene-level expression
16 days ago, mforde84 (900) wrote:


I'm working with some Affy HuEx arrays.

I'm able to do all of the necessary background correction, normalization, and summarization steps for the arrays; however, I'm a little stuck on what's a reasonable way to collapse transcript-level probesets to gene-level estimates. Should I do something as simple as aggregating by mean or median? Or is there a more robust way to estimate the overall abundance of a gene from these arrays?


microarray affy • 136 views
modified 15 days ago by Kevin Blighe (1.2k) • written 16 days ago by mforde84 (900)

So anyone have any ideas?

written 15 days ago by mforde84 (900)

Hey Marty,

I've analysed data from this chip recently (2016). Are you using the limma and oligo packages in R in order to process these?

When you get to the rma() function, you can specify there how you want to summarise the expression values:

Summarised per gene:

rma(project, background=TRUE, normalize=TRUE, target="core")

Summarised per exon:

rma(project, background=TRUE, normalize=TRUE, target="probeset")
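For context, here is a minimal sketch of how an object like `project` might be built and passed to rma(). It assumes the Bioconductor packages oligo and Biobase are installed; the function name `summarise_huex` and the `cel_dir` argument are hypothetical and not part of the original answer.

```r
# Sketch only: assumes oligo and Biobase are installed and that `cel_dir`
# (a hypothetical path) contains the HuEx CEL files.
summarise_huex <- function(cel_dir, target = c("core", "probeset")) {
  target <- match.arg(target)

  # Find CEL files with base R; stop early if the folder has none.
  cel_files <- list.files(cel_dir, pattern = "\\.CEL$",
                          ignore.case = TRUE, full.names = TRUE)
  stopifnot(length(cel_files) > 0)

  # Read the raw arrays, then background-correct, normalise and summarise.
  project <- oligo::read.celfiles(cel_files)
  eset <- oligo::rma(project, background = TRUE, normalize = TRUE,
                     target = target)

  # Return the matrix of summarised values (rows = IDs, columns = arrays).
  Biobase::exprs(eset)
}
```

Calling `summarise_huex("path/to/cels", target = "core")` would then return one row per transcript cluster, and `target = "probeset"` one row per probeset.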

Does that help?


written 15 days ago by Kevin Blighe (1.2k)

Thanks, Kevin. What would you suggest for collapsing multimapped probesets? Aggregate by mean / median, or something different?

written 15 days ago by mforde84 (900)

I may have missed exactly what you mean by 'multimapped probesets'. During RMA, Tukey's 'median polish' is applied at the summarisation step to combine the probes in each probeset (or in each transcript cluster, with target="core") into a single value. Are you just referring to summarising transcript isoforms into a single expression value for each gene? In the past, I have always used the mean in this case and results have been as expected.
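As a rough illustration of the mean versus median choice, here is a minimal base R sketch that collapses transcript-level rows to one row per gene. The matrix values, the `gene` labels and the helper `collapse_rows` are all made up for the example and are not from the thread.

```r
# Toy data: 5 transcript-level rows across 3 arrays (values are made up).
expr <- matrix(
  c(8.0, 8.2, 7.9,
    8.4, 8.6, 8.1,
    5.0, 5.1, 4.9,
    5.4, 5.3, 5.2,
    9.6, 9.9, 9.0),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("ts", 1:5), paste0("array", 1:3))
)

# Hypothetical gene label for each row, in the same order as rownames(expr).
gene <- c("GENE_A", "GENE_A", "GENE_B", "GENE_B", "GENE_B")

# Collapse rows that share a label using `fun` (mean by default).
collapse_rows <- function(expr, group, fun = mean) {
  out <- aggregate(as.data.frame(expr), by = list(gene = group), FUN = fun)
  mat <- as.matrix(out[, -1, drop = FALSE])
  rownames(mat) <- out$gene
  mat
}

gene_mean   <- collapse_rows(expr, gene, mean)
gene_median <- collapse_rows(expr, gene, median)
```

In this toy case GENE_B has one row that sits well above the other two, so the mean is pulled towards it while the median is not. Which behaviour is preferable depends on whether such rows are treated as signal or as outliers.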

written 15 days ago by Kevin Blighe (1.2k)

My mistake, but yes, just summarizing transcripts to gene level. I checked the results for both the core and probeset summarization options. Although I'm using SCAN.UPC instead, both appear to generate estimates for transcripts. So to convert to Entrez ID, we simply select using the matching key types in the transcriptcluster.db, then aggregate by Entrez ID to get gene-level estimates.

Thanks for the help.
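The select-then-aggregate step described above could look roughly like the base R sketch below. The `map` data frame is a hand-made stand-in for an annotation lookup. In practice it might come from something like AnnotationDbi::select() on the array's transcriptcluster.db package with keytype "PROBEID" and column "ENTREZID", but that package name and those key types are assumptions here. Dropping IDs that map to more than one gene is just one possible policy, and the helper `collapse_to_entrez` is hypothetical.

```r
# Toy expression matrix: transcript cluster IDs x arrays (made-up values).
expr <- matrix(
  c(7.0, 7.2,
    7.4, 7.6,
    6.0, 6.1,
    5.0, 5.5,
    4.0, 4.2),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("tc", 1:5), c("array1", "array2"))
)

# Stand-in for an annotation lookup (assumed column names PROBEID/ENTREZID).
# tc3 maps to two genes and tc5 has no Entrez ID.
map <- data.frame(
  PROBEID  = c("tc1", "tc2", "tc3", "tc3", "tc4", "tc5"),
  ENTREZID = c("101", "101", "202", "303", "404", NA),
  stringsAsFactors = FALSE
)

collapse_to_entrez <- function(expr, map, fun = mean) {
  # Drop unmapped IDs and duplicated mapping rows.
  map <- unique(map[!is.na(map$ENTREZID), c("PROBEID", "ENTREZID")])

  # Keep only IDs that map to exactly one gene and are present in expr.
  n_genes <- table(map$PROBEID)
  keep_ids <- names(n_genes)[n_genes == 1]
  map <- map[map$PROBEID %in% keep_ids & map$PROBEID %in% rownames(expr), ]

  # Aggregate the remaining rows by Entrez ID.
  out <- aggregate(as.data.frame(expr[map$PROBEID, , drop = FALSE]),
                   by = list(ENTREZID = map$ENTREZID), FUN = fun)
  mat <- as.matrix(out[, -1, drop = FALSE])
  rownames(mat) <- out$ENTREZID
  mat
}

gene_expr <- collapse_to_entrez(expr, map)
```

Passing `fun = median` switches the aggregation without changing the filtering.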

written 15 days ago by mforde84 (900)