Question: Human Exon array probeset to gene-level expression
10 weeks ago by mforde84:


I'm working with some Affy HuEx arrays.

I'm able to do all of the necessary background correction, normalization, and summarization steps for the arrays; however, I'm a little stuck on a reasonable way to collapse transcript-level probesets to gene-level estimates. Should I do something as simple as aggregating by mean or median? Or is there a more robust way to estimate the overall abundance of a gene from these arrays?


Tags: microarray, affy

So, does anyone have any ideas?

written 10 weeks ago by mforde84

Hey Marty,

I've analysed data from this chip recently (2016). Are you using the limma and oligo packages in R in order to process these?

When you get to the rma() function, you can specify there how you want to summarise the expression values:

Summarised per gene (transcript cluster level):

rma(project, background=TRUE, normalize=TRUE, target="core")

Summarised per exon (probeset level):

rma(project, background=TRUE, normalize=TRUE, target="probeset")

Does that help?


written 9 weeks ago by Kevin Blighe

Thanks, Kevin. What would you suggest for collapsing multimapped probesets? Aggregating by mean or median, or something different?

written 9 weeks ago by mforde84

I may have missed exactly what you mean by 'multimapped probesets'. During RMA, Tukey's median polish is applied at the summarisation step, combining the probe-level values for each probeset (or transcript cluster) across arrays. Are you just referring to summarising transcript isoforms into a single expression value for each gene? In the past, I have always used the mean in this case, and the results have been as expected.
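
To make the median polish step a bit more concrete, here is a minimal sketch in base R. It assumes a made-up matrix of log2 probe intensities for a single probeset (rows are probes, columns are arrays) and uses stats::medpolish() rather than oligo's internal code, so treat it as an illustration of the idea and not as what rma() literally runs.

```r
# Toy log2 intensities for ONE probeset: 4 probes (rows) x 3 arrays (columns).
# All numbers are invented for illustration.
probes <- matrix(
  c(7.1, 7.4, 8.0,
    6.8, 7.0, 7.7,
    9.5, 7.3, 7.9,   # 9.5 is a deliberately outlying probe on array1
    7.0, 7.2, 7.8),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("probe", 1:4), paste0("array", 1:3))
)

# Tukey's median polish: fits overall + probe effect + array effect + residual
fit <- medpolish(probes, trace.iter = FALSE)

# One summarised value per array: overall level plus that array's effect
polished <- fit$overall + fit$col

# Plain column means, for comparison
naive <- colMeans(probes)

print(round(rbind(polished, naive), 2))
```

The outlying probe pulls the plain mean for array1 upwards, while the median polish estimate is far less affected. That robustness is the reason medians are used at the probe level.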

written 9 weeks ago by Kevin Blighe

My mistake, but yes, I just mean summarizing transcripts to gene level. I checked the results for both the core and probeset summarization options. Although I'm using SCAN.UPC instead, both appear to generate estimates for transcripts. So, to convert to Entrez IDs, we simply select using the matching key types in the transcriptcluster.db annotation package, then aggregate by Entrez ID to get gene-level estimates.
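
A rough sketch of that select-then-aggregate step, in base R only. The expression matrix and the mapping table here are hypothetical toy data. In practice the mapping would presumably come from AnnotationDbi::select() on the transcript cluster annotation package, and the PROBEID / ENTREZID column names are an assumption about what that call returns. Dropping transcript clusters that map to more than one gene is just one possible policy, not the only reasonable one.

```r
# Hypothetical expression matrix: transcript clusters (rows) x samples (columns)
expr <- matrix(
  c(5.0, 6.0,
    7.0, 8.0,
    4.0, 4.5,
    9.0, 9.5,
    3.0, 3.5),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("tc", 1:5), c("s1", "s2"))
)

# Hypothetical mapping table (column names assumed, see note above)
map <- data.frame(
  PROBEID  = c("tc1", "tc2", "tc3", "tc4", "tc4", "tc5"),
  ENTREZID = c("100", "100", "200", "300", "400", NA),
  stringsAsFactors = FALSE
)

# Drop unannotated rows, then drop transcript clusters hitting more than one gene
map <- map[!is.na(map$ENTREZID), ]
hits <- table(map$PROBEID)
map <- map[map$PROBEID %in% names(hits)[hits == 1], ]

# Keep only IDs present in the matrix, and put rows in the same order as the map
map <- map[map$PROBEID %in% rownames(expr), ]
sub <- expr[map$PROBEID, , drop = FALSE]

# Collapse to one row per Entrez ID; swap FUN = median for a more robust summary
gene_mean <- aggregate(as.data.frame(sub),
                       by = list(ENTREZID = map$ENTREZID), FUN = mean)
gene_median <- aggregate(as.data.frame(sub),
                         by = list(ENTREZID = map$ENTREZID), FUN = median)

print(gene_mean)
```

With only one or two transcript clusters per gene, the mean and median give the same answer. They start to differ once a gene has three or more clusters and one of them behaves differently from the rest.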

thanks for the help.

written 9 weeks ago by mforde84

