Question: Human Exon array probeset to gene-level expression
8 months ago, mforde84 wrote:

Hi,

I'm working with some Affy HuEx arrays.

I'm able to do all of the necessary background correction, normalization, and summarization steps for the arrays; however, I'm a little stuck on a reasonable way to collapse transcript-level probesets to gene-level estimates. Should I do something as simple as aggregating by mean / median? Or is there a more robust way to estimate the overall abundance of a gene from these arrays?

Marty

Tags: microarray, affy

So anyone have any ideas?

written 8 months ago by mforde84

Hey Marty,

I've analysed data from this chip recently (2016). Are you using the limma and oligo packages in R in order to process these?

When you get to the rma() function, you can specify there how you want to summarise the expression values:

Summarised per gene:

rma(project, background=TRUE, normalize=TRUE, target="core")

Summarised per exon:

rma(project, background=TRUE, normalize=TRUE, target="probeset")
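For readers who want to see the two calls in context, here is a minimal sketch of how they might sit in an oligo workflow. It assumes the oligo package and the platform design package for this chip (pd.huex.1.0.st.v2) are installed; the function name `summarise_huex` and the `cel_dir` argument are made up for illustration and are not part of oligo.

```r
# Sketch only: summarise_huex and cel_dir are hypothetical names.
# Assumes oligo (and the matching pd.huex.1.0.st.v2 design package) is installed.
summarise_huex <- function(cel_dir) {
  stopifnot(requireNamespace("oligo", quietly = TRUE))

  # Collect the CEL files from the (hypothetical) directory
  cel_files <- list.files(cel_dir, pattern = "\\.CEL$",
                          ignore.case = TRUE, full.names = TRUE)
  raw <- oligo::read.celfiles(cel_files)

  # target = "core": one row per transcript cluster (gene-level)
  gene_level <- oligo::rma(raw, background = TRUE, normalize = TRUE,
                           target = "core")
  # target = "probeset": one row per probeset (roughly exon-level)
  exon_level <- oligo::rma(raw, background = TRUE, normalize = TRUE,
                           target = "probeset")

  # Comparing row counts is a quick check that the two targets really differ
  list(gene = gene_level,
       exon = exon_level,
       n_rows = c(gene = nrow(Biobase::exprs(gene_level)),
                  exon = nrow(Biobase::exprs(exon_level))))
}
```

Calling `summarise_huex("path/to/cel/files")` should return both ExpressionSets along with their row counts; the probeset-level result is expected to have many more rows than the core one.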

Does that help?

Kevin

written 8 months ago by Kevin Blighe

This is of great help. If target isn't provided, rma() by default normalizes and summarizes at the gene level. ?rma doesn't provide any information on this either. Thanks.

written 7 weeks ago by Bioinformatics_NewComer

Yes, this is one of the fundamental and critical things that is, surprisingly, not mentioned in many of the tutorials! I cannot remember how I figured it out, but it was a long time ago.

modified 7 weeks ago • written 7 weeks ago by Kevin Blighe

Yes, I ran the analysis once with target set to "core" and once without specifying target, assuming that without it RMA would summarize at the probeset level. I looked at the row counts for both results and they were identical. I wasted many hours making sure I had done everything correctly before stumbling onto this thread. :)

written 7 weeks ago by Bioinformatics_NewComer

Thanks, Kevin. What would you suggest for collapsing multimapped probesets? Aggregate by mean / median, or something different?

written 8 months ago by mforde84

I may have missed exactly what you mean by 'multimapped probesets'. During the summarisation step of RMA, Tukey's median polish is applied across the probes that belong to each summarisation unit (a probeset or a transcript cluster, depending on target). Are you just referring to summarising transcript isoforms into a single expression value for each gene? In the past, I have always used the mean in this case and results have been as expected.
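To make the median polish idea concrete, here is a small sketch using `medpolish()` from base R's stats package on a made-up probes-by-samples matrix. This is an illustration of the technique on toy numbers, not the internal code RMA runs; the per-sample summary is taken as the overall effect plus the column (sample) effect, and is compared with a plain column mean to show how the two react to one outlying probe.

```r
# Toy log2 intensities: 4 probes (rows) x 3 samples (columns); values are made up
probes <- matrix(c(8.1,  8.4, 7.9,
                   9.0,  9.2, 8.8,
                   7.5,  7.9, 7.4,
                   8.6, 12.0, 8.3),   # probe4 is an outlier in sample2
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("probe", 1:4),
                                 paste0("sample", 1:3)))

# Tukey's median polish: decomposes the matrix into
# overall + row (probe) effect + column (sample) effect + residual
fit <- medpolish(probes, trace.iter = FALSE)

# Per-sample summary in the median polish sense
summary_mp <- fit$overall + fit$col

# Plain mean across probes, for comparison
summary_mean <- colMeans(probes)

print(rbind(median_polish = summary_mp, mean = summary_mean))
```

In this toy case the outlying probe pulls the mean for sample2 upwards, while the median polish summary stays close to the other probes.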

written 8 months ago by Kevin Blighe

My mistake, but yes, just summarizing transcripts to gene level. I checked the results for both the core and probeset summarization options. Although I'm using SCAN.UPC instead, both appear to generate estimates for transcripts. So to convert to Entrez ID, we simply select using the matching key types in the transcriptcluster.db, then aggregate by Entrez ID to get gene-level estimates.

thanks for the help.
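The map-then-aggregate approach described above could look roughly like the sketch below, written in base R on made-up data. The mapping table is hypothetical; in practice it might come from a call along the lines of `AnnotationDbi::select()` on the transcript cluster annotation package with `keytype = "PROBEID"` and `columns = "ENTREZID"`, which is an assumption here and is not run. The helper `collapse_by_gene` is likewise an illustrative name.

```r
# Toy transcript-cluster-level log2 expression; all values are made up
expr <- matrix(c( 7.2, 7.5, 7.1,
                  7.8, 8.1, 7.6,
                  5.0, 5.2, 4.9,
                  9.3, 9.1, 9.4,
                 12.0, 7.9, 7.3),
               nrow = 5, byrow = TRUE,
               dimnames = list(paste0("tc", 1:5), c("s1", "s2", "s3")))

# Hypothetical cluster-to-gene mapping (stand-in for an annotation lookup)
map <- data.frame(PROBEID  = paste0("tc", 1:5),
                  ENTREZID = c("1001", "1001", "2002", NA, "1001"),
                  stringsAsFactors = FALSE)

# Drop clusters without an Entrez ID
map <- map[!is.na(map$ENTREZID), ]

# Drop clusters that map to more than one gene (none in this toy table)
n_hits <- table(map$PROBEID)
map <- map[map$PROBEID %in% names(n_hits)[n_hits == 1], ]

# Collapse rows that share an Entrez ID using the chosen summary function
collapse_by_gene <- function(expr, map, fun = mean) {
  sub <- expr[map$PROBEID, , drop = FALSE]
  agg <- aggregate(as.data.frame(sub),
                   by = list(ENTREZID = map$ENTREZID), FUN = fun)
  out <- as.matrix(agg[, -1, drop = FALSE])
  rownames(out) <- agg$ENTREZID
  out
}

gene_mean   <- collapse_by_gene(expr, map, mean)
gene_median <- collapse_by_gene(expr, map, median)
```

With three clusters mapping to the same gene, the mean and the median can disagree when one cluster is an outlier, so it may be worth comparing both before settling on one.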

written 8 months ago by mforde84