Question: Oligo Exon array transcript cluster ID to gene symbols
4.6 years ago, lingjianyang wrote:

Dear all,

I am using the oligo package to preprocess CEL files from the Human Exon 1.0 ST array. I have summarised the expression data to the transcript cluster level (rma(celfiles, target="core")) and ended up with a total of 22011 transcript clusters. I would like to perform a gene-level analysis instead of a transcript-level one, so I need to map the transcript clusters to genes.

With the following command: featureData(exonCore) <- getNetAffx(exonCore, "transcript") I obtained the corresponding annotation. However, when I looked at the annotation information from pData(featureData(exonCore))[,c("probesetid","geneassignment")], a few thousand transcript clusters appear to have no gene assignment at all. That may be the smaller issue. More importantly, a lot of transcript clusters are mapped to many gene symbols: the geneassignment column has many entries for them.


When I remove the transcript clusters that are mapped to multiple gene symbols, I end up with around 12,000 to 14,000 transcript clusters that map uniquely to genes. This number seems too low to me, since, for example, TCGA exon expression data contain about 18,000 genes.

Am I using the annotation file correctly, or have I already done something wrong here? Is it generally a better strategy to summarise to the probe set level and then represent each gene by its constituent probe sets somehow?







4.6 years ago, bulldogshepherd wrote:

I used to analyse Human Exon array data years ago (not with oligo), and it was common practice to summarize probe-set-level values into a gene-level value. As far as I remember, the number of unique "core" genes should be around 18,000, as the TCGA data show. Many genes have multiple gene symbols, so using the gene symbol to define a unique gene is not recommended. Try using the gene ID instead, which is also given in the gene assignment column.
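A minimal sketch of the gene ID idea, in base R on made-up data. It assumes the usual NetAffx layout for transcript-level gene assignments: records separated by " /// ", fields within a record separated by " // ", the gene ID in the fifth field, and "---" for missing values. Please check these assumptions against your own annotation before relying on them. The helper unique_field and the toy accessions, symbols and IDs are hypothetical.

```r
# Toy stand-in for pData(featureData(exonCore))[, c("probesetid", "geneassignment")].
# The strings are invented for illustration; real NetAffx entries are longer.
annot <- data.frame(
  probesetid = c("1001", "1002", "1003"),
  geneassignment = c(
    "NM_000001 // GENEA // gene A // 1p36.33 // 101 /// ENST000001 // GENEA // gene A // 1p36.33 // 101",
    "NM_000002 // GENEB // gene B // 2q11.2 // 202 /// NM_000003 // GENEC // gene C // 2q11.2 // 303",
    NA
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: unique values of one " // "-separated field
# across all " /// "-separated records of a single assignment string.
unique_field <- function(assignment, field) {
  if (is.na(assignment) || assignment == "---") return(character(0))
  records <- strsplit(assignment, " /// ", fixed = TRUE)[[1]]
  values <- vapply(
    strsplit(records, " // ", fixed = TRUE),
    function(parts) if (length(parts) >= field) trimws(parts[field]) else NA_character_,
    character(1)
  )
  values <- values[!is.na(values) & values != "---"]
  unique(values)
}

# Field 5 is assumed to hold the gene ID.
entrez <- lapply(annot$geneassignment, unique_field, field = 5)
names(entrez) <- annot$probesetid

# Number of distinct gene IDs per transcript cluster.
n_genes <- lengths(entrez)

# Keep only clusters that resolve to exactly one gene ID.
unique_map <- data.frame(
  probesetid = names(entrez)[n_genes == 1],
  entrez_id = unlist(entrez[n_genes == 1], use.names = FALSE),
  stringsAsFactors = FALSE
)
```

In this toy example the first cluster lists two transcripts of the same gene, so it counts as uniquely mapped once the records are collapsed by gene ID rather than counted as separate entries. The second cluster still maps to two genes, and the third has no assignment.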


Thanks for your insight. Do you recommend summarising from probe level to probe set level (target="probeset") before further summarising probe set level values to gene level values (for example by taking the mean/median across all probe sets in a gene); or summarising from probe level to transcript level (target="core"), before mapping transcripts to the genes?

Reply written 4.6 years ago by lingjianyang (modified 7 months ago by RamRS)

I would recommend summarizing probe-set-level values to the gene level, as suggested in a comment on a previous post:

Computing Expression From Affymetrix Exon Array Data
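As a rough illustration of what summarizing probe set values to gene level could look like, here is a small base R sketch that takes the per-sample median across the probe sets of each gene. The expression values, the probe set names and the probe-set-to-gene map gene_of are all invented, and the median is only one possible choice (the mean was also mentioned above). It is not necessarily the method used in the linked post.

```r
# Toy probe-set-level expression matrix (rows = probe sets, columns = samples),
# standing in for log2 values such as exprs() of an rma(..., target = "probeset") result.
expr <- matrix(
  c(5.0, 7.0, 6.0, 2.0, 4.0,
    5.5, 7.5, 6.5, 2.5, 4.5),
  ncol = 2,
  dimnames = list(c("ps1", "ps2", "ps3", "ps4", "ps5"), c("sampleA", "sampleB"))
)

# Hypothetical probe-set-to-gene map (gene IDs as character strings).
gene_of <- c(ps1 = "101", ps2 = "101", ps3 = "101", ps4 = "202", ps5 = "202")

# Group probe set names by gene ID.
groups <- split(rownames(expr), gene_of[rownames(expr)])

# Per-sample median across the probe sets of each gene.
gene_expr <- t(vapply(
  groups,
  function(ids) apply(expr[ids, , drop = FALSE], 2, median),
  numeric(ncol(expr))
))
colnames(gene_expr) <- colnames(expr)
```

The result has one row per gene ID and one column per sample. Probe sets without a gene assignment would need to be dropped from expr before this step.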

Reply written 4.6 years ago by bulldogshepherd

Thanks, appreciate this.

Reply written 4.6 years ago by lingjianyang