Question: Mapping Affymetrix IDs to GeneSymbols; Why so many NAs?
bi_Scholar wrote (2.6 years ago):

Hello, after performing a differential expression analysis on a set of .CEL files downloaded from GEO, I'm trying to map the Affymetrix probe IDs to gene symbols using the 'annotate' package with 'hgu133plus2.db'. However, of ~150 significant genes, about 50 can't be mapped ("NA"), which seems like quite a lot. Even more concerning, the top 5 genes can't be mapped.

I also tried using probe IDs instead of gene symbols when performing GO enrichment analysis with DAVID, but they don't seem to be mapped at all when choosing AFFYMETRIX_3PRIME_IVT_ID.

My questions are: Why is that? Shouldn't all IDs map? How am I supposed to proceed from here? It doesn't feel right to simply exclude all "NA" genes from further analyses.

Any help is greatly appreciated.

Tags: mapping, affymetrix, microarray
genomax commented (2.6 years ago):

See this recent thread: Why some probes have "NA" for gene symbol and Entrez ID?
aln wrote (2.6 years ago):

To answer your question precisely I would need to see a snippet of your code, especially the annotation step. In general, though, older Affymetrix arrays (including the chip you are using) have an ambiguous design: probes within a probeset can map to different genes or even to non-transcribed regions (according to current annotation), and some genes are represented by several probesets. I would therefore recommend using the custom CDF files from the Brainarray project with EntrezG IDs - http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/20.0.0/entrezg.asp. As a result you will get ~19000 genes, whereas the hgu133plus2 chip originally has 54675 probesets.

For details on the Brainarray custom CDFs, read the following article - http://nar.oxfordjournals.org/content/33/20/e175.full

How to use the CDFs - http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/cdfreadme.htm. Be aware that you will need to use a new annotation package, which you can download and install from the same site.
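As a rough sketch of what using a Brainarray CDF might look like: the code below assumes the Brainarray CDF package for this chip (hgu133plus2hsentrezgcdf) has been installed from the site above, that the CDF name is "HGU133Plus2_Hs_ENTREZG", and that the CEL files sit in a hypothetical directory "cel_files/". The helper entrez_from_brainarray() is made up for illustration; it relies on ENTREZG probeset IDs having the form "<EntrezID>_at".

```r
# Hypothetical helper: turn Brainarray ENTREZG probeset IDs such as "7157_at"
# into Entrez IDs. Control probesets (e.g. "AFFX-...") are returned as NA.
entrez_from_brainarray <- function(ids) {
  is_gene <- grepl("^[0-9]+_at$", ids)
  out <- rep(NA_character_, length(ids))
  out[is_gene] <- sub("_at$", "", ids[is_gene])
  out
}

# The Bioconductor part only runs if the assumed packages and the
# hypothetical "cel_files/" directory are actually present.
if (requireNamespace("affy", quietly = TRUE) &&
    requireNamespace("hgu133plus2hsentrezgcdf", quietly = TRUE) &&
    dir.exists("cel_files")) {
  raw    <- affy::ReadAffy(celfile.path = "cel_files",
                           cdfname = "HGU133Plus2_Hs_ENTREZG")
  eset   <- affy::rma(raw)                      # one row per Entrez gene
  entrez <- entrez_from_brainarray(rownames(Biobase::exprs(eset)))
}
```

With this setup each row of the expression matrix already corresponds to one gene, so no probeset-to-gene collapsing should be needed afterwards.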

bi_Scholar replied (2.6 years ago):

Hello aln, thanks for your reply, it was really helpful.

The annotation is performed in a very basic manner:

library(limma)     # provides topTable()
library(annotate)  # provides getSYMBOL()

results <- topTable(...)
symbols <- getSYMBOL(rownames(results), "hgu133plus2")
anno_results <- cbind(results, symbols)
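To see how many probesets are unmapped and which ones, something along these lines may help. The data frame here is made-up toy data standing in for anno_results; the probeset IDs and gene names are placeholders, not real annotation.

```r
# Toy stand-in for anno_results; every value here is invented.
anno_results <- data.frame(
  logFC   = c(2.1, -1.8, 1.5, 1.2),
  symbols = c("GENE_A", NA, "GENE_B", NA),
  row.names = c("p1_at", "p2_at", "p3_at", "p4_at"),
  stringsAsFactors = FALSE
)

is_unmapped  <- is.na(anno_results$symbols)
n_unmapped   <- sum(is_unmapped)                     # number of probesets without a symbol
unmapped_ids <- rownames(anno_results)[is_unmapped]  # IDs to look up manually
```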

I'll look into the above links and see if I can make some improvements. In general, is there a rule for handling probesets that couldn't be annotated? Are they simply removed from the result set? And how does one handle a case where one gene is reported as differentially expressed multiple times (you mentioned that some genes are represented by multiple probesets)? Should I keep the most significant probeset and remove the others, or should I average over all of them?

Again, many thanks for your help. Cheers!
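For reference, the "keep the most significant one" option raised above could be sketched as follows. This is only an illustration on made-up data (the symbols, IDs and p-values are placeholders), and it assumes the annotated table has a symbols column and an adj.P.Val column as produced by topTable().

```r
# Toy stand-in for an annotated topTable() result; all values are invented.
anno_results <- data.frame(
  symbols   = c("GENE_A", "GENE_A", "GENE_B", NA),
  adj.P.Val = c(0.010, 0.001, 0.020, 0.0005),
  row.names = c("p1_at", "p2_at", "p3_at", "p4_at"),
  stringsAsFactors = FALSE
)

mapped   <- anno_results[!is.na(anno_results$symbols), ]  # drop unannotated probesets
ordered  <- mapped[order(mapped$adj.P.Val), ]             # most significant first
per_gene <- ordered[!duplicated(ordered$symbols), ]       # first (best) row per gene
```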

aln replied (2.6 years ago):

If you use a Brainarray custom CDF you won't have multiple probesets per gene (read the links I posted earlier to understand why). If you want to apply a different solution, you should do it before the DEG analysis, so that no gene is reported as differentially expressed multiple times. First, I would eliminate all probesets that map to different genes and all probesets with NA. Second, there are several ways to deal with multiple probesets per gene. As you said, you can average over all of them, but that is not considered the best solution. For better solutions, read:

http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-322

http://biorxiv.org/content/early/2016/06/18/059600
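As one simple illustration of collapsing probesets before the DEG analysis, the sketch below keeps, for each gene, the probeset with the highest mean expression. This is a common heuristic chosen here for brevity, not necessarily the method recommended in the linked papers, and the expression matrix and probeset-to-gene mapping are made-up toy data.

```r
# Toy expression matrix (probesets x samples) and probeset-to-gene map; invented values.
expr <- matrix(c(5, 6, 7,
                 9, 8, 10,
                 3, 4, 2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("p1_at", "p2_at", "p3_at"), c("s1", "s2", "s3")))
gene <- c(p1_at = "100", p2_at = "100", p3_at = "200")

avg <- rowMeans(expr)

# For each gene, pick the probeset with the highest mean expression.
best <- vapply(split(names(avg), gene[names(avg)]),
               function(ids) ids[which.max(avg[ids])],
               character(1))

expr_gene <- expr[best, , drop = FALSE]  # one row per gene
rownames(expr_gene) <- names(best)       # label rows by gene ID
```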

As for annotation, I usually use the select function; at least I can be sure that it reports all entries for all the probeset IDs:

library(hgu133plus2.db)  # also loads AnnotationDbi, which provides select()
probesetsID_to_EntrezID <- select(hgu133plus2.db, keys = probesetsID, columns = "ENTREZID", keytype = "PROBEID")

where probesetsID is a character vector of your platform's probeset IDs.

So, in my case I do the annotation step before the DEG analysis (whether with a Brainarray CDF or the regular one). That way I can eliminate NA probesets and probesets mapped to different genes at the same time, since I don't want them in my DEG list.
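The filtering described here could look roughly like this. The data frame is a made-up stand-in for the output of select() (columns PROBEID and ENTREZID, one row per probeset-gene pair); the IDs are placeholders.

```r
# Toy stand-in for the data frame returned by select(); IDs are invented.
map <- data.frame(
  PROBEID  = c("p1_at", "p2_at", "p2_at", "p3_at", "p4_at"),
  ENTREZID = c("100",   "200",   "201",   NA,      "100"),
  stringsAsFactors = FALSE
)

# Probesets with no Entrez ID at all.
unmapped <- unique(map$PROBEID[is.na(map$ENTREZID)])

# Probesets that map to more than one distinct gene.
n_genes   <- tapply(map$ENTREZID, map$PROBEID,
                    function(x) length(unique(na.omit(x))))
ambiguous <- names(n_genes)[n_genes > 1]

# Keep only probesets that map to exactly one gene.
keep <- map[!map$PROBEID %in% c(unmapped, ambiguous), ]
```

The expression matrix can then be subset to keep$PROBEID before fitting the model.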
