Why do some probes have "NA" for gene symbol and Entrez ID?
5.7 years ago
Raheleh ▴ 230

Hi, I converted the Affymetrix probe IDs to official gene symbols using the annotate library. However, many of them have "NA" instead of a gene symbol. Could anyone tell me why, and how I should deal with them? Any help would be appreciated!

affymetrix microarray NA genesymbol • 5.3k views

I don't know what annotate does, but if it maps probes through some sort of ID conversion process, outdated IDs could explain why you're missing genes. For example, the documentation for annotate mentions LocusLink, which has been retired for over 10 years. I would pick a reference genome and map the probes to it.


Some probesets were probably designed based on ESTs and may not map to a known gene. Have a look at the Affymetrix website, get the latest Affymetrix annotation files for the microarray you are dealing with, and check whether the probeset maps to a known gene or not.


How can I get the latest Affymetrix annotation for hgu133plus2? Many thanks for your help.


The Affymetrix website is at http://www.affymetrix.com.

Their online database for microarray-related annotations and sequences is the NetAffx Analysis Centre. You will have to log in with a user name and password. Once logged in, you can query data for individual probesets and also download a text file with the latest annotations (na.36) for the hgu133plus2 array, as well as files with all the probe sequences.

You may actually be able to download the na.36 annotation files without having to register. http://www.affymetrix.com/support/support_result.affx?entity=hg-u133-plus&keyword=&filters=.
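Once you have the file, a minimal sketch for listing the probesets that have no gene in it might look like this. It assumes the layout I would expect from a NetAffx annotation CSV: leading comment lines starting with `#`, columns named `Probe Set ID` and `Gene Symbol`, and `---` marking missing values. Please check those assumptions against the real download. The inline text is a simplified toy stand-in, and the last probeset ID is made up.

```r
# Toy stand-in for a NetAffx annotation CSV. The layout (comment lines
# starting with "#", "---" for missing values) is an assumption; the
# last row uses a made-up probeset ID for illustration.
annot_text <- '# illustrative header comment line
"Probe Set ID","Gene Symbol","Entrez Gene"
"1007_s_at","DDR1","780"
"1053_at","RFC2","5982"
"0000000_at","---","---"'

# Skip comment lines, turn "---" into NA, and keep the column names as-is
annot <- read.csv(text = annot_text, comment.char = "#",
                  na.strings = "---", check.names = FALSE,
                  stringsAsFactors = FALSE)

# Probesets without a gene symbol in the vendor annotation
unannotated <- annot[["Probe Set ID"]][is.na(annot[["Gene Symbol"]])]
print(unannotated)
```

For the real file, replace `text = annot_text` with the path to the CSV you downloaded.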


Thank you. I am using the hgu133plus2.db and annotate packages to convert Affymetrix probeset IDs to gene symbols.

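library(hgu133plus2.db)
library(annotate)
# probeset.list: table whose row names are hgu133plus2 probeset IDs
# getSYMBOL() gives NA for probesets with no gene symbol in the annotation package
gene.symbols <- getSYMBOL(rownames(probeset.list), "hgu133plus2")
# Append the symbols as an extra column and save as a tab-separated file
results <- cbind(probeset.list, gene.symbols)
head(results)
write.table(results, "results.txt", sep="\t", quote=FALSE)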

If I’m not mistaken, there are about 11 probes that together represent a specific gene. So why is there no gene symbol for some probeset IDs?


According to the Affymetrix website, the hgu133plus2 probes were designed against a hodgepodge of sequences, and their reference gene set was a UniGene version from 2001. So it is not surprising that some probes don't match genes 15 years later, since human genome annotations have evolved quite a bit in that time. If you want to understand what's going on, get the probe sequences and map them to an annotated genome reference of your choice. As another option, Ensembl has mapped the probes and makes the mappings available via BioMart.
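As for how to deal with the NA entries in the meantime, one option is to count them and keep them in a separate table rather than silently dropping rows. The sketch below is only an illustration: the `results` data frame is a hypothetical stand-in for the table built with `getSYMBOL()`, and its probeset IDs, `logFC` values, and symbols are made up.

```r
# Hypothetical stand-in for the 'results' table built with getSYMBOL();
# the probeset IDs, values, and symbols are illustrative only.
results <- data.frame(
  logFC = c(1.2, -0.8, 0.3, 2.1),
  gene.symbols = c("DDR1", NA, "RFC2", NA),
  row.names = c("probe_a_at", "probe_b_at", "probe_c_at", "probe_d_at"),
  stringsAsFactors = FALSE
)

# How many probesets have no symbol, and what fraction is that?
is_unmapped <- is.na(results$gene.symbols)
n_unmapped <- sum(is_unmapped)
frac_unmapped <- mean(is_unmapped)

# Keep annotated and unannotated probesets in separate tables
annotated <- results[!is_unmapped, , drop = FALSE]
unannotated <- results[is_unmapped, , drop = FALSE]

# IDs to look up in the vendor annotation or to remap against a genome
print(rownames(unannotated))
```

Keeping the unannotated probesets in their own table means you can still revisit them later if you remap the probe sequences.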
