Why do some probes have "NA" for gene symbol and Entrez ID?
0
5
5.7 years ago
Raheleh ▴ 230

Hi, I converted the Affymetrix probe IDs to official gene symbols using the annotate library. However, many of them have "NA" instead of a gene symbol. Could anyone tell me why, and how I should deal with them? Any help would be appreciated!

affymetrix microarray NA genesymbol • 5.3k views
1

I don't know what annotate does, but if it maps probes through some sort of ID conversion process, outdated IDs could explain why you're missing genes. For example, the documentation for annotate mentions LocusLink, which has been retired for over 10 years. I would pick a reference genome and map the probes to it.

2

Some probesets were probably designed based on ESTs and may not map to a known gene. Have a look at the Affymetrix website, get the latest Affymetrix annotation files for the microarray you are dealing with, and check whether the probeset maps to a known gene or not.

0

How do I get the latest Affymetrix annotation for hgu133plus2? Many thanks for your help.

1

The Affymetrix website is at http://www.affymetrix.com.

Their online database for microarray-related annotations and sequences is the NetAffx Analysis Centre. You will have to log in with a user name and password. Once logged in, you can query data for individual probesets and also download a text file with the latest annotations (na.36) for the hgu133plus2 array, as well as files with all the probe sequences.

You may actually be able to download the na.36 annotation files without having to register. http://www.affymetrix.com/support/support_result.affx?entity=hg-u133-plus&keyword=&filters=.
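
Once the annotation file is downloaded, checking which probesets have no gene assigned might look roughly like the sketch below. It runs on a small made-up file so that it is self-contained. The layout it assumes (metadata lines starting with "#%", quoted fields, "---" as the placeholder for a missing value, and the column names "Probe Set ID" and "Gene Symbol") should be checked against the real file, and the probeset IDs and symbols are invented.

```r
# Toy stand-in for a downloaded NetAffx annotation CSV. The layout is an
# assumption: "#%" metadata lines, quoted fields, "---" for missing values.
# All IDs and symbols below are made up.
annot_file <- tempfile(fileext = ".csv")
writeLines(c(
  "#%example-metadata-line=ignored",
  "\"Probe Set ID\",\"Gene Symbol\",\"Entrez Gene\"",
  "\"toy_1_at\",\"GENE_A\",\"101\"",
  "\"toy_2_at\",\"---\",\"---\"",
  "\"toy_3_at\",\"GENE_B\",\"102\""
), annot_file)

# Skip the "#" metadata lines and keep every column as plain text.
annot <- read.csv(annot_file, comment.char = "#", check.names = FALSE,
                  colClasses = "character")

# Probesets whose symbol is the "---" placeholder have no gene assigned.
no_gene <- annot[["Probe Set ID"]][annot[["Gene Symbol"]] == "---"]
print(no_gene)
```

For the real file, point `read.csv()` at the downloaded CSV instead of `annot_file`.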

1

Thank you. I am using the hgu133plus2.db and annotate packages to convert Affymetrix probeset IDs to gene symbols.

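library(hgu133plus2.db)  # annotation data for the hgu133plus2 array
library(annotate)        # provides getSYMBOL()
# Look up a gene symbol for each probeset ID; unmapped probesets come back as NA
gene.symbols <- getSYMBOL(rownames(probeset.list), "hgu133plus2")
results <- cbind(probeset.list, gene.symbols)
# Write a tab-delimited table without quoting
write.table(results, "results.txt", sep="\t", quote=FALSE)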


If I’m not mistaken, there are about 11 probes that represent a specific gene. So why is there no gene symbol for some probeset IDs?
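
For a quick look at how many probesets are affected, and one simple way of setting them aside, something like the following sketch could be used. It is written in base R on made-up data so that it runs on its own: the `probeset.list` and `gene.symbols` objects here are toy stand-ins for the real expression table and the `getSYMBOL()` output, and the IDs and symbols are invented.

```r
# Toy stand-ins (made-up values): in real use `probeset.list` is the
# expression table and `gene.symbols` is the named vector from getSYMBOL().
probeset.list <- matrix(c(7.1, 5.3, 6.8, 7.4, 5.0, 6.9), nrow = 3,
                        dimnames = list(c("toy_1_at", "toy_2_at", "toy_3_at"),
                                        c("sample1", "sample2")))
gene.symbols <- c(toy_1_at = "GENE_A", toy_2_at = NA, toy_3_at = "GENE_B")

# Keep the numeric columns numeric by building a data frame, not a matrix.
results <- data.frame(probeset.list,
                      symbol = gene.symbols[rownames(probeset.list)])

# How many probesets have no symbol, and which ones are they?
n_missing <- sum(is.na(results$symbol))
unannotated_ids <- rownames(results)[is.na(results$symbol)]

# One option: carry only the annotated probesets into gene-level analysis.
annotated <- results[!is.na(results$symbol), ]
print(n_missing)
print(unannotated_ids)
```

Whether to drop the unannotated probesets or keep them under their probeset IDs depends on the analysis; the list in `unannotated_ids` can be checked against other annotation sources before deciding.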

3

According to the Affymetrix website, the hgu133plus2 probes were designed against a hodgepodge of sequences, and the reference gene set was a UniGene version from 2001. So it is not surprising that some probes no longer match genes 15 years later, since human genome annotations have evolved in that time. If you want to understand what's going on, get the probe sequences and map them to an annotated reference genome of your choice. As another option, Ensembl has already mapped the probes and makes the mappings available via BioMart.
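
The BioMart route could be sketched as below with the Bioconductor package biomaRt. The query function is only defined, not run, because it needs biomaRt installed and network access; the attribute and filter name `affy_hg_u133_plus_2` is what Ensembl is assumed to use for this array and can be confirmed with `listAttributes()`. `keep_unmapped()` is a hypothetical helper, and the toy mapping uses made-up IDs purely to show that probesets without a mapping stay in the table as NA.

```r
# Query Ensembl BioMart for probeset-to-gene mappings. Needs the biomaRt
# package and network access, so it is defined here but not called.
fetch_ensembl_mapping <- function(probe_ids) {
  mart <- biomaRt::useEnsembl(biomart = "genes",
                              dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    attributes = c("affy_hg_u133_plus_2", "hgnc_symbol", "entrezgene_id"),
    filters = "affy_hg_u133_plus_2",
    values = probe_ids,
    mart = mart
  )
}

# Hypothetical helper: left-join so that probesets absent from the mapping
# are kept, with NA in the gene columns.
keep_unmapped <- function(probe_ids, mapping) {
  merge(data.frame(affy_hg_u133_plus_2 = probe_ids), mapping,
        by = "affy_hg_u133_plus_2", all.x = TRUE)
}

# Toy mapping with made-up IDs, shaped like a getBM() result.
toy_mapping <- data.frame(
  affy_hg_u133_plus_2 = c("toy_1_at", "toy_3_at"),
  hgnc_symbol = c("GENE_A", "GENE_B"),
  entrezgene_id = c(101L, 102L)
)
joined <- keep_unmapped(c("toy_1_at", "toy_2_at", "toy_3_at"), toy_mapping)
print(joined)
```

In real use, `keep_unmapped(ids, fetch_ensembl_mapping(ids))` would give one row per probeset-gene pair, so a probeset that maps to several genes appears more than once.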

0