Does a microarray cover all genes?
5.7 years ago
wenbinm ▴ 40

Hi there,

I downloaded some microarray data from GEO generated on the Affymetrix Human Genome U133A Array and tried to map probes to genes using the packages 'annotate' and 'hgu133a.db'. However, I found that many genes on my list don't have probes.

Try the following code:

library(annotate)
library(hgu133a.db)
x <- hgu133aENSEMBL
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
xx['AJUBA']

It returns NA for the gene AJUBA (a well-known Hippo pathway gene). Does that mean the Affymetrix Human Genome U133A Array doesn't have any probes for AJUBA?

Thank you!

R microarray genome • 1.4k views
5.7 years ago
ejm32 ▴ 450

It does appear that AJUBA is not represented on the Affymetrix Human Genome U133A Array. Furthermore, the code example you provided is incorrect: if you inspect xx, you'll notice that the names of the list elements are probe IDs, so looking up a gene symbol will always return NA/NULL. (The map you used, hgu133aENSEMBL, also returns Ensembl gene IDs rather than gene symbols.) Here is the code I used to determine whether AJUBA was represented on the array. Lastly, it looks like AJUBA is on the U133 Plus 2 array.

library(org.Hs.eg.db)
library(hgu133a.db)
library(hgu133plus2.db)

x <- hgu133aENSEMBL
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
# $`1053_at`
# [1] "ENSG00000049541"
# 
# $`117_at`
# [1] "ENSG00000173110"
# 
# $`121_at`
# [1] "ENSG00000125618"
# 
# $`1255_g_at`
# [1] "ENSG00000048545"
# 
# $`1316_at`
# [1] "ENSG00000126351"

# First, get the common identifiers for AJUBA from org.Hs.eg.db
ajuba <- AnnotationDbi::select(org.Hs.eg.db, "AJUBA", columns=c("REFSEQ","ENTREZID","SYMBOL","ENSEMBL","ALIAS"), keytype = "SYMBOL")

head(ajuba)
# SYMBOL       REFSEQ ENTREZID         ENSEMBL ALIAS
# 1  AJUBA NM_001289097    84962 ENSG00000129474   JUB
# 2  AJUBA NM_001289097    84962 ENSG00000129474 AJUBA
# 3  AJUBA    NM_032876    84962 ENSG00000129474   JUB
# 4  AJUBA    NM_032876    84962 ENSG00000129474 AJUBA
# 5  AJUBA    NM_198086    84962 ENSG00000129474   JUB
# 6  AJUBA    NM_198086    84962 ENSG00000129474 AJUBA

# Loop over each identifier type (column of ajuba) and query the array annotation db.
# I had to wrap select() in tryCatch() because the ENTREZID lookup threw an error, since that ID is not on the array;
# a failed lookup returns the string "error" instead of stopping the loop.
rst <- lapply(colnames(ajuba), function(x){
  tryCatch(AnnotationDbi::select(hgu133a.db, unique(ajuba[, x]), columns = c("PROBEID"), keytype = x), 
           error=function(e) return("error")) 
})
sapply(rst, NROW) # check how many rows querying each identifier type returns (not promising)
# [1] 0 0 1 0 0
# The 1 for ENTREZID is just NROW("error") from the failed lookup, not a matching probe.

# To make sure I didn't miss something, I also searched the Affymetrix Human Genome U133 Plus 2 array.
rst2 <- lapply(colnames(ajuba), function(x){
  tryCatch(AnnotationDbi::select(hgu133plus2.db, unique(ajuba[, x]), columns = c("PROBEID"), keytype = x), 
           error=function(e) return("error")) # same tryCatch guard as above, so one failing keytype can't abort the loop
})

sapply(rst2, NROW) #Looks like there are probes that map to AJUBA.
# [1]  4 24  4  4  8

# lapply(rst2, head)
# [[1]]
# SYMBOL      PROBEID
# 1  AJUBA 1553764_a_at
# 2  AJUBA    225806_at
# 3  AJUBA    225807_at
# 4  AJUBA    243446_at
# 
# [[2]]
# REFSEQ      PROBEID
# 1 NM_001289097 1553764_a_at
# 2 NM_001289097    225806_at
# 3 NM_001289097    225807_at
# 4 NM_001289097    243446_at
# 5    NM_032876 1553764_a_at
# 6    NM_032876    225806_at
# 
# [[3]]
# ENTREZID      PROBEID
# 1    84962 1553764_a_at
# 2    84962    225806_at
# 3    84962    225807_at
# 4    84962    243446_at
# 
# [[4]]
# ENSEMBL      PROBEID
# 1 ENSG00000129474 1553764_a_at
# 2 ENSG00000129474    225806_at
# 3 ENSG00000129474    225807_at
# 4 ENSG00000129474    243446_at
# 
# [[5]]
# ALIAS      PROBEID
# 1   JUB 1553764_a_at
# 2   JUB    225806_at
# 3   JUB    225807_at
# 4   JUB    243446_at
# 5 AJUBA 1553764_a_at
# 6 AJUBA    225806_at
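To make the point about list names concrete, here is a small base-R sketch that uses the five probe-to-Ensembl entries printed above as a toy stand-in for the full hgu133aENSEMBL list, so no Bioconductor packages are needed to run it. The helper invert_mapping() is hypothetical and not part of annotate or hgu133a.db. Note that inverting this particular map gives Ensembl gene IDs as keys, so you would look up ENSG00000129474 rather than the symbol AJUBA.

```r
# Toy probe -> Ensembl list: the first five entries printed above for
# as.list(hgu133aENSEMBL[mappedkeys(hgu133aENSEMBL)]).
xx <- list(
  `1053_at`   = "ENSG00000049541",
  `117_at`    = "ENSG00000173110",
  `121_at`    = "ENSG00000125618",
  `1255_g_at` = "ENSG00000048545",
  `1316_at`   = "ENSG00000126351"
)

# The list is keyed by probe ID, so a gene name is simply not a key:
# xx["AJUBA"] gives a one-element list named NA whose value is NULL.
missing_hit <- xx["AJUBA"]

# Hypothetical helper: flip a probe -> gene list into a gene -> probes list.
# A probe may map to several genes, and a gene may have several probes.
invert_mapping <- function(probe_to_gene) {
  pairs <- stack(probe_to_gene)  # columns: values (gene ID), ind (probe ID)
  split(as.character(pairs$ind), as.character(pairs$values))
}

gene_to_probes <- invert_mapping(xx)
gene_to_probes[["ENSG00000049541"]]           # "1053_at"
"ENSG00000129474" %in% names(gene_to_probes)  # AJUBA's Ensembl ID; FALSE for this toy subset
```

With the real package, AnnotationDbi's revmap() should perform the same inversion directly on the map object (e.g. revmap(hgu133aENSEMBL)), and hgu133aSYMBOL maps probes to gene symbols if you want to search by name.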

I found another U133 Plus 2 array dataset that meets my needs. Your answer really helped!

5.7 years ago
GenoMax 141k

A commercial microarray like the U133A has a fixed set of probes for the genes it was designed to measure, so not every gene is necessarily represented. You can find out which genes are present on the array by looking at the Platform entry for that array in the NCBI GEO database; for example, this is the entry for U133A_2. If you scroll down to the bottom of the page, you will see the annotation table. Click on "Download full table" to get a copy of the entire table.
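The lookup described above can also be scripted in R once the table is downloaded. This is a minimal sketch under stated assumptions: the file is tab-separated, may start with "#" description lines, and has an "ID" column plus a "Gene Symbol" column that separates multiple symbols with " /// " (the usual layout for Affymetrix platforms on GEO, but check your own file). The inline table is made-up toy data, and probes_for_symbol() is a hypothetical helper.

```r
# Made-up excerpt standing in for a downloaded GPL "full table".
# Assumed layout: tab-separated, optional "#" description lines,
# an "ID" column and a "Gene Symbol" column using " /// " between symbols.
gpl_text <- paste(
  "#ID = probe set identifier",
  "ID\tGene Symbol",
  "probe1_at\tGENEA",
  "probe2_s_at\tGENEB /// GENEC",
  "probe3_at\tGENEA",
  sep = "\n"
)

# For a real download, pass the file path instead of text = gpl_text.
gpl <- read.delim(text = gpl_text, comment.char = "#",
                  check.names = FALSE, stringsAsFactors = FALSE)

# Hypothetical helper: return the probe IDs whose symbol list contains `symbol`.
probes_for_symbol <- function(gpl, symbol, sep = " /// ") {
  symbols <- strsplit(gpl[["Gene Symbol"]], sep, fixed = TRUE)
  hit <- vapply(symbols, function(s) symbol %in% s, logical(1))
  gpl[["ID"]][hit]
}

probes_for_symbol(gpl, "GENEA")  # "probe1_at" "probe3_at"
probes_for_symbol(gpl, "AJUBA")  # character(0): no probe in this toy table
```

Splitting on " /// " matters because a probe set annotated to several genes would otherwise never match an exact symbol comparison.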
