I am having trouble annotating the probeset-level data for this particular chip: HuGene-1_0-st-v1.
The AffyIDs range from 7892501, 7892502, 7892503 ... to 8180413, 8180415, 8180417, 8180418.
Here are my unsuccessful attempts:
(1) Using biomaRt with the getBM function. With this approach, 38% of the ~33200 probesets can be annotated:
library(biomaRt)
# Replace the AffyID with the gene symbol
mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", host = "www.ensembl.org",
                path = "/biomart/martservice", dataset = "hsapiens_gene_ensembl")
hgnc <- getBM(attributes = c("affy_hugene_1_0_st_v1", "hgnc_symbol", "ensembl_gene_id",
                             "entrezgene", "chromosome_name", "start_position",
                             "end_position", "band"),
              filters = "affy_hugene_1_0_st_v1", values = tab$ID, mart = mart)
# Now match the array data probesets with the genes data frame
m <- match(as.numeric(tab$ID), hgnc$affy_hugene_1_0_st_v1)
# And append e.g. the HGNC symbol to the array data frame
tab$hgnc <- hgnc[m, "hgnc_symbol"]
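For reference, the annotated fraction quoted above could be computed along these lines. This is only a sketch on toy data: `tab` and `hgnc` below are small made-up stand-ins for the real array table and the getBM result, the symbols are invented, and `annotated_fraction` is a hypothetical helper name, not part of biomaRt.

```r
# Toy stand-ins (made-up values) for the array table and the getBM() result
tab <- data.frame(ID = c("7892501", "7892502", "7896739", "7896741"),
                  stringsAsFactors = FALSE)
hgnc <- data.frame(affy_hugene_1_0_st_v1 = c(7896739, 7896741),
                   hgnc_symbol = c("GENE_A", ""),
                   stringsAsFactors = FALSE)

# Hypothetical helper: share of probesets that end up with a non-empty symbol.
# IDs are compared as character so the match does not depend on column types.
annotated_fraction <- function(ids, map_ids, map_symbols) {
  m <- match(as.character(ids), as.character(map_ids))
  sym <- map_symbols[m]
  mean(!is.na(sym) & nzchar(sym))
}

frac <- annotated_fraction(tab$ID, hgnc$affy_hugene_1_0_st_v1, hgnc$hgnc_symbol)
```

Counting empty strings as "not annotated" matters here, because a probeset can be returned by the query without carrying an HGNC symbol.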
(2) Using the NetAffx annotation file from the Affymetrix Support section [1]. When I compare the ProbeIDs from the first column of the file with the ~33200 ProbeIDs from the experiment, the overlap is only 13%. The AffyIDs in the file start with the values 7896739, 7896741, 7896743 ....
(3) Using getSYMBOL(head(fit$genes$ID), "hugene10sttranscriptcluster.db") with library(annotate) and library(hugene10sttranscriptcluster.db). 32% can be annotated, but this annotation does not seem to be consistent with (1).
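To put a number on the inconsistency between (1) and (3), the two symbol vectors could be compared probeset by probeset. A sketch on made-up data: `sym_biomart` and `sym_db` are hypothetical names for the symbols obtained from the two approaches, assumed to be aligned to the same probeset order, and the gene names are invented.

```r
# Made-up symbols for the same five probesets from the two approaches
sym_biomart <- c("GENE_A", NA, "GENE_C", "GENE_D", NA)
sym_db      <- c("GENE_A", "GENE_B", "GENE_X", NA, NA)

# Classify each probeset by how the two annotations relate
both <- !is.na(sym_biomart) & !is.na(sym_db)
status <- ifelse(both & sym_biomart == sym_db, "agree",
          ifelse(both, "conflict",
          ifelse(is.na(sym_biomart) & is.na(sym_db), "neither", "one_only")))

# Tabulate with fixed levels so empty categories still show up as 0
counts <- table(factor(status,
                       levels = c("agree", "conflict", "one_only", "neither")))
```

Separating "conflict" from "one_only" shows whether the two sources actually disagree or merely cover different probesets.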
(4) Same as (3), but with the library hugene10stprobeset.db instead of hugene10sttranscriptcluster.db. Only 0.4% can be annotated, because hugene10stprobeset.db is for exon-level annotation.
My question: Is there a way to annotate 100% of the AffyIDs with a gene symbol? And where can the annotation information for this be found?
Thank you in advance for your efforts!