Question

Mapping Between Affymetrix Features To Genes

10.1 years ago

chengzhao41 ▴ 110

I'm working with gene expression data. The platform is [HuEx-1_0-st] Affymetrix Human Exon 1.0 ST Array [transcript (gene) version].

My questions are:

1) What do the features represent? For example: "2315554" "2315633" "2315674" "2315739" "2315894" "2315918" "2315951" "2316218" "2316245" "2316379"

2) How do I map them to genes?

I tried using DAVID to convert the identifiers to gene symbols, but it was unable to convert some of them.

I then tried using NetAffx, but I do not know what to select for the "Query For" option: Transcript Clusters, Exon Probe set, or probe set.

Is there an automated way of getting the gene symbols in R?

affymetrix gene-expression microarray • 13k views

updated 10.1 years ago by Neilfws 49k • written 10.1 years ago by chengzhao41 ▴ 110

score 9 · Answer 1 · 2014-03-30

It's always difficult to "guess the identifier" without additional context, but what I think you have there are Affymetrix transcript cluster IDs.
You should first download a probeset annotation file from Affymetrix (account required). In your case, I think this is the appropriate page. Scroll down to "Archived NetAffx Annotation Files".

I downloaded the zip file at the link HuEx-1_0-st-v2 Probeset Annotations, CSV Format, Release 32 (40 MB, 6/23/11) and unzipped it. Here's part of a grep for one of your IDs:

grep 2315633 HuEx-1_0-st-v2.na32.hg19.probeset.csv
"2315637","chr1","+","1167620","1167657","4","2315633","297","407","NM_080605 // B3GALT6 /// ENST00000379198 // B3GALT6","NM_080605 // chr1 // 100 // 1 // 1 // 0 /// ENST00000379198 // chr1 // 100 // 1 // 1 // 0","3","2","4","1","extended","0","0","0","0","0","1","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","main"
"2315638","chr1","+","1167689","1167804","4","2315633","297","408","NM_080605 // B3GALT6 /// ENST00000379198 // B3GALT6","NM_080605 // chr1 // 100 // 4 // 4 // 0 /// ENST00000379198 // chr1 // 100 // 4 // 4 // 0","1","2","0","2","core","0","0","1","2","0","2","0","0","1","0","0","0","0","0","0","0","0","0","0","0","0","0","main"
"2315639","chr1","+","1167873","1167951","4","2315633","297","409","NM_080605 // B3GALT6 /// ENST00000379198 // B3GALT6","NM_080605 // chr1 // 100 // 4 // 4 // 0 /// ENST00000379198 // chr1 // 100 // 4 // 4 // 0","1","2","0","2","core","0","0","1","2","1","4","0","0","1","0","1","1","0","1","0","0","0","0","0","0","0","0","main"

Column 1 is the probeset ID and column 7 is the transcript cluster ID (2315633 in each row above). The problem is that few ID conversion systems accept transcript cluster IDs, but many accept probeset IDs. So you could use, for example, the R biomaRt package as follows:

library(biomaRt)
mart.hs <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
# get probeset IDs for transcript cluster 2315633
huex <- read.csv("~/Downloads/HuEx-1_0-st-v2.na32.hg19.probeset.csv", comment.char = "#", stringsAsFactors = FALSE)
probes <- subset(huex, transcript_cluster_id == "2315633")$probeset_id
# get gene symbols
genes <- getBM(attributes = c("affy_huex_1_0_st_v2", "hgnc_symbol"), filters = "affy_huex_1_0_st_v2", values = probes, mart = mart.hs)
genes
#  affy_huex_1_0_st_v2 hgnc_symbol
#1             2315638     B3GALT6
#2             2315642     B3GALT6
#3             2315639     B3GALT6
#4             2315643     B3GALT6
#5             2315644     B3GALT6
#6             2315640     B3GALT6
#7             2315637     B3GALT6
#8             2315645     B3GALT6
#9             2315641     B3GALT6
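
The result above is still at the probeset level, whereas the question is about transcript clusters. One way to collapse it, sketched here with small toy data frames standing in for `huex` and `genes` (the values are copied from the example output; the object names `huex_map`, `merged` and `cluster_genes` are only illustrative), is to join on the probeset ID and keep the unique cluster/symbol pairs:

```r
# Toy stand-ins for the tables built above; values come from the example rows.
# huex_map mimics two columns of the annotation file, genes mimics the getBM() result.
huex_map <- data.frame(
  probeset_id = c(2315637, 2315638, 2315639),
  transcript_cluster_id = c(2315633, 2315633, 2315633)
)
genes <- data.frame(
  affy_huex_1_0_st_v2 = c(2315638, 2315639, 2315637),
  hgnc_symbol = c("B3GALT6", "B3GALT6", "B3GALT6"),
  stringsAsFactors = FALSE
)

# Join on probeset ID, then keep one row per (transcript cluster, symbol) pair.
merged <- merge(huex_map, genes,
                by.x = "probeset_id", by.y = "affy_huex_1_0_st_v2")
cluster_genes <- unique(merged[, c("transcript_cluster_id", "hgnc_symbol")])
rownames(cluster_genes) <- NULL
cluster_genes
```

With the real tables, the same `merge()` call should work for all ten IDs at once if `probes` is built with `transcript_cluster_id %in% your_ids` instead of a single `==` comparison. A cluster that maps to more than one symbol would simply appear on several rows.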

For more information, search this site for "biomart".
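
As an alternative that avoids the web query, the annotation file itself carries gene assignments: the tenth field in the grep output above has the form "accession // symbol /// accession // symbol". A minimal sketch of a parser for that field follows. The helper name `parse_gene_symbols` is hypothetical, and treating "---" as the marker for a missing assignment is an assumption about the file that is worth checking against your copy.

```r
# Extract unique gene symbols from an Affymetrix-style assignment string.
# Entries are separated by " /// " and fields within an entry by " // ";
# in the example rows the second field of each entry is the gene symbol.
parse_gene_symbols <- function(assignment) {
  # Assumption: "---" marks a probeset with no gene assignment.
  if (is.na(assignment) || assignment == "---") return(character(0))
  entries <- strsplit(assignment, " /// ", fixed = TRUE)[[1]]
  fields <- strsplit(entries, " // ", fixed = TRUE)
  symbols <- vapply(
    fields,
    function(f) if (length(f) >= 2) trimws(f[2]) else NA_character_,
    character(1)
  )
  unique(symbols[!is.na(symbols)])
}

# String copied from the grep output above.
parse_gene_symbols("NM_080605 // B3GALT6 /// ENST00000379198 // B3GALT6")
```

Applied to the relevant column of `huex` (probably named `gene_assignment`, but check the header line of the CSV) for the rows of one transcript cluster, this gives the symbols without going through biomaRt, at the cost of relying on the annotation release you downloaded.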