Mapping Between Affymetrix Features To Genes
10.2 years ago
chengzhao41 ▴ 110

I'm working with gene expression data. The platform is [HuEx-1_0-st] Affymetrix Human Exon 1.0 ST Array [transcript (gene) version].

My questions are:

1) What do the features represent? "2315554" "2315633" "2315674" "2315739" "2315894" "2315918" "2315951" "2316218" "2316245" "2316379"

2) How do I map them to genes?

I tried using DAVID to convert the identifiers to gene symbols, but it is unable to convert some of them.

I then tried using NetAffx, but I do not know what to select for the "Query For" option: Transcript Clusters, Exon Probe set, or probe set.

Is there an automated way of getting the gene symbols for them in R?

affymetrix gene-expression microarray • 13k views
10.2 years ago
Neilfws 49k
  1. It's always difficult to "guess the identifier" without additional context, but what I think you have there are Affymetrix transcript cluster IDs.

  2. You should first download a probeset annotation file from Affymetrix (account required). In your case, I think this is the appropriate page. Scroll down to "Archived NetAffx Annotation Files".

I downloaded the zip file at the link HuEx-1_0-st-v2 Probeset Annotations, CSV Format, Release 32 (40 MB, 6/23/11) and unzipped it. Here's part of a grep for one of your IDs:

grep 2315633 HuEx-1_0-st-v2.na32.hg19.probeset.csv
"2315637","chr1","+","1167620","1167657","4","2315633","297","407","NM_080605 // B3GALT6 /// ENST00000379198 // B3GALT6","NM_080605 // chr1 // 100 // 1 // 1 // 0 /// ENST00000379198 // chr1 // 100 // 1 // 1 // 0","3","2","4","1","extended","0","0","0","0","0","1","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","main"
"2315638","chr1","+","1167689","1167804","4","2315633","297","408","NM_080605 // B3GALT6 /// ENST00000379198 // B3GALT6","NM_080605 // chr1 // 100 // 4 // 4 // 0 /// ENST00000379198 // chr1 // 100 // 4 // 4 // 0","1","2","0","2","core","0","0","1","2","0","2","0","0","1","0","0","0","0","0","0","0","0","0","0","0","0","0","main"
"2315639","chr1","+","1167873","1167951","4","2315633","297","409","NM_080605 // B3GALT6 /// ENST00000379198 // B3GALT6","NM_080605 // chr1 // 100 // 4 // 4 // 0 /// ENST00000379198 // chr1 // 100 // 4 // 4 // 0","1","2","0","2","core","0","0","1","2","1","4","0","0","1","0","1","1","0","1","0","0","0","0","0","0","0","0","main"

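Column 1 is the probeset ID and column 7 is the transcript cluster ID. The problem is that few ID conversion systems use transcript cluster IDs, but many use probeset IDs. So you could first look up the probeset IDs that belong to a transcript cluster and then convert those, for example with the R biomaRt package as follows: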

library(biomaRt)
# connect to the Ensembl human gene mart
mart.hs <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# get probeset IDs for transcript cluster 2315633
huex <- read.table("~/Downloads/HuEx-1_0-st-v2.na32.hg19.probeset.csv", sep = ",", stringsAsFactors = FALSE, header = TRUE)
probes <- subset(huex, transcript_cluster_id == "2315633")$probeset_id
# get gene symbols
genes <- getBM(attributes = c("affy_huex_1_0_st_v2", "hgnc_symbol"), filters = "affy_huex_1_0_st_v2", values = probes, mart = mart.hs)
genes
#  affy_huex_1_0_st_v2 hgnc_symbol
#1             2315638     B3GALT6
#2             2315642     B3GALT6
#3             2315639     B3GALT6
#4             2315643     B3GALT6
#5             2315644     B3GALT6
#6             2315640     B3GALT6
#7             2315637     B3GALT6
#8             2315645     B3GALT6
#9             2315641     B3GALT6

For more information, search this site for "biomart".
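
The annotation rows shown in the grep output already carry a gene assignment string (for example "NM_080605 // B3GALT6 /// ENST00000379198 // B3GALT6"), so the symbol can also be pulled out offline without biomaRt. The following is only a minimal sketch: the three rows are copied from the grep output above, while the column names `gene_assignment` and `transcript_cluster_id` and the helper `extract_symbols` are assumptions made for illustration. It also assumes that the second " // "-separated field of each entry is the gene symbol, as it is in those rows.

```r
# Three rows copied from the grep output; the column names are assumed here.
annot <- data.frame(
  probeset_id = c("2315637", "2315638", "2315639"),
  transcript_cluster_id = c("2315633", "2315633", "2315633"),
  gene_assignment = rep("NM_080605 // B3GALT6 /// ENST00000379198 // B3GALT6", 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split "accession // symbol /// accession // symbol"
# into the unique symbols it mentions.
extract_symbols <- function(assignment) {
  entries <- strsplit(assignment, " /// ", fixed = TRUE)[[1]]
  fields <- strsplit(entries, " // ", fixed = TRUE)
  symbols <- vapply(
    fields,
    function(f) if (length(f) >= 2) trimws(f[2]) else NA_character_,
    character(1)
  )
  unique(symbols[!is.na(symbols)])
}

# Collapse the symbols of all probesets belonging to each transcript cluster.
cluster_symbols <- tapply(
  annot$gene_assignment,
  annot$transcript_cluster_id,
  function(x) paste(unique(unlist(lapply(x, extract_symbols))), collapse = ";")
)
cluster_symbols
```

With the real file, `annot` would be the data frame read from the probeset CSV instead of the three hand-copied rows. Rows without any gene assignment would give an empty string for their cluster.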


Neilfws: If you have an established pipeline for analyzing Exon 1.0 ST arrays (with oligo or any other package), could you please share it, or point me towards a tutorial? I tried to follow the oligo package user guide, but I found it confusing. Thanks.


Hi Neilfws, I am trying to map HuGene-2_0-st (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL16686) using the script above, but I get the following error:

Error in getBM(attributes = c("HuGene-2_0-st", "hgnc_symbol"), filters = "HuGene-2_0-st", : Invalid attribute(s): HuGene-2_0-st

I also tried appending _v1 or _v2, without success. How can I find the actual attribute name? Any suggestions? Thanks.
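
The error above arises because getBM() only accepts attribute and filter names that the mart itself defines, such as "affy_huex_1_0_st_v2" in the answer, and not the GEO platform title. One way to find the right name is to search the table returned by biomaRt's listAttributes() instead of guessing. This is a rough sketch: `find_attributes` is a hypothetical helper, and the online part is left commented out because it needs biomaRt and network access.

```r
# Hypothetical helper: filter an attribute table, such as the data frame
# returned by biomaRt::listAttributes(), by a case-insensitive pattern
# matched against its "name" and "description" columns.
find_attributes <- function(attr_table, pattern) {
  hit <- grepl(pattern, attr_table$name, ignore.case = TRUE) |
    grepl(pattern, attr_table$description, ignore.case = TRUE)
  attr_table[hit, c("name", "description")]
}

# Online usage (requires biomaRt and a reachable Ensembl server):
# library(biomaRt)
# mart.hs <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# find_attributes(listAttributes(mart.hs), "hugene")
```

Whatever name the search returns is then used for both the attributes and filters arguments of getBM(), in place of "HuGene-2_0-st".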
