Question

Annotating exon array probeset IDs and transcript cluster IDs using biomaRt

10.7 years ago

cqtnljy • 0

Hello everybody!

This is my first time working with the Affymetrix Human Exon 1.0 ST array. I use Affymetrix Power Tools (APT) and R, which gives me two ExpressionSets, one at the exon level and one at the gene level, but I have a question about annotating them with biomaRt.

I can annotate the probeset IDs in the exon-level ExpressionSet with biomaRt,

eg:

library(biomaRt)
ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
# 2315588 is a probeset ID
getBM(attributes=c('affy_huex_1_0_st_v2', 'hgnc_symbol'), filters="affy_huex_1_0_st_v2", values="2315588", mart=ensembl)

but I cannot annotate the transcript cluster IDs in the gene-level ExpressionSet with biomaRt alone.

eg:

# 2316379 is a transcript_cluster_id
getBM(attributes=c('affy_huex_1_0_st_v2', 'hgnc_symbol'), filters="affy_huex_1_0_st_v2", values="2316379", mart=ensembl)

Is there no direct annotation for transcript_cluster_id in biomaRt, or is there an error in my code?

Thank you very much in advance

annotation biomaRT array exon • 5.1k views

ADD COMMENT • link updated 4.3 years ago by Ram 45k • written 10.7 years ago by cqtnljy • 0

Ram · Answer 1 · 2014-10-29

10.7 years ago

komal.rathi ★ 4.1k

cqtnljy

Your code is correct. The attribute you are using, 'affy_huex_1_0_st_v2', contains only the probeset IDs for the Affy Human Exon 1.0 ST array, which is exactly why you could retrieve data with a probeset ID. In fact, it is the only ID type available for this array in biomaRt. You cannot search by transcript cluster ID because there is no attribute associated with it in biomaRt.
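One way to check this yourself is to list the filters that the mart exposes for the array. This is a minimal sketch, assuming the same `ensembl` mart as in the question and a working connection to the Ensembl servers; the exact set of filters returned depends on the Ensembl release you connect to.

```r
library(biomaRt)

ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# listFilters() returns a data frame with 'name' and 'description' columns.
filters <- listFilters(ensembl)

# Keep only the filters whose name mentions the Human Exon array.
filters[grepl("huex", filters$name, ignore.case = TRUE), ]
```

If only a probeset-level filter shows up, transcript cluster IDs have to be annotated from another source.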

UPDATE

Alternative Method using Bioconductor AnnotationData Packages:

    source("http://bioconductor.org/biocLite.R")
    biocLite('huex10sttranscriptcluster.db')
    library(huex10sttranscriptcluster.db)

    Annot <- data.frame(SYMBOL=sapply(contents(huex10sttranscriptclusterSYMBOL), paste, collapse=","),
                        DESC=sapply(contents(huex10sttranscriptclusterGENENAME), paste, collapse=","),
                        ENSEMBLID=sapply(contents(huex10sttranscriptclusterENSEMBL), paste, collapse=","))

    # The rownames are transcript cluster IDs here, so you can access your ID like:

    Annot[grep('2316379',rownames(Annot)),]

            SYMBOL               DESC       ENSEMBLID
    2316379    SKI SKI proto-oncogene ENSG00000157933
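If you only need a handful of IDs, the same package can also be queried through the AnnotationDbi `select()` and `mapIds()` interface instead of building the full table. This is a sketch, assuming the package is already installed and that transcript cluster IDs are stored under the `PROBEID` key type, as is usual for Bioconductor chip annotation packages.

```r
library(AnnotationDbi)
library(huex10sttranscriptcluster.db)

ids <- c("2316379")  # transcript cluster IDs, as character strings

# One row per (ID, annotation) combination, so an ID that maps to
# several genes appears on several rows.
AnnotationDbi::select(huex10sttranscriptcluster.db,
                      keys = ids,
                      columns = c("SYMBOL", "GENENAME", "ENSEMBL"),
                      keytype = "PROBEID")

# Exactly one symbol per ID: multiVals = "first" keeps the first match.
mapIds(huex10sttranscriptcluster.db,
       keys = ids,
       column = "SYMBOL",
       keytype = "PROBEID",
       multiVals = "first")
```

Calling `select()` with the `AnnotationDbi::` prefix avoids a clash if dplyr is also loaded.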

ADD COMMENT • link updated 4.3 years ago by Ram 45k • written 10.7 years ago by komal.rathi ★ 4.1k

komal.rathi ,

Thanks very much! Is there a simple way to convert the transcript cluster IDs to gene symbols? In the 'HuGene-1_0-st-v1.na33.2.hg19.transcript.csv' file, many symbols map to a single transcript cluster ID in the gene_assignment column, and I don't know how to deal with that.
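For the gene_assignment column specifically, one possible approach is to split each entry and keep the unique symbols. The sketch below assumes the usual layout of that column, where assignments are separated by " /// " and the fields within an assignment by " // ", with the gene symbol in the second field. `extract_symbols` is a hypothetical helper name and the input string is made-up toy data, not taken from the real file.

```r
# Hypothetical helper: collapse one gene_assignment entry to its unique symbols.
# Assumes " /// " separates assignments and " // " separates fields,
# with the gene symbol in the second field and "---" meaning "no annotation".
extract_symbols <- function(gene_assignment) {
  if (is.na(gene_assignment) || gene_assignment == "---") {
    return(NA_character_)
  }
  assignments <- strsplit(gene_assignment, " /// ", fixed = TRUE)[[1]]
  fields <- strsplit(assignments, " // ", fixed = TRUE)
  symbols <- vapply(fields,
                    function(x) if (length(x) >= 2) x[2] else NA_character_,
                    character(1))
  symbols <- unique(symbols[!is.na(symbols) & symbols != "---"])
  if (length(symbols) == 0) {
    return(NA_character_)
  }
  paste(symbols, collapse = ",")
}

# Toy entry: three assignments covering two distinct symbols.
toy <- paste("ACC1 // GENEA // description A // 1p36 // 111",
             "ACC2 // GENEA // description A // 1p36 // 111",
             "ACC3 // GENEB // description B // 1p36 // 222",
             sep = " /// ")
extract_symbols(toy)  # "GENEA,GENEB"
```

Applied to the whole file, this would be something like `sapply(annot$gene_assignment, extract_symbols)`, where `annot` is the data frame read from the CSV. Whether to keep all symbols or only the first is a judgment call that depends on the downstream analysis.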

ADD REPLY • link updated 4.3 years ago by Ram 45k • written 10.7 years ago by cqtnljy • 0

In your question you are referring to HuGene 1.0 st v2 and here you are looking at v1. Which one are you working on exactly? v1 and v2 have different probe/transcript IDs.

ADD REPLY • link updated 4.3 years ago by Ram 45k • written 10.7 years ago by komal.rathi ★ 4.1k

Can you show some Transcript Cluster IDs that you are working on? That might tell us what annotation you are really working on.

ADD REPLY • link updated 4.3 years ago by Ram 45k • written 10.7 years ago by komal.rathi ★ 4.1k

Anyway, I have updated my answer. Please check.

ADD REPLY • link updated 4.3 years ago by Ram 45k • written 10.7 years ago by komal.rathi ★ 4.1k

I am working on v2; that file name was a mistake. I used your updated code to convert the transcript cluster IDs to gene symbols successfully! Thank you so much!

ADD REPLY • link updated 4.3 years ago by Ram 45k • written 10.7 years ago by cqtnljy • 0