How to create a transcript-gene association matrix using RefSeq IDs?
10.1 years ago
jack ▴ 980

Hi all,

I have transcript and gene IDs in RefSeq format like this:

Gene IDs:

ZNF498   IL11RA    KIF2A    NCOA3 ....

Transcript IDs:

NM_152486 NM_015658 NM_198317     NM_032129

I want to create a matrix which associates each transcript with its gene. I looked at the RefSeq database, but I couldn't find a file which contains genes and their transcripts in RefSeq ID format. I don't want to convert my IDs to another format, because I lose some of them in the conversion.

Could someone help me with how to do this?

gene next-gen RNA-Seq R • 2.4k views
10.1 years ago
komal.rathi ★ 4.1k

You could use biomaRt to get a table of RefSeq transcript IDs and gene symbols:

library(biomaRt)
ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl")
results = getBM(attributes = c('refseq_mrna','hgnc_symbol'), mart = ensembl)

or if you have a list of Refseq Transcript IDs, say refseq_transcript_ID, then you can use:

results = getBM(attributes = c('refseq_mrna','hgnc_symbol'), filters = 'refseq_mrna', values = refseq_transcript_ID, mart = ensembl)

Alternatively, if you want a 'ready made' file with Transcript IDs and Gene Symbols, you can use gene2refseq.gz. The fields you are interested in are given under the names RNA_nucleotide_accession.version & Symbol.
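As a rough sketch of how that file could be reduced to a two-column table in R: the snippet below works on a small mock that imitates only four of the file's columns, and every row in it is an invented placeholder. It assumes that missing values are written as "-" and that human rows carry taxonomy ID 9606; the names `mock`, `g2r`, `human` and `map` are made up for illustration. Since the IDs in the question have no version suffix, the sketch also strips it.

```r
# Mock of a few gene2refseq columns (tab-separated). The rows are invented
# placeholders; the real file has more columns and covers all taxa.
mock <- paste(
  "#tax_id\tGeneID\tRNA_nucleotide_accession.version\tSymbol",
  "9606\t1\tNM_fake1.2\tGENE_A",
  "9606\t1\tNM_fake2.1\tGENE_A",
  "9606\t2\t-\tGENE_B",
  "10090\t3\tNM_fake3.4\tGene_c",
  sep = "\n"
)

# check.names = FALSE keeps the header names exactly as written in the file.
g2r <- read.delim(text = mock, check.names = FALSE, stringsAsFactors = FALSE)

# Keep human rows (taxonomy ID 9606) that actually have an RNA accession.
# Assumption: "-" marks a missing value.
keep <- g2r[["#tax_id"]] == 9606 &
  g2r[["RNA_nucleotide_accession.version"]] != "-"
human <- g2r[keep, ]

# Strip the ".N" version suffix and drop duplicate transcript/gene pairs.
map <- unique(data.frame(
  refseq_mrna = sub("\\.[0-9]+$", "", human[["RNA_nucleotide_accession.version"]]),
  symbol = human[["Symbol"]],
  stringsAsFactors = FALSE
))
print(map)
```

For the real file you would replace the `text = mock` argument with something like `gzfile("gene2refseq.gz")`, assuming the file has been downloaded to the working directory. It is large, so filtering it on the command line first may be more practical.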

And yes, RefSeq transcript ID to gene symbol is a many-to-one relationship: several transcripts can map to the same gene.
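Since the question asks for a matrix, here is a minimal sketch of turning such a two-column table into a 0/1 association matrix with base R. The data frame below is a toy stand-in with the same column names as the biomaRt attributes used above; its IDs and symbols are made-up placeholders, and `assoc` is just an illustrative name.

```r
# Toy mapping table with the two columns requested from getBM() above.
# The IDs and symbols are invented placeholders, not real RefSeq entries.
results <- data.frame(
  refseq_mrna = c("NM_fake1", "NM_fake2", "NM_fake3", "NM_fake4"),
  hgnc_symbol = c("GENE_A", "GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Drop rows with an empty transcript ID or symbol, then duplicate pairs.
results <- results[results$refseq_mrna != "" & results$hgnc_symbol != "", ]
results <- unique(results)

# Cross-tabulate: rows are transcripts, columns are genes, 1 = associated.
assoc <- unclass(table(results$refseq_mrna, results$hgnc_symbol))
print(assoc)
```

With a clean many-to-one mapping each row contains a single 1, so `rowSums(assoc)` is a quick sanity check. For the full human transcript set a dense matrix like this gets large, so a sparse representation may be preferable.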
