Question: Get gene names from Ensembl ID or gene region
angrypigeon wrote, 8 weeks ago:

I have some RNA-Seq data with FPKM values labeled with genome positions (like chr7 52823165 52830546), Ensembl IDs (like ENSMUST00000143813), and gene symbols (like 0610005C13Rik). Is there a good programmatic way in Python to get gene names from any of this data, so I can match up the FPKM values with the actual altered genes?

Tags: python, rna-seq, ensembl

This is the mouse genome, so you should be able to use the answer posted a couple of days ago here: A: How to add gene symbol to RNA-Seq data using R

(Comment by genomax, 8 weeks ago.)
Alex Reynolds (Seattle, WA USA) wrote, 8 weeks ago:

You could use the Python mygene package with the ensembl.transcript scope:

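#!/usr/bin/env python

import sys
import mygene

mg = mygene.MyGeneInfo()

# Read one Ensembl transcript ID per line from standard input, skipping blank lines
names = [line.strip() for line in sys.stdin if line.strip()]

for name in names:
    # One query per ID, limited to mouse and returning only the gene symbol
    result = mg.query(name, scopes="ensembl.transcript", fields="symbol", species="mouse", verbose=False)
    # An ID can produce more than one hit; print every hit that carries a symbol
    for hit in result["hits"]:
        if "symbol" in hit:
            sys.stdout.write("%s\t%s\n" % (name, hit["symbol"]))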

Given a text file like names.txt:

ENSMUST00000143813
ENSMUST00000099042
ENSMUST00000073363

You could run this script like so:

$ ./map_ensembl_transcripts_to_hgnc_symbols_mm10.py < names.txt
ENSMUST00000143813      0610009L18Rik
ENSMUST00000099042      Gm10717
ENSMUST00000073363      Amtn

Installation instructions: https://pypi.org/project/mygene/

Full listing of scopes/fields here: http://docs.mygene.info/en/latest/doc/query_service.html#available-fields
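The script prints tab-separated ID/symbol pairs, but the goal in the question is to line those up with the FPKM values. A small stdlib-only sketch of that join, assuming (hypothetically) a tab-separated FPKM file with a header row whose ID column is named `transcript_id`; the function names, column names, and file names are placeholders to adapt, and the numbers in the demo are made-up toy values:

```python
#!/usr/bin/env python
"""Join 'ID<TAB>symbol' pairs onto an FPKM table (hypothetical column names)."""
import csv
import io


def load_mapping(lines):
    """Build {ensembl_id: [symbols]} from 'ID<TAB>symbol' lines, dropping duplicates."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        ensembl_id, symbol = line.split("\t", 1)
        symbols = mapping.setdefault(ensembl_id, [])
        if symbol not in symbols:
            symbols.append(symbol)
    return mapping


def annotate(fpkm_handle, mapping, id_column="transcript_id"):
    """Yield each row of a tab-separated FPKM table with an extra 'symbol' field.

    IDs with several symbols get them comma-joined; unmapped IDs get 'NA'.
    """
    for row in csv.DictReader(fpkm_handle, delimiter="\t"):
        symbols = mapping.get(row[id_column], [])
        row["symbol"] = ",".join(symbols) if symbols else "NA"
        yield row


if __name__ == "__main__":
    # Toy in-memory input; in practice pass open("mapping.tsv") and open("fpkm.tsv").
    mapping = load_mapping(["ENSMUST00000073363\tAmtn\n"])
    fpkm = io.StringIO("transcript_id\tfpkm\nENSMUST00000073363\t12.5\nUNMAPPED_ID\t3.0\n")
    for row in annotate(fpkm, mapping):
        print(row["transcript_id"], row["fpkm"], row["symbol"], sep="\t")
```

Keeping unmapped rows (marked NA) rather than dropping them makes it easy to see which FPKM entries still need a name.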

Edit: If you have a lot of genes/transcripts/etc. in names.txt, you may instead want to use querymany(), which sends all of the IDs in batched requests instead of making one query() call per ID:

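#!/usr/bin/env python

import sys
import mygene

mg = mygene.MyGeneInfo()

# Read one Ensembl transcript ID per line from standard input, skipping blank lines
names = [line.strip() for line in sys.stdin if line.strip()]

# Send all IDs in one call; querymany() batches them instead of one request per ID
results = mg.querymany(names, scopes='ensembl.transcript', fields='symbol', species='mouse', verbose=False)
# One entry per hit; unmatched IDs come back flagged 'notfound' and without a 'symbol'
for res in results:
    if 'symbol' in res:
        sys.stdout.write("%s\t%s\n" % (res['query'], res['symbol']))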
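The loop above prints one line per hit and silently skips IDs that were not found. If you would rather keep a single entry per ID and also know which IDs failed, a small helper can post-process the querymany() result list. This is a sketch: `collapse` is a hypothetical name, and it relies only on the `query`, `symbol`, and `notfound` keys that appear in querymany() results:

```python
#!/usr/bin/env python
"""Group mygene querymany() results by query ID (a sketch)."""


def collapse(results):
    """Return ({query: [unique symbols]}, [queries with no symbol]).

    querymany() returns a flat list with one dict per hit, so an ID with several
    hits shows up several times, and an unmatched ID carries 'notfound': True.
    """
    symbols = {}
    seen = {}  # insertion-ordered record of every query ID encountered
    for res in results:
        query = res["query"]
        seen[query] = True
        if res.get("notfound") or "symbol" not in res:
            continue
        hits = symbols.setdefault(query, [])
        if res["symbol"] not in hits:
            hits.append(res["symbol"])
    missing = [query for query in seen if query not in symbols]
    return symbols, missing


# Usage with the querymany() call above (needs mygene and network access):
#   results = mg.querymany(names, scopes="ensembl.transcript", fields="symbol",
#                          species="mouse", verbose=False)
#   symbols, missing = collapse(results)
```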

Eric Lim (Boston) wrote, 8 weeks ago:

Ensembl's REST API is probably the quickest option, especially for looking up a single ID or gene:

https://rest.ensembl.org/documentation/info/lookup

https://rest.ensembl.org/documentation/
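The same REST API also covers the genome-position part of the question, since it can list the genes overlapping a region. A minimal sketch using only Python's standard library and the documented `lookup/id` and `overlap/region` endpoints; the helper names are hypothetical, and it assumes your coordinates are on the mouse assembly that rest.ensembl.org currently serves (check this if your data were aligned to mm10/GRCm38, since positions can differ between assemblies):

```python
#!/usr/bin/env python
"""Minimal Ensembl REST client using only the standard library (a sketch)."""
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"


def region_from_fields(chrom, start, end):
    """Convert 'chr7', '52823165', '52830546' into Ensembl's '7:52823165-52830546'.

    Only strips a leading 'chr'; names like chrM (Ensembl: MT) are not remapped.
    """
    if chrom.startswith("chr"):
        chrom = chrom[len("chr"):]
    return "%s:%d-%d" % (chrom, int(start), int(end))


def get_json(path, params=None):
    """GET SERVER + path (plus optional query parameters) and decode the JSON body."""
    url = SERVER + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def lookup_id(ensembl_id):
    """Look up one stable ID. For gene IDs 'display_name' is the gene symbol;
    for transcript IDs it is the transcript name, and 'Parent' is the gene ID."""
    return get_json("/lookup/id/" + urllib.parse.quote(ensembl_id))


def genes_in_region(region, species="mus_musculus"):
    """Return (gene ID, gene name) pairs for genes overlapping an Ensembl region string."""
    features = get_json("/overlap/region/%s/%s" % (species, region), {"feature": "gene"})
    return [(f.get("id"), f.get("external_name")) for f in features]


# Usage (needs network access):
#   region = region_from_fields("chr7", "52823165", "52830546")
#   print(genes_in_region(region))
#   print(lookup_id("ENSMUST00000143813").get("display_name"))
```

Each call here is one HTTP request, so for thousands of IDs a batched approach (mygene's querymany(), BioMart, or the POST form of lookup/id) is gentler on the server.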

cilgaiscan (Turkey) wrote, 8 weeks ago:

Hey! I use biomaRt in R. Here is my code for it; hope it helps:

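library("biomaRt")

# Read the table that holds your Ensembl IDs (replace the path and column name with your own)
differentialexpression <- read.csv("put your file's path here", sep = ",", header = TRUE)
ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
values <- differentialexpression$columnnameensembl
# Mouse gene symbols are MGI symbols ("hgnc_symbol" is for human). For transcript IDs
# (ENSMUST...), use "ensembl_transcript_id" in both attributes and filters instead.
data <- getBM(attributes = c("ensembl_gene_id", "mgi_symbol"), filters = "ensembl_gene_id", values = values, mart = ensembl)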
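Since the question asks for Python, the same kind of BioMart query can also be sent without R. A sketch using BioMart's martservice XML interface and the standard library; it assumes transcript IDs (ENSMUST...) as input and asks for `external_gene_name`, a gene-name attribute available across Ensembl species datasets. The helper names are hypothetical, and the dataset, filter, and attribute can be changed through the keyword arguments:

```python
#!/usr/bin/env python
"""Query Ensembl BioMart's martservice XML interface from Python (a sketch)."""
import urllib.parse
import urllib.request

MARTSERVICE = "https://www.ensembl.org/biomart/martservice"


def build_query(ids, dataset="mmusculus_gene_ensembl",
                id_type="ensembl_transcript_id", name_attr="external_gene_name"):
    """Return BioMart XML asking for (ID, gene name) pairs for the given IDs.

    Assumes plain alphanumeric Ensembl IDs, so no XML escaping is done.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1">'
        '<Dataset name="%s" interface="default">'
        '<Filter name="%s" value="%s"/>'
        '<Attribute name="%s"/>'
        '<Attribute name="%s"/>'
        '</Dataset>'
        '</Query>'
    ) % (dataset, id_type, ",".join(ids), id_type, name_attr)


def parse_tsv(text):
    """Turn two-column TSV into {id: [names]}, skipping lines without a name."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[1]:
            continue
        names = mapping.setdefault(fields[0], [])
        if fields[1] not in names:
            names.append(fields[1])
    return mapping


def fetch(ids, **kwargs):
    """POST the query to martservice (needs network access) and parse the reply."""
    body = urllib.parse.urlencode({"query": build_query(ids, **kwargs)}).encode("utf-8")
    with urllib.request.urlopen(MARTSERVICE, data=body) as response:
        return parse_tsv(response.read().decode("utf-8"))
```

Sending the XML as POST form data rather than in the URL avoids URL-length limits when the ID list is long. If BioMart returns an error message instead of TSV, parse_tsv() simply yields an empty mapping, so check the raw reply when results look empty.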