Question: Get gene names from ensembl ID or gene region
angrypigeon wrote:

I have some RNA-seq data with FPKM values labeled by genome positions (e.g. chr7 52823165 52830546), Ensembl IDs (e.g. ENSMUST00000143813), and gene symbols (e.g. 0610005C13Rik). Is there a good programmatic way in Python to get gene names from any of these, so I can match up the FPKM values with the actual altered genes?

python rna-seq ensembl • 208 views
written 11 days ago by angrypigeon • modified 10 days ago by Alex Reynolds

This is the mouse genome, so you should be able to use the answer posted a couple of days ago here: A: How to add gene symbol to RNA-Seq data using R

written 11 days ago by genomax

Alex Reynolds wrote:

You could use the Python mygene package with the ensembl.transcript scope:

#!/usr/bin/env python

import sys
import mygene

mg = mygene.MyGeneInfo()

# Read one Ensembl transcript ID per line from stdin, skipping blank lines
names = [line.strip() for line in sys.stdin if line.strip()]

for name in names:
    # One web request per ID, restricted to mouse
    result = mg.query(name, scopes="ensembl.transcript", fields=["symbol"], species="mouse", verbose=False)
    for hit in result["hits"]:
        if "symbol" in hit:
            # Output: transcript ID, tab, gene symbol
            sys.stdout.write("%s\t%s\n" % (name, hit["symbol"]))

Given a text file like names.txt:

ENSMUST00000143813
ENSMUST00000099042
ENSMUST00000073363

You could run this script like so:

$ ./map_ensembl_transcripts_to_hgnc_symbols_mm10.py < names.txt
ENSMUST00000143813      0610009L18Rik
ENSMUST00000099042      Gm10717
ENSMUST00000073363      Amtn

Installation instructions: https://pypi.org/project/mygene/

Full listing of scopes/fields here: http://docs.mygene.info/en/latest/doc/query_service.html#available-fields

Edit: If names.txt contains a lot of IDs, you may instead want to use querymany(), which sends all IDs in one batched call instead of running one query() call per ID:

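#!/usr/bin/env python

import sys
import mygene

mg = mygene.MyGeneInfo()

# Read one Ensembl transcript ID per line from stdin, skipping blank lines
names = [line.strip() for line in sys.stdin if line.strip()]

# Batched lookup: each result carries the original ID under 'query'
results = mg.querymany(names, scopes='ensembl.transcript', fields='symbol', species='mouse', verbose=False)
for res in results:
    # IDs without a match have no 'symbol' key and are skipped
    if 'symbol' in res:
        sys.stdout.write("%s\t%s\n" % (res['query'], res['symbol']))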
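
Once you have the two-column output from either script, matching it back to the FPKM values is a dictionary lookup. The following is only a minimal sketch: the helper names load_symbol_map and annotate_fpkm are hypothetical, the FPKM numbers are made up for illustration, and it assumes your expression table can be read as (transcript ID, FPKM) pairs.

```python
#!/usr/bin/env python

def load_symbol_map(lines):
    """Parse 'transcript_id<TAB>symbol' lines into a dict."""
    mapping = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        # Skip blank or malformed lines
        if len(parts) < 2 or not parts[0]:
            continue
        mapping[parts[0]] = parts[1]
    return mapping


def annotate_fpkm(rows, symbol_map, missing="NA"):
    """Yield (transcript_id, symbol, fpkm) for each (transcript_id, fpkm) row.

    IDs that are absent from symbol_map get the `missing` placeholder.
    """
    for transcript_id, fpkm in rows:
        yield transcript_id, symbol_map.get(transcript_id, missing), fpkm


# Mapping lines as written by the scripts above
symbol_lines = [
    "ENSMUST00000143813\t0610009L18Rik\n",
    "ENSMUST00000099042\tGm10717\n",
    "ENSMUST00000073363\tAmtn\n",
]

# Hypothetical FPKM values, for illustration only
fpkm_rows = [
    ("ENSMUST00000143813", 12.5),
    ("ENSMUST00000073363", 0.8),
]

symbols = load_symbol_map(symbol_lines)
for transcript_id, symbol, fpkm in annotate_fpkm(fpkm_rows, symbols):
    print("%s\t%s\t%s" % (transcript_id, symbol, fpkm))
```

Keeping unmatched IDs with a placeholder, rather than dropping them, makes it easier to see which transcripts did not resolve to a symbol.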
modified 10 days ago • written 10 days ago by Alex Reynolds

Eric Lim wrote:

The Ensembl REST API is probably the quickest option, especially for a single ID or gene lookup:

https://rest.ensembl.org/documentation/info/lookup

https://rest.ensembl.org/documentation/
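
As a rough sketch of how those endpoints could be called from Python using only the standard library: the helper names below are hypothetical, and the response fields they read ("id", "object_type", "display_name" for lookup; "external_name" and "gene_id" for overlap) are what I understand the service to return, so check them against the documentation above. Note that a region query only makes sense if your coordinates are on the same assembly the server uses, and that Ensembl names chromosomes without the "chr" prefix.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def lookup_url(ensembl_id):
    """URL for the lookup/id endpoint (works for gene or transcript IDs)."""
    return "%s/lookup/id/%s" % (SERVER, ensembl_id)


def region_url(species, chrom, start, end):
    """URL for genes overlapping a region; strips a leading 'chr'."""
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return "%s/overlap/region/%s/%s:%d-%d?feature=gene" % (
        SERVER, species, chrom, start, end)


def fetch_json(url):
    """GET a URL and decode the JSON body (needs network access)."""
    request = urllib.request.Request(
        url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize_lookup(record):
    """Pick out (id, object type, display name) from a lookup record."""
    return (record.get("id"),
            record.get("object_type"),
            record.get("display_name"))


def gene_names(features):
    """Names of overlapping genes, falling back to the gene ID."""
    return [f.get("external_name") or f.get("gene_id") for f in features]


# Usage (commented out because it needs network access):
# print(summarize_lookup(fetch_json(lookup_url("ENSMUST00000143813"))))
# print(gene_names(fetch_json(
#     region_url("mus_musculus", "chr7", 52823165, 52830546))))
```

For more than a handful of IDs, one request per ID gets slow, so a batched approach is likely a better fit there.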

modified 11 days ago • written 11 days ago by Eric Lim

cilgaiscan wrote:

Hey! I use biomaRt in R. Here is my code for it; I hope it helps:

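library("biomaRt")
# Read the expression table (replace the path with your own file)
differentialexpression <- read.csv("put your file's path here", sep = ",", header = TRUE)
ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
# Column of Ensembl gene IDs in your table
values <- differentialexpression$columnnameensembl
# Mouse gene symbols come from MGI, so request "mgi_symbol" rather than the human "hgnc_symbol".
# For transcript IDs (ENSMUST...), use "ensembl_transcript_id" as the attribute and filter instead.
data <- getBM(attributes = c("ensembl_gene_id", "mgi_symbol"), filters = "ensembl_gene_id", values = values, mart = ensembl)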
written 11 days ago by cilgaiscan