Converting A Long List of Ensembl IDs to Gene Symbols
2.2 years ago

I have a long list of about 60,000 Ensembl IDs, and I want to convert them into gene symbols and write them to a txt file. It takes me a few hours to process only about 20,000 IDs. Can anyone tell me what the problem is? Below is part of my code; the variable lines in the code holds the list of Ensembl IDs.

My code:

import mygene

mg = mygene.MyGeneInfo()

# lines is the list of Ensembl IDs (defined earlier, not shown)
# Open the output file once, not once per loop iteration
with open('data collector3.txt', 'a') as newfile:
    for k in range(100):
        # One HTTP request per ID, e.g. {'_id': '7105', '_version': 2, 'symbol': 'TSPAN6'}
        genedata = mg.getgene(lines[k].strip(), fields='symbol')
        # Keep only the symbol; _id and _version are not needed
        if genedata is not None:
            newfile.write(str(genedata.get('symbol')) + '\n')
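The slowness most likely comes from calling getgene once per ID: every call is a separate HTTP round trip to MyGene.info, so 60,000 IDs mean 60,000 requests. The mygene client's querymany accepts the whole list and sends it in batches. The sketch below assumes human genes, unversioned gene IDs (e.g. ENSG00000000003 rather than ENSG00000000003.14), and an input file with one ID per line; the helper name symbol_lines and the file names are hypothetical.

```python
# Batch Ensembl gene ID -> symbol lookup with mygene (a sketch, not the poster's code).


def symbol_lines(results):
    """Turn querymany-style result dicts into 'ID<TAB>symbol' lines.

    IDs that were not found, or hits without a symbol, get 'NA'.
    An ID with several hits produces one line per hit.
    """
    out = []
    for hit in results:
        symbol = 'NA' if hit.get('notfound') else hit.get('symbol', 'NA')
        out.append(f"{hit['query']}\t{symbol}")
    return out


def main(input_path, output_path):
    # Imported here so symbol_lines() can be used without mygene installed.
    import mygene

    with open(input_path) as fh:
        ids = [line.strip() for line in fh if line.strip()]

    mg = mygene.MyGeneInfo()
    # A single call for the whole list; the client posts the IDs in batches
    # instead of making one request per ID.
    results = mg.querymany(ids, scopes='ensembl.gene', fields='symbol',
                           species='human')

    with open(output_path, 'w') as out:
        for row in symbol_lines(results):
            out.write(row + '\n')


# Usage (hypothetical file names, one Ensembl gene ID per line in the input):
# main('ensembl_ids.txt', 'gene_symbols.txt')
```

Writing an explicit NA for misses keeps the output aligned with the input IDs, so you can tell which IDs had no symbol instead of silently dropping them.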

Use BioMart to download a file containing the gene IDs and their symbols, sort both your file and the BioMart file on the ID, and then use join: https://linux.die.net/man/1/join
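If you would rather stay in Python, the sort + join step can be replaced by a dictionary lookup, which needs no sorting. This is a minimal sketch, assuming a two-column tab-separated BioMart export (gene stable ID, gene name) with one header row; the function names are hypothetical, and stripping a version suffix such as .14 from your IDs is an assumption about your input.

```python
# Map Ensembl gene IDs to symbols using a BioMart export (a sketch).


def load_mapping(mart_lines, skip_header=True):
    """Build {gene_id: symbol} from tab-separated BioMart rows.

    Assumes column 1 is the gene stable ID and column 2 the gene name.
    """
    mapping = {}
    for i, line in enumerate(mart_lines):
        if skip_header and i == 0:
            continue  # BioMart exports start with a header row
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        gene_id, symbol = fields[0], fields[1]
        mapping[gene_id] = symbol or 'NA'  # empty gene name -> NA
    return mapping


def translate(ids, mapping):
    """Return (original_id, symbol) pairs; unknown IDs get 'NA'."""
    # Drop a version suffix like '.14' before looking the ID up.
    return [(gene_id, mapping.get(gene_id.split('.')[0], 'NA')) for gene_id in ids]
```

A dictionary holding every human gene fits easily in memory, and each lookup is constant time, so 60,000 IDs translate in well under a second once the file is loaded.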


This has been asked many times before; please use the search function and Google, e.g. Translating gene names to entrez id's

2.2 years ago
Shred ★ 1.4k

I wrote a script (query_ensembl.py) that downloads a TSV of Ensembl gene_id to gene symbol mappings for any organism supported by Ensembl. Run it via:

python3 query_ensembl.py --organism homo_sapiens

As others have said, this question has been asked many times and has many possible solutions. Since you're working in Python, the script may be an alternative to the REST API, and it also shows how to handle the GTF file format.
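The thread does not include query_ensembl.py itself, so the following is not that script. It is a minimal, hypothetical sketch of the GTF-handling part, assuming an Ensembl GTF (plain or gzipped) whose gene lines carry gene_id and gene_name attributes in the ninth column; all function names are illustrative.

```python
# Extract gene_id -> gene_name pairs from an Ensembl GTF (a sketch).
import gzip
import re

# Matches key "value" pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Parse an attribute column such as 'gene_id "X"; gene_name "Y";'."""
    return dict(ATTR_RE.findall(field))


def gene_symbols_from_gtf(lines):
    """Yield (gene_id, gene_name) for every 'gene' feature line."""
    for line in lines:
        if line.startswith('#'):
            continue  # header lines such as '#!genome-build'
        cols = line.rstrip('\n').split('\t')
        if len(cols) < 9 or cols[2] != 'gene':
            continue  # skip transcripts, exons, etc.
        attrs = parse_attributes(cols[8])
        if 'gene_id' not in attrs:
            continue
        # Some genes have no gene_name attribute; report NA for those.
        yield attrs['gene_id'], attrs.get('gene_name', 'NA')


def gtf_to_tsv(gtf_path, tsv_path):
    """Write a gene_id<TAB>gene_name table from a (possibly gzipped) GTF."""
    opener = gzip.open if gtf_path.endswith('.gz') else open
    with opener(gtf_path, 'rt') as gtf, open(tsv_path, 'w') as out:
        for gene_id, name in gene_symbols_from_gtf(gtf):
            out.write(f'{gene_id}\t{name}\n')
```

Filtering on the gene feature keeps one row per gene; transcript and exon lines repeat the same gene_id and would otherwise produce duplicates.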
