Question

Converting A Long List of Ensembl IDs to Gene Symbols

2.2 years ago

bingwei990204 • 0

I have a long list (about 60,000) of Ensembl IDs that I want to convert to gene symbols and write to a txt file. It takes a few hours to get through only about 20,000 IDs. Can anyone tell me what the problem is? Below is part of my code; the variable lines in the code is the list of Ensembl IDs, as shown in the picture.

#Coding:

import mygene

mg = mygene.MyGeneInfo()
getgenedata = {}
getgenesymbol = {}

# lines is the list of Ensembl IDs (shown in the picture).
# Open the output file once instead of reopening it on every iteration.
with open('data collector3.txt', 'a') as newfile:
    for k in range(100):
        # One request per ID. Output: {'_id': '7105', '_version': 2, 'symbol': 'TSPAN6'}
        getgenedata[k] = mg.getgene(lines[k], fields='symbol')

        # Keep only the symbol; _id and _version are not needed.
        if getgenedata[k] is not None:
            getgenesymbol[k] = getgenedata[k].get('symbol')
            newfile.write(str(getgenesymbol[k]))
            newfile.write('\n')

huge Python Ensembl Gene Symbols • 1.6k views

updated 2.2 years ago by Shred ★ 1.4k • written 2.2 years ago by bingwei990204 • 0

Use BioMart to get a file containing the gene IDs and their symbols, sort both your file and the Ensembl file on the ID, and use join. https://linux.die.net/man/1/join
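Since the question is in Python, the same join can be sketched there with a dictionary lookup instead of sort plus join. This is only a minimal sketch under assumptions: the BioMart export is taken to be a two-column TSV (Ensembl gene ID, gene symbol), the function names load_mapping and convert_ids are hypothetical, and the two-row table below is an in-memory stand-in for the real export.

```python
import csv
import io


def load_mapping(handle):
    """Read a two-column TSV (Ensembl gene ID, gene symbol) into a dict."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip short rows and genes that have no symbol.
        if len(row) >= 2 and row[1]:
            mapping[row[0]] = row[1]
    return mapping


def convert_ids(ids, mapping):
    """Return (id, symbol) pairs, in input order, for IDs found in the mapping."""
    return [(gene_id, mapping[gene_id]) for gene_id in ids if gene_id in mapping]


# Assumed stand-in for a BioMart export; a real file would be opened with open().
example_tsv = "ENSG00000000003\tTSPAN6\nENSG00000000005\tTNMD\n"
mapping = load_mapping(io.StringIO(example_tsv))

# The last ID is made up to show that unmatched IDs are simply dropped.
pairs = convert_ids(
    ["ENSG00000000005", "ENSG00000000003", "ENSG99999999999"], mapping
)
```

Because the whole table is loaded once and every lookup is local, no per-ID network request is involved. A header row in the export would just end up as one extra, harmless dictionary entry.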

2.2 years ago by Pierre Lindenbaum 161k

This has been asked many times before. Please use the search function and Google for it, e.g. Translating gene names to entrez id's

2.2 years ago by ATpoint 81k

score 0 · Answer 1 · 2022-02-12

I wrote this script to download a TSV of Ensembl gene_id to gene symbol mappings for any organism supported by Ensembl. Run it via:

python3 query_ensembl.py --organism homo_sapiens

As others have said, this question has been asked many times and has many possible solutions. Since you're working with Python, the provided script may be an alternative to the REST API, and also a way to learn how to handle the GTF file format easily.
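For readers who only want the GTF idea, here is a minimal sketch of pulling gene_id and gene_name out of gene records. It is not the linked script: the function name gtf_gene_symbols is hypothetical, it assumes the usual Ensembl GTF layout (nine tab-separated columns, with key "value"; pairs in the last one), and the coordinates in the example lines are illustrative only.

```python
import re

# Matches attribute pairs such as: gene_id "ENSG00000000003";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_gene_symbols(lines):
    """Map gene_id to gene_name using only the 'gene' records of a GTF."""
    symbols = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # skip transcripts, exons, and malformed lines
        attrs = dict(ATTR_RE.findall(cols[8]))
        # Some genes have no gene_name attribute; those are skipped.
        if "gene_id" in attrs and "gene_name" in attrs:
            symbols[attrs["gene_id"]] = attrs["gene_name"]
    return symbols


# Assumed example input; a real run would iterate over an opened GTF file.
example_gtf = [
    "#!genome-build GRCh38\n",
    'X\tensembl_havana\tgene\t100\t200\t.\t-\t.\t'
    'gene_id "ENSG00000000003"; gene_version "15"; gene_name "TSPAN6"; '
    'gene_biotype "protein_coding";\n',
    'X\tensembl_havana\ttranscript\t100\t200\t.\t-\t.\t'
    'gene_id "ENSG00000000003"; transcript_id "ENST00000373020";\n',
]
symbols = gtf_gene_symbols(example_gtf)
```

The resulting dictionary can be written out as a TSV, or used directly to look up the symbols for a list of Ensembl IDs.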