Tool: Python script to query GeneCards for the Entrez ID and gene symbol of an Ensembl gene ID
Posted 2.8 years ago by Shred

Hi guys, GeneCards is one of the most comprehensive repositories of gene information. In RNA-seq analysis, the common task of converting Ensembl gene IDs to Entrez IDs with tools like biomaRt may leave many genes without a corresponding Entrez ID.

Although most of the lost genes are pseudogenes or very poorly characterized genes, you may want to evaluate this loss, especially if some of the lost genes appear in the DEG list.
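To decide whether the loss matters, it helps to measure it first. Below is a minimal sketch, assuming you already have a dict that maps Ensembl gene IDs to Entrez IDs (for example, exported from a biomaRt query), with a missing key or an empty/`None` value for genes that did not map. `unmapped_genes` and `loss_summary` are hypothetical helper names, not part of the script.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def unmapped_genes(
    ensembl_ids: Iterable[str],
    ensembl_to_entrez: Dict[str, Optional[str]],
) -> List[str]:
    """Return the Ensembl IDs with no Entrez ID (key missing or value empty/None)."""
    return [g for g in ensembl_ids if not ensembl_to_entrez.get(g)]


def loss_summary(
    all_genes: List[str],
    deg_genes: List[str],
    ensembl_to_entrez: Dict[str, Optional[str]],
) -> Tuple[float, List[str]]:
    """Return (fraction of all genes left unmapped, unmapped genes in the DEG list)."""
    lost = unmapped_genes(all_genes, ensembl_to_entrez)
    fraction_lost = len(lost) / len(all_genes) if all_genes else 0.0
    return fraction_lost, unmapped_genes(deg_genes, ensembl_to_entrez)
```

The unmapped DEGs returned here are the natural candidates to pass to the GeneCards script, which keeps the query list short.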

I've written this quick Python script to query GeneCards and recover information about these uncharacterized genes. Please keep in mind that it is not meant to replace biomaRt or other tools: GeneCards implements protection against automated queries, so it is impossible to assign a corresponding ID to every gene in your analysis. I've tested it with lists of 150-200 genes and it seems to work. If you keep getting the same error, try a smaller gene list.
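Because GeneCards blocks automated queries, one way to work with small lists, sketched here under assumptions, is to throttle the requests: split the gene list into batches, pause after each request, and pause longer between batches. `query_one` stands in for whatever function fetches a single gene from GeneCards (hypothetical), and the batch size and delays are arbitrary placeholders, not documented GeneCards limits.

```python
import time
from typing import Callable, Dict, Iterator, List, Optional, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def query_in_batches(
    gene_ids: Sequence[str],
    query_one: Callable[[str], Optional[str]],  # hypothetical single-gene lookup
    batch_size: int = 50,                       # placeholder, not a GeneCards limit
    delay_s: float = 1.0,                       # pause after each request
    pause_between_batches_s: float = 30.0,      # longer pause before each new batch
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[str]]:
    """Query genes one at a time, throttled, recording None for failed lookups."""
    results: Dict[str, Optional[str]] = {}
    for i, batch in enumerate(batches(gene_ids, batch_size)):
        if i > 0:
            sleep(pause_between_batches_s)
        for gene_id in batch:
            try:
                results[gene_id] = query_one(gene_id)
            except Exception:
                # Assumption: a failed lookup should not abort the whole run.
                results[gene_id] = None
            sleep(delay_s)
    return results
```

Passing `sleep` as a parameter keeps the throttling logic testable without actually waiting.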

It can be run in gene-list mode (--list), where you pass a file with one Ensembl gene ID per line, or in single-gene mode (--gene), where you pass a single Ensembl gene ID.
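The two modes map naturally onto a mutually exclusive pair of `argparse` options. Below is a minimal sketch of that command-line handling, not the script's actual code; the helper names are hypothetical. Stripping the version suffix (e.g. `.16` in `ENSG00000141510.16`) is an assumption added here, because versioned IDs often fail to match lookups keyed on unversioned IDs.

```python
import argparse
from typing import Iterable, List, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Query GeneCards for Ensembl gene IDs (sketch).")
    # Exactly one of --list / --gene must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--list", metavar="FILE", help="file with one Ensembl gene ID per line")
    mode.add_argument("--gene", metavar="ENSEMBL_ID", help="a single Ensembl gene ID")
    return parser.parse_args(argv)


def normalize(lines: Iterable[str]) -> List[str]:
    """Strip whitespace, skip blank lines, and drop any '.N' version suffix."""
    ids = []
    for line in lines:
        gene_id = line.strip().split(".")[0]
        if gene_id:
            ids.append(gene_id)
    return ids


def gene_ids_from_args(args: argparse.Namespace) -> List[str]:
    """Return the gene IDs to query, from either --gene or the --list file."""
    if args.gene:
        return normalize([args.gene])
    with open(args.list) as handle:
        return normalize(handle)
```

Typical calls would then be `python script.py --list genes.txt` or `python script.py --gene ENSG00000141510`, where `script.py` and `genes.txt` are placeholder names.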

I'm working on bypassing the website protection while minimizing the impact on other users. Any suggestions are welcome.

Code -->

Requires BeautifulSoup

Tags: biomart, annotation, Ensembl