Access BioMart Homologs from Python?
25 days ago
ngarber ▴ 50

From within Python, I want to be able to query BioMart to return a list containing information about genes and their homologs:

  1. Source species - Stable Protein ID
  2. Source species - Gene name
  3. Source species - Protein sequence
  4. Target species - Stable Protein ID of homolog
  5. Target species - Gene name of homolog
  6. Target species - Protein sequence of homolog

For example, say I were to input:

dataset = "Ensembl Genes 107"
target_species_dataset = "Elephant genes (Loxafr3.0)"
homolog_query = "Human"

How do I feed that into BioMart so that it spits out the six parameters I listed earlier?

Thanks so much in advance if anyone can help!

sequences Python biomaRt homology Ensembl • 356 views

On top of Arup Ghosh's answer, you can also consider using the files available on the Ensembl FTP site:

Say we want all orthologous gene pairs between Human and Cow from the default Vertebrate ncRNA-trees. We could download the entire set of default Vertebrate ncRNA-trees homologies in one TSV file. For Ensembl 107 this would be located at:

This is a pretty massive file (3.2 GB), but if we filter it to keep only the rows in which 'homology_type' is an orthology (i.e. 'ortholog_one2one', 'ortholog_one2many' or 'ortholog_many2many') and 'species' and 'homology_species' are 'homo_sapiens' and 'bos_taurus' (or vice versa), we get a reasonably sized file of Human-Cow orthologues.
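A minimal sketch of that filtering step, using only the standard library and streaming the file row by row so the whole thing never has to fit in memory. The column names 'homology_type', 'species' and 'homology_species' are the ones mentioned above; the function name and the input/output paths are placeholders I made up, so point them at wherever you saved the download.

```python
import csv
import gzip

# Orthology categories named in the answer above.
ORTHOLOGY_TYPES = {"ortholog_one2one", "ortholog_one2many", "ortholog_many2many"}


def _open_text(path, mode):
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode + "t", newline="")
    return open(path, mode, newline="")


def filter_orthologs(in_path, out_path,
                     species_a="homo_sapiens", species_b="bos_taurus"):
    """Copy only orthology rows between the two species (either direction).

    Returns the number of rows written, not counting the header.
    """
    wanted = {species_a, species_b}
    kept = 0
    with _open_text(in_path, "r") as src, _open_text(out_path, "w") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, delimiter="\t")
        writer.writeheader()
        for row in reader:
            is_orthology = row["homology_type"] in ORTHOLOGY_TYPES
            # Set comparison accepts human->cow and cow->human alike.
            is_pair = {row["species"], row["homology_species"]} == wanted
            if is_orthology and is_pair:
                writer.writerow(row)
                kept += 1
    return kept
```

Calling `filter_orthologs("homologies.tsv.gz", "human_cow_orthologs.tsv")` (both file names hypothetical) would write the reduced table and return how many rows survived.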

You can also use the language-agnostic Ensembl REST API to retrieve orthologue data programmatically using the homology endpoints. For example:
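Here is a rough Python sketch against the `homology/symbol/:species/:symbol` endpoint, asking for orthologues in one target species with unaligned protein sequences. The helper names are mine, and the JSON field names I read (`data`, `homologies`, `source`, `target`, `protein_id`, `seq`) reflect my understanding of the response layout, so check them against a real response before relying on this. As far as I know this endpoint returns gene and protein stable IDs but not gene names, so those would need a separate lookup.

```python
import json
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def homology_url(species, symbol, target_species):
    """Build the request URL for orthologues of one gene symbol."""
    query = urllib.parse.urlencode({
        "target_species": target_species,
        "type": "orthologues",
        "sequence": "protein",
        "aligned": 0,  # ask for plain sequences rather than gapped alignments
    })
    path = f"/homology/symbol/{species}/{urllib.parse.quote(symbol)}"
    return f"{REST_SERVER}{path}?{query}"


def parse_homologies(payload):
    """Flatten the (assumed) JSON layout into one dict per source/target pair."""
    rows = []
    for entry in payload.get("data", []):
        for hom in entry.get("homologies", []):
            src, tgt = hom.get("source", {}), hom.get("target", {})
            rows.append({
                "homology_type": hom.get("type"),
                "source_gene_id": src.get("id"),
                "source_protein_id": src.get("protein_id"),
                "source_protein_seq": src.get("seq") or src.get("align_seq"),
                "target_gene_id": tgt.get("id"),
                "target_protein_id": tgt.get("protein_id"),
                "target_protein_seq": tgt.get("seq") or tgt.get("align_seq"),
            })
    return rows


def fetch_orthologs(species, symbol, target_species):
    """Call the REST API (needs network access) and return flattened rows."""
    request = urllib.request.Request(
        homology_url(species, symbol, target_species),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_homologies(json.load(response))
```

For the human/elephant case in the question, something like `fetch_orthologs("homo_sapiens", "BRCA2", "loxodonta_africana")` should be the call to try (BRCA2 is just an example symbol).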

6 days ago

You can also pass gene symbols to gget search with your target species and then use gget info on the returned Ensembl IDs to get the other information:
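Something along these lines, as a sketch rather than a tested recipe. It assumes the gget Python package is installed (`pip install gget`), that `gget.search` with `json=True` returns a list of records carrying an `ensembl_id` key, and that `gget.info` accepts a list of Ensembl IDs. The wrapper function names are my own, and both calls need network access.

```python
def ensembl_ids_from_search(records):
    """Collect Ensembl IDs from search records, skipping any without one."""
    return [rec["ensembl_id"] for rec in records if rec.get("ensembl_id")]


def search_then_info(gene_symbols, species="loxodonta_africana"):
    """Search the target species for the gene symbols, then look up each hit.

    Returns whatever gget.info returns for the matched IDs, or None when the
    search finds nothing.
    """
    import gget  # third-party; imported here so the helper above works without it

    # Step 1: gene symbols -> Ensembl IDs in the target species.
    hits = gget.search(gene_symbols, species, json=True)
    ids = ensembl_ids_from_search(hits)
    if not ids:
        return None

    # Step 2: Ensembl IDs -> names, descriptions and other annotation.
    return gget.info(ids, json=True)
```

For example, `search_then_info(["BRCA2"])` would look for BRCA2 in elephant and return the annotation gget finds for the matching IDs.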

