Question: Bulk download of ortholog and paralog data from Ensembl
3.4 years ago, wangdp123 wrote:

Hi there,

I am trying to retrieve all the orthologs and paralogs for all fungal species in the Ensembl Fungi database, but I couldn't find a way to download them in bulk. I know BioMart supports manual retrieval, but that doesn't work for my purpose.

For example, I would like to download all ORTHOLOGUES for Aspergillus clavatus genes (TIGR). Is there an easy way to do this in a batch? Ultimately, I would like to get all orthologs for all fungal species.

By the way, many species seem to be missing from the ORTHOLOGUES page of the BioMart service, for instance Hyphopichia burtonii NRRL Y-1933 and Laccaria amethystina LaAM-08-1. In fact, 735 species are listed, but only 51 can be found on the orthologue page. Where are the remaining species?

Many thanks,



paralogs ensembl orthologs • 1.4k views
3.4 years ago, Emily_Ensembl wrote:

The entire protein tree with all the homologues for all genomes included in the comparative genomics analysis for Ensembl Fungi can be downloaded here.

The reason you can't see all 735 genomes in BioMart is that not all genomes are in BioMart. For some genomes, we work with the communities involved to import annotation of different kinds, and take an active role in adding it into the database, and we include them in BioMart. For some genomes, we import the genome and the annotation directly from the INSDC and carry out no further analysis or annotation, and do not add them to BioMart. Essentially, BioMart will break if you try to put too much stuff in it, so this is the limit we choose to place.

There are also some genomes with no protein trees or homologues at all. Where a species has multiple strains, we run the protein trees with only one genome for that species.



Thank you for the explanation.

Any pointers on how to interpret the contents of Compara.88.protein_default.nh.emf.gz and Compara.88.protein_default.nhx.emf.gz, and how to parse them to identify the one-to-one, one-to-many and many-to-many relations?



written 3.4 years ago by wangdp123

I'm afraid we don't provide any software for parsing these formats. Maybe have a google around.

written 3.4 years ago by Emily_Ensembl
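
For anyone landing here with the same parsing question, the following is a minimal Python sketch, not an official Ensembl tool. It assumes the EMF tree dumps use this layout: one `SEQ` line per tree member (species, protein ID, location, gene ID, label), a `DATA` line followed by the Newick/NHX tree, and `//` closing each record. Please check the README shipped alongside the dump before relying on that. The function names `read_emf_trees` and `genes_per_species` and the sample record are made up for illustration.

```python
import io
from collections import Counter


def read_emf_trees(handle):
    """Yield (seq_rows, tree_text) for each record of an EMF tree dump.

    ASSUMED layout (verify against the dump's README):
      SEQ <species> <protein_id> <chr> <start> <end> <strand> <gene_id> <label>
      DATA
      <Newick or NHX tree>
      //
    """
    seqs, tree_lines, in_data = [], [], False
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "//":  # end of one tree record
            if seqs or tree_lines:
                yield seqs, "".join(tree_lines)
            seqs, tree_lines, in_data = [], [], False
        elif line.startswith("SEQ "):
            fields = line.split()
            seqs.append({
                "species": fields[1],
                "protein": fields[2],
                # gene ID position is an assumption about the column order
                "gene": fields[7] if len(fields) > 7 else None,
            })
        elif line == "DATA":
            in_data = True
        elif in_data:
            tree_lines.append(line)
    if seqs or tree_lines:  # last record without a closing "//"
        yield seqs, "".join(tree_lines)


def genes_per_species(seqs):
    """Count tree members per species (one SEQ row per member)."""
    return Counter(row["species"] for row in seqs)


# Hypothetical record, invented only to exercise the parser.
sample = """\
# made-up example record
SEQ species_a PROT1 I 100 900 1 GENE1 NULL
SEQ species_b PROT2 II 50 700 -1 GENE2 NULL
SEQ species_b PROT3 II 800 1500 1 GENE3 NULL
DATA
(PROT1:0.1,(PROT2:0.05,PROT3:0.05):0.2);
//
"""

records = list(read_emf_trees(io.StringIO(sample)))
for seqs, tree in records:
    print(dict(genes_per_species(seqs)), tree)
```

To run it on the real file, open it with `gzip.open("Compara.88.protein_default.nh.emf.gz", "rt")` and pass the handle to `read_emf_trees`. Per-species member counts only hint at the relation type. Classifying gene pairs as one-to-one, one-to-many or many-to-many properly requires telling duplication nodes from speciation nodes in the tree, which this sketch does not attempt.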