Question: Bulk download of ortholog and paralog data from Ensembl
wangdp123 (Oxford) wrote, 2.8 years ago:

Hi there,

I am trying to retrieve all the orthologs and paralogs for all fungal species in the Ensembl Fungi database, but I couldn't find a way to download them in bulk. I know BioMart supports manual retrieval, but that doesn't work for my purpose.

For example, I would like to download all ORTHOLOGUES from http://fungi.ensembl.org for Aspergillus clavatus genes (TIGR). Is there an easy way to do this in a batch? Beyond that, I would like to get all orthologs for all fungal species.

BTW, many species seem to be missing from the ORTHOLOGUES page of the BioMart service, for instance Hyphopichia burtonii NRRL Y-1933, Laccaria amethystina LaAM-08-1....... In fact, 735 species are listed at http://fungi.ensembl.org/species.html, but only 51 can be found on the ortholog page. Where are the remaining species?

Many thanks,

Regards,

Tom

Tags: paralogs, ensembl, orthologs
Emily_Ensembl (EMBL-EBI) wrote, 2.8 years ago:

The entire protein tree with all the homologues for all genomes included in the comparative genomics analysis for Ensembl Fungi can be downloaded here.

The reason you can't see all 735 genomes in BioMart is that not all genomes are in BioMart. For some genomes, we work with the communities involved to import annotation of different kinds, and take an active role in adding it into the database, and we include them in BioMart. For some genomes, we import the genome and the annotation directly from the INSDC and carry out no further analysis or annotation, and do not add them to BioMart. Essentially, BioMart will break if you try to put too much stuff in it, so this is the limit we choose to place.

There are also some genomes for which there will be no protein trees or homologues at all. Where there are multiple strains of one species, we only include one genome of that species in the protein trees.


Hi,

Thank you for the explanation.

Any pointers on how to read the tables in Compara.88.protein_default.nh.emf.gz and Compara.88.protein_default.nhx.emf.gz, and how to parse them to identify the one-to-one, one-to-many and many-to-many relations?

Cheers,

Tom

Reply written 2.8 years ago by wangdp123

I'm afraid we don't provide any software for parsing these formats. Maybe have a google around.

Reply written 2.8 years ago by Emily_Ensembl
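
Since no parser is provided for these dumps, here is a minimal Python sketch of how the parsing question above might be approached. It rests on assumptions that should be checked against the README shipped with the dump: that each record consists of `SEQ` lines (species name in the second column, sequence ID in the third), a `DATA` line, the Newick/NHX tree, and a closing `//`. The function names and the example record are hypothetical. The classification is only a crude per-family gene count for each species pair; it is not Ensembl's own homology typing, which depends on the speciation and duplication nodes in the tree.

```python
import gzip
from collections import Counter
from itertools import combinations


def read_emf_trees(lines):
    """Yield (seq_rows, tree_text) for each record of an EMF stream.

    Assumed record layout: 'SEQ ...' lines, a 'DATA' line, the tree, '//'.
    """
    seqs, tree_parts, in_data = [], [], False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        if line == "//":  # end of one record
            if seqs or tree_parts:
                yield seqs, "".join(tree_parts)
            seqs, tree_parts, in_data = [], [], False
        elif line == "DATA":
            in_data = True
        elif in_data:
            tree_parts.append(line)  # tree text, possibly over several lines
        elif line.startswith("SEQ"):
            fields = line.split()
            if len(fields) >= 3:
                seqs.append((fields[1], fields[2]))  # (species, sequence ID)
    if seqs or tree_parts:  # tolerate a final record without '//'
        yield seqs, "".join(tree_parts)


def read_emf_file(path):
    """Stream records from a gzipped EMF dump without loading it whole."""
    with gzip.open(path, "rt") as handle:
        yield from read_emf_trees(handle)


def family_relations(seqs):
    """Label each species pair in one family by gene counts alone.

    This is an approximation: it ignores the tree topology entirely.
    """
    counts = Counter(species for species, _ in seqs)
    relations = {}
    for a, b in combinations(sorted(counts), 2):
        if counts[a] == 1 and counts[b] == 1:
            relations[(a, b)] = "one-to-one"
        elif counts[a] == 1 or counts[b] == 1:
            relations[(a, b)] = "one-to-many"
        else:
            relations[(a, b)] = "many-to-many"
    return relations


# Hypothetical record, made up for illustration only.
example = """\
SEQ species_a pepA1 1 100 900 1 geneA1 NULL
SEQ species_b pepB1 2 50 800 -1 geneB1 NULL
SEQ species_b pepB2 2 900 1700 1 geneB2 NULL
DATA
(pepA1:0.1,(pepB1:0.05,pepB2:0.05):0.1);
//
"""

for seqs, tree in read_emf_trees(example.splitlines()):
    print(family_relations(seqs))
```

For the real files, `read_emf_file("Compara.88.protein_default.nhx.emf.gz")` would stream the records one at a time. To get relations that respect duplication events, the tree text would need to be handed to a Newick/NHX-aware library rather than relying on counts.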