Question

Bulk download of ortholog and paralog data from Ensembl

1

7.0 years ago

wangdp123 ▴ 340

Hi there,

I am trying to retrieve all the orthologs and paralogs for all fungal species in the Ensembl Fungi database, but I couldn't find a way to download them in bulk. I know BioMart lets you collect them manually, but that doesn't work for my purpose.

For example, I would like to download all ORTHOLOGUES from http://fungi.ensembl.org for Aspergillus clavatus genes (TIGR). Is there an easy way to do this in a batch? Furthermore, I would like to get all orthologs for all fungal species.

BTW, many species seem to be missing from the ORTHOLOGUES page of the BioMart service, for instance Hyphopichia burtonii NRRL Y-1933 and Laccaria amethystina LaAM-08-1. In fact, there are 735 species listed at http://fungi.ensembl.org/species.html, but only 51 species can be found on the ortholog page. Where are the remaining species?

Many thanks,

Regards,

Tom

orthologs paralogs ensembl • 2.3k views

score 1 · Answer 1 · 2017-05-18

1

7.0 years ago

Emily 23k

The entire protein tree with all the homologues for all genomes included in the comparative genomics analysis for Ensembl Fungi can be downloaded here.

The reason you can't see all 735 genomes in BioMart is that not all genomes are in BioMart. For some genomes, we work with the communities involved to import annotation of different kinds and take an active role in adding it to the database; these genomes are included in BioMart. For others, we import the genome and the annotation directly from the INSDC, carry out no further analysis or annotation, and do not add them to BioMart. Essentially, BioMart will break if you try to put too much data in it, so this is the limit we choose to place.

There are also some genomes for which there will be no protein trees or homologues at all. Where there are multiple strains of one species, we only include one genome of that species in the protein trees.

0

Hi,

Thank you for the explanation.

Any pointers on how to interpret the contents of Compara.88.protein_default.nh.emf.gz and Compara.88.protein_default.nhx.emf.gz, and how to parse them to identify the one-to-one, one-to-many and many-to-many relationships?

Cheers,

Tom

7.0 years ago by wangdp123 ▴ 340

0

I'm afraid we don't provide any software for parsing these formats. Maybe have a google around.

7.0 years ago by Emily 23k
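
As a possible starting point for parsing the tree dumps mentioned above, here is a minimal Python sketch. It is not an official Ensembl tool, and it rests on an assumption about the EMF layout that should be checked against the README shipped with the dump: each record is taken to consist of `SEQ` lines (species, protein ID, chromosome, start, end, strand, gene ID, label), a `DATA` line followed by the Newick/NHX tree text, and a closing `//`. The function names `read_emf_trees` and `genes_per_species` are made up for this illustration.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# (species, protein_id, gene_id) taken from one SEQ line
SeqRow = Tuple[str, str, str]


def read_emf_trees(lines: Iterable[str]) -> Iterator[Tuple[List[SeqRow], str]]:
    """Yield (seq_rows, tree_text) for each '//'-terminated record.

    Assumed layout (verify against the dump's README):
      SEQ <species> <protein_id> <chr> <start> <end> <strand> <gene_id> <label>
      DATA
      <tree text on one or more lines>
      //
    """
    seq_rows: List[SeqRow] = []
    tree_parts: List[str] = []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        if line == "//":
            if seq_rows or tree_parts:
                yield seq_rows, "".join(tree_parts)
            seq_rows, tree_parts, in_data = [], [], False
        elif line == "DATA":
            in_data = True
        elif in_data:
            tree_parts.append(line)
        elif line.startswith("SEQ "):
            fields = line.split()
            if len(fields) >= 8:  # assumed column positions, see docstring
                seq_rows.append((fields[1], fields[2], fields[7]))


def genes_per_species(seq_rows: List[SeqRow]) -> Counter:
    """Count distinct genes per species within one tree."""
    genes = {(species, gene) for species, _protein, gene in seq_rows}
    return Counter(species for species, _gene in genes)


if __name__ == "__main__":
    # File name as given in the thread; adjust the path to your download.
    with gzip.open("Compara.88.protein_default.nh.emf.gz", "rt") as handle:
        for rows, tree in read_emf_trees(handle):
            print(dict(genes_per_species(rows)))
```

The per-species gene counts are only a rough first hint. A tree with exactly one gene from each of two species suggests a one-to-one pair, but the actual orthology type depends on where duplication and speciation events fall in the tree, so a proper classification would need to walk the tree text returned alongside the `SEQ` rows with a Newick/NHX-aware library.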