Question: Phylogenetic Tree And Ensembl
10.0 years ago
European Union
Jamand110 wrote:

Hi all, I'm building a phylogenetic tree based on protein sequences. I selected sequences using PSI-BLAST and built an ML tree. Then I visited the Ensembl site from Entrez Gene and found orthologous and paralogous sequences that weren't listed in the PSI-BLAST results. Could you kindly suggest me the different results in Ensembl? Do you suggest that I build the protein sequence tree from the Ensembl orthologs and paralogs instead?

best regards

Do you mean "explain the different results" by the first "suggest"?

Post the sequence IDs so we can take a look.

So, this question is not really about phylogenetics and trees; it's about why you get different results from BLASTing against a database (which database did you BLAST against?) and from the paralogous and orthologous genes annotated in a database.

I think it's about BLASTing protein sequences, finding their homologs and orthologs, and generating the respective phylogenetic tree.

10.0 years ago
Cambridge, UK


The Ensembl pipeline starts with protein sequences (the longest protein for every gene in Ensembl) and calculates reciprocal BLAST hits. After that, M-Coffee and TreeBeST are used to build the tree and to determine homology relationships. The full pipeline is here:

Without knowing what group of proteins you are starting with, or your PSI-BLAST parameters, I'm guessing that the initial BLAST + Smith-Waterman step is picking up more relationships. You're welcome to try our pipeline.
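To illustrate the reciprocal-hit idea mentioned above, here is a minimal sketch in Python. It is not the Ensembl pipeline code; it assumes you have already run BLAST in both directions and reduced each hit to a (query, subject, bitscore) tuple. The function names and the toy sequence IDs are made up for the example.

```python
def best_hits(rows):
    """Return {query: subject}, keeping the highest-bitscore hit per query."""
    best = {}
    for query, subject, bitscore in rows:
        if query == subject:
            continue  # ignore self-hits
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    fwd = best_hits(a_vs_b)
    rev = best_hits(b_vs_a)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Toy data (hypothetical IDs and scores), species A vs. species B and back.
a_vs_b = [
    ("humA", "musA", 250.0),
    ("humA", "musB", 120.0),
    ("humB", "musB", 300.0),
    ("humC", "musA", 90.0),
]
b_vs_a = [
    ("musA", "humA", 245.0),
    ("musB", "humB", 310.0),
    ("musB", "humA", 110.0),
]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy case humC hits musA, but musA's best hit is humA, so humC is not paired. A one-directional search such as PSI-BLAST from a single query applies a different inclusion rule, which is one way the two sequence sets can end up differing.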

9.8 years ago
Dror280 wrote:

I would go with a more controlled ortholog database such as OrthoMCL or InParanoid: BLAST against their databases and look for orthologs. This will give you a better overall view of the orthologs of your gene. Ensembl is nice, but I prefer relying on either Entrez RefSeq proteins or UniProtKB, to avoid duplication and to get more controlled BLAST results and ortholog groups. Plus, they have a better interface.

8.7 years ago
Arelicorlaior50 wrote:

Most probably the differences between your gene tree and the Ensembl genetree are due to differences in the methods used to analyse such data.

The parameters used to include or exclude sequences into a gene family can make it small but tightly aligned or large but sparsely aligned.
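As a rough sketch of that effect (with made-up hits and cutoffs, and a hypothetical `select_family` helper, not anything Ensembl actually runs), the same hit list can give a small or a large family depending on the thresholds:

```python
def select_family(hits, max_evalue, min_coverage):
    """Keep hit IDs that pass both an E-value ceiling and a coverage floor."""
    return sorted(
        seq_id
        for seq_id, evalue, coverage in hits
        if evalue <= max_evalue and coverage >= min_coverage
    )


# Hypothetical hits: (sequence ID, E-value, fraction of query covered).
hits = [
    ("seq1", 1e-80, 0.98),
    ("seq2", 1e-40, 0.90),
    ("seq3", 1e-8, 0.55),
    ("seq4", 1e-3, 0.30),
]

strict = select_family(hits, max_evalue=1e-30, min_coverage=0.8)
loose = select_family(hits, max_evalue=1e-2, min_coverage=0.25)
print(strict)
print(loose)
```

The strict cutoffs keep only the two close, near-full-length matches, while the loose cutoffs also pull in distant, partial matches that will align more sparsely.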

Ensembl uses a combination of multiple aligners and TreeBeST to generate the tree from the alignments:

