Question: Phylogenetic tree from single copy core genes (metazoa proteomes)
15 months ago by
Elizabeth20
Elizabeth20 wrote:

Dear fellow bioinformaticians, for the past 4 weeks I have been trying to understand the general protocol for building a phylogenetic tree from single-copy core genes. However, I have not arrived at a single technique for building such a tree, or for extracting single-copy core genes from the proteome sequences of the 6 metazoan organisms that I have. I know how to build a gene tree using specific sequences from different species, but I am unsure how to detect single-copy core genes and extract the hundreds of genes from the proteome sequences. Any advice will be deeply appreciated. Thank you.

modified 15 months ago • written 15 months ago by Elizabeth20

Are there typing schemes already known for your organism? I.e. single copy core genes that people have already identified?

written 15 months ago by Joe14k

You mean single-copy orthologs? If yes, you can use the OrthoFinder tool:

https://github.com/davidemms/OrthoFinder

The tool takes the proteins of each species (proteome files), finds the single-copy proteins along with their gene IDs, performs the alignment, and produces a phylogenetic tree.

You can easily get the sequences of the single-copy genes from the OrthoFinder output.
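To make "single copy in every species" concrete, here is a minimal Python sketch. It assumes a tab-separated orthogroup table laid out like OrthoFinder's Orthogroups.tsv (first column the orthogroup ID, then one column per species, with gene IDs separated by commas); check this against the output of your own run. The function name and the toy species and gene names are made up for illustration.

```python
def single_copy_orthogroups(tsv_text):
    """Return {orthogroup_id: {species: gene_id}} for orthogroups that have
    exactly one gene in every species column of the table."""
    lines = [ln for ln in tsv_text.splitlines() if ln.strip()]
    species = lines[0].split("\t")[1:]  # first column holds the orthogroup ID
    result = {}
    for line in lines[1:]:
        fields = line.split("\t")
        og_id, cells = fields[0], fields[1:]
        # Pad short rows so missing trailing columns count as "no gene"
        cells += [""] * (len(species) - len(cells))
        genes = [[g.strip() for g in c.split(",") if g.strip()] for c in cells]
        if all(len(g) == 1 for g in genes):
            result[og_id] = {sp: g[0] for sp, g in zip(species, genes)}
    return result


# Toy table (hypothetical species and gene IDs)
example = (
    "Orthogroup\tspA\tspB\tspC\n"
    "OG0000001\ta1\tb1\tc1\n"      # one gene per species: kept
    "OG0000002\ta2, a3\tb2\tc2\n"  # duplicated in spA: dropped
    "OG0000003\ta4\t\tc3\n"        # missing in spB: dropped
)
print(sorted(single_copy_orthogroups(example)))  # ['OG0000001']
```

The returned gene IDs can then be used to pull the matching sequences out of each proteome FASTA file.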

written 15 months ago by Mehmet490

Moved my response from comment to answer

modified 15 months ago • written 15 months ago by Anand Rao250

Thank you everyone.

written 15 months ago by Elizabeth20
15 months ago by
Anand Rao250
United States
Anand Rao250 wrote:

If I understand your question and problem correctly, what might be useful and relevant is something like BUSCO - http://busco.ezlab.org/

BUSCO in turn uses the orthologs listed at OrthoDB - https://www.orthodb.org/ At that link, you can see there are 330 BUSCOs, i.e. orthologs that are "expected" in any complete metazoan genome, since these genes are conserved across that clade.

The nice thing about BUSCO is that it allows you to identify these expected orthologs from either the genome or the proteome (or even from the transcriptome).

-m MODE, --mode MODE
    Specify which BUSCO analysis mode to run. There are three valid modes:
    - geno or genome, for genome assemblies (DNA)
    - tran or transcriptome, for transcriptome assemblies (DNA)
    - prot or proteins, for annotated gene sets (protein)
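For proteomes you would use the protein mode. As a rough sketch, the Python helper below only assembles the command line for one proteome; it does not run anything. The `-i`, `-l`, `-o` and `-m` options are the usual BUSCO ones, but the executable name differs between BUSCO versions, so `executable` is an assumption, and the file, lineage and run names are placeholders.

```python
import shlex

# Mode names as listed in the help text above
VALID_MODES = {"geno", "genome", "tran", "transcriptome", "prot", "proteins"}


def busco_command(proteome_fasta, lineage, run_name, mode="prot", executable="busco"):
    """Build (but do not run) a BUSCO command line for one input file.
    `executable` is an assumption: use whatever your BUSCO install provides."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown BUSCO mode: {mode}")
    return [executable, "-i", proteome_fasta, "-l", lineage, "-o", run_name, "-m", mode]


# Placeholder file, lineage and run names
cmd = busco_command("species1.faa", "path/to/metazoa_lineage", "species1_busco")
print(shlex.join(cmd))
```

You could pass the resulting list to `subprocess.run` in a loop over your 6 proteomes.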

Once you have identified the expected orthologs in each proteome, go into the BUSCO output folder and, either manually or (better yet) with some simple parsing scripts, extract the sequences orthologous to each BUSCO (one by one, for each of the expected 330 orthologs in Metazoa).

Then you align each of these 330 sequence sets. You can then concatenate the 330 alignments and run something like the online RAxML tool at the CIPRES portal - http://www.phylo.org/sub_sections/portal/ to obtain the final phylogenetic tree with bootstrap support...

Hopefully this tool (BUSCO), with the accompanying OrthoDB set that is most suited to your species of interest (Metazoa?), should help you start answering your question about ortholog-based phylogeny. Recognize, however, that there are several other ways to answer this question...
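A simple parsing script for this step could look like the Python sketch below. It assumes the per-run BUSCO table (full_table*.tsv) is tab-separated with `#` comment lines and with the BUSCO ID, status and sequence ID as its first three columns, and that "Complete" marks single-copy hits while duplicated ones get their own status; verify that against your BUSCO version. Function names and the toy data are made up.

```python
def complete_buscos(full_table_text):
    """Map BUSCO ID -> sequence ID for rows whose status is 'Complete'."""
    hits = {}
    for line in full_table_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        # Rows for missing BUSCOs have no sequence column, so check the length
        if len(fields) >= 3 and fields[1] == "Complete":
            hits[fields[0]] = fields[2]
    return hits


def shared_single_copy(tables):
    """tables: {species: full_table_text}.
    Return {busco_id: {species: sequence_id}} for BUSCOs complete in all species."""
    per_species = {sp: complete_buscos(text) for sp, text in tables.items()}
    if not per_species:
        return {}
    shared = set.intersection(*(set(h) for h in per_species.values()))
    return {b: {sp: per_species[sp][b] for sp in per_species} for b in sorted(shared)}


# Toy tables (hypothetical BUSCO and sequence IDs)
tables = {
    "spA": "# Busco id\tStatus\tSequence\n"
           "B1\tComplete\ta1\t500\t300\n"
           "B2\tDuplicated\ta2\t400\t200\n"
           "B3\tComplete\ta3\t450\t250\n",
    "spB": "B1\tComplete\tb1\t480\t300\n"
           "B2\tComplete\tb2\t410\t200\n"
           "B3\tMissing\n",
}
print(shared_single_copy(tables))  # {'B1': {'spA': 'a1', 'spB': 'b1'}}
```

Each entry in the result tells you which sequence to pull from which proteome to build the per-ortholog FASTA file.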
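The concatenation step can be sketched in Python as below. It assumes each gene has already been aligned with whatever aligner you prefer and loaded as a dict of taxon to aligned sequence; taxa absent from a gene are padded with gaps. The function name and the toy alignments are hypothetical. The partition coordinates are returned too, since tree-building tools can often use them.

```python
def concatenate_alignments(alignments, taxa, gap="-"):
    """alignments: list of (gene_name, {taxon: aligned_sequence}).
    Return (supermatrix, partitions), where supermatrix maps each taxon to its
    concatenated sequence and partitions lists (gene_name, start, end), 1-based."""
    pieces = {t: [] for t in taxa}
    partitions = []
    start = 1
    for name, aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences are not aligned to one length")
        length = lengths.pop()
        for t in taxa:
            # Taxa missing from this gene get an all-gap block
            pieces[t].append(aln.get(t, gap * length))
        partitions.append((name, start, start + length - 1))
        start += length
    return {t: "".join(p) for t, p in pieces.items()}, partitions


# Toy alignments (hypothetical gene and taxon names)
genes = [
    ("gene1", {"spA": "MK-", "spB": "MKL"}),
    ("gene2", {"spA": "QQ"}),  # spB lacks this gene
]
matrix, parts = concatenate_alignments(genes, ["spA", "spB"])
print(matrix)  # {'spA': 'MK-QQ', 'spB': 'MKL--'}
print(parts)   # [('gene1', 1, 3), ('gene2', 4, 5)]
```

If you only keep orthologs present in all 6 species, the gap padding never triggers, but it makes the sketch tolerant of a gene missing here or there.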

written 15 months ago by Anand Rao250