Dear fellow bioinformaticians, for the past 4 weeks I have been trying to understand the general protocol for building a phylogenetic tree from single-copy core genes. However, I have not arrived at a single technique, either for building such a tree or for extracting single-copy core genes from the proteome sequences of the 6 metazoan organisms I have. I know how to build a gene tree from specific sequences from different species, but I am unsure how to detect single-copy core genes and extract the hundreds of such genes from the proteomes. Any advice will be deeply appreciated. Thank you.
If I understand your question and problem correctly, something like BUSCO (Benchmarking Universal Single-Copy Orthologs, http://busco.ezlab.org/) might be useful and relevant.
BUSCO in turn uses the orthologs catalogued in OrthoDB (https://www.orthodb.org/). At that link you can see that there are 330 BUSCOs, i.e. orthologs that are "expected" in any complete metazoan genome because they are conserved across that clade (the exact count depends on the lineage dataset and OrthoDB release you use).
The nice thing about BUSCO is that it allows you to identify these expected orthologs from either the genome or the proteome (or even from the transcriptome).
-m MODE, --mode MODE
    Specify which BUSCO analysis mode to run. There are three valid modes:
    - geno or genome, for genome assemblies (DNA)
    - tran or transcriptome, for transcriptome assemblies (DNA)
    - prot or proteins, for annotated gene sets (protein)
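With six proteomes it is worth scripting the BUSCO runs. The sketch below assumes BUSCO v4/v5 is installed as the busco executable (v3 used run_BUSCO.py instead) and uses the metazoa_odb10 lineage as an assumption (check busco --list-datasets for what your version offers); the species file names are hypothetical. Running it as a script only prints the commands; call run_all to actually execute them.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def busco_command(proteome: Path, lineage: str = "metazoa_odb10", cpus: int = 4) -> list[str]:
    """Return the argv for one BUSCO run in protein mode."""
    return [
        "busco",
        "-i", str(proteome),  # annotated protein set (FASTA)
        "-o", proteome.stem,  # run name, one per species
        "-l", lineage,        # lineage dataset (assumed name)
        "-m", "proteins",     # protein mode ('prot' also accepted)
        "-c", str(cpus),      # CPU threads
    ]


def run_all(proteomes: list[Path], lineage: str = "metazoa_odb10") -> None:
    """Run BUSCO sequentially on every proteome; stop on the first failure."""
    if shutil.which("busco") is None:
        raise RuntimeError("busco not found on PATH")
    for proteome in proteomes:
        subprocess.run(busco_command(proteome, lineage), check=True)


if __name__ == "__main__":
    # Hypothetical file names for the six proteomes: dry run only.
    for i in range(1, 7):
        print(" ".join(busco_command(Path(f"species{i}.faa"))))
```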
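Once BUSCO has identified the expected orthologs in each proteome, go into the BUSCO output folder and extract the sequence assigned to each BUSCO, either manually or, better yet, with a simple parsing script (one sequence set per BUSCO, for each of the 330 expected metazoan orthologs). To stay with single-copy genes, keep only the BUSCOs reported as complete and single-copy in all six species, so that each set holds exactly one sequence per species.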
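A minimal parsing sketch for this step, assuming each BUSCO run produced a tab-separated full table (e.g. full_table.tsv; its exact name and location differ between BUSCO versions, so check your output folder) whose first three columns are BUSCO id, status and sequence id, with comment lines starting with "#". It keeps BUSCOs whose status is "Complete" (single-copy) in every species, pulls the matching protein from each proteome, and writes one FASTA per BUSCO with headers renamed to the species name so that rows can be matched when concatenating.

```python
from __future__ import annotations

from pathlib import Path


def single_copy_hits(full_table: Path) -> dict[str, str]:
    """Map BUSCO id -> sequence id for BUSCOs with status 'Complete'."""
    hits = {}
    with open(full_table) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # 'Duplicated', 'Fragmented' and 'Missing' rows are skipped.
            if len(fields) >= 3 and fields[1] == "Complete":
                hits[fields[0]] = fields[2]
    return hits


def read_fasta(path: Path) -> dict[str, str]:
    """Read a FASTA file into {first word of header: sequence}."""
    seqs, name, chunks = {}, None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def write_ortholog_sets(species: dict[str, tuple[Path, Path]], outdir: Path) -> list[str]:
    """species maps name -> (full_table, proteome). Write one FASTA per BUSCO
    that is complete and single-copy in every species; return those ids."""
    hits = {sp: single_copy_hits(table) for sp, (table, _) in species.items()}
    shared = sorted(set.intersection(*(set(h) for h in hits.values())))
    proteomes = {sp: read_fasta(faa) for sp, (_, faa) in species.items()}
    outdir.mkdir(parents=True, exist_ok=True)
    for busco_id in shared:
        with open(outdir / f"{busco_id}.faa", "w") as out:
            for sp in species:
                out.write(f">{sp}\n{proteomes[sp][hits[sp][busco_id]]}\n")
    return shared
```

For example, write_ortholog_sets({"human": (Path("run_human/full_table.tsv"), Path("human.faa")), ...}, Path("ortholog_sets")), where all paths and species names are hypothetical.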
Then align each of these 330 sequence sets separately, concatenate the resulting alignments into a single supermatrix, and run a maximum-likelihood program such as RAxML, for example online through the CIPRES portal at PHYLO (http://www.phylo.org/sub_sections/portal/), to obtain the final phylogenetic tree with bootstrap support...
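The align-and-concatenate step could look like the sketch below. It assumes MAFFT is installed (any aligner would do) and that each per-BUSCO FASTA uses species names as headers; a species missing from a gene is padded with gaps. The directory layout is hypothetical, and the output is a relaxed PHYLIP supermatrix, a format RAxML reads.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def read_fasta(path: Path) -> dict[str, str]:
    """Read a FASTA file into {first word of header: sequence}."""
    seqs, name, chunks = {}, None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def align_with_mafft(unaligned: Path, aligned: Path) -> None:
    """Align one ortholog set; MAFFT writes the alignment to stdout."""
    with open(aligned, "w") as out:
        subprocess.run(["mafft", "--auto", str(unaligned)], stdout=out, check=True)


def concatenate(alignments: list[dict[str, str]]) -> tuple[dict[str, str], list[tuple[int, int]]]:
    """Join per-gene alignments (taxon -> aligned seq) into one supermatrix.
    Returns the matrix and 1-based inclusive (start, end) of each gene."""
    taxa = sorted({t for aln in alignments for t in aln})
    rows = {t: [] for t in taxa}
    partitions, start = [], 1
    for aln in alignments:
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError("alignment is empty or has unequal row lengths")
        length = lengths.pop()
        for t in taxa:
            rows[t].append(aln.get(t, "-" * length))  # gap-pad missing taxa
        partitions.append((start, start + length - 1))
        start += length
    return {t: "".join(parts) for t, parts in rows.items()}, partitions


def write_phylip(matrix: dict[str, str], path: Path) -> None:
    """Write a relaxed sequential PHYLIP file (name, space, sequence)."""
    nchar = len(next(iter(matrix.values())))
    with open(path, "w") as out:
        out.write(f"{len(matrix)} {nchar}\n")
        for taxon, seq in matrix.items():
            out.write(f"{taxon} {seq}\n")


def build_supermatrix(indir: Path, outdir: Path) -> list[tuple[int, int]]:
    """Align every *.faa in indir, concatenate, write outdir/supermatrix.phy."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft not found on PATH")
    outdir.mkdir(parents=True, exist_ok=True)
    alignments = []
    for fasta in sorted(indir.glob("*.faa")):
        aligned = outdir / fasta.name
        align_with_mafft(fasta, aligned)
        alignments.append(read_fasta(aligned))
    matrix, partitions = concatenate(alignments)
    write_phylip(matrix, outdir / "supermatrix.phy")
    return partitions
```

With RAxML 8, a rapid bootstrap plus ML search on the result could then be run as, e.g., raxmlHPC -f a -m PROTGAMMAAUTO -p 12345 -x 12345 -# 100 -s supermatrix.phy -n concat; the returned partition coordinates could be written to a partition file if you want per-gene models.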
Hopefully this tool (BUSCO), together with the OrthoDB lineage set best suited to your species of interest (Metazoa?), will help you start answering your question about ortholog-based phylogeny. Bear in mind, however, that there are several other ways to approach this question...