Apologies if this is a really naive question, but I cannot figure out how to do this easily. Here is a related post regarding the best method to find orthologous genes of a species.
Let's say I have a protein alignment downloaded from Ensembl (coming, for instance, from this Ensembl tree).
This gene is present in some other "NCBI species" that I would like to include in my tree (for instance, Stegastes partitus, whose genome is available in the NCBI database but NOT in the Ensembl database). Indeed, if I manually run blastp with the asip protein sequence of D. rerio (extracted from my Ensembl multifasta protein alignment) against the nr database restricted to S. partitus, I find this sequence as the first BLAST hit. Perfect! I can then manually append it to my initial protein tree.
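To make the manual step concrete, here is a minimal Python sketch of what it amounts to for one gene and one species. It assumes the blastp results were saved in tabular format (`-outfmt 6`, where column 2 is the subject ID and column 12 the bit score) and that the subject proteins are available as FASTA text. The function names, sequence IDs, and the `label` argument are hypothetical and only for illustration.

```python
def best_hits(blast_tab):
    """Return {query_id: subject_id}, keeping the highest-bitscore hit per query.

    Assumes BLAST tabular output (-outfmt 6): 12 tab-separated columns,
    subject ID in column 2 and bit score in column 12.
    """
    best = {}
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def read_fasta(text):
    """Parse FASTA text into {first word of header: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None and line:
            records[name] += line
    return records


def append_hit(multifasta, blast_tab, subject_fasta, query_id, label):
    """Return the multifasta text with the best hit for query_id appended.

    `label` is a hypothetical species tag prepended to the hit's ID.
    """
    hit = best_hits(blast_tab).get(query_id)
    if hit is None:
        return multifasta  # no hit found: leave the file content unchanged
    seq = read_fasta(subject_fasta)[hit]
    return multifasta.rstrip("\n") + f"\n>{label}|{hit}\n{seq}\n"
```

Note that the appended sequence is unaligned, so the multifasta would need to be realigned afterwards, and that taking the top hit is only a rough proxy for orthology.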
The problem is that I don't have just one gene and one NCBI species but many of them (let's say p genes and n NCBI species). I already have an Ensembl protein multifasta file for each of my p genes.
My question is: is there an easy way to append to each of my p multifasta files the corresponding homologous protein sequence(s) of the n "NCBI species"?
Thanks for any insight!