I would like the community's opinion on a problem I'm facing: how to reconstruct the phylogeny of a plant gene family from its protein sequences. To do this, one would first retrieve all protein entries related to the family from GenBank.
Unfortunately, as you probably know, many of the protein sequences in GenBank (at the NCBI) are the result of conceptual translations. They are therefore predicted or hypothetical.
My aim is to infer the correct phylogeny without false positives or false negatives, and without misalignments caused by incorrect predictions.
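To make the concern concrete, a first-pass filter could look something like the Python sketch below. It is only an illustration under assumptions: it takes a FASTA export of the retrieved proteins and sets aside records whose header suggests a computational prediction. The `XP_` prefix is the RefSeq convention for model proteins; the keyword list, the function names, and the example headers are hypothetical and would need adapting to the real data. Header text is a weak proxy for sequence quality, so this cannot replace a proper curation strategy.

```python
from typing import Iterator, List, Tuple

# Assumed keyword list; adjust it to the headers actually present in your data.
PREDICTED_KEYWORDS = (
    "predicted",
    "hypothetical",
    "putative",
    "uncharacterized",
    "low quality protein",
)

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def looks_predicted(header: str) -> bool:
    """Flag a record by accession prefix or by keywords in its title."""
    fields = header.split()
    accession = fields[0] if fields else ""
    if accession.startswith("XP_"):  # RefSeq model (predicted) protein
        return True
    lowered = header.lower()
    return any(keyword in lowered for keyword in PREDICTED_KEYWORDS)


def split_records(text: str) -> Tuple[List[Record], List[Record]]:
    """Return (kept, flagged) records; flagged ones need manual review."""
    kept: List[Record] = []
    flagged: List[Record] = []
    for record in read_fasta(text):
        (flagged if looks_predicted(record[0]) else kept).append(record)
    return kept, flagged


# Made-up headers and sequences, for illustration only.
example = """>NP_000001.1 expansin A1 [Arabidopsis thaliana]
MKT
AAA
>XP_000002.1 expansin-like protein [Oryza sativa]
MKV
>ABC12345.1 hypothetical protein [Zea mays]
MMM
"""

kept, flagged = split_records(example)
```

Flagged records are set aside rather than discarded, so they can still be checked by hand (for example against transcript evidence) before deciding whether they go into the alignment.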
Which workflow or strategy would you recommend?
Thank you so much.