Question: Phylogenetic issue - which protein entries to sample from GenBank
4.6 years ago by
European Union
ilmagodellepcr10 wrote:

Dear all,

I would like to have the opinion of the community about a problem I’m facing.
I want to reconstruct a phylogeny based on the protein sequences of a plant gene family. To this end, one should retrieve all available protein entries for this family from GenBank.

Unfortunately, as you probably know, many of the protein sequences in GenBank (at the NCBI) are the result of conceptual translations, so they are predicted or hypothetical.

My aim is to infer the correct phylogeny without false positives or false negatives, and without misalignments caused by incorrect predictions.

Which workflow or strategy would you recommend?

Thank you so much,

Luca

sequence alignment gene • 1.6k views
modified 3.4 years ago by Pappu1.9k • written 4.6 years ago by ilmagodellepcr10
4.6 years ago by
cdsouthan1.8k
cdsouthan1.8k wrote:

You can choose plants for which the proteomes are complete (or at least close to it) in Swiss-Prot.

modified 4.6 years ago • written 4.6 years ago by cdsouthan1.8k

Since many plants do not yet have complete proteomes, this would be somewhat limiting. As such, finding as wide a range of family members as possible, including taxa without complete proteomes, is a reasonable thing to do in the first instance.

As a first pass searching UniProtKB/Swiss-Prot using either:

And limiting the result based on the Taxonomy annotations will give a set of possible candidates. This set can then be filtered based on the protein existence annotation. This will give a set of proteins that you can be reasonably sure actually exist in vivo. From there generating a phylogeny should be relatively simple.
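To illustrate the filtering step, here is a minimal Python sketch that keeps only entries with strong existence evidence. It assumes the candidates were downloaded from UniProtKB in FASTA format, where each header carries a `PE=` field (1 = evidence at protein level, 2 = evidence at transcript level, 3 = inferred from homology, 4 = predicted, 5 = uncertain). The function names, the `max_pe` cutoff and the sample entries (accessions and sequences) are made up for illustration; they are not real records.

```python
import re
from typing import Iterable, Iterator, Tuple

# UniProtKB FASTA headers carry the protein existence level as "PE=<1-5>".
PE_PATTERN = re.compile(r"\bPE=([1-5])\b")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def filter_by_existence(
    records: Iterable[Tuple[str, str]], max_pe: int = 2
) -> Iterator[Tuple[str, str]]:
    """Keep records whose PE level is <= max_pe; drop records without a PE field."""
    for header, seq in records:
        match = PE_PATTERN.search(header)
        if match and int(match.group(1)) <= max_pe:
            yield header, seq


# Hypothetical sample data: the accessions and sequences below are invented.
SAMPLE = """\
>sp|X00001|FAKE1_ARATH Example protein 1 OS=Arabidopsis thaliana OX=3702 GN=FAKE1 PE=1 SV=1
MKTLLVAG
>tr|X00002|X00002_ORYSJ Example protein 2 OS=Oryza sativa subsp. japonica OX=39947 PE=4 SV=1
MSSPLQWE
>sp|X00003|FAKE3_MAIZE Example protein 3 OS=Zea mays OX=4577 GN=FAKE3 PE=2 SV=1
MAGHK
LLTR
"""

kept = list(filter_by_existence(read_fasta(SAMPLE.splitlines()), max_pe=2))
for header, seq in kept:
    print(header.split()[0], len(seq))
```

Raising `max_pe` to 3 would also admit proteins inferred from homology, which widens taxon sampling at the cost of including sequences with weaker evidence.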

written 4.6 years ago by hpmcwill1.1k
4.6 years ago by
European Union
ilmagodellepcr10 wrote:

It's not that simple. Even if a proteome is not complete, a particular protein may already have been identified or characterized.

written 4.6 years ago by ilmagodellepcr10
3.4 years ago by
Pappu1.9k
Pappu1.9k wrote:
Look at Ensembl Plants for orthologs.
written 3.4 years ago by Pappu1.9k