Question: How to construct a tree with MEGA when some of the sequences are very short
vharshavardhanan (Belgium) wrote, 4.8 years ago:

Hi, I'm a beginner with the MEGA software. I have 89 protein sequences from which I need to construct a phylogenetic tree using the bootstrap method with 1000 replicates and complete deletion of gaps/missing data. However, I am not able to construct a tree because 3 of the sequences are much shorter than the other 86. I even tried deleting the non-conserved regions in all the protein sequences, but I still cannot get a tree, because the short proteins just keep getting shorter. Kindly help me solve this problem.


Get rid of the short proteins. If they are that short, they cannot be aligned and do not contribute information anyhow.
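A small sketch of why complete deletion fails with a few short sequences: complete deletion drops every alignment column in which any sequence has a gap or missing data, so short sequences padded with gaps shrink the usable sites down to their overlap with everything else. The toy alignment, sequence names, gap characters, and the `site_coverage` threshold are hypothetical; `partial_deletion` is a simplified model of a site-coverage cutoff, not MEGA's exact implementation.

```python
# Hypothetical toy alignment: three full-length proteins and one short
# protein that the aligner padded with gaps ("-").
alignment = {
    "protA": "MKTLLVAGSWRE",
    "protB": "MKTLIVAGSWKE",
    "protC": "MRTLLVSGSWRE",
    "short1": "-----VAGS---",
}

GAP_CHARS = set("-?")  # assumption: characters treated as gaps/missing data


def columns(aln):
    """Return the alignment as a list of columns (tuples of residues)."""
    seqs = list(aln.values())
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    return [tuple(s[i] for s in seqs) for i in range(length)]


def complete_deletion(aln):
    """Indices of columns kept when a single gap/missing residue removes the column."""
    return [i for i, col in enumerate(columns(aln))
            if not any(c in GAP_CHARS for c in col)]


def partial_deletion(aln, site_coverage=0.75):
    """Indices of columns kept when at least `site_coverage` of the sequences
    have a real residue there (simplified site-coverage cutoff)."""
    kept = []
    for i, col in enumerate(columns(aln)):
        present = sum(c not in GAP_CHARS for c in col)
        if present / len(col) >= site_coverage:
            kept.append(i)
    return kept


print(len(complete_deletion(alignment)))  # 4 of 12 columns survive
print(len(partial_deletion(alignment)))   # all 12 columns survive
```

With 86 long sequences and 3 short ones, the same arithmetic applies: under complete deletion, the short sequences alone decide how many sites remain.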

Reply by Istvan Albert, 4.8 years ago

The 3 short protein sequences are upregulated under abiotic stress. Is it OK to omit them, given that they have a role in abiotic stress?

Moreover, I have selected these 3 proteins for my experiments, and RT-PCR and real-time PCR work on them is ongoing. Is there any way to include these 3 sequences?

Reply by vharshavardhanan, 4.8 years ago

When doing science, you can easily end up in an unsolvable situation; in that case, you have to find something else to move forward.

Reply by Istvan Albert, 4.8 years ago
Brice Sarver (United States) wrote, 4.8 years ago:

Regardless of the approach or program you are using, the input for any phylogenetic estimation method is an alignment, i.e., an inference of homology. By necessity, then, your sequences must share ancestry before you can even begin to infer a phylogeny. If the sequences are shorter but homologous, a multiple sequence alignment (of nucleotides, of amino acids, or of both via a translation alignment for protein-coding sequences) ought to accommodate them by introducing gaps, i.e., inferred insertions or deletions. It sounds like you're not doing this; when you say

"The 3 short protein sequences are upregulated under abiotic stress. Is it OK to omit them, given that they have a role in abiotic stress?"

it suggests that your dataset may consist of different proteins rather than the same protein across samples, which is completely inappropriate input for phylogenetic methods.
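To illustrate how an aligner accommodates a shorter homologous sequence by introducing gaps, here is a minimal Needleman-Wunsch global alignment with a linear gap penalty. The scoring scheme (match +1, mismatch -1, gap -1) and the sequences are hypothetical toy choices; real protein aligners (e.g. ClustalW or MUSCLE, both available in MEGA) use substitution matrices such as BLOSUM and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b with a linear gap penalty (toy scoring)."""

    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


long_aln, short_aln, best = needleman_wunsch("MKTLLVAGSWRE", "VAGS")
print(long_aln)   # MKTLLVAGSWRE
print(short_aln)  # -----VAGS---
print(best)       # -4
```

The short sequence is padded with gaps rather than stretched, which is exactly why it then loses most of its columns under complete deletion.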

In other words, your workflow would be:

  1. Construct a dataset of the same locus across all samples.
  2. Align the amino acids or nucleotides.
  3. Select a substitution model for ML analysis, or choose a distance correction (or uncorrected distances) for NJ, UPGMA, etc.
  4. If you use a model, run any likelihood-based analysis (maximum likelihood or Bayesian) under the selected model.
  5. Assess support with bootstrapping or a similar method.
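Step 5 works by resampling alignment columns with replacement and repeating the analysis on each pseudo-replicate. The sketch below shows only the resampling mechanics, recomputing an uncorrected p-distance (fraction of differing sites, skipping gapped positions pairwise) for one pair of sequences across 1000 replicates; building a tree per replicate and counting clade frequencies is left out, and the function names and toy data are hypothetical.

```python
import random


def p_distance(s1, s2, gap_chars="-?"):
    """Uncorrected p-distance over positions where neither sequence has a
    gap or missing data (pairwise deletion)."""
    pairs = [(x, y) for x, y in zip(s1, s2)
             if x not in gap_chars and y not in gap_chars]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_replicate(aln, rng):
    """One pseudo-replicate: sample alignment columns with replacement,
    keeping the original alignment length."""
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}


# Hypothetical toy alignment (no gaps, so every replicate has comparable sites).
alignment = {
    "protA": "MKTLLVAGSWRE",
    "protB": "MKTLIVAGSWKE",
}

rng = random.Random(1)  # fixed seed so the run is reproducible
reps = [bootstrap_replicate(alignment, rng) for _ in range(1000)]
dists = [p_distance(r["protA"], r["protB"]) for r in reps]

print(p_distance(alignment["protA"], alignment["protB"]))  # 0.1666... (2 of 12 sites differ)
print(min(dists), max(dists))  # spread of the distance across replicates
```

In a real analysis, each replicate would be passed through the same tree-building step, and the support for a clade is the fraction of replicate trees containing it.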

If you do have sequences with a shared history, I would follow Istvan Albert's recommendation and remove the short sequences if they are truly unalignable.
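If you decide to drop the short sequences, one way to make "truly unalignable" concrete is to measure how much of the alignment each sequence covers with real residues and flag those below a cutoff. The 50% threshold, gap characters, and names below are arbitrary, hypothetical choices; whether a flagged sequence should actually be removed is still a judgment call.

```python
def coverage(seq, gap_chars="-?"):
    """Fraction of alignment positions where the sequence has a real residue."""
    return sum(c not in gap_chars for c in seq) / len(seq)


def split_by_coverage(aln, min_coverage=0.5):
    """Split an aligned {name: sequence} dict into (kept, dropped)
    according to per-sequence coverage."""
    kept, dropped = {}, {}
    for name, seq in aln.items():
        if coverage(seq) >= min_coverage:
            kept[name] = seq
        else:
            dropped[name] = seq
    return kept, dropped


# Hypothetical toy alignment.
alignment = {
    "protA": "MKTLLVAGSWRE",
    "protB": "MKTLIVAGSWKE",
    "short1": "-----VAGS---",
}
kept, dropped = split_by_coverage(alignment)
print(sorted(kept))     # ['protA', 'protB']
print(sorted(dropped))  # ['short1']
```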


Thank you, Mr. Brice. I will follow your guidelines and try to solve this issue. I will get back to you if I still face the same problem. Once again, thanks for your idea.

Reply by vharshavardhanan, 4.8 years ago