How to construct a tree with MEGA when some of the sequences are very short
9.5 years ago

Hi, I'm a beginner with the MEGA software. I have 89 protein sequences for which I need to construct a phylogenetic tree using the bootstrap method with 1000 replicates, with the data set parameter set to complete deletion. However, I am not able to construct a tree because of 3 sequences that are much shorter than the other 86. I also tried deleting the non-conserved regions from all of the protein sequences, but I still cannot get a tree because the shorter proteins just become shorter and shorter. Kindly help me solve this problem.

Mega Phylogenetic Tree Protein • 7.4k views

Get rid of the short proteins. If they are short, they cannot be aligned and do not contribute information anyhow.

The 3 short protein sequences are upregulated under abiotic stress. Is it OK if I omit these sequences, given that they have a role in abiotic stress?

Moreover, I have selected these 3 proteins for my experiments, which are ongoing with RT-PCR and real-time PCR. So is there any possibility of including these 3 sequences?

When doing science you can easily end up with unsolvable situations. In that case you have to find something else to move forward.

9.5 years ago
Brice Sarver ★ 3.8k

Regardless of the approach or program you are using, the input for any phylogenetic estimation approach is an alignment, i.e., an inference of homology. Therefore, by necessity, your sequences must have a shared ancestry to even begin to infer a phylogeny. If the sequences are shorter but homologous, a multiple sequence alignment (of nucleotides, of amino acids, or of both via a translation alignment for protein-coding sequences) ought to accommodate the length differences by introducing gaps (insertions or deletions). It sounds like you're not doing this. When you say

The 3 short protein sequences are upregulated under abiotic stress. Is it OK if I omit these sequences, given that they have a role in abiotic stress?

it suggests that your dataset may consist of several different proteins rather than the same protein across samples, which is a completely inappropriate input for phylogenetic techniques.

In other words, your workflow would be:

  1. Construct a dataset of the same locus across all samples
  2. Align the amino acids or nucleotides
  3. Select a model for ML analysis, or choose a distance correction (or none) for NJ, UPGMA, etc.
  4. [If you decide to use a model: run any likelihood-based approach (maximum likelihood or Bayesian) with an appropriate model.]
  5. Bootstrapping etc. for support.
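To make the bootstrapping step concrete, here is a minimal sketch of how one bootstrap replicate is typically generated: alignment columns are resampled with replacement, and a tree would then be estimated from each replicate. This is not MEGA's own code; the function name `bootstrap_alignment` and the toy alignment are made up for illustration, and the tree-building step itself is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` maps sequence names to aligned strings of equal length
    (a hypothetical in-memory format, not a MEGA file format).
    Columns are drawn with replacement, so whole sites stay intact.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    # Pick `length` column indices at random, with replacement.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


# Toy alignment (invented data) with gaps written as '-'.
toy = {"seqA": "MKT-LV", "seqB": "MKTALV", "seqC": "M-TALI"}
rng = random.Random(42)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
```

Each of the 1000 replicates has the same dimensions as the original alignment. Support for a clade is then the fraction of replicate trees that contain it.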

If you do have sequences with a shared history, I would follow Istvan Albert's recommendation and remove the short sequences if they are truly unalignable.
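Before deciding whether to drop the short sequences, it can help to see how much of the alignment they cost under the complete deletion option mentioned in the question, which discards every site where any sequence has a gap or missing data. The sketch below counts the surviving sites with and without a short sequence. It assumes gaps are written as `-` and missing data as `?`; the helper names and the toy alignment are invented for illustration and are not part of MEGA.

```python
GAP_CHARS = set("-?")  # assumed symbols for gaps and missing data


def sites_after_complete_deletion(alignment):
    """Count alignment columns in which no sequence has a gap/missing symbol."""
    seqs = list(alignment.values())
    if len({len(seq) for seq in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return sum(
        1 for column in zip(*seqs) if not any(ch in GAP_CHARS for ch in column)
    )


def gap_fraction(alignment):
    """Fraction of gap/missing positions per sequence, to spot short ones."""
    return {
        name: sum(ch in GAP_CHARS for ch in seq) / len(seq)
        for name, seq in alignment.items()
    }


# Toy alignment (invented data): two full-length sequences and one fragment.
toy = {
    "long1": "MKTALVEDGS",
    "long2": "MKSALVEDGT",
    "short": "---ALV----",
}

with_short = sites_after_complete_deletion(toy)
without_short = sites_after_complete_deletion(
    {name: seq for name, seq in toy.items() if name != "short"}
)
print(with_short, without_short)  # 3 10
print(gap_fraction(toy)["short"])  # 0.7
```

In this toy case a single fragment reduces the usable sites from 10 to 3, which is the same effect that can leave too few sites to build a tree. If the short sequences must stay in, a gap treatment other than complete deletion is the setting to look at.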

Thank you, Mr. Brice. I will follow your guidelines and try to solve this issue. I will get back to you if I still face the same problem. Once again, thanks for your idea.
