Hi, I'm a beginner with MEGA. I have 89 protein sequences for which I need to construct a phylogenetic tree using the bootstrap method with 1000 replicates, with the gaps/missing data treatment set to complete deletion. But I am not able to construct a tree because of 3 sequences that are much shorter than the other 86. I even tried deleting the non-conserved regions from all the protein sequences, but I still cannot get a tree, because the smaller proteins keep getting shorter and shorter. Kindly help me solve this problem.
Regardless of the approach or program you are using, the input for any phylogenetic estimation method is an alignment, i.e., an inference of homology. Therefore, your sequences must share ancestry before you can even begin to infer a phylogeny. If the shorter sequences are homologous, a multiple sequence alignment (of nucleotides, amino acids, or both via a translation alignment for protein-coding sequences) should accommodate them by introducing gaps (insertions or deletions). It sounds like you're not doing this, and when you say
"The 3 short protein sequences are upregulated under abiotic stress. Is it OK if I omit these sequences, given that they have a role in abiotic stress?"
it suggests that your dataset may consist of several different proteins rather than the same protein across samples, which is entirely inappropriate input for phylogenetic methods.
In other words, your workflow would be:
- Construct a dataset of the same locus across all samples
- Align the amino acids or nucleotides
- Model selection for an ML analysis, or a choice of distance correction for NJ (or uncorrected NJ, UPGMA, etc.)
- (If you use a model) Tree inference with any likelihood-based approach (maximum likelihood or Bayesian) under the selected model.
- Bootstrapping (or similar) for branch support.
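The bootstrap step can be illustrated with a minimal Python sketch of nonparametric bootstrapping. The alignment columns are resampled with replacement, uncorrected p-distances are recomputed, a UPGMA tree is rebuilt, and a clade's support is the fraction of replicates in which it reappears. The four sequences and all function names are made up for illustration. This is a toy for intuition, not MEGA's implementation. A real analysis would use a corrected distance or a likelihood model, as in the model-selection step.

```python
import random
from itertools import combinations

# Hypothetical toy alignment (no gaps): A and B are near-identical, as are C and D.
alignment = {
    "A": "MKTAYIAKQRQISFVKSHFSRQ",
    "B": "MKTAYIAKQRQISFVKSHFSRE",
    "C": "MKSAYLAKERHISFIKAHWSKQ",
    "D": "MRSAYLAKERHISFIKAHWSKQ",
}

def p_distance(a, b):
    """Uncorrected p-distance: fraction of differing sites, ignoring gapped columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no overlapping sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(aln):
    """Pairwise p-distances keyed by frozenset({name1, name2})."""
    return {frozenset((a, b)): p_distance(aln[a], aln[b]) for a, b in combinations(aln, 2)}

def upgma_clades(names, dist):
    """Cluster with UPGMA and return every non-root clade as a frozenset of names."""
    def cluster_dist(c1, c2):
        # UPGMA distance = mean of all leaf-to-leaf distances between the two clusters
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([n]) for n in names]
    clades = set()
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_dist(*pair))
        merged = c1 | c2
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        if len(merged) < len(names):  # skip the root, which contains every sequence
            clades.add(merged)
    return clades

def bootstrap_support(aln, replicates=1000, seed=1):
    """Fraction of bootstrap replicates in which each clade appears."""
    names = list(aln)
    length = len(aln[names[0]])
    rng = random.Random(seed)
    counts = {}
    for _ in range(replicates):
        # Resample alignment columns with replacement, keeping the original length
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {n: "".join(aln[n][i] for i in cols) for n in names}
        for clade in upgma_clades(names, distance_matrix(resampled)):
            counts[clade] = counts.get(clade, 0) + 1
    return {clade: c / replicates for clade, c in counts.items()}

names = list(alignment)
print(upgma_clades(names, distance_matrix(alignment)))
for clade, support in sorted(bootstrap_support(alignment).items(), key=lambda kv: -kv[1]):
    print(sorted(clade), f"{support:.0%}")
```

In this toy, the A/B and C/D groupings rest on eight columns where the pairs differ, so they reappear in nearly every replicate. A grouping supported by only one or two columns would be dropped far more often when columns are resampled.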
If you do have sequences with a shared history, I would follow Istvan Albert's recommendation and remove the short sequences if they are truly unalignable.
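The effect of keeping the short sequences can be seen in a small Python sketch. MEGA's complete deletion option discards every alignment column that has a gap or missing data in any sequence, so a few short sequences padded with gaps can remove most of the sites shared by the other 86. The sketch uses a made-up four-sequence alignment. It counts the columns complete deletion would keep, flags sequences much shorter than the median, and recounts after dropping them. The `flag_short` helper and its 0.5 cutoff are hypothetical, for illustration only. Length alone does not prove a sequence is unalignable, so treat flagged names as candidates to inspect rather than to delete automatically.

```python
import statistics

# Hypothetical toy alignment; '-' marks an alignment gap.
# "short" is a fragment that the aligner padded with gaps on both ends.
alignment = {
    "seqA": "MKTAYIAKQRQISFVKSHFSRQ",
    "seqB": "MKTAYLAKQRQISFVKAHFSRQ",
    "seqC": "MKSAYIAKQRHISFVKSHFSKQ",
    "short": "-------KQRQISFV-------",
}

def complete_deletion_sites(aln):
    """Indices of columns with no gap in any sequence (what complete deletion keeps)."""
    length = len(next(iter(aln.values())))
    return [i for i in range(length) if all(seq[i] != "-" for seq in aln.values())]

def flag_short(aln, min_fraction=0.5):
    """Names of sequences whose residue count is below min_fraction of the median.
    The 0.5 cutoff is an arbitrary example, not a recommended value."""
    lengths = {name: len(seq.replace("-", "")) for name, seq in aln.items()}
    median = statistics.median(lengths.values())
    return [name for name, n in lengths.items() if n < min_fraction * median]

print(len(complete_deletion_sites(alignment)))  # 8 (only 8 of 22 columns survive)
short = flag_short(alignment)
print(short)  # ['short']
trimmed = {name: seq for name, seq in alignment.items() if name not in short}
print(len(complete_deletion_sites(trimmed)))  # 22 (all columns are usable again)
```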