I am currently working with several hypothetical bacterial proteins.
One approach I am using to characterize them is comparison with similar sequences. I retrieved similar proteins using a blastp search and a domain architecture comparison.
My aim is to infer a putative function for my unknown proteins based on their similarity to known ones. My plan is to align all sequences, define the conserved regions, and perform a phylogenetic reconstruction.
If my unknown sequences cluster together with known ones, I can speculate that they might have a similar function.
Now my question is, would it make sense to include an outgroup in the tree?
My concern is that including an outgroup would decrease the quality of the alignment: the outgroup sequences would introduce many gaps, which would reduce the number of conserved regions I can keep for building the tree.
What is your opinion on that?
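
To make the concern about gaps concrete, here is a minimal Python sketch of how I might compare the number of alignment columns that survive gap filtering with and without an outgroup. The function name `retained_columns`, the `max_gap_fraction` parameter, and the toy sequences are all made up for illustration. They are not my real data, and this is not the method of any particular trimming tool.

```python
def retained_columns(aligned_seqs, max_gap_fraction=0.0):
    """Return indices of alignment columns whose fraction of gaps
    ('-') is at most max_gap_fraction. Hypothetical helper."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    kept = []
    for i in range(length):
        gaps = sum(1 for seq in aligned_seqs if seq[i] == "-")
        if gaps / len(aligned_seqs) <= max_gap_fraction:
            kept.append(i)
    return kept


# Toy alignment of three ingroup sequences (invented for illustration).
ingroup_only = [
    "MKTAYIAKQR",
    "MKTAYLAKQR",
    "MKSAYIAKQR",
]

# The same sequences realigned together with an invented outgroup (last row),
# which has one deletion and one insertion relative to the ingroup.
with_outgroup = [
    "MKTAY--IAKQR",
    "MKTAY--LAKQR",
    "MKSAY--IAKQR",
    "MR---GHIAKNR",
]

n_without = len(retained_columns(ingroup_only))
n_with = len(retained_columns(with_outgroup))
print("gap-free columns without outgroup:", n_without)
print("gap-free columns with outgroup:", n_with)
```

In this toy case, 10 gap-free columns are available without the outgroup and only 7 with it. Allowing some gaps per column (for example `max_gap_fraction=0.25`) recovers the columns where only the outgroup is gapped, so the size of the loss depends on how strictly the alignment is filtered.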