Question: Is it always correct to include an outgroup in phylogenetic trees?
I am currently working with several hypothetical bacterial proteins.

One approach I used to characterize them is comparing them with similar sequences. I retrieved similar proteins from a blastp search and from a domain-architecture comparison.

My aim is to infer a putative function for my unknown proteins based on their similarity to known ones. My plan was to align all sequences, define the conserved regions, and perform a phylogenetic reconstruction.

If my unknown sequences cluster together with known ones, I can speculate that they might have a similar function.

Now my question is: would it make sense to include an outgroup in the tree?

Including an outgroup would decrease the quality of the alignment, since the outgroup sequences would introduce many gaps, reducing the number of conserved regions left for building the tree.
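To see this effect on a toy case, here is a minimal Python sketch that keeps only alignment columns whose gap fraction is at or below a cutoff (a crude stand-in for "defining conserved regions"; dedicated trimming tools such as Gblocks or trimAl use more elaborate criteria). The sequences are made up for illustration, and the `max_gap` cutoff is an assumption you would tune for real data.

```python
def gap_fraction(column):
    """Fraction of gap characters ('-') in one alignment column."""
    return column.count("-") / len(column)


def conserved_columns(alignment, max_gap=0.0):
    """Return indices of columns whose gap fraction is <= max_gap.

    `alignment` is a list of equal-length aligned sequences (strings).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    kept = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        if gap_fraction(column) <= max_gap:
            kept.append(i)
    return kept


# Hypothetical ingroup alignment (illustration only).
ingroup = [
    "MKT-LLVAGA",
    "MKTALLVAGA",
    "MKSALIVAGA",
    "MRTALLVSGA",
]
# Hypothetical divergent outgroup that aligns only partially.
outgroup = ["M---LL--GA"]

print(len(conserved_columns(ingroup, max_gap=0.0)))             # → 9
print(len(conserved_columns(ingroup + outgroup, max_gap=0.0)))  # → 5
```

With a strict gap-free criterion, the single divergent outgroup removes four of the nine columns that the ingroup alone would keep, which is exactly the trade-off described in the question.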

What is your opinion on that?

IMO you could just midpoint root, but what's the point of making the trees to begin with? Why not just cluster the sequences with a clustering algorithm and be done with it?
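Midpoint rooting places the root halfway along the longest leaf-to-leaf path, so no outgroup is needed; the usual caveat is that it assumes evolutionary rates are roughly similar across lineages. Below is a minimal sketch, assuming the unrooted tree is held as a plain adjacency dict with branch lengths (a hypothetical representation chosen for illustration; in practice a library such as Biopython's `Bio.Phylo`, whose trees have a `root_at_midpoint()` method, would do this for you). The example tree and its branch lengths are made up.

```python
def distances_from(tree, start):
    """Path length from `start` to every node, plus parent pointers for path recovery."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in tree[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    return dist, parent


def midpoint_root(tree):
    """Return (u, v, offset): the midpoint root lies on edge u-v, `offset` from u.

    `tree` is an unrooted tree as {node: {neighbour: branch_length}}.
    """
    leaves = [n for n, nbrs in tree.items() if len(nbrs) == 1]
    if len(leaves) < 2:
        raise ValueError("need at least two leaves")
    # Find the two leaves separated by the longest path (the tree's diameter).
    diameter, a, b = -1.0, None, None
    for i, x in enumerate(leaves):
        dist_x = distances_from(tree, x)[0]
        for y in leaves[i + 1:]:
            if dist_x[y] > diameter:
                diameter, a, b = dist_x[y], x, y
    # Walk from b back towards a until we reach the edge containing the midpoint.
    dist, parent = distances_from(tree, a)
    half = diameter / 2
    node = b
    while parent[node] is not None:
        up = parent[node]
        if dist[up] <= half <= dist[node]:
            return up, node, half - dist[up]
        node = up


# Hypothetical unrooted tree: leaves A-D, internal nodes X and Y.
tree = {
    "A": {"X": 1.0},
    "B": {"X": 2.0},
    "C": {"Y": 5.0},
    "D": {"Y": 1.0},
    "X": {"A": 1.0, "B": 2.0, "Y": 1.0},
    "Y": {"X": 1.0, "C": 5.0, "D": 1.0},
}
print(midpoint_root(tree))  # → ('Y', 'C', 1.0)
```

Here the longest path is B to C (length 8), so the root lands on the Y-C branch, 1.0 from Y, leaving both B and C at distance 4 from the root.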

You mean grouping the proteins using something like blastclust? Wouldn't that be less robust than a tree? The clustering is based on alignment similarity, and I think you might introduce artefacts through the choice of similarity cutoff. Am I missing something?

Well, first of all, since these are bacterial proteins, you should probably use global rather than local alignment for clustering (e.g. CD-HIT). Second, you could run the clustering at a range of similarity thresholds, e.g. 80-99%, and see how the clusters change. Just ideas.
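As a rough illustration of the threshold-sweep idea, here is a minimal Python sketch of greedy incremental clustering: sequences are sorted longest first, and each one joins the first representative it matches at or above the threshold, otherwise it founds a new cluster. This follows the general scheme CD-HIT uses, but it is a simplified stand-in: it has no short-word filter, and identity comes from a toy Needleman-Wunsch global alignment (match +1, mismatch -1, linear gap -1, no substitution matrix) divided by the length of the shorter sequence. The scoring and the sequences are assumptions for illustration; for real data you would run CD-HIT itself and vary its `-c` identity threshold.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns identical positions / len(shorter)."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back along an optimal path and count identical aligned residues.
    i, j, same = n, m, 0
    while i > 0 and j > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            if a[i - 1] == b[j - 1]:
                same += 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return same / min(n, m)


def greedy_cluster(seqs, threshold):
    """Greedy incremental clustering; `seqs` maps IDs to sequences."""
    clusters = {}  # representative ID -> list of member IDs
    for sid in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for rep in clusters:
            if global_identity(seqs[rep], seqs[sid]) >= threshold:
                clusters[rep].append(sid)
                break
        else:  # no representative matched: start a new cluster
            clusters[sid] = [sid]
    return clusters


# Hypothetical toy sequences (illustration only).
seqs = {
    "p1": "MKTALLVAGAEF",
    "p2": "MKTALLVAGAEY",
    "p3": "MKSALIVAGAEF",
    "p4": "WPHNQRCDWPHN",
}
for t in (0.95, 0.90, 0.80):
    clusters = greedy_cluster(seqs, t)
    print(t, len(clusters), clusters)
```

On this toy set the number of clusters drops from 4 to 3 to 2 as the threshold is lowered; memberships that stay stable across a range of thresholds are less likely to be artefacts of one particular cutoff, which is the concern raised in the previous post.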

I could indeed give it a try. I cannot find the standalone version of CD-HIT; do you know a working link? Edit: got it, sorry.

