Question: Phylogenetic tree building from BlastP data
Hello everyone,

I just have a question regarding building phylogenetic trees.

I have run a protein sequence through BlastP and got 250 hits. I would like to build a phylogenetic tree from this data, but I am wondering whether it is better to use the full-length sequences of the hits or only the aligned portions for the MUSCLE alignment and the PhyML tree.

Which would be best and why?

Thank you so much for your time!

While it is best to use the entire sequence, as @Jean-Karim points out below, you should examine the BLAST hits in context. You may have picked up homology to individual domains, for example. If your protein of interest is 50 AA long and a hit points to a 500 AA protein, then you will want to select the sequences for the MSA carefully.
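
One rough way to spot such cases is to check how much of each hit the BLAST alignment actually covers. Below is a minimal sketch, assuming you have the subject start, end and full length for each hit (BLAST+ can report these in tabular output with a custom format such as `-outfmt "6 sseqid sstart send slen"`). The `Hit` type, the function names, the example IDs and the 0.5 cutoff are all made up for illustration and are not a recommended threshold.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST hit; field names mirror the BLAST+ tabular specifiers."""
    subject_id: str
    sstart: int  # 1-based start of the aligned region on the subject
    send: int    # 1-based end of the aligned region on the subject
    slen: int    # full length of the subject protein


def subject_coverage(hit: Hit) -> float:
    """Fraction of the subject protein spanned by the aligned region."""
    aligned = abs(hit.send - hit.sstart) + 1
    return aligned / hit.slen


def split_by_coverage(hits, min_coverage=0.5):
    """Separate hits aligned over most of their length from hits where
    only a small part (for example a single domain) matched the query.

    min_coverage is an arbitrary illustrative cutoff, not a standard value.
    """
    full_like, domain_like = [], []
    for hit in hits:
        if subject_coverage(hit) >= min_coverage:
            full_like.append(hit)
        else:
            domain_like.append(hit)
    return full_like, domain_like


# Hypothetical hits for a query of about 50 AA.
hits = [
    Hit("hypothetical_A", 3, 50, 52),      # small protein, mostly aligned
    Hit("hypothetical_B", 210, 258, 500),  # large protein, one short region aligned
]
full_like, domain_like = split_by_coverage(hits)
```

Hits that end up in `domain_like` are the ones worth inspecting by hand before deciding whether to put the whole protein into the alignment.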

Use the entire sequences. BLAST may have picked up only the most homologous regions and failed to align the less conserved parts, which may nonetheless be picked up by a multiple alignment.

