Question

What is the standard way of preparing homolog sequences for a phylogenetic analysis?

0

5.5 years ago

johnnytam100 ▴ 110

I have two protein sequences with around 50% identity between them.

I want to study the phylogenetic relationship between them.

I came up with a method myself:

Step 1: BLAST each of the two sequences separately against a protein database (possibly with relaxed thresholds)

Step 2: extract the subject sequences that are hits in both BLAST searches

Step 3: build a multiple sequence alignment from the common subject sequences and the two query sequences

Step 4: build the phylogenetic tree
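
Steps 1 and 2 could be sketched roughly as follows, assuming each BLAST search was saved in the standard 12-column tabular format (`-outfmt 6`). The function name `subject_ids`, the E-value cutoff of 1e-5, and the hit tables are all made up for illustration; they are not real BLAST results.

```python
import csv
import io


def subject_ids(blast_tabular, max_evalue=1e-5):
    """Collect subject IDs from BLAST tabular (-outfmt 6) lines.

    Default column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = set()
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) <= max_evalue:  # column 11 is the E-value
            hits.add(row[1])  # column 2 is the subject ID
    return hits


# Hypothetical hit tables for the two queries (invented values).
hits_a = io.StringIO(
    "queryA\tsp1\t62.0\t300\t114\t0\t1\t300\t1\t300\t1e-90\t310\n"
    "queryA\tsp2\t48.0\t290\t150\t2\t5\t294\t3\t292\t2e-60\t220\n"
    "queryA\tsp3\t30.0\t120\t84\t1\t10\t129\t40\t159\t0.5\t35\n"
)
hits_b = io.StringIO(
    "queryB\tsp2\t55.0\t295\t132\t1\t1\t295\t1\t295\t3e-75\t260\n"
    "queryB\tsp3\t41.0\t280\t165\t3\t4\t283\t6\t285\t4e-40\t160\n"
    "queryB\tsp4\t58.0\t300\t126\t0\t1\t300\t1\t300\t1e-80\t280\n"
)

# Step 2: keep only subjects hit by both queries.
common = subject_ids(hits_a) & subject_ids(hits_b)
print(sorted(common))
```

In this toy data only `sp2` passes the cutoff for both queries, because the `sp3` hit for `queryA` has an E-value of 0.5. Note that intersecting the hits drops homologs found by only one query; taking the union (`|`) instead would keep them.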

Could anyone comment on this method? If it is not ideal, what is the standard way of preparing homolog sequences for a phylogenetic analysis?

Thank you.

phylogenetic tree homolog • 1.1k views

1

These are the typical steps. However, you'll need to work out the details yourself, e.g. which species to include, whether to manually tweak the multiple sequence alignment, and which tree-building algorithm to use.
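
As one illustration of tweaking the alignment, a common adjustment is to drop columns that are mostly gaps before building the tree. The sketch below is only a minimal version of that idea; the function name `trim_gappy_columns`, the 50% gap threshold, and the toy alignment are assumptions for the example, not a recommended setting.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    `alignment` maps sequence names to aligned strings of equal length,
    with '-' marking gaps.
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # Indices of the columns that are kept.
    keep = [
        i for i, column in enumerate(zip(*rows))
        if column.count("-") / len(column) <= max_gap_fraction
    ]
    return {name: "".join(row[i] for i in keep) for name, row in zip(names, rows)}


# Toy alignment (invented sequences).
toy = {
    "queryA": "MK-LVA--G",
    "queryB": "MKTLVA--G",
    "hit1":   "MR-LIAQ-G",
    "hit2":   "MK-LVS--G",
}
trimmed = trim_gappy_columns(toy)
for name, seq in trimmed.items():
    print(name, seq)
```

Here the three columns that are gaps in at least three of the four sequences are removed, leaving six columns per sequence. Dedicated trimming tools apply more careful criteria than a single threshold.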

5.5 years ago by Jean-Karim Heriche 27k

0

To add to Jean’s answer about which species to include: if you want a rooted tree, you may also want to include a less closely related species or sequence to serve as an outgroup.

5.5 years ago by Joe 21k

0

For the outgroup, should it be 'out' in the sense of 1) the BLAST threshold, 2) the functional annotation of the protein, or 3) both?

5.5 years ago by johnnytam100 ▴ 110

1

It should be a more divergent sequence, which would lead to it being the outermost branch in your final tree. That is, it will form one half of the most basal bifurcation.
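
To make the "one half of the most basal bifurcation" point concrete, here is a minimal pure-Python sketch that roots an unrooted topology on a chosen outgroup leaf and prints it as Newick. The adjacency-dict representation, the function name `root_on_outgroup`, and the node names are all hypothetical; branch lengths are ignored, and in practice a phylogenetics library would handle this.

```python
def root_on_outgroup(adjacency, outgroup):
    """Return a rooted Newick string with `outgroup` as one side of the root.

    `adjacency` describes an unrooted tree: each node maps to a list of
    its neighbours. Leaves have exactly one neighbour.
    """
    if len(adjacency[outgroup]) != 1:
        raise ValueError("the outgroup must be a leaf")

    def subtree(node, parent):
        # Everything reachable from `node` without going back to `parent`.
        children = [n for n in adjacency[node] if n != parent]
        if not children:
            return node  # a leaf
        return "(" + ",".join(subtree(child, node) for child in children) + ")"

    (neighbour,) = adjacency[outgroup]
    # The root splits the tree into the outgroup and everything else.
    return "(" + outgroup + "," + subtree(neighbour, outgroup) + ");"


# Toy unrooted tree (invented names): n1 and n2 are internal nodes.
toy_tree = {
    "queryA": ["n1"],
    "queryB": ["n1"],
    "hit1": ["n2"],
    "distant": ["n2"],
    "n1": ["queryA", "queryB", "n2"],
    "n2": ["n1", "hit1", "distant"],
}
rooted = root_on_outgroup(toy_tree, "distant")
print(rooted)  # (distant,((queryA,queryB),hit1));
```

The outgroup `distant` sits alone on one side of the root, and all remaining sequences form the other side.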

5.5 years ago by Joe 21k

0

I see! What could we achieve if we play with the sequence alignment step?

5.5 years ago by johnnytam100 ▴ 110

1

That’s too broad a question, really. You need to decide what features you’re looking for. If you wanted to examine conservation of an active site or domain, for instance, you’d want to use local alignments, but if you were interested in overall gene conservation, a global alignment would most likely be more informative.
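
The difference between the two modes can be seen in a small pairwise scoring sketch. This is a simplified illustration, not a production aligner: it uses a flat match/mismatch/gap scheme (real protein alignments normally use a substitution matrix and affine gap penalties), and the function name `alignment_score` and the toy peptides are assumptions for the example.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Best pairwise alignment score by dynamic programming.

    local=False: global alignment (Needleman-Wunsch style), end to end.
    local=True:  local alignment (Smith-Waterman style), best region only.
    """
    # First DP row: gap penalties for global, zeros for local.
    prev = [0 if local else gap * j for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else gap * i]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


# Two toy peptides sharing the motif GHSDKT inside unrelated flanks.
seq_a = "AAAAGHSDKT"
seq_b = "LLGHSDKTLL"
local_score = alignment_score(seq_a, seq_b, local=True)
global_score = alignment_score(seq_a, seq_b, local=False)
print("local:", local_score, "global:", global_score)
```

The local score reflects only the shared motif, while the global score is pulled down by the unrelated flanking residues, which is why the choice depends on whether you care about a domain or the whole sequence.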

5.5 years ago by Joe 21k