Phylogeny accounting for individual sequence occurrence



5.7 years ago

avocado_toast ▴ 10

Hi all,

I have a large data set consisting of multiple protein sequences for each of 100+ strains, and I have already built ML trees from it. The data set contains a column indicating how frequently each sequence occurs, and my supervisor wants new trees built that incorporate this frequency, to investigate the effect it would have on the branch lengths.

E.g., Strain A:

SeqNo       1   2   3   4   5
Occurrence  1   1   17  1   31

Is this even possible? I can't find anything that seems even remotely related, so I don't know whether I have wildly misunderstood what I'm supposed to be doing.

Thanks in advance

Edited for example clarity

phylogeny • 682 views

Updated 3.8 years ago by Biostar 20 • written 5.7 years ago by avocado_toast ▴ 10



If your sequences are paralogous, your tree will collapse. There is no way to know which of those duplicated sequences is the 'ancestor' sequence.

Probably the best you could do is draw something like a tanglegram/split decomposition tree. That would at least highlight that some branches of that tree have more sequences than others.
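A simpler way to get that kind of highlighting, short of a tanglegram, might be to write the occurrence counts into the tip labels of an existing tree so that a viewer displays them. The sketch below is only an illustration: the labels (`A_seq1` etc.), the Newick string with its topology and branch lengths, and the `annotate_tips` helper are all made up, and only the counts come from the Strain A example in the question. It does not change branch lengths; it only relabels tips.

```python
import re

# Occurrence counts for Strain A, from the example in the question.
# The label names themselves are hypothetical.
occurrence = {"A_seq1": 1, "A_seq2": 1, "A_seq3": 17, "A_seq4": 1, "A_seq5": 31}

# Hypothetical Newick tree: topology and branch lengths are invented for illustration.
newick = "((A_seq1:0.1,A_seq2:0.2):0.05,(A_seq3:0.1,(A_seq4:0.3,A_seq5:0.1):0.2):0.05);"


def annotate_tips(tree, counts):
    """Append '_n<count>' to every tip label that has an entry in counts."""

    def repl(match):
        label = match.group(1)
        if label in counts:
            return f"{label}_n{counts[label]}"
        return label  # leave unknown labels untouched

    # A tip label directly follows '(' or ',' and ends before ':', ',' or ')'.
    # Internal node labels and branch lengths follow ')' or ':' and are skipped.
    return re.sub(r"(?<=[(,])([^():,;]+)(?=[:,)])", repl, tree)


annotated = annotate_tips(newick, occurrence)
print(annotated)
```

The relabelled Newick string can then be opened in whatever tree viewer you already use, where the `_n17` and `_n31` suffixes make the abundant sequences stand out.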

Reply written 5.7 years ago by Joe 21k
