Phylogeny accounting for individual sequence occurrence
5.7 years ago

Hi all,

I have a large data set of multiple protein sequences for 100+ strains, and I have already built ML trees from it. The data set has a column indicating how frequently each sequence occurs. My supervisor wants new trees built that incorporate the sequence frequency, to investigate what effect this would have on the branch lengths.

E.g. Strain A:

SeqNo        1   2   3   4   5
Occurrence   1   1   17  1   31

Is this even possible? I can't find anything that seems even remotely related, so I don't know whether I have wildly misunderstood what I'm supposed to be doing.

Thanks in advance

Edited for example clarity

phylogeny • 682 views

If your sequences are paralogous, your tree will collapse. There is no way to know which of those duplicated sequences is the 'ancestor' sequence.

The best you could probably do is draw something like a tanglegram or a split decomposition tree. That would at least highlight that some branches of that tree have more sequences than others.
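
If the goal is mainly to show which tips are abundant, one rough option is to leave the existing tree untouched and attach the occurrence counts to the tips as annotations, which a tree viewer can then use to scale the tip symbols. The sketch below uses the Strain A numbers from the question; the tip label format (`A_seq3`) and the log scaling are assumptions of mine, not anything from the actual data set, and this does not alter branch lengths in any way.

```python
import math

# Hypothetical input rows: (strain, sequence number, occurrence count),
# copied from the Strain A example in the question.
records = [
    ("A", 1, 1),
    ("A", 2, 1),
    ("A", 3, 17),
    ("A", 4, 1),
    ("A", 5, 31),
]


def tip_annotations(rows):
    """Map each (assumed) tip label to its count and a log-scaled symbol size."""
    annotations = {}
    for strain, seq_no, count in rows:
        label = f"{strain}_seq{seq_no}"  # assumed label format
        # Log scaling keeps a count of 31 from dwarfing a count of 1.
        size = 1.0 + math.log10(count)
        annotations[label] = {"count": count, "size": size}
    return annotations


annotations = tip_annotations(records)

# Tab-separated output that could be pasted into an annotation file.
for label, info in annotations.items():
    print(f"{label}\t{info['count']}\t{info['size']:.2f}")
```

The labels would need to match whatever tip names are in the tree file, and the size formula is only a starting point; a square-root or linear scale may suit other count ranges better.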
