Question: How To Code Multiple States For Discrete Data Type
Asked 6.1 years ago by qiyunzhu (Buffalo):

Dear all,

I'm trying to do phylogenetic reconstruction using discrete character data, such as geographical distribution. For example:

species A            Asia
species B            Europe
species C            Africa
species D            Europe

This works fine. However, some species are distributed on more than one continent, for example:

species E            Asia,Europe

My question is how to code multiple character states in a single data point for popular phylogenetics software such as BEAST, MrBayes, and Mesquite. BEAST is my favorite. I tried the format above in BEAST, but it didn't work: in the XML file, "Asia,Europe" is treated as one character state instead of the two states "Asia" and "Europe", as I intended. So I'm posting to ask whether anyone has a solution, or can tell me it's just not possible.

Thanks!

Tags: phylogenetics

I'm not familiar with the software you mentioned, but can't you split a multi-continent species into separate lines, like:

species E Asia

species E Europe

Will this work for you?

(Comment by Sukhdeep Singh, 6.1 years ago)

Yeah, it looks like species-to-continent is a many-to-many relationship.

(Comment by Alex Paciorkowski, 6.1 years ago)
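A minimal sketch of the split suggested above, assuming the data start as a species-to-comma-separated-regions table like the one in the question (the dictionary `raw` and the function `to_long_format` are hypothetical). It turns the many-to-many relationship into one row per species-region pair:

```python
# Hypothetical input: species name -> comma-separated regions, as in the question.
raw = {
    "species A": "Asia",
    "species B": "Europe",
    "species C": "Africa",
    "species D": "Europe",
    "species E": "Asia,Europe",
}


def to_long_format(table):
    """Return one (species, region) pair for every region a species occupies."""
    rows = []
    for species, regions in table.items():
        for region in regions.split(","):
            rows.append((species, region.strip()))
    return rows


for species, region in to_long_format(raw):
    print(f"{species}\t{region}")
```

Whether a given program accepts the same taxon on several rows is a separate question; this only reshapes the table.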

That sounds like a nice idea, so I just tried it. In the BEAST XML, I wrote the location as two lines:

<attr name="location">Fujian</attr>
<attr name="location">Guangdong</attr>

I'm waiting to see whether BEAST really treats it as two states.

(Comment by qiyunzhu, 6.1 years ago)

I found that the second line overrides the first, so this doesn't work in BEAST.

(Comment by qiyunzhu, 6.1 years ago)

Oh, maybe try it as two different location IDs but with the same names. I was just looking here; otherwise, just contact the developers.

(Comment by Sukhdeep Singh, 6.1 years ago)

Answer by qiyunzhu (Buffalo), 6.1 years ago:

I consulted the BEAST authors, who kindly gave me the official solution. Here is how it should be done:

Edit the XML file. In the state code section, create ambiguity definitions like:

<generalDataType id="fruit.dataType">
        <state code="Asia"/>
        <state code="Europe"/>
        ...
        <state code="Antarctica"/>
        <ambiguity code="Eurasia" states="Asia Europe"/>
</generalDataType>

Then go back to the taxa section and set the code of the desired taxa to "Eurasia".

Then go to the treeLikelihood section and set useAmbiguities="true".
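If many species span several regions, writing the ambiguity lines by hand gets tedious. Below is a small Python sketch that builds the generalDataType block with ElementTree. The element and attribute names (generalDataType, state, ambiguity, code, states) and the id "fruit.dataType" are copied from the snippet above; the helper `build_general_data_type`, the three-region list, and the check for undeclared states are my own assumptions. The taxa and treeLikelihood changes still have to be made as described.

```python
import xml.etree.ElementTree as ET


def build_general_data_type(type_id, states, ambiguities):
    """Build a <generalDataType> element in the layout shown in this answer.

    states: single-region codes, e.g. ["Asia", "Europe"].
    ambiguities: hypothetical mapping of ambiguity code -> list of member states.
    """
    root = ET.Element("generalDataType", id=type_id)
    for code in states:
        ET.SubElement(root, "state", code=code)
    for code, members in ambiguities.items():
        # Refuse ambiguity codes that point at states never declared above.
        unknown = set(members) - set(states)
        if unknown:
            raise ValueError(f"{code} refers to undeclared states: {sorted(unknown)}")
        ET.SubElement(root, "ambiguity", code=code, states=" ".join(members))
    return root


elem = build_general_data_type(
    "fruit.dataType",
    ["Asia", "Europe", "Africa"],
    {"Eurasia": ["Asia", "Europe"]},
)
print(ET.tostring(elem, encoding="unicode"))
```

The printed element can then be pasted over the hand-written block in the XML.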

Hope this is helpful to people who read this post.

Excellent, thanks for posting the solution here!

(Comment by Josh Herr, 6.1 years ago)

Answer by Josh Herr (University of Nebraska), 6.1 years ago:

I haven't tried this in BEAST yet, but my analysis works fine in PAUP and MrBayes for multiple character states written with parentheses or braces: in your data matrix, for example, you would have (Asia,Europe) for both character states (polymorphism) or {Asia,Europe} for character uncertainty. Give this a try.

I do know this notation will not work with ape's read.nexus.data function in R. In that case, I've either added a separate character to my data matrix or coded my data as continuous, both of which have downstream disadvantages for analysis.
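For reference, the NEXUS format uses parentheses for polymorphism (the taxon has all listed states) and braces for uncertainty (it has one of them), with states usually written as single symbols such as 0, 1, 2. A rough Python sketch of writing such cells, assuming a hypothetical region-to-symbol mapping (`SYMBOLS`, `nexus_cell`). It prints only the MATRIX part, so the surrounding DATA block still has to be written, and whether a given program treats the two notations differently is worth checking in its manual.

```python
# Hypothetical mapping of region names to NEXUS STANDARD-datatype symbols.
SYMBOLS = {"Asia": "0", "Europe": "1", "Africa": "2"}


def nexus_cell(regions, polymorphic=True):
    """Encode one taxon's regions as a single NEXUS matrix cell.

    One region gives its bare symbol; several regions give "(..)" for
    polymorphism (all present) or "{..}" for uncertainty (one of them).
    """
    symbols = sorted(SYMBOLS[r] for r in regions)
    if len(symbols) == 1:
        return symbols[0]
    inner = "".join(symbols)
    return f"({inner})" if polymorphic else f"{{{inner}}}"


rows = {"species_A": ["Asia"], "species_E": ["Asia", "Europe"]}
print("MATRIX")
for taxon, regions in rows.items():
    print(f"    {taxon}  {nexus_cell(regions)}")
print(";")
```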

Thanks for the information on MrBayes and PAUP! I tried it in BEAST, but it didn't work. :( I haven't tried MrBayes yet because it seems to run much slower than BEAST, and my data set is just HUGE. But if MrBayes can handle this, I will give it a try.

(Comment by qiyunzhu, 6.1 years ago)

I think this might be an XML vs. NEXUS problem. David's idea of coding each location as a separate character sounds like a format-neutral way around it.

(Comment by Josh Herr, 6.1 years ago)

Yes, I agree, and I'm trying the multiple-character coding now. It seems that NEXUS is sometimes more flexible than XML, and more readable.

(Comment by qiyunzhu, 6.1 years ago)

Answer by David W (New Zealand), 6.1 years ago:

I don't know of any software that handles a taxon taking multiple states for the same character, but do you really need to code it this way? Why not code geography as a set of presence/absence characters, one per region:

       Asia    Africa     Europe
sp1      1        0         0
sp2      1        1         0
sp3      0        0         0
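The table above can be generated from a species-to-regions mapping with a few lines of Python. This is a minimal sketch; the input dictionary `distribution` and the function `presence_absence` are hypothetical and just mirror the example rows, including sp3 with no regions:

```python
REGIONS = ["Asia", "Africa", "Europe"]  # column order of the table above

# Hypothetical input: species -> set of regions it occupies.
distribution = {
    "sp1": {"Asia"},
    "sp2": {"Asia", "Africa"},
    "sp3": set(),
}


def presence_absence(dist, regions):
    """Return {species: [1 or 0 per region]}, i.e. one binary character per region."""
    return {sp: [1 if r in occupied else 0 for r in regions]
            for sp, occupied in dist.items()}


matrix = presence_absence(distribution, REGIONS)
print("       " + "  ".join(REGIONS))
for sp, row in matrix.items():
    print(f"{sp:<6} " + "  ".join(str(v) for v in row))
```

Each column can then be analysed as its own binary character, which is also why the results come out per region rather than as one combined reconstruction.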

Edit: another option to consider is designing a discrete-character model (e.g. in Rev, the RevBayes language) in which each possible geographic combination is a state, with rates of change reflecting the fact that changing from, say, Asia-Africa to Europe-Africa-Asia is much more likely than going from Europe to Europe-Africa-Asia. That seems like a lot of work, though.
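To make the combined-state idea concrete, here is a hypothetical Python sketch (not a model from this thread or from any package, and not Rev code). It assumes each non-empty combination of regions is one state and that only single-region gains or losses get a non-zero rate, so Asia-Africa -> Europe-Africa-Asia is one direct step while Europe -> Europe-Africa-Asia has to pass through an intermediate range. The names `range_states` and `rate_matrix` and the `gain`/`loss` values are illustrative assumptions; fitting such a model would still need a proper inference tool.

```python
from itertools import combinations

REGIONS = ("Africa", "Asia", "Europe")


def range_states(regions):
    """All non-empty combinations of regions, each treated as one discrete state."""
    return [frozenset(c) for k in range(1, len(regions) + 1)
            for c in combinations(regions, k)]


def rate_matrix(states, gain=1.0, loss=1.0):
    """Instantaneous rate matrix allowing only one-region gains or losses.

    Jumps that change more than one region at once get rate 0, so they can only
    happen through intermediate ranges. Each diagonal entry is the negative sum
    of its row's off-diagonal rates, so rows sum to zero.
    """
    n = len(states)
    q = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(states):
        for j, b in enumerate(states):
            if a < b and len(b - a) == 1:      # gain exactly one region
                q[i][j] = gain
            elif b < a and len(a - b) == 1:    # lose one region; b is never empty
                q[i][j] = loss
        q[i][i] = -sum(q[i])
    return q


states = range_states(REGIONS)
q = rate_matrix(states, gain=0.5, loss=0.2)
idx = {s: i for i, s in enumerate(states)}
asia_africa = frozenset({"Asia", "Africa"})
europe = frozenset({"Europe"})
everywhere = frozenset(REGIONS)
print(q[idx[asia_africa]][idx[everywhere]])  # 0.5: one region gained
print(q[idx[europe]][idx[everywhere]])       # 0.0: needs two steps
```

With three regions this gives 7 states; the number of states doubles with each added region, which is part of why this approach is a lot of work.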

Thank you for your suggestion! I just discovered that Mesquite can do it if you type things like "Asia&Europe". I also tried setting up multiple characters in BEAST, which also works, but the results come out separately for each character.

(Comment by qiyunzhu, 6.1 years ago)