How To Code Multiple States For Discrete Data Type
11.4 years ago
qiyunzhu ▴ 430

Dear all,

I'm trying to do phylogenetic reconstruction using discrete character data, such as geographical distribution. For example:

species A            Asia
species B            Europe
species C            Africa
species D            Europe

This works fine for me. However, some of my species are distributed across more than one continent, for example:

species E            Asia,Europe

My question is how to code multiple character states into one data point for popular phylogenetics software, including BEAST, MrBayes, Mesquite, etc. My favorite is BEAST. I tried the format above in BEAST, but it didn't work: in the XML file, "Asia,Europe" is treated as a single character state rather than as "Asia" and "Europe", which is what I want. So I'm posting to ask whether anyone can give me a solution, or tell me that it's simply not possible.

Thanks!

phylogenetics • 4.5k views

I don't know the software you mentioned, but can't you split the multi-continent species across separate lines, like

species E Asia

species E Europe

Will this work for you?


Yeah, it looks like species to continent is a many-to-many relationship.


That sounds like a nice idea. I just tried it. In the BEAST XML, I wrote the location as two lines: <attr name="location">Fujian</attr> <attr name="location">Guangdong</attr>. I'm waiting to see whether BEAST really treats them as two states.


I found that the lower line overrides the upper line. So it didn't work for BEAST.


Oh, maybe try two different location IDs with the same name. I was just looking here; otherwise, contact the developers.

11.4 years ago
qiyunzhu ▴ 430

I consulted the BEAST authors, who kindly gave me the official solution. Here is how it should be done:

Edit the XML file. In the section that defines the state codes, create ambiguity definitions like:

<generalDataType id="fruit.dataType">
        <state code="Asia"/>
        <state code="Europe"/>
            ...
        <state code="Antarctica"/>
        <ambiguity code="Eurasia" states="Asia Europe"/>
</generalDataType>

Then go back to the taxa section and set the code of the desired taxa to "Eurasia".

Then go to the treeLikelihood section and set useAmbiguities="true".
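For a large data set, typing the ambiguity definitions by hand gets tedious, so here is a minimal Python sketch that writes the state-code block for you. The element and attribute names (generalDataType, state, ambiguity, code, states) and the fruit.dataType id are taken from the snippet above; the helper name build_data_type and the input dictionaries are my own assumptions, and the script only produces text to paste into the XML, it is not part of BEAST.

```python
import xml.etree.ElementTree as ET


def build_data_type(data_type_id, states, ambiguities):
    """Build a <generalDataType> element with plain states and ambiguity codes.

    `ambiguities` maps an ambiguity code (e.g. "Eurasia") to the list of
    plain state codes it may stand for. Hypothetical helper, not a BEAST API.
    """
    root = ET.Element("generalDataType", {"id": data_type_id})
    for code in states:
        ET.SubElement(root, "state", {"code": code})
    for code, members in ambiguities.items():
        # Refuse ambiguity codes that refer to states we never defined.
        unknown = [m for m in members if m not in states]
        if unknown:
            raise ValueError(f"ambiguity {code!r} uses undefined states: {unknown}")
        ET.SubElement(root, "ambiguity", {"code": code, "states": " ".join(members)})
    return root


# Example input: states and one ambiguity code, as in the snippet above.
states = ["Asia", "Europe", "Antarctica"]
ambiguities = {"Eurasia": ["Asia", "Europe"]}

element = build_data_type("fruit.dataType", states, ambiguities)
if hasattr(ET, "indent"):  # pretty-printing is only available in Python 3.9+
    ET.indent(element, space="    ")
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

You would still have to set the taxon codes and useAmbiguities="true" yourself, as described in the steps above.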

Hope this is helpful to people who read this post.


Excellent, thanks for posting the solution here!

11.4 years ago
Josh Herr 5.8k

I haven't tried this in BEAST yet, but my analyses work fine in PAUP and MrBayes when multiple character states are written in parentheses or braces: in your data matrix you would enter, for example, (Asia,Europe) for polymorphism (both states present) or {Asia,Europe} for uncertainty (one state or the other). Give this a try.

I do know this notation will not work with the read.nexus.data function in the R package ape. In that case, I've either added a separate character to my data matrix or coded my data as continuous, both of which have downstream disadvantages for the analysis.


Thanks for the information on MrBayes and PAUP! I tried it in BEAST, but it didn't work. :( I haven't tried MrBayes so far because I feel that it runs much slower than BEAST and my data set is just HUGE. But if MrBayes can handle this I will give it a try.


I think this might be an XML vs. NEXUS problem. David's idea of coding each location as an independent character sounds like a way around this that does not depend on the file format.


Yes, I agree, and I'm now trying the coding with multiple characters. It seems that the NEXUS format is sometimes more flexible than XML, and more readable.

11.4 years ago
David W 4.9k

I don't know of any software that deals with a taxon taking multiple states for the same character, but do you really need to code it this way? Why not make "geography" a presence-absence character:

       Asia    Africa     Europe
sp1      1        0         0
sp2      1        1         0
sp3      0        0         0
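To make the recoding concrete, here is a small Python sketch that turns a species-to-continents mapping into one 0/1 character per continent. The species and continents are the ones from the question; the function name presence_absence and the dictionary layout are just assumptions for illustration.

```python
def presence_absence(ranges, regions=None):
    """Return (regions, matrix), where matrix maps each species to a 0/1 row.

    `ranges` maps a species name to the set of regions it occupies.
    If `regions` is not given, the column order is the sorted list of
    all regions seen in the data.
    """
    if regions is None:
        regions = sorted({r for occupied in ranges.values() for r in occupied})
    matrix = {
        species: [1 if region in occupied else 0 for region in regions]
        for species, occupied in ranges.items()
    }
    return regions, matrix


# Example data from the original question, including the two-continent species E.
ranges = {
    "species A": {"Asia"},
    "species B": {"Europe"},
    "species C": {"Africa"},
    "species D": {"Europe"},
    "species E": {"Asia", "Europe"},
}

regions, matrix = presence_absence(ranges, regions=["Asia", "Africa", "Europe"])
print("          " + " ".join(regions))
for species, row in matrix.items():
    print(species, " ".join(str(value) for value in row))
```

Each column can then be entered as its own binary character, which is why the reconstruction comes out per continent rather than as one combined range.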

Edit: the other option to consider is dreaming up some discrete-character model (e.g. rev) in which each possible geographic combination is a state, with rates of change that reflect the fact that going from, say, Asia-Africa -> Europe-Africa-Asia is much more likely than going from Europe -> Europe-Africa-Asia... though this seems like a lot of work.
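As a rough illustration of that combined-state idea, the Python sketch below enumerates every non-empty combination of continents as a state and counts how many single-continent gains or losses separate two ranges. Counting gains and losses this way is only my assumption about how "more likely" could be put into numbers; it is not a model taken from any particular package, and the function names are hypothetical.

```python
from itertools import combinations


def all_ranges(regions):
    """Every non-empty combination of regions, each as a frozenset."""
    return [
        frozenset(combo)
        for size in range(1, len(regions) + 1)
        for combo in combinations(regions, size)
    ]


def steps_between(range_a, range_b):
    """Number of single-region gains or losses separating two ranges."""
    return len(range_a ^ range_b)  # size of the symmetric difference


regions = ["Asia", "Africa", "Europe"]
states = all_ranges(regions)

# Ordered pairs of states that differ by exactly one gain or loss; in a
# stepwise model these would be the only transitions given a nonzero rate.
single_step = [(a, b) for a in states for b in states if steps_between(a, b) == 1]

everywhere = frozenset(regions)
print(len(states), "states,", len(single_step), "one-step transitions")
print(steps_between(frozenset({"Asia", "Africa"}), everywhere))  # one gain
print(steps_between(frozenset({"Europe"}), everywhere))          # two gains
```

The number of states grows as 2^n - 1 with the number of regions, which is part of why this approach is a lot of work for anything beyond a handful of areas.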


Thank you for your suggestion! I just discovered that Mesquite can do it if you type entries like "Asia&Europe". I also tried setting up multiple characters in BEAST, which works too, but the results are reported separately for each character.
