How To Code Multiple States For Discrete Data Type
3
1
9.2 years ago
qiyunzhu ▴ 430

Dear all,

I'm trying to do phylogenetic reconstruction using discrete character data, such as geographical distribution. For example:

species A            Asia
species B            Europe
species C            Africa
species D            Europe


This works fine for me. However, some of my species are distributed in more than one continent, for example:

species E            Asia,Europe


My question is how to code multiple character states into one data point for popular phylogenetics software, including BEAST, MrBayes, Mesquite, etc. My favorite is BEAST. I tried the format above in BEAST, but it didn't work: in the XML file, "Asia,Europe" is treated as one character state instead of the two states "Asia" and "Europe" that I wanted. So I'm posting to ask whether anyone can give me a solution, or tell me it's just not possible.

Thanks!

phylogenetics • 3.3k views
0

I don't have any clue about the software you mentioned, but can't you split the multi-continent species into separate lines, like

species E Asia

species E Europe

Will this work for you?

0

Yeah, it looks like species to continent is a many-to-many relationship.

0

That sounds like a nice idea. I just tried it. In BEAST, I wrote the attribute in the XML as two lines: <attr name="location">Fujian</attr> and <attr name="location">Guangdong</attr>. I'm waiting to see whether BEAST really treats them as two states.

0

I found that the lower line overrides the upper line. So it didn't work for BEAST.

0

Oh, maybe try it as two different location IDs with the same name. I was just looking here. Otherwise, just contact the developers.

3
9.2 years ago
qiyunzhu ▴ 430

I consulted the BEAST authors, who kindly gave me the official solution. Here is how it should be done:

Edit the XML file. In the state code section, create ambiguity definitions like:

<generalDataType id="fruit.dataType">
    <state code="Asia"/>
    <state code="Europe"/>
    ...
    <state code="Antarctica"/>
    <ambiguity code="Eurasia" states="Asia Europe"/>
</generalDataType>


Then go back to the taxa section and set the code of the desired taxa to "Eurasia".

Then go to the treeLikelihood section and set useAmbiguities="true".
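
For a large data set, writing the ambiguity definitions by hand gets tedious. Below is a minimal Python sketch that generates a generalDataType block like the one above from a taxon-to-regions table. The element and attribute names (generalDataType, state, ambiguity, code, states) are the ones shown in this answer; the function name, the "location.dataType" id and the underscore-joined ambiguity codes (e.g. "Asia_Europe") are my own assumptions, not anything BEAST requires. You would still need to paste the result into your XML, set the taxon codes and set useAmbiguities="true" yourself.

```python
import xml.etree.ElementTree as ET


def build_data_type(regions, taxon_regions, data_type_id="location.dataType"):
    """Build a <generalDataType> element and a taxon -> state code mapping.

    data_type_id and the "A_B" ambiguity naming scheme are hypothetical choices.
    """
    root = ET.Element("generalDataType", id=data_type_id)
    for region in regions:
        ET.SubElement(root, "state", code=region)

    codes = {}
    defined = set()
    for taxon, present in taxon_regions.items():
        # Keep regions in the same order as the declared states.
        ordered = [r for r in regions if r in present]
        if not ordered:
            raise ValueError(f"{taxon} has no known region")
        if len(ordered) == 1:
            codes[taxon] = ordered[0]
            continue
        code = "_".join(ordered)
        if code not in defined:
            # One ambiguity definition per distinct combination of regions.
            ET.SubElement(root, "ambiguity", code=code, states=" ".join(ordered))
            defined.add(code)
        codes[taxon] = code
    return root, codes


regions = ["Asia", "Europe", "Africa"]
taxon_regions = {
    "species A": {"Asia"},
    "species B": {"Europe"},
    "species C": {"Africa"},
    "species D": {"Europe"},
    "species E": {"Asia", "Europe"},
}

data_type, codes = build_data_type(regions, taxon_regions)
print(ET.tostring(data_type, encoding="unicode"))
for taxon, code in codes.items():
    print(taxon, code)
```

The codes mapping gives the value to put in each taxon's attribute, so multi-region taxa end up pointing at an ambiguity code rather than a plain state.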

0

Excellent, thanks for posting the solution here!

1
9.2 years ago
Josh Herr 5.7k

I haven't tried this in BEAST yet, but my analysis works fine in PAUP and MrBayes for multiple character states written with brackets: in your data matrix, for example, you will have {Asia,Europe} for character uncertainty or (Asia,Europe) for both character states (polymorphism), which is the NEXUS convention. Give this a try.
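
If you want to generate such matrix cells from a table, here is a small Python sketch. It assumes each region has been recoded to a single-character symbol, as is common in NEXUS standard-type matrices; the function name, the symbol mapping and the use of "?" for taxa with no known region are my assumptions. The bracket style is a parameter, so you can pick whichever one your program's manual asks for.

```python
def encode_cell(present, symbol_of, brackets="()"):
    """Format one matrix cell for a taxon that occupies the regions in `present`.

    brackets is "()" or "{}"; which one to use depends on the target program.
    Returning "?" for an empty set is an assumption (treated as missing data).
    """
    if brackets not in ("()", "{}"):
        raise ValueError("brackets must be '()' or '{}'")
    symbols = "".join(sorted(symbol_of[region] for region in present))
    if not symbols:
        return "?"
    if len(symbols) == 1:
        return symbols
    return brackets[0] + symbols + brackets[1]


# Hypothetical recoding of regions to single-character state symbols.
symbol_of = {"Asia": "0", "Europe": "1", "Africa": "2"}

taxon_regions = {
    "species_A": {"Asia"},
    "species_B": {"Europe"},
    "species_E": {"Asia", "Europe"},
}

rows = {taxon: encode_cell(present, symbol_of) for taxon, present in taxon_regions.items()}
for taxon, cell in rows.items():
    print(f"{taxon:<12}{cell}")
```

Not every program accepts every variant of this notation, so it is worth checking with a tiny test matrix before converting the whole data set.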

I do know this notation will not work with the read.nexus.data function in the R package ape. In that case, I've either added a separate character to my data matrix or coded my data as continuous, both of which have disadvantages for downstream analysis.

0

Thanks for the information on MrBayes and PAUP! I tried it in BEAST, but it didn't work. :( So far I haven't tried MrBayes, because I feel that it runs much slower than BEAST and my data set is just HUGE. But if MrBayes can handle this, I will give it a try.

0

I think this might be an XML vs. NEXUS problem. It sounds like David's idea of coding each location independently is a way around this that is file-format neutral.

0

Yes, I agree, and I'm now trying the multiple-character coding. It seems that the NEXUS format is sometimes more flexible than XML, and more readable.

1
9.2 years ago
David W 4.8k

I don't know of any software that deals with a taxon taking multiple states for the same character, but do you really need to code it this way? Why not make "geography" a presence-absence character:

       Asia    Africa     Europe
sp1      1        0         0
sp2      1        1         0
sp3      0        0         0
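
A matrix like this can be produced mechanically from a taxon-to-regions table. The following is a minimal Python sketch; the function name is hypothetical and the example data are the species from the question, not real distribution records.

```python
def presence_absence(taxon_regions, regions):
    """Return taxon -> list of 0/1 flags, one flag per region, in `regions` order."""
    return {
        taxon: [1 if region in present else 0 for region in regions]
        for taxon, present in taxon_regions.items()
    }


regions = ["Asia", "Africa", "Europe"]
taxon_regions = {
    "species A": {"Asia"},
    "species B": {"Europe"},
    "species C": {"Africa"},
    "species D": {"Europe"},
    "species E": {"Asia", "Europe"},
}

matrix = presence_absence(taxon_regions, regions)

# Print a plain-text table with one binary character per region.
print(f"{'':<12}" + "".join(f"{r:<8}" for r in regions))
for taxon, flags in matrix.items():
    print(f"{taxon:<12}" + "".join(f"{flag:<8}" for flag in flags))
```

Each column then becomes its own binary character, which means the regions are reconstructed independently of one another.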


Edit: the other option to consider is dreaming up some discrete-character model (e.g. REV) in which each possible geographic combination is its own state, with rates of change that reflect the fact that changing from, say, Asia-Africa -> Europe-Africa-Asia is much more likely than going Europe -> Europe-Africa-Asia... though this seems like a lot of work.
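
To make that idea a bit more concrete, here is a rough Python sketch of the state space only. It is not a REV implementation and not something any of the programs above read directly. It treats every non-empty combination of regions as a state and counts how many single-region gains or losses separate two states, which is one hypothetical way to decide which transitions should get higher rates. All function names are made up for illustration.

```python
from itertools import combinations


def range_states(regions):
    """Every non-empty combination of regions, each as a frozenset."""
    states = []
    for size in range(1, len(regions) + 1):
        for combo in combinations(regions, size):
            states.append(frozenset(combo))
    return states


def steps_between(a, b):
    """Number of single-region gains or losses needed to turn range a into range b."""
    return len(a ^ b)  # size of the symmetric difference


def one_step_transitions(states):
    """Ordered pairs of states that differ by exactly one region."""
    return [(a, b) for a in states for b in states if steps_between(a, b) == 1]


regions = ["Asia", "Africa", "Europe"]
states = range_states(regions)
transitions = one_step_transitions(states)

everywhere = frozenset(regions)
print(len(states), "states,", len(transitions), "one-step transitions")
print(steps_between(frozenset({"Asia", "Africa"}), everywhere))  # one gain
print(steps_between(frozenset({"Europe"}), everywhere))          # two gains
```

The number of states grows as 2^n - 1 with the number of regions, so this only stays manageable for a handful of regions.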

0

Thank you for your suggestion! I just discovered that Mesquite can do it, by typing things like "Asia&Europe". I also tried setting up multiple characters in BEAST, which also works, but the results come out separately for each character.