How are indels and 'unknown' bases treated in ML phylogenetic tree programs?
9.0 years ago
SemiQuant ▴ 80

I'm attempting to build ML phylogenetic trees, mainly with RAxML (GTR model), and I can't seem to find much information on how this and other programs treat indels, 'unknown' (N) bases, or low-confidence (lowercase) bases.

I know that if a column in the alignment consists entirely of missing data (-), RAxML will discard it, but what about regions where one isolate out of, say, 100 has an insertion, indicated as "-" in the other isolates? And if two isolates shared this insertion, would it then be regarded as 'correct'? I have read some papers that show

If anyone could offer insight into this it would be appreciated.

RaxMl phylogenetics • 3.3k views
9.0 years ago
Brice Sarver ★ 3.8k

To be technically correct, missing data is denoted by '?', whereas gaps (i.e., indels) are denoted by '-'. Most major phylogenetic software packages treat the two as equivalent.

Handling varies among programs, but the most common treatments include:

1) Removing the site completely. This is uncommon, but some programs have per-site missing data cutoffs you can specify.

2) Ignoring them. My understanding is that these characters no longer count toward the single-site likelihood.

3) Treating the character as ambiguous (i.e., an N) and averaging over all possible states. This is how some programs handle other IUPAC ambiguity codes as well.

4) Treating gaps as a fifth character state.
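To make treatments 2 and 3 concrete, here is a minimal sketch of how a tip character can be encoded for a likelihood calculation under the standard pruning algorithm. It is an illustration only, not the code of RAxML or any other program: the function names `tip_vector`, `jc69_matrix`, and `branch_contribution` are hypothetical, the Jukes-Cantor model is swapped in for GTR to keep it short, and uppercasing lowercase bases is an assumption. It shows that a gap or N, encoded as "compatible with every state," contributes a factor of 1 along its branch, which is why averaging over all states and ignoring the character amount to the same thing at that tip.

```python
import math

STATES = "ACGT"

# IUPAC codes mapped to the bases they are compatible with.
# Assumption: '-', '?' and 'N' are all treated as fully ambiguous.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT", "?": "ACGT", "-": "ACGT",
}


def tip_vector(char):
    """Return the tip likelihood vector (order A, C, G, T) for one character."""
    # Assumption: lowercase (low-confidence) bases are simply uppercased.
    compatible = IUPAC[char.upper()]
    return [1.0 if s in compatible else 0.0 for s in STATES]


def jc69_matrix(t):
    """Jukes-Cantor transition probabilities for branch length t."""
    same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    diff = 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def branch_contribution(char, t):
    """For each parent state i, compute sum_j P[i][j] * tip[j]."""
    P = jc69_matrix(t)
    tip = tip_vector(char)
    return [sum(P[i][j] * tip[j] for j in range(4)) for i in range(4)]


# A gap is compatible with every state, so each row of P sums to 1
# and the tip drops out of the site likelihood.
gap = branch_contribution("-", 0.1)
# An observed base keeps only one column of P and so carries information.
obs = branch_contribution("A", 0.1)
```

Treatment 4 would instead add a fifth entry to `STATES` and a 5x5 substitution model, so a gap would be informative rather than neutral.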

With respect to RAxML, this question has been brought up a couple of times on the Google group. I would consult the manual or the paper for your program of interest to really nail down what's going on under the hood.
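Before digging into a particular program's documentation, it can help to see how many columns of your own alignment are affected. The sketch below computes the per-column fraction of gap or missing characters and applies a cutoff like the one in treatment 1. The names `gap_fractions` and `filter_columns`, the 0.5 threshold, and the choice to count N alongside '-' and '?' are all assumptions for illustration; no specific program is claimed to work this way.

```python
# Assumption: '-', '?' and N/n all count as missing for this summary.
MISSING = set("-?Nn")


def gap_fractions(alignment):
    """Return, for each column, the fraction of sequences that are missing."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    n_seqs = len(alignment)
    return [
        sum(seq[i] in MISSING for seq in alignment) / n_seqs
        for i in range(length)
    ]


def filter_columns(alignment, max_missing=0.5):
    """Drop columns whose missing fraction exceeds max_missing."""
    fracs = gap_fractions(alignment)
    keep = [i for i, f in enumerate(fracs) if f <= max_missing]
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Toy alignment: column 3 is an insertion seen in one isolate only,
# column 5 is an insertion seen in one other isolate.
toy = ["ACGT-A", "AC-T-A", "AC-TGA", "AC-T-A"]
fractions = gap_fractions(toy)
filtered = filter_columns(toy, max_missing=0.5)
```

In the toy alignment, the two single-isolate insertion columns are 75% gaps and are removed by the 0.5 cutoff, leaving four identical `ACTA` sequences. A column that is 100% missing is always dropped for any cutoff below 1.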
