How much does cleandata in PAML analysis remove?
6.9 years ago
DNAngel ▴ 250

I cannot find documentation on how much data cleandata = 1 removes in codeml. For example, if an alignment has 40 sequences and only 1 sequence has an ambiguity or gap character in a given column, is that whole column still removed? Is there a way to set a tolerance for how much missing data is allowed? For instance, if only 10% of the sequences have a gap in a column, could that column be kept to avoid losing the information from all the other sequences?

Is this possible, or is the whole column removed whenever any sequence has a gap or ambiguity in it?

Thank you!

PAML codeml cleandata • 2.5k views
6.9 years ago
h.mon 35k

Both the PAML manual and the FAQ answer this question. From the manual:

Note that alignment gaps are treated as missing data in baseml and codeml (if cleandata = 0). If cleandata = 1, all sites with ambiguity characters and alignment gaps are removed.
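
To make the effect concrete, here is a minimal Python sketch of complete deletion as the manual describes it: a column is dropped as soon as any one sequence has a gap or ambiguity character there. This is not PAML's code. The function name `complete_deletion`, the `UNAMBIGUOUS` character set, and the toy alignment are all assumptions for illustration, and the sketch works on single nucleotide columns without modelling how codeml handles codons.

```python
# Illustrative only: NOT PAML's implementation.
# Assumption: anything other than A, C, G, T counts as a gap or ambiguity.
UNAMBIGUOUS = set("ACGT")


def complete_deletion(seqs):
    """Keep only the columns where every sequence has an unambiguous base."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    # A single gap or ambiguity character is enough to drop the column.
    keep = [
        i for i in range(length)
        if all(s[i].upper() in UNAMBIGUOUS for s in seqs)
    ]
    return ["".join(s[i] for i in keep) for s in seqs]


# Hypothetical toy alignment: column 3 has one gap, column 5 has one N.
alignment = ["ACGTAC", "AC-TAC", "ACGTNC"]
cleaned = complete_deletion(alignment)
print(cleaned)
```

So in the 40-sequence example from the question, one gap in one sequence is enough to lose that column for all 40 sequences.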

This is also discussed in the FAQ (Should I remove alignment gaps and ambiguity characters in my analysis?), where the author also states his personal opinion (which cleandata does not follow):

Personally I think sites at which most sequences have data except for one or two sequences should perhaps be kept while sites at which all sequences except one or two have alignment gaps had better be removed.
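
Since cleandata itself has no tolerance setting according to the manual text quoted above, one way to get the behaviour asked about would be to filter the alignment yourself before running codeml with cleandata = 0. The sketch below is only an assumption about how that could look: `filter_columns`, the `MISSING` character set, and the 10% default are hypothetical and not part of PAML, and for codon alignments you would need to drop whole codons rather than single columns.

```python
# Illustrative pre-filter: NOT part of PAML.
# Assumption: '-', '?' and 'N' are the characters counted as missing.
MISSING = set("-?N")


def filter_columns(seqs, max_missing_fraction=0.10):
    """Drop columns where more than max_missing_fraction of sequences are missing."""
    if not seqs:
        return []
    n = len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    keep = []
    for i in range(length):
        n_missing = sum(s[i].upper() in MISSING for s in seqs)
        if n_missing / n <= max_missing_fraction:
            keep.append(i)
    return ["".join(s[i] for i in keep) for s in seqs]


# Hypothetical toy alignment with 10 sequences:
# column 1 has 1 gap (10%), column 2 has 2 gaps (20%), column 3 has none.
toy = ["ACG"] * 8 + ["A-G", "--G"]
filtered = filter_columns(toy, max_missing_fraction=0.10)
print(filtered)
```

Columns that survive the filter can still contain gaps, so the filtered alignment would then be analysed with cleandata = 0 so that the remaining gaps are treated as missing data instead of triggering removal.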

