Hi, I have reference genomes for four closely related species. How can I determine allele status, that is, which allele is ancestral and which is derived? I have already finished the multiple genome alignment of the four species. For example:

position    ref1    ref2    ref3    ref4
23          G       G       A       G
78          A       T       A       T
145         T       C       -       A


For position 23, can we say "G" is the ancestral allele, since most of the species have "G" at this position? And what about position 78 or position 145, where there seems to be no majority allele? Thank you very much!

You cannot decide the ancestral allele from the majority in a single alignment row alone; you need a phylogeny for that. Consider two hypotheses for position 23:

1) ref1, ref2, ref3 and ref4 all descend from a common ancestor with 23G, and there was a single mutation to A in the lineage leading to ref3.

2) ref1, ref2, ref3 and ref4 all descend from a common ancestor with 23A. ref3 kept 23A, and a single mutation to 23G occurred in the common ancestor of ref1, ref2 and ref4.

Without knowing the evolutionary tree, both hypotheses are equally plausible, since each requires only a single mutation. Given a phylogenetic tree, you can use tools like FastML for ancestral sequence reconstruction.
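
To make the dependence on the tree concrete, here is a minimal Python sketch of Fitch parsimony applied to the three example positions. It is only an illustration and not the method FastML uses. The two rooted topologies are hypothetical (the real tree is not given in the question), and treating the gap "-" as missing data is also an assumption.

```python
# Hypothetical rooted topologies written as nested tuples.
# The true tree of the four species is unknown here.
TREE_BALANCED = (("ref1", "ref2"), ("ref3", "ref4"))
TREE_REF3_BASAL = ("ref3", (("ref1", "ref2"), "ref4"))

# The three example columns from the question.
ALIGNMENT = {
    23: {"ref1": "G", "ref2": "G", "ref3": "A", "ref4": "G"},
    78: {"ref1": "A", "ref2": "T", "ref3": "A", "ref4": "T"},
    145: {"ref1": "T", "ref2": "C", "ref3": "-", "ref4": "A"},
}

BASES = frozenset("ACGT")


def fitch_root_states(tree, alleles):
    """Return the set of most-parsimonious states at the root of `tree`.

    Bottom-up Fitch pass on a binary tree: a leaf contributes its observed
    base, an internal node takes the intersection of its children's sets,
    or their union when the intersection is empty.
    """
    if isinstance(tree, str):  # leaf: look up the observed allele
        base = alleles[tree]
        # Assumption: a gap is treated as missing data (any base allowed).
        return BASES if base == "-" else frozenset(base)
    left, right = (fitch_root_states(child, alleles) for child in tree)
    common = left & right
    return common if common else left | right


for position, alleles in ALIGNMENT.items():
    for name, tree in (("balanced", TREE_BALANCED),
                       ("ref3 basal", TREE_REF3_BASAL)):
        states = sorted(fitch_root_states(tree, alleles))
        print(f"position {position}, {name} tree: root could be {states}")
```

With the balanced topology, position 23 resolves to G at the root, but when ref3 is the earliest-branching lineage the root is ambiguous between A and G, which matches the two hypotheses above. In practice you would root the tree with an outgroup and use a dedicated reconstruction tool rather than this sketch.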

