Please excuse how basic this question is; I'm a bioinformatics newbie. I'm looking at the 7-way primate genome alignment in Ensembl release 76 (ftp://ftp.ensembl.org/pub/release-76/emf/ensembl-compara/epo_7_primate), and I don't understand how it decides which bases should be (soft-)masked. For example, in the first file (chr1_1.emf), around line 250 there is a run of T's followed by some other bases that is consistently masked for some genomes and not others, even though the sequences are identical. What's going on here?
The unmasked columns you are referring to belong to the predicted ancestral sequences, and we (Ensembl) don't repeat-mask ancestral sequences. In the EMF file, each column of the alignment is one sequence, as listed by the SEQ lines in the header. You can see at the beginning of the file that extant species are mixed with ancestral ones.
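To check which columns are which, a rough Python sketch along these lines may help. It assumes the block layout I would expect in these files: SEQ header lines whose second field is the species name (with ancestral sequences labelled `ancestral_sequences`), a `DATA` line, one character per sequence on each following row, and `//` closing the block. The sample block, coordinates, and function names below are made up for illustration and are not taken from chr1_1.emf.

```python
# Made-up miniature EMF block (not real data), following the assumed layout.
SAMPLE = """\
SEQ homo_sapiens 1 100 104 1 (chr_length=1000)
SEQ pan_troglodytes 1 200 204 1 (chr_length=1000)
SEQ ancestral_sequences Ancestor_1_1 1 5 1 (chr_length=5)
TREE ((homo_sapiens,pan_troglodytes)Ancestor_1_1);
DATA
ttT
ttT
aaA
CCC
-gG
//
"""


def parse_emf_block(lines):
    """Return (seq_names, rows) for the first alignment block in `lines`."""
    seq_names, rows = [], []
    in_data = False
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "//":
            break  # end of this alignment block
        if in_data:
            rows.append(line)
        elif line.startswith("SEQ "):
            seq_names.append(line.split()[1])  # second field: species name
        elif line == "DATA":
            in_data = True
    return seq_names, rows


def masking_by_column(seq_names, rows):
    """Count [lowercase, uppercase] bases for each sequence column."""
    counts = [[0, 0] for _ in seq_names]
    for row in rows:
        # Only the first len(seq_names) characters are sequence columns;
        # anything after them (e.g. score fields) is ignored.
        for i, base in enumerate(row[:len(seq_names)]):
            if base.islower():
                counts[i][0] += 1
            elif base.isupper():
                counts[i][1] += 1
            # gaps ('-') are neither and are not counted
    return counts


seq_names, rows = parse_emf_block(SAMPLE.splitlines())
counts = masking_by_column(seq_names, rows)
for name, (lower, upper) in zip(seq_names, counts):
    print(name, "soft-masked:", lower, "unmasked:", upper)
```

Running the same two functions on the first block of the real file should show the `ancestral_sequences` columns with no lowercase bases at all, while the extant species carry the soft-masking.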