Question: How does Ensembl decide what to mask?
dbweissman (United States) wrote:

Please excuse how basic this question is; I'm a bioinformatics newbie. I'm looking at the 7-way primate genome alignment in Ensembl release 76 (ftp://ftp.ensembl.org/pub/release-76/emf/ensembl-compara/epo_7_primate), and I don't understand how Ensembl decides which bases are soft-masked (written in lowercase). For example, in the first file (chr1_1.emf), around line 250 there is a run of T's followed by a few other bases that is consistently masked in some genomes but not in others, even though the sequences are identical. What's going on here?

modified 3.4 years ago by Biostar • written 4.6 years ago by dbweissman
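Background on what "soft-masked" means here: masked bases are written in lowercase while the base itself is unchanged, so two aligned sequences can be identical base-for-base yet carry different masks. A minimal Python sketch that lists the lowercase runs of a sequence as 0-based, half-open intervals; the function name `soft_masked_intervals` and the two toy fragments are hypothetical, not part of any Ensembl API or file.

```python
def soft_masked_intervals(seq: str) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of lowercase (soft-masked) runs."""
    intervals = []
    start = None
    for i, base in enumerate(seq):
        if base.islower():
            if start is None:
                start = i  # a masked run begins at this base
        elif start is not None:
            intervals.append((start, i))  # the masked run ended at the previous base
            start = None
    if start is not None:
        intervals.append((start, len(seq)))  # the run extends to the end of the sequence
    return intervals


# Two hypothetical aligned fragments: same bases, different masking.
a = "GATtttaCC"
b = "GATTTTACC"
print(a.upper() == b.upper())    # True: the underlying bases are identical
print(soft_masked_intervals(a))  # [(3, 7)]
print(soft_masked_intervals(b))  # []
```

Comparing `seq.upper()` separates "are the bases the same?" from "is the masking the same?", which are the two questions the observation above mixes together.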

Odds are good that the masking originally comes from RepeatMasker, in which case differences in the repeat libraries used for each organism could explain what you're seeing (obviously, it would take someone from Ensembl to give you the definitive answer).

written 4.6 years ago by Devon Ryan
Denise - Open Targets (UK, Hinxton, EMBL-EBI) wrote:

The columns you are referring to come from the predicted ancestral sequences, and we (Ensembl) don't repeat-mask ancestral sequences. In the EMF file, each column of the alignment data corresponds to one sequence, in the order given by the SEQ lines in the header. As you can see at the beginning of the file, extant species are interleaved with ancestral sequences.

written 4.6 years ago by Denise - Open Targets
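The answer's point can be checked directly: read the SEQ lines of an EMF block to learn which column belongs to which sequence, then count the lowercase bases in each column. A minimal Python sketch under stated assumptions: a simplified EMF layout (SEQ lines, a DATA line, then one alignment row per line whose first N non-space characters are one base per SEQ line in header order, with `//` closing the block), and ancestral sequences identifiable by the species field `ancestral_sequences`. Verify both against the real file header. The helper name `masked_columns` and the toy block are hypothetical.

```python
def masked_columns(emf_text: str, ancestral_label: str = "ancestral_sequences"):
    """Return (species, name, is_ancestral, lowercase_count) per SEQ line of the first block.

    Assumed layout: 'SEQ <species> <name> ...' lines, then 'DATA', then one alignment
    row per line (first N non-space characters = one base per SEQ line), then '//'.
    `ancestral_label` is an assumption about how ancestral sequences are labelled.
    """
    seqs = []       # (species, name) in header order
    lowercase = []  # number of lowercase (soft-masked) bases seen per column
    in_data = False
    for line in emf_text.splitlines():
        if line.startswith("SEQ "):
            fields = line.split()
            seqs.append((fields[1], fields[2]))
        elif line.strip() == "DATA":
            in_data = True
            lowercase = [0] * len(seqs)
        elif line.strip() == "//":
            break  # only the first block is examined
        elif in_data and line.strip():
            # Drop whitespace, keep one character per sequence; trailing fields are ignored.
            row = "".join(line.split())[: len(seqs)]
            for i, base in enumerate(row):
                if base.islower():
                    lowercase[i] += 1
    return [
        (species, name, species == ancestral_label, count)
        for (species, name), count in zip(seqs, lowercase)
    ]


# Hypothetical toy block: columns are human, chimpanzee, ancestor (coordinates made up).
example = """##FORMAT (compara)
SEQ homo_sapiens 1 100 106 1
SEQ pan_troglodytes 1 200 206 1
SEQ ancestral_sequences Ancestor_1 1 7 1
DATA
GGG
AAA
ttT
ttT
ttT
aaA
CCC
//
"""

for row in masked_columns(example):
    print(row)
# ('homo_sapiens', '1', False, 4)
# ('pan_troglodytes', '1', False, 4)
# ('ancestral_sequences', 'Ancestor_1', True, 0)
```

In this toy block the extant columns carry soft-masked bases while the ancestral column is all uppercase, which is the pattern described in the answer.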

Ha, I should have noticed that the unmasked ones were all the ancestral sequences... Thanks!

written 4.6 years ago by dbweissman