Lowercase Characters In Sequenced Genomes
10.2 years ago
Pappu ★ 2.1k

I am wondering if there is any special meaning of lower case a,t,g,c in the sequenced genomes.

10.2 years ago

That's called "soft masking". The lowercase bases are generally regions that were identified as repeats by RepeatMasker (or some other tool). There are a couple of options when it comes to masking a genome. Besides soft masking, one can "hard mask", which means replacing a given region with a run of N's. This, of course, can produce pretty useless genomes for many use cases, since the underlying sequence is lost. Consequently, you'll often find people keeping soft-masked genomes around (I recall that UCSC provides both, with the regular genomes that you download already being soft-masked).
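
To make the difference concrete, here is a minimal Python sketch that works on a plain sequence string. The function names (`hard_mask`, `unmask`, `soft_masked_fraction`) and the example sequence are made up for illustration and are not part of RepeatMasker or any UCSC tool. It assumes lowercase letters mark masked bases, as described above.

```python
def hard_mask(seq: str) -> str:
    """Turn a soft-masked sequence into a hard-masked one.

    Every lowercase (soft-masked) base is replaced with 'N';
    uppercase bases are left untouched.
    """
    return "".join("N" if base.islower() else base for base in seq)


def unmask(seq: str) -> str:
    """Drop the soft masking by uppercasing everything."""
    return seq.upper()


def soft_masked_fraction(seq: str) -> float:
    """Fraction of bases that are lowercase (soft-masked)."""
    if not seq:
        return 0.0
    return sum(base.islower() for base in seq) / len(seq)


# Hypothetical toy sequence: the lowercase stretch stands in for a repeat.
example = "ACGTacgtacGTTA"

print(hard_mask(example))             # ACGTNNNNNNGTTA
print(unmask(example))                # ACGTACGTACGTTA
print(round(soft_masked_fraction(example), 3))
```

Note that going from soft-masked to hard-masked or unmasked is a one-way trip: once the bases are N's, or everything is uppercase, the original information is gone. That is one reason to keep the soft-masked version around.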
