Using sequence case to encode intron/exon information
8.9 years ago
anderspitman ▴ 70

I've heard that sometimes sequences are stored in files partially lowercase and partially uppercase, with the intent that lower- or upper-case regions indicate introns or exons.

Is this a common practice? Are there file formats that explicitly support this type of encoding? Google hasn't yielded anything, but I might just be searching for the wrong things.

sequence • 2.1k views
8.9 years ago
h.mon 35k

Yes, it is common practice, but it is not a standard.

Another common use of lowercase is soft-masking: lowercase regions (often repeats or low-complexity sequence) are marked so that tools can skip them rather than process or analyze them.
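
Since the case convention is informal, recovering the regions usually means scanning the sequence yourself. Below is a minimal sketch, assuming uppercase marks one class of region (say, exons) and lowercase the other; the function name `case_runs` and the example sequence are made up for illustration, and which case means what depends entirely on the file's source.

```python
from itertools import groupby


def case_runs(seq):
    """Return (start, end, is_upper) for each maximal run of same-case characters.

    Coordinates are 0-based and half-open. Characters with no case
    (e.g. '-' or '*') count as not uppercase, so they group with lowercase.
    """
    runs = []
    pos = 0
    for is_upper, group in groupby(seq, key=str.isupper):
        length = sum(1 for _ in group)
        runs.append((pos, pos + length, is_upper))
        pos += length
    return runs


# Hypothetical sequence: uppercase flanking a lowercase stretch.
seq = "ATGGCCgtaagtttcagGATTGA"
for start, end, is_upper in case_runs(seq):
    label = "upper" if is_upper else "lower"
    print(start, end, label, seq[start:end])
```

Whether the uppercase runs are exons or introns is not something the code can tell you; check the documentation of whatever database or tool produced the file.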


Can you link to any examples where the convention is explained?


Lower-case as exon-intron: this paper, which refers to this (or this) database; or see this server to generate pretty graphics of gene structure.

Lower case for soft-masking: see the USEARCH manual or the BLAST -lcase_masking option.
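
If a tool ignores case and you still want the masked regions excluded, one option is to convert the soft mask into a hard mask. This is only a sketch, assuming lowercase really does mean "masked" in your file; `hard_mask` is a hypothetical helper, not part of USEARCH or BLAST.

```python
def hard_mask(seq, mask_char="N"):
    """Replace every lowercase character with mask_char, leaving the rest untouched."""
    return "".join(mask_char if base.islower() else base for base in seq)


print(hard_mask("ACGTacgtACGT"))
```

Hard-masking throws away the underlying bases, so keep the soft-masked original around if you may need them later.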
