Question: What Makes a DNA Sequence Motif a DNA Sequence Motif?
Faheemmitha wrote, 7.0 years ago:

I've read about DNA sequence motifs, but I still don't understand what makes a given sequence a DNA sequence motif. Is this a well-defined concept? For example, "What are DNA sequence motifs?" says:

Sequence motifs are short, recurring patterns in DNA that are presumed to have a biological function. Often they indicate sequence-specific binding sites for proteins such as nucleases and transcription factors (TF). Others are involved in important processes at the RNA level, including ribosome binding, mRNA processing (splicing, editing, polyadenylation) and transcription termination.

For example, I've been working with recombination signal sequences (RSS), which are important in immunology. However, from what I can tell, RSS are not themselves motifs, but they do contain motifs. The paper "Conservation of sequence in recombination signal sequence spacers", for instance, says:

Previously, the RSS has been described as possessing both a conserved heptamer and a conserved nonamer motif.

So, apparently, the heptamer and nonamer components of the RSS are themselves motifs, but the RSS itself is not. Can anyone explain why? Thanks.

Tags: bioinformatics • 4.3k views
seidel (United States) wrote, 7.0 years ago:

I think you answered your own question about what makes a motif a motif; what you are actually asking is something else: why isn't a given sequence-based entity a motif? The RSS is composed of motifs, but it has a larger structure consisting of two motifs and a spacer. This is akin to words versus phrases in language. You're essentially asking, "Why isn't a phrase a word?" There are plenty of examples in biology of functional structures that consist of an assemblage of motifs. A given motif is simply the smallest identifiable sequence subcomponent of something larger. The constraints on the structure of the larger entity vary and are usually determined by experimentation. For instance, with the RSS, one could vary the orientation of the motifs and the length of the spacer and thereby define very exact requirements. By contrast, some cis-regulatory modules (assemblages of DNA-binding motifs) require the motifs to co-occur, while their exact order and spacing may matter less.
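
To make the words-versus-phrases distinction concrete, here is a minimal Python sketch (not from the original thread). It assumes the commonly cited RSS consensus heptamer CACAGTG and nonamer ACAAAAACC separated by a 12- or 23-bp spacer, accepts only exact matches, and scans a single strand; real RSS are frequently degenerate, so a scan like this would miss many functional sites. `find_rss` and `motifs_co_occur` are hypothetical helper names. The RSS search encodes a rigid arrangement (fixed order and spacer length), while the cis-regulatory-module check only asks whether all motifs are present.

```python
import re

# ASSUMPTION: commonly cited RSS consensus sequences; real RSS often deviate from these.
HEPTAMER = "CACAGTG"
NONAMER = "ACAAAAACC"
SPACER_LENGTHS = (12, 23)  # the "12" and "23" RSS types


def find_rss(seq: str) -> list[tuple[int, int]]:
    """Hypothetical helper: exact heptamer + spacer + nonamer matches on the given strand only.

    Returns (start, spacer_length) pairs. Motif order and spacer length are fixed,
    i.e. the composite "phrase" has a rigid grammar.
    """
    seq = seq.upper()
    hits = []
    for spacer in SPACER_LENGTHS:
        # Zero-width lookahead so overlapping candidate sites are all reported.
        pattern = re.compile(f"(?={HEPTAMER}[ACGT]{{{spacer}}}{NONAMER})")
        hits.extend((m.start(), spacer) for m in pattern.finditer(seq))
    return sorted(hits)


def motifs_co_occur(seq: str, motifs: list[str]) -> bool:
    """Hypothetical helper: CRM-style check that every motif occurs, ignoring order and spacing."""
    seq = seq.upper()
    return all(motif.upper() in seq for motif in motifs)


if __name__ == "__main__":
    site = "GG" + HEPTAMER + "A" * 12 + NONAMER + "TT"
    print(find_rss(site))  # [(2, 12)]
    print(motifs_co_occur("TATAATGGGTTGACA", ["TTGACA", "TATAAT"]))  # True
```

Changing the spacer to 13 bp makes `find_rss` return nothing, while swapping the motif order leaves `motifs_co_occur` unaffected; that difference is the structural constraint being described.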


+1 for "motif is simply the smallest identifiable sequence subcomponent".

(reply by PoGibas, 7.0 years ago)

Thanks, seidel, that's helpful. Your analogy of words and phrases is suggestive, but words are clearly distinguished from phrases, at least in English, by blank spaces. I suppose that even if you were to concatenate words into a single string, an English speaker could still distinguish the individual words. In any case, is "the smallest identifiable sequence subcomponent of something larger" determined purely by the statistical properties of these sequences? If you know of any good reference that discusses this kind of issue, I'd appreciate it. Thanks.

(reply by Faheemmitha, 7.0 years ago)

Also, just to be clear, an RSS is not itself a motif?

(reply by Faheemmitha, 7.0 years ago)
Jelena Aleksic (Cambridge, UK) wrote, 7.0 years ago:

It's an interesting question, and I think the answer is not particularly well defined. When searching for motifs, we define a motif as a DNA sequence (or DNA position weight matrix) that occurs more frequently than would be expected under a background sequence model. In theory a motif can be of any length, though longer ones become increasingly unlikely to be found. However, we look for motifs in the hope of finding patterns that correspond in some way to the underlying biology, so we introduce search constraints based on that biology. For example, if we are looking for transcription factor binding site motifs, it is usual to search for motifs of roughly 6-10 bp, as this corresponds to the size of the transcription factors and the number of bases they are likely to bind at any one time.
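
The "more frequent than the background model predicts" definition can be sketched in a few lines of Python. This is an illustration only, assuming the simplest possible background: a zero-order model in which bases are independent, with frequencies estimated from a separate set of background sequences. It counts exact k-mers on the forward strand (no position weight matrices, no reverse complements, no significance test), so it shows the definition rather than a usable motif finder; `enriched_kmers` and its `min_ratio` threshold are hypothetical.

```python
import math
from collections import Counter

VALID = set("ACGT")


def base_frequencies(seqs: list[str]) -> dict[str, float]:
    """Zero-order background model: independent bases with frequencies taken from `seqs`."""
    counts = Counter(b for s in seqs for b in s.upper() if b in VALID)
    total = sum(counts.values())  # assumes the background contains at least one A/C/G/T
    return {b: counts[b] / total for b in "ACGT"}


def kmer_counts(seqs: list[str], k: int) -> Counter:
    """Count overlapping k-mers (forward strand only), skipping windows with non-ACGT symbols."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i : i + k]
            if set(kmer) <= VALID:
                counts[kmer] += 1
    return counts


def enriched_kmers(foreground, background, k, min_ratio=2.0):
    """Hypothetical helper: k-mers whose observed count exceeds the background expectation.

    expected = (number of valid k-mer windows) * product of background base probabilities.
    Returns (kmer, observed, expected, ratio) tuples sorted by ratio, highest first.
    """
    probs = base_frequencies(background)
    observed = kmer_counts(foreground, k)
    n_windows = sum(observed.values())
    results = []
    for kmer, obs in observed.items():
        expected = n_windows * math.prod(probs[b] for b in kmer)
        ratio = obs / expected if expected > 0 else math.inf
        if ratio >= min_ratio:
            results.append((kmer, obs, expected, ratio))
    return sorted(results, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    background = ["ACGT" * 50]  # uniform base composition
    foreground = ["AAAACACGTGAAAA", "CCCCCACGTGCCCC", "GGGGCACGTGGGGG"]
    top = enriched_kmers(foreground, background, k=6)[0]
    print(top[0], top[1])  # CACGTG 3
```

With k=6 (the low end of the 6-10 bp range mentioned above), the shared 6-mer ranks first, but 6-mers seen only once still get a ratio of about 150 because the expected count per 6-mer is only about 0.0066 here. That is why small samples and a naive background make raw ratios misleading, and why dedicated motif-discovery tools use richer background models and proper significance statistics.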

Powered by Biostar version 2.3.0