Question: What Is The Appropriate Order For A Background Model In Motif Searches?
5.0 years ago by
United Kingdom
mgalactus720 wrote:


When searching for DNA motifs in upstream sequences (for instance with the MEME suite), it is suggested to supply a background model so that the motif can be distinguished from the background composition of the sequences. One possibility is a Markov model of order k, in which the probability of each base is conditioned on the k bases that precede it; in practice this is estimated from the frequencies of (k+1)-mers.
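
To make the idea concrete, here is a minimal sketch of how an order-k background could be estimated from a set of sequences. It is not the MEME suite's own implementation: the function name `markov_background`, the `pseudocount` parameter, and the choice to skip windows containing non-ACGT characters are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def markov_background(sequences, k, alphabet="ACGT", pseudocount=1.0):
    """Estimate P(next base | previous k bases) from (k+1)-mer counts.

    Hypothetical helper for illustration; not part of the MEME suite.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of length k+1 along the sequence.
        for i in range(len(seq) - k):
            kmer = seq[i:i + k + 1]
            # Skip windows with ambiguous characters such as N.
            if all(c in alphabet for c in kmer):
                counts[kmer] += 1

    model = {}
    for ctx in product(alphabet, repeat=k):
        context = "".join(ctx)
        # Total count for this context, smoothed with a pseudocount per base.
        total = sum(counts[context + b] for b in alphabet)
        total += pseudocount * len(alphabet)
        model[context] = {
            b: (counts[context + b] + pseudocount) / total for b in alphabet
        }
    return model


if __name__ == "__main__":
    bg = markov_background(["ACGTACGTACGT", "TTGACATTGACA"], k=1)
    for context, probs in sorted(bg.items()):
        print(context, {b: round(p, 3) for b, p in probs.items()})
```

The number of contexts grows as 4^k, so each extra order divides the available data among four times as many parameters. This is where the overfitting concern comes from.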

My question is: how should the value of k be chosen? The literature I have found suggests that it should scale with the expected motif length, but gives no clear rule of thumb. Our guess is that k should not be too large, both because of the computational cost and because of the risk of overfitting the background.


5.0 years ago by
United Kingdom
mgalactus720 wrote:

The authors of the MEME suite have written a general rule of thumb for choosing the appropriate order for both protein and DNA searches.

Here's a significant extract:

Typically, you should not specify an order larger than 3 for DNA sequences, or larger than 2 for protein sequences. However, if your input sequences contain higher-order non-random effects that are getting in the way of motif finding, you can follow the following "rules of thumb":

  • Use a background model at least four orders less than the shortest motifs you are looking for. So, if you want to find motifs as short as six, I wouldn't use a model higher than order two.
  • For an accurate model of order N, you need to use a FASTA file as input to fasta-get-markov with at least 10 times 4^(N+1) DNA characters in it. So,

      order 3 requires 2560 characters
      order 4 requires 10240 characters
      order 5 requires 40960 characters, etc.
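
The two rules of thumb quoted above can be combined into a small calculator. This is only a sketch of the arithmetic: the function names `min_training_chars` and `suggested_max_order` are hypothetical, and the separate advice to stay at or below order 3 for DNA (order 2 for protein) is deliberately left for the user to apply.

```python
def min_training_chars(order, alphabet_size=4, factor=10):
    """Minimum input size for an order-N model: factor * alphabet_size^(N+1)."""
    return factor * alphabet_size ** (order + 1)


def suggested_max_order(n_chars, shortest_motif, alphabet_size=4, factor=10):
    """Largest order allowed by both rules of thumb (hypothetical helper).

    Rule 1: the order is at most shortest_motif - 4.
    Rule 2: the training data holds at least factor * alphabet_size^(N+1)
            characters.
    """
    cap = max(shortest_motif - 4, 0)
    order = 0
    # Raise the order while both rules still hold for the next value.
    while (order < cap
           and min_training_chars(order + 1, alphabet_size, factor) <= n_chars):
        order += 1
    return order


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"order {n} requires {min_training_chars(n)} characters")
    # Motifs as short as 6 with plenty of data: motif length is the limit.
    print(suggested_max_order(n_chars=100_000, shortest_motif=6))
    # Longer motifs but only 3000 characters: data size is the limit.
    print(suggested_max_order(n_chars=3_000, shortest_motif=10))
```

With a shortest motif of six the first rule caps the order at two, matching the example in the quoted text. With only 3000 characters the second rule stops at order three, since order four would need 10240.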
