I am using MEME, the de novo transcription factor binding motif discovery software.
The MEME suite documentation (here) claims you can use the '-bfile' argument to provide MEME with a background Markov model file.
Does anybody have suggestions for how to generate such a -bfile, or better still, could anybody point me towards a script that generates one from .fasta sequence data?
Here is the description of the -bfile as per the MEME Suite documentation:
BACKGROUND MODEL -bfile <bfile>: The name of the file containing the background model for sequences. The background model is the model of random sequences used by MEME. The background model is used by MEME 1) during EM as the "null model", 2) for calculating the log likelihood ratio of a motif, 3) for calculating the significance (E-value) of a motif, and 4) for creating the position-specific scoring matrix (log-odds matrix). By default, the background model is a 0-order Markov model based on the letter frequencies in the training set.
Markov models of any order can be specified in <bfile> by listing frequencies of all possible tuples of length up to order+1. Note that MEME uses only the 0-order portion (single letter frequencies) of the background model for purposes 3) and 4), but uses the full-order model for purposes 1) and 2), above.
Example: To specify a 1-order Markov background model for DNA, <bfile> might contain the following lines. Note that optional comment lines are marked by "#" and are ignored by MEME.
# tuple frequency_non_coding
a 0.324
c 0.176
g 0.176
t 0.324
# tuple frequency_non_coding
aa 0.119
ac 0.052
ag 0.056
at 0.097
ca 0.058
cc 0.033
cg 0.028
ct 0.056
ga 0.056
gc 0.035
gg 0.033
gt 0.052
ta 0.091
tc 0.056
tg 0.058
tt 0.119
Sample -bfile files are given in directory tests: tests/nt.freq (DNA), and tests/na.freq (amino acid).
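If a script helps, here is a minimal Python sketch that builds a file in the format quoted above from FASTA input. It makes some assumptions: the sequences are DNA, only the strand as given is counted (no reverse complement), and any tuple containing a letter outside a/c/g/t (for example N) is skipped. The function names read_fasta, markov_background and format_bfile are my own and not part of MEME. I believe the MEME Suite also ships a utility called fasta-get-markov for this purpose, so check your installation before rolling your own.

```python
from collections import Counter
from itertools import product


def read_fasta(lines):
    """Yield lowercased sequences from an iterable of FASTA lines."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A header closes the previous record, if there was one.
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line.lower())
    if chunks:
        yield "".join(chunks)


def markov_background(sequences, order=1, alphabet="acgt"):
    """Return {tuple: frequency} for every tuple of length 1..order+1.

    Frequencies are normalised separately for each tuple length, so the
    values for each length sum to 1 (as in the documentation example).
    """
    sequences = list(sequences)
    letters = set(alphabet)
    freqs = {}
    for k in range(1, order + 2):
        counts = Counter()
        for seq in sequences:
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                # Skip tuples containing ambiguous letters such as 'n'.
                if set(kmer) <= letters:
                    counts[kmer] += 1
        total = sum(counts.values())
        # List every possible tuple, including those never observed.
        for tup in product(alphabet, repeat=k):
            kmer = "".join(tup)
            freqs[kmer] = counts[kmer] / total if total else 0.0
    return freqs


def format_bfile(freqs):
    """Render the frequencies as text, one '#' comment line per tuple length."""
    lines = []
    current_len = 0
    for kmer, freq in freqs.items():
        if len(kmer) != current_len:
            current_len = len(kmer)
            lines.append("# tuple frequency")
        lines.append(f"{kmer} {freq:.6f}")
    return "\n".join(lines) + "\n"
```

To use it, open your FASTA file, pass the file handle to read_fasta, feed the result to markov_background with the order you want, and write the string returned by format_bfile to the file you give to -bfile. Note that tuples never seen in the input come out as 0.000000; for short inputs or higher orders you may want to add a pseudocount before normalising.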
aha! cheers for that