Question: Calculating Background Model For Genome Scanning With Pwm
9.3 years ago by
Amm20 wrote:

I have a number of PWMs (8 positions) for several TFs that I'd like to scan a genome with, and it has been suggested that I use the FIMO tool from the MEME suite for this.

This, however, requires a background model for the organism. A friend mentioned that the accepted way to do this is to simply build a 4th-order Markov model from the whole genome.
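For readers unfamiliar with the term, a 4th-order Markov background just means estimating the probability of each base given the four bases before it. Below is a minimal sketch of that idea, assuming the genome is already loaded as a plain string; the function name `markov_background` and the `pseudocount` parameter are made up for illustration, and the result is a Python dictionary, not the background file format that FIMO expects.

```python
from collections import Counter, defaultdict

def markov_background(sequence, order=4, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from one sequence."""
    alphabet = "ACGT"
    counts = defaultdict(Counter)
    seq = sequence.upper()
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        # Skip windows that contain ambiguous bases such as N.
        if base in alphabet and all(c in alphabet for c in context):
            counts[context][base] += 1
    model = {}
    for context, ctr in counts.items():
        total = sum(ctr.values()) + pseudocount * len(alphabet)
        model[context] = {b: (ctr[b] + pseudocount) / total for b in alphabet}
    return model

# Toy usage on a short string; a real genome would be read from FASTA.
toy_model = markov_background("ACGTACGTACGTTTGACA", order=2)
print(toy_model["AC"])
```

Only contexts that actually occur in the sequence get an entry, and only one strand is counted; both are simplifications of this sketch.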

So, what I'd like to ask you is:

(1) Would you agree with the use of FIMO for my problem of scanning and finding statistically significant potential targets of my TFs of interest? If not, what other platforms would you recommend? RSAT? Other?

(2) Do you think that the Markov model approach is the best one for getting a background model? Can background models be included in other methods for identifying potential targets through the use of PWMs?

Thanks in advance!

9.3 years ago by
Bengaluru, India
Gjain5.6k wrote:

Hi, If you have some programming background, then you can write your own program to get a TFBS score from your PWM.

You can split the genome into overlapping 8-mers and score each 8-mer against the PWM.

Here is an example:

A    16    352    3    354    268    360
C    46    0    10    0    0    3
G    18    2    2    5    0    20
T    309    35    374    30    121    6

# INPUT k-mers

# To convert PFM to PWM
w = log2 ( ( f + sqrt(N) * p ) / ( N + sqrt(N) ) / p )
    w - the weight for the current nucleotide we are calculating
    f - the number of occurrences of the current nucleotide in the current column (e.g., "61" for A in column 1, "46" for C etc)
    N - the total number of observations, i.e. the sum of all nucleotide occurrences in a column (61+46+18+31=156 in this example)
    p - the prior (background) frequency of the current nucleotide; this usually defaults to 0.25 (i.e. one nucleotide out of four)
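The conversion formula can be sketched in Python as follows. This is only an illustration: the function name `pfm_to_pwm` is made up, a uniform background of 0.25 is assumed unless another one is passed in, and the counts are the ones from the matrix at the top of this example (every column there sums to 389). The weights it prints are simply what the formula gives for those counts and are not guaranteed to match the rounded table in this post.

```python
import math

# Count matrix copied from the example above (rows A, C, G, T).
PFM = {
    "A": [16, 352, 3, 354, 268, 360],
    "C": [46, 0, 10, 0, 0, 3],
    "G": [18, 2, 2, 5, 0, 20],
    "T": [309, 35, 374, 30, 121, 6],
}

def pfm_to_pwm(pfm, background=None):
    """Apply w = log2((f + sqrt(N) * p) / (N + sqrt(N)) / p) to every cell."""
    background = background or {base: 0.25 for base in pfm}
    width = len(next(iter(pfm.values())))
    pwm = {base: [] for base in pfm}
    for j in range(width):
        n = sum(pfm[base][j] for base in pfm)  # N: column total
        root = math.sqrt(n)                    # sqrt(N) acts as a pseudocount
        for base in pfm:
            p = background[base]
            f = pfm[base][j]
            pwm[base].append(math.log2((f + root * p) / (n + root) / p))
    return pwm

for base, weights in pfm_to_pwm(PFM).items():
    print(base, " ".join(f"{w:6.2f}" for w in weights))
```

Because of the sqrt(N) pseudocount, a count of zero still yields a finite (negative) weight rather than minus infinity.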

# PWM we get:
A    -0.43    1.11    -0.27    1.10    1.46    1.09    
C    -0.83    -0.21    -0.36    -0.21    -0.21    -0.23    
G    -0.42    -0.22    -0.26    -0.25    -0.21    -0.35    
T    1.54    -0.44    1.09    -0.41    -1.53    -0.25
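Scoring a sequence with this matrix amounts to sliding a window of the motif width along it and summing one weight per position. A minimal sketch, assuming the sequence is a plain string and only the forward strand is scanned (a real scan would also score the reverse complement); the name `scan` is made up for illustration and the weights are copied from the table above.

```python
# Weights copied from the table above (rows A, C, G, T; 6 columns).
PWM = {
    "A": [-0.43, 1.11, -0.27, 1.10, 1.46, 1.09],
    "C": [-0.83, -0.21, -0.36, -0.21, -0.21, -0.23],
    "G": [-0.42, -0.22, -0.26, -0.25, -0.21, -0.35],
    "T": [1.54, -0.44, 1.09, -0.41, -1.53, -0.25],
}

def scan(sequence, pwm):
    """Yield (start, kmer, score) for every full-width window on one strand."""
    width = len(pwm["A"])
    seq = sequence.upper()
    for start in range(len(seq) - width + 1):
        kmer = seq[start:start + width]
        # Skip windows containing symbols the matrix has no row for (e.g. N).
        if all(base in pwm for base in kmer):
            yield start, kmer, sum(pwm[b][j] for j, b in enumerate(kmer))

# Toy usage: report the best-scoring window in a short made-up sequence.
hits = list(scan("GGTATAAAGG", PWM))
print(max(hits, key=lambda hit: hit[2]))
```

For a whole genome you would stream chromosomes one at a time rather than hold every hit in a list.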

# To calculate z-score
z = (x - mean)/sd
    The variables in the z-score formula are:
    z = z-score
    x = raw score or observation to be standardized
    mean = mean of the population
    sd = standard deviation of the population
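In code, the standardization could look like the sketch below. It assumes the "population" is the list of scores of all windows you scanned, which is one possible choice rather than the only one; `z_scores` is a made-up name.

```python
import statistics

def z_scores(scores):
    """Standardize raw scores: z = (x - mean) / sd, using the population sd."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(x - mean) / sd for x in scores]

# Toy usage with a few made-up raw scores.
print(z_scores([7.39, -1.2, 0.4, -0.8, 2.1]))
```

Turning a z-score into a p-value through the normal distribution assumes the scores are roughly normally distributed, which may not hold for PWM scores, so treat such p-values with care.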

You can choose a statistically significant threshold or a p-value corresponding to the z-score obtained. You can also look at the paper below to choose a threshold value:

I hope this helps.


'"61" for A in column 1, "46" for C etc' and '61+46+18+31' Sorry, but how did you obtain those numbers ?

written 7.1 years ago by Pierre Lindenbaum
9.0 years ago by
razor170 wrote:

There is the INCLUSive software collection available here:

It can search for overrepresented/de-novo motifs, create background models, compare motifs, etc. You might need to convert your PWMs to the INCLUSive format though.
