Question

Calculating Background Model For Genome Scanning With PWM

12.5 years ago

Amm ▴ 20

I have a number of PWMs (8 positions) for several TFs that I'd like to scan a genome with, and it has been suggested that I use the FIMO tool from the MEME Suite for this (http://meme.sdsc.edu/meme/cgi-bin/fimo.cgi).

This, however, requires a background model for the organism. A friend mentioned that the accepted way to do this is simply to estimate a 4th-order Markov model from the whole genome.
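For context, a 4th-order Markov background just records how often each base follows every possible 4-base context. Below is a minimal sketch of that counting step, assuming a single in-memory sequence; the function name `markov_background`, the pseudocount of 1 and the toy sequence are all hypothetical choices for illustration, and the result is not in FIMO's background-file format (the MEME Suite ships a `fasta-get-markov` utility for producing such files).

```python
from collections import Counter, defaultdict

def markov_background(seq, order=4, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from one sequence."""
    seq = seq.upper()
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        context, nxt = seq[i:i + order], seq[i + order]
        # Skip windows that contain N or other ambiguity codes.
        if set(context + nxt) <= set("ACGT"):
            counts[context][nxt] += 1
    model = {}
    for context, c in counts.items():
        total = sum(c.values()) + 4 * pseudocount
        # Pseudocounts keep unseen transitions from getting probability 0.
        model[context] = {b: (c[b] + pseudocount) / total for b in "ACGT"}
    return model

# Hypothetical stand-in for a genome; a real run would read a FASTA file.
toy_genome = "ACGTACGTTTGACCATATAAAGGCTATATAAGCGC" * 20
bg = markov_background(toy_genome, order=4)
print(bg["TATA"])
```

Only contexts that actually occur in the sequence get an entry here; on a whole genome essentially all 256 four-base contexts will be present.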

So, what I'd like to ask you is:

(1) Would you agree with the use of FIMO for my problem of scanning and finding statistically significant potential targets of my TFs of interest? If not, what other platforms would you recommend? RSAT? Other?

(2) Do you think that the Markov model approach is the best one for getting a background model? Can background models be included in other methods for identifying potential targets through the use of PWMs?

Thanks in advance!

transcription binding scoring meme • 5.2k views

updated 12.1 years ago by razor ▴ 190 • written 12.5 years ago by Amm ▴ 20

score 5 · Answer 1 · 2011-11-01

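Hi, if you have some programming background, you can write your own program to compute a TFBS score from your PWM.

You can split the genome into overlapping 8-mers (one starting at every position, since your PWMs have 8 positions) and score each 8-mer against the PWM.

Here is an example with a 6-position matrix, so the k-mers are 6-mers: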

# PFM from JASPAR
A    16    352    3    354    268    360
C    46    0    10    0    0    3
G    18    2    2    5    0    20
T    309    35    374    30    121    6

# INPUT k-mers
TTGGGG
TATATA
TATAAA
TAAATA

# To convert PFM to PWM
w = log2( ( (f + sqrt(N) * p) / (N + sqrt(N)) ) / p )
where
    w - the weight for the current nucleotide at the current position
    f - the number of occurrences of the current nucleotide in the current column (e.g., 16 for A in column 1, 46 for C, etc.)
    N - the total number of observations, i.e. the sum of all nucleotide counts in a column (16+46+18+309=389 in this example)
    p - the prior (background) frequency of the current nucleotide; this usually defaults to 0.25 (i.e. one nucleotide out of four)
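The conversion above can be sketched in a few lines of Python. This is only an illustration of the formula as written, assuming a uniform background of 0.25 unless another one is passed in; the function name `pfm_to_pwm` and the dictionary layout are hypothetical.

```python
import math

# PFM from the example above (rows A, C, G, T; six columns).
pfm = {
    "A": [16, 352, 3, 354, 268, 360],
    "C": [46, 0, 10, 0, 0, 3],
    "G": [18, 2, 2, 5, 0, 20],
    "T": [309, 35, 374, 30, 121, 6],
}

def pfm_to_pwm(pfm, background=None):
    """Apply w = log2((f + sqrt(N)*p) / (N + sqrt(N)) / p) to every cell."""
    background = background or {b: 0.25 for b in "ACGT"}
    width = len(pfm["A"])
    pwm = {b: [] for b in "ACGT"}
    for j in range(width):
        n = sum(pfm[b][j] for b in "ACGT")  # N: total counts in column j
        pseudo = math.sqrt(n)               # sqrt(N) pseudocount mass
        for b in "ACGT":
            p = background[b]
            freq = (pfm[b][j] + pseudo * p) / (n + pseudo)
            pwm[b].append(math.log2(freq / p))
    return pwm

pwm = pfm_to_pwm(pfm)
for base in "ACGT":
    print(base, "  ".join(f"{w:6.2f}" for w in pwm[base]))
```

Passing genome-wide base frequencies as `background` instead of 0.25 is one simple way to bring an organism-specific (0-order) background into the weights.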

# PWM we get:
A    -2.29    1.80    -3.69    1.81    1.42    1.84
C    -1.00    -4.37    -2.77    -4.37    -4.37    -3.69
G    -2.16    -3.88    -3.88    -3.36    -4.37    -2.04
T    1.62    -1.36    1.89    -1.55    0.30    -3.22
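Scoring a k-mer then amounts to adding up one weight per position. The sketch below assumes that additive log-odds convention (common for PWMs, but not spelled out above) and rebuilds the weights from the PFM so it runs on its own; `score_kmer`, `scan` and the short test sequence are hypothetical names and data for illustration.

```python
import math

pfm = {
    "A": [16, 352, 3, 354, 268, 360],
    "C": [46, 0, 10, 0, 0, 3],
    "G": [18, 2, 2, 5, 0, 20],
    "T": [309, 35, 374, 30, 121, 6],
}

def weight(f, n, p=0.25):
    """PFM-to-PWM formula from above, for a single cell."""
    return math.log2((f + math.sqrt(n) * p) / (n + math.sqrt(n)) / p)

width = len(pfm["A"])
totals = [sum(pfm[b][j] for b in "ACGT") for j in range(width)]
pwm = {b: [weight(pfm[b][j], totals[j]) for j in range(width)] for b in "ACGT"}

def score_kmer(kmer, pwm):
    """Sum the weight of the observed base at each position."""
    return sum(pwm[base][j] for j, base in enumerate(kmer))

def scan(seq, pwm):
    """Yield (start, k-mer, score) for every full-length window in seq."""
    k = len(pwm["A"])
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with ambiguity codes
            yield i, kmer, score_kmer(kmer, pwm)

# The four input k-mers from the example.
kmers = ["TTGGGG", "TATATA", "TATAAA", "TAAATA"]
scores = {k: score_kmer(k, pwm) for k in kmers}
for k, s in scores.items():
    print(k, round(s, 2))

# Sliding-window scan of a short, made-up sequence (forward strand only).
hits = list(scan("GGCTATAAAGGC", pwm))
best = max(hits, key=lambda h: h[2])
print(best)
```

A real genome scan would also score the reverse complement of each window, which this sketch leaves out.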

# To calculate z-score
z = (x - mean)/sd
where
    z = z-score
    x = raw score of the k-mer being standardized
    mean = mean of the population of scores (e.g., the scores of all k-mers scanned)
    sd = standard deviation of that population
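As a rough illustration of the z-score step, the sketch below standardizes a list of raw scores and converts a z-score into a one-sided p-value. The normal approximation behind `upper_tail_p` is an assumption (PWM score distributions are not guaranteed to be normal), the function names are hypothetical, and the raw scores are made-up numbers rather than output from the matrix above.

```python
import math
import statistics

def z_scores(scores):
    """Standardize scores with the population mean and SD: z = (x - mean) / sd."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # population SD, matching the formula
    return [(x - mean) / sd for x in scores]

def upper_tail_p(z):
    """One-sided p-value P(Z >= z), assuming scores are roughly normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Made-up raw scores standing in for the scores of all scanned k-mers.
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_scores(raw)
for x, zi in zip(raw, z):
    print(f"score={x:5.2f}  z={zi:6.2f}  p={upper_tail_p(zi):.4f}")
```

In practice the mean and SD would come from the scores of every window in the genome (or from a shuffled or background-generated sequence), and the p-values would still need multiple-testing correction.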

You can then choose a significance threshold, either as a z-score cutoff or as the p-value corresponding to the z-score obtained. You can also look at the paper below for guidance on choosing a threshold value:

http://www.engineeringletters.com/issues_v16/issue_4/EL_16_4_06.pdf

I hope this helps.

score 1 · Answer 2 · 2012-03-20

12.1 years ago

razor ▴ 190

There is the INCLUSive software collection available here:

http://homes.esat.kuleuven.be/~sistawww/bioi/thijs/download.html

It can search for overrepresented/de-novo motifs, create background models, compare motifs, etc. You might need to convert your PWMs to the INCLUSive format though.
