Calculating Background Model For Genome Scanning With PWM
10.7 years ago
Amm ▴ 20

I have a number of PWMs (8 positions each) for several TFs that I'd like to scan a genome with, and it has been suggested that I use the FIMO tool from the MEME Suite for this (http://meme.sdsc.edu/meme/cgi-bin/fimo.cgi).

This, however, requires a background model for the organism. A friend mentioned that the accepted way to do this is simply to use a 4th-order Markov model estimated from the whole genome.

So, what I'd like to ask you is:

(1) Would you agree with the use of FIMO for my problem of scanning and finding statistically significant potential targets of my TFs of interest? If not, what other platforms would you recommend? RSAT? Other?

(2) Do you think that the Markov model approach is the best one for getting a background model? Can background models be included in other methods for identifying potential targets through the use of PWMs?
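For reference, the 4th-order Markov background mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `markov_background`, the pseudocount of 1, and the toy sequence are all hypothetical, and the output is a plain dictionary of conditional probabilities, not a file in the format FIMO reads. The MEME Suite ships its own `fasta-get-markov` utility for producing background files.

```python
from collections import Counter
from itertools import product

def markov_background(seq, order=4, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from one sequence.

    Hypothetical helper for illustration; not part of FIMO or MEME.
    """
    seq = seq.upper()
    counts = Counter()
    # Count every (order+1)-mer: the first `order` bases are the context,
    # the last base is the one being predicted.
    for i in range(len(seq) - order):
        kmer = seq[i:i + order + 1]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
            counts[kmer] += 1

    model = {}
    for context in map("".join, product("ACGT", repeat=order)):
        total = sum(counts[context + b] for b in "ACGT") + 4 * pseudocount
        model[context] = {
            b: (counts[context + b] + pseudocount) / total for b in "ACGT"
        }
    return model

# Toy stand-in for a genome (assumption); a real run would read a FASTA file.
toy_genome = "ACGTACGTTTGACCATGACGTAGCTAGCTAGGATCCA" * 20
bg = markov_background(toy_genome, order=4)
print(bg["ACGT"])  # distribution over the base that follows ACGT
```

The pseudocount keeps contexts that never occur in the sequence from getting zero or undefined probabilities, which matters more on small genomes than on large ones.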

transcription binding scoring meme • 4.6k views
10.7 years ago
Gjain 5.7k

Hi, if you have some programming background, you can write your own program to get a TFBS score from your PWM.

You can split the genome into overlapping 8-mers (a sliding window as wide as the motif) and score each 8-mer against the PWM.

Here is an example:

# PFM from JASPAR
A    16    352    3    354    268    360
C    46    0    10    0    0    3
G    18    2    2    5    0    20
T    309    35    374    30    121    6

# INPUT k-mers
TTGGGG
TATATA
TATAAA
TAAATA

# To convert PFM to PWM
w = log2 ( ( f + sqrt(N) * p ) / ( N + sqrt(N) ) / p )
where
w - the weight for the current nucleotide we are calculating
f - the number of occurrences of the current nucleotide in the current column (e.g., "61" for A in column 1, "46" for C, etc.)
N - the total number of observations, i.e. the sum of all nucleotide occurrences in a column (61+46+18+31=156 in this example)
p - the [prior] [background] frequency of the current nucleotide; this usually defaults to 0.25 (i.e. one nucleotide out of four)

# PWM we get:
A    -0.43    1.11    -0.27    1.10    1.46    1.09
C    -0.83    -0.21    -0.36    -0.21    -0.21    -0.23
G    -0.42    -0.22    -0.26    -0.25    -0.21    -0.35
T    1.54    -0.44    1.09    -0.41    -1.53    -0.25
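As a rough sketch, the conversion formula can be applied to the JASPAR PFM from this post as follows. Every column of that PFM sums to N = 389 (for column 1: 16 + 46 + 18 + 309). The function name `pfm_to_pwm` and the uniform background of 0.25 are assumptions for illustration. The sketch applies the formula literally with log base 2, so its values may differ from the table above if that table was produced with another log base or pseudocount.

```python
import math

# PFM copied from the post (rows A, C, G, T; six columns).
PFM = {
    "A": [16, 352, 3, 354, 268, 360],
    "C": [46, 0, 10, 0, 0, 3],
    "G": [18, 2, 2, 5, 0, 20],
    "T": [309, 35, 374, 30, 121, 6],
}

def pfm_to_pwm(pfm, background=None):
    """Apply w = log2(((f + sqrt(N) * p) / (N + sqrt(N))) / p) per cell."""
    if background is None:
        # Assumption: uniform background, as in the post's default p = 0.25.
        background = {b: 0.25 for b in "ACGT"}
    width = len(pfm["A"])
    pwm = {b: [] for b in "ACGT"}
    for j in range(width):
        n = sum(pfm[b][j] for b in "ACGT")  # total observations in column j
        root_n = math.sqrt(n)
        for b in "ACGT":
            p = background[b]
            corrected = (pfm[b][j] + root_n * p) / (n + root_n)
            pwm[b].append(math.log2(corrected / p))
    return pwm

pwm = pfm_to_pwm(PFM)
for base in "ACGT":
    print(base, "  ".join(f"{w:6.2f}" for w in pwm[base]))
```

The sqrt(N) pseudocount means that zero counts (such as C in column 2) get a finite negative weight instead of minus infinity.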

# To calculate z-score
z = (x - mean)/sd
The variables in the z-score formula are:
z = z-score
x = raw score or observation to be standardized
mean = mean of the population
sd = standard deviation of the population
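A minimal sketch of the scoring and z-score steps, using the PWM values exactly as printed above. The raw score of a k-mer is taken to be the sum of the weights of its bases, one per column. As the reference population for the mean and sd, this sketch uses the scores of all 4^6 possible 6-mers, which is an assumption; scores from the real genome or from a background model would be an equally valid choice. The helper names, the toy sequence, and the cutoff of 2.0 are all hypothetical.

```python
from itertools import product
from statistics import mean, pstdev

# PWM values as printed in the post (rows A, C, G, T; six columns).
PWM = {
    "A": [-0.43, 1.11, -0.27, 1.10, 1.46, 1.09],
    "C": [-0.83, -0.21, -0.36, -0.21, -0.21, -0.23],
    "G": [-0.42, -0.22, -0.26, -0.25, -0.21, -0.35],
    "T": [1.54, -0.44, 1.09, -0.41, -1.53, -0.25],
}
WIDTH = 6

def score(kmer):
    """Raw score: sum of the PWM weight of each base at its position."""
    return sum(PWM[base][i] for i, base in enumerate(kmer))

# Reference population (assumption): every possible 6-mer, equally weighted.
population = [score("".join(k)) for k in product("ACGT", repeat=WIDTH)]
mu, sd = mean(population), pstdev(population)

def z_score(kmer):
    return (score(kmer) - mu) / sd

# The four input k-mers from the post.
for kmer in ["TTGGGG", "TATATA", "TATAAA", "TAAATA"]:
    print(kmer, round(score(kmer), 2), round(z_score(kmer), 2))

def scan(seq, z_cutoff=2.0):
    """Slide a WIDTH-wide window along the forward strand; keep high z-scores."""
    hits = []
    for i in range(len(seq) - WIDTH + 1):
        kmer = seq[i:i + WIDTH]
        if set(kmer) <= set("ACGT"):  # skip windows with ambiguous bases
            z = z_score(kmer)
            if z >= z_cutoff:
                hits.append((i, kmer, z))
    return hits

hits = scan("GGCTATAAAAGGC")  # toy sequence, not real genome data
print(hits)
```

A real scan would also score the reverse complement of each window, which this sketch leaves out for brevity.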


You can choose a statistically significant threshold, or a p-value corresponding to the z-score obtained. You can also look at the paper below to choose a threshold value:

I hope this helps.


'"61" for A in column 1, "46" for C etc' and '61+46+18+31': sorry, but how did you obtain those numbers?

10.3 years ago
razor ▴ 190

There is the INCLUSive software collection available here:

It can search for overrepresented/de novo motifs, create background models, compare motifs, etc. You might need to convert your PWMs to the INCLUSive format, though.