Question: Calculating Background Model For Genome Scanning With PWM
Amm wrote (9.3 years ago):

I have a number of PWMs (8 positions) for several TFs that I'd like to scan a genome with, and it has been suggested that I use the FIMO tool from the MEME Suite for this (http://meme.sdsc.edu/meme/cgi-bin/fimo.cgi).

This, however, requires a background model for the organism. A friend mentioned that the accepted way to do this is simply to use a 4th-order Markov model estimated from the whole genome.
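
For illustration, here is a minimal sketch of what estimating such a background model could look like, assuming the genome is available as a plain string of A/C/G/T. The function name `markov_background`, the pseudocount, and the toy sequence are assumptions made for this example; they are not part of FIMO or the MEME Suite.

```python
from collections import Counter, defaultdict


def markov_background(seq, order=4, alphabet="ACGT", pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from one sequence.

    Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        context = seq[i:i + order]
        nxt = seq[i + order]
        # Skip windows that contain ambiguous bases such as N.
        if all(base in alphabet for base in context + nxt):
            counts[context][nxt] += 1

    model = {}
    for context, nxt_counts in counts.items():
        total = sum(nxt_counts.values()) + pseudocount * len(alphabet)
        model[context] = {
            base: (nxt_counts[base] + pseudocount) / total for base in alphabet
        }
    return model


# Toy sequence standing in for a genome (assumption for the example).
toy_genome = "ACGTACGTTTGACCAGTACGATCGATCGGGATATATAAACGT"
bg = markov_background(toy_genome, order=4)
print(bg["ACGT"])  # conditional probabilities of the base following "ACGT"
```

Only contexts that actually occur in the sequence get an entry; on a real genome nearly all 256 four-base contexts would be observed.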

So, what I'd like to ask you is:

(1) Would you agree with the use of FIMO for my problem of scanning and finding statistically significant potential targets of my TFs of interest? If not, what other platforms would you recommend? RSAT? Other?

(2) Do you think that the Markov model approach is the best one for getting a background model? Can background models be included in other methods for identifying potential targets through the use of PWMs?

Thanks in advance!

Gjain (Bengaluru, India) wrote (9.3 years ago):

Hi, if you have some programming background, you can write your own program to get a TFBS score from your PWM.

You can slide a window along the genome to extract every overlapping 8-mer and score each 8-mer against the PWM.
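
A minimal sketch of the k-mer extraction step, assuming the sequence is held in memory as a string. The helper names `iter_kmers` and `reverse_complement` are hypothetical, and scanning the reverse strand as well is an assumption that is common for TF binding sites rather than something stated above.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iter_kmers(seq, k):
    """Yield (start, kmer) for every overlapping window of length k."""
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        yield start, seq[start:start + k]


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


# Toy sequence; k=6 here only to keep the printout short.
for start, kmer in iter_kmers("TTGGGGTATA", 6):
    print(start, kmer, reverse_complement(kmer))
```

Using a generator avoids materializing every k-mer of a whole genome at once.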

Here is an example:

# PFM from JASPAR
A    16    352    3    354    268    360
C    46    0    10    0    0    3
G    18    2    2    5    0    20
T    309    35    374    30    121    6

# INPUT k-mers
TTGGGG
TATATA
TATAAA
TAAATA

# To convert PFM to PWM
w = log2 ( ( f + sqrt(N) * p ) / ( N + sqrt(N) ) / p )
where
    w - the weight for the current nucleotide we are calculating
    f - the number of occurrences of the current nucleotide in the current column (e.g., "61" for A in column 1, "46" for C etc)
    N - the total number of observations, i.e. the sum of all nucleotide occurrences in a column (61+46+18+31=156 in this example)
    p - the [prior] [background] frequency of the current nucleotide; this usually defaults to 0.25 (i.e. one nucleotide out of four)
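
The formula can be sketched in code as follows, applied to the JASPAR PFM shown above, where every column sums to 389. The function name `pfm_to_pwm` and the dictionary layout are assumptions made for this example. The printed values follow the formula exactly as written, so they need not coincide with any particular published PWM for the same matrix.

```python
import math

# Counts transcribed from the PFM above (rows A, C, G, T; 6 columns).
PFM = {
    "A": [16, 352, 3, 354, 268, 360],
    "C": [46, 0, 10, 0, 0, 3],
    "G": [18, 2, 2, 5, 0, 20],
    "T": [309, 35, 374, 30, 121, 6],
}


def pfm_to_pwm(pfm, background=None):
    """Apply w = log2(((f + sqrt(N) * p) / (N + sqrt(N))) / p) column by column."""
    bases = sorted(pfm)
    if background is None:
        background = {base: 0.25 for base in bases}  # uniform prior
    width = len(pfm[bases[0]])
    pwm = {base: [] for base in bases}
    for col in range(width):
        n = sum(pfm[base][col] for base in bases)  # N for this column
        pseudo = math.sqrt(n)                      # sqrt(N) pseudocount mass
        for base in bases:
            p = background[base]
            freq = (pfm[base][col] + pseudo * p) / (n + pseudo)
            pwm[base].append(math.log2(freq / p))
    return pwm


pwm = pfm_to_pwm(PFM)
for base in "ACGT":
    print(base, " ".join(f"{w:6.2f}" for w in pwm[base]))
```

The sqrt(N) pseudocount keeps the weight finite for nucleotides with a count of zero, such as C in column 2.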

# PWM we get:
A    -0.43    1.11    -0.27    1.10    1.46    1.09    
C    -0.83    -0.21    -0.36    -0.21    -0.21    -0.23    
G    -0.42    -0.22    -0.26    -0.25    -0.21    -0.35    
T    1.54    -0.44    1.09    -0.41    -1.53    -0.25
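
Scoring a k-mer then amounts to summing one weight per position. A minimal sketch, using the PWM values and the four input k-mers exactly as listed above; the function name `score_kmer` is an assumption made for this example.

```python
# Weights transcribed from the PWM listed above.
PWM = {
    "A": [-0.43, 1.11, -0.27, 1.10, 1.46, 1.09],
    "C": [-0.83, -0.21, -0.36, -0.21, -0.21, -0.23],
    "G": [-0.42, -0.22, -0.26, -0.25, -0.21, -0.35],
    "T": [1.54, -0.44, 1.09, -0.41, -1.53, -0.25],
}


def score_kmer(kmer, pwm):
    """Sum the weight of the observed base at each position of the k-mer."""
    width = len(pwm["A"])
    if len(kmer) != width:
        raise ValueError(f"expected a {width}-mer, got {kmer!r}")
    return sum(pwm[base][pos] for pos, base in enumerate(kmer.upper()))


kmers = ["TTGGGG", "TATATA", "TATAAA", "TAAATA"]
scores = {kmer: score_kmer(kmer, PWM) for kmer in kmers}
for kmer, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{kmer}\t{score:.2f}")
```

With these weights, TATAAA receives the highest score of the four k-mers (7.39).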

# To calculate z-score
z = (x - mean) / sd
where
    z - the z-score
    x - the raw score or observation to be standardized (here, the PWM score of one k-mer)
    mean - the mean of the population of scores
    sd - the standard deviation of the population of scores
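
A minimal sketch of the standardization step. The four scores are the sums of the PWM weights listed above for the four input k-mers and serve only as a toy population; in practice the population would presumably be the scores of all k-mers scanned. Converting z to a p-value through a normal distribution is an assumption of this example, and PWM scores need not be normally distributed.

```python
from statistics import NormalDist, fmean, pstdev


def z_scores(scores):
    """Standardize scores against their own population mean and sd."""
    mean = fmean(scores)
    sd = pstdev(scores, mu=mean)  # population standard deviation
    if sd == 0:
        raise ValueError("all scores are identical; z-score is undefined")
    return [(x - mean) / sd for x in scores]


def upper_tail_p(z):
    """One-sided p-value under a standard normal approximation (assumption)."""
    return 1.0 - NormalDist().cdf(z)


# Scores for TTGGGG, TATATA, TATAAA, TAAATA (toy population).
toy_scores = [0.03, 4.40, 7.39, 3.04]
for score, z in zip(toy_scores, z_scores(toy_scores)):
    print(f"score={score:5.2f}  z={z:6.2f}  p~{upper_tail_p(z):.3f}")
```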

You can choose a threshold for statistical significance, or a p-value cutoff corresponding to the z-score obtained. You can also look at the paper below to choose a threshold value:

I hope this helps.


'"61" for A in column 1, "46" for C etc' and '61+46+18+31': sorry, but how did you obtain those numbers?

(comment written 7.1 years ago by Pierre Lindenbaum)
razor (Barcelona) wrote (9.0 years ago):

There is the INCLUSive software collection available here:

It can search for overrepresented/de-novo motifs, create background models, compare motifs, etc. You might need to convert your PWMs to the INCLUSive format though.
