Question: How can I re-format my DNA motifs' position weight matrices (PWMs)?
5.5 years ago by
a1ultima710
London
a1ultima710 wrote:

I am working with a set of DNA motifs that are predicted as potential regulatory motifs (e.g. transcription factor binding sites). The motifs belong to several species, and I wanted to cluster these motifs via their Position Weight Matrices (PWMs) (also known as PSSMs) to collapse similar motifs together into groups.

A tool called MATLIGN (website here) does what I need, but the PWM format it requires differs from the one I have. The authors state:

"Matrices must be in the frequency matrix format (only integer numbers are acceptable)"

The problem is that my PWMs contain decimal values rather than integers, e.g.:

     A        C        G        T
1    0.000000 1.000000 0.000000 0.000000
2    1.000000 0.000000 0.000000 0.000000
3    0.000000 0.000000 1.000000 0.000000
4    0.000000 0.421755 0.000000 0.578245
5    0.289407 0.000000 0.282556 0.428038

In other words, instead of the decimal values I have in my matrix I need to have integer counts. Could anybody suggest what I can do? Would I need to create artificial counts?

matrix motif dna • 2.9k views
ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by a1ultima710

That looks a lot like a position frequency matrix (PFM) in which the counts were divided by the row total. Unless you know that a background nucleotide frequency was taken into account, you can probably just multiply everything by a constant and round to turn the values into 'counts'. Alternatively, you could use a tool like TOMTOM, which doesn't require integers.
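
One quick way to test the "counts divided by the row total" reading is to check that every row is non-negative and sums to 1. The sketch below does that for the example matrix from the question; the function name `looks_row_normalised` and the tolerance of 1e-4 are illustrative assumptions, not part of any tool mentioned here.

```python
# Rows of the example matrix from the question (columns: A, C, G, T).
pwm = [
    [0.000000, 1.000000, 0.000000, 0.000000],
    [1.000000, 0.000000, 0.000000, 0.000000],
    [0.000000, 0.000000, 1.000000, 0.000000],
    [0.000000, 0.421755, 0.000000, 0.578245],
    [0.289407, 0.000000, 0.282556, 0.428038],
]


def looks_row_normalised(matrix, tol=1e-4):
    """Return True if every row is non-negative and sums to 1 within tol.

    The tolerance (an assumed value) absorbs rounding in the printed decimals.
    """
    return all(
        min(row) >= 0 and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


print(looks_row_normalised(pwm))
```

Negative entries, or rows that do not sum to 1, would suggest a log-odds or background-corrected matrix, in which case simple scaling would not recover counts.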

ADD REPLYlink modified 5.5 years ago • written 5.5 years ago by UnivStudent380

@UnivStudent: I have actually used TOMTOM before. Unfortunately, it only does pairwise motif comparisons; I was hoping to use a more advanced method that carries out clustering as well.

ADD REPLYlink written 5.5 years ago by a1ultima710

A PWM contains less information than the actual counts. Where did you obtain the PWMs from? Try to find the counts or the actual sequences as well.

ADD REPLYlink written 5.5 years ago by Asaf6.1k
5.5 years ago by
a1ultima710
London
a1ultima710 wrote:

After looking more closely at my data, I noticed that each of my PWMs came with a value called nSites.

It turns out that nSites refers to the number of DNA sequence regions, or sites, that were originally used to generate the PWM.

Solution:

With this I was able to convert my PWMs into integer counts by multiplying the proportions by the nSites value.
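
A minimal sketch of that conversion in Python follows. The function name `frequencies_to_counts` and the value `n_sites = 20` are hypothetical (the real nSites values are not shown in this thread). Because plain rounding can leave a row one count above or below nSites, the sketch assumes largest-remainder rounding, which is one reasonable choice rather than the method actually used above.

```python
import math


def frequencies_to_counts(matrix, n_sites):
    """Scale each row of proportions to integer counts that sum to n_sites."""
    counts = []
    for row in matrix:
        total = sum(row)  # renormalise in case printed decimals drift from 1
        scaled = [p / total * n_sites for p in row]
        floors = [math.floor(x) for x in scaled]
        shortfall = n_sites - sum(floors)
        # Give the leftover counts to the entries with the largest
        # fractional parts (largest-remainder rounding).
        order = sorted(
            range(len(row)),
            key=lambda i: scaled[i] - floors[i],
            reverse=True,
        )
        for i in order[:shortfall]:
            floors[i] += 1
        counts.append(floors)
    return counts


# Example matrix from the question (columns: A, C, G, T).
pwm = [
    [0.000000, 1.000000, 0.000000, 0.000000],
    [1.000000, 0.000000, 0.000000, 0.000000],
    [0.000000, 0.000000, 1.000000, 0.000000],
    [0.000000, 0.421755, 0.000000, 0.578245],
    [0.289407, 0.000000, 0.282556, 0.428038],
]

n_sites = 20  # hypothetical value, for illustration only
for position, row in enumerate(frequencies_to_counts(pwm, n_sites), start=1):
    print(position, *row)
```

If the proportions really were derived from nSites sequences, each scaled value should already be very close to an integer, so the rounding step only cleans up the truncated decimals.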

ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by a1ultima710