I am working with a set of DNA motifs that are predicted to be regulatory motifs (e.g. transcription factor binding sites). The motifs come from several species, and I want to cluster them by their Position Weight Matrices (PWMs, also known as PSSMs) so that similar motifs are collapsed into groups.
A tool called MATLIGN (website here) does what I need, but the format it requires for the PWMs is different from what I have. Its documentation states:
"Matrices must be in the frequency matrix format (only integer numbers are acceptable)"
The problem is that my PWMs contain decimal values rather than integers, e.g.:
Pos  A         C         G         T
1    0.000000  1.000000  0.000000  0.000000
2    1.000000  0.000000  0.000000  0.000000
3    0.000000  0.000000  1.000000  0.000000
4    0.000000  0.421755  0.000000  0.578245
5    0.289407  0.000000  0.282556  0.428038
In other words, I need integer counts instead of the decimal values in my matrix. Could anybody suggest how to get them? Would I need to create artificial counts?
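To make the "artificial counts" idea concrete, here is a minimal Python sketch of what I have in mind. It assumes each row is a probability distribution over A, C, G, T and scales it to a made-up number of sites. The function name `frequencies_to_counts` and the value `n_sites=100` are my own assumptions, not anything from MATLIGN, and I have not checked whether the choice of total affects its results. Leftover counts after rounding down go to the largest fractional parts, so every row sums to exactly `n_sites`.

```python
def frequencies_to_counts(rows, n_sites=100):
    """Scale each row of probabilities to integer counts summing to n_sites.

    n_sites is an arbitrary, assumed number of aligned sites; if the real
    number of sites behind a motif is known, it should be used instead.
    """
    counts = []
    for row in rows:
        total = sum(row)
        # Renormalise first, since printed rows may not sum to exactly 1.
        scaled = [p * n_sites / total for p in row]
        floors = [int(s) for s in scaled]
        leftover = n_sites - sum(floors)
        # Hand the leftover counts to the largest fractional parts.
        by_fraction = sorted(
            range(len(row)),
            key=lambda i: scaled[i] - floors[i],
            reverse=True,
        )
        for i in by_fraction[:leftover]:
            floors[i] += 1
        counts.append(floors)
    return counts


# The example matrix from this post, columns in the order A, C, G, T.
pwm = [
    [0.000000, 1.000000, 0.000000, 0.000000],
    [1.000000, 0.000000, 0.000000, 0.000000],
    [0.000000, 0.000000, 1.000000, 0.000000],
    [0.000000, 0.421755, 0.000000, 0.578245],
    [0.289407, 0.000000, 0.282556, 0.428038],
]

for position, row in enumerate(frequencies_to_counts(pwm), start=1):
    print(position, *row)
```

With `n_sites=100`, position 4 becomes 0 42 0 58 and position 5 becomes 29 0 28 43. A larger `n_sites` keeps more of the decimal precision, but it also implies more supporting sites than may really exist, which is part of why I am unsure this is a legitimate approach.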