Looking For A Simple Method For Calculating PWMs/PSSMs
11.4 years ago
Will 4.5k

I'm looking for a simple method for calculating position-specific weight matrices (PWMs) from a position occurrence matrix, like those found in the JASPAR database:

    >MA0004.1 Arnt
    A  [ 4 19  0  0  0  0 ]
    C  [16  0 20  0  0  0 ]
    G  [ 0  1  0 20  0 20 ]
    T  [ 0  0  0  0 20  0 ]
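
If the matrices are stored as text like the block above, a short parser is enough to get the counts into Python. This is a minimal sketch, assuming one matrix per string with rows labelled A, C, G and T in JASPAR's bracketed layout; `parse_jaspar_counts` is a hypothetical helper name, not part of any library.

```python
import re

# The Arnt count matrix from the question, in JASPAR's bracketed text layout.
ARNT_TEXT = """>MA0004.1 Arnt
A  [ 4 19  0  0  0  0 ]
C  [16  0 20  0  0  0 ]
G  [ 0  1  0 20  0 20 ]
T  [ 0  0  0  0 20  0 ]
"""

def parse_jaspar_counts(text):
    """Parse one JASPAR-style count matrix into (header, {base: [counts]})."""
    header = None
    counts = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:].strip()
            continue
        # Row layout: a base letter, then integers inside square brackets.
        match = re.match(r"^([ACGT])\s*\[([^\]]*)\]", line)
        if match:
            base, numbers = match.groups()
            counts[base] = [int(x) for x in numbers.split()]
    # Every row must describe the same number of motif positions.
    lengths = {len(row) for row in counts.values()}
    if len(lengths) != 1:
        raise ValueError("rows are missing or have different lengths")
    return header, counts

header, counts = parse_jaspar_counts(ARNT_TEXT)
print(header)       # → MA0004.1 Arnt
print(counts["G"])  # → [0, 1, 0, 20, 0, 20]
```

Each column of this matrix sums to 20, the number of binding sites it was built from.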


I need to scan a large collection of sequences, so submitting them to online services would be a complete hassle.

Once I have a PWM I know how to scan sequences; I'm just having trouble creating one.
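
For completeness, scanning with a finished matrix is a sliding-window sum of per-position scores. This is a minimal sketch on the forward strand only, with a hypothetical toy matrix (+2 for the CACGTG consensus base, -2 otherwise) standing in for a real PWM; `scan`, `TOY_PWM` and the threshold value are illustrative assumptions. For DNA you would normally also scan the reverse complement.

```python
# Hypothetical 6-column log-odds matrix: +2 for the CACGTG consensus base,
# -2 for any other base. Replace with a real PWM computed from counts.
CONSENSUS = "CACGTG"
TOY_PWM = {b: [2.0 if b == c else -2.0 for c in CONSENSUS] for b in "ACGT"}

def scan(sequence, pwm, threshold):
    """Return (start, score) for every window whose summed score >= threshold."""
    width = len(next(iter(pwm.values())))
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if any(base not in pwm for base in window):
            continue  # skip windows containing N or other ambiguity codes
        # Score = sum over positions of the weight for the observed base.
        score = sum(pwm[base][j] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

print(scan("ttCACGTGaaCACGTT", TOY_PWM, threshold=8.0))
# → [(2, 12.0), (10, 8.0)]
```

The second hit is a one-mismatch site (CACGTT), which clears the threshold because five matches outweigh one mismatch.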

PS: I'm looking for an equation (or pseudocode) for computing the PWM, not a library. I plan to implement it in Python and MATLAB, and I'd prefer to implement it directly in my code rather than write a wrapper around system calls. Thanks
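
One common recipe, and only one of several, is: add a pseudocount so no probability is zero, turn each column into probabilities, then take the log-odds against a background model. This is a minimal sketch assuming a uniform background and a total pseudocount of 1 per column, split across bases in proportion to the background; `counts_to_pwm` and its parameters are hypothetical names, and other pseudocount schemes are also in use.

```python
import math

# Arnt counts from the question (rows A, C, G, T; one column per motif position).
COUNTS = {
    "A": [4, 19, 0, 0, 0, 0],
    "C": [16, 0, 20, 0, 0, 0],
    "G": [0, 1, 0, 20, 0, 20],
    "T": [0, 0, 0, 0, 20, 0],
}

def counts_to_pwm(counts, background=None, pseudocount=1.0):
    """Convert a count matrix to a log2-odds weight matrix.

    pseudocount is the total added per column, split across bases in
    proportion to the background (an assumed, common choice). It must be
    > 0 when a count is zero, otherwise log2(0) raises an error.
    """
    bases = list(counts)
    if background is None:
        background = {b: 1.0 / len(bases) for b in bases}  # uniform background
    length = len(counts[bases[0]])
    pwm = {b: [] for b in bases}
    for j in range(length):
        n_sites = sum(counts[b][j] for b in bases)  # N, sites in this column
        for b in bases:
            # Smoothed probability of base b at position j.
            prob = (counts[b][j] + pseudocount * background[b]) / (n_sites + pseudocount)
            # Log-odds score of that probability against the background.
            pwm[b].append(math.log2(prob / background[b]))
    return pwm

pwm = counts_to_pwm(COUNTS)
for base in "ACGT":
    print(base, " ".join(f"{w:6.2f}" for w in pwm[base]))
```

The same arithmetic ports directly to MATLAB as element-wise operations on a 4×L matrix: add the pseudocounts, divide each column by its total, divide by the background and take `log2`.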

Have you checked the Wikipedia page for PSSM? It covers the math behind PWMs/PSSMs comprehensively. It is not clear to me what you mean by "position occurrence matrix".

I presumed that by "position occurrence matrix" they meant a matrix of counts based on experimental data, such as those in JASPAR, rather than frequencies or weights. The Wikipedia page is fine, but I think the primary reference by Hertz and Stormo is preferable, as Will can cite it in any future publications.
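
In symbols, the counts to frequencies to weights progression can be written as below. This is one common formulation, assuming counts $n_{b,j}$ for base $b$ at position $j$, $N$ aligned sites, background frequencies $q_b$ and a total pseudocount $\alpha$ per column; the notation here is illustrative and not copied from Hertz and Stormo or from Wikipedia.

```latex
p_{b,j} = \frac{n_{b,j} + \alpha\, q_b}{N + \alpha},
\qquad
W_{b,j} = \log_2 \frac{p_{b,j}}{q_b},
\qquad
S(x_1 \dots x_L) = \sum_{j=1}^{L} W_{x_j,\, j}
```

For the Arnt matrix with $\alpha = 1$ and a uniform background ($q_b = 0.25$), the G at position 4 gives $p = 20.25/21 \approx 0.964$ and $W = \log_2(0.964/0.25) \approx 1.95$.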

11.4 years ago
Stew ★ 1.4k

This paper describes the process, which you should be able to follow:

Identifying DNA and protein patterns with statistically significant alignments of multiple sequences.

11.4 years ago
brentp 23k

I know you say you're not looking for a library, but I feel compelled to link you to this. It even has a section titled "Loading and using JASPAR and TFD".

11.2 years ago
Will 4.5k

Although this question has been answered and closed, I figured I'd post a new library I found that implements everything I wanted to do. It's called MOODS, and the source can be found here. It is implemented in C++ for speed but has Python and Perl bindings.