Question: Distance between Position weight matrices using a substitution matrix
0
3.8 years ago by
European Union
marco23.p0 wrote:

Is there a way to compute a distance between two PWMs using a substitution matrix? (It doesn't matter whether the alphabet is genes, proteins, or something arbitrary.)

Say I have PWM1, corresponding to a motif in one dataset, and PWM2, the same motif in another dataset.

I also have a substitution matrix for the alphabet I am using.

What are the options for computing a distance between the two PWMs?

modified 3.8 years ago by Jean-Karim Heriche18k • written 3.8 years ago by marco23.p0
2
3.8 years ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche18k wrote:

You can compute the distance between two matrices using any matrix norm ||PWM1-PWM2||p for any p. The choice depends on the problem (i.e. where you want the emphasis to be), but people often use the Frobenius norm (the square root of the sum of squared element-wise differences) or the spectral norm (the largest singular value of PWM1-PWM2).
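To make the two norms concrete, here is a minimal sketch with NumPy. The two PWMs are made-up toy values (rows are positions, columns are A, C, G, T), not data from the question.

```python
import numpy as np

# Hypothetical toy PWMs: rows = motif positions, columns = A, C, G, T.
# Each row is a probability distribution and sums to 1.
pwm1 = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.70, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
])
pwm2 = np.array([
    [0.10, 0.70, 0.10, 0.10],
    [0.10, 0.70, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
])

diff = pwm1 - pwm2

# Frobenius norm: square root of the sum of squared element-wise differences.
frobenius = np.linalg.norm(diff, ord="fro")

# Spectral norm: largest singular value of the difference matrix.
spectral = np.linalg.norm(diff, ord=2)

print(frobenius, spectral)
```

The spectral norm is never larger than the Frobenius norm. In this toy case the two coincide because the difference matrix has only one non-zero row (rank 1). Both assume the PWMs have the same length and are already aligned.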

Given that PWMs represent frequencies, you could also use a probabilistic approach like the Kullback-Leibler divergence (which is not symmetric, so it is often symmetrized to get a distance-like measure) or a chi-squared statistic (to test whether columns are drawn from the same distribution).
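As a rough sketch of the probabilistic option, the following sums a symmetrized Kullback-Leibler divergence over positions. The function name `symmetric_kl` and the pseudocount value are my own assumptions, added only to avoid taking the log of zero; it also assumes both PWMs have the same length and are aligned.

```python
import numpy as np

def symmetric_kl(pwm_a, pwm_b, pseudocount=1e-6):
    """Sum over positions of 0.5 * (KL(a||b) + KL(b||a)).

    Rows are positions, columns are letters. The pseudocount is an
    assumption of this sketch: it keeps log() finite for zero entries.
    """
    a = np.asarray(pwm_a, dtype=float) + pseudocount
    b = np.asarray(pwm_b, dtype=float) + pseudocount
    # Renormalize so every position is a probability distribution again.
    a /= a.sum(axis=1, keepdims=True)
    b /= b.sum(axis=1, keepdims=True)
    kl_ab = np.sum(a * np.log(a / b), axis=1)
    kl_ba = np.sum(b * np.log(b / a), axis=1)
    return float(np.sum(0.5 * (kl_ab + kl_ba)))

# Hypothetical toy PWMs over A, C, G, T.
p = np.array([[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]])
q = np.array([[0.1, 0.7, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]])

print(symmetric_kl(p, q))
```

Note that this still treats every letter as similar only to itself. It compares the frequencies letter by letter and ignores any substitution matrix.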

This would be the way to go if I did not have a substitution matrix. I really cannot see how to take the similarities between characters into account, since all these methods assume that every character is similar only to itself. Or am I wrong?

1

You could use your substitution matrix to weight the differences in PWM1-PWM2. Alternatively, have a look at the method in this paper: Pape UJ, Rahmann S, Vingron M. Natural similarity measures between position frequency matrices with an application to clustering. Bioinformatics. 2008 Feb 1;24(3):350-7.
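One possible reading of "weight the differences" is a quadratic form: for each position, take the difference vector d between the two columns and compute d^T S d, where S is a similarity matrix derived from the substitution matrix. This is only a sketch of that idea, not the method from the paper above. The matrix `similarity` below is a hypothetical example (transitions A/G and C/T scored as more similar than transversions), and the approach assumes S is symmetric and positive semidefinite. A raw log-odds substitution matrix generally is not, so it would need to be transformed first.

```python
import numpy as np

# Hypothetical similarity matrix over A, C, G, T (an assumption of this sketch).
# Transitions (A<->G, C<->T) get 0.5, transversions get 0.2.
# It is symmetric and positive definite, so the result behaves like a distance.
similarity = np.array([
    [1.0, 0.2, 0.5, 0.2],
    [0.2, 1.0, 0.2, 0.5],
    [0.5, 0.2, 1.0, 0.2],
    [0.2, 0.5, 0.2, 1.0],
])

def weighted_pwm_distance(pwm_a, pwm_b, sim):
    """sqrt of the sum over positions of d_i^T @ sim @ d_i, with d = pwm_a - pwm_b."""
    diff = np.asarray(pwm_a, dtype=float) - np.asarray(pwm_b, dtype=float)
    # One quadratic form per position (row).
    per_position = np.einsum("ij,jk,ik->i", diff, sim, diff)
    # Clip tiny negative values caused by floating-point round-off.
    return float(np.sqrt(np.clip(per_position, 0.0, None).sum()))

# One-position toy PWMs: the reference prefers A, the others prefer G or C.
ref = np.array([[0.7, 0.1, 0.1, 0.1]])
to_g = np.array([[0.1, 0.1, 0.7, 0.1]])  # A -> G, a transition
to_c = np.array([[0.1, 0.7, 0.1, 0.1]])  # A -> C, a transversion

d_transition = weighted_pwm_distance(ref, to_g, similarity)
d_transversion = weighted_pwm_distance(ref, to_c, similarity)
print(d_transition, d_transversion)
```

With the identity matrix as `sim`, this reduces to the plain Frobenius norm of PWM1-PWM2. With the toy matrix above, shifting weight from A to G gives a smaller distance than shifting it from A to C, which is the behaviour the question asks for.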