Statistical Comparison Between Position-Specific Scoring Matrices
13.1 years ago
Anima Mundi ★ 2.9k

Hello,

I would like to know how to make statistical comparisons between similar PSSMs (e.g. two SRY matrices obtained in different ways) and estimate the significance of the observed differences. If we took, for example, a TRANSFAC matrix, we could run an ANOVA between columns, but there would be the problem of correlation between elements in a single row, so we would need some method to account for this (e.g. a Pearson correlation coefficient). Is there an efficient way to sort this out?

pssm matrix comparison • 5.3k views

I don't think that will work: a PSSM is not a replicated experiment, so you can't use ANOVA. But please provide an example.


We could consider the numbers in the same column of each matrix as replicates. If, for example, we had two JASPAR matrices, the first with 8, 1, 0, 0 and the second with 7, 2, 0, 0, there would be a greater variance in the first one. We should also somehow take into account the information from a correlation study highlighting the relationships between bases in the binding site.
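
The variance claim for the two example columns can be checked with a few lines of Python. This is only an arithmetic check of the numbers quoted above (using the population variance of the four counts, which is an assumption about what "variance" means here); it says nothing about whether ANOVA is appropriate.

```python
from statistics import pvariance

# Counts for one column (A, C, G, T) of each matrix, as quoted in the post.
first_column = [8, 1, 0, 0]
second_column = [7, 2, 0, 0]

# Population variance of the four counts in each column.
var_first = pvariance(first_column)
var_second = pvariance(second_column)

print(f"first: {var_first}, second: {var_second}")
print("first column is more variable:", var_first > var_second)
```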

13.1 years ago

While I share your concerns about associations between columns in a PSSM, weight-matrix-based approaches assume independence of the columns in the matrix. Thus, if you are using a PSSM representation of protein-DNA recognition, you are implicitly assuming independence in your model, and therefore I don't see that it is necessary to account for non-independence among columns when testing the difference between two PSSMs. Whether or not independence (= additivity) is a good assumption is a matter for debate, but Stormo and colleagues state:

We conclude that despite the fact that the additivity assumption does not fit the data perfectly, in most cases it provides a very good approximation of the true nature of the specific protein–DNA interactions.

If you are OK with the additivity assumption, then there are a number of methods to compute the difference between two PSSMs, either to find matches in a PSSM database or to detect significant similarities between PSSMs. I'd recommend the Noble Lab's Tomtom, which is part of the MEME Suite; it implements most of the common PSSM difference measures and computes P-values (see also: paper here).
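
To give a feel for what a column-wise comparison under the additivity assumption looks like, here is a minimal sketch that scores two aligned matrices position by position with the Pearson correlation, one of the common column measures. It is not Tomtom's implementation: the function names and example matrices are hypothetical, the matrices are assumed to be already aligned with equal length, and no offset search or P-value is computed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length columns of counts."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A flat column carries no preference; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def column_similarities(pssm_a, pssm_b):
    """Per-position Pearson scores for two aligned matrices.

    Each matrix is a list of positions; each position is [A, C, G, T].
    """
    if len(pssm_a) != len(pssm_b):
        raise ValueError("matrices must have the same number of positions")
    return [pearson(col_a, col_b) for col_a, col_b in zip(pssm_a, pssm_b)]


# Hypothetical 3-position count matrices, for illustration only.
matrix_a = [[8, 1, 0, 0], [0, 9, 0, 0], [2, 2, 3, 2]]
matrix_b = [[7, 2, 0, 0], [1, 8, 0, 0], [2, 3, 2, 2]]

scores = column_similarities(matrix_a, matrix_b)
print("per-position scores:", [round(s, 3) for s in scores])
print("summed score:", round(sum(scores), 3))
```

Because the columns are treated as independent, the overall score is simply the sum over positions; turning that sum into a P-value requires a null distribution, which is the part a tool like Tomtom provides.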


Thanks, so I think I can simplify the problem by assuming additivity. I already use Tomtom; in fact, if correlation doesn't matter, this solution fits my purposes.

13.1 years ago
Michael 54k

I think you are making this comparison too complicated, because PSSMs are just matrices. The difference between two matrices A and B is, of course, A - B, which is itself a matrix. The further A - B is from the zero matrix, the more different the two matrices are. Is this clear?
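
As a minimal sketch of this idea, the snippet below forms the element-wise difference of two matrices and measures its distance from the zero matrix with the Frobenius norm. The choice of norm, the function names, and the example values are assumptions for illustration; both matrices are also assumed to be on the same scale (for example, both built from the same number of sites, or both converted to frequencies).

```python
from math import sqrt


def matrix_difference(a, b):
    """Element-wise difference A - B of two matrices with identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def frobenius_norm(m):
    """Square root of the sum of squared entries; 0 only for the zero matrix."""
    return sqrt(sum(v * v for row in m for v in row))


# Hypothetical 2-position matrices (rows are positions, columns are A, C, G, T).
matrix_a = [[8, 1, 0, 0], [0, 9, 0, 0]]
matrix_b = [[7, 2, 0, 0], [1, 8, 0, 0]]

difference = matrix_difference(matrix_a, matrix_b)
distance = frobenius_norm(difference)
print("A - B =", difference)
print("distance from the zero matrix:", distance)
```

A single number like this ranks pairs of matrices by how different they are, but on its own it does not say whether a difference is statistically significant.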

A PSSM contains log-likelihoods, so the difference between two log-likelihoods is itself a log-likelihood ratio. A likelihood-ratio test between the two matrices is therefore what comes to mind. But please discuss this with somebody else before you build something mission-critical on it.
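
One possible reading of this suggestion, sketched here under clearly stated assumptions, is a G-test (the likelihood-ratio test of homogeneity) applied to a single position of two count matrices. It assumes the underlying counts are available rather than only log-scores, it treats positions independently, and the helper names are hypothetical. With counts as small as in the example the chi-square approximation is rough, so the P-value should be read as indicative only.

```python
from math import erfc, exp, log, pi, sqrt


def g_statistic(counts_a, counts_b):
    """Return (G, df) for a homogeneity test between two count columns."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each column needs at least one observation")
    grand_total = total_a + total_b
    g = 0.0
    observed_bases = 0
    for a, b in zip(counts_a, counts_b):
        base_total = a + b
        if base_total == 0:
            continue  # a base seen in neither matrix contributes nothing
        observed_bases += 1
        for observed, column_total in ((a, total_a), (b, total_b)):
            if observed > 0:
                expected = column_total * base_total / grand_total
                g += 2.0 * observed * log(observed / expected)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(g, 0.0), observed_bases - 1


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1, 2, 3."""
    if df == 1:
        return erfc(sqrt(x / 2.0))
    if df == 2:
        return exp(-x / 2.0)
    if df == 3:
        return erfc(sqrt(x / 2.0)) + sqrt(2.0 * x / pi) * exp(-x / 2.0)
    raise ValueError("df must be 1, 2 or 3 for a four-letter alphabet")


# The two example columns from earlier in the thread.
g, df = g_statistic([8, 1, 0, 0], [7, 2, 0, 0])
p_value = chi2_sf(g, df)
print(f"G = {g:.4f}, df = {df}, p = {p_value:.3f}")
```

Under the independence assumption, the per-position G values and degrees of freedom could be summed to test the whole matrix, although a general chi-square routine would then be needed for the larger degrees of freedom.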


Thank you too. You are right, I made it too complicated (it could be interesting to solve the correlation problem, but for now I am mainly interested in typical solutions). The difference method is clear, and it is interesting. The likelihood-ratio test also seems helpful, although I still need to work out how to apply it correctly to PSSMs. As for further discussion, I have planned a meeting with a specialist in statistical problems related to matrices (I have been lucky to find such a suitable expert!).

13.1 years ago
Woa ★ 2.9k

If I remember correctly, the BLOSUM and PAM series of matrices are compared for equivalence by their 'relative entropy'. Maybe you can use the same method, with some tweaks if necessary.
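
A rough sketch of how the relative-entropy idea might carry over to PSSMs: convert each position's counts to frequencies, compute the relative entropy (Kullback-Leibler divergence, in bits) of each position against a background distribution, and sum over positions to get one number per matrix. The pseudocount of 0.5, the uniform background, and all names below are assumptions made for illustration, not part of any published procedure.

```python
from math import log2

UNIFORM_BACKGROUND = (0.25, 0.25, 0.25, 0.25)  # assumed background for A, C, G, T


def to_frequencies(counts, pseudocount=0.5):
    """Turn raw counts into frequencies, with a pseudocount to avoid zeros."""
    total = sum(counts) + pseudocount * len(counts)
    return [(c + pseudocount) / total for c in counts]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def matrix_relative_entropy(count_matrix, background=UNIFORM_BACKGROUND):
    """Sum of per-position relative entropies against the background."""
    return sum(
        relative_entropy(to_frequencies(column), background)
        for column in count_matrix
    )


# Hypothetical 2-position count matrices (each position is [A, C, G, T]).
matrix_a = [[8, 1, 0, 0], [0, 9, 0, 0]]
matrix_b = [[7, 2, 0, 0], [1, 8, 0, 0]]

h_a = matrix_relative_entropy(matrix_a)
h_b = matrix_relative_entropy(matrix_b)
print(f"H(A) = {h_a:.3f} bits, H(B) = {h_b:.3f} bits")
```

Two matrices with similar totals are comparably informative, which is the sense in which substitution matrices are matched; it does not by itself show that they prefer the same bases. Calling `relative_entropy` on corresponding columns of the two matrices gives a direct, position-wise divergence instead.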


Thanks. You seem to be referring to something like Altschul, S.F. (1991) "Amino acid substitution matrices from an information theoretic perspective." J. Mol. Biol. 219:555-565, which is included in BLAST's references. Unfortunately I am not able to understand this method properly; maybe I have got involved in a problem that is too far from my tasks...
