matchPWM problem, PSSM
8.1 years ago

Hi

I need to compare a motif to genomic regions using matchPWM(PWM, DNAString) and get the match scores.

I have this MEME output:

Motif 1 position-specific scoring matrix

log-odds matrix: alength= 4 w= 21 n= 47985 bayes= 11.8274 E= 5.3e-108
-123   -290    159   -149
-36    -44  -1250    136
-36   -390    162  -1250
-1250  -1250     -9    164
-64   -158    123    -64
-149    -44  -1250    158
-181  -1250    174   -181
-381   -109    -44    147
-11   -390    156  -1250
-101    -90   -190    147
-281   -190    162   -123
-223   -231   -290    183
-36  -1250    165  -1250
-1250  -1250  -1250    204
-36    -90    127   -223
-101   -231    -73    147
-101   -109    153   -381
-281   -231  -1250    191
-1250   -390    193  -1250
-1   -131  -1250    143
-64  -1250    149   -101


My code:

> library(Biostrings)
> A <- c(-123,-36,-36,-1250,-64,-149,-181,-381,-11,-101,-281,-223,-36,-1250,-36,-101,-101,-281,-1250,-1,-64)
> C <- c(-290,-44,-390,-1250,-158,-44,-1250,-109,-390,-90,-190,-231,-1250,-1250,-90,-231,-109,-231,-390,-131,-1250)
> G <- c(159,-1250,162,-9,123,-1250,174,-44,156,-190,162,-290,165,-1250,127,-73,153,-1250,193,-1250,149)
> T <- c(-149,136,-1250,164,-64,158,-181,147,-1250,147,-123,183,-1250,204,-223,147,-381,191,-1250,143,-101)
> pm_matrix <- rbind(A, C, G, T)  # 4 x 21 numeric matrix; rbind() names the rows A, C, G, T as matchPWM expects
> print(mcols(matchPWM(pm_matrix, seq[[295]], min.score = "80%", with.score = TRUE))$score)
[1] 2696 3061 3343 3343 3343 3343 3199 2871


seq[[295]] is a 141-letter DNAString subject.

Why are the scores four-digit numbers? I expected floats between 0.8 and 1.
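For context, the raw scores can be compared with the lowest and highest scores this matrix can produce. The sketch below uses base R only (no Biostrings) and assumes the 21 pasted rows are motif positions with columns A, C, G, T in that order; it reads them with read.table() instead of retyping each column. Because a window's score is the sum of one matrix entry per position, the maximum is the sum of the column maxima and the minimum is the sum of the column minima. The names meme_rows, max_score, min_score, raw and rel are illustrative only.

```r
# Assumption: each pasted row is one motif position, columns are A, C, G, T
meme_rows <- "
-123   -290    159   -149
-36    -44  -1250    136
-36   -390    162  -1250
-1250  -1250     -9    164
-64   -158    123    -64
-149    -44  -1250    158
-181  -1250    174   -181
-381   -109    -44    147
-11   -390    156  -1250
-101    -90   -190    147
-281   -190    162   -123
-223   -231   -290    183
-36  -1250    165  -1250
-1250  -1250  -1250    204
-36    -90    127   -223
-101   -231    -73    147
-101   -109    153   -381
-281   -231  -1250    191
-1250   -390    193  -1250
-1   -131  -1250    143
-64  -1250    149   -101
"
# read.table() gives a 21 x 4 table; transpose to 4 x 21
# (rows A, C, G, T; one column per motif position)
pwm <- t(as.matrix(read.table(text = meme_rows)))
rownames(pwm) <- c("A", "C", "G", "T")
colnames(pwm) <- NULL

# Best and worst possible window scores for this matrix
# (should agree with Biostrings' maxScore(pwm) and minScore(pwm))
max_score <- sum(apply(pwm, 2, max))
min_score <- sum(apply(pwm, 2, min))
max_score  # 3343
min_score  # -17425

# The eight raw scores printed by matchPWM above, rescaled to 0..1
raw <- c(2696, 3061, 3343, 3343, 3343, 3343, 3199, 2871)
rel <- (raw - min_score) / (max_score - min_score)
round(rel, 3)  # 0.969 0.986 1.000 1.000 1.000 1.000 0.993 0.977
```

The maximum works out to 3343, so four of the eight hits are windows that get the best possible score for this matrix. The scores are simply on the matrix's own log-odds scale, not a 0-to-1 scale.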


I found a solution: transform the PWM with the function unitScale before calling matchPWM, so that the scores come out on a 0-to-1 scale:

pm_matrix <- unitScale(pm_matrix)


(According to the Biostrings documentation, unitScale(x) returns the modified numeric matrix given by (x - minScore(x)/ncol(x))/(maxScore(x) - minScore(x)).)
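A base-R sketch of why this works. The helpers unit_scale_sketch() and score_window() are hypothetical, not Biostrings functions: the first re-implements the unitScale formula quoted above, the second scores one window the way a PWM scan does, by adding the matrix entry for each base at each position. Summing over the n columns, the n copies of minScore(x)/n add up to minScore(x), so a window with raw score S gets (S - minScore(x))/(maxScore(x) - minScore(x)): 0 for the worst possible window, 1 for the best. The toy matrix is made up for illustration.

```r
# Hypothetical helper (not Biostrings): the documented unitScale formula
# (x - minScore(x)/ncol(x)) / (maxScore(x) - minScore(x))
unit_scale_sketch <- function(x) {
  max_s <- sum(apply(x, 2, max))  # best base at every position
  min_s <- sum(apply(x, 2, min))  # worst base at every position
  (x - min_s / ncol(x)) / (max_s - min_s)
}

# Hypothetical helper: score one window by summing the entry for each
# base at each motif position (rows of pwm must be named A, C, G, T)
score_window <- function(pwm, window) {
  bases <- strsplit(window, "")[[1]]
  stopifnot(length(bases) == ncol(pwm))
  sum(pwm[cbind(match(bases, rownames(pwm)), seq_along(bases))])
}

# Made-up 4 x 3 log-odds matrix (rows A, C, G, T; one column per position)
toy <- matrix(c( 120, -300,  -50,
                -200,  150, -100,
                 -90,  -80,  180,
                 -10, -400,   60),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))
scaled <- unit_scale_sketch(toy)

windows <- c("ACG", "TAT", "CTC")  # best, intermediate, worst window
raw  <- sapply(windows, function(w) score_window(toy, w))
unit <- sapply(windows, function(w) score_window(scaled, w))
raw   # ACG 450, TAT -250, CTC -700
unit  # ACG 1, TAT ~0.391, CTC 0

# Same values obtained by rescaling the raw scores with the matrix bounds
min_s <- sum(apply(toy, 2, min))  # -700
max_s <- sum(apply(toy, 2, max))  # 450
all.equal(unit, (raw - min_s) / (max_s - min_s))  # TRUE
```

After unit scaling the best possible score is 1 and the worst is 0, so min.score = "80%" in the matchPWM call corresponds to a score of 0.8, and the reported scores fall between 0.8 and 1.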