Am I correctly converting MotifDb PFMs to PWMs?
0
0
Entering edit mode
3.5 years ago

I apologise if some version of this question has already been asked in some way- I could not find any issues that were similar to my own.

I have a choice between taking data from the datasource package 'MotifDb' and 'JASPAR2018':

MotifDb has a much larger wealth of PFMs:

library (MotifDb)
library("BSgenome.Mmusculus.UCSC.mm9")
tf_motif_hoco=subset(MotifDb, organism=='Mmusculus')
length(tf_motif_hoco)
1411


JASPAR database is much smaller:

library(JASPAR2018)
opts = list(species=10090)
jaspar_data = getMatrixSet(JASPAR2018, opts)
length(jaspar_data)
116


Hence in order to maximise the amount of data that I can access, I'd prefer to go with the MotifDb option.

However when I convert the PFM to a PWM the conversion looks slightly different between 2 corresponding searches from the databases.

See below how I take the same PFM from the JASPAR 2018 part of MotifDb and the actual JASPAR2018 database and compare:

library (MotifDb)
library(TFBSTools)
tf_motif_hoco=subset(MotifDb, organism=='Mmusculus' & dataSource=='jaspar2018')
pfm=tf_motif_hoco[[1]] #take first search result- ARNT

pfm

1       2    3    4    5    6
A    0.2    0.95    0    0    0    0
C    0.8    0.00    1    0    0    0
G    0.0    0.05    0    1    0    1
T    0.0    0.00    0    0    1    0

pfm_rounded=round(pfm*1000)
pfm_rounded=apply(pfm_rounded,2,as.integer)
rownames(pfm_rounded)=c('A','C','G','T')
pwm=toPWM(pfm_rounded)


And now the corresponding process in JASPAR:

library(JASPAR2018)
opts = list(species=10090)

pfm <- getMatrixSet(JASPAR2018, opts)


pfm

 [,1] [,2] [,3] [,4] [,5] [,6]

A    4   19    0    0    0    0

C   16    0   20    0    0    0

G    0    1    0   20    0   20

T    0    0    0    0   20    0


You can see that the above printed pfms are the same except for the fact that the values in the one that came from MotifDB are 1/20 that of Jaspar.

I made a mistake when originally writing a script that took all of the PFMs from MotifDB and implementing further analysis- before converting them to PWMs I multiplied the values by 1000 rather than 20... Hence when I compare the PWMs from the different sources are different.... The values are similar but not equal... e.g.

JASPAR2018

     [,1]      [,2]      [,3]      [,4]      [,5]      [,6]

A -0.3081223  1.884523 -4.700440 -4.700440 -4.700440 -4.700440

C  1.6394103 -4.700440  1.957772 -4.700440 -4.700440 -4.700440

G -4.7004397 -2.115477 -4.700440  1.957772 -4.700440  1.957772

T -4.7004397 -4.700440 -4.700440 -4.700440  1.957772 -4.700440


MotifDb

     1          2          3          4          5          6

A  -0.3216398   1.925149 -10.288866 -10.288866 -10.288866 -10.288866

C   1.6772788 -10.288866   1.999135 -10.288866 -10.288866 -10.288866

G -10.2888661  -2.317323 -10.288866   1.999135 -10.288866   1.999135

T -10.2888661 -10.288866 -10.288866 -10.288866   1.999135 -10.288866


The script that I originally used took a really long time to run the first time so I would prefer not to have to run this script again and just use the output that was found the first time...

Also as a wider issue- can I be sure that all PFMs from MotifDb are only need to be multiplied by 20 in order to be convertible to a PWM?

I tried to get hold of a pair of PFMs from HOCOMOCO and from the HOCOMOCO section of MotifDb in order to make the same comparison as above but HOCOMOCO has updated to version 11 while the corresponding PFMs from MotifDb have not and hence they are not comparable...

sequence PWM PFM • 1.7k views
1
Entering edit mode

Commenting on the choice of motifs: I personally do not see the point in specifying an organism when fetching motifs from JASPAR. Motifs are evolutionarily highly conserved sequences, so I personally use the vertebrate core collection for analysis in mammals (in my case human and mouse).

I usually use in R:

require(JASPAR2018)
require(TFBSTools)
jaspar <- function (collection = "CORE", ...)
{
opts <- list()
opts["tax_group"] <- "vertebrates"
opts["collection"] <- collection
opts <- c(opts, list(...))
out <- TFBSTools::getMatrixSet(JASPAR2018::JASPAR2018, opts)
if (!isTRUE(all.equal(TFBSTools::name(out), names(out))))
names(out) <- paste(names(out), TFBSTools::name(out),
sep = "_")
return(out)
}
motifs_JASPAR2018 <- jaspar()

0
Entering edit mode

Hi, If you are interested, you can access the old hocomoco version by the reference http://hocomoco10.autosome.ru