Question: Am I correctly converting MotifDb PFMs to PWMs?
0
gravatar for chrisclarkson100
2.2 years ago by
European Union
chrisclarkson10090 wrote:

I apologise if some version of this question has already been asked in some way- I could not find any issues that were similar to my own.

I have a choice between taking data from the datasource package 'MotifDb' and 'JASPAR2018':

MotifDb has a much larger wealth of PFMs:

library (MotifDb)
library("BSgenome.Mmusculus.UCSC.mm9")
tf_motif_hoco=subset(MotifDb, organism=='Mmusculus')
length(tf_motif_hoco)
1411

JASPAR database is much smaller:

library(JASPAR2018)
opts = list(species=10090)
jaspar_data = getMatrixSet(JASPAR2018, opts)
length(jaspar_data)
116

Hence in order to maximise the amount of data that I can access, I'd prefer to go with the MotifDb option.

However when I convert the PFM to a PWM the conversion looks slightly different between 2 corresponding searches from the databases.

See below how I take the same PFM from the JASPAR 2018 part of MotifDb and the actual JASPAR2018 database and compare:

library (MotifDb)
library(TFBSTools)
tf_motif_hoco=subset(MotifDb, organism=='Mmusculus' & dataSource=='jaspar2018')
pfm=tf_motif_hoco[[1]] #take first search result- ARNT

pfm

       1       2    3    4    5    6
A    0.2    0.95    0    0    0    0
C    0.8    0.00    1    0    0    0
G    0.0    0.05    0    1    0    1
T    0.0    0.00    0    0    1    0

pfm_rounded=round(pfm*1000)
pfm_rounded=apply(pfm_rounded,2,as.integer)
rownames(pfm_rounded)=c('A','C','G','T')
pwm=toPWM(pfm_rounded)

And now the corresponding process in JASPAR:

library(JASPAR2018)
opts = list(species=10090)

pfm <- getMatrixSet(JASPAR2018, opts)

pfm

 [,1] [,2] [,3] [,4] [,5] [,6]

A    4   19    0    0    0    0

C   16    0   20    0    0    0

G    0    1    0   20    0   20

T    0    0    0    0   20    0

You can see that the above printed pfms are the same except for the fact that the values in the one that came from MotifDB are 1/20 that of Jaspar.

I made a mistake when originally writing a script that took all of the PFMs from MotifDB and implementing further analysis- before converting them to PWMs I multiplied the values by 1000 rather than 20... Hence when I compare the PWMs from the different sources are different.... The values are similar but not equal... e.g.

JASPAR2018

     [,1]      [,2]      [,3]      [,4]      [,5]      [,6]


A -0.3081223  1.884523 -4.700440 -4.700440 -4.700440 -4.700440


C  1.6394103 -4.700440  1.957772 -4.700440 -4.700440 -4.700440


G -4.7004397 -2.115477 -4.700440  1.957772 -4.700440  1.957772


T -4.7004397 -4.700440 -4.700440 -4.700440  1.957772 -4.700440

MotifDb

     1          2          3          4          5          6


A  -0.3216398   1.925149 -10.288866 -10.288866 -10.288866 -10.288866


C   1.6772788 -10.288866   1.999135 -10.288866 -10.288866 -10.288866


G -10.2888661  -2.317323 -10.288866   1.999135 -10.288866   1.999135


T -10.2888661 -10.288866 -10.288866 -10.288866   1.999135 -10.288866

The script that I originally used took a really long time to run the first time so I would prefer not to have to run this script again and just use the output that was found the first time...

Also as a wider issue- can I be sure that all PFMs from MotifDb are only need to be multiplied by 20 in order to be convertible to a PWM?

I tried to get hold of a pair of PFMs from HOCOMOCO and from the HOCOMOCO section of MotifDb in order to make the same comparison as above but HOCOMOCO has updated to version 11 while the corresponding PFMs from MotifDb have not and hence they are not comparable...

pfm sequence pwm • 1.0k views
ADD COMMENTlink written 2.2 years ago by chrisclarkson10090
1

Commenting on the choice of motifs: I personally do not see the point in specifying an organism when fetching motifs from JASPAR. Motifs are evolutionarily highly conserved sequences, so I personally use the vertebrate core collection for analysis in mammals (in my case human and mouse).

I usually use in R:

require(JASPAR2018)
require(TFBSTools)
jaspar <- function (collection = "CORE", ...) 
{
  opts <- list()
  opts["tax_group"] <- "vertebrates"
  opts["collection"] <- collection
  opts <- c(opts, list(...))
  out <- TFBSTools::getMatrixSet(JASPAR2018::JASPAR2018, opts)
  if (!isTRUE(all.equal(TFBSTools::name(out), names(out)))) 
    names(out) <- paste(names(out), TFBSTools::name(out), 
                        sep = "_")
  return(out)
}
motifs_JASPAR2018 <- jaspar()
ADD REPLYlink written 2.2 years ago by ATpoint44k

Hi, If you are interested, you can access the old hocomoco version by the reference http://hocomoco10.autosome.ru

ADD REPLYlink written 2.2 years ago by vsevolod.makeev0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2456 users visited in the last hour
_