A mismatch between DiffBind's results and the same results retreieved as DESeq2 (from DiffBind)
1
0
Entering edit mode
8 months ago
Sam ▴ 170

I retrieve DiffBind results as a DESeq2 object, and then look how many differentially bound regions are within it. When doing this, I get a different number of DB regions compared to the original DiffBind object.

library(DESeq2)
library(DiffBind)
data(tamoxifen_counts)
tamoxifen <- dba.contrast(tamoxifen, design="~Tissue")
tamoxifen <- dba.contrast(tamoxifen, group1=tamoxifen$masks$MCF7, group2=tamoxifen$masks$BT474, name1="MCF7", name2="BT474")
tamoxifen <- dba.analyze(tamoxifen)
dds <- dba.analyze(tamoxifen,bRetrieveAnalysis = DBA_DESEQ2)

res <- results(dds, independentFiltering=F, contrast=c("Tissue","MCF7","BT474"))

dba.show(tamoxifen,bContrasts = T,th=0.1)
summary(res)

sessionInfo( )

Output :

   Group Samples Group2 Samples2 DB.DESeq2
1  MCF7       5  BT474        2       954

out of 2845 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 738, 26%
LFC < 0 (down)     : 437, 15%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_IL.UTF-8       LC_NUMERIC=C               LC_TIME=en_IL.UTF-8       
 [4] LC_COLLATE=en_IL.UTF-8     LC_MONETARY=en_IL.UTF-8    LC_MESSAGES=en_IL.UTF-8   
 [7] LC_PAPER=en_IL.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_IL.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils     datasets  methods  
[10] base     

other attached packages:
 [1] DiffBind_3.0.13             rgl_0.105.13                limma_3.46.0               
 [4] DESeq2_1.30.1               SummarizedExperiment_1.20.0 Biobase_2.50.0             
 [7] MatrixGenerics_1.2.1        matrixStats_0.58.0          GenomicRanges_1.42.0       
[10] GenomeInfoDb_1.26.2         IRanges_2.24.1              S4Vectors_0.28.1           
[13] BiocGenerics_0.36.0         RColorBrewer_1.1-2          pheatmap_1.0.12            
[16] ggrepel_0.9.1               ggplot2_3.3.3              

loaded via a namespace (and not attached):
  [1] GOstats_2.56.0           backports_1.2.1          BiocFileCache_1.14.0    
  [4] plyr_1.8.6               GSEABase_1.52.1          splines_4.0.4           
  [7] BiocParallel_1.24.1      crosstalk_1.1.1          amap_0.8-18             
 [10] digest_0.6.27            invgamma_1.1             htmltools_0.5.1.1       
 [13] GO.db_3.12.1             SQUAREM_2021.1           fansi_0.4.2             
 [16] magrittr_2.0.1           checkmate_2.0.0          memoise_2.0.0           
 [19] BSgenome_1.58.0          base64url_1.4            Biostrings_2.58.0       
 [22] annotate_1.68.0          systemPipeR_1.24.3       bdsmatrix_1.3-4         
 [25] askpass_1.1              prettyunits_1.1.1        jpeg_0.1-8.1            
 [28] colorspace_2.0-0         apeglm_1.12.0            blob_1.2.1              
 [31] rappdirs_0.3.3           xfun_0.21                dplyr_1.0.4             
 [34] crayon_1.4.1             RCurl_1.98-1.2           jsonlite_1.7.2          
 [37] graph_1.68.0             genefilter_1.72.1        brew_1.0-6              
 [40] survival_3.2-7           VariantAnnotation_1.36.0 glue_1.4.2              
 [43] gtable_0.3.0             zlibbioc_1.36.0          XVector_0.30.0          
 [46] webshot_0.5.2            DelayedArray_0.16.1      V8_3.4.0                
 [49] Rgraphviz_2.34.0         scales_1.1.1             mvtnorm_1.1-1           
 [52] DBI_1.1.1                edgeR_3.32.1             miniUI_0.1.1.1          
 [55] Rcpp_1.0.6               emdbook_1.3.12           xtable_1.8-4            
 [58] progress_1.2.2           bit_4.0.4                rsvg_2.1                
 [61] truncnorm_1.0-8          AnnotationForge_1.32.0   htmlwidgets_1.5.3       
 [64] httr_1.4.2               gplots_3.1.1             ellipsis_0.3.1          
 [67] pkgconfig_2.0.3          XML_3.99-0.5             sass_0.3.1              
 [70] dbplyr_2.1.0             locfit_1.5-9.4           utf8_1.1.4              
 [73] tidyselect_1.1.0         rlang_0.4.10             manipulateWidget_0.10.1 
 [76] later_1.1.0.1            AnnotationDbi_1.52.0     munsell_0.5.0           
 [79] tools_4.0.4              cachem_1.0.4             cli_2.3.0               
 [82] generics_0.1.0           RSQLite_2.2.3            evaluate_0.14           
 [85] stringr_1.4.0            fastmap_1.1.0            yaml_2.2.1              
 [88] knitr_1.31               bit64_4.0.5              caTools_1.18.1          
 [91] purrr_0.3.4              RBGL_1.66.0              mime_0.10               
 [94] xml2_1.3.2               biomaRt_2.46.3           compiler_4.0.4          
 [97] rstudioapi_0.13          curl_4.3                 png_0.1-7               
[100] tibble_3.0.6             geneplotter_1.68.0       bslib_0.2.4             
[103] stringi_1.5.3            GenomicFeatures_1.42.1   lattice_0.20-41         
[106] Matrix_1.3-2             vctrs_0.3.6              pillar_1.5.0            
[109] lifecycle_1.0.0          jquerylib_0.1.3          irlba_2.3.3             
[112] data.table_1.14.0        bitops_1.0-6             httpuv_1.5.5            
[115] rtracklayer_1.50.0       R6_2.5.0                 latticeExtra_0.6-29     
[118] hwriter_1.3.2            promises_1.2.0.1         ShortRead_1.48.0        
[121] KernSmooth_2.23-18       MASS_7.3-53.1            gtools_3.8.2            
[124] assertthat_0.2.1         openssl_1.4.3            Category_2.56.0         
[127] rjson_0.2.20             withr_2.4.1              GenomicAlignments_1.26.0
[130] batchtools_0.9.15        Rsamtools_2.6.0          GenomeInfoDbData_1.2.4  
[133] hms_1.0.0                coda_0.19-4              DOT_0.1                 
[136] rmarkdown_2.7            GreyListChIP_1.22.0      ashr_2.2-47             
[139] mixsqp_0.3-43            bbmle_1.0.23.1           numDeriv_2016.8-1.1     
[142] shiny_1.6.0

954 != 738+437 ...

Note: crossposted at Bioconductor, but no answer received so far.

DiffBind • 852 views
ADD COMMENT
0
Entering edit mode

Independent filtering?

ADD REPLY
0
Entering edit mode

I fetched the results from the DESeq object with independent filtering set to FALSE.

res <- results(dds, independentFiltering=F, contrast=c("Tissue","MCF7","BT474"))

ADD REPLY
0
Entering edit mode

Yes, this is why I am asking. Does DiffBind do the same?

ADD REPLY
0
Entering edit mode

Thanks. I'm not sure what are the defaults of DiffBind, but DESEQ results with independentFiltering=T does not coincide with DiffBind results. Using the same code as above :

res <- results(dds, independentFiltering = T, contrast=c("Tissue","MCF7","BT474"))
summary(res)
out of 2845 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 738, 26%
LFC < 0 (down)     : 437, 15%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 4)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Even more importantly, the fold changes and (the p-values) between the two runs are different (so its certainly not an issue of independent filtering)

> peak="1"
> dba.report(tamoxifen,th = 1,bNormalized = T,contrast=1,bCounts=T)[peak,]
GRanges object with 1 range and 13 metadata columns:
    seqnames      ranges strand |      Conc
       <Rle>   <IRanges>  <Rle> | <numeric>
  1    chr18 90841-91241      * |         0
    Conc_MCF7 Conc_BT474      Fold   p-value
    <numeric>  <numeric> <numeric> <numeric>
  1         0    0.88128  -0.88128  0.373987
          FDR     MCF71     MCF72     MCF73
    <numeric> <numeric> <numeric> <numeric>
  1         1         0         0         0
       MCF7r1    MCF7r2    BT4741    BT4742
    <numeric> <numeric> <numeric> <numeric>
  1         0         0      2.47      1.21
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> counts(dds,normalized=T)[peak,]
    BT4741     BT4742      MCF71      MCF72 
 2.4692546  1.2147633  0.0000000  0.0000000 
     MCF73      T47D1      T47D2     MCF7r1 
 0.0000000  0.0000000  0.5461769  0.0000000 
    MCF7r2      ZR751      ZR752 
 0.0000000 25.0813852 10.9513965 


> res[peak,]
log2 fold change (MLE): Tissue MCF7 vs BT474 
Wald test p-value: Tissue MCF7 vs BT474 
DataFrame with 1 row and 6 columns
   baseMean log2FoldChange     lfcSE      stat
  <numeric>      <numeric> <numeric> <numeric>
1   3.66027       -2.52726   1.28297  -1.96985
     pvalue      padj
  <numeric> <numeric>
1 0.0488553  0.114492

You can see from the normalized counts that this is the same gene. It has different fold values at the DESeq and at the DiffBind results.

I have opened a new post on that at BioConductor.

ADD REPLY
0
Entering edit mode

Please don't delete posts that have already received comments/answers.

ADD REPLY
2
Entering edit mode
8 months ago
Sam ▴ 170

Answered by DiffBind developers. The issue was a subtle error in setting the contrast for DiffBind.

ADD COMMENT

Login before adding your answer.

Traffic: 1964 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6