Tool: DiffBind 3.0: Extensive updates in Bioconductor 3.12
10 months ago
Rory Stark ★ 1.2k

As part of the latest Bioconductor release 3.12, DiffBind 3.0 includes extensive updates of which users should be aware. The purpose of these updates, reflecting user comments and requests, is to provide more power and control to users, while incorporating up-to-date methods and utilizing knowledge and experience gained in the 10 years since DiffBind was first written.

The main updates are in the areas of modelling, analysis, normalization, and blacklists, as follows:

Modelling: DiffBind now supports modelling using arbitrary design formulas, including multi-factor designs with any combination of metadata factors. Contrasts can be specified in a variety of ways to be evaluated against the design. All sample data in the experiment is incorporated into a single model against which contrasts are evaluated (previously, each contrast was handled in a separate model with only the samples directly involved in the contrast).
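As a minimal sketch of the design-formula interface, the following uses the tamoxifen_counts example dataset that ships with DiffBind (it loads an object named tamoxifen). The formula and contrast chosen here are illustrative only. The blacklist and greylist steps are switched off in dba.analyze() on the assumption that the example data do not include the read files those steps need.

```r
library(DiffBind)

# Example data shipped with DiffBind; creates a DBA object named `tamoxifen`
data(tamoxifen_counts)

# A single model over all samples: Tissue as a blocking factor,
# Condition as the factor of interest (illustrative choice)
tamoxifen <- dba.contrast(tamoxifen,
                          design   = "~Tissue + Condition",
                          contrast = c("Condition", "Responsive", "Resistant"))

# Fit the model and evaluate the contrast against it
tamoxifen <- dba.analyze(tamoxifen, bBlacklist = FALSE, bGreylist = FALSE)

# List the contrasts evaluated against the design
contrasts <- dba.show(tamoxifen, bContrasts = TRUE)
print(contrasts)
```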

Analysis: More standardized usage of the underlying edgeR and DESeq2 packages is implemented. The global objects for these analyses can be easily extracted for fine-grained control over the analysis. A default analysis can be completed from any starting point, even from only a samplesheet: any of the loading, blacklisting/greylisting, consensus, counting, modelling, and analysis steps that have not yet been run are completed automatically.
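One way to extract the underlying analysis object is sketched below, again using the bundled tamoxifen_counts example data. It assumes the bRetrieveAnalysis argument of dba.analyze() behaves as described on its help page, and the design and contrast are illustrative choices.

```r
library(DiffBind)

# Example data shipped with DiffBind; creates a DBA object named `tamoxifen`
data(tamoxifen_counts)

# Set up and run a DESeq2-based analysis (illustrative design and contrast)
tamoxifen <- dba.contrast(tamoxifen,
                          design   = "~Tissue + Condition",
                          contrast = c("Condition", "Responsive", "Resistant"))
tamoxifen <- dba.analyze(tamoxifen, method = DBA_DESEQ2,
                         bBlacklist = FALSE, bGreylist = FALSE)

# Retrieve the DESeq2 object instead of running a new analysis
dds <- dba.analyze(tamoxifen, bRetrieveAnalysis = DBA_DESEQ2)
print(class(dds))
```

The retrieved object can then be passed to DESeq2 functions directly; using DBA_EDGER in place of DBA_DESEQ2 should return the edgeR object if an edgeR analysis has been run.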

Normalization: Having identified normalization as a key step in a successful differential binding analysis, normalization options have been split out into a new interface function, dba.normalize(), to provide fine-grained control. Normalizing against background reads is supported (using functions from the csaw package), as are offsets (e.g. loess fit), exogenous spike-ins, and "parallel factor" normalization. An extensive section examining the impact of the various normalization options has been added to the vignette.
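A minimal sketch of calling dba.normalize() explicitly, using the bundled tamoxifen_counts example data. It spells out the library-size normalization on full library sizes that is listed as the new default. Background normalization is not shown, since it needs access to the read files, which this sketch assumes are not available for the example data.

```r
library(DiffBind)

# Example data shipped with DiffBind; creates a DBA object named `tamoxifen`
data(tamoxifen_counts)

# Library-size normalization based on full library sizes
tamoxifen <- dba.normalize(tamoxifen,
                           method    = DBA_DESEQ2,
                           normalize = DBA_NORM_LIB,
                           library   = DBA_LIBSIZE_FULL)

# Retrieve the normalization settings and factors that were applied
norm <- dba.normalize(tamoxifen, bRetrieve = TRUE)
print(norm$norm.method)
print(norm$norm.factors)
```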

Blacklists and Greylists: New interface function dba.blacklist() applies ENCODE blacklists by default if the (automatically detected) genome is supported. Greylists derived from experiment-specific controls are also supported, with automatic generation of greylists implemented using the GreyListChIP package.
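As a hedged sketch, a blacklist can also be applied explicitly rather than relying on automatic genome detection. The example below uses the bundled tamoxifen_peaks example data, which is assumed here to be hg19-based, and turns greylisting off because no control reads are assumed to be available. For hg38 data, the corresponding constant would be DBA_BLACKLIST_HG38.

```r
library(DiffBind)

# Example peak data shipped with DiffBind; creates a DBA object named `tamoxifen`
data(tamoxifen_peaks)

# Apply the ENCODE hg19 blacklist explicitly; skip greylist generation
tamoxifen <- dba.blacklist(tamoxifen,
                           blacklist = DBA_BLACKLIST_HG19,
                           greylist  = FALSE)

# Summary of the peaksets after blacklisted intervals have been removed
print(tamoxifen)
```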

There are many more changes beyond those listed here. Please see the NEWS file for a more detailed list of changes to functionality and default behavior, as well as the revamped vignette. The key changes to defaults will be added to the ?DiffBind3 help page and are listed at the end of this message.

I will be monitoring the support forum as usual to help any users encountering issues in using the new version.

Regards- Rory

Note on Backward compatibility: While efforts have been made to maintain backward compatibility for existing users' scripts and data objects, certain issues may arise. Existing scripts should still run but will use the updated methods unless dba.contrast() is called with design=FALSE. Data objects stored using dba.save() will automatically be updated and run in backward-compatibility mode. See the help page ?DiffBind3 for more discussion of backward compatibility issues.

Changes in default settings:

1. blacklist is applied by default, if available, using automatic detection of the reference genome.
2. greylists are generated from controls and applied by default.
3. minimum read counts are now 0 instead of being rounded up to 1 (this is now controllable).
4. centering peaks around summits is now done by default, using 401-bp wide peaks (summits=100 is recommended for ATAC-seq).
5. read counting is now performed by summarizeOverlaps() by default, with single-end/paired-end counting automatically detected.
6. filtering is performed by default; consensus peaks where no peak has more than five reads in any sample are filtered.
7. control read subtraction is now turned off by default if a greylist is present.
8. normalization is based on full library sizes by default for both edgeR and DESeq2 analyses.
9. score is set to normalized values by default.

Hi Rory, this is an important update. I do need to ask how we can use DBA_BLACKLIST_HG38. All my data are now aligned to hg38, and this important feature is unusable!


I noticed the omission this morning and checked in a fix earlier today, exporting DBA_BLACKLIST_HG38 as documented on the help page for dba.blacklist(). The fix will appear in the next update as DiffBind_3_0_2 in the next day or so.

In the current version, if you run with the default blacklist=TRUE, the correct reference genome should automatically be detected, or else you can specify blacklist="BSgenome.Hsapiens.UCSC.hg38" (which is the actual value of DBA_BLACKLIST_HG38).


Perfect, many thanks!


Hi Rory, wonderful tool! I'm so happy to see it updated. I do have a question regarding the new setting that bUseSummarizeOverlaps is set to TRUE by default. I notice that in the manual,

Note that if summits is greater than zero, the counting procedure will take twice as long, and bUseSummarizeOverlaps must be FALSE.

Does the statement still hold true in the updated version?

Also, I am analyzing paired-end ATAC-seq data. I guess that with the updated DiffBind I no longer need to set DBA$config$singleEnd = F, and there is no need to set DBA$config$fragmentSize either, right? The counting process will automatically deal with fragments instead of reads, and no read extension is performed. Is that correct?

Thanks again for this wonderful tool! It has really helped me a lot!


You are correct on all fronts.

It is no longer the case that bUseSummarizeOverlaps need be FALSE if summits is not FALSE. I will update the help page for dba.count() to reflect this (thanks for the catch).

dba.count() now automatically detects whether the reads are paired-end or single-end, so you no longer need to set DBA$config$singleEnd. Note that the samples cannot be mixed: the reads in all samples must be either all single-end or all paired-end.

If the data are paired-end, DBA$config$fragmentSize will be ignored.


Awesome! Thank you so much!


Thank you very much for these updates. I have a follow-up question regarding DESeq2/edgeR functionality within DiffBind. Both DESeq2 and edgeR have options for testing multiple levels of a factor (similar to one-way ANOVA), which when used with k-means or hierarchical clustering, is useful for detecting patterns of differential genes/peaks across multiple conditions. DESeq2 provides the 'LRT' test, and edgeR allows users to specify multiple coefficients in the glmQLFTest. Can either of these approaches be implemented in the new version of DiffBind? Thanks!