Multi-factor designs in DiffBind
Entering edit mode
11 weeks ago

Dear all,

I'm trying to analyze some ChIPseq data using DiffBind. Samples has been processed in two different times

SampleID Name Factor   IP Condition Replicates
5 PBS_Pol2   S3 Batch1 Pol2       PBS          1
6 PBS_Pol2   S9 Batch2 Pol2       PBS          2
7 C26_Pol2   S4 Batch1 Pol2       C26          1
8 C26_Pol2  S10 Batch2 Pol2       C26          2

I'm using the following code

dbObjprova <- dba(sampleSheet=samplesdbprova)
tamoxifen <- dba.count(dbObjprova)
tamoxifen <- dba.normalize(tamoxifen)
tamoxifen <- dba.contrast(tamoxifen,design="~Factor + Condition")
tamoxifen <- dba.analyze(tamoxifen)

however dba.analyze give me the following error:

tamoxifen <- dba.analyze(tamoxifen)
Applying Blacklist/Greylists...
Genome detected: Mmusculus.UCSC.mm10
Applying blacklist...
Removed: 142 of 3874 intervals.
Removed 142 (of 3874) consensus peaks.
Normalize DESeq2 with defaults...
Forming default model design and contrast(s)...
Adding contrasts(s)...
Analyze error: Error in pv.DBA(DBA, method, bTagwise = bTagwise, minMembers = 3, bParallel = bParallel): Unable to perform analysis: no contrasts specified.

Unable to complete analysis.
Warning messages:
1: No contrasts added. There must be at least two sample groups with at least three replicates. 
2: No contrasts added. There must be at least two sample groups with at least three replicates.

How I can solve this error? Thanks in advance


Chipseq Multi-factor Diffbind • 629 views
Entering edit mode
11 weeks ago
ATpoint 80k

Read the error: There must be at least two sample groups with at least three replicates. Yet, you only have n=2 per group.

See for example DiffBind error: No contrasts added. There must be at least two sample groups with at least three replicates.

Entering edit mode

Hi ATpoint,

Thanks for your quick answer.

So, If I understand correctly, in this condition and using DiffBind, I can't take into account the batch effect during differential analysis.

I have also try to analyze this data using EdgeR, but now I'm wondering if this is correct (sorry but I can't understand the math behind and for this reason I'm always wondering if I'm doing the right things).

Can I ask your opinion about this second approach?

After retrieve the count in each regions i have analyzed using the following code

x <- read.delim("./Count.txt", stringsAsFactors=FALSE, header = TRUE)
DG <- readDGE(x ,header=FALSE)
matrix= DG$counts
DG <- DGEList(matrix, group= as.character(x$group))
DG$samples$name= x$description
DG$samples$Batch= x$Batch

DG <- calcNormFactors(DG, method = "TMM") #samples are normalized using TMM method
logCPM <- cpm(DG, log=TRUE)

design= model.matrix(~0+ x$Batch + x$group)
y <- estimateDisp(DG, design)
fit <- glmFit(y,design)
results=  glmLRT(fit)
results$FDR= p.adjust(results$PValue, method = "BH")

In any case thanks a lot for your help


Entering edit mode
10 weeks ago
Rory Stark ★ 2.0k

You can perform an analysis with fewer than three replicates per group by specifying the contrasts explicitly in the call to dba.contrast(). You may also need to set minMembers=2.

Entering edit mode

Hi Rory,

Thanks a lot for your comment.

Up to now I have performed the contrast in an explicitly way (as you suggest) using this command

dbObj <- dba.contrast(dbObj, 
                      design=FALSE, contrast=FALSE, 
                      dbObj$masks$C26,  #Treated Group
                      dbObj$masks$PBS,  #Control Group
                                    "C26", "PBS")

giving me ~80 differential enriched site (not bad as results).

However, I know that a batch effect is present among replicates, and I would to take into account it (related info are stored in Factor column),

For that reason I have try to use

dbObj <- dba.contrast(dbObj, design="~Factor + Condition")

So, can I make a differential analysis, taking into account the batch effect and having only two replicates for each conditions?

Thanks a lot in advance

Entering edit mode

Please do not add answers unless you're answering the top level question. Instead, use Add Comment or Add Reply as appropriate. I've moved your post to the right location this time, please be more careful in the future.

Another moderator had to do this the last time, so please pay more attention when you use the site.

EDIT: You did the exact same thing with another answer from Rory 2+ years ago (that I just fixed). Please learn to use the forum better.

Entering edit mode

You should specify the contrast itself:

 tamoxifen <- dba.contrast(tamoxifen,
                           design="~Factor + Condition", 
                           contrast=c("Condition", "PBS", "C26"))

You don't actually need the Factor component as you already have the information in Replicate:

 tamoxifen <- dba.contrast(tamoxifen,
                           design="~Replicate + Condition", 
                           contrast=c("Condition", "PBS", "C26"))
Entering edit mode

Dear Rory, thanks a lot for your help.

It's work, indeed now I'm able to find (as expected) more DB. The only things "strange" is that I have high difference in Deseq2 and EdgeR. I know that the two methods differs in several step and give different results, however I have 87 vs 1098 DB respectively, supposed to be too huge different.

here the code

samplesdbprova <- samplesdbprova[grepl("Pol2", samplesdbprova$IP), ]
dbObjprova <- dba(sampleSheet=samplesdbprova)

dbob.consensus <- dba.peakset(dbObjprova,consensus = DBA_CONDITION, minOverlap=2),dbob.consensus$masks$Consensus)
consensus <- dba.peakset(dbob.consensus, bRetrieve=TRUE,
                         peaks=dbob.consensus$masks$Consensus, minOverlap=1)


dbObjprova_bkg_batch <- dba.normalize(Count)

dbObjprova_bkg_batch <- dba.contrast(dbObjprova_bkg_batch,
                           design="~Replicate + Condition", minMembers =2, 
                           contrast=c("Condition", "PBS", "C26"))

#DB analysis and report
dbObj_region_analysis_batch <- dba.analyze(dbObjprova_bkg_batch, method=DBA_ALL_METHODS), bContrasts=TRUE)

# 4 Samples, 3732 sites in matrix:
#        ID Factor Condition Replicate    Reads FRiP
# 1 PBS_Pol2 Batch1       PBS         1 19313327 0.01
# 2 PBS_Pol2 Batch2       PBS         2  8458898 0.01
# 3 C26_Pol2 Batch1       C26         1 18630636 0.01
# 4 C26_Pol2 Batch2       C26         2  6464943 0.01

# Design: [~Replicate + Condition] | 1 Contrast:
#    Factor Group Samples Group2 Samples2 DB.edgeR DB.DESeq2
# 1 Condition   PBS       2    C26        2     1098        87

dba.plotVenn(dbObj_region_analysis_batch,contrast=1,method=DBA_ALL_METHODS, bDB=TRUE)

If you have other suggestion, please let me know For now, Thanks a lot for your precious help


enter image description here


Login before adding your answer.

Traffic: 2552 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6