Question

Multi-factor designs in DiffBind

0

Entering edit mode

7 months ago

g.persic1991 • 0

Dear all,

I'm trying to analyze some ChIPseq data using DiffBind. Samples has been processed in two different times

SampleID Name Factor   IP Condition Replicates
5 PBS_Pol2   S3 Batch1 Pol2       PBS          1
6 PBS_Pol2   S9 Batch2 Pol2       PBS          2
7 C26_Pol2   S4 Batch1 Pol2       C26          1
8 C26_Pol2  S10 Batch2 Pol2       C26          2

I'm using the following code

dbObjprova <- dba(sampleSheet=samplesdbprova)
tamoxifen <- dba.count(dbObjprova)
tamoxifen <- dba.normalize(tamoxifen)
tamoxifen <- dba.contrast(tamoxifen,design="~Factor + Condition")
tamoxifen <- dba.analyze(tamoxifen)

however dba.analyze give me the following error:

tamoxifen <- dba.analyze(tamoxifen)
Applying Blacklist/Greylists...
Genome detected: Mmusculus.UCSC.mm10
Applying blacklist...
Removed: 142 of 3874 intervals.
Removed 142 (of 3874) consensus peaks.
Normalize DESeq2 with defaults...
Forming default model design and contrast(s)...
Analyzing...
Adding contrasts(s)...
Analyze error: Error in pv.DBA(DBA, method, bTagwise = bTagwise, minMembers = 3, bParallel = bParallel): Unable to perform analysis: no contrasts specified.

Unable to complete analysis.
Warning messages:
1: No contrasts added. There must be at least two sample groups with at least three replicates. 
2: No contrasts added. There must be at least two sample groups with at least three replicates.

How I can solve this error? Thanks in advance

Giuseppe

Chipseq Multi-factor Diffbind • 1.1k views

ADD COMMENT • link 7 months ago by g.persic1991 • 0

Ram · Answer 1 · 2023-12-07

1

Entering edit mode

7 months ago

ATpoint 84k

Read the error: There must be at least two sample groups with at least three replicates. Yet, you only have n=2 per group.

See for example DiffBind error: No contrasts added. There must be at least two sample groups with at least three replicates.

ADD COMMENT • link 7 months ago by ATpoint 84k

0

Entering edit mode

Hi ATpoint,

Thanks for your quick answer.

So, If I understand correctly, in this condition and using DiffBind, I can't take into account the batch effect during differential analysis.

I have also try to analyze this data using EdgeR, but now I'm wondering if this is correct (sorry but I can't understand the math behind and for this reason I'm always wondering if I'm doing the right things).

Can I ask your opinion about this second approach?

After retrieve the count in each regions i have analyzed using the following code

x <- read.delim("./Count.txt", stringsAsFactors=FALSE, header = TRUE)
DG <- readDGE(x ,header=FALSE)
matrix= DG$counts
DG <- DGEList(matrix, group= as.character(x$group))
DG$samples$name= x$description
DG$samples$Batch= x$Batch


DG <- calcNormFactors(DG, method = "TMM") #samples are normalized using TMM method
logCPM <- cpm(DG, log=TRUE)



design= model.matrix(~0+ x$Batch + x$group)
y <- estimateDisp(DG, design)
fit <- glmFit(y,design)
results=  glmLRT(fit)

results=as.data.frame(results)
results$FDR= p.adjust(results$PValue, method = "BH")

In any case thanks a lot for your help

Giuseppe

ADD REPLY • link updated 7 months ago by Ram 44k • written 7 months ago by g.persic1991 • 0

Ram · Answer 2 · 2023-12-07

1

Entering edit mode

7 months ago

Rory Stark ★ 2.0k

You can perform an analysis with fewer than three replicates per group by specifying the contrasts explicitly in the call to dba.contrast(). You may also need to set minMembers=2.

ADD COMMENT • link 7 months ago by Rory Stark ★ 2.0k

0

Entering edit mode

Hi Rory,

Thanks a lot for your comment.

Up to now I have performed the contrast in an explicitly way (as you suggest) using this command

dbObj <- dba.contrast(dbObj, 
                      design=FALSE, contrast=FALSE, 
                      dbObj$masks$C26,  #Treated Group
                      dbObj$masks$PBS,  #Control Group
                                    "C26", "PBS")

giving me ~80 differential enriched site (not bad as results).

However, I know that a batch effect is present among replicates, and I would to take into account it (related info are stored in Factor column),

For that reason I have try to use

dbObj <- dba.contrast(dbObj, design="~Factor + Condition")

So, can I make a differential analysis, taking into account the batch effect and having only two replicates for each conditions?

Thanks a lot in advance

ADD REPLY • link updated 7 months ago by Ram 44k • written 7 months ago by g.persic1991 • 0

0

Entering edit mode

Please do not add answers unless you're answering the top level question. Instead, use Add Comment or Add Reply as appropriate. I've moved your post to the right location this time, please be more careful in the future.

Another moderator had to do this the last time, so please pay more attention when you use the site.

EDIT: You did the exact same thing with another answer from Rory 2+ years ago (that I just fixed). Please learn to use the forum better.

ADD REPLY • link 7 months ago by Ram 44k

0

Entering edit mode

You should specify the contrast itself:

 tamoxifen <- dba.contrast(tamoxifen,
                           design="~Factor + Condition", 
                           contrast=c("Condition", "PBS", "C26"))

You don't actually need the Factor component as you already have the information in Replicate:

 tamoxifen <- dba.contrast(tamoxifen,
                           design="~Replicate + Condition", 
                           contrast=c("Condition", "PBS", "C26"))

ADD REPLY • link 7 months ago by Rory Stark ★ 2.0k

0

Entering edit mode

Dear Rory, thanks a lot for your help.

It's work, indeed now I'm able to find (as expected) more DB. The only things "strange" is that I have high difference in Deseq2 and EdgeR. I know that the two methods differs in several step and give different results, however I have 87 vs 1098 DB respectively, supposed to be too huge different.

here the code

samplesdbprova<-read.csv("DiffBind_template_peaks.csv")
samplesdbprova <- samplesdbprova[grepl("Pol2", samplesdbprova$IP), ]
samplesdbprova
dbObjprova <- dba(sampleSheet=samplesdbprova)


dbob.consensus <- dba.peakset(dbObjprova,consensus = DBA_CONDITION, minOverlap=2)
dba.show(dbob.consensus,dbob.consensus$masks$Consensus)
consensus <- dba.peakset(dbob.consensus, bRetrieve=TRUE,
                         peaks=dbob.consensus$masks$Consensus, minOverlap=1)


#Count
Count<-dba.count(dbObjprova)


#Normalization
dbObjprova_bkg_batch <- dba.normalize(Count)

#Contrst
dbObjprova_bkg_batch <- dba.contrast(dbObjprova_bkg_batch,
                           design="~Replicate + Condition", minMembers =2, 
                           contrast=c("Condition", "PBS", "C26"))

#DB analysis and report
dbObj_region_analysis_batch <- dba.analyze(dbObjprova_bkg_batch, method=DBA_ALL_METHODS)
dba.show(dbObj_region_analysis_batch, bContrasts=TRUE)


# 4 Samples, 3732 sites in matrix:
#        ID Factor Condition Replicate    Reads FRiP
# 1 PBS_Pol2 Batch1       PBS         1 19313327 0.01
# 2 PBS_Pol2 Batch2       PBS         2  8458898 0.01
# 3 C26_Pol2 Batch1       C26         1 18630636 0.01
# 4 C26_Pol2 Batch2       C26         2  6464943 0.01

# Design: [~Replicate + Condition] | 1 Contrast:
#    Factor Group Samples Group2 Samples2 DB.edgeR DB.DESeq2
# 1 Condition   PBS       2    C26        2     1098        87



pdf("./VennPlot.pdf")
dba.plotVenn(dbObj_region_analysis_batch,contrast=1,method=DBA_ALL_METHODS, bDB=TRUE)
dev.off()

If you have other suggestion, please let me know For now, Thanks a lot for your precious help

Giuseppe

enter image description here

ADD REPLY • link 7 months ago by g.persic1991 • 0