Question

DiffBind: edgeR or DESeq2?

4.1 years ago

francesca3 ▴ 140

Hi everyone, I'm using the DiffBind package and I'm noticing a huge difference in the number of differentially bound sites found by edgeR and DESeq2. I'm using the default options in dba.analyze, just passing the object I want to analyze and setting method=DBA_ALL_METHODS.

For example, in one case edgeR finds 10397 sites while DESeq2 finds just 1. It is not always the same: sometimes DESeq2 finds more sites than edgeR.

How can I choose the correct method?

And how can these huge differences be explained?

Some details:

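samplesdbprova <- read.csv("SIRT30WT30.csv")        # sample sheet
dbObjprova <- dba(sampleSheet=samplesdbprova)        # load the peaksets
# count reads in consensus peaks present in at least 2 peaksets
bObjprova <- dba.count(dbObjprova, bUseSummarizeOverlaps=TRUE, minOverlap=2)
# set up the contrast on the counted object
contrastprova <- dba.contrast(bObjprova, bObjprova$masks$SIRT630W, bObjprova$masks$W20, "SIRT630w", "wt30")
# run both edgeR and DESeq2
bObjprova <- dba.analyze(contrastprova, method=DBA_ALL_METHODS)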

Number of replicates: SIRT630 = 6, WT30 = 5.

In one condition (SIRT630) the factor (SIRT6) is overexpressed; in the other (WT30) it is not. The ChIP was performed against SIRT6.


10000 vs. 1 sounds suspicious. Still, without code it is difficult to debug. Can you add the commands you ran, the number of replicates, the design, etc.? Beyond that, see: https://support.bioconductor.org/p/79725/

Edit: I moved the details you provided to the top-level question to keep things organized.

4.1 years ago by ATpoint 82k

score 1 · Answer 1 · 2020-03-17

4.1 years ago

Rory Stark ★ 2.0k

Usually, when there is a big difference between the edgeR and DESeq2 results, it is driven by differences in normalization. This typically happens in an experiment with a wholesale change in binding patterns between the two conditions, generally in one direction. The TMM normalization used in edgeR assumes that there is a core set of sites that are not differentially bound (mirroring its use in RNA-seq, where the expression of a substantial set of genes does not change). When most sites shift in the same direction, that assumption breaks down, and TMM rescales the libraries so that the shifted majority looks unchanged. The Bioconductor support discussion that ATpoint points to is a good reference.
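To make the TMM assumption concrete, here is a minimal base-R sketch of a simplified, unweighted TMM-style scaling factor. It is not edgeR's calcNormFactors, which additionally applies precision weights and chooses a reference sample; only the 30%/5% trims follow edgeR's default logratioTrim and sumTrim. The helper names (tmm_factor, make_obs, norm_lfc) and the simulated counts (sites going up 8-fold in 10% or 70% of sites) are hypothetical. With a minority of changed sites, the trimmed core consists of unchanged sites, and those come out at a log2 fold change of 0. When most sites shift in one direction, the core becomes the shifted sites, so the genuinely unchanged sites are reported as 8-fold down.

```r
# Simplified, unweighted TMM-style scaling factor (sketch only; edgeR's
# calcNormFactors also applies precision weights and picks a reference sample)
tmm_factor <- function(obs, ref, trim_m = 0.3, trim_a = 0.05) {
  keep <- obs > 0 & ref > 0                  # log-ratios need non-zero counts
  p_obs <- obs[keep] / sum(obs)              # per-site proportions of each library
  p_ref <- ref[keep] / sum(ref)
  m <- log2(p_obs / p_ref)                   # M: log2 ratio of proportions
  a <- 0.5 * log2(p_obs * p_ref)             # A: average log2 abundance
  n <- length(m)
  lo_m <- floor(n * trim_m) + 1; hi_m <- n + 1 - lo_m
  lo_a <- floor(n * trim_a) + 1; hi_a <- n + 1 - lo_a
  # the trimmed "core" is assumed to be the non-differential sites
  core <- rank(m) >= lo_m & rank(m) <= hi_m & rank(a) >= lo_a & rank(a) <= hi_a
  2^mean(m[core])                            # scaling factor for 'obs' relative to 'ref'
}

# Hypothetical counts for 1000 sites; 'frac_up' and 'fold' are made-up values
base <- 50 + (0:999) %% 200
make_obs <- function(frac_up, fold = 8) {
  obs <- base
  up <- seq_len(round(length(base) * frac_up))
  obs[up] <- obs[up] * fold
  obs
}

# log2 fold change of each site after scaling by library size times the factor
norm_lfc <- function(obs, ref) {
  f <- tmm_factor(obs, ref)
  log2((obs / (sum(obs) * f)) / (ref / sum(ref)))
}

lfc_few  <- norm_lfc(make_obs(0.1), base)   # 10% of sites up 8-fold
lfc_most <- norm_lfc(make_obs(0.7), base)   # 70% of sites up 8-fold
median(lfc_few[101:1000])    # truly unchanged sites: about 0
median(lfc_most[701:1000])   # truly unchanged sites: about -3, i.e. "8-fold down"
```

The point of the second case is that no trimming rule can recover the right answer when the unchanged sites are the minority; that is when a different normalization (for example, based on library size or background bins) becomes worth considering.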


Adding to this, you could use MA-plots to explore how the normalized counts behave. I am not a DiffBind user, but there is surely a function for this (dba.plotMA). An MA-plot shows the log average count on the x-axis and the log fold change on the y-axis. If the assumptions behind TMM hold, the bulk of the points should be centered around y = 0. If that is not the case, or if you know in advance that this assumption will be violated, you could use a bin-based normalization, e.g. as suggested in the vignette of the csaw package.

A use case could be a highly cell-type-specific histone mark in a very early stem cell vs. a late differentiated cell or, as in your case, strong overexpression of a factor. Is this factor present in the WT cells at all? If not, a massive change in library composition is indeed expected, and that might mess up your normalization.

Can you show some MA-plots? I always produce MA-plots (pre- and post-normalization) and PCAs as part of my routine differential analysis workflow in order to explore results. You should also check whether peaks with high fold changes simply fail to reach significance, or whether you have no notable fold changes at all.
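A plain base-R sketch of the MA values described above, computed from a hypothetical count matrix with 5 "wt" and 6 "oe" samples (chosen only to mirror the 5 vs. 6 replicates in the question). The function name ma_values, the 0.5 pseudo-count, and the simulated data are assumptions, and only library-size scaling is applied, so this corresponds to a "pre-normalization" MA-plot rather than to anything DiffBind computes. When only sequencing depth differs, the cloud sits on y = 0. When 80% of sites go up 4-fold in one group, the bulk of the cloud drifts off zero and the truly unchanged sites form a separate cloud well below it, the kind of pattern that signals a change in library composition.

```r
# MA values for two groups of samples (sketch; not DiffBind's dba.plotMA)
ma_values <- function(counts, group, eff_lib = colSums(counts), prior = 0.5) {
  g <- factor(group)
  stopifnot(nlevels(g) == 2, ncol(counts) == length(g))
  # log2 counts per million; 'prior' (an assumed pseudo-count) avoids log2(0)
  logcpm <- log2(t(t(counts + prior) / eff_lib) * 1e6)
  m1 <- rowMeans(logcpm[, g == levels(g)[1], drop = FALSE])
  m2 <- rowMeans(logcpm[, g == levels(g)[2], drop = FALSE])
  data.frame(A = (m1 + m2) / 2,   # x-axis: average log2 abundance
             M = m2 - m1)         # y-axis: log2 fold change, level 2 vs. level 1
}

# Hypothetical data: 500 sites, 5 "wt" + 6 "oe" samples with different depths
base  <- 20 + (0:499) %% 100
depth <- c(1, 2, 1, 3, 2,  1, 1, 2, 3, 1, 2)
group <- factor(rep(c("wt", "oe"), c(5, 6)), levels = c("wt", "oe"))

balanced <- outer(base, depth)      # same composition, only depth differs
shifted  <- balanced                # 80% of sites 4-fold up in "oe"
shifted[1:400, group == "oe"] <- 4 * shifted[1:400, group == "oe"]

ma_bal   <- ma_values(balanced, group)
ma_shift <- ma_values(shifted, group)
median(ma_bal$M)     # close to 0: the cloud sits on y = 0
median(ma_shift$M)   # roughly log2(4 / 3.4), about 0.23; unchanged sites near -1.77

if (interactive()) {
  plot(ma_shift$A, ma_shift$M, pch = 16, cex = 0.4,
       xlab = "A (mean log2 CPM)", ylab = "M (log2 fold change)")
  abline(h = 0, col = "red")
}
```

Re-running ma_values with eff_lib set to library size times a normalization factor gives the corresponding "post-normalization" view.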

4.1 years ago by ATpoint 82k