DiffBind: edgeR or DESeq2?
4.1 years ago
francesca3

Hi everyone, I'm using the DiffBind package and I'm noticing a huge difference in the number of differentially bound sites found by edgeR and DESeq2. I'm using the default options in dba.analyze, just passing the object I want to analyze and setting method=DBA_ALL_METHODS.

For example, in one case edgeR finds 10397 differentially bound peaks while DESeq2 finds just 1. It is not always the same way around: sometimes DESeq2 finds more peaks than edgeR.

How can I choose the correct method?

How can these huge differences be explained?

Some details:

samplesdbprova <- read.csv("SIRT30WT30.csv")

dbObjprova <- dba(sampleSheet=samplesdbprova)

bObjprova <- dba.count(dbObjprova, bUseSummarizeOverlaps=TRUE, minOverlap=2)

# build the contrast on the counted object (bObjprova); named arguments keep group1/group2/name1/name2 unambiguous
contrastprova <- dba.contrast(bObjprova, group1=bObjprova$masks$SIRT630W, group2=bObjprova$masks$W20, name1="SIRT630w", name2="wt30")

bObjprova <- dba.analyze(contrastprova, method=DBA_ALL_METHODS)

Number of replicates: SIRT630 = 6, WT30 = 5

In one condition (SIRT630) the factor SIRT6 is overexpressed; in the other (WT30) it is not. The ChIP was performed against SIRT6.

ChIP-Seq

10000 vs. 1 sounds suspicious. Still, without code it is difficult to debug. Can you add the commands you ran, the number of replicates, the design, etc.? Beyond that, see: https://support.bioconductor.org/p/79725/

Edit: I moved the details you provided to the toplevel question to keep things organized.

4.1 years ago
Rory Stark

Usually when there is a big difference between edgeR and DESeq2, it is driven by differences in normalization. This generally happens in experiments where there is a wholesale change in binding patterns between the two conditions, usually in one direction. The TMM normalization used in edgeR assumes that there is a core set of sites that are not differentially bound (reflecting its use in RNA-seq, where the expression of a substantial set of genes does not change). When most sites gain (or lose) binding together, TMM treats that global shift as a composition bias and largely normalizes it away, so the two methods can report very different numbers of differentially bound sites. The Bioconductor support discussion that ATpoint points to is a good reference.
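
To make the TMM assumption concrete, here is a minimal base-R sketch on simulated counts (not the poster's data). `tmm_factor()` is a hypothetical, simplified helper: it trims 30% of the log-ratios (M) and 5% of the average abundances (A) from each tail, mirroring the defaults `logratioTrim = 0.3` and `sumTrim = 0.05` of edgeR's `calcNormFactors()`, but it takes an unweighted mean where edgeR uses precision weights. It compares a case where 10% of sites gain binding with a case where every site gains binding 4-fold.

```r
# Simplified, unweighted TMM-style scaling factor (hypothetical helper, sketch only;
# edgeR's calcNormFactors() additionally applies precision weights).
tmm_factor <- function(y1, y2, N1, N2, logratio_trim = 0.3, sum_trim = 0.05) {
  ok <- y1 > 0 & y2 > 0                   # log-ratios need non-zero counts
  y1 <- y1[ok]; y2 <- y2[ok]
  M <- log2((y1 / N1) / (y2 / N2))        # per-site log2 fold change
  A <- 0.5 * log2((y1 / N1) * (y2 / N2))  # per-site average log2 abundance
  n <- length(M)
  loM <- floor(n * logratio_trim) + 1; hiM <- n + 1 - loM
  loA <- floor(n * sum_trim) + 1;      hiA <- n + 1 - loA
  keep <- rank(M) >= loM & rank(M) <= hiM & rank(A) >= loA & rank(A) <= hiA
  2^mean(M[keep])                         # factor for sample 1 relative to sample 2
}

set.seed(1)
n_sites <- 2000
N <- 1e7                                  # assumed: equal full library sizes
base <- rep(200, n_sites)                 # assumed: 200 reads per site at baseline

# Scenario 1: the first 200 sites (10%) gain 4x binding, the rest are unchanged
fc_mixed <- ifelse(seq_len(n_sites) <= 200, 4, 1)
y1_mixed <- rpois(n_sites, base * fc_mixed)
y2_mixed <- rpois(n_sites, base)

# Scenario 2: every site gains 4x binding (global, one-directional change)
y1_global <- rpois(n_sites, base * 4)
y2_global <- rpois(n_sites, base)

f_mixed  <- tmm_factor(y1_mixed,  y2_mixed,  N, N)
f_global <- tmm_factor(y1_global, y2_global, N, N)

# log2 fold changes after dividing out the TMM-style factor
lfc_mixed_db <- log2(y1_mixed[1:200] / y2_mixed[1:200]) - log2(f_mixed)
lfc_global   <- log2(y1_global / y2_global) - log2(f_global)
```

In the mixed case the factor stays close to 1, so the 200 gained sites keep a log2 fold change near 2. In the global case the factor comes out near 4, so the genuine gain is divided out and the normalized fold changes collapse toward 0. This is one way the normalization step alone can change how many sites are called.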


Adding to this, you could use MA-plots to explore how the normalized counts behave. I am not a DiffBind user, but there is surely a function for this (dba.plotMA). An MA-plot shows the log average count on the x-axis and the log fold change on the y-axis. If the assumption behind TMM holds, the bulk of the points should be centered around y = 0. If it does not, or if you know in advance that it will be violated, you could use a bin-based normalization as suggested, e.g., in the vignette of the csaw package. Such a case could be a highly cell-type-specific histone mark compared between a very early stem cell and a late differentiated cell or, as in your case, strong overexpression of a factor.

Is this factor present in the WT cells at all? If not, a massive change in library composition is indeed expected, and that can mess up your normalization. Can you show some MA-plots? I always produce MA-plots (pre- and post-normalization) and PCAs as part of my routine differential analysis workflow to explore the results. You should also check whether peaks with high fold changes simply fail to reach significance, or whether there are no notable fold changes at all.
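
DiffBind's own MA-plot function is `dba.plotMA()`; the sketch below instead uses base R on a simulated count matrix so that it runs without DiffBind. `ma_values()` is a hypothetical helper. The replicate numbers (6 SIRT630, 5 WT30) come from the question, but the counts are invented: every site is assumed to gain 4-fold binding under overexpression, and every library is assumed to hold 10 million reads. M and A are computed twice, once scaled by full library size and once by reads in peaks, to show how the choice of scaling decides whether a global shift is visible.

```r
# Hypothetical helper: per-site A (mean log2 CPM) and M (log2 fold change, group 1 / group 2)
ma_values <- function(counts, group, lib_size, prior = 0.5) {
  stopifnot(ncol(counts) == length(group), length(unique(group)) == 2)
  logcpm <- log2(sweep(counts + prior, 2, lib_size, "/") * 1e6)
  lv <- unique(group)
  m1 <- rowMeans(logcpm[, group == lv[1], drop = FALSE])
  m2 <- rowMeans(logcpm[, group == lv[2], drop = FALSE])
  data.frame(A = (m1 + m2) / 2, M = m1 - m2)
}

set.seed(42)
n_sites <- 2000
group <- c(rep("SIRT630", 6), rep("WT30", 5))   # replicate numbers from the question
lib <- rep(1e7, length(group))                   # assumed: equal full library sizes

# Simulated data: a 4x gain at every site in the overexpression condition
rate <- outer(rep(200, n_sites), ifelse(group == "SIRT630", 4, 1))
counts <- matrix(rpois(length(rate), rate), nrow = n_sites)

ma_raw <- ma_values(counts, group, lib_size = lib)             # scaled by full library size
ma_rip <- ma_values(counts, group, lib_size = colSums(counts))  # scaled by reads in peaks

median(ma_raw$M)  # approximately 2: the global gain is visible
median(ma_rip$M)  # approximately 0: rescaling by reads in peaks hides it

if (interactive()) {
  plot(ma_raw$A, ma_raw$M, pch = 20, cex = 0.4,
       xlab = "A: mean log2 CPM", ylab = "M: log2 fold change (SIRT630 / WT30)")
  abline(h = 0, col = "red")
}
```

With full library sizes the cloud sits around M = 2; rescaling by reads in peaks pulls it back to M = 0 even though every simulated site really changed. Plotting both versions side by side is a quick way to see whether a normalization is absorbing a real, one-directional shift.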
