Hi, I have some doubts. I ran an analysis more times. The first time, I designed the matrix with 14 samples divided in 4 conditions (3 samples of condition A-2 samples of B-5 samples of C-4 samples of D). The second time, I put in the matrix only 5 of the 14 samples (3 of condition A and 2 of B). The third time, my matrix was composed only of four samples: the same samples of the second time, except for 1 replicate. (so it was 2 of A vs 2 of B).
I performed the differential analysis for the groups A vs B for all three matrices.
In the first case,I obtained 113 db, in the second 116 and in the third 179. I thought that almost all 113 db should be contained in the 116, since the only thing I changed was the starting matrix composition, but it wasn't the case. In particular :
- peaks common to all 3 analyses:56
- peaks common to first and second analysis: 56
- peaks common to first and third analysis:72
- peaks common to second and third analysis:87
So why I see so different results? If I have more conditions to analyse,is it better to build different matrices for each comparison or only one?
I ran with this code, first creating a consensus dataset for each condition.
library(DiffBind)
samplesdbprova<-read.csv("sheet.csv")
dbObjprova <- dba(sampleSheet=sheet)
dbob.consensus <- dba.peakset(dbObjprova,consensus = DBA_CONDITION, minOverlap=0.66)
consensus <- dba.peakset(dbob.consensus, bRetrieve=TRUE,
peaks=dbob.consensus$masks$Consensus,
minOverlap=1)
dbObjprova <- dba.count(dbObjprova, peaks=consensus, bUseSummarizeOverlaps=TRUE, minOverlap=2)
contrastprova <- dba.contrast(dbObjprova, dbObjprova$masks$A, dbObjprova$masks$B,"A", "B")
bObjprova <- dba.analyze(contrastprova, method=DBA_EDGER)
Thanks Francesca
thanks, I tried to annotate and in the end the majority of the genes are the same. it is as you say.. It's a problem of coordinates. I will try to re-center as you suggest.
Hi Rory, I tried to set
bUseSummarizeOverlaps=FALSE
andsummits=300
, but it gives me ERROR. I noticed that It gives me error just if I setpeaks=consensus
.If I set just
dbObjprova <- dba.count(dbObjprova, minOverlap=2, summits=300)
, it works. I noticed that also someone else had this problem https://support.bioconductor.org/p/121491/ but I could't find the solution. They are single end dataI'll have a look at this later this afternoon to see if I can reproduce it.
In the meantime if you are working on this now perhaps you could try the suggestion I made on the other thread? Try breaking the counting into two steps:
and let me know if that works.
Francesca, thanks to you pointing out that it failed when you supplied the consensus peakset, but it worked when you let
DiffBind
compute the consensus, I was able to reproduce this and find the bug. I am working on a fix now and will post here when it is available. The workaround is to do it the way you were able to make it work (not settingpeaks
when callingdba.count()
).Also, two comments on your original code. In the second call to
dba.peakset()
, theminOverlap
parameter is ignored (it is only used when theconsensus
parameter is set, as in your first call). Likewise, in your call todba.count()
, when you specify a consensus peakset withpeaks= consensus
, theminOverlap
parameter is also ignored and the consensus peakset will be used as-is. When the user supplies a consensus peakset,DiffBind
has no way of knowing where the peaks came from, so it can not calculate overlaps.Thank you for your reply and your explanations. I will wait for you to fix the bug. Francesca
I have checked in the fix. It will appear in DiffBind 2.16.1 once it gets through the Bioconductor build process (usually a day or two unless there is a problem).
If you are able to install from source (ie on linux), I can send you the .tar.gz (2Mb).
Cheers- Rory
Thank you. I will wait for the Bioconductor version, since I'm been working on Windows Rstudio version. No problem. Thank you again.
The update is now available in Bioconductor. Let me know if you have any more problems.
FYI, next month a major update of DiffBind will be released, with much greater control over model designs, contrasts, normalization, and blacklists.