how matrix composition influences the final results in diff bind
1
0
Entering edit mode
3.6 years ago
francesca3 ▴ 140

Hi, I have some doubts. I ran an analysis more times. The first time, I designed the matrix with 14 samples divided in 4 conditions (3 samples of condition A-2 samples of B-5 samples of C-4 samples of D). The second time, I put in the matrix only 5 of the 14 samples (3 of condition A and 2 of B). The third time, my matrix was composed only of four samples: the same samples of the second time, except for 1 replicate. (so it was 2 of A vs 2 of B).

I performed the differential analysis for the groups A vs B for all three matrices.

In the first case,I obtained 113 db, in the second 116 and in the third 179. I thought that almost all 113 db should be contained in the 116, since the only thing I changed was the starting matrix composition, but it wasn't the case. In particular :

  1. peaks common to all 3 analyses:56
  2. peaks common to first and second analysis: 56
  3. peaks common to first and third analysis:72
  4. peaks common to second and third analysis:87

So why I see so different results? If I have more conditions to analyse,is it better to build different matrices for each comparison or only one?

I ran with this code, first creating a consensus dataset for each condition.

library(DiffBind)
samplesdbprova<-read.csv("sheet.csv")
dbObjprova <- dba(sampleSheet=sheet)
dbob.consensus <- dba.peakset(dbObjprova,consensus = DBA_CONDITION, minOverlap=0.66)
 consensus <- dba.peakset(dbob.consensus, bRetrieve=TRUE,
                         peaks=dbob.consensus$masks$Consensus,
                         minOverlap=1)
 dbObjprova <- dba.count(dbObjprova, peaks=consensus, bUseSummarizeOverlaps=TRUE, minOverlap=2)
 contrastprova <- dba.contrast(dbObjprova, dbObjprova$masks$A, dbObjprova$masks$B,"A", "B")
 bObjprova <- dba.analyze(contrastprova, method=DBA_EDGER)

Thanks Francesca

ChIP-Seq diffbind • 1.4k views
ADD COMMENT
2
Entering edit mode
3.6 years ago
Rory Stark ★ 2.0k

Hello Francesca-

There are a number of things that could account for this. Among other things, the consensus peakset will be different each time, which will change things a fair amount. Additionally, as you are not re-centering around summits, even if there are similar peaks in each version, they will differ in their coordinates (with the version involving more samples generally having wider consensus peak intervals).

If you could send me some of your data, I can have a look and give you a more precise answer. I'd need access to the dbObjprova object prior to counting (with peaks from all 14 samples), as well as the three versions obtained after counting.

Also, can you tell me if your sequencing data is paried-end or not?

ADD COMMENT
0
Entering edit mode

thanks, I tried to annotate and in the end the majority of the genes are the same. it is as you say.. It's a problem of coordinates. I will try to re-center as you suggest.

ADD REPLY
0
Entering edit mode

Hi Rory, I tried to set bUseSummarizeOverlaps=FALSE and summits=300, but it gives me ERROR. I noticed that It gives me error just if I set peaks=consensus.
If I set just dbObjprova <- dba.count(dbObjprova, minOverlap=2, summits=300), it works. I noticed that also someone else had this problem https://support.bioconductor.org/p/121491/ but I could't find the solution. They are single end data

ADD REPLY
0
Entering edit mode

I'll have a look at this later this afternoon to see if I can reproduce it.

In the meantime if you are working on this now perhaps you could try the suggestion I made on the other thread? Try breaking the counting into two steps:

 dbObjprova <- dba.count(dbObjprova, peaks=consensus, summits=TRUE)
 dbObjprova <- dba.count(dbObjprova, peaks=NULL, summits=300)

and let me know if that works.

ADD REPLY
1
Entering edit mode

Francesca, thanks to you pointing out that it failed when you supplied the consensus peakset, but it worked when you let DiffBind compute the consensus, I was able to reproduce this and find the bug. I am working on a fix now and will post here when it is available. The workaround is to do it the way you were able to make it work (not setting peaks when calling dba.count()).

Also, two comments on your original code. In the second call to dba.peakset(), the minOverlap parameter is ignored (it is only used when the consensus parameter is set, as in your first call). Likewise, in your call to dba.count(), when you specify a consensus peakset with peaks= consensus, the minOverlap parameter is also ignored and the consensus peakset will be used as-is. When the user supplies a consensus peakset, DiffBind has no way of knowing where the peaks came from, so it can not calculate overlaps.

ADD REPLY
0
Entering edit mode

Thank you for your reply and your explanations. I will wait for you to fix the bug. Francesca

ADD REPLY
1
Entering edit mode

I have checked in the fix. It will appear in DiffBind 2.16.1 once it gets through the Bioconductor build process (usually a day or two unless there is a problem).

If you are able to install from source (ie on linux), I can send you the .tar.gz (2Mb).

Cheers- Rory

ADD REPLY
0
Entering edit mode

Thank you. I will wait for the Bioconductor version, since I'm been working on Windows Rstudio version. No problem. Thank you again.

ADD REPLY
1
Entering edit mode

The update is now available in Bioconductor. Let me know if you have any more problems.

FYI, next month a major update of DiffBind will be released, with much greater control over model designs, contrasts, normalization, and blacklists.

ADD REPLY

Login before adding your answer.

Traffic: 2645 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6