DiffBind analysis - minCount parameter in dba.count()
1
0
Entering edit mode
7 months ago
vartika • 0

Why does changing minCount from 0 to 1 in dba.count() (with summits set to false and control reads supplied) change the called status of a peak? With minCount set to 1, a peak is being called in all samples, but with minCount set to 0, the same peak is being called in a few of them?

So, if minCount is set to 1, it will show the peak is called anyway? But that is not the case, cause there are peaks that are not called in half of the samples but called in the other half with minCount set to 1?

Am I missing something basic? What might be happening in my case?

ChIP-Seq DiffBind • 491 views
ADD COMMENT
1
Entering edit mode

If you can supply a bit more info, I can see if I can reproduce this.

How are you determining how many samples a peak is called in? Can you show a bit of code demonstrating:

  • your first call to dba.count()
  • show a peak and which samples it is called in
  • your second call to dba.count() changing minCount
  • show how the peak is appearing to have been called in more/fewer samples

    Also, can you confirm which version of DiffBind you are using?

ADD REPLY
0
Entering edit mode

Hi Rory! Thank you for your response. I was referring to the columns "Called1" and "Called2" in the report, they change when I change the minCount.

Diffbind version 3.0.13

Line from the Code:

minCount = 1

exp <- dba.count(exp, bSubControl = TRUE, minCount = 1, summits = FALSE)

minCount = 0

exp <- dba.count(exp, bSubControl = TRUE, minCount = 0, summits = FALSE)

I made no other changes for the two, the following were set:

exp$config$design <- FALSE

exp$config$doBlacklist <- FALSE

exp$config$doGreylist <- FALSE

After count:

exp <- dba.normalize(exp,normalize = DBA_NORM_LIB, library =DBA_LIBSIZE_FULL )

exp <- dba.contrast(exp,categories = DBA_CONDITION)

exp <-dba.analyze(exp)

exp.DB <- dba.report(exp, contrast =1, bCalled = TRUE, bCalledDetail = TRUE, bCounts = TRUE, bNormalized = TRUE)

Please find the link for a google sheet below which has three examples from the report.

Three examples from the report generated

ADD REPLY
0
Entering edit mode

I've tried this on a couple of my own data sets but can not reproduce it (the Called values are always the same when I compare sum(rep0$Called1 != rep1$Called1) and sum(rep0$Called2 != rep1$Called2)).

I'd like to get to the bottom of this. If you could send me a link to your two exp objects immediately after calling dba.count() I can try to see what is going on.

ADD REPLY
0
Entering edit mode

Okay, I'm not sure where I have gone wrong. Here you go, exp objects after dba.count

ADD REPLY
2
Entering edit mode
7 months ago
Rory Stark ★ 1.2k

I am able to reproduce this and have identified, and fixed, a bug in how the Called status is maintained after filtering (when minCounts is lowered to 0, some counts are lessened, and additional sites are filtered based on the requirement that at least one sample have an RPKM value of 1 or more).

I have checked in a fix that will appear as DiffBind_3.0.14, available via BiocManager::install() in the next couple of days.

ADD COMMENT
0
Entering edit mode

oh! Thank you for answering to my question and fixing the bug so quickly! Also, being a newbie to bioinformatics, it is very inspiring to see the time and efforts put into developing such packages for all of us to use. Great work! Thanks, again.

ADD REPLY
0
Entering edit mode

This was actually a non-trivial bug, so I appreciate you giving me the information I needed to reproduce and resolve it!

ADD REPLY
0
Entering edit mode

Hi Rory, I analysed new datasets with DiffBind_3.0.15, I am still confused with the results when minCount is set to 0 and 1.

When minCount is set to 0, the results show only those peaks that are called in all the samples (3 WT and 3 KO) without any change to the filter.

In the results obtained when minCount was set to 1, a sample that had almost the same number of reads (shown below) as the other WT's, was not called. Wanted to clarify if the called status is solely based on if the peakcaller detected it as a peak in that sample?

WT_1 WT_2 WT_3 KO_1 KO_2 KO_3 Called1 Called2 WT_1.1 WT_2.1 WT_3.1 KO_1.1 KO_2.1 KO_3.1


792.5 377.89 377.26 177.93 40.82 69.38 2 0 + + - - - -

Thank you for your time.

ADD REPLY

Login before adding your answer.

Traffic: 1354 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6