Question

What Is The Meaning Of The Score In Diffbind'S Occupancy/Overlap Analysis?

1

Entering edit mode

10.3 years ago

Nick ▴ 290

I have 3 chip-seq biological replicates, each with its own input control. I was interested in using diffBind to performs some IDR-style analysis - e.g. take only the peaks that come up in more than one sample.

experiment <- dba(sampleSheet="exp_samples.csv")
pdf('overlap_venn.pdf')
dba.plotVenn(experiment, experiment$masks$macs)
dev.off()

I got a nice Venn diagram:

enter image description here

I wanted to perform a sanity check and had a look at the peaks occurring only in A (sample1):

experiment.OL = dba.overlap(experiment, experiment$masks$macs)
a_peaks <- as.data.frame(experiment.OL$onlyA, row.names = NULL, optional = FALSE)
> dim(a_peaks)
[1] 500   6
> head(a_peaks)
seqnames   start     end width strand      score
1 GL172637.1  196748  197840  1093      * 0.09639324
..

I decided to check the first peak and, sure enough I found it in the xls file produced for this sample by macs14:

chr    start    end    length    summit    tags    =-10*LOG10(pvalue)    fold_enrichment    FDR(%)
GL172637.1    196748    197840    1093    299    38    62.55    48.57    100

However, what confuses me is the score computed by diffBind (0.09639324). In the diffBind's documentation it says "The scores associated with each site are derived from the peak caller condence score, and are a measure of condence in the peak call (occupancy), not a measure of how strong or distinct the peak is." I don't know how to interpret this. Also, it is not clear to me how is this score related to its FDR (100%). Can anyone help?

chipseq macs • 6.2k views

ADD COMMENT • link updated 3 months ago by kalaibarathi95 • 0 • written 10.3 years ago by Nick ▴ 290

0

Entering edit mode

I have 4 samples each with individual scores in the range of 100s and also a consenus peakset of all 4 samples where i tried to to convert the scores as per the above into decimal range, but it still does not fit. Am i missing something?

ADD REPLY • link 3 months ago by kalaibarathi95 • 0

score 2 · Answer 1 · 2014-01-17

2

Entering edit mode

10.3 years ago

Nick ▴ 290

Rory (the author of diffbind) kindly replied to a personal message I sent. I'm pasting his reply here:

The default peak score for macs peaks is the "=-10*LOG10(pvalue)" value. It is normalised to a 0..1 scale by dividing the scores by the maximum score (so the max score gets a value of 1).

ADD COMMENT • link 10.3 years ago by Nick ▴ 290