Does a higher IDR (irreproducible discovery rate) mean less 'reproducible', or vice versa?
4.3 years ago
sckinta

Applying the idr package to ChIP-seq peaks lets researchers compare multiple replicates and evaluate the reproducibility of each peak. It produces an output file that mimics the input narrowPeak file, with two additional columns, localIDR and globalIDR. Both localIDR and globalIDR represent -log10(IDR) values. The global IDR is more informative, since it behaves more like a multiple-testing-corrected version of the local IDR, according to this post (https://groups.google.com/forum/#!topic/idr-discuss/FY2K5VKx8AQ)
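As a minimal sketch of what "-log10 IDR" means for these two columns, the helper below reads one tab-separated output line and converts localIDR and globalIDR back to the raw 0..1 IDR scale. It assumes, as described above, that they are 1-based columns 11 and 12; the function name `raw_idr_values` is hypothetical, not part of the idr package.

```python
from typing import Tuple

# Assumption: 1-based columns 11 and 12 of the idr output hold
# localIDR and globalIDR as -log10(IDR).
LOCAL_IDR_COL = 10   # 0-based index of column 11
GLOBAL_IDR_COL = 11  # 0-based index of column 12


def raw_idr_values(line: str) -> Tuple[float, float]:
    """Return (local IDR, global IDR) on the raw 0..1 scale for one output line."""
    fields = line.rstrip("\n").split("\t")
    local_neglog10 = float(fields[LOCAL_IDR_COL])
    global_neglog10 = float(fields[GLOBAL_IDR_COL])
    # Undo the -log10 transform: IDR = 10 ** (-value)
    return 10 ** -local_neglog10, 10 ** -global_neglog10
```

A column value of 2.0 therefore corresponds to a raw IDR of 0.01, and a value of 1.0 to a raw IDR of 0.1.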

So if I want to find the most reproducible peaks, should I select the peaks with a higher or a lower globalIDR? In other words, does a higher IDR (irreproducible discovery rate) mean less 'reproducible', or vice versa?

Another question I have is about the p-value/q-value columns in the 12-column report. Are the p-value/q-value for the peak call or for the IDR test? In each original input narrowPeak file, the p-value represents the peak-calling confidence for each peak in that file. When the files are combined by idr, how is the p-value/q-value obtained? Is it the average of the original p-values from all inputs?

4.0 years ago
goodez

This is from the IDR docs on GitHub. For column 5 of the output:

5. score int
Contains the scaled IDR value, min(int(-125 * log2(IDR)), 1000). E.g. peaks with an IDR of 0 have a score of 1000,
peaks with an IDR of 0.05 have a score of int(-125 * log2(0.05)) = 540, and an IDR of 1.0 gives a score of 0.
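The quoted formula can be checked with a short sketch. `idr_score` is a hypothetical name used only for illustration; the special case for IDR 0 is an assumption needed because log2(0) is undefined, and it returns the cap of 1000 that the docs give for that case.

```python
import math


def idr_score(idr: float) -> int:
    """Scaled IDR score per the quoted docs: min(int(-125 * log2(IDR)), 1000)."""
    if idr <= 0:
        # log2(0) is undefined; the docs state that IDR 0 maps to the cap of 1000.
        return 1000
    return min(int(-125 * math.log2(idr)), 1000)


for idr in (0.0, 0.05, 1.0):
    print(idr, idr_score(idr))
# 0.0 1000
# 0.05 540
# 1.0 0
```

Because the score is -125 times a log, it rises as the IDR falls, which is why a higher score in column 5 marks a more reproducible peak.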


So if you are looking at column 5, a higher score means a lower IDR, i.e. the peak is more reproducible. Much like with p-values or q-values, you have to pay attention to whether you are looking at a raw value or a score transformed with -log(). For raw IDR values, lower is better (just like for p-values); however, the output reports transformed values (the scaled score in column 5, and -log10(IDR) in the localIDR/globalIDR columns), so for those, higher is better.
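To make the direction of the filter concrete, here is a minimal sketch that applies the same cutoff on all three scales. The 0.05 cutoff and the function names are hypothetical choices for illustration; the point is that the inequality flips from `<=` on raw IDR to `>=` on the transformed values.

```python
import math

IDR_THRESHOLD = 0.05  # hypothetical cutoff chosen for illustration


def is_reproducible_raw(idr: float, threshold: float = IDR_THRESHOLD) -> bool:
    # Raw IDR: lower is better, so keep peaks at or below the cutoff.
    return idr <= threshold


def is_reproducible_neglog10(neglog10_idr: float, threshold: float = IDR_THRESHOLD) -> bool:
    # -log10(IDR), e.g. the globalIDR column: higher is better,
    # so the inequality flips and the cutoff is transformed the same way.
    return neglog10_idr >= -math.log10(threshold)


def is_reproducible_score(score: int, threshold: float = IDR_THRESHOLD) -> bool:
    # Column 5 scaled score: higher is better; an IDR of 0.05 maps to 540.
    return score >= min(int(-125 * math.log2(threshold)), 1000)
```

For peaks well away from the cutoff the three checks agree. Right at the boundary the score can differ slightly, because `int()` truncates the scaled value.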

For your other question about which p-value/q-value to use after using IDR:

What I do is use IDR to filter out the peaks from the original narrowPeak files that are not very reproducible, and then I just arbitrarily take the p-values/q-values from the first replicate. As you said, you could also take the mean p-value/q-value across all replicates for the peaks that have a good IDR.
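The workflow described above (filter by IDR, then take replicate 1 or the replicate mean) can be sketched as follows. The in-memory layout (peak ID mapped to a raw IDR plus one narrowPeak pValue per replicate) and the function name are hypothetical; the sketch assumes the per-replicate values are already matched to the same peak and, like the narrowPeak pValue column, stored as -log10(p).

```python
from statistics import mean
from typing import Dict, List, Tuple


def reproducible_pvalues(
    peaks: Dict[str, Tuple[float, List[float]]],
    idr_cutoff: float = 0.05,
    how: str = "first",
) -> Dict[str, float]:
    """Keep peaks with raw IDR <= cutoff and pick one -log10(p) per peak."""
    selected: Dict[str, float] = {}
    for peak_id, (idr, rep_neglog10_p) in peaks.items():
        if idr > idr_cutoff:
            continue  # drop peaks that are not reproducible enough
        if how == "first":
            selected[peak_id] = rep_neglog10_p[0]  # arbitrary choice: replicate 1
        elif how == "mean":
            selected[peak_id] = mean(rep_neglog10_p)  # average across replicates
        else:
            raise ValueError(f"unknown method: {how!r}")
    return selected


peaks = {
    "peak1": (0.01, [12.0, 8.0]),
    "peak2": (0.30, [5.0, 6.0]),
    "peak3": (0.04, [3.0, 5.0]),
}
print(reproducible_pvalues(peaks, how="first"))  # {'peak1': 12.0, 'peak3': 3.0}
print(reproducible_pvalues(peaks, how="mean"))   # {'peak1': 10.0, 'peak3': 4.0}
```

Note that averaging -log10(p) values is the same as taking the geometric mean of the raw p-values, which is not the same as averaging the raw p-values themselves.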