Question

Validating ChIP-seq peak-calling output across replicates

2

7.2 years ago

bioinfc37 ▴ 30

In general, I would like to validate my ChIP-seq output from MACS2. The libraries in my dataset are not pure technical replicates: one biological sample (1 tube) was divided into three samples (three tubes) for sequencing, so the variation between them is likely due to the sequencer. In any case, how can I validate/compare the replicates computationally?

ChIP-Seq • 2.6k views

ADD COMMENT • link updated 7.2 years ago by Sentinel156 ▴ 190 • written 7.2 years ago by bioinfc37 ▴ 30

score 3 · Answer 1 · 2017-02-02

3

7.2 years ago

mforde84 ★ 1.4k

You're interested in an irreproducible discovery rate (IDR) analysis. ENCODE has a standard pipeline for this application. I have a GitHub repo with a pipeline implementation available as well.
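Since IDR compares two ranked peak lists at a time, three replicates mean three pairwise runs. Here is a minimal sketch that only builds the command lines and does not run anything. The peak file names and the output prefix are hypothetical, and the flags are the ones I remember from the standalone `idr` tool's README, so check them against `idr --help` for your installed version.

```python
from itertools import combinations


def build_idr_commands(peak_files, out_prefix="idr"):
    """Return one `idr` argument list per pair of replicate peak files."""
    commands = []
    # Number the replicates from 1 so the output names match rep1, rep2, ...
    for (i, a), (j, b) in combinations(enumerate(peak_files, start=1), 2):
        commands.append([
            "idr",
            "--samples", a, b,
            "--input-file-type", "narrowPeak",
            "--rank", "p.value",
            "--output-file", f"{out_prefix}_rep{i}_vs_rep{j}.txt",
            "--plot",
        ])
    return commands


# Hypothetical MACS2 narrowPeak outputs, one per sequenced tube.
peaks = [
    "rep1_peaks.narrowPeak",
    "rep2_peaks.narrowPeak",
    "rep3_peaks.narrowPeak",
]
commands = build_idr_commands(peaks)
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files. IDR expects relaxed peak calls as input, so MACS2 should be run with a lenient threshold beforehand.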

ADD COMMENT • link 7.2 years ago by mforde84 ★ 1.4k

0

Nice! However, how current is that ENCODE pipeline (your first link)? I've used the main IDR repo recently but was never quite sure how running IDR that way compares. Also, how important is it to generate pseudoreplicates and call peaks on them (as per the ENCODE pipeline)? Does your pipeline automate this?

ADD REPLY • link 7.2 years ago by Sentinel156 ▴ 190

0

I'm not sure you really have to worry too much about how current the ENCODE pipeline is, as it's still used extensively and the component software (e.g., IDR, SPP, MACS2) is still being actively developed. I think of it as a pseudo-gold-standard pipeline (in the absence of validation :) ) for TF ChIP peak calling.

I think subsampling just makes the analysis more rigorous. I mean, if you see certain peaks in one pseudoreplicate and not the other, or the peaks are drastically different from baseline, it's kind of questionable whether it's real signal. But yes, my pipeline has automated the pseudoreplicate portion as well. I'm not sure it will work out of the box for you, as you'll likely have a different cloud / HPC setup than me. But it should be compatible with a VM running Ubuntu 14.04 LTS, which you can rent from AWS. You'll likely want to go line by line with a small set of test samples, see where things break, make whatever changes are needed, and then throw the kitchen sink at it.
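To make the pseudoreplicate idea concrete, here is a rough sketch of the split itself: shuffle the reads and cut the list in half, then call peaks on each half separately. It is not the code from my pipeline or from ENCODE; the function name and the toy read IDs are made up for illustration, and real data would be split at the alignment-file level rather than in a Python list.

```python
import random


def split_pseudoreplicates(reads, seed=0):
    """Shuffle reads and split them into two near-equal pseudoreplicates."""
    shuffled = list(reads)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Toy read identifiers standing in for aligned reads.
reads = [f"read_{i}" for i in range(10)]
pr1, pr2 = split_pseudoreplicates(reads, seed=42)
print(len(pr1), len(pr2))
```

Every read ends up in exactly one half, so any peak that appears in only one pseudoreplicate reflects sampling noise rather than a difference between libraries.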

ADD REPLY • link 7.2 years ago by mforde84 ★ 1.4k

score 3 · Answer 2 · 2017-02-02

3

7.2 years ago

Sentinel156 ▴ 190

OP, you could also use the excellent deepTools2 package to look at variation among your technical reps using the plotPCA/plotCorrelation tools.
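If you just want a feel for what a correlation-based comparison measures, here is a minimal sketch that computes pairwise Pearson correlation of per-bin read counts. It is not deepTools' implementation, and the counts and replicate names are invented for illustration; in practice you would take the counts from a genome-wide binning of your BAM files.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of bin counts."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 bins")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(counts_by_sample):
    """Return {(sample_a, sample_b): r} for every pair of samples."""
    return {
        (a, b): pearson(counts_by_sample[a], counts_by_sample[b])
        for a, b in combinations(sorted(counts_by_sample), 2)
    }


# Invented read counts over six genomic bins for three replicates.
counts = {
    "rep1": [12, 40, 7, 95, 3, 18],
    "rep2": [10, 43, 9, 90, 4, 20],
    "rep3": [14, 38, 6, 99, 2, 17],
}
correlations = pairwise_correlations(counts)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

For replicates split from a single tube you would expect values very close to 1, so a pair that falls clearly below the others is worth a closer look.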

ADD COMMENT • link 7.2 years ago by Sentinel156 ▴ 190