Noisy ChIPSeq data
6.9 years ago

Hello,

I'm hoping someone can help me here. I carried out two CTCF ChIP-seq replicates and the data generated are noisy: as it stands, I can only see about 10% of the peaks using SISSRs peak calling. I know the signal is bona fide, because when I extract the sequences and run a motif analysis I can see a clear association with the CTCF binding motif. Has anyone got experience dealing with noisy data? Having two replicates should help, but I'm relatively new to bioinformatics and don't really know where to go from here.

Thanks
Teri

ChIP-Seq alignment next-gen-sequencing

Use CHANCE to assess the enrichment of your signal and judge whether its quality is good enough to infer peaks. Also, how deeply is your data sequenced? The depth may not be enough to call the full set of peaks.

Assess the enrichment of those peaks, view them in a genome browser, and see what kind of enrichment you get.

You can also use deepTools to assess the QC status.

Finally, I would also play with the parameters to fine-tune the peak calling, and try other peak callers (RSEG, MACS 2.1, PePr and many more) to compare their results. Good luck!


In addition to vchris_ngs's reply, you might want to look at the FastQC report for your raw data. When you mapped the reads, was the overall alignment rate high or low? Have you tried merging your two samples?

6.9 years ago

1) Check the sequencing depth and adapter contamination (FastQC).

2) Run the reads through FastQ Screen to check for cross-organism contamination.

3) If you have a control for peak calling, run the samples through the bamFingerprint module of deepTools to check the enrichment.

4) Don't merge the samples to start with. First check their correlation using bamCorrelate from deepTools, and merge them only if both show the same enrichment.

5) Lower the significance parameters (p-value, fold change, FDR, m-fold etc.) during peak calling. Different peak callers have different options.

6) Check some known target genes where your protein should bind, to see whether there is a peak compared with other regions and how high it is.

7) You might need to run this analysis with someone experienced. Sometimes it's better to redo the ChIP than to try to extract meaningful results from noisy data (ideally after you know what the problem might have been, but even otherwise).
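Items 3 and 4 rely on deepTools, but the idea behind a fingerprint plot is simple enough to sketch. The Python below assumes NumPy, uses simulated data only, and the helpers `fingerprint_curve` and `top_bin_read_share` are hypothetical names, not deepTools functions. It ranks genomic bins by coverage and computes the cumulative fraction of reads: an input-like sample stays close to the diagonal, while a well-enriched ChIP puts a large share of its reads into a small fraction of bins. This is a simplified illustration, not deepTools' implementation or its quality metrics.

```python
import numpy as np

def fingerprint_curve(counts_per_bin):
    """Rank bins from lowest to highest coverage and return
    (fraction of bins, cumulative fraction of reads), as in a fingerprint plot."""
    counts = np.sort(np.asarray(counts_per_bin, dtype=float))
    frac_bins = np.arange(1, counts.size + 1) / counts.size
    cum_reads = np.cumsum(counts) / counts.sum()
    return frac_bins, cum_reads

def top_bin_read_share(counts_per_bin, top_fraction=0.01):
    """Share of all reads falling in the most highly covered `top_fraction` of bins."""
    counts = np.sort(np.asarray(counts_per_bin, dtype=float))[::-1]
    n_top = max(1, int(round(top_fraction * counts.size)))
    return counts[:n_top].sum() / counts.sum()

rng = np.random.default_rng(0)
n_bins = 10_000
# Simulated input: uniform background, ~10 reads per bin.
input_counts = rng.poisson(10, n_bins)
# Simulated ChIP: same background plus strong signal in 1% of bins.
chip_counts = rng.poisson(10, n_bins)
peak_bins = rng.choice(n_bins, size=n_bins // 100, replace=False)
chip_counts[peak_bins] += rng.poisson(200, peak_bins.size)

for name, c in [("input", input_counts), ("ChIP", chip_counts)]:
    frac_bins, cum_reads = fingerprint_curve(c)
    idx = np.searchsorted(frac_bins, 0.99)  # position of the 99% bin rank on the curve
    print(f"{name}: {cum_reads[idx]:.3f} of reads in the lowest 99% of bins, "
          f"{top_bin_read_share(c):.3f} in the top 1%")
```

A ChIP whose curve looks like the input's is a warning sign that the enrichment is weak, whatever the peak caller reports.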


In case someone wants to do this and is using a newer version of deepTools, bamFingerprint is now plotFingerprint and bamCorrelate is now multiBamSummary followed by plotCorrelation.
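For reference, the renamed tools can be chained roughly as below. This is a sketch: the BAM file names and labels are hypothetical placeholders, only basic options are shown (check `--help` for your deepTools version), and the Python wrapper executes a command only if the tool is found on the `PATH`; otherwise it just prints it.

```python
import shutil
import subprocess

# Hypothetical, indexed BAM files; replace with your own.
bams = ["chip_rep1.bam", "chip_rep2.bam", "input.bam"]
labels = ["ChIP_rep1", "ChIP_rep2", "Input"]

commands = [
    # Replaces the old bamFingerprint.
    ["plotFingerprint", "--bamfiles", *bams, "--labels", *labels,
     "--plotFile", "fingerprint.png"],
    # Replaces the old bamCorrelate, step 1: count reads in genome-wide bins.
    ["multiBamSummary", "bins", "--bamfiles", *bams, "--labels", *labels,
     "--outFileName", "readCounts.npz"],
    # Step 2: correlation heatmap from the .npz count matrix.
    ["plotCorrelation", "--corData", "readCounts.npz", "--corMethod", "spearman",
     "--whatToPlot", "heatmap", "--plotFile", "correlation_heatmap.png"],
]

for cmd in commands:
    print(" ".join(cmd))
    if shutil.which(cmd[0]):  # run only if deepTools is installed and on PATH
        subprocess.run(cmd, check=True)
```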


Yeah, thanks Devon. I haven't worked with the newer version yet.


You can also check things like cross-correlation and IDR (see http://genome.cshlp.org/content/22/9/1813.full)
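The strand cross-correlation idea can be sketched in a few lines. Forward- and reverse-strand reads flank the true binding site, so the correlation between the two strands' 5'-end coverage peaks when one strand is shifted by roughly the fragment length. The ENCODE guidelines linked above summarise this as NSC (peak correlation divided by the minimum correlation) and RSC (peak minus minimum, divided by the read-length "phantom" peak minus minimum). The sketch below assumes NumPy, uses one simulated chromosome with hypothetical parameters, and skips shifts near the read length as a crude way to avoid the phantom peak; it is not a replacement for dedicated tools such as phantompeakqualtools.

```python
import numpy as np

def strand_cross_correlation(fwd_starts, rev_starts, chrom_len, max_shift):
    """Pearson correlation between forward- and reverse-strand 5'-end coverage
    for shifts 0..max_shift (forward position x paired with reverse x + d)."""
    fwd = np.bincount(fwd_starts, minlength=chrom_len).astype(float)
    rev = np.bincount(rev_starts, minlength=chrom_len).astype(float)
    cc = np.empty(max_shift + 1)
    for d in range(max_shift + 1):
        cc[d] = np.corrcoef(fwd[: chrom_len - d], rev[d:])[0, 1]
    return cc

rng = np.random.default_rng(1)
chrom_len, frag_len, read_len = 200_000, 200, 36

# Background fragments confined to alternating 5-kb "domains", mimicking the broad
# coverage variation that keeps real background cross-correlation above zero.
domain_starts = np.arange(0, chrom_len, 10_000)
bg = rng.choice(domain_starts, size=20_000) + rng.integers(0, 5_000, size=20_000)
# 200 simulated binding sites, 50 fragments each, starts jittered by about +/- 50 bp.
sites = rng.integers(1_000, chrom_len - 1_000, size=200)
peak = np.repeat(sites, 50) + rng.integers(-50, 50, size=200 * 50)
frag_starts = np.concatenate([bg, peak])

on_fwd = rng.random(frag_starts.size) < 0.5
fwd_starts = frag_starts[on_fwd]                   # forward read 5' end
rev_starts = frag_starts[~on_fwd] + frag_len - 1   # reverse read 5' end

cc = strand_cross_correlation(fwd_starts, rev_starts, chrom_len, max_shift=400)
search = np.arange(read_len + 10, cc.size)  # skip shifts near the read length
frag_shift = int(search[np.argmax(cc[search])])
nsc = cc[frag_shift] / cc.min()
rsc = (cc[frag_shift] - cc.min()) / (cc[read_len] - cc.min())
print(f"estimated fragment shift: {frag_shift} bp, NSC={nsc:.2f}, RSC={rsc:.2f}")
```

On noisy data the fragment-length peak flattens towards the background, pushing both NSC and RSC down.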

6.9 years ago
ablanchetcohen

Where is your control? An input DNA control or IgG control goes a very long way in helping to distinguish real peaks from background noise.

I would never advise skipping the control. The initial savings in running the experiment do not compensate for the subsequent headaches in trying to distinguish the background noise from the real peaks.

To save money, however, you can reuse the same control for several experiments, as long as the conditions are the same.
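To illustrate why a control helps, here is a sketch of a window-level test loosely modelled on MACS's idea of taking the larger of a control-derived local rate and a genome-wide background rate. It is not MACS's implementation: the function names, sequencing depths, genome size and window counts are hypothetical. The input count is scaled to the ChIP depth, and the ChIP count is scored against a Poisson expectation; relaxing the cut-off on this score is the kind of parameter change suggested in item 5 above.

```python
import math

def poisson_upper_tail_log10(k, lam):
    """log10 P(X >= k) for X ~ Poisson(lam), summed in log space to avoid underflow."""
    if k <= 0:
        return 0.0
    log_terms = []
    running_max = -math.inf
    i = k
    while True:
        lt = i * math.log(lam) - lam - math.lgamma(i + 1)
        log_terms.append(lt)
        running_max = max(running_max, lt)
        # Past the mode the terms shrink geometrically; stop once they are negligible.
        if i > lam and lt < running_max - 40:
            break
        i += 1
    total = sum(math.exp(t - running_max) for t in log_terms)
    return (running_max + math.log(total)) / math.log(10)

def window_score(chip_count, input_count, chip_total, input_total,
                 window_bp, genome_bp):
    """Return (fold enrichment, -log10 p) for one window (hypothetical helper)."""
    local_lam = input_count * (chip_total / input_total)  # input scaled to ChIP depth
    background_lam = chip_total * window_bp / genome_bp    # uniform genome-wide rate
    lam = max(local_lam, background_lam)
    return chip_count / lam, -poisson_upper_tail_log10(chip_count, lam)

# Hypothetical depths and genome size, 200-bp windows.
depths = dict(chip_total=20_000_000, input_total=25_000_000,
              window_bp=200, genome_bp=3_000_000_000)
fold_a, score_a = window_score(30, 2, **depths)   # ChIP high, input low
fold_b, score_b = window_score(30, 35, **depths)  # ChIP high, input equally high
print(f"window A: fold={fold_a:.1f}, -log10 p={score_a:.1f}")
print(f"window B: fold={fold_b:.2f}, -log10 p={score_b:.2f}")
```

Both windows have the same ChIP count, but only the first stands out once the input is taken into account; without a control they would be indistinguishable.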


I disagree with just the last point. Ideally, a control should be generated alongside each experiment. We have tested control replicates done at different times, by the same or different people but using the same protocol, and never got back the same enrichment at the same sites, even though the correlation was in the 60-95% range.
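Correlation figures like these depend heavily on how they are computed: bin size, whether counts are log-transformed, and Pearson versus Spearman. A minimal sketch of a genome-wide binned correlation, of the kind multiBamSummary plus plotCorrelation reports, is below. It assumes NumPy, the replicates are simulated to share a bin-level bias, and the helper names are hypothetical.

```python
import numpy as np

def bin_counts(read_positions, chrom_len, bin_size):
    """Count reads in fixed-size, non-overlapping bins along one chromosome."""
    edges = np.arange(0, chrom_len + bin_size, bin_size)
    counts, _ = np.histogram(read_positions, bins=edges)
    return counts

def average_ranks(x):
    """Ranks with ties given their average rank (needed for a proper Spearman rho)."""
    x = np.asarray(x)
    order = np.argsort(x, kind="mergesort")
    _, first, counts = np.unique(x[order], return_index=True, return_counts=True)
    ranks = np.empty(x.size, dtype=float)
    ranks[order] = np.repeat(first + (counts - 1) / 2.0, counts)
    return ranks

def replicate_correlation(counts_a, counts_b):
    """Pearson r on log1p-transformed counts, and Spearman rho on raw counts."""
    pearson = np.corrcoef(np.log1p(counts_a), np.log1p(counts_b))[0, 1]
    spearman = np.corrcoef(average_ranks(counts_a), average_ranks(counts_b))[0, 1]
    return pearson, spearman

rng = np.random.default_rng(2)
chrom_len, bin_size = 1_000_000, 1_000
n_bins = chrom_len // bin_size
# Simulated shared bin-level bias (standing in for GC, mappability, chromatin effects).
bias = rng.gamma(shape=2.0, scale=1.0, size=n_bins)

def simulate_reads(n_reads):
    """Pick a bin in proportion to its bias, then a uniform position inside it."""
    bins = rng.choice(n_bins, size=n_reads, p=bias / bias.sum())
    return bins * bin_size + rng.integers(0, bin_size, size=n_reads)

rep1 = bin_counts(simulate_reads(50_000), chrom_len, bin_size)
rep2 = bin_counts(simulate_reads(50_000), chrom_len, bin_size)
pearson, spearman = replicate_correlation(rep1, rep2)
print(f"Pearson (log1p) = {pearson:.2f}, Spearman = {spearman:.2f}")
```

Because the score is dominated by the shared background across all bins, a high genome-wide correlation can coexist with differences at individual sites.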