What could be the cause of different levels of binding in ChIP-Seq Input samples?
12 months ago
Sam ▴ 170

In my Drosophila ChIP-Seq data, there is a large difference between the Input samples. Calling peaks with one Input as the treatment and another as the control yields thousands of peaks.

Plotting the bam files in Ngsplot reveals that after normalization, in some of the Inputs there is much more binding in the genebody area than in others.

The inputs are not from exactly the same condition, but from very similar ones (if possible, I'd rather not go into details). In theory, there should be no large differences between the conditions.

How can I determine if this is a biological difference, or some technical artifact / problem with the analysis?

The data is 43 bp paired-end; it was aligned with bowtie in -m mode (which yields uniquely mapped reads) to BDGP6.28. The bam files were plotted in Ngsplot with the dm6 genome (it's an older release than BDGP6.28 - Ensembl 79). The fragment size in Ngsplot was set to 250, which was roughly the fragment size calculated by macs2. Blacklisted regions were not removed from the bam files. In this specific image duplicate reads were removed, but removing them makes no difference.

(Another issue in the plots is that the signal shows peaks at the TSS and the TES. As far as I understand, open chromatin is more easily sonicated than closed chromatin, and hence the signal is not expected to be even across the genebody. Here is an article showing that.)

ChIP-Seq ngsplot • 485 views
12 months ago
ATpoint 55k

As differences in signal-to-noise ratio are very common in ChIP-seq, I suggest you normalize not only by library size but also by composition/quality (see A: ATAC-seq sample normalization (quantil normalization)), and then check whether the differences you see are still prominent between treatment groups. Instead of the plot you have, I would check normalization with MA-plots; see, for example, the MA-plot part in Basic normalization, batch correction and visualization of RNA-seq data, or check the csaw vignette at Bioconductor, which discusses normalization extensively. You can also calculate the fraction of reads that overlap called peaks per sample (FRiP) as a proxy for data quality per sample; it would not surprise me if the FRiPs are quite different between samples. ChIP is a pain.
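To make the MA-plot and FRiP checks concrete, here is a minimal sketch in plain Python. It assumes you already have read counts per genomic bin for two samples (for example, from counting reads in fixed-size windows) and a count of reads falling inside called peaks; the function names `ma_values` and `frip` and the pseudocount are illustrative choices, not part of csaw or any other package.

```python
import math


def ma_values(counts_a, counts_b, pseudocount=1.0):
    """Return a list of (A, M) pairs, one per bin, for two samples.

    Counts are first scaled to counts per million (library-size
    normalization only). M is the log2 fold change between samples and
    A is the mean log2 abundance of the bin.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must have the same number of bins")
    total_a = sum(counts_a)
    total_b = sum(counts_b)
    points = []
    for a, b in zip(counts_a, counts_b):
        # The pseudocount avoids taking the log of zero for empty bins.
        log_a = math.log2((a + pseudocount) / total_a * 1e6)
        log_b = math.log2((b + pseudocount) / total_b * 1e6)
        points.append((0.5 * (log_a + log_b), log_a - log_b))
    return points


def frip(reads_in_peaks, total_mapped_reads):
    """Fraction of reads in peaks: reads overlapping peaks / all mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return reads_in_peaks / total_mapped_reads
```

Plotting M against A (with any plotting tool) shows whether the bulk of low-abundance background bins is centered on M = 0. If it is not, library-size scaling alone is probably not adequate for that pair of samples.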

12 months ago

Plotting the bam files in Ngsplot reveals that after normalization, in some of the Inputs there is much more binding in the genebody area than in others.

How did you normalize your data? It looks to me as if sample 17 just has more signal everywhere, which is indeed unexpected.

If the normalization is on the total number of reads, then perhaps you should worry about where the reads of sample 18 come from: there is less around genes, so more somewhere else – if that somewhere else is just background then it is ok, but it could also come from specific features.
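One way to see where the "missing" reads went is to compare the share of mapped reads per contig between two samples. The sketch below assumes you have per-contig mapped read counts as dictionaries (for example, parsed from the third column of `samtools idxstats` output); the contig names and numbers in the usage example are made up for illustration.

```python
def contig_fractions(counts):
    """Convert per-contig read counts into fractions of all mapped reads."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no mapped reads")
    return {contig: n / total for contig, n in counts.items()}


def largest_shifts(counts_a, counts_b, top=5):
    """Return the contigs whose read share differs most between two samples.

    Each entry is (contig, fraction_in_a - fraction_in_b), sorted by the
    absolute size of the difference, largest first.
    """
    frac_a = contig_fractions(counts_a)
    frac_b = contig_fractions(counts_b)
    contigs = set(frac_a) | set(frac_b)
    diffs = [(c, frac_a.get(c, 0.0) - frac_b.get(c, 0.0)) for c in contigs]
    diffs.sort(key=lambda item: abs(item[1]), reverse=True)
    return diffs[:top]


if __name__ == "__main__":
    # Hypothetical counts; real contig names depend on the reference used.
    sample_17 = {"2L": 500, "3R": 300, "X": 100, "mito": 100}
    sample_18 = {"2L": 400, "3R": 250, "X": 50, "mito": 300}
    for contig, diff in largest_shifts(sample_17, sample_18, top=2):
        print(contig, round(diff, 3))
```

A contig that takes up a much larger share in one sample is a candidate for the "somewhere else" that is soaking up reads and skewing total-read normalization.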

For instance, I remember comparing the input of a WT and a mutant, and there were huge differences in the amount of rDNA repeats and mitochondrial DNA that completely skewed normalization. Excluding these regions from normalization solved the issue.
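As a rough illustration of excluding such regions from normalization, the sketch below computes per-sample scale factors from per-contig read counts while ignoring a chosen set of contigs. It is a simplification that works at the whole-contig level; the function name `scale_factors`, the contig names, and the counts are assumptions for the example, and repeats embedded inside chromosomes would need interval-based filtering instead.

```python
def scale_factors(sample_counts, exclude=()):
    """Compute scale factors that bring every sample down to the smallest
    library, counting only reads on contigs that are not excluded.

    sample_counts maps sample name -> {contig: mapped read count}.
    """
    excluded = set(exclude)
    kept_totals = {}
    for sample, counts in sample_counts.items():
        total = sum(n for contig, n in counts.items() if contig not in excluded)
        if total <= 0:
            raise ValueError(f"no reads left for sample {sample!r}")
        kept_totals[sample] = total
    reference = min(kept_totals.values())
    return {sample: reference / total for sample, total in kept_totals.items()}


if __name__ == "__main__":
    # Hypothetical counts: the mutant carries far more rDNA reads.
    samples = {
        "wt": {"2L": 800, "rDNA": 200},
        "mut": {"2L": 400, "rDNA": 1600},
    }
    print(scale_factors(samples))                    # all reads
    print(scale_factors(samples, exclude=["rDNA"]))  # rDNA ignored
```

In this made-up case the factors flip once the rDNA contig is ignored, which is the kind of skew described above.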