Hi everyone,
I am analyzing CUT&RUN data and I am encountering an issue with my IgG control. The histone mark analyzed is H3K27me3. Paired-end reads were aligned with bowtie2, duplicates were removed with MarkDuplicates, and coverage tracks were generated with bamCoverage.
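The post names the tools but not their parameters. For anyone trying to reproduce a comparable setup, here is a hypothetical sketch that only assembles the command lines and prints them (it runs nothing). File names, the index path, `-I 10 -X 700`, `--very-sensitive`, `--binSize 10` and the `picard` wrapper command are assumptions, not the settings used here.

```python
"""Hypothetical sketch: alignment -> sort -> duplicate removal -> RPKM bigWig.

Every file name and parameter value below is a placeholder/assumption.
"""
from __future__ import annotations

import shlex


def pipeline_commands(sample: str, index: str = "genome_index") -> list[list[str]]:
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    dedup = f"{sample}.dedup.bam"
    return [
        # Paired-end alignment; -I/-X bound the accepted fragment length (assumed values).
        ["bowtie2", "--very-sensitive", "--no-mixed", "--no-discordant",
         "-I", "10", "-X", "700", "-x", index,
         "-1", f"{sample}_R1.fastq.gz", "-2", f"{sample}_R2.fastq.gz",
         "-S", sam],
        # Coordinate-sort the alignments.
        ["samtools", "sort", "-o", bam, sam],
        # Remove (not just flag) duplicates. "picard" assumes a wrapper script;
        # otherwise use "java -jar picard.jar MarkDuplicates ...".
        ["picard", "MarkDuplicates", "-I", bam, "-O", dedup,
         "-M", f"{sample}.dup_metrics.txt", "--REMOVE_DUPLICATES", "true"],
        ["samtools", "index", dedup],
        # RPKM-normalized bigWig; --extendReads uses mate pairs to cover whole fragments.
        ["bamCoverage", "-b", dedup, "-o", f"{sample}.rpkm.bw",
         "--normalizeUsing", "RPKM", "--binSize", "10", "--extendReads"],
    ]


if __name__ == "__main__":
    for cmd in pipeline_commands("sample1"):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems if the commands are later passed to `subprocess.run`.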
Looking at my sample profiles, I see distinct peaks that peak callers should probably detect.
The problem is that the IgG control signal far exceeds that of my samples across most of the genome, so only very few peaks are called. Here is an example, with reads normalized by RPKM. With autoscaling on this region, the IgG RPKM stays consistently near its maximum of 19, whereas my two samples reach maxima of only 4.79 and 5.77.
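For reference, bamCoverage's RPKM divides each bin's read count by the bin length in kb and by the number of mapped reads in millions. One consequence worth checking: if the IgG library is much shallower than the samples (the post does not give its depth), each IgG read is multiplied by a larger factor, so a handful of reads per bin can produce high RPKM values. A minimal sketch of the arithmetic; the read counts in the example are made up for illustration:

```python
def rpkm(reads_in_bin: int, total_mapped_reads: int, bin_length_bp: int) -> float:
    """Reads per kilobase of bin per million mapped reads."""
    kb = bin_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return reads_in_bin / (kb * millions)


if __name__ == "__main__":
    # Same raw count (10 reads in a 1 kb bin), different library sizes (hypothetical):
    print(rpkm(10, 5_000_000, 1_000))  # deeper library    -> 2.0
    print(rpkm(10, 1_000_000, 1_000))  # shallower library -> 10.0
```

So comparing total mapped (deduplicated) read counts for the IgG and the samples is a quick way to see whether the IgG track is inflated by normalization.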
I am confident in my samples because their profiles are similar to those from other studies of the same cell type (even though I have fewer reads and my peaks are not as clear as in the other dataset).
However, in some regions this problem does not show up.
Peak calling without a control (MACS2, GoPeaks, SEACR) is not satisfactory either, and the results are hard to interpret (very short peaks).
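Because H3K27me3 forms broad domains, a caller that returns many very short peaks may be splitting one domain into fragments. One workaround, not something tried in this thread, is to stitch nearby peaks together, as `bedtools merge -d` does. A minimal sketch, assuming peaks are `(chrom, start, end)` tuples in BED-style coordinates; the `max_gap` value is a hypothetical choice that would need tuning:

```python
def merge_intervals(intervals, max_gap):
    """Merge same-chromosome intervals whose gap is <= max_gap (overlaps always merge)."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 250, 300), ("chr1", 1000, 1100), ("chr2", 100, 200)]
    print(merge_intervals(peaks, max_gap=100))
```

Merging does not add statistical support, so merged domains are best treated as candidates to inspect rather than calls.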
I know I have a limited number of reads (7 million per sample), but I would like to make the most of these data. Is the IgG probably unusable because of a technical issue? Or could I scale down my IgG?
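The simplest form of "scaling down" a control is depth matching: keep only the fraction of IgG fragments needed to match the sample's depth (`samtools view -s` does this kind of fractional subsampling on BAM files). A minimal sketch of the logic, assuming fragment counts are known; the function names are hypothetical. Note that the RPKM tracks are already depth-normalized, so depth matching changes the raw counts a peak caller compares but would not remove an IgG excess that persists after RPKM normalization.

```python
import random


def control_keep_fraction(sample_fragments: int, control_fragments: int) -> float:
    """Fraction of control fragments to keep so the control is no deeper than the sample.

    Returns 1.0 if the control is already the shallower library (never upsample).
    """
    return min(1.0, sample_fragments / control_fragments)


def subsample_fragments(fragment_ids, fraction, seed=0):
    """Keep each fragment independently with probability `fraction`.

    Deciding per fragment ID rather than per read keeps both mates of a pair together.
    """
    rng = random.Random(seed)
    return [fid for fid in fragment_ids if rng.random() < fraction]


if __name__ == "__main__":
    frac = control_keep_fraction(7_000_000, 14_000_000)  # hypothetical IgG depth
    kept = subsample_fragments(range(10_000), frac)
    print(frac, len(kept))
```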
I would be glad to hear how you would have proceeded, and I am happy to provide further information if needed.
Can it be that this region is not representative? At ~7 million reads you should see sparse IgG signal at best, because IgG is basically a nonspecific snapshot of protein-attracting chromatin/DNA. Yet here the IgG is all over the place. What is the scale in kb for this screenshot? Did you try the broad option in MACS2/3?
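For reference, a broad-mode MACS2 call with the IgG as control could be assembled as below. This is a sketch that only builds the argument list; the file names are placeholders, `-g hs` assumes a human sample, and `--broad-cutoff 0.1` is simply MACS2's default written out. The `macs3 callpeak` equivalent takes similar options.

```python
import shlex


def macs2_broad_command(treatment_bam, control_bam, name, genome_size="hs", broad_cutoff=0.1):
    return [
        "macs2", "callpeak",
        "-t", treatment_bam,            # CUT&RUN sample
        "-c", control_bam,              # IgG control
        "-f", "BAMPE",                  # paired-end: use whole fragments from read pairs
        "-g", genome_size,              # "hs" is an assumption; e.g. "mm" for mouse
        "-n", name,
        "--broad",                      # link nearby enriched regions into broad peaks
        "--broad-cutoff", str(broad_cutoff),
    ]


if __name__ == "__main__":
    print(shlex.join(macs2_broad_command("sample1.dedup.bam", "igg.dedup.bam", "sample1")))
```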
Thank you for your pertinent answer. Indeed, I probably did not give a good example: screening other regions, the problem does not seem to occur that often, and the scale was probably misleading here (I used the mean windowing function; with maximum, the IgG signal is lower).
I tried the broad option in MACS2 with little success: 1 peak detected for Sample 1 and 0 for Sample 2.
With mean windowing and a log scale, I can see some good peaks when zoomed out, but if I zoom in or switch the windowing function to minimum, the peaks disappear, whereas they persist in the samples from the other study. Maybe the IP did not work as well as expected.
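When zoomed out, a genome browser has to collapse many data bins into each screen pixel, and the windowing function decides how (the post does not name the browser; IGV, for example, offers mean, min and max among others). A minimal sketch with made-up bin values: a sparse, spiky signal still shows up under "mean", but a "min" summary drops it to background, whereas continuous enrichment survives both. That pattern fits the description of peaks vanishing under minimum windowing.

```python
from statistics import mean


def summarize_windows(values, window, how="mean"):
    """Collapse consecutive per-bin values into one value per display window."""
    funcs = {"mean": mean, "min": min, "max": max}
    summarize = funcs[how]
    return [summarize(values[i:i + window]) for i in range(0, len(values), window)]


if __name__ == "__main__":
    # Hypothetical bins: a single spike (first window) vs. low but continuous signal (second).
    bins = [0, 0, 8, 0, 1, 1, 1, 1]
    for how in ("mean", "min", "max"):
        print(how, summarize_windows(bins, 4, how))
```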
I had more success with GoPeaks using the broad option: 305 and 1673 peaks, respectively. However, for the first sample, the plots of peak count frequency around the TSS and between TSS and TTS do not look like what I would expect.
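As a rough cross-check of the TSS plot independent of the plotting tool, one can histogram the distance from each peak center to the nearest TSS. A minimal sketch, assuming peaks as `(chrom, start, end)` tuples and TSS positions as sorted lists per chromosome; it is coordinate-based and ignores strand, so negative distances mean "lower coordinate", not "upstream". All names and example positions are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def nearest_tss_distance(pos, tss_sorted):
    """Signed distance from pos to the closest TSS in a sorted, non-empty list."""
    i = bisect_left(tss_sorted, pos)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(pos - tss_sorted[i])
    if i > 0:
        candidates.append(pos - tss_sorted[i - 1])
    return min(candidates, key=abs)


def tss_distance_histogram(peaks, tss_by_chrom, bin_size=1_000, max_dist=5_000):
    """Count peak centers per distance bin (keys are bin left edges, in bp)."""
    counts = Counter()
    for chrom, start, end in peaks:
        tss = tss_by_chrom.get(chrom)
        if not tss:
            continue  # no annotated TSS on this chromosome
        d = nearest_tss_distance((start + end) // 2, tss)
        if abs(d) <= max_dist:
            counts[(d // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    tss = {"chr1": [10_000, 50_000]}
    peaks = [("chr1", 9_800, 10_200), ("chr1", 10_400, 10_800), ("chr1", 49_000, 49_400)]
    print(tss_distance_histogram(peaks, tss))
```

With only a few hundred peaks, as in the first sample, such a histogram will be noisy, which alone could make the profile look unexpected.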