Question

Is there a term called 'pseudoenrichment' in Cut&Run analysis?

13 days ago

sribas.chowdhury888 • 0

Hello everyone!

I am analysing a Cut&Run dataset of pull-downs for two histone variants. These two variants are known to co-occupy a nucleosome, and to capture this, I have computed the signal enrichment of one variant on the other variant's peaks. The peak sets were generated using MACS2, and the bigWig files were normalised using the method described in the post "Seeking feedback on ChIP-seq normalization method: Calculating scaling factors by dividing input by IP, including spike-in".

Despite this, we are facing a serious issue where the signal of one variant is way higher than the other, plausibly due to differences in antibody affinity. We talked about this with a collaborator, and he said we have to do something known as "pseudoenrichment", which basically means scaling the bins of the individual bigWig files to the same scale. I wanted to read more and searched for it online, but I was unable to find any tutorial or other resource explaining how it is done. I am new to epigenomic analysis, and this is the first time I have heard of it.

Does anyone have an idea what it is and how it is done? Links to any resource, tutorial or any paper describing it would be really helpful.

Thanks!

cutandrun chipseq • 460 views

ADD COMMENT • link updated 11 days ago by LChart 5.2k • written 13 days ago by sribas.chowdhury888 • 0

score 1 · Answer 1 · 2025-11-03

12 days ago

LChart 5.2k

Despite this, we are facing a serious issue where the signal of one variant is way higher than the other, plausibly due to differences in antibody affinity

Not at all a surprise. This can also be due to differences in input amounts between experiments.

These two variants are known to co-occupy a nucleosome

It's not clear whether you mean that they always co-occupy a nucleosome, or that they can (or preferentially) do so.

to capture this, I have computed the signal enrichment of one variant on the other variant's peaks

It's unclear what it means to "capture" co-occupancy, or what your enrichment calculation is. Why is it insufficient to plot (signal var1, signal var2) and use polar coordinates to define a co-occupancy ratio (angle) and prevalence (radius)? The difference in signal strength (due to, presumably, antibody affinity) should result in an angular bias, which you can either correct, or simply define a relative co-occupancy metric.
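As a rough illustration of the polar-coordinate idea, here is a minimal sketch with NumPy. It assumes you already have one mean signal value per region for each variant; the function name `polar_cooccupancy` and the example numbers are made up for illustration and do not come from any published pipeline.

```python
import numpy as np

def polar_cooccupancy(var1, var2):
    """Convert per-region (var1, var2) signal into (radius, angle in degrees).

    radius: overall prevalence of signal at the region.
    angle:  0 degrees = var1 only, 90 degrees = var2 only, 45 = equal signal.
    """
    var1 = np.asarray(var1, dtype=float)
    var2 = np.asarray(var2, dtype=float)
    radius = np.hypot(var1, var2)
    angle = np.degrees(np.arctan2(var2, var1))
    return radius, angle

# Hypothetical per-peak mean signals (illustrative values only).
var1_signal = np.array([30.0, 40.0, 0.0, 12.0])
var2_signal = np.array([0.0, 30.0, 5.0, 12.0])

radius, angle = polar_cooccupancy(var1_signal, var2_signal)
for r, a in zip(radius, angle):
    print(f"radius={r:.1f}  angle={a:.1f}")
```

If one variant is systematically stronger, the angles will pile up toward that variant's axis. One possible correction, under the assumption that the bias is a single global factor, is to rescale one axis before computing the angle.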

Your reviewer will ask for sequential ChIP, though.

ADD COMMENT • link 12 days ago by LChart 5.2k

Thank you for your input. These variants do not always co-occupy a region; rather, they do so at the TSSs of genes that are going to be differentially expressed across developmental timepoints. We can safely say that they preferentially bind transcriptionally active regions and make the nucleosome unstable.

Coming to "capturing co-occupancy": it is basically inspired by Fig. 1e in this paper (https://doi.org/10.1038/s41467-025-57719-4), where they showed bivalent regions using a heatmap. What we have done is take the union of the peak files of both variants and then compute a matrix of their signal enrichment (from the normalized bigWig tracks) over that union peak set. The core idea was that, when plotted on an enrichment heatmap, the peaks that are overlapping (have co-occupancy) should form a separate k-means cluster, while the exclusive ones will form others. The results were not as expected because one variant has a much broader distribution and higher signal intensity than the other.

I have not heard of the method you described. Could you please share a link to any study that used it?

ADD REPLY • link 12 days ago by sribas.chowdhury888 • 0

The core idea was that, when plotted on an enrichment heatmap, the peaks that are overlapping (have co-occupancy) should form a separate k-means cluster, while the exclusive ones will form others. The results were not as expected because one variant has a much broader distribution and higher signal intensity than the other.

The paper you link explicitly separates peaks into 3 sets: K27me3-only, K27ac-only, and ambiguous. I assume, from what you're saying, that when you split your regions into Var1-only, Var2-only, and Both, you still see lots of Var1 signal in the "Var2-only" part of the Venn diagram, probably because (say) the 10th percentile of Var1 signal is equivalent to the 90th percentile of Var2 signal? It would seem fair to use relative signal in this case: instead of using the .bw normalized to spike-in/background as is, you perform a second normalization to the 95th percentile of peak intensity. This would put the two tracks on equivalent relative scales.
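To make the second normalization concrete, here is a minimal sketch, assuming you have extracted one intensity value per peak for each variant from the already-normalized tracks. The helper name `scale_to_percentile` and the synthetic arrays are hypothetical, chosen only so that one track is exactly 10x the other.

```python
import numpy as np

def scale_to_percentile(peak_signal, q=95.0):
    """Divide per-peak signal by its q-th percentile (relative scale)."""
    peak_signal = np.asarray(peak_signal, dtype=float)
    ref = np.percentile(peak_signal, q)
    if ref <= 0:
        raise ValueError("reference percentile must be positive")
    return peak_signal / ref

# Synthetic example: Var1 is 10x stronger than Var2 everywhere.
var1_peaks = np.arange(1, 101) * 10.0
var2_peaks = np.arange(1, 101) * 1.0

var1_rel = scale_to_percentile(var1_peaks)
var2_rel = scale_to_percentile(var2_peaks)

# After rescaling, a value of 1.0 means "as strong as this variant's
# 95th-percentile peak" in both tracks.
print(np.allclose(var1_rel, var2_rel))
```

The same factor (1 divided by the 95th-percentile value) would then be applied to the whole track with whatever tool you use to write your bigWig files. Using a high percentile rather than the maximum is a design choice that keeps a few extreme peaks from setting the scale.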

I have not heard of the method you described. Could you please share a link to any study that used it?

Scatterplots are used everywhere, and taking residuals from the y = x or y = ax + b curves is also ubiquitous. What I have described is just this.
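For the residual version, a minimal sketch could look like the following. It assumes per-region signal for the two variants as paired arrays; `line_residuals` is a hypothetical helper and the numbers are invented for illustration. The fit is ordinary least squares via `np.polyfit`.

```python
import numpy as np

def line_residuals(x, y):
    """Fit y = a*x + b by least squares and return (a, b, residuals)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = np.polyfit(x, y, deg=1)
    residuals = y - (a * x + b)
    return a, b, residuals

# Hypothetical per-region signals: x = Var1, y = Var2.
var1_signal = np.array([10.0, 20.0, 30.0, 40.0])
var2_signal = np.array([3.0, 5.0, 9.0, 9.0])

a, b, residuals = line_residuals(var1_signal, var2_signal)
print(f"slope={a:.2f} intercept={b:.2f}")
print(residuals)
```

Regions with a positive residual carry more Var2 signal than the overall trend predicts, and regions with a negative residual carry less. The slope absorbs the global difference in signal strength between the two tracks.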

ADD REPLY • link 11 days ago by LChart 5.2k