Question

Interpreting genomic features distribution for CUT&RUN peaks

3 days ago

Rozita ▴ 40

Hi,

I've done CUT&RUN for my target of interest in siSCR and siBRCA2 cells. I used SEACR for peak calling where the peaks were normalised to IgG using the stringent conditions and I got approx. 450 peaks for siSCR and 7500 peaks for siBRCA2 (after taking into consideration the peaks that overlap between 2 out of 3 replicates). I do expect my target of interest to be enriched in the siBRCA2 cells.

When I plot the metagene profile, I clearly see an enrichment of my signal at the TSS in the siBRCA2 cells compared to siSCR and this is also true when I visualise the bigwig files across several genes.

The issue arises when I plot the distribution of the peaks across the different genomic features (promoters, 5'UTR, 3'UTR, exon, intron, etc.). The majority of the peaks are at promoters in both conditions; however, the percentage of peaks in the promoter region is slightly lower in the siBRCA2 cells than in the siSCR cells.

[Figure: distribution of peaks across genomic features in siSCR and siBRCA2 cells]

The question is:

How do I assess that this is significant? It is 91.5% in siSCR and 88.9% in siBRCA2. Should I do this with the individual replicate files?
Is this an accurate way to represent it? Or am I misinterpreting / missing out on something in my interpretation of the data?

I would highly appreciate your help.

Thank you.

cut_and_run genomic features • 274 views

ADD COMMENT • link updated 6 hours ago by rfran010 ★ 1.7k • written 3 days ago by Rozita ▴ 40

If I understand correctly, you see increased signal at TSSs compared to control (taller peaks in the browser). However, 92% of the 450 siSCR peaks and 88.9% of the 7500 siBRCA2 peaks are at TSSs.

I wouldn't focus on this slight difference and would instead just conclude that promoters are bound in both conditions, but the overall level of binding is greater in siBRCA2. You could consider a differential binding analysis. To get started, see "Using featureCounts and DESeq2 to look at differences between ATAC-seq conditions".

With such an increase, I would expect the protein's expression to be increased too, which could be cool to show.

ADD REPLY • link 6 hours ago by rfran010 ★ 1.7k

score 0 · Answer 1 · 2025-10-10

2 days ago

LChart 5.1k

My initial concern is the ~17-fold difference (450 vs 7500) in the number of called peaks between the two conditions. This suggests to me that the target itself is at very different levels of abundance between the two conditions, so I would double check that first. The slight shift in feature occupancy, at first blush, seems irrelevant in the face of a ~17-fold difference in overall occupancy, and I wouldn't pursue statistics around feature-level differences.
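For anyone who still wants to put a number on the promoter-fraction difference, here is a minimal sketch of a pooled two-proportion z-test in plain Python. The promoter counts (412 and 6668) are assumptions reconstructed from the rounded percentages in the question (91.5% of ~450 and 88.9% of ~7500), not the real annotation counts, and the function name `two_proportion_ztest` is just illustrative. The test also treats peaks as independent observations, which is unlikely to hold strictly for called peaks.

```python
import math


def two_proportion_ztest(hits_a, total_a, hits_b, total_b):
    """Two-sided z-test for a difference between two proportions (pooled SE)."""
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    # Pooled proportion under the null hypothesis that both groups share one rate
    pooled = (hits_a + hits_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value


# ASSUMED counts, back-calculated from the rounded figures in the question
siscr_promoter, siscr_total = 412, 450        # ~91.5% of ~450 peaks
sibrca2_promoter, sibrca2_total = 6668, 7500  # ~88.9% of ~7500 peaks

diff, z, p_value = two_proportion_ztest(
    siscr_promoter, siscr_total, sibrca2_promoter, sibrca2_total
)
print(f"difference in promoter fraction: {diff:.4f}")
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```

With these assumed counts the difference is under three percentage points, so whatever p-value comes out, the effect size stays small next to the difference in total peak number. Swap in the real per-feature counts from your annotation output before reading anything into it.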

ADD COMMENT • link 2 days ago by LChart 5.1k

The massive difference has been reproducible across the 3 biological replicates and is in line with what we would expect in terms of the biology in the cells.

ADD REPLY • link 1 day ago by Rozita ▴ 40