Question

Calling large domains of signal enrichment (broad peaks) from log2 input-normalized ChIP-seq data

7 months ago

rls_08 ▴ 40

As in the title, I am looking for approaches for calling large heterochromatin domains (like lamina-associated domains, average size ~200Mb). I would like to start from bigwig files that show log2(IP library CPM / input library CPM). The idea is to determine the broad regions of enrichment (signal > 0), as in the image below. I have searched several papers but have not found a clear explanation of how the domains are called.

I have tried MACS and SICER/EPIC, but they don't use the normalized bigwig data for the analysis.

ChIP of sample peripheral heterochromatin profile with calls

chip-seq peak-caller normalization broad-domains • 735 views

ADD COMMENT • link updated 7 months ago by rfran010 ▴ 900 • written 7 months ago by rls_08 ▴ 40

score 0 · Answer 1 · 2023-09-07

7 months ago

ATpoint 82k

I have tried MACS and SICER/EPIC, but they don't use the normalized bigwig data for the analysis.

That should tell you how non-standard it is. Use the BAM files for peak callers. This has been asked quite a few times before, and iirc nobody has come up with a reliable tool for bigwig peak calling.

ADD COMMENT • link 7 months ago by ATpoint 82k

But when the signal depends on enrichment relative to input, these tools don't seem very reliable. In the case of heterochromatin, I have seen that calling from BAMs misses a lot of "obvious" domains that are only evident after log2(IP/input) transformation of the depth-normalized libraries. From the BAM pileup alone, for instance, it is difficult to determine where the local areas of enrichment are. Do you know if any of these tools take read depth and the local environment into account when calling broad domains?
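
For reference, a minimal sketch of the log2(IP/input) transformation described above, assuming reads have already been counted per fixed-size bin. The function name, the toy counts, and the pseudocount (added in CPM units to avoid taking the log of zero) are illustrative assumptions, not part of any particular tool.

```python
import math

def log2_cpm_ratio(ip_counts, input_counts, ip_total, input_total, pseudocount=1.0):
    """Per-bin log2(IP CPM / input CPM).

    ip_total and input_total are the total read counts of each library.
    pseudocount is a hypothetical choice, added to both CPM values.
    """
    ratios = []
    for ip, inp in zip(ip_counts, input_counts):
        ip_cpm = ip / ip_total * 1e6          # depth-normalize the IP bin
        input_cpm = inp / input_total * 1e6   # depth-normalize the input bin
        ratios.append(math.log2((ip_cpm + pseudocount) / (input_cpm + pseudocount)))
    return ratios

# Toy example: three bins, IP library sequenced twice as deep as input.
ip_bins = [20, 60, 120]
input_bins = [50, 30, 20]
ratios = log2_cpm_ratio(ip_bins, input_bins, ip_total=200, input_total=100)
```

Here the raw IP counts exceed the input counts in two of three bins, yet only the last bin has a positive log2 ratio once both libraries are scaled to CPM, which is the effect that is hard to see in a raw pileup.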

ADD REPLY • link 7 months ago by rls_08 ▴ 40

Just filter the bigwig for regions with scores that meet your threshold, then use bedtools merge to merge regions that are adjacent or within a certain distance of each other to get your contiguous domains. I don't think this type of manual threshold-based calling is non-standard, especially for broad marks like H3K9me3. Normally I've seen the genome split into bins (e.g. 10 kb) first, with reads then counted over those bins for IP and input, but that doesn't seem too far off from starting with your bigwig.
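
The threshold-and-merge idea can be sketched in plain Python as below. This is only an illustration under stated assumptions: the bigwig is assumed to have been exported to sorted (chrom, start, end, score) intervals first (for example with UCSC's bigWigToBedGraph), the function name `call_domains` is hypothetical, `max_gap` plays the role of the `-d` distance in bedtools merge, and `min_length` is an extra size filter that was not part of the suggestion above.

```python
def call_domains(intervals, threshold=0.0, max_gap=0, min_length=0):
    """Call contiguous domains from scored intervals.

    intervals: iterable of (chrom, start, end, score), sorted by chrom, then start.
    Intervals with score > threshold are kept and merged when the gap to the
    previous kept interval on the same chromosome is <= max_gap.
    """
    domains = []
    for chrom, start, end, score in intervals:
        if score <= threshold:
            continue  # below threshold: not part of any domain
        last = domains[-1] if domains else None
        if last and last[0] == chrom and start - last[2] <= max_gap:
            last[2] = max(last[2], end)  # extend the current domain
        else:
            domains.append([chrom, start, end])  # start a new domain
    # Drop domains shorter than min_length (hypothetical extra filter)
    return [tuple(d) for d in domains if d[2] - d[1] >= min_length]

# Toy 10 kb bins with log2(IP/input) scores (made-up values)
bins = [
    ("chr1", 0, 10000, 0.8),
    ("chr1", 10000, 20000, 0.5),
    ("chr1", 20000, 30000, -0.2),
    ("chr1", 30000, 40000, 0.6),
    ("chr1", 40000, 50000, -1.0),
    ("chr1", 50000, 60000, -0.7),
    ("chr1", 60000, 70000, 0.3),
    ("chr2", 0, 10000, 0.9),
]
strict = call_domains(bins)
bridged = call_domains(bins, threshold=0.0, max_gap=10000, min_length=20000)
```

With no gap tolerance every run of positive bins becomes its own domain; allowing a 10 kb gap bridges the single negative bin on chr1, and the length filter then removes the isolated one-bin calls. The threshold, gap, and minimum length would all need tuning against the actual profile.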

ADD REPLY • link 7 months ago by rfran010 ▴ 900