Question

Cut&Run TF Analysis - Very Low Peak Counts Despite Good Library Quality

2

Entering edit mode

27 days ago

sk ▴ 20

I’m analyzing Cut&Run data for a transcription factor with the following setup:

Multiple time points with replicates

IgG control processed identically as a BAM file as control for macs2

Libraries have excellent quality (0.7% duplication rate before peak calling)

Using MACS2 callpeak with BAMPE format and hg38 reference and also using a IgG control which i ran through the same steps until peak calling and then used it as a BAM file control against the treatment

Current filtering: Using samtools filter with less than 120bp size cutoff for TF analysis as per https://doi.org/10.1186/s13059-019-1802-4

Also filtering for “filters”: [ { “id”: “1”, “isProperPair”: “true”, “mapQuality”: “>=30”, “reference”: “!chrM” } ]

Main Issues:

Very low peak counts: Only 11-16 peaks per sample (expecting more)

No motifs found: MEME-ChIP returns no significant motifs

Also seeing low alignment percentage of about 50 - 70 across samples after bowtie2

High duplicate reads in both sample and control - 40 to 50 per cent before and after alignment. I performed dedup but many CUTRUN analysis methods mention to retain duplicates.

Questions:

Should I remove the <120bp size filter for Cut&Run peak calling?

Should i not remove duplicates? In some cut and run analysis I have seen that they recommend not to remove. https://doi.org/10.1186/s13059-019-1802-4

Are there better motif discovery tools than MEME-ChIP for novel TF targets with few peaks?

Could MACS2 parameters be too stringent?

macs2 cutandrun chipseq • 1.1k views

ADD COMMENT • link updated 18 days ago by jared.andrews07 ★ 19k • written 27 days ago by sk ▴ 20

GenoMax · Answer 1 · 2025-05-28

2

Entering edit mode

26 days ago

ATpoint 88k

Show an IGV screenshot. Probably your IP simply did not work and data are not usable.

Libraries have excellent quality (0.7% duplication rate before peak calling)

I assume you mean duplicates marked in the BAM file? If so, then this is more a sign of a bad library because enriched libraries like CUT&RUN typically have not that low of a duplication percentage. That low is expected when sequencing genomic DNA at the typical C&R depth, indicating that you have indeed a failed IP and your library is mainly gDNA background.

High duplicate reads in both sample and control - 40 to 50 per cent before and after alignment. I performed dedup but many CUTRUN analysis methods mention to retain duplicates.

How is this in line with the quote above?

Should i not remove duplicates? In some cut and run analysis I have seen that they recommend not to remove. https://doi.org/10.1186/s13059-019-1802-4

For peak calling I would always remove them, despite if the difference between number of peaks with and without duplicates is "day and night" it indicates that something is very wrong.

Are there better motif discovery tools than MEME-ChIP for novel TF targets with few peaks?

The problem is almost certainly data quality and not software, and this cannot be rescued in silico.

ADD COMMENT • link 26 days ago by ATpoint 88k

0

Entering edit mode

Thank you for the insightful feedback. You're absolutely right, and I'd like to clarify the duplication rates:

Cut&Run data for a transcription factor with the following setup:

Multiple time points with replicates

IgG control processed identically as a BAM file as control for macs2

Raw sample files each had approximately 45 million reads and decent quality except high duplication rate in the samples (about 30 to 40 percent in most files). I have read that this is fairly common for CUT&RUN data but usually goes up to 10 to 15 percent.

Further cleanup was performed with Trimmomatic with a loss of about 10 percent total reads.

Alignment with bowtie2 to hg38 reference gave stats: (all bam are around these thresholds).

The dovetail setting was allowed.

About 50 to 60 percent of all (trimmed) reads were mapped.

I also filtered for: "isProperPair" "true", "mapQuality" ">=30", "reference" "not chrM" "Insert size less than 120".

The last filter was based on input from Henikoff et al., 2017 which mentions that less than 120 bp fragments represent cleavages around TF binding sites.

Post deduplication, there were about 10 to 15 million reads (with 0.7 percent duplication). Peak calling with macs2 (BAMPE, with IgG control BAM that I ran through the same steps) gave:

5 peaks for most samples. The third time point has 9 to 15 peaks per replicate. Cross-correlation graphs are weak.

I'm not too sure what to make of this data, especially since it looks like data quality is fine. Any insight or advice would be greatly appreciated.

ADD REPLY • link 26 days ago by sk ▴ 20

0

Entering edit mode

As I said, check data on the IGV (for example use deeptools bamCoverage on the bm files with a bin size of 1 to make bigwigs), see whether peaks are present or whether it's mostly noise. That alone is the QC that is relevant. Seems like data is not usable from what I read.

ADD REPLY • link 25 days ago by ATpoint 88k

0

Entering edit mode

I re-ran the CUT&RUN data analysis pipeline with adjusted filtering criteria and used a finer bin size of 5 for BAM coverage generation. The IgG control was processed through the same pipeline. I have here two files:

overall - https://drive.google.com/file/d/1HByi7IkcegxzLly1o2YJ5f5L87wrxTjR/view?pli=1

chr7 at a peak - https://drive.google.com/file/d/15UkyvDZkl6RKOI40ft6kZuv40J8rnJYg/view

I just used this specific location to display the peak calls vs the coverage as an example.

This is my first time analyzing such data, so I am unsure of how to interpret the peaks I obtained. I obtained about 50-200 peaks across the replicates. I am also trying to use bed tools intersect to see overlapping peaks.

Additional info:

Trimming removed about 10% of reads Alignment statistics were similar to the previous run - 60 per cent overall mapping Below is data of the correlation matrix after deduplication for one of the samples. Other samples also showed values of 0.7-0.8

#plotCorrelation --outFileCorMatrix
'R1_001_1'R1_001_2' 'R1_001_3'
'R1_001_1'  1.0000  0.9570  0.9332
'R1_001_2'  0.9570  1.0000  0.9721
'R1_001_3'  0.9332  0.9721  1.0000

ADD REPLY • link updated 21 days ago by GenoMax 152k • written 21 days ago by sk ▴ 20

0

Entering edit mode

As ATpoint has repeatedly said, actually look at the data. What does it look like in IGV? If you cannot see numerous obvious peaks by eye, then it is likely the assay failed or is of poor quality. Low numbers of peaks called is indicative of a failed assay.

People are also unlikely to download arbitrary files from dropbox or whatever, embed the images in your post.

ADD REPLY • link 21 days ago by jared.andrews07 ★ 19k

0

Entering edit mode

overall view

view at a peak location called by macs2

I have inserted those images below for reference. I just need some help with understanding if this data makes sense or if it is noise. I do see peaks at locations where MACS2 has identified them, but yes, the number of peaks for each replicate areabout 50-200. I do not have previous literature for this dataset to see where I might be able to observe peaks.

ADD REPLY • link 21 days ago by sk ▴ 20

0

Entering edit mode

Thanks.

Can you show an intermediate resolution, e.g. a few 100 kb of elements you'd expect this TF to bind (promoters, enhancers, whatevre)?

This looks highly artifactual to me, but the scale being rather blown out by the one huge peak makes it tough to tell.

ADD REPLY • link 21 days ago by jared.andrews07 ★ 19k

0

Entering edit mode

enter image description here

This is around 16kb near the exon 1 of a gene of interest and all the replicates. However, MACS2 did not call a peak here.

I also have another image below, at a region where it did call a peak

enter image description here

![enter image description here][3]

enter image description here

ADD REPLY • link 21 days ago by sk ▴ 20

1

Entering edit mode

There is mostly noise and no clear signal. It suggests that, as said before, IP was bad.

ADD REPLY • link 20 days ago by ATpoint 88k

0

Entering edit mode

Is there any other way to confirm this (quantitatively)? I am trying to conclude this analyses/see if I can find useful information.

ADD REPLY • link 18 days ago by sk ▴ 20

0

Entering edit mode

The low number of peaks called is your confirmation. You've visually confirmed there's no real issue with peak calling in your data - there are just no peaks to call. I don't know what else one would expect as confirmation.

I'd ensure your antibody actually works via orthogonal methods. And/or run a side by side with an antibody that your for sure know works for CUTandRUN the next time you try the assay.

ADD REPLY • link 18 days ago by jared.andrews07 ★ 19k