intersect question of H3k4me3, H3k27Ac, H3k27me3, calculating the consensus
1
0
Entering edit mode
2 days ago
theodore ▴ 130

So I would like to calculate the consensus of differential binding sites from 3 different histone marks. I am getting a few (not so many) false overlapping H3k4me3 and the rest overlapping areas that are not as expected. The issue is probably, due to the fact that the algorythm expands the peaks. So, I would like to know if there is a way to: A. first merge H3k27ac and H3k4me3 peaks, that will also post peaks from both sides that do not overlap/intersect, I assume this can be done with bedtools intersect (BUT, I can not make it post both -a or -b even with -wao B. and then intersect the produced bed file and output these that overlap 100% as common, but not what is not overlapping as 100%. These peaks should be separated like in the images. Thank you all image1 image2

histonemarks peakcalling intersect • 277 views
ADD COMMENT
1
Entering edit mode
1 day ago

To compute the consensus overlapping regions across your three histone mark peak files (H3K4me3, H3K27ac, H3K27me3) while handling partial overlaps, expansions, and separating regions by coverage (e.g., unique, pairwise, all-three like in your images), skip manual pairwise merging/intersection. Use bedtools multiinter directly on all three sorted BED files—it automatically splits into disjoint minimal regions and annotates coverage from each input.

Step 1: Run multiinter

Assume your sorted peak files are h3k4me3_peaks.bed, h3k27ac_peaks.bed, and h3k27me3_peaks.bed (chrom/start/end format; sort them first with sort -k1,1 -k2,2n if needed).

bedtools multiinter -i h3k4me3_peaks.bed h3k27ac_peaks.bed h3k27me3_peaks.bed > consensus_regions.bed
  • Output columns: chrom, start, end, h3k4me3_coverage (0/1), h3k27ac_coverage (0/1), h3k27me3_coverage (0/1).
  • Regions are non-overlapping; coverage=1 means fully covered by that mark's peak(s).
  • This captures your "false overlaps" by only counting true intersections (ignores expansions unless they truly overlap).

Step 2: Separate regions by consensus (e.g., via awk)

Filter for desired categories (adjust column numbers if your BED has extra columns):

  • All three marks (100% consensus overlap):

    awk '($4==1 && $5==1 && $6==1)' consensus_regions.bed > all_three.bed
    
  • H3K4me3 + H3K27ac only (no H3K27me3):

    awk '($4==1 && $5==1 && $6==0)' consensus_regions.bed > me3_ac_only.bed
    
  • H3K4me3 + H3K27me3 only:

    awk '($4==1 && $5==0 && $6==1)' consensus_regions.bed > me3_me3m_only.bed
    
  • H3K27ac + H3K27me3 only:

    awk '($4==0 && $5==1 && $6==1)' consensus_regions.bed > ac_me3m_only.bed
    
  • H3K4me3 only:

    awk '($4==1 && $5==0 && $6==0)' consensus_regions.bed > me3_only.bed
    
  • H3K27ac only:

    awk '($4==0 && $5==1 && $6==0)' consensus_regions.bed > ac_only.bed
    
  • H3K27me3 only:

    awk '($4==0 && $5==0 && $6==1)' consensus_regions.bed > me3m_only.bed
    
  • No coverage (ignore):

    awk '($4==0 && $5==0 && $6==0)' consensus_regions.bed > none.bed  # Usually empty
    

Notes

  • No need for -f 1.0 or -wa/-wb/-wao hacks—the coverage columns handle "100% overlap" per region.
  • If peaks are expanded too much, consider shrinking them first: bedtools slop -i peaks.bed -g genome.sizes -b -50 (adjust -50 to contract by 50bp).
  • For your pairwise merge issue in A: multiinter on just two files gives the union/split directly (e.g., -i me3.bed ac.bed), but doing all three avoids extra steps.
  • Visualize in IGV/UCSC to match your images.

This should give clean, separated regions without false positives. If you share sample BED snippets, I can refine.

ADD COMMENT
2
Entering edit mode

Thank you Kevin, Is bedtools multiinter the successor of join? I remember that there was a bedttols tools that I can not find any more that I was also using it for venn or upset plots. The files are here:

I performed your recommendation and not getting exactly what I was looking for: blueIsTheConsensus

ADD REPLY
0
Entering edit mode

I ended up shrinking with removing something like 500 bps (with bedtools slop) of the H3k27me3 peaks and manually remove some that were not correctly called. It looks a bit better like that , and the intersected peaks make more sense based on the RNAseq and also the peak quality of the H3K27me3 landscape Vs the H3K27ac and H3k4me3.

ADD REPLY

Login before adding your answer.

Traffic: 4130 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6