I am new to Bioinformatics, with minimal experience in Terminal or scripting languages. There is a large chance that I will not be able to convey what I want to accomplish, please ask questions so that I may clarify.
Statement of Problem: I have run ChIP-Seq experiments on several enhancers and protein markers (I will refer to all of these as marks). I have already gone ahead and aligned the data using Bowtie, and then called peaks using MACS2. As a follow-up to a previous paper published by another lab member I want to find the co-occupancy of three of these marks in relation to the transcription start site (TSS) to determine how many of these peaks overlap and then create a venn diagram to show these overlaps / intersections (I am not sure on the terminology). The problem is I am not exactly sure how to do this and after almost 8+ hours of attempts I have gotten absolutely no-where, though I will post my attempts.
For example, from the Pol II peaks I have identified at TSS or intergenic regions, now I would like to define overlays with other factors/marks. MY PEAK FILES ARE IN .BED FORMAT.
I want to look at overlay between Pol II peaks and H3K4me1/H3K27Ac at intergenic regions.
How do I accomplish my goals?
Previous Study Information: As mentioned earlier an ex-lab member wrote a paper and was able to achieve this. He left behind a python script which I will copy down in a bit, but I can't seem to get it to work, nor do I really want to. I don't want to blindly trust the output of the script without knowing how and what it is doing. The image below is from the ex-lab member's paper and is basically what I would like to accomplish:
I was able to run the Python script the ex-lab member used to generate these venn diagrams for Polymerase II only, and from what I was able to understand the script basically outputs a "total # of intersections" line that he then plugged into somewhere in order to generate these diagrams.
I have uploaded the script to github so that you may view it here:https://github.com/Vov-Bio/findcooccupancy.py/blob/master/Script
What I Have Done: I want to provide a bit of what I have done so far, not only to show that I have attempted to solve the problem, but so that someone can steer me in the right direction.
First thing I did was research and see if I could find any information on how to overlap my data. The only solid topic I could find was here: How to determine genome wide Co-occupancy between two transcription factors?
In this post someone someone says "you can make venn diagram to show common and unique peaks of two transcription factors and could also make a line plot around TSS showing average occupancy of both" using https://usegalaxy.org/
This defines what I want to do almost perfectly.
Other searches brought up two interesting options:
I attempted to use BEDtools multiintersectBED function but could not find the peak intersection counts that I wanted.
I then attempted to use Homer mergePeaks function to find peak intersection counts and got 'something', but I am almost positive it is incorrect.
Essentially I ran the following function:
mergePeaks -d 250 -matrix -ghist ~/data/TSSforAllAnnotatedGenes.txt ~/data/Pol-II-Chip.MACS2_summits.bed
/User/data/TSSforAllAnnotatedGenes.txt 0 13062
/User/data/Pol-II-Chip.MACS2_summits.bed 11608 0
I assumed that 11,608 would be my Pol-II number for my venn diagram, but after attempting to run the same command for other peak files I am not so sure.
Any help is appreciated.