Identification of commonly bound regions by TFs/histone modification/other features
7.2 years ago

Hi, I guess this is a somewhat naive question. Some papers show Venn diagrams of the overlap between peaks bound by TFs or marked by histone modifications. I was wondering how they compute the overlap. I checked the methods sections but could not find the information; maybe it is so rudimentary that people do not even bother to write it down.

Now I have ChIP-seq data for a DNA-binding protein and for H3K27ac, and I want to see to what extent they overlap. To that end, I called peaks with "macs2" using the "--broad" option, and tried to get the overlapping regions with:

bedtools intersect -a proteinA_cellX_peaks.broadPeak -b H3K27ac_cellX_peaks.broadPeak

Then I could get a long list of something like:

chr7    127471196  127472363  Pos1  0  +  127471196  127472363  255,0,0
...

My questions are: Is this the right way to get regions commonly bound by two factors? And instead of printing the result to the screen, could I write it to an output BED file? (I searched the bedtools manual, but to no avail.) Thanks in advance.

ChIP-Seq Bedtools sequence • 1.5k views
7.2 years ago
EagleEye 7.5k
bedtools intersect -wao -a proteinA_cellX_peaks.broadPeak -b H3K27ac_cellX_peaks.broadPeak > common_regions.txt

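The file 'common_regions.txt' will contain the overlap information. With -wao, every peak from -a is written together with each -b peak it overlaps, and the last column gives the number of overlapping base pairs; -a peaks with no overlap are still listed, with an overlap of 0.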

Note: To avoid confusion, reduce your input '.broadPeak' files to the minimal columns [chr, start, end, peak_name], with no extra columns.
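A minimal sketch of that trimming step, assuming the peak file is tab-separated with chr, start, end and peak name in its first four columns. The file names and values below are made up for illustration, so the commands can be run as they are.

```shell
# Hypothetical two-line broadPeak-like file (9 tab-separated columns), for illustration only.
printf 'chr1\t100\t200\tpeakA1\t50\t.\t3.1\t5.2\t4.0\n' >  example_peaks.broadPeak
printf 'chr1\t500\t650\tpeakA2\t80\t.\t4.7\t9.9\t8.1\n' >> example_peaks.broadPeak

# Keep only chr, start, end, peak_name.
cut -f1-4 example_peaks.broadPeak > example_peaks.bed4

# Check how many columns the trimmed file has.
ncols=$(awk -F'\t' 'NR == 1 {print NF}' example_peaks.bed4)
echo "columns after trimming: $ncols"
```

With both inputs trimmed this way, each line of the -wao output has 4 columns from -a, 4 columns from -b, and the overlap length as the last column.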

@EagleEye Thanks, it works! PS: I ran into another small issue. After I ran:

bedtools intersect -wao -a proteinA_cellX_peaks.broadPeak -b H3K27ac_cellX_peaks.broadPeak > common_regions.txt

I tried to confirm the result with:

wc -l proteinA_cellX_peaks.broadPeak
19559 proteinA_cellX_peaks.broadPeak
wc -l common_regions.txt
19604 common_regions.txt

The total number of intervals does not match. Why does this happen, and how can I resolve the discrepancy? Sorry to bother you again...

A single peak in file 1 can overlap multiple peaks in file 2. For example, if peak1F1 from file 1 overlaps both peak6F2 and peak10F2 in file 2, the results will contain two entries for peak1F1.
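To see this effect on a small scale, here is a sketch that counts rows and distinct file 1 peaks in a -wao result. The four lines written to example_wao.txt are made-up data in the layout that -wao produces for 4-column inputs; the file name and peak names are placeholders.

```shell
# Made-up `bedtools intersect -wao` output for 4-column inputs:
# columns 1-4 = file 1 peak, columns 5-8 = file 2 peak, last column = overlap in bp.
printf 'chr1\t100\t200\tpeakA1\tchr1\t150\t180\tpeakB1\t30\n' >  example_wao.txt
printf 'chr1\t100\t200\tpeakA1\tchr1\t190\t300\tpeakB2\t10\n' >> example_wao.txt
printf 'chr1\t500\t650\tpeakA2\tchr1\t600\t700\tpeakB3\t50\n' >> example_wao.txt
printf 'chr2\t10\t50\tpeakA3\t.\t-1\t-1\t.\t0\n'              >> example_wao.txt

# Total rows: one per (file 1 peak, file 2 peak) pair, plus one per unmatched file 1 peak.
total_rows=$(wc -l < example_wao.txt | tr -d ' ')

# Distinct file 1 peaks, identified by chr/start/end.
a_peaks=$(cut -f1-3 example_wao.txt | sort -u | wc -l | tr -d ' ')

# Distinct file 1 peaks with at least one overlap (last column > 0).
a_with_overlap=$(awk -F'\t' '$NF > 0' example_wao.txt | cut -f1-3 | sort -u | wc -l | tr -d ' ')

echo "rows=$total_rows file1_peaks=$a_peaks file1_peaks_with_overlap=$a_with_overlap"
```

Here peakA1 appears twice, so there are more rows than file 1 peaks. If only a per-peak yes/no is needed, `bedtools intersect -u` writes each -a entry once when it has any overlap, and `-v` writes the -a entries that have none.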

@EagleEye Thanks for the explanation. I sort of fell into a logical trap. Normally, people present data like this (image omitted):

PeakA only (a number) -- group a
PeakB only (a number) -- group b
PeakA & PeakB (a number) -- group c

I think groups a and b are easy to decide: they are the PeakA and PeakB intervals that do not intersect any interval in the other set. But what about group c? Imagine an extreme example: PeakA has only 2 peaks, and PeakA1 intersects PeakB(1~n). How large is group c then, 1 or n?
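One way to look at the extreme example, offered as a sketch and not as the convention any particular paper used: group c has a different size depending on whose peaks are counted. The lines written to extreme_wao.txt are made-up -wao output for 4-column inputs, with n = 3; all names are placeholders.

```shell
# Made-up -wao output: peakA1 overlaps three PeakB intervals, peakA2 overlaps none.
printf 'chr1\t100\t900\tpeakA1\tchr1\t150\t180\tpeakB1\t30\n'  >  extreme_wao.txt
printf 'chr1\t100\t900\tpeakA1\tchr1\t300\t400\tpeakB2\t100\n' >> extreme_wao.txt
printf 'chr1\t100\t900\tpeakA1\tchr1\t850\t950\tpeakB3\t50\n'  >> extreme_wao.txt
printf 'chr2\t10\t50\tpeakA2\t.\t-1\t-1\t.\t0\n'               >> extreme_wao.txt

# Group a: PeakA intervals with no overlap (last column is 0).
a_only=$(awk -F'\t' '$NF == 0' extreme_wao.txt | cut -f1-3 | sort -u | wc -l | tr -d ' ')

# Group c counted in PeakA intervals (columns 1-3).
a_shared=$(awk -F'\t' '$NF > 0' extreme_wao.txt | cut -f1-3 | sort -u | wc -l | tr -d ' ')

# Group c counted in PeakB intervals (columns 5-7, assuming 4-column inputs).
b_shared=$(awk -F'\t' '$NF > 0' extreme_wao.txt | cut -f5-7 | sort -u | wc -l | tr -d ' ')

echo "A only: $a_only; A overlapping B: $a_shared; B overlapping A: $b_shared"
```

So the answer is 1 when counting PeakA intervals and n when counting PeakB intervals. Both are valid; whichever is used for the overlap in the diagram should be stated. Group b cannot be read from this file, since PeakB intervals that overlap nothing in PeakA are not listed when PeakA is given as -a.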
