Question: Identification of regions commonly bound by TFs, histone modifications, or other features
Wet&DryImmunology (Japan) wrote, 2.3 years ago:

Hi, I guess this is a somewhat naive question. Some papers show Venn diagrams of the overlap between peaks of TFs and histone modifications, and I was wondering how they compute that overlap. I checked the methods sections but could not find the information; maybe it is so rudimentary that people do not bother to write it down.

I now have ChIP-seq data for a DNA-binding protein and for H3K27ac, and want to see to what extent their peaks overlap. To that end, I called peaks with "macs2" using the "--broad" option (which writes "*_peaks.broadPeak" files) and tried to get the overlapping regions with:

bedtools intersect -a proteinA_cellX_peaks.broadPeak -b H3K27ac_cellX_peaks.broadPeak

This printed a long list of lines like:

chr7    127471196  127472363  Pos1  0  +  127471196  127472363  255,0,0
...

My questions are: Is this the right way to get the regions commonly bound by two factors? And instead of printing the result to the screen, can I write it to an output BED file? (I searched the bedtools manual, but to no avail.) Thanks in advance.

Tags: chip-seq, sequence, bedtools • 697 views
modified 2.3 years ago by EagleEye • written 2.3 years ago by Wet&DryImmunology
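By default (without "-wa", "-wb", "-u" or the "-wo"/"-wao" options), "bedtools intersect" writes one line per overlapping A/B pair, containing the A record with its start and end trimmed to the shared interval. That is why the output in the question is neither a list of whole A peaks nor a peak count. The brute-force awk sketch below reproduces that behaviour on hypothetical toy BED4 files (toyA.bed and toyB.bed are made-up names), using the half-open BED convention that [s1, e1) and [s2, e2) overlap when s1 < e2 and s2 < e1. It is only an illustration of the logic, not a replacement for bedtools.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy BED4 peak files (not real data).
printf 'chr7\t1000\t2000\tA_peak1\nchr7\t5000\t6000\tA_peak2\n' > toyA.bed
printf 'chr7\t1500\t3000\tK27_peak1\n' > toyB.bed

# Load B into memory, then for every A peak print one line per overlapping
# B peak, with the A coordinates trimmed to the shared half-open interval.
awk -F'\t' -v OFS='\t' '
  NR == FNR { chr[++n] = $1; st[n] = $2 + 0; en[n] = $3 + 0; next }
  {
    a_start = $2 + 0; a_end = $3 + 0
    for (i = 1; i <= n; i++) {
      if (chr[i] == $1 && a_start < en[i] && st[i] < a_end) {
        s = (a_start > st[i]) ? a_start : st[i]   # max of the two starts
        e = (a_end < en[i]) ? a_end : en[i]       # min of the two ends
        print $1, s, e, $4
      }
    }
  }' toyB.bed toyA.bed > toy_intersect.bed

cat toy_intersect.bed   # prints chr7, 1500, 2000, A_peak1 (tab-separated)
```

A_peak2 does not appear at all because it overlaps nothing, and A_peak1 is reported with the trimmed coordinates 1500-2000 rather than its original 1000-2000.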
EagleEye (Sweden) wrote, 2.3 years ago:
bedtools intersect -wao -a proteinA_cellX_peaks.broadPeak -b H3K27ac_cellX_peaks.broadPeak > common_regions.txt

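The file 'common_regions.txt' will contain the overlap information: each A peak next to each B peak it overlaps, with the number of overlapping base pairs in the last column. Note that '-wao' also keeps A peaks without any overlap (paired with an empty B record and an overlap of 0), so not every line is a common region.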

Note: To avoid confusion, reduce your '.broadPeak' inputs to the minimal columns [chr, start, end, peak_name], with no extra columns.

modified 2.3 years ago • written 2.3 years ago by EagleEye
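The column-trimming suggestion in the note can be sketched with "cut", assuming tab-separated MACS2 output in the broadPeak layout (BED6+3: chrom, start, end, name, score, strand, signalValue, pValue, qValue). The toy file below is hypothetical; on real data the equivalent would be something like "cut -f1-4 proteinA_cellX_peaks.broadPeak > proteinA_cellX_peaks.bed", done for both peak files before running the intersect.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy file in broadPeak layout (not real MACS2 output).
printf 'chr7\t1000\t2000\tpeak_1\t55\t.\t3.1\t6.2\t4.0\n' > toy_peaks.broadPeak
printf 'chr7\t5000\t6000\tpeak_2\t40\t.\t2.5\t4.9\t3.1\n' >> toy_peaks.broadPeak

# Keep only chrom, start, end and name (BED4); cut splits on tabs by default.
cut -f1-4 toy_peaks.broadPeak > toy_peaks.bed
cat toy_peaks.bed
```

With BED4 on both sides, each "-wao" output line has exactly nine columns (four for A, four for B, one overlap length), which makes the result much easier to read and to post-process.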

@EagleEye Thanks, it works! P.S. I have another small issue. After I ran:

bedtools intersect -wao -a proteinA_cellX_peaks.broadPeak -b H3K27ac_cellX_peaks.broadPeak > common_regions.txt

I tried to confirm the result with:

    $ wc -l proteinA_cellX_peaks.broadPeak
    19559 proteinA_cellX_peaks.broadPeak
    $ wc -l common_regions.txt
    19604 common_regions.txt

The totals do not add up. Why would this happen, and how can I resolve the discrepancy? Sorry to bother you again...

modified 2.3 years ago • written 2.3 years ago by Wet&DryImmunology

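A single peak in file 1 can match multiple peaks in file 2. For example, if peak1F1 from file 1 overlaps both peak6F2 and peak10F2 in file 2, the results will contain two entries for peak1F1. Since '-wao' also writes one line for every file-1 peak without any overlap, the output always has at least as many lines as file 1.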

written 2.3 years ago by EagleEye
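To see that explanation in numbers, the sketch below works on a few hand-written, hypothetical rows in the layout "-wao" uses for BED4 inputs (A in columns 1-4, B in columns 5-8, overlap in bp in the last column). Writing the empty B record as ". -1 -1 ." is an assumption about the exact placeholder; the counting only relies on the last column being 0 for non-overlapping A peaks. It counts total lines, distinct A peaks, and A peaks with at least one overlap. On real data, "bedtools intersect -u -a ... -b ..." reports each overlapping A peak exactly once and "-v" reports the A peaks with no overlap, so the line counts of those two outputs should add up to the number of A peaks.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical hand-written rows mimicking `bedtools intersect -wao` output for
# BED4 inputs: A in columns 1-4, B in columns 5-8, overlap (bp) in the last column.
# peak1F1 overlaps two B peaks; peak2F1 overlaps none (assumed null B record, overlap 0).
{
  printf 'chr1\t100\t500\tpeak1F1\tchr1\t150\t200\tpeak6F2\t50\n'
  printf 'chr1\t100\t500\tpeak1F1\tchr1\t450\t900\tpeak10F2\t50\n'
  printf 'chr1\t2000\t2100\tpeak2F1\t.\t-1\t-1\t.\t0\n'
} > toy_wao.txt

total_lines=$(wc -l < toy_wao.txt)
# Distinct A peaks: collapse repeated A records (columns 1-4).
a_peaks=$(cut -f1-4 toy_wao.txt | sort -u | wc -l)
# A peaks with at least one real overlap (last column > 0).
a_shared=$(awk -F'\t' '$NF > 0' toy_wao.txt | cut -f1-4 | sort -u | wc -l)

echo "lines: $total_lines, distinct A peaks: $a_peaks, A peaks overlapping B: $a_shared"
echo "extra lines from multiple matches: $((total_lines - a_peaks))"
```

Applied to the real common_regions.txt, the distinct-A count should match the 19559 peaks of the input file, and the difference to 19604 is the number of extra lines produced by A peaks that overlap more than one H3K27ac peak.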

@EagleEye Thanks for the explanation. I think I fell into a logical trap. Normally, people present this kind of data as three numbers, as in a Venn diagram:
- peaks of A only: group a
- peaks of B only: group b
- peaks shared by A and B: group c

I think groups a and b are easy to define: they are the peaks of A and of B that do not intersect any interval in the other set. But what about group c? Imagine an extreme example: A has only 2 peaks, and peak A1 intersects peaks B1 to Bn. How large is group c then, 1 or n?

modified 2.3 years ago • written 2.3 years ago by Wet&DryImmunology
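A minimal sketch of why group c has no single size: with hypothetical toy peaks reproducing the extreme case (A1 overlaps B1, B2 and B3; A2 overlaps nothing), counting A peaks that touch B gives 1, while counting B peaks that touch A gives 3. The helper "count_overlapping" and the file names peaksA.bed/peaksB.bed are made up for illustration, and the awk loop is brute force for toy sizes only. On real data the two counts would typically come from "bedtools intersect -u -a peaksA.bed -b peaksB.bed | wc -l" and the same command with "-a" and "-b" swapped.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy peaks: A1 overlaps B1..B3, A2 overlaps nothing.
printf 'chr1\t100\t1000\tA1\nchr1\t5000\t5100\tA2\n' > peaksA.bed
printf 'chr1\t150\t200\tB1\nchr1\t300\t400\tB2\nchr1\t900\t950\tB3\n' > peaksB.bed

# Count features in file $1 that overlap at least one feature in file $2
# (half-open BED intervals: [s1,e1) and [s2,e2) overlap iff s1 < e2 && s2 < e1).
count_overlapping() {
  awk -F'\t' '
    NR == FNR { chr[++n] = $1; st[n] = $2 + 0; en[n] = $3 + 0; next }
    {
      for (i = 1; i <= n; i++)
        if (chr[i] == $1 && $2 + 0 < en[i] && st[i] < $3 + 0) { hits++; break }
    }
    END { print hits + 0 }
  ' "$2" "$1"
}

a_side=$(count_overlapping peaksA.bed peaksB.bed)   # A peaks touching any B peak
b_side=$(count_overlapping peaksB.bed peaksA.bed)   # B peaks touching any A peak
echo "A peaks overlapping B: $a_side"
echo "B peaks overlapping A: $b_side"
```

Both numbers are legitimate answers to "how many peaks are shared", which is why a Venn diagram of peak overlaps is only interpretable when it is clear which peak set the shared count refers to.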
Powered by Biostar version 2.3.0