Hi everyone!
I am using MACS 1.4 to call peaks on a GATA1 transcription factor ChIP-Seq experiment, purely as a didactic exercise. The output includes two files: gata1_peaks.xls and gata1_negative_peaks.xls. I know that negative peaks are called by swapping the treatment and the control, and that they are to be treated as false positives, but I still cannot figure out the rationale behind this. The regions in gata1_negative_peaks.xls are not present in gata1_peaks.xls, so how are they supposed to be handled? What information do negative peaks actually provide?
Is it the case that gata1_peaks.xls contains only those peaks that were not also called as negative peaks?
The question arose from the need to estimate an average overall FDR. I thought I could compute an empirical proxy as the ratio of the false positive peaks found to the total number of peaks called. Then I wondered whether gata1_peaks.xls contains all the peaks called or only the 'true positives'. In the former case I could estimate the overall FDR as (number of regions in gata1_negative_peaks.xls) / (number of regions in gata1_peaks.xls), while in the latter case I should compute (number of regions in gata1_negative_peaks.xls) / (number of regions in gata1_peaks.xls - number of regions in gata1_negative_peaks.xls).
Could someone please help me?
The developers of MACS appear to have dealt with this issue in MACS2. Please see the response by the developer here: https://github.com/taoliu/MACS/issues/21
Have you tried MACS2 to see what you get?
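In the meantime, if you just want to compare the two ratios described in the question, here is a minimal Python sketch. It assumes both .xls files are tab-delimited text with comment lines starting with '#' and a column header row whose first field is 'chr'; please check your own files, since that layout is an assumption here. The function names count_regions and candidate_fdrs are made up for illustration, and the rows below are dummy data, not real output. Neither ratio is claimed to be the FDR that MACS itself reports.

```python
def count_regions(lines):
    """Count data rows in a peaks table.

    Assumption: blank lines, lines starting with '#', and a header row
    whose first tab-separated field is exactly 'chr' are not regions.
    """
    n = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.split("\t")[0] == "chr":
            continue
        n += 1
    return n


def candidate_fdrs(n_negative, n_peaks):
    """Return the two candidate ratios from the question.

    First value:  n_negative / n_peaks
    Second value: n_negative / (n_peaks - n_negative)
    A ratio is None when its denominator is not positive.
    """
    ratio_all = n_negative / n_peaks if n_peaks > 0 else None
    remaining = n_peaks - n_negative
    ratio_filtered = n_negative / remaining if remaining > 0 else None
    return ratio_all, ratio_filtered


# Dummy rows standing in for the two files (not real MACS output).
peaks_rows = [
    "# dummy comment line",
    "",
    "chr\tstart\tend",
    "chr1\t100\t200",
    "chr1\t500\t650",
    "chr2\t40\t90",
    "chr3\t10\t70",
]
negative_rows = [
    "# dummy comment line",
    "chr\tstart\tend",
    "chr2\t300\t380",
]

n_peaks = count_regions(peaks_rows)
n_negative = count_regions(negative_rows)
print(n_peaks, n_negative, candidate_fdrs(n_negative, n_peaks))
```

With the real files you would pass an open file handle instead of a list, e.g. count_regions(open("gata1_peaks.xls")), since the function only needs an iterable of lines.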