Hi everyone,
I am analyzing ChIP-seq data for 60 samples. I am using MACS for peak calling and I have some questions about its output. To keep things simple, I will give an example using two of my ChIP-seq datasets: one control and one treatment. The control sample was sequenced twice (replicates, different lanes), so I have two FASTQ files for the control. According to the protocol given in this paper, the replicates should be merged with samtools after mapping, so that there is a single file for the control. Using the merged file as the control, I got ~400 peaks (let's call this Analysis A). I then repeated the analysis (Analysis B) using only one of my two control files, chosen at random, and MACS gave me ~200 peaks.
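To make the two setups concrete, here is a minimal sketch of the commands behind Analysis A. It only builds and prints the command lines; it does not run them. The file names, the `build_commands` helper, and the use of the `macs14` executable are assumptions for illustration (the post does not say which MACS version was used), and `samtools merge` expects BAM files that are already sorted.

```python
import shlex


def build_commands(control_replicates, treatment_bam,
                   merged_control="control_merged.bam", name="analysis_A"):
    """Build (but do not run) the merge and peak-calling command lines.

    All file names here are hypothetical placeholders.
    """
    # samtools merge <out.bam> <in1.bam> <in2.bam> ...
    merge_cmd = ["samtools", "merge", merged_control, *control_replicates]
    # MACS 1.4 style call (assumed version): -t treatment, -c control, -n output prefix
    macs_cmd = ["macs14", "-t", treatment_bam, "-c", merged_control, "-n", name]
    return merge_cmd, macs_cmd


merge_cmd, macs_cmd = build_commands(
    ["control_rep1.bam", "control_rep2.bam"], "treatment.bam"
)
print(shlex.join(merge_cmd))
print(shlex.join(macs_cmd))
```

Analysis B would skip the merge step and pass a single replicate (for example `control_rep1.bam`) to `-c` instead.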
Questions
The FDR values in Analysis B are much higher than in Analysis A. Not a single peak in Analysis B has an FDR below 25%, whereas in Analysis A all FDR values are below 5%. Is it normal to see such a difference? Which output would you prefer, Analysis A or B?
Could anyone please explain how the FDR values are calculated by MACS? I have read about it in the above-mentioned paper, but I did not understand it. Maybe an example would help (I read one example in the paper).
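To anchor the question, here is a minimal sketch of the empirical FDR as the MACS paper describes it: the ChIP and control samples are swapped, peaks are called on the swapped pair, and the FDR for a peak is the number of control peaks divided by the number of ChIP peaks that score at least as well. The function name and the scores are made up for illustration and are not MACS output. The cap at 100% is a choice made in this sketch, and the real implementation may differ in details.

```python
from bisect import bisect_left


def empirical_fdr(chip_scores, control_scores):
    """Return an FDR (in %) for each ChIP peak score.

    FDR(s) = 100 * (# control peaks with score >= s) / (# ChIP peaks with score >= s)
    'control peaks' are the peaks found when control is treated as ChIP (sample swap).
    """
    chip_sorted = sorted(chip_scores)
    ctrl_sorted = sorted(control_scores)
    fdrs = []
    for s in chip_scores:
        # Count peaks scoring at least s in each list.
        n_chip = len(chip_sorted) - bisect_left(chip_sorted, s)
        n_ctrl = len(ctrl_sorted) - bisect_left(ctrl_sorted, s)
        # n_chip >= 1 because s itself is a ChIP peak; cap is a sketch-level choice.
        fdrs.append(min(100.0, 100.0 * n_ctrl / n_chip))
    return fdrs


# Made-up scores, e.g. -10*log10(p-value) per peak.
chip_scores = [50.0, 120.0, 300.0, 80.0]
control_scores = [60.0, 90.0]
for score, fdr in zip(chip_scores, empirical_fdr(chip_scores, control_scores)):
    print(f"score={score:6.1f}  FDR={fdr:5.1f}%")
```

Under this definition the FDR depends on how many peaks the swapped call finds, so changing the control file changes the control peak list and can shift every FDR value.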
Thanks in advance.