Hi, I used macs2 to call peaks from my ChIP-seq data. This is not my first time using macs2, but I still can't grasp what "gappedPeak" stands for in the macs2 output.
"NAME_peaks.narrowPeak" and "NAME_peaks.broadPeak" are quite intuitive: "narrowPeak" means narrow peaks, suitable for TFs, and "broadPeak" means broad peaks, suitable for histone modifications that span wider genomic regions.
But how about "gappedPeak"?
The macs2 GitHub page says:
"NAME_peaks.gappedPeak is in BED12+3 format which contains both the broad region and narrow peaks." It seems the gappedPeak file contains both categories (narrow and broad). If that is the case, where do the gaps come from?
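A minimal sketch of how the block columns of a BED12 record encode narrow sub-regions inside one broad region. Per the UCSC BED specification, blockStarts are offsets relative to chromStart and blockSizes are the block lengths, so anything between chromStart and chromEnd that no block covers is a gap. The function name `parse_gapped_peak` is hypothetical, and the input is the first gappedPeak record from the `less` output in this post.

```python
# Hypothetical helper: expand the blocks of one BED12+3 gappedPeak line into
# genomic intervals and derive the gaps between consecutive blocks.
# Column order follows the gappedPeak header shown in this post.

def parse_gapped_peak(line):
    f = line.split()
    chrom, start, end, name = f[0], int(f[1]), int(f[2]), f[3]
    block_count = int(f[9])
    # blockSizes / blockStarts are comma-separated; a trailing comma is tolerated
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]
    assert len(sizes) == len(starts) == block_count
    # blockStarts are offsets relative to chromStart (half-open intervals)
    blocks = [(start + s, start + s + n) for s, n in zip(starts, sizes)]
    # a gap is the stretch between the end of one block and the start of the next
    gaps = [(a_end, b_start)
            for (_, a_end), (b_start, _) in zip(blocks, blocks[1:])
            if b_start > a_end]
    return {"chrom": chrom, "region": (start, end), "name": name,
            "blocks": blocks, "gaps": gaps}

record = ("chr1 4775387 4776044 Histonemark_cellA_peak_1 41 . 4775387 4776044 0 "
          "2 645,1 0,656 3.22266 5.57941 4.13770")
peak = parse_gapped_peak(record)
print(peak["blocks"])  # [(4775387, 4776032), (4776043, 4776044)]
print(peak["gaps"])    # [(4776032, 4776043)]
```

For this record the two blocks are 645 bp and 1 bp long, separated by an 11 bp gap inside the 657 bp region.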
And for ENCODE gapped peaks (I assumed those peaks were called with macs), https://genome.ucsc.edu/FAQ/FAQformat#format14 explains: "regions of signal enrichment based on pooled, normalized (interpreted) data where the regions may be spliced or incorporate gaps in the genomic sequence". I can understand RNA being spliced, but DNA?
Could anyone explain?
```
[Jun@host workingdirectory]$ less Histonemark_cellA_peaks.broadPeak
Chrom ChromStart ChromEnd name score strand signalValue pValue qValue
chr1 4775387 4776044 Histonemark_cellA_peak_1 41 . 3.22266 5.57941 4.13770
chr1 4847525 4848363 Histonemark_cellA_peak_2 38 . 3.03717 5.39983 3.82081
chr1 5073148 5073709 Histonemark_cellA_peak_3 31 . 3.02635 4.72286 3.10498

[Jun@host workingdirectory]$ less Histonemark_cellA_peaks.gappedPeak
Chrom ChromStart ChromEnd name score strand thickStart thickEnd itemRgb blockCount blockSizes blockStarts signalValue pValue qValue
chr1 4775387 4776044 Histonemark_cellA_peak_1 41 . 4775387 4776044 0 2 645,1 0,656 3.22266 5.57941 4.13770
```
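Since peak_1 appears in both files, a quick side-by-side check makes the relationship between them concrete. This is a sketch assuming whitespace-separated columns in the order given by the headers above; the helper names `to_dict` and `compare` are hypothetical, and it only looks at the single record shown, so it says nothing about other records.

```python
# Hypothetical check: does the gappedPeak record repeat the region and
# statistics of the broadPeak record with the same name, and which columns
# does it add? Column names follow the headers printed in this post.

BROAD_COLS = ["chrom", "chromStart", "chromEnd", "name", "score", "strand",
              "signalValue", "pValue", "qValue"]
GAPPED_COLS = ["chrom", "chromStart", "chromEnd", "name", "score", "strand",
               "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes",
               "blockStarts", "signalValue", "pValue", "qValue"]

def to_dict(line, cols):
    fields = line.split()
    assert len(fields) == len(cols), "unexpected number of columns"
    return dict(zip(cols, fields))

def compare(broad_line, gapped_line):
    b = to_dict(broad_line, BROAD_COLS)
    g = to_dict(gapped_line, GAPPED_COLS)
    # columns present in both formats, compared as raw text
    shared = [c for c in BROAD_COLS if c in GAPPED_COLS]
    mismatches = {c: (b[c], g[c]) for c in shared if b[c] != g[c]}
    # columns that only the gappedPeak (BED12+3) file carries
    extra = {c: g[c] for c in GAPPED_COLS if c not in BROAD_COLS}
    return mismatches, extra

broad = "chr1 4775387 4776044 Histonemark_cellA_peak_1 41 . 3.22266 5.57941 4.13770"
gapped = ("chr1 4775387 4776044 Histonemark_cellA_peak_1 41 . 4775387 4776044 0 "
          "2 645,1 0,656 3.22266 5.57941 4.13770")
mismatches, extra = compare(broad, gapped)
print(mismatches)  # {}
print(extra)
```

For peak_1 every shared column matches, and the only additions are the six BED12 display columns (thickStart, thickEnd, itemRgb, blockCount, blockSizes, blockStarts).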
Of course, to understand a file it is always better to look inside it. By looking inside the broadPeak and gappedPeak files, I realized that the key is understanding what "thickStart"/"thickEnd" mean. I then found a post trying to address that, but I still couldn't understand it. In particular, Ido Tamir's explanation, "Thickstart and thickend are the left and the right boundaries of the coding sequence.", confused me even more. What does "boundaries of the coding sequence" mean in the context of ChIP-seq?
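The "coding sequence" wording comes from the generic BED12 definition: per the UCSC BED specification, thickStart/thickEnd mark the part of a feature drawn thicker in the browser (for example, the start codon in gene displays), and in gene tracks the blocks are exons. The sketch below shows how the thick range splits blocks into thick and thin parts, which in a gene model would be CDS versus UTR. It makes no assumption about what MACS2 writes into these columns; `split_thick_thin` and the gene-like example are hypothetical.

```python
# Hypothetical helper: split BED12 blocks (half-open genomic intervals) into
# the parts inside [thick_start, thick_end) and the parts outside it.

def split_thick_thin(blocks, thick_start, thick_end):
    thick, thin = [], []
    for b_start, b_end in blocks:
        # portion of the block inside the thick range
        lo, hi = max(b_start, thick_start), min(b_end, thick_end)
        if lo < hi:
            thick.append((lo, hi))
        # portion of the block left of the thick range
        if b_start < min(b_end, thick_start):
            thin.append((b_start, min(b_end, thick_start)))
        # portion of the block right of the thick range
        if max(b_start, thick_end) < b_end:
            thin.append((max(b_start, thick_end), b_end))
    return thick, thin

# Hypothetical gene-like record: two exons, CDS from 150 to 350
print(split_thick_thin([(100, 200), (300, 400)], 150, 350))
# ([(150, 200), (300, 350)], [(100, 150), (350, 400)])

# Blocks of peak_1 from the gappedPeak file, thick range 4775387-4776044
print(split_thick_thin([(4775387, 4776032), (4776043, 4776044)], 4775387, 4776044))
# ([(4775387, 4776032), (4776043, 4776044)], [])
```

For peak_1, thickStart and thickEnd equal chromStart and chromEnd, so every block falls in the thick part and nothing would be drawn thin.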