Question: What does "gappedPeak" mean?
2.3 years ago by
Japan
Wet&DryImmunology210 wrote:

Hi, I used "macs2" to call peaks from my ChIP-seq data. This is not my first time using macs2, but I still cannot grasp what "gappedPeak" stands for in the macs2 output.

"NAME_peaks.narrowPeak" and "NAME_peaks.broadPeak" are quite intuitive: "narrowPeak" means narrow peaks, which suit TFs, and "broadPeak" means broad peaks, which suit histone modifications spanning wider genomic regions.

But how about "gappedPeak"?

On the macs2 GitHub page:

"NAME_peaks.gappedPeak is in BED12+3 format which contains both the broad region and narrow peaks." So it seems gappedPeak contains both categories (narrow and broad). If that is the case, where do the gaps come from?

The UCSC format FAQ (https://genome.ucsc.edu/FAQ/FAQformat#format14) describes ENCODE gapped peaks (I assume those peaks were called with macs) as: "regions of signal enrichment based on pooled, normalized (interpreted) data where the regions may be spliced or incorporate gaps in the genomic sequence". "Regions may be spliced or incorporate gaps": I can understand RNA being spliced, but DNA?

Could anyone explain?

[Jun@host workingdirectory]$ less Histonemark_cellA_peaks.broadPeak 
Chrom ChromStart ChromEnd name                        score strand    signalValue pValue qValue
chr1    4775387 4776044 Histonemark_cellA_peak_1     41      .       3.22266 5.57941 4.13770
chr1    4847525 4848363 Histonemark_cellA_peak_2     38      .       3.03717 5.39983 3.82081
chr1    5073148 5073709 Histonemark_cellA_peak_3     31      .       3.02635 4.72286 3.10498

[Jun@host workingdirectory]$ less Histonemark_cellA_peaks.gappedPeak 
Chrom   ChromStart ChromEnd name                       score   strand thickStart thickEnd itemRgb blockCount blockSizes blockStarts signalValue pValue qValue
chr1    4775387 4776044 Histonemark_cellA_peak_1     41      .       4775387 4776044 0       2       645,1   0,656   3.22266 5.57941 4.13770

Of course, to understand a file it is always better to look inside it. Comparing the broadPeak file with the gappedPeak file, I realized that the key is to understand what "thickStart"/"thickEnd" are. I then found a post trying to address that, but I still could not understand it. In particular, "Thickstart and thickend are the left and the right boundaries of the coding sequence." (explained by Ido Tamir) confused me even more. What does "boundaries of the coding sequence" mean in the context of ChIP-seq?

2.3 years ago by
geek_y9.7k
Barcelona/CRG/London/Imperial
geek_y9.7k wrote:

GappedPeak is a representation of narrow peaks as blocks over a broad peak. To trick the visualisation tools, they use the same format as gene models, but use the narrow peak coordinates as the exon coordinates and the broad peak coordinates as the coding region coordinates.
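
To make the block layout concrete, here is a minimal Python sketch that decodes the gappedPeak line pasted in the question. It assumes the usual BED12 convention that blockStarts are offsets relative to chromStart, and it takes the column names from the header shown in the question. The function name `parse_gapped_peak` is made up for illustration and is not part of macs2.

```python
# Column names as shown in the question's header (BED12 plus 3 extra columns).
GAPPED_PEAK_COLUMNS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes",
    "blockStarts", "signalValue", "pValue", "qValue",
]

# The gappedPeak record pasted in the question.
EXAMPLE = (
    "chr1 4775387 4776044 Histonemark_cellA_peak_1 41 . "
    "4775387 4776044 0 2 645,1 0,656 3.22266 5.57941 4.13770"
)


def parse_gapped_peak(line):
    """Hypothetical helper: split one gappedPeak line into blocks and gaps."""
    rec = dict(zip(GAPPED_PEAK_COLUMNS, line.split()))
    start = int(rec["chromStart"])
    # BED12 lists may carry a trailing comma, so strip it before splitting.
    sizes = [int(x) for x in rec["blockSizes"].rstrip(",").split(",")]
    offsets = [int(x) for x in rec["blockStarts"].rstrip(",").split(",")]
    # Assumption: blockStarts are relative to chromStart (BED12 convention).
    blocks = [(start + o, start + o + s) for o, s in zip(offsets, sizes)]
    # A gap is whatever lies between two consecutive blocks.
    gaps = [(blocks[i][1], blocks[i + 1][0]) for i in range(len(blocks) - 1)]
    return {
        "region": (start, int(rec["chromEnd"])),
        "blocks": blocks,
        "gaps": gaps,
    }


result = parse_gapped_peak(EXAMPLE)
print(result["region"])  # (4775387, 4776044)
print(result["blocks"])  # [(4775387, 4776032), (4776043, 4776044)]
print(result["gaps"])    # [(4776032, 4776043)]
```

In this record the two blocks sit inside the broad region 4775387-4776044, and the 11 bp between them is the "gap" that a genome browser draws as a thin line, the same way it would draw an intron in a gene model.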


@Goutham Atla. Thanks, everything begins to make sense.

written 2.3 years ago by Wet&DryImmunology210