NarrowPeak format of ChIP-seq
9.9 years ago
liu4gre ▴ 210

I am just learning to understand ENCODE ChIP-seq data for transcription factor binding. I looked at the narrowPeak files and found a column named "score". Is this the tag density, indicating the binding affinity of the TF at this site or region? If not, how can I get the tag density (or binding affinity)?

ChIP-Seq tag-density • 31k views
9.9 years ago

ENCODE narrowPeak: Narrow (or Point-Source) Peaks format

This format is used to provide called peaks of signal enrichment based on pooled, normalized (interpreted) data. It is a BED6+4 format.

  1. chrom - Name of the chromosome (or contig, scaffold, etc.).
  2. chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
  3. chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
  4. name - Name given to a region (preferably unique). Use '.' if no name is assigned.
  5. score - Indicates how dark the peak will be displayed in the browser (0-1000). If all scores were '0' when the data were submitted to the DCC, the DCC assigned scores 1-1000 based on signal value. Ideally the average signalValue per base spread is between 100-1000.
  6. strand - +/- to denote strand or orientation (whenever applicable). Use '.' if no orientation is assigned.
  7. signalValue - Measurement of overall (usually, average) enrichment for the region.
  8. pValue - Measurement of statistical significance (-log10). Use -1 if no pValue is assigned.
  9. qValue - Measurement of statistical significance using false discovery rate (-log10). Use -1 if no qValue is assigned.
  10. peak - Point-source called for this peak; 0-based offset from chromStart. Use -1 if no point-source called.

Here is an example of narrowPeak format:

track type=narrowPeak visibility=3 db=hg19 name="nPk" description="ENCODE narrowPeak Example"
browser position chr1:9356000-9365000
chr1    9356548 9356648 .       0       .       182     5.0945  -1  50
chr1    9358722 9358822 .       0       .       91      4.6052  -1  40
chr1    9361082 9361182 .       0       .       182     9.2103  -1  75

Source: https://genome.ucsc.edu/FAQ/FAQformat.html#format12
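
As a rough illustration of the ten columns above, here is a minimal Python sketch that parses narrowPeak lines. The names `NarrowPeak` and `parse_narrowpeak_line` are made up for this example and are not part of any ENCODE or UCSC tool. It assumes whitespace-separated fields, skips `track` and `browser` header lines, converts the -log10 pValue/qValue columns back to plain probabilities, and turns the `peak` offset into an absolute coordinate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NarrowPeak:
    chrom: str
    start: int                 # 0-based, inclusive
    end: int                   # exclusive
    name: str
    score: int                 # 0-1000, browser display darkness
    strand: str
    signal_value: float        # overall (usually average) enrichment
    p_value: Optional[float]   # plain p-value, None if the column is -1
    q_value: Optional[float]   # plain q-value, None if the column is -1
    summit: Optional[int]      # absolute 0-based position, None if -1


def _from_neg_log10(text: str) -> Optional[float]:
    """Convert a -log10 value to a probability; -1 means 'not assigned'."""
    value = float(text)
    return None if value < 0 else 10 ** (-value)


def parse_narrowpeak_line(line: str) -> Optional[NarrowPeak]:
    """Parse one narrowPeak line; return None for header or blank lines."""
    if not line.strip() or line.startswith(("track", "browser", "#")):
        return None
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    start = int(fields[1])
    offset = int(fields[9])
    return NarrowPeak(
        chrom=fields[0],
        start=start,
        end=int(fields[2]),
        name=fields[3],
        score=int(float(fields[4])),
        strand=fields[5],
        signal_value=float(fields[6]),
        p_value=_from_neg_log10(fields[7]),
        q_value=_from_neg_log10(fields[8]),
        summit=None if offset < 0 else start + offset,
    )


# First data line of the example above.
example = "chr1    9356548 9356648 .       0       .       182     5.0945  -1  50"
peak = parse_narrowpeak_line(example)
print(peak.chrom, peak.summit, peak.signal_value)
```

For the first example line, the summit is chromStart plus the offset, that is 9356548 + 50 = 9356598, and the qValue comes back as None because the column holds -1.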


Thanks. So does this mean that signalValue is the tag density? I looked through a few samples and the values are always integers. Is that expected?

One more question: how do I merge information from replicates? They do not always have the same regions. Which regions from replicates can be treated as the same region or site?

Thanks.


Hi, I somehow missed this. Yes, signalValue is the tag density.

For merging replicates, you can:
1) Merge the FASTQ files if they are technical replicates (not the best option).
2) Analyse them separately and use bedtools intersectBed to find the overlapping regions, either on the mapped BED files or on the significant binding sites (this is much better).
3) Calculate the tag density for a specific locus (TSS +/- 3 kb, etc.). Because both samples then share the same locus, you can compare, merge, or average them, but don't forget to normalize by read (sequencing) depth.
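
To make option 2 concrete, here is a small plain-Python sketch of the overlap idea on 0-based, half-open intervals. It is only an illustration and not how bedtools works internally; the function name `overlapping_peaks`, the `min_overlap` parameter, and the toy coordinates are all made up for this example. The nested loop is quadratic, so for real peak files bedtools is the better choice.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def overlapping_peaks(
    rep1: List[Interval], rep2: List[Interval], min_overlap: int = 1
) -> List[Interval]:
    """Return the regions shared by a peak in rep1 and a peak in rep2."""
    shared = []
    for chrom1, start1, end1 in rep1:
        for chrom2, start2, end2 in rep2:
            if chrom1 != chrom2:
                continue
            # The intersection of two half-open intervals.
            start, end = max(start1, start2), min(end1, end2)
            if end - start >= min_overlap:
                shared.append((chrom1, start, end))
    return sorted(shared)


# Toy peaks from two hypothetical replicates.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
rep2 = [("chr1", 150, 250), ("chr1", 700, 800), ("chr2", 300, 400)]
print(overlapping_peaks(rep1, rep2))
```

With these toy inputs only the first chr1 peak is shared, and the reported region is the intersection chr1:150-200. Raising `min_overlap` is one simple way to demand more than a single shared base before two peaks count as the same site.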


Thanks for replying. I came back to read your reply again and have another question. Is it reasonable to calculate the binding difference between two TFs at the same position by subtracting the signalValue of one TF from that of the other? Thanks.


Yeah, it's feasible. A better approach is to define a genomic locus, calculate the area under the curve normalized by read depth, and then compare.
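
A minimal sketch of that suggestion, assuming you already have per-base coverage over the same locus for each TF and know the total number of mapped reads in each library. The function name `normalized_area`, the per-million scaling, and all numbers below are made up for illustration.

```python
from typing import Sequence


def normalized_area(coverage: Sequence[float], total_mapped_reads: int) -> float:
    """Area under the coverage curve, scaled per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    area = sum(coverage)  # one value per base, so the sum is the area
    return area / (total_mapped_reads / 1_000_000)


# Toy per-base coverage over the same 7 bp locus for two TFs.
tf_a_coverage = [0, 2, 5, 9, 5, 2, 0]
tf_b_coverage = [0, 1, 2, 4, 2, 1, 0]

area_a = normalized_area(tf_a_coverage, total_mapped_reads=20_000_000)
area_b = normalized_area(tf_b_coverage, total_mapped_reads=10_000_000)

# Difference in depth-normalized signal between the two TFs at this locus.
print(area_a, area_b, area_a - area_b)
```

Here the raw areas are 23 and 10, but after scaling by library size they are about 1.15 and 1.0, which shows why comparing unnormalized signal between libraries of different depth can be misleading.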


Dear Moderator: I have a question about the narrowPeak format. In general, a BED file is defined as chromName / chromStart / chromEnd / strand / name / score / ..., where the score column refers to the significance of the peak signal. However, I need to convert the score column to a p-value (the p-value could be on a 1-, 10-, or 100-based scale). How can I get the peak p-value in the desired format and add it as new metadata? Could you suggest a possible approach? Thanks a lot :)


Hi, can you please explain more about the "peak" field?


It is the position of highest signal intensity for that marker protein; for example, in the case of H3K36me3, the point of highest methylation.
