Question: NarrowPeak format of ChIP-seq
liu4gre (United States) wrote, 4.8 years ago:

I am just learning to understand ENCODE ChIP-seq data for transcription factor (TF) binding. I looked at the narrowPeak files and found a column named "Score". Is this the tag density, indicating the binding affinity of the TF at this site or region? If not, how can I get the tag density (or binding affinity)?

Tags: chip-seq, tag density
Sukhdeep Singh (Netherlands) wrote, 4.8 years ago:

ENCODE narrowPeak: Narrow (or Point-Source) Peaks format

This format is used to provide called peaks of signal enrichment based on pooled, normalized (interpreted) data. It is a BED6+4 format.

  1. chrom - Name of the chromosome (or contig, scaffold, etc.).
  2. chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
  3. chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
  4. name - Name given to a region (preferably unique). Use '.' if no name is assigned.
  5. score - Indicates how dark the peak will be displayed in the browser (0-1000). If all scores were '0' when the data were submitted to the DCC, the DCC assigned scores 1-1000 based on signal value. Ideally the average signalValue per base spread is between 100-1000.
  6. strand - +/- to denote strand or orientation (whenever applicable). Use '.' if no orientation is assigned.
  7. signalValue - Measurement of overall (usually, average) enrichment for the region.
  8. pValue - Measurement of statistical significance (-log10). Use -1 if no pValue is assigned.
  9. qValue - Measurement of statistical significance using false discovery rate (-log10). Use -1 if no qValue is assigned.
  10. peak - Point-source called for this peak; 0-based offset from chromStart. Use -1 if no point-source called.
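To make the ten columns concrete, here is a minimal Python sketch of reading one narrowPeak line. The `NarrowPeak` class and `parse_narrowpeak_line` function are hypothetical names chosen for illustration, not part of any ENCODE or UCSC tool. The sketch assumes whitespace-separated fields (real files are normally tab-separated) and that the name column contains no spaces. It also shows two consequences of the definitions above: with half-open coordinates the peak length is simply chromEnd - chromStart, and the absolute summit is chromStart + peak.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NarrowPeak:
    """One record of an ENCODE narrowPeak (BED6+4) file."""
    chrom: str
    chrom_start: int     # 0-based, inclusive
    chrom_end: int       # 0-based, exclusive (half-open interval)
    name: str            # '.' if no name assigned
    score: int           # display shade, 0-1000
    strand: str          # '+', '-' or '.'
    signal_value: float  # overall (usually average) enrichment
    p_value: float       # -log10(p); -1 if not assigned
    q_value: float       # -log10(q); -1 if not assigned
    peak: int            # summit offset from chrom_start; -1 if not called

    @property
    def length(self) -> int:
        # Half-open: chromStart=0, chromEnd=100 covers bases 0-99, i.e. 100 bp.
        return self.chrom_end - self.chrom_start

    @property
    def summit(self) -> Optional[int]:
        # Absolute 0-based summit position, or None if no point source was called.
        return None if self.peak == -1 else self.chrom_start + self.peak


def parse_narrowpeak_line(line: str) -> Optional[NarrowPeak]:
    """Parse one data line; return None for track/browser/comment/blank lines."""
    if not line.strip() or line.startswith(("track", "browser", "#")):
        return None
    fields = line.split()  # any whitespace; assumes no spaces inside 'name'
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
    return NarrowPeak(
        fields[0], int(fields[1]), int(fields[2]), fields[3], int(fields[4]),
        fields[5], float(fields[6]), float(fields[7]), float(fields[8]),
        int(fields[9]),
    )
```

For the first row of the example that follows (chr1, 9356548, 9356648, ..., peak 50), this gives a length of 100 bp and a summit at 9356598.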

Here is an example of narrowPeak format:

track type=narrowPeak visibility=3 db=hg19 name="nPk" description="ENCODE narrowPeak Example"
browser position chr1:9356000-9365000
chr1    9356548 9356648 .       0       .       182     5.0945  -1  50
chr1    9358722 9358822 .       0       .       91      4.6052  -1  40
chr1    9361082 9361182 .       0       .       182     9.2103  -1  75
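All three example rows have score 0, which is exactly the case where the DCC filled in scores 1-1000 from the signal value. The format description does not say which formula was used, so the Python sketch below assumes a simple linear min-max mapping purely for illustration. `rescale_scores` is a hypothetical helper, and ENCODE's actual scores may have been computed differently.

```python
def rescale_scores(signal_values, lo=1, hi=1000):
    """Map signal values linearly onto integers in [lo, hi].

    Assumption: a plain min-max linear mapping. The format text only says the
    DCC assigned 1-1000 "based on signal value"; its real formula may differ.
    """
    if not signal_values:
        return []
    smin, smax = min(signal_values), max(signal_values)
    if smax == smin:
        # All peaks equally strong: give every one the top score.
        return [hi] * len(signal_values)
    span = smax - smin
    # Weakest peak -> lo, strongest -> hi, everything else proportionally between.
    return [round(lo + (v - smin) * (hi - lo) / span) for v in signal_values]
```

With the example's signalValue column (182, 91, 182), this mapping yields scores of 1000, 1 and 1000.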

 

Source: https://genome.ucsc.edu/FAQ/FAQformat.html#format12


Thanks. So does this mean signalValue is the tag density? I looked through a few samples and the values were always integers; is that always the case?

One more question: how do I merge information from replicates? Apparently they don't always contain the same regions. What kinds of regions from different replicates can be treated as the same region/site?

Thanks.

(Reply by liu4gre, 4.8 years ago)

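Hi, I somehow missed this. Yes, signalValue is essentially the tag density: per the format definition, it measures the overall (usually average) enrichment for the region.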

For merging replicates, you can:
1) Merge the FASTQ files, if they are technical replicates (not the best option).
2) Analyse them separately and use bedtools intersectBed to find the overlapping regions, either on the mapped BED files or on the significant binding sites (this is much better).
3) Calculate the tag density for a specific locus (e.g. TSS +/- 3 kb). Because both samples are then measured over the same locus, you can compare them directly and merge or average them, but don't forget to normalize by read count (sequencing depth).

(Reply by Sukhdeep Singh, 4.7 years ago)
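Option 2 above boils down to an interval-overlap test. The Python sketch below mimics what `bedtools intersect -u` reports (each peak from the first replicate that overlaps at least one peak in the second), to show one way "the same site" can be defined. `overlapping_peaks` is a hypothetical helper, the 1 bp minimum overlap is an assumption, and for real data bedtools itself is the better choice. Peaks are plain (chrom, start, end) tuples in half-open BED coordinates, so peaks that merely touch (one ends at 600, the next starts at 600) do not count as overlapping.

```python
from bisect import bisect_left
from collections import defaultdict


def overlapping_peaks(rep_a, rep_b):
    """Return peaks from rep_a that overlap >= 1 bp of at least one peak in rep_b."""
    # Group replicate B by chromosome and sort by start coordinate.
    by_chrom = defaultdict(list)
    for chrom, start, end in rep_b:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        # prefix_max_end[i] = furthest end among the first i+1 intervals.
        prefix_max_end, running = [], -1
        for _, e in ivs:
            running = max(running, e)
            prefix_max_end.append(running)
        index[chrom] = (starts, prefix_max_end)

    hits = []
    for chrom, start, end in rep_a:
        if chrom not in index:
            continue
        starts, prefix_max_end = index[chrom]
        # Candidates in B are those with b_start < end (half-open coordinates).
        k = bisect_left(starts, end)
        # Some candidate overlaps iff the furthest-reaching one ends after `start`.
        if k > 0 and prefix_max_end[k - 1] > start:
            hits.append((chrom, start, end))
    return hits
```

Tracking the running maximum end, rather than only the nearest preceding peak, keeps the test correct when a long peak in replicate B starts well before, but still covers, a short peak in replicate A.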

Thanks for replying. I came back to reread your reply and have another question. Is it reasonable to calculate the binding difference between two TFs at the same position by subtracting the signalValue of one TF from that of the other?

(Reply by liu4gre, 4.3 years ago)

Yeah, it's feasible. A better approach is to define a genomic locus, calculate the area under the curve normalized by read depth, and then compare.

(Reply by Sukhdeep Singh, 3.6 years ago)
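The "area under the curve normalized by read depth" comparison can be sketched in a few lines of Python. This assumes per-base read coverage over the chosen locus has already been extracted for each TF (for example from BAM or bigWig files with another tool), and it scales by total mapped reads to a per-million value. `normalized_auc` and `binding_difference` are hypothetical helpers, and no input/control correction is applied.

```python
def normalized_auc(coverage, total_mapped_reads):
    """Area under a per-base coverage curve, scaled per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    auc = sum(coverage)  # one value per base, so the sum is the area
    return auc * 1_000_000 / total_mapped_reads


def binding_difference(cov_tf1, reads_tf1, cov_tf2, reads_tf2):
    """Depth-normalized AUC of TF1 minus that of TF2 over the same locus."""
    if len(cov_tf1) != len(cov_tf2):
        raise ValueError("both coverage tracks must span the same locus")
    return normalized_auc(cov_tf1, reads_tf1) - normalized_auc(cov_tf2, reads_tf2)
```

For instance, a TF with twice the coverage of another but also twice the sequencing depth ends up with a difference of 0.0 once both are normalized, which raw signal subtraction would miss.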

Dear Moderator: I have a question about the narrowPeak format. In general, a BED file is defined as chrom / chromStart / chromEnd / name / score / strand / ..., where the score column refers to the significance of the peak signal. However, I need to convert the score column to a p-value (the p-value format could be 1-based, 10-based or 100-based). How can I obtain the peak p-value in the desired format and add it as new metadata? Could you give me some ideas, please? Thanks a lot :)

(Reply by Jurat Shahidin, 2.3 years ago)
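In narrowPeak files the score column is a display shade (0-1000), while the pValue column (column 8) already stores -log10(p), with -1 meaning "not assigned". A raw p-value can therefore be recovered as 10^(-pValue) and appended as an extra column. A minimal Python sketch follows; `add_raw_pvalue_column` is a hypothetical helper, it assumes tab-separated input, and writing "NA" for unassigned values is an arbitrary choice.

```python
def add_raw_pvalue_column(lines):
    """Yield narrowPeak lines with an 11th column holding the raw p-value.

    Column 8 (pValue) stores -log10(p), so p = 10 ** (-pValue).
    A pValue of -1 means "not assigned" and is written as 'NA'.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("track", "browser", "#")):
            yield line  # pass header and blank lines through unchanged
            continue
        fields = line.split("\t")
        neg_log10_p = float(fields[7])
        raw_p = "NA" if neg_log10_p == -1 else f"{10 ** (-neg_log10_p):.3g}"
        yield "\t".join(fields + [raw_p])
```

Note that the output has 11 columns, so tools expecting strict BED6+4 narrowPeak input may reject it.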