9.4 years ago
msharmin
Hi all,
The last column of the narrowPeak format for ChIP-seq data is called the point source. What is the purpose of this column? A couple of friends told me it might mean the tip of the peak. However, when I looked at the data, I found that the point source often falls outside the narrow peak region itself. I am giving an example below...
> head(tfbs)
chr start end name score strand value p q point
1 chr3 119187486 119188061 . 1000 . 437.4133 -1 4.184663 283
2 chr9 115818601 115819287 . 1000 . 437.3500 -1 4.184663 417
3 chr19 37663509 37664035 . 1000 . 426.5352 -1 4.184663 264
4 chr1 161195562 161196244 . 1000 . 423.0543 -1 4.184663 280
5 chr7 102789692 102790388 . 1000 . 414.2319 -1 4.184663 373
6 chr12 72057639 72058324 . 1000 . 407.5873 -1 4.184663 388
> sum((tfbs$end-tfbs$start)>tfbs$point)
[1] 10255
Thanks in advance for help.
While it's supposed to be the 0-based offset from column 2 of the peak, that's apparently not being adhered to in the dataset you're looking at. Where did you get the file?
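One way to test the offset interpretation is sketched below, as a rough check rather than a definitive answer. It assumes the last column is a 0-based offset from start, with -1 meaning no point source was called, so a row is consistent when 0 <= point < end - start. The toy data frame just copies the first three rows from the question, and the column names inside and summit are made up for illustration.

```r
# Toy copy of the first three rows shown in the question (columns 1-3 and 10 only).
tfbs <- data.frame(
  chr   = c("chr3", "chr9", "chr19"),
  start = c(119187486, 115818601, 37663509),
  end   = c(119188061, 115819287, 37664035),
  point = c(283, 417, 264)
)

# Peak width; start is 0-based and end is exclusive in BED-style formats.
width <- tfbs$end - tfbs$start

# Assumption: point is a 0-based offset from start, -1 means "no summit called".
tfbs$inside <- tfbs$point >= 0 & tfbs$point < width

# Absolute summit coordinate (NA where no summit was called).
tfbs$summit <- ifelse(tfbs$point >= 0, tfbs$start + tfbs$point, NA)

print(tfbs)
sum(!tfbs$inside)  # number of rows whose offset falls outside the peak
```

Note that under this reading, a comparison like (end - start) > point counts the rows where the offset lies inside the peak, not outside it. Running the same check on the full file would show how many rows, if any, really violate the format.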
It was downloaded directly from the ENCODE experiment matrix. Details are below. This is just one example; other narrowPeak files give me the same concern.
A549 (EtOH .02) TFBS Uniform Peaks of YY1_(SC-281) ENCODE/HudsonAlpha/Analysis
File URL: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/wgEncodeAwgTfbsHaibA549Yy1cV0422111Etoh02UniPk.narrowPeak.gz