How to interpret Roadmap epigenome reprocessed data
7.7 years ago

I am trying to analyze the reprocessed, consolidated Roadmap Epigenomics data, and I have downloaded the H3K4me1 data for male fetal brain from:

Looking at the file contents, it's a bit confusing how to interpret the data, since 1) the files have no header naming the columns, 2) column 4 is always just 'N', 3) column 5 is always '1000', and 4) the coordinates of different lines sometimes overlap.

Here are the first 5 lines:

chr1 10149 10185 N 1000 +

chr1 10153 10189 N 1000 +

chr1 10239 10275 N 1000 -

chr1 10314 10350 N 1000 -

chr1 13043 13079 N 1000 +

I searched a lot to figure out what each column represents and how the data should be interpreted, but I could not find a good tutorial.
I would appreciate it if anybody could help with this.

Roadmap epigenome data • 1.4k views
7.7 years ago
Ar ★ 1.1k

The files are in BED format or tagAlign format, depending on the suffix of the file name. Neither format has a header line, since the column layout is fixed: columns 1-3 are the chromosome, start and end, and column 6 is the strand. Column 4 is always the name of the region; N here means no name was assigned. Column 5 is always the score, which can range from 0 to 1000. The overlap between coordinates tells you that these lines are aligned reads (not peaks), i.e. the file was converted directly from BAM to BED.
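To make the column layout concrete, here is a minimal Python sketch that parses the five lines from the question into named fields and checks which neighbouring reads overlap. The names `TagAlignRecord`, `parse_line` and `overlaps` are made up for illustration and are not part of any Roadmap tool. It assumes whitespace-separated columns and the usual BED convention of 0-based, half-open intervals.

```python
from typing import NamedTuple


class TagAlignRecord(NamedTuple):
    """One line of a 6-column BED/tagAlign file (hypothetical helper type)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    name: str    # 'N' in these files, i.e. a placeholder
    score: int   # 0-1000
    strand: str  # '+' or '-'


def parse_line(line: str) -> TagAlignRecord:
    """Split one whitespace-separated line into typed fields."""
    chrom, start, end, name, score, strand = line.split()[:6]
    return TagAlignRecord(chrom, int(start), int(end), name, int(score), strand)


def overlaps(a: TagAlignRecord, b: TagAlignRecord) -> bool:
    """True if two half-open intervals on the same chromosome intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


# The five example lines from the question.
lines = [
    "chr1 10149 10185 N 1000 +",
    "chr1 10153 10189 N 1000 +",
    "chr1 10239 10275 N 1000 -",
    "chr1 10314 10350 N 1000 -",
    "chr1 13043 13079 N 1000 +",
]

records = [parse_line(line) for line in lines]

for rec in records:
    # With half-open coordinates the read length is simply end - start.
    print(rec.chrom, rec.start, rec.end, rec.strand, "length:", rec.end - rec.start)

# Compare each read with the next one to spot overlapping alignments.
for prev, curr in zip(records, records[1:]):
    if overlaps(prev, curr):
        print("overlap:", prev.start, prev.end, "and", curr.start, curr.end)
```

Every interval in the example has the same length, which is what you would expect from fixed-length sequencing reads rather than called peaks.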


