Question

How to interpret Roadmap epigenome reprocessed data

8.2 years ago

morteza.mahmoudisaber ▴ 80

I am trying to analyze the reprocessed, consolidated Roadmap Epigenomics data, and I have downloaded the H3K4me1 data for male fetal brain from:

http://egg2.wustl.edu/roadmap/data/byFileType/alignments/consolidated/E081-H3K4me1.tagAlign.gz

But looking at the file contents, it's a bit confusing how to interpret the data, since 1) the files have no header naming each column, 2) the fourth column is always just 'N', 3) the fifth column is always '1000', and 4) the coordinates of different lines sometimes overlap.

Here are the first 5 lines:

chr1 10149 10185 N 1000 +
chr1 10153 10189 N 1000 +
chr1 10239 10275 N 1000 -
chr1 10314 10350 N 1000 -
chr1 13043 13079 N 1000 +

I searched a lot to figure out what each column represents and how the data should be interpreted, but I could not find a good tutorial.
I would appreciate it if anybody could help with this.

Roadmap epigenome data

updated 8.2 years ago by Ar ★ 1.1k • written 8.2 years ago by morteza.mahmoudisaber ▴ 80

score 0 · Answer 1 · 2016-06-10

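The files are in BED or tagAlign format, depending on the suffix of the file name. None of them has a header, because the column layout is fixed by the format. In a tagAlign file the six columns are chromosome, start, end, sequence, score and strand; in BED the fourth column is the region name instead. Here column 4 is 'N', meaning no sequence (or name) is stored. Column 5 is always the score, which can range from 0 to 1000; every line in these files carries the same value, 1000. Column 6 is the strand (+ or -). Start coordinates are 0-based and end coordinates are exclusive (the BED convention), so each read in your example spans end - start = 36 bp. The overlap between coordinates tells you that these are individual aligned reads (not peak calls): each line is one read, as if the BAM file had been converted directly to BED.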
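A minimal Python sketch, assuming the six-column, whitespace-separated, headerless tagAlign layout described above (chrom, start, end, sequence, score, strand). The names `Tag`, `parse_tagalign`, `coverage_at` and `read_tagalign_gz` are hypothetical helpers for illustration, not part of any Roadmap tool. It parses the five example lines from the question, reports each read's length and strand-aware 5' end, and counts how many reads cover a position, which is how overlapping reads add up to a signal.

```python
import gzip
from typing import Iterable, List, NamedTuple


class Tag(NamedTuple):
    """One aligned read (tag) from a tagAlign / BED6 line."""
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # 0-based, exclusive
    seq: str     # read sequence; "N" when no sequence is stored
    score: int   # 0-1000; 1000 on every line of these files
    strand: str  # "+" or "-"

    @property
    def length(self) -> int:
        """Number of bases the read covers."""
        return self.end - self.start

    @property
    def five_prime(self) -> int:
        """0-based position of the read's 5' end, taking strand into account."""
        return self.start if self.strand == "+" else self.end - 1


def parse_tagalign(lines: Iterable[str]) -> List[Tag]:
    """Parse headerless tagAlign lines; blank lines are skipped."""
    tags = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        chrom, start, end, seq, score, strand = fields[:6]
        tags.append(Tag(chrom, int(start), int(end), seq, int(score), strand))
    return tags


def read_tagalign_gz(path: str) -> List[Tag]:
    """Read a gzip-compressed tagAlign file (e.g. the .tagAlign.gz download)."""
    with gzip.open(path, "rt") as handle:
        return parse_tagalign(handle)


def coverage_at(tags: Iterable[Tag], chrom: str, pos: int) -> int:
    """Count reads overlapping a 0-based position; overlaps stack up."""
    return sum(1 for t in tags if t.chrom == chrom and t.start <= pos < t.end)


# The first five lines shown in the question.
SAMPLE = """chr1 10149 10185 N 1000 +
chr1 10153 10189 N 1000 +
chr1 10239 10275 N 1000 -
chr1 10314 10350 N 1000 -
chr1 13043 13079 N 1000 +"""

if __name__ == "__main__":
    tags = parse_tagalign(SAMPLE.splitlines())
    for t in tags:
        print(t.chrom, t.start, t.end, t.strand, "length:", t.length, "5' end:", t.five_prime)
    # The first two reads overlap, so this position is covered twice.
    print("reads covering chr1:10160 ->", coverage_at(tags, "chr1", 10160))
```

Loading the whole genome-wide file into a Python list is fine for exploring a region, but for real coverage or pileup work a dedicated interval tool such as bedtools is usually more practical than a Python loop.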