How to interpret Roadmap epigenome reprocessed data
7.9 years ago

I am trying to analyze the reprocessed, consolidated Roadmap Epigenomics data, and I have downloaded the H3K4me1 data for male fetal brain from:

http://egg2.wustl.edu/roadmap/data/byFileType/alignments/consolidated/E081-H3K4me1.tagAlign.gz

Looking at the file contents, though, it's a bit confusing how to interpret the data, since 1) the file has no header naming the columns, 2) column 4 is always just 'N', 3) column 5 is always '1000', and 4) the coordinates of different lines sometimes overlap.

Here are the first 5 lines:

chr1 10149 10185 N 1000 +

chr1 10153 10189 N 1000 +

chr1 10239 10275 N 1000 -

chr1 10314 10350 N 1000 -

chr1 13043 13079 N 1000 +

I searched a lot to figure out what each column represents and how the data should be interpreted, but I could not find a good tutorial on it.
I would appreciate it if anybody could help with this.

Roadmap epigenome data • 1.4k views
7.9 years ago
Ar ★ 1.1k

The files are in BED or tagAlign format, depending on the suffix of the file name. Neither format has a header line, since the column layout is fixed. Columns 1-3 are the chromosome, start and end coordinates. Column 4 is always the name of the region; N here means no name. Column 5 is always the score, which can range from 0 to 1000. Column 6 is the strand. The overlap between coordinates tells you that these are reads rather than peaks, i.e. alignments converted directly from BAM to BED format.
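To make the column layout concrete, here is a minimal Python sketch that parses tagAlign lines into named fields. The names `Tag`, `parse_tagalign_line` and `read_tagalign` are made up for illustration and are not part of any Roadmap tool. It assumes whitespace-separated columns and the usual BED convention of 0-based, half-open coordinates.

```python
import gzip
from typing import Iterator, NamedTuple


class Tag(NamedTuple):
    """One tagAlign record (hypothetical helper type, BED6-style layout)."""
    chrom: str   # column 1: chromosome
    start: int   # column 2: start, 0-based (assumed BED convention)
    end: int     # column 3: end, exclusive
    name: str    # column 4: name, "N" when none is stored
    score: int   # column 5: score, 0-1000
    strand: str  # column 6: "+" or "-"


def parse_tagalign_line(line: str) -> Tag:
    """Split one whitespace-separated line into its six columns."""
    chrom, start, end, name, score, strand = line.split()[:6]
    return Tag(chrom, int(start), int(end), name, int(score), strand)


def read_tagalign(path: str) -> Iterator[Tag]:
    """Stream records from a gzip-compressed tagAlign file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                yield parse_tagalign_line(line)


# The five lines quoted in the question.
sample = [
    "chr1 10149 10185 N 1000 +",
    "chr1 10153 10189 N 1000 +",
    "chr1 10239 10275 N 1000 -",
    "chr1 10314 10350 N 1000 -",
    "chr1 13043 13079 N 1000 +",
]
tags = [parse_tagalign_line(line) for line in sample]

# Each interval is one read; its length is end - start.
read_lengths = [t.end - t.start for t in tags]

# Two consecutive reads on the same chromosome overlap when the
# second one starts before the first one ends.
first_two_overlap = (
    tags[0].chrom == tags[1].chrom and tags[1].start < tags[0].end
)
print(read_lengths, first_two_overlap)
```

For the downloaded file you would iterate over `read_tagalign("E081-H3K4me1.tagAlign.gz")` instead of the inline sample. Since every line is a single read, the signal comes from how many reads pile up at a locus, not from the score column.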
