3.8 years ago

The following is a snippet of real-world BED data derived from ENCODE experiments: raw DNaseI hypersensitivity signal for the human K562 cell line (region chr21:33031165-33032485, assembly GRCh37/hg19, table wgEncodeUwDnaseK562RawRep1 from the UCSC Genome Browser).

chr21   33031165        33031185        map-1   1.000000
chr21   33031185        33031205        map-2   3.000000
chr21   33031205        33031225        map-3   3.000000
chr21   33031225        33031245        map-4   3.000000
chr21   33031245        33031265        map-5   3.000000
chr21   33031265        33031285        map-6   5.000000
chr21   33031285        33031305        map-7   7.000000
chr21   33031305        33031325        map-8   7.000000
chr21   33031325        33031345        map-9   8.000000
chr21   33031345        33031365        map-10  14.000000
chr21   33031365        33031385        map-11  15.000000


What does the fifth column represent in the snippet above?

chr1    10069   10268   62BTPAAXX101115:2:91:9786:16480 -
chr1    10070   10269   62BTPAAXX101115:2:56:4378:18049 -
chr1    10078   10277   62BTPAAXX101115:2:90:5029:16617 -
chr1    10084   10283   62BTPAAXX101115:2:46:18739:4855 -
chr1    10150   10349   62BTPAAXX101115:2:16:18355:20525    -
chr1    10270   10469   62BTPAAXX101115:2:30:11752:10409    -
chr1    12761   12960   62BTPAAXX101115:2:3:3059:16154  -
chr1    12896   13095   62BTPAAXX101115:2:33:5384:2796  -
chr1    12898   13097   62BTPAAXX101115:2:81:17428:14803    -
chr1    15503   15702   62BTPAAXX101115:2:48:16108:4560 -


The second snippet was downloaded from the Roadmap Epigenomics Project. Which colon-separated value in the second snippet corresponds to the fifth column in the first snippet? I was unable to find a file description for either file, so kindly share your thoughts. Thanks in advance.
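The fourth column of the second snippet looks like an Illumina read name. Below is a minimal Python sketch that splits it, assuming the older (pre-CASAVA 1.8) layout of run/flowcell identifier, lane, tile, and the cluster's x and y coordinates on the tile. The field names and `parse_read_name` are illustrative, not taken from any Roadmap documentation. Under that reading, none of the colon-separated values is a signal intensity.

```python
from typing import NamedTuple


class ReadName(NamedTuple):
    run_id: str  # assumed: flowcell/run identifier
    lane: int    # flow cell lane
    tile: int    # tile within the lane
    x: int       # cluster x-coordinate on the tile
    y: int       # cluster y-coordinate on the tile


def parse_read_name(name: str) -> ReadName:
    """Split an old-style Illumina read name into its five fields."""
    run_id, lane, tile, x, y = name.split(":")
    return ReadName(run_id, int(lane), int(tile), int(x), int(y))


print(parse_read_name("62BTPAAXX101115:2:91:9786:16480"))
# ReadName(run_id='62BTPAAXX101115', lane=2, tile=91, x=9786, y=16480)
```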

3.8 years ago

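The fifth column is the signal intensity. If you plot it along the chromosome, it will look like a wave, with peaks where DNaseI hypersensitivity was observed in the K562 cell line. The format is essentially bedGraph (chrom, start, end, value), with an extra name column (map-1, map-2, ...) inserted before the value.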
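To see the wave shape without a genome browser, here is a minimal Python sketch, assuming the rows are saved exactly as shown (whitespace-separated). It drops the name column to give plain 4-column bedGraph rows and prints a crude text bar chart; the function name `to_bedgraph` is illustrative only.

```python
# Parse the 5-column UCSC export (chrom, start, end, name, value),
# drop the name column to get 4-column bedGraph rows, and draw a
# crude text plot of the signal.

SNIPPET = """\
chr21   33031165        33031185        map-1   1.000000
chr21   33031185        33031205        map-2   3.000000
chr21   33031205        33031225        map-3   3.000000
chr21   33031225        33031245        map-4   3.000000
chr21   33031245        33031265        map-5   3.000000
chr21   33031265        33031285        map-6   5.000000
chr21   33031285        33031305        map-7   7.000000
chr21   33031305        33031325        map-8   7.000000
chr21   33031325        33031345        map-9   8.000000
chr21   33031345        33031365        map-10  14.000000
chr21   33031365        33031385        map-11  15.000000
"""


def to_bedgraph(text):
    """Return (chrom, start, end, value) tuples, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, start, end, _name, value = line.split()
        rows.append((chrom, int(start), int(end), float(value)))
    return rows


rows = to_bedgraph(SNIPPET)

# One '#' per unit of signal: the bars rise towards the peak.
for chrom, start, end, value in rows:
    print(f"{chrom}:{start}-{end}\t{value:5.1f} {'#' * int(value)}")

peak = max(rows, key=lambda r: r[3])
print("highest bin:", peak)
```

In this short excerpt the signal is still rising at the last bin; the full region (up to 33032485) would show where the peak actually tops out.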

Kevin

Alright, thanks for that. Does the second snippet contain signal intensity too?

Please show the exact source of the download. Did you look at the FAQ at the source website?

I could not find it in the FAQ.

Please provide the exact location from where you downloaded the second file, and also the file's name. There are hundreds of files accessible via the URL that you pointed to, and I unfortunately just don't have the ability to read your mind.

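In addition, it looks like a malformed BED file, because it is missing a column between the penultimate and final columns. In BED6 the columns are chrom, chromStart, chromEnd, name, score and strand, so the score column that should sit between the name and the strand is absent.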
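A minimal Python sketch of that point, assuming the five columns are chrom, start, end, read name and strand: it inserts a placeholder score so each row becomes valid BED6. The helper `to_bed6` and the placeholder value `0` are hypothetical choices, not something the Roadmap files specify.

```python
# Hypothetical placeholder: BED scores range 0-1000; 0 is used here
# only to fill the missing column, not as a real measurement.
PLACEHOLDER_SCORE = "0"


def to_bed6(line):
    """Turn a (chrom, start, end, name, strand) row into tab-separated BED6."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
    chrom, start, end, name, strand = fields
    if strand not in ("+", "-", "."):
        raise ValueError(f"unexpected strand value {strand!r}")
    return "\t".join([chrom, start, end, name, PLACEHOLDER_SCORE, strand])


reads = [
    "chr1    10069   10268   62BTPAAXX101115:2:91:9786:16480 -",
    "chr1    10150   10349   62BTPAAXX101115:2:16:18355:20525    -",
]
for read in reads:
    print(to_bed6(read))
```

Tools that expect strict BED6 should then read the strand from the sixth column as intended.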

Take a look here: A: histone mark data from roadmap epigenomics

File name: GSM669906 (location link)

I've downloaded multiple files and all of them have similar columns. I don't think that they're malformed. Any other thoughts?

Hey, please check the other link that I posted (A: histone mark data from roadmap epigenomics). I do not think that you need this second file: those Roadmap files seem to relate to the raw reads (one aligned read per row) and do not contain information on signal intensity. It is the first file that is more important.
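If you do want a signal-like track from such reads, one common approach is to count how many reads overlap each fixed-width bin. Below is a minimal Python sketch under stated assumptions: 20 bp bins (to match the first snippet), reads used as-is without shifting or extension, and no normalization. It is not how the ENCODE/UW raw signal in the first file was produced, only an illustration of turning reads into bedGraph-style counts; `bin_coverage` is a hypothetical name.

```python
from collections import Counter

BIN_SIZE = 20  # assumption: 20 bp bins, matching the first snippet

# (chrom, start, end) of the reads in the second snippet; strand is
# ignored and reads are used as-is (no shifting or extension).
READS = [
    ("chr1", 10069, 10268),
    ("chr1", 10070, 10269),
    ("chr1", 10078, 10277),
    ("chr1", 10084, 10283),
    ("chr1", 10150, 10349),
    ("chr1", 10270, 10469),
    ("chr1", 12761, 12960),
    ("chr1", 12896, 13095),
    ("chr1", 12898, 13097),
    ("chr1", 15503, 15702),
]


def bin_coverage(reads, bin_size=BIN_SIZE):
    """Count reads overlapping each bin; returns bedGraph-style tuples."""
    counts = Counter()
    for chrom, start, end in reads:
        # BED intervals are 0-based, half-open: bases start .. end - 1.
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size
        for b in range(first_bin, last_bin + 1):
            counts[(chrom, b)] += 1
    # Bins with no reads are simply absent (bedGraph allows gaps).
    return [
        (chrom, b * bin_size, (b + 1) * bin_size, n)
        for (chrom, b), n in sorted(counts.items())
    ]


for chrom, start, end, n in bin_coverage(READS):
    print(f"{chrom}\t{start}\t{end}\t{n}")
```

In this tiny sample the highest count is 6, in bin chr1:10260-10280, where the first six reads overlap.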

If there is still some confusion, it is better to email the Roadmap team directly.

