Ah, Roadmap data. It's a bit of a headache to understand sometimes, but bear with me:
What you are looking at is raw (unconsolidated) input that was output by the Pash mapper. It is a misformatted BED file: every value in the start column should have one subtracted from it, regardless of strand, i.e. it should be:
chr1 9983 10183 B09JPABXX110526:5:1101:19182:47634 0 -
chr1 9989 10189 B09JPABXX110526:5:2207:9781:41112 0 -
I am not really sure what the name field encodes or why the score is zero...
Once you fix this (e.g. bedtools slop -i your.bed -l 1 -r 0 -g hg19.genome, where your.bed stands for your input file), just pass the fixed BED file to a pileup tool (for instance, MACS).
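If you would rather not depend on bedtools for this one step, the same start-coordinate fix can be sketched in plain Python. This is a minimal sketch, not a drop-in replacement: the function names are made up for illustration, it assumes tab-delimited BED records with no header lines, and it clamps the start at 0 as bedtools slop does.

```python
from typing import Iterable, Iterator


def shift_start(line: str) -> str:
    """Subtract one from the start column of a single BED record.

    Assumes tab-delimited fields; the strand column is deliberately ignored.
    """
    fields = line.rstrip("\n").split("\t")
    # Never let the start coordinate go below zero.
    fields[1] = str(max(0, int(fields[1]) - 1))
    return "\t".join(fields)


def shift_starts(lines: Iterable[str]) -> Iterator[str]:
    """Apply shift_start to every non-empty line."""
    for line in lines:
        if line.strip():
            yield shift_start(line)
```

Feeding it an open file handle and writing each yielded record followed by a newline gives you the fixed file; a record whose start is 9984 comes out with start 9983, matching the corrected rows above.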
The way ROADMAP does it:
- They shorten all the reads back to 36 bp (so it's consistent across all experiments)
- They filter out duplicate reads, and reads that could not have been mapped to the genome at all had they been 36 bp long
- They estimate the fragment length using SPP and run macs2 with the appropriate fragment length (--nomodel --extsize fragment_length) to generate both the peak lists and the signal/fold-change tracks.
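To make the steps above more concrete, here is a rough Python sketch of the truncation and duplicate-filtering steps on BED6 records, plus a helper that assembles a macs2 command line. It is an illustration under assumptions, not Roadmap's actual code: the function names are hypothetical, duplicates are taken to mean identical chromosome, coordinates and strand after truncation, the 36 bp mappability filter is left out because it needs an external mappability track, and the -f BED, -g hs and -B options are my own guesses at a sensible invocation. The fragment length is whatever SPP estimated for your sample.

```python
from typing import Iterable, Iterator, List, Set, Tuple

READ_LENGTH = 36  # read length named in the list above


def truncate_to_read_length(fields: List[str], length: int = READ_LENGTH) -> List[str]:
    """Keep only the 5'-most `length` bases of one BED6 record."""
    start, end, strand = int(fields[1]), int(fields[2]), fields[5]
    if end - start > length:
        if strand == "-":
            start = end - length  # the 5' end of a minus-strand read is `end`
        else:
            end = start + length  # the 5' end of a plus-strand read is `start`
    return [fields[0], str(start), str(end)] + fields[3:]


def drop_duplicates(records: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield the first record seen for each (chrom, start, end, strand)."""
    seen: Set[Tuple[str, str, str, str]] = set()
    for fields in records:
        key = (fields[0], fields[1], fields[2], fields[5])
        if key not in seen:
            seen.add(key)
            yield fields


def macs2_command(bed_path: str, name: str, fragment_length: int) -> List[str]:
    """Build (but do not run) a macs2 callpeak command line."""
    return [
        "macs2", "callpeak",
        "-t", bed_path,
        "-f", "BED",
        "-g", "hs",          # assumption: human genome size preset
        "-n", name,
        "--nomodel",
        "--extsize", str(fragment_length),
        "-B",                # assumption: also write bedGraph pileup tracks
    ]
```

The command list can be handed to subprocess.run once macs2 is installed; building it separately makes it easy to inspect before anything is executed.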
I really suggest you skip doing the preprocessing yourself and just get the data from their download page.
Namely, you want to look at section c (peak calling) or section d (signal tracks). Either go with the consolidated reads for your cell line (consolidated = all technical/biological replicates lumped together; recommended), or with the unconsolidated ones (which is what you are looking at at the moment).