I'll walk through the process of using the BEDOPS-based binning script binReads.sh to generate a histogram of binned reads, visualized on a UCSC Genome Browser instance. These instructions assume human (build hg19), but they work just as easily for assemblies of other organisms.
(1) Download and install the BEDOPS toolkit, which includes sort-bed, conversion scripts and other utilities used in these instructions.
(2) Get the hg19 version of the chromInfo table from the UCSC Genome Browser. Visit the UCSC Table Browser and, with the All Tables group selected, choose the hg19 database and the chromInfo table. Output all fields to a text file. (This step can also be performed with the Kent tools' hgsql command, if it needs automating.)
(3) Edit this text file (e.g. run awk on it to insert a start coordinate of 0 for each chromosome) and pipe it to sort-bed to turn it into a sorted BED file. Here's a ready-to-use example for hg19 that I just made: https://dl.dropbox.com/u/31495717/chrList.bed
Again, this step can be automated, but the file won't need updating very often.
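As a rough sketch of how step 3 might be scripted: the snippet below assumes the Table Browser output was saved as chromInfo.txt (a file name chosen here for illustration), with tab-separated chrom, size and fileName columns and a header line beginning with #. The three sample rows only stand in for the real download. It uses sort-bed when that is on the PATH and otherwise falls back to plain sort, which is only an approximation of sort-bed's ordering.

```shell
# Stand-in for the Table Browser download (illustrative rows only;
# the real file lists every hg19 sequence).
printf '#chrom\tsize\tfileName\n'                >  chromInfo.txt
printf 'chr2\t243199373\t/gbdb/hg19/hg19.2bit\n' >> chromInfo.txt
printf 'chr1\t249250621\t/gbdb/hg19/hg19.2bit\n' >> chromInfo.txt
printf 'chrM\t16571\t/gbdb/hg19/hg19.2bit\n'     >> chromInfo.txt

# Skip the header, insert a start coordinate of 0, keep chrom and size.
to_bed() {
    awk 'BEGIN { FS = OFS = "\t" } !/^#/ { print $1, 0, $2 }' "$1"
}

if command -v sort-bed >/dev/null 2>&1; then
    to_bed chromInfo.txt | sort-bed - > chrList.bed
else
    # Fallback (assumption): approximates the ordering sort-bed produces.
    to_bed chromInfo.txt | LC_ALL=C sort -k1,1 -k2,2n > chrList.bed
fi

cat chrList.bed
```

The resulting chrList.bed has one line per chromosome, running from 0 to the chromosome size, which is the form passed to binReads.sh in the next step.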
(4) Bin the BAM-formatted read data. For example, the following makes a 75 bp-windowed read count spaced in 20 bp bins, written to a Starch-formatted archive called result.starch:
$ binReads.sh myReads.bam $PWD/result.starch 75 20 chrList.bed
You can adjust the size of windows and bins by changing the 75 and 20 parameters, respectively.
The Starch file is just a very highly compressed BED file. We made this format so that we could make the best use of our lab's storage capabilities. You can edit the binReads.sh script to remove the starch - call if you don't want the BED data to be compressed, which lets you skip step 5. Otherwise, we go on to the next step:
(5) Extract the binned, compressed result to a BED file:
$ unstarch result.starch > result.bedGraph
(6) Edit the result.bedGraph file to add the track type. All you need to do is insert track type=bedGraph on its own line at the top of the file, although you can also add various parameters to that line to customize how the track is displayed.
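One way to script step 6 without opening an editor, sketched under the assumption that result.bedGraph holds four-column chrom/start/end/count lines. The sample rows and the track name are made up for illustration; name and description are optional track-line settings.

```shell
# Made-up stand-in for the unstarch output from step 5.
printf 'chr1\t0\t20\t3\nchr1\t20\t40\t7\n' > result.bedGraph

# Write the track line first, append the data unchanged,
# then replace the original file with the combined result.
{
    printf 'track type=bedGraph name="binnedReads" description="75 bp windows, 20 bp bins"\n'
    cat result.bedGraph
} > result.bedGraph.tmp && mv result.bedGraph.tmp result.bedGraph

head -n 2 result.bedGraph
```

Going through a temporary file avoids reading and writing result.bedGraph at the same time, and it works the same way on Linux and OS X, unlike in-place sed edits.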
(7) Place the modified result.bedGraph on a public-facing web site and copy the URL into a UCSC Genome Browser instance via the Custom Tracks page (manage custom tracks), or load a local copy from the same page. The Genome Browser will recognize it as a bedGraph file and render it accordingly.
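Before uploading, it can save a round trip to check that the file looks like bedGraph: a track line followed by lines of four fields (chrom, start, end, value) with start less than end. The check below is a rough sanity test written for this walkthrough, not part of BEDOPS or the Genome Browser; check_bedgraph is a hypothetical helper name and the sample file is made up.

```shell
# Made-up sample: a track line plus two data rows.
printf 'track type=bedGraph\nchr1\t0\t20\t3\nchr1\t20\t40\t7\n' > result.bedGraph

# Hypothetical helper: exits non-zero and reports the first malformed line.
check_bedgraph() {
    awk '
        NR == 1 && /^track / { next }   # allow the track line
        /^(#|browser )/      { next }   # allow comments and browser lines
        NF != 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
            printf "line %d looks malformed: %s\n", NR, $0
            bad = 1
            exit
        }
        END { exit bad }
    ' "$1"
}

check_bedgraph result.bedGraph && echo "result.bedGraph looks OK"
```

This only catches structural problems such as missing columns or swapped coordinates; the Genome Browser remains the final judge of whether the track loads.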
That's all there is to it. All these steps can be automated, once you have the process down.