Question: Converting Bam To Bedgraph For Viewing On Ucsc?
8
gravatar for user
4.8 years ago by
user770
United States
user770 wrote:

I'm trying to go from a BAM file to a representation viewable in UCSC, ideally bedGraph. I am trying to use Bedtools's genomeCoverage like this:

genomeCoverageBed -ibam accepted_hits.sorted.bam -bg -trackline -split -g ... > mytrack.bedGraph

I'm not sure what the -g argument is supposed to be or how to generate it. The documentation does not explicitly say what it is supposed to be, though it gives an example where it is some sort of BED file. I am simply looking for a bedGraph or other UCSC-friendly compact representation that will allow me to visualize read densities using UCSC from the BAM. EDIT When I generate a bedGraph and put it in UCSC, I get tracks that look like this:

enter image description here

not a histogram. How can I make it a histogram? How can I generate the genome file for use with genomeCoverageBed? Also Is this the best way to get a UCSC viewable file with Bedtools? To clarify, I want to visualize the BAM as a histogram. I'm not sure this is possible with bedGraph? Thank you.

ADD COMMENTlink modified 4.8 years ago by David Langenberger8.0k • written 4.8 years ago by user770
1

It looks like your track is still in "dense" visualization mode, which encodes density as grayscale. Try changing it to "full", and hopefully you'll see the histogram representation you're looking for.

ADD REPLYlink written 4.8 years ago by Ryan Dale4.6k
7
gravatar for Ryan Dale
4.8 years ago by
Ryan Dale4.6k
Bethesda, MD
Ryan Dale4.6k wrote:

Using pybedtools, you can get a bedgraph, complete with track line, like this:

import pybedtools
x = pybedtools.BedTool('path/to/bam')
x.genome_coverage(bg=True, genome='hg19', split=True)\
    .saveas('path/to/bedgraph', trackline='track name="test track" visibility="full" type=bedGraph')

In that case however, the y-axis scale is in reads, so it's difficult to compare across libraries of different sizes in the genome browser. Ideally you'd want to scale your files so they have a uniform y-axis, like reads per million mapped reads (RPMMR). The bam_to_bigwig function does this:

from pybedtools.contrib.bigwig import bam_to_bigwig
bam_to_bigwig(bam='path/to/bam', genome='hg19', output='path/to/bigwig')

So this has the advantage of 1) counting how many mapped reads in your BAM file and scaling the bigWig such that it has units of reads per million mapped reads (RPMMR), 2) takes care of format conversions between bedGraph and bigWig (bigWIg is a much better format in the long run, in my opinion) and 3) handles the genome for you (-g argument for BEDTools, as long as you're working with a commonly-used assembly like hg19).

EDIT: while the bedGraph configuration is in the first line of the file, for bigWigs the configuration happens when you write a track line in the custom track upload text box. No configuration is stored in the file. Once you have uploaded your bigWig to a publicly-accessible location, use something like this in the upload text box:

track name='test' type=bigWig visibility=full viewLimits=0:25 autoScale=off bigDataUrl=http://your/url/here

Using the same viewLimits when uploading another track that was configured in the same way will show it on the same y-axis limits. See also the bigWig help and configuration settings for wig files

  • (edit: added type=bedGraph to track line)
  • (edit 2: added info about configuring bigWig)
  • (edit 3: added autoScale=off to trackline)
ADD COMMENTlink modified 4.7 years ago • written 4.8 years ago by Ryan Dale4.6k

this works great but how can you set the same y-axis for all tracks when you use bam_to_bigwig?

ADD REPLYlink written 4.7 years ago by user770

When you make the track file, you can set it there which contains the description about the bigwig.

ADD REPLYlink written 4.7 years ago by Sukhdeep Singh9.1k

I edited my answer above to address this.

ADD REPLYlink written 4.7 years ago by Ryan Dale4.6k

Everything you wrote works for me except autoscaling if i do: track type=bigWig name="track1" bigDataUrl=http://file1.bigWig visibility=full viewLimits=0.0:25.0 track type=bigWig name="track2" bigDataUrl=http://file2.bigWig color=255,0,0 visibility=full viewLimits=0.0:25.0 then it does not give an error but does not respect viewLimits. tried making the lower:upper values integers too.

ADD REPLYlink written 4.7 years ago by user770
1

add autoScale=off inline :)

ADD REPLYlink written 4.7 years ago by Sukhdeep Singh9.1k

works thank you!

ADD REPLYlink written 4.7 years ago by user770
4
gravatar for David Langenberger
4.8 years ago by
Deutschland
David Langenberger8.0k wrote:

Get a WIG file directly from a sorted BAM file using SAMtools (code snippet from http://www.ecseq.com/NGSsnippets.html)

samtools mpileup -BQ0 run.sorted.bam | perl -pe '($c, $start, undef, $depth) = split;if ($c ne $lastC || $start != $lastStart+1) {print "fixedStep chrom=$c start=$start step=1 span=1\n";}$_ = $depth."\n";($lastC, $lastStart) = ($c, $start);' | gzip -c > run.wig.gz

To get a bedGraph file, you can either create a bigWig file with wigToBigWig and then use bigWigToBedGraph to end up with a bedGrapg file. Or you change the code above (which is much faster). But the probably simplest way is to directly upload the wig or bigWig file to UCSC. :)

ADD COMMENTlink written 4.8 years ago by David Langenberger8.0k
3
gravatar for Alex Reynolds
4.8 years ago by
Alex Reynolds21k
Seattle, WA USA
Alex Reynolds21k wrote:

Perhaps try BEDOPS bam2bed and sort-bed to convert your BAM data to UCSC BED?

$ bam2bed < foo.bam | sort-bed - | cut -f1-6 > foo.bed

Then make a custom track from it, using the full, pack or squish display modes. If you zoom out far enough, you can get a rough picture of read density.


Addendum: If you want to count reads within windows, take a look at this usage case which bins BAM reads ("tags") within a sliding 20 bp window. The result is a compressed BED file that contains binned score values that (when extracted from Starch to BED) can also be brought into a UCSC Genome Browser instance. Because it will contain binned scores, this rendition is probably going to be closer to the histogram visualization you are after, than a simple display of individual tags.


Explanation of binning script

I'll walk through the process of using the BEDOPS-based binning script to generate a histogram of binned reads on a UCSC Genome Browser.

(1) Get the hg19 version of the chromInfo table from the UCSC Genome Browser

Visit the UCSC Table Browser. With the All Tables group selected, for example, select the hg19 database and the chromInfo table. Output all fields to a text file. (This step can also be performed with mysql commands, if this needs automating.)

(2) Edit this text file (e.g. run awk on it to put in the start coordinate) and pipe it to sort-bed to turn it into a sorted BED file. Here's a ready-to-use example for hg19 that I just made: https://dl.dropbox.com/u/31495717/chrList.bed Again, this can be automated.

(3) Bin the read data. For example, the following makes a 75 bp-windowed read count spaced in 20 bp bins, written to a Starch-formatted archive called result.starch:

$ binReads.sh myReads.bam $PWD/result.starch 75 20 chrList.bed

The Starch file is just a very highly-compressed BED file. We made this format so that we could make the best use of our lab's storage capabilities. You can edit the binReads.sh script to remove the starch - call if you don't want the BED data to be compressed, which lets you skip step 4. Otherwise, we go on to the next step:

(4) Extract the binned result to a BED file:

$ unstarch result.starch > result.bedGraph

(5) Edit the result.bedGraph file to add the track type. All you need to do is insert track type=bedGraph on its own line at the top of the file, although you can add various parameters to customize the display and look, etc.

(6) Place the modified result.bedGraph on a public-facing web site and copy the URL — or otherwise load a local copy — into a UCSC Genome Browser instance via the Custom Track page (Genomes > manage custom track). The Genome Browser will recognize it as a bedGraph file and render it accordingly.

Browser snapshot

That's all there is to it. All these steps can be automated, once you have the process down.

ADD COMMENTlink modified 4.8 years ago • written 4.8 years ago by Alex Reynolds21k

Thanks but Can UCSC accept regular bed files from track URL definitions, i.e. with bigDataUrl=? I thought it does not

ADD REPLYlink written 4.8 years ago by user770

Not with bigDataUrl=, but if you could just put the URL in directly into the custom track URL field. That seemed to work for me on our local genome browser instance, at least.

ADD REPLYlink written 4.8 years ago by Alex Reynolds21k

I want to use a track file with track so that I can put all my samples in one file pointing to all the relevant samples' URLs and see it. I don't want to add one BED at a time, so I prefer to use bedGraph. When I add a bedGraph track to UCSC, it appears as vertical bars - not a histogram - see my edit above. Also, the options for the track are dense and full, there's no pack.

ADD REPLYlink written 4.8 years ago by user770

But Alex, thats different from a bedGraph, which is essentially a coverage file as compared to the read locations, a bed file !!!!

ADD REPLYlink modified 4.8 years ago • written 4.8 years ago by Sukhdeep Singh9.1k

Okay, please see the addendum I added. I think this addresses your concern?

ADD REPLYlink written 4.8 years ago by Alex Reynolds21k
1

Thats good Alex, is there already a utility present in BedOps, or can the code be modified to calculate the density/coverage for a specific region of interest provided as file such as genes. Consider an example of dividing 100 genes of different sizes into 20 bins and then calculating the protein coverage (binding intensity binned average) for each gene in that gene specific bin and averaging all bins per gene later on, to have an averaged profile. Then, one could plot the average intensity of protein binding in a single plot, like here which is also called as a composite profile. Plotting Read Density/ Distribution/ Composite Profiles [Chip-Seq].

Cheers

ADD REPLYlink written 4.8 years ago by Sukhdeep Singh9.1k
2

Sure. One might use the binned output from this usage case as a map or signal file within the bedmap application in BEDOPS, using regions-of-interest (e.g., "genes") as the reference file. That way, you can use numerical operators like --mean, --max, etc. to quantify properties of the bins/signal over regions-of-interest. This was done repeatedly for our portion of the ENCODE project, for example, where we compared read (DNase-seq, ChIP-seq, etc.) densities in sites, say, proximal and distal to promoters, etc. One can apply this bedmap approach to any number of categories of BED-formatted region and signal data.

ADD REPLYlink modified 4.8 years ago • written 4.8 years ago by Alex Reynolds21k

Sounds good Alex, I'll try it out !!

ADD REPLYlink written 4.8 years ago by Sukhdeep Singh9.1k

that looks very promising thanks! but is there an easy way to get a bedgraph at the end so that it can be put into a file defining multiple tracks? Or can the Starched format be put in a UCSC tracks file in track type= lines?

ADD REPLYlink written 4.8 years ago by user770

Let's assume that you used the binning script and you set the output to a file called result.starch.

All you have to do is this:

  1. Run unstarch result.starch > result.bed to extract the result to a BED file.
  2. Edit the result.bed file with a text editor, adding type=bedGraph on its own line at the top of the result BED file.
  3. Load it into a UCSC Genome Browser instance as a custom track.

That's all there is to it. I tried it now with a small BAM file and was able to get a read histogram out of this. If you want to see that example BED file, load this URL into your genome browser as a custom track: https://dl.dropbox.com/u/31495717/density.bed and zoom out a couple times.

Once you do this once or twice, it's easy to automate.

ADD REPLYlink modified 4.8 years ago • written 4.8 years ago by Alex Reynolds21k

Hey Alex, I got acquanited to BedOps suite though few things are off the track. I have posted a new question here, its easy to answer and track. Binning Over Genes And Calculating The Coverage [Bedops/Bedmap]

ADD REPLYlink written 4.7 years ago by Sukhdeep Singh9.1k

Here is my answer; I hope it helps! Binning Over Genes And Calculating The Coverage [Bedops/Bedmap]

ADD REPLYlink written 4.7 years ago by Alex Reynolds21k
2
gravatar for Sukhdeep Singh
4.8 years ago by
Sukhdeep Singh9.1k
Netherlands
Sukhdeep Singh9.1k wrote:

In the help menu, its the first point, though it might not be clear.

Notes: (1) The genome file should tab delimited and structured as follows:

<chromName><TAB><chromSize>
For example, Human (hg19):
chr1    249250621
chr2    243199373
...
chr18_gl000207_random    4262

Its the tab delimited file of length of each chromosome of your organism of interest. Use fetchChromSizes to pull the sizes.

For mm9, this is how it looks

chr1    197195432
chr2    181748087
chrX    166650296
chr3    159599783
chr4    155630120
chr5    152537259
chr7    152524553
chr6    149517037
chr8    131738871
chr10    129993255
chr14    125194864
chr9    124076172
chr11    121843856
chr12    121257530
chr13    120284312
chr15    103494974
chr16    98319150
chr17    95272651
chr18    90772031
chr19    61342430
chrY_random    58682461
chrY    15902555
chrUn_random    5900358
chrX_random    1785075
chr1_random    1231697
chr8_random    849593
chr17_random    628739
chr9_random    449403
chr13_random    400311
chr7_random    362490
chr5_random    357350
chr4_random    160594
chr3_random    41899
chrM    16299
chr16_random    3994

Once saved used it like

genomeCoverageBed -ibam accepted_hits.sorted.bam -bg -g ~/src/useFul/ucsc/genomeIndex/mm9.chrom > file.bedGraph

Then, try reading this on how to visualize it efficiently. Visualizing Chip-Seq Data Using Ucsc [Bigwig]

Cheers

ADD COMMENTlink modified 4.8 years ago • written 4.8 years ago by Sukhdeep Singh9.1k

I read your tutorial on visualization but I am still confused. Remember that I am starting from a BAM file. It seems unnecessary to go from bam to BedGraph to yet another format, Wig/bigWig. There has to be a better way. I just want to plot histogram like distributions (similar to the way phastCons or the Chip-Seq tracks show) using a BAM, and a bedGraph should have all the information to do this it seems, so I don't want to have yet another file format.

ADD REPLYlink written 4.8 years ago by user770

Use the bedgraph then and replace the bigDataUrl with url!!

ADD REPLYlink written 4.8 years ago by Sukhdeep Singh9.1k

This is exactly what I've done. It does not work - the track appears as a bunch of vertical bars, not as a histogram like the phastCons score or other ChipSeq tracks that are built into UCSC. I want a histogram, not vertical bars or pileups of reads

ADD REPLYlink written 4.8 years ago by user770
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1147 users visited in the last hour