Question: How to generate BedGraph from BedTools Coverage?
2
gravatar for SmallChess
3.9 years ago by
SmallChess490
Australia
SmallChess490 wrote:

The tool is at:

http://bedtools.readthedocs.org/en/latest/content/tools/coverage.html

I want to get base coverage for specific regions (I'll provide a bed file). However, the output is not in a standard BedGraph format that I can export into R for visualisation.

Q: How do I draw density plot of coverage over the regions from the outputs generated by the coverage tool?

Q: Alternatively, how to convert the non-standard outputs to a proper BedGraph? For example, the Sushi R package takes a bedgraph input file.

 

genomic bed bedtools • 7.3k views
ADD COMMENTlink modified 3.9 years ago by Alex Reynolds28k • written 3.9 years ago by SmallChess490
6
gravatar for Joseph Pearson
3.9 years ago by
UNC Chapel Hill
Joseph Pearson450 wrote:

You should also consider bedtools genomecov, with the -bg option (bg being bedgraph)

http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html

Edit: After reading comments, and after some experimenting, what I think you want to do is:

bedtools intersect -a myBam.bam -b myRegions.bed > intersected.bam

bedtools genomecov -trackline -bg -ibam intersected.bam > intersected.bedgraph

 

Bedtools intersect will give only bam reads that overlap your regions (in myRegions.bed)

Then genomecov will draw a bedgraph file (coverage) only for the read depth for your filtered regions.

ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by Joseph Pearson450

The problem is that this tool doesn't take a BED file filtering the regions that I'm interested. Therefore, it'd be too time consuming to calculate for all bases in the genome.

ADD REPLYlink written 3.9 years ago by SmallChess490

Fair enough, I didn't catch the subtlety in your question, but James Ashmore did.

ADD REPLYlink written 3.9 years ago by Joseph Pearson450
1

I think this might be what you want to do - genomecov and then pull out the regions you want using bedtools intersect. But kind of hard to tell.

ADD REPLYlink written 3.9 years ago by Madelaine Gogol5.1k

Thanks. I think your approach is the best.

ADD REPLYlink written 3.9 years ago by SmallChess490

Note: -abam is correct.

ADD REPLYlink written 3.8 years ago by SmallChess490
1
gravatar for Alex Reynolds
3.9 years ago by
Alex Reynolds28k
Seattle, WA USA
Alex Reynolds28k wrote:

Get your output into a standard five- or more columned BED file and then use awk:

$ awk '{ print $1"\t"$2"\t"$3"\t"$5 }' foo.bed > foo.bedgraph
ADD COMMENTlink written 3.9 years ago by Alex Reynolds28k

@ Alex: How can I convert my .sam file to standard .bed file? I have tried sam2bed < input.sam > output.bed it gives me output file but the column 5 always contains the same value (which is 99). According to UCSC bed format this is score column and to me this look strange that for every region the score is same. Then by using your mentioned command I obtain the .bedgraph file which I later use for ChromImpute. Problem is that when ChromeImpute convert these raw signal into certain sized bin signal (25 bp) then for every region I get the 0 signal.. I have tried many ways (using Pyisco, make wig files, genomeCoverageBed etc..) but I am unable to solve this problem. Can you help me with this one? Thanks.

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Bioinformatist Newbie230
0
gravatar for James Ashmore
3.9 years ago by
James Ashmore2.6k
UK/Edinburgh/MRC Centre for Regenerative Medicine
James Ashmore2.6k wrote:

To generate bedGraph output from Bedtools coverage command you need to specify the -counts flag, for example:

bedtools coverage -a sample.bed -b sample.bam -counts > sample.bedgraph

Strictly speaking each row of a bedGraph file contains the chromosome name, the start position, the end position and then some value, which in your case would be the coverage, therefore you will want to keep only these fields, so if your had a bed file with six columns, you would want to keep only the first 3 columns (chrom, start, end) and the last column (coverage):

bedtools coverage -a sample.bed -b sample.bam -counts | cut -f1,2,3,7 > sample.bedgraph
ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by James Ashmore2.6k
2

It's not really clear to me from the original question that this is what they want. This would give one value for each genome region. I suspect they want "wiggly" values for each genome region. But could be wrong.

ADD REPLYlink written 3.9 years ago by Madelaine Gogol5.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 714 users visited in the last hour