Question: Compute mean depth coverage for exome data with paired end, overlapping, features
1
gravatar for fbrundu
2.1 years ago by
fbrundu290
European Union
fbrundu290 wrote:

I am trying to get the mean depth coverage for bam files with bedtools.

So far I've been using the command:

bedtools coverage -hist -abam bamfile.bam -b targets_exome.bed > bamfile.bam.hist.all.txt

But I find myself with a result similar to:

1   9998    10030   HWI-H217:64:C48Y8ACXX:4:1110:12530:87823/1  1   -   9998    10030   0,0,0   1   32, 0,  0   32  32  1.0000000

1   9998    10030   HWI-H217:64:C48Y8ACXX:4:1110:12530:87823/2  1   -   9998    10030   0,0,0   1   32, 0,  0   32  32  1.0000000

1   9999    10096   HWI-H217:64:C48Y8ACXX:4:1303:11532:70370/1  0   -   9999    10096   0,0,0   1   97, 0,  0   97  97  1.0000000

1   9999    10096   HWI-H217:64:C48Y8ACXX:4:1303:11532:70370/2  0   -   9999    10096   0,0,0   1   97, 0,  0   97  97  1.0000000

where the different features are overlapping (see #1 and #3).

I want to compute average coverage, defined as the number of reads covering the captured coding sequence of a haploid reference. How to do it in this case?

depth coverage bam exome • 3.1k views
ADD COMMENTlink modified 2.1 years ago by Kevin Blighe51k • written 2.1 years ago by fbrundu290
10
gravatar for Kevin Blighe
2.1 years ago by
Kevin Blighe51k
Kevin Blighe51k wrote:

To get the mean depth of coverage for each interval specified in your BED file, use:

bedtools coverage -a BED -b [BAM] -mean > MeanCoverageBED.bedgraph

If you wanted to get the overall mean, then just pipe this into awk and get the average of the average, something like:

bedtools coverage -a BED -b [BAM] -mean | awk '{total+=$4} END {print total/NR}'

There are other commands with bedtools that I use as standard in my pipelines:

output depth of coverage for all regions in the BAM file

NB - sequential positions at the same read depth are merged into a single region

bedtools genomecov -ibam [BAM] -bga -split > CoverageTotal.bedgraph

output the per base read depth for each region in the BED file

bedtools coverage -a [BED] -b [BAM] -d > PerBaseDepthBED.bedgraph

get percent genome covered

zero=$(bedtools genomecov -ibam [BAM] -g [ref.fasta] -bga | awk '$4==0 {bpCountZero+=($3-$2)} {print bpCountZero}' | tail -1)
nonzero=$(bedtools genomecov -ibam [BAM] -g [ref.fasta] -bga | awk '$4>0 {bpCountNonZero+=($3-$2)} {print bpCountNonZero}' | tail -1)
percent=$(bc <<< "scale=6; ($nonzero / ($zero + $nonzero))*100")
ADD COMMENTlink modified 9 months ago • written 2.1 years ago by Kevin Blighe51k

Thanks Kevin. My doubt is, to get the average coverage, since I have overlapping regions, it is necessary to get the per-base read depth, right?

ADD REPLYlink written 2.1 years ago by fbrundu290

Hey, it really depends on what exactly you want.

If you want BEDTools to treat each BED region independently (and calculate the mean read depth in each), then use bedtools coverage -a BED -b BAM -mean > MeanCoverageBED.bedgraph. This will ignore the fact that regions may be overlapping, I believe.

Another option is to first edit your BED regions so that overlapping regions are merged into a single entry, using BEDTools merge. After you do that and re-run the above command for determining mean read depth, you'll get a more informative mean for your regions.

You can also output the per-base read-depth, but you'll then have to calculate the mean yourself (possibly using awk again).

BEDTools is very powerful but one has to exactly sure what they want to get and which specific commands to use. There are many other tools for computing coverage/read-depth metrics too.

ADD REPLYlink written 2.1 years ago by Kevin Blighe51k
1

Thanks.. bedtools merge is exactly what I was looking for!

ADD REPLYlink written 2.1 years ago by fbrundu290

Great! Glad to have helped out

ADD REPLYlink written 2.1 years ago by Kevin Blighe51k
1

What version of bedtools is that? Version: v2.17.0 doesn't have the "-mean:" option. only -hist

nevermind, found it in v2.27.1

ADD REPLYlink modified 8 months ago • written 8 months ago by Jon1710

What is the resource requirement for coverage calculation? I am trying on some large windows (1Mbp) segments, it crashed bedtools version 2.24.0 several times when I tried.

This suprises me - given bedtools should be handle genome coverage, it shouldn't crash when calculating some 1Mb windows' coverage? Or it's a bug in 2.24.0

Thanks for your insight in advance!

ADD REPLYlink written 5 months ago by isaacpei0

Should not be a bug. Can you let us know the command that you are running? Also, the size of your BAM file, and number of regions in your BED file?

ADD REPLYlink written 5 months ago by Kevin Blighe51k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1300 users visited in the last hour