Question: Compute mean depth coverage for exome data with paired end, overlapping, features
1
gravatar for fbrundu
17 months ago by
fbrundu280
European Union
fbrundu280 wrote:

I am trying to get the mean depth coverage for bam files with bedtools.

So far I've been using the command:

bedtools coverage -hist -abam bamfile.bam -b targets_exome.bed > bamfile.bam.hist.all.txt

But I find myself with a result similar to:

1   9998    10030   HWI-H217:64:C48Y8ACXX:4:1110:12530:87823/1  1   -   9998    10030   0,0,0   1   32, 0,  0   32  32  1.0000000

1   9998    10030   HWI-H217:64:C48Y8ACXX:4:1110:12530:87823/2  1   -   9998    10030   0,0,0   1   32, 0,  0   32  32  1.0000000

1   9999    10096   HWI-H217:64:C48Y8ACXX:4:1303:11532:70370/1  0   -   9999    10096   0,0,0   1   97, 0,  0   97  97  1.0000000

1   9999    10096   HWI-H217:64:C48Y8ACXX:4:1303:11532:70370/2  0   -   9999    10096   0,0,0   1   97, 0,  0   97  97  1.0000000

where the different features are overlapping (see #1 and #3).

I want to compute average coverage, defined as the number of reads covering the captured coding sequence of a haploid reference. How to do it in this case?

depth coverage bam exome • 1.7k views
ADD COMMENTlink modified 17 months ago by Kevin Blighe39k • written 17 months ago by fbrundu280
8
gravatar for Kevin Blighe
17 months ago by
Kevin Blighe39k
Republic of Ireland
Kevin Blighe39k wrote:

To get the mean depth of coverage for each interval specified in your BED file, use:

bedtools coverage -a BED -b [BAM] -mean > MeanCoverageBED.bedgraph

If you wanted to get the overall mean, then just pipe this into awk and get the average of the average, something like:

bedtools coverage -a BED -b [BAM] -mean | awk '{total+=$4} END {print total/NR}'

There are other commands with bedtools that I use as standard in my pipelines:

output depth of coverage for all regions in the BAM file

NB - sequential positions at the same read depth are merged into a single region

bedtools genomecov -ibam [BAM] -bga -split > CoverageTotal.bedgraph

output the per base read depth for each region in the BED file

bedtools coverage -a [BED] -b [BAM] -d > PerBaseDepthBED.bedgraph

get percent genome covered

zero=$(bedtools genomecov -ibam [BAM] -g [ref.fasta] -bga | awk '$4==0 {bpCountZero+=($3-$2)} {print bpCountZero}' | tail -1)
nonzero=$(bedtools genomecov -ibam [BAM] -g [ref.fasta] -bga | awk '$4>0 {bpCountNonZero+=($3-$2)} {print bpCountNonZero}' | tail -1)
percent=$(bc <<< "scale=6; ($nonzero / ($zero + $nonzero))*100")
ADD COMMENTlink modified 6 weeks ago • written 17 months ago by Kevin Blighe39k

Thanks Kevin. My doubt is, to get the average coverage, since I have overlapping regions, it is necessary to get the per-base read depth, right?

ADD REPLYlink written 17 months ago by fbrundu280

Hey, it really depends on what exactly you want.

If you want BEDTools to treat each BED region independently (and calculate the mean read depth in each), then use bedtools coverage -a BED -b BAM -mean > MeanCoverageBED.bedgraph. This will ignore the fact that regions may be overlapping, I believe.

Another option is to first edit your BED regions so that overlapping regions are merged into a single entry, using BEDTools merge. After you do that and re-run the above command for determining mean read depth, you'll get a more informative mean for your regions.

You can also output the per-base read-depth, but you'll then have to calculate the mean yourself (possibly using awk again).

BEDTools is very powerful but one has to exactly sure what they want to get and which specific commands to use. There are many other tools for computing coverage/read-depth metrics too.

ADD REPLYlink written 17 months ago by Kevin Blighe39k
1

Thanks.. bedtools merge is exactly what I was looking for!

ADD REPLYlink written 17 months ago by fbrundu280

Great! Glad to have helped out

ADD REPLYlink written 17 months ago by Kevin Blighe39k
1

What version of bedtools is that? Version: v2.17.0 doesn't have the "-mean:" option. only -hist

nevermind, found it in v2.27.1

ADD REPLYlink modified 3 days ago • written 3 days ago by Jon1710
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 832 users visited in the last hour