targets in a bed at or above a coverage
2
0
Entering edit mode
4.8 years ago
bioguy24 ▴ 230

Is there a tool that will calculate percentage of targets in a bed file at or above 250x reads from a bam file? Currently I use bedtools to get all the reads that map to a target, followed by awk to calculate percentage above 250. However, since now our sequencing has increased there may be a more efficient way. Thank you :).

bed bam • 2.0k views
ADD COMMENT
0
Entering edit mode
4.8 years ago
bari.ballew ▴ 460

I'd recommend mosdepth. From the docs (especially relevant bullet in bold):

mosdepth can output:

  • per-base depth about 2x as fast samtools depth--about 25 minutes of CPU time for a 30X genome.
  • mean per-window depth given a window size--as would be used for CNV calling.
  • the mean per-region given a BED file of regions.
  • a distribution of proportion of bases covered at or above a given threshhold for each chromosome and genome-wide.
  • quantized output that merges adjacent bases as long as they fall in the same coverage bins e.g. (10-20)
  • threshold output to indicate how many bases in each region are covered at the given thresholds.
  • A summary of mean depths per chromosome and within specified regions per chromosome.

It should be fairly straightforward to go from this output to the percentage of regions above 250x.

ADD COMMENT
0
Entering edit mode
4.8 years ago

BEDtools will get you 90% there:

bedtools genomecov -ibam FILE.BAM -bg | awk '$4 > 249' > OUTPUT.BG
bedtools intersect -a TARGET.BED -b OUTPUT.BG -wao > OVERLAP

The first command reports the regions in your BAM with the desired coverage, the second reports the number of basepairs that overlap. Then it's a matter of dividing the overlap value by the size of the target.

ADD COMMENT

Login before adding your answer.

Traffic: 2427 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6