Extract Coverage Track From Bam To Bed File
3
1
Entering edit mode
8.9 years ago
William ★ 5.0k

How can I extract an coverage track from a BAM file to a bed file. I want the bed file to contain all the regions of the reference genome that have a minimum coverage of x (x =1 in this case, but also might be 20).

I guess this should be possible with bedtools or bedops. But I only find functions like bamtobed which converts every read to a bed entry. I am looking to extract the coverage track.

Edit: I am looking for coverage including indels. Because I finally want to use the covered regions to compare calls from call sets in regions that have a minimum coverage of x for both samples.

bam bed • 12k views
1
Entering edit mode

Have you tried bedtools genomecov? If you have coverage for each position, you could afterwards filter out the regions you are interested in.

0
Entering edit mode

coverageBed -abam aln.bam -b exons.bed > exons.bed.coverage

3
Entering edit mode
8.9 years ago

Use the BEDGRAPH output option -bg in bedtools' genomecov tool.

bedtools genomecov -ibam aln.bam -bg

The output will be four column BEDGRAPH format, where the fourth column is the depth observed for each interval.

To restrict the output to all regions == 1X coverage, use awk (this example shows how to filter output based on the fourth column. awk is your friend).

bedtools genomecov -ibam aln.bam -bg | awk '$4==1' To restrict the output to all regions with > 20X coverage: bedtools genomecov -ibam aln.bam -bg | awk '$4>20'

Here are the docs for this tool: http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html

Here are the specific docs for the -bg option: http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html#bg-reporting-genome-coverage-in-bedgraph-format

0
Entering edit mode

This works except that It removes the indels from the reads. The reason I am extracting a coverage track is because I want to compare 2 variantcall sets in regions that are covered in both bams. So I also need the indels to be present in the coverage track. genomecov shows indels as not being covered.

1
Entering edit mode

I see. You should update your question to include this important detail. Currently, it reads like a basic "coverage" question.

0
Entering edit mode

Use a bed file to select the regions of the reference youre interested in. Then, to compare the two bams, use mpileup, then a simple awk on the coverage columns.

1
Entering edit mode
8.9 years ago
Gabriel R. ★ 2.8k

If I were to do it, I would do:

samtools mpileup [bam file] | awk '{script}'

Where the awk would have a variable to check whether you are in a region of sufficient coverage and output the chr, start and end.

1
Entering edit mode
8.9 years ago
William ★ 5.0k

For extracting a coverage track with the minimum of just 1 read of coverage I used the following:

bamToBed -i input.bam > inputBam_bamToBed.bed

bedops --merge inputBam_bamToBed.bed > inputBam_bamToBedCollapsed.bed

For a minimum other than 1 I am not sure what combination of bam and bed tools to use. genomecov removes the indels from the coverage, which I want to count as coverage.

0
Entering edit mode

Can you speak awk a bit ? I think my solution works.

0
Entering edit mode

I don't have much experience with awk and I would prefer somethings without scripting. This kind of questions is what tools like bedops and bedtools should solve.

0
Entering edit mode

Note that you can do it all in one go with BEDOPS tools and UNIX I/O piping:

$bam2bed < input.bam \ | bedops --merge - \ > inputBam_bam2BedCollapsed.bed This pipeline also ensures correct sort order for the input going into bedops, as using bamToBed may not guarantee the correct ordering of the BED data coming out of the conversion. Here, the bam2bed script uses sort-bed behind the scenes, passing sorted output to the next step. This ensures correct output from the merge step. Alternatively, you could use bamToBed with sort-bed directly:$ bamToBed -i input.bam \
| sort-bed - \
| bedops --merge - \
> inputBam_bamToBedCollapsed.bed

The main point is that bedops consumes sorted data. If the data are not sorted going in, then incorrect results may occur.