Question: Extract Coverage Track From Bam To Bed File
1
gravatar for William
5.5 years ago by
William4.3k
Europe
William4.3k wrote:

How can I extract an coverage track from a BAM file to a bed file. I want the bed file to contain all the regions of the reference genome that have a minimum coverage of x (x =1 in this case, but also might be 20).

I guess this should be possible with bedtools or bedops. But I only find functions like bamtobed which converts every read to a bed entry. I am looking to extract the coverage track.

Edit: I am looking for coverage including indels. Because I finally want to use the covered regions to compare calls from call sets in regions that have a minimum coverage of x for both samples.

bed bam • 7.4k views
ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by William4.3k
1

Have you tried bedtools genomecov? If you have coverage for each position, you could afterwards filter out the regions you are interested in.

ADD REPLYlink written 5.5 years ago by skymningen330

coverageBed -abam aln.bam -b exons.bed > exons.bed.coverage

ADD REPLYlink written 5.5 years ago by Raony Guimarães860
2
gravatar for Aaronquinlan
5.5 years ago by
Aaronquinlan10k
United States
Aaronquinlan10k wrote:

Use the BEDGRAPH output option -bg in bedtools' genomecov tool.

bedtools genomecov -ibam aln.bam -bg

The output will be four column BEDGRAPH format, where the fourth column is the depth observed for each interval.

To restrict the output to all regions == 1X coverage, use awk (this example shows how to filter output based on the fourth column. awk is your friend).

bedtools genomecov -ibam aln.bam -bg | awk '$4==1'

To restrict the output to all regions with > 20X coverage:

bedtools genomecov -ibam aln.bam -bg | awk '$4>20'

Here are the docs for this tool: http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html

Here are the specific docs for the -bg option: http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html#bg-reporting-genome-coverage-in-bedgraph-format

ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by Aaronquinlan10k

This works except that It removes the indels from the reads. The reason I am extracting a coverage track is because I want to compare 2 variantcall sets in regions that are covered in both bams. So I also need the indels to be present in the coverage track. genomecov shows indels as not being covered.

ADD REPLYlink written 5.5 years ago by William4.3k
1

I see. You should update your question to include this important detail. Currently, it reads like a basic "coverage" question.

ADD REPLYlink written 5.5 years ago by Aaronquinlan10k

Use a bed file to select the regions of the reference youre interested in. Then, to compare the two bams, use mpileup, then a simple awk on the coverage columns.

ADD REPLYlink written 5.5 years ago by fridhackery140
1
gravatar for Gabriel R.
5.5 years ago by
Gabriel R.2.5k
Center for Geogenetik Københavns Universitet
Gabriel R.2.5k wrote:

If I were to do it, I would do:

 samtools mpileup [bam file] | awk '{script}'

Where the awk would have a variable to check whether you are in a region of sufficient coverage and output the chr, start and end.

ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by Gabriel R.2.5k
1
gravatar for William
5.5 years ago by
William4.3k
Europe
William4.3k wrote:

For extracting a coverage track with the minimum of just 1 read of coverage I used the following:

bamToBed -i input.bam > inputBam_bamToBed.bed

bedops --merge inputBam_bamToBed.bed > inputBam_bamToBedCollapsed.bed

For a minimum other than 1 I am not sure what combination of bam and bed tools to use. genomecov removes the indels from the coverage, which I want to count as coverage.

ADD COMMENTlink written 5.5 years ago by William4.3k

Can you speak awk a bit ? I think my solution works.

ADD REPLYlink written 5.5 years ago by Gabriel R.2.5k

I don't have much experience with awk and I would prefer somethings without scripting. This kind of questions is what tools like bedops and bedtools should solve.

ADD REPLYlink written 5.5 years ago by William4.3k

Note that you can do it all in one go with BEDOPS tools and UNIX I/O piping:

$ bam2bed < input.bam \
     | bedops --merge - \
     > inputBam_bam2BedCollapsed.bed

This pipeline also ensures correct sort order for the input going into bedops, as using bamToBed may not guarantee the correct ordering of the BED data coming out of the conversion.

Here, the bam2bed script uses sort-bed behind the scenes, passing sorted output to the next step. This ensures correct output from the merge step.

Alternatively, you could use bamToBed with sort-bed directly:

$ bamToBed -i input.bam \
    | sort-bed - \
    | bedops --merge - \
    > inputBam_bamToBedCollapsed.bed

The main point is that bedops consumes sorted data. If the data are not sorted going in, then incorrect results may occur.

ADD REPLYlink modified 5.5 years ago • written 5.5 years ago by Alex Reynolds26k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1586 users visited in the last hour