Question: How To Determine The Low Read Coverage Regions In A Sam File
gravatar for amirbenjab
6.8 years ago by
amirbenjab10 wrote:


There are multiple FASTA files contain contigs. And also multiple SAM files contain the alignment of reads to the contigs.

Assuming the sam was converted to bam, I want to determine the low region coverage in the bam files.

For each bam file, should I compute the coverage of multiple regions or just one coverage region (i.e. mean(coverage(ranges)))?


read sequence contigs • 4.9k views
ADD COMMENTlink modified 6.8 years ago by Jorge Amigo11k • written 6.8 years ago by amirbenjab10

How are you defining low coverage? Is this transcriptomic or genomic coverage?

ADD REPLYlink written 6.8 years ago by Damian Kao15k

It's genomic coverage. All results are generated by BWA.

ADD REPLYlink modified 6.8 years ago • written 6.8 years ago by amirbenjab10
gravatar for Jorge Amigo
6.8 years ago by
Jorge Amigo11k
Santiago de Compostela, Spain
Jorge Amigo11k wrote:

we (and I guess everyone that has been doing NGS data analysis) have faced this issue since the very NGS beginning. I find out our way of solving it very logical, so I guess most of you may have already implemented some of these solutions, but in case describing them could help let me tell you the 3 different ways in which we've addressed the issue:

  • one would be a very descriptive one, which helps a lot to see how the experiment has performed on your genes of interest. it uses GATK's DepthOfCoverage walker, and it provides you with very convenient summary metrics for the regions of interest (genes in our particular case), including the percentage of which a particular region (or gene) has been sequenced above a certain coverage threshold (which of course can be customized).

  • the next one would be also a coverage summary for all the samples sequenced, position per position. it goes through generating a bedgraph for each of your sam/bam files using bedtools' genomeCoverageBed, and it then uses bedtools' unionBedGraphs to join them all into a single massive table in which, for each position sequenced, you may check at once the coverage of all samples.

  • finally, we also sometimes do sample by sample inspection, and it's very useful for us to highlight regions of coverage below a threshold. we have build a simple script that parses a previously generated bedgraph using again bedtools' genomeCoverageBed for a particular sam/bam file and it outputs a bed file with all the positions under the specified threshold grouped by regions into a IGV readable bed file. this allow you to browse you sam/bam file with a parallel track of low coverage regions that help your eye to quickly locate those kind of regions.

ADD COMMENTlink written 6.8 years ago by Jorge Amigo11k
gravatar for Istvan Albert
6.8 years ago by
Istvan Albert ♦♦ 80k
University Park, USA
Istvan Albert ♦♦ 80k wrote:

First you would need to come up with the definition of what you mean by low coverage (with respect of your problem set). Then you can use samtools mpileup to compute the coverage of each base or bedtools to compute genomic coverages

Some posts that cover this are:

tools to calculate average coverage for a bam file?, How do you get the quality score and coverage for every single position of a reference assembly, how to calculated coverage of the genome/contig slice?

You may also want to search for genomic coverage, Tip: you should also make use of the Similar posts link on each of these questions.

Loading <script src="&lt;a href=" http:="""" jsapi"="" rel="nofollow">" type="text/javascript"></script> <script type="text/javascript"> google.load('search', '1', {language : 'en', style : google.loader.themes.GREENSKY}); google.setOnLoadCallback(function() { var customSearchOptions = {}; var customSearchControl = new '003596843386727440968:raditaczxza', customSearchOptions); customSearchControl.setResultSetSize(; customSearchControl.draw('cse'); }, true); </script>

ADD COMMENTlink modified 6.8 years ago • written 6.8 years ago by Istvan Albert ♦♦ 80k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1624 users visited in the last hour