Question: Getting The Average Coverage From The Coverage Counts At Each Depth.
gravatar for Jordan
6.4 years ago by
Jordan1.1k wrote:


I have read quite a few posts here about coverage already. But I still had a few questions. I have a BAM file I'm trying to find the coverage of it (typically like say 30X).

So, I decided to use genomeCoverageBed for my analysis. And I used the following command:

genomeCoverageBed -ibam file.bam -g ~/refs/human_g1k_v37.fasta > coverage.txt

As many are aware, the output of the file looks something like this:

genome    0    26849578    100286070 0.26773
genome    1    30938928    100286070     0.308507
genome    2    21764479    100286070    0.217024
genome    3    11775917    100286070    0.117423
genome    4    5346208    100286070    0.0533096
genome    5    2135366    100286070    0.0212927
genome    6    785983    100286070    0.00783741
genome    7    281282    100286070    0.0028048
genome    8    106971    100286070    0.00106666
genome    9    47419    100286070    0.000472837
genome    10    27403    100286070    0.000273248

To find the coverage, I multiplied col2 (depth) with col3 (number of bases in genome with that depth) and then summed the entire column. Then, I divided it by genome length to get the coverage. In this case, col2 * col3 is:


And the sum is: 150098740. Since the genome length is 100286070, the coverage is 150098740/100286070 = 1.5. That is to say it is, 1.5X. I have only considered the first 10 depth from the file here, but you get the idea. So, is this the right way to get the physical coverage?

Note: While the output file (coverage.txt) gives individual chromosome details, I only took the genome details i.e., col1 labeled as genome.

ADD COMMENTlink modified 6.4 years ago by Jorge Amigo11k • written 6.4 years ago by Jordan1.1k
gravatar for Jorge Amigo
6.4 years ago by
Jorge Amigo11k
Santiago de Compostela, Spain
Jorge Amigo11k wrote:

Picard tools' CalculateHsMetrics does perfectly the job directly from the bam file, allowing even to distinguish target from bait design in case it's needed (not on the following example though):

java -jar \$HOME/bin/picard-tools/CalculateHsMetrics.jar \
TARGET_INTERVALS=intervals.bed \
BAIT_INTERVALS=intervals.bed \
INPUT=input.bam \

all the output fields in that hsmetrics.txt are described at the HsMetrics' class description, which of course include the MEAN_TARGET_COVERAGE you're after.

ADD COMMENTlink written 6.4 years ago by Jorge Amigo11k
gravatar for Istvan Albert
6.4 years ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

You kind of seem to go the other way, from more specific answers to less specific ones. Typically the average coverage is very easy to get:

number of reads * length of each read / size of the genome

You have results already counted by individual coverage, and while the formula that you use seems to be right note how the result that you are getting are not that useful. You average coverage ends up to be 1.5x even though 26% of the bases have no coverage at all and 30% of reads have a coverage of 1. I think this is because your distribution is heavily skewed, in which case averages are not that meaningful quantities.

ADD COMMENTlink written 6.4 years ago by Istvan Albert ♦♦ 81k

Does this logic apply to exomes too? Like say, human exome is 30MB I think. And I have 30 million mapped reads, with 32bp reads. Then, the average coverage would be 30X right?

ADD REPLYlink written 6.4 years ago by Jordan1.1k

yes, to get the idealized coverage you would divide with the total length of the DNA that is supposedly the source of the reads.

ADD REPLYlink written 6.4 years ago by Istvan Albert ♦♦ 81k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2881 users visited in the last hour