Question: [samtools depth] Calculate percent of the genome covered
1
6.2 years ago by

I received a BAM of one chromosome and need to determine the percent of the genome covered. Our genome size is 16,600 bp and we did paired-end sequencing each reads being 150 bp long. To my knowledge it should be calculated as follow:

• Genome size : 16,670 bp
• No of base covered: 16,639 bp
• Percent of the genome covered = [(No. of base covered) / (Genome size) ] *100 = (16,639/16,670) * 100 = 99.81%

I did the following with samtools depth:

1. samtools depth -q0 -Q0 my.bam > qry.depth
2. wc -l qry.depth
3. 16639 qry.depth

Though I don't fully understand the output of samtools depth. To my knowledge the output should be as follow:

• Column 1: ID
• Column 2: Genomic position
• Column 3: No of reads covering that position
• Number of line in qry.depth represents the number of bases covered in the reference

Here is a sample of my qry.depth file:

 chrM 1 3 chrM 2 6 chrM 3 16 chrM 4 32 chrM 5 34
 chrM 7783 0 chrM 7915 0

The line count of qry.depth is 16,639 but I still have 2 positions with no reads covering them,  then what is the percent of the genome covered?? Maybe I misunderstood something in the output file format..

coverage samtools bam genome • 12k views
modified 6.2 years ago by Devon Ryan96k • written 6.2 years ago by madkitty610
0
6.2 years ago by
Devon Ryan96k
Freiburg, Germany
Devon Ryan96k wrote:

It'll output a 0 if it would have otherwise produced a different number, had you not specified `-q`. Yes, it would be nice if this were documented.

Hi Devon,

Sorry. I cannot understand completely. I also encounter the question:

The reference length: 134962

Line count of *.depth is : 134184

No. of positions have only 0: 26, as below:

``````Triodia_melvillei    67472    0    0    0
Triodia_melvillei    67473    0    0    0
Triodia_melvillei    67474    0    0    0
Triodia_melvillei    67475    0    0    0
Triodia_melvillei    67476    0    0    0
Triodia_melvillei    110232    0    0    0
Triodia_melvillei    110233    0    0    0
Triodia_melvillei    110234    0    0    0
Triodia_melvillei    110235    0    0    0
Triodia_melvillei    110236    0    0    0
Triodia_melvillei    110237    0    0    0
Triodia_melvillei    110238    0    0    0
Triodia_melvillei    110239    0    0    0
``````

My question is: why there are 0? If there are reads to support these positions, why samtools output them?

Thanks!

Normally `-a -a` will output absolutely every position, or at least it should. If there are reads supporting coverage of a positions then they should get counted unless they're above the maximus depth (`-d`) or being filtered out (play around with `-q` and `-Q`).