Question

Does anyone know how the Y axis (% of reference) for the "coverage distribution" graph of the BAMStats program is calculated?

9.8 years ago

kay ▴ 370

Hello,

I am using the BAMStats program to calculate the coverage for my BAM file.

I am trying to understand how the Y axis (% of reference) for the "coverage distribution (mapped only)" graph of the BAMStats program is calculated.

If anyone can help me understand, that would be great.

Thanks
Kay

next-gen bam RNA-Seq • 2.5k views

updated 2.4 years ago by Ram 43k • written 9.8 years ago by kay ▴ 370

At what level do you want to know this? In other words, do you want the mathematical details of the calculation algorithm, or the general process for calculating this metric?

9.8 years ago by Dan D 7.4k

Ram · Answer 1 · 2014-07-08

I'll go ahead and answer the latter possibility of my comment. A BAM file contains information about precisely where on a reference a particular read has been mapped. Thus, for each base of the reference genome, you can calculate how many sample reads have a base which aligns at that locus.

The number of times that a base is covered by sample reads is the depth of coverage for that base. If a given reference base has 30 reads which have one of their bases mapped to it, then that reference base has 30X coverage. If you then bin these coverage depths, you can make a histogram: for example, 5,000 bases have exactly 30X coverage, 5,500 bases have exactly 25X coverage, and so on.
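To make the binning concrete, here is a minimal Python sketch of the idea, not BAMStats's actual code. It assumes the alignments have already been reduced to simple 0-based, half-open `(start, end)` intervals on a single reference and ignores gaps, clipping, and anything else a real BAM parser would have to handle. The function names and the toy data are made up for illustration.

```python
from collections import Counter


def per_base_depth(ref_length, alignments):
    """Return the depth of coverage at every reference position.

    alignments: iterable of (start, end) pairs, 0-based and half-open,
    i.e. a read covers positions start .. end - 1 (an assumption of
    this sketch, not something taken from BAMStats).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, ref_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum over the difference array gives the depth per base.
    depth = []
    running = 0
    for i in range(ref_length):
        running += diff[i]
        depth.append(running)
    return depth


def depth_histogram(depth):
    """Count how many reference bases have each exact depth."""
    return Counter(depth)


# Toy example: a 10-base reference with three overlapping reads.
toy_alignments = [(0, 5), (2, 8), (4, 10)]
depth = per_base_depth(10, toy_alignments)
hist = depth_histogram(depth)

for d in sorted(hist):
    print(f"{hist[d]} bases have exactly {d}X coverage")
```

In this toy case four bases end up at 1X, five at 2X, and one at 3X, which is the same kind of histogram as described above, just on a much smaller reference.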

In the case of BAMStats, you're going one step further and calculating the percentage of total reference bases which have a given sequencing depth. If your reference genome is 10,000 bases in length, and exactly 100 bases have a depth of coverage of exactly 30X, then 1% of your reference has 30X coverage.
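As a rough sketch of that last step (again in Python, and again not taken from the BAMStats source), the Y value for each depth is just the number of bases at that depth divided by the total number of reference bases, times 100. The function name and the filler depth of 25X for the remaining 9,900 bases are invented for the example. Whether the "mapped only" graph counts zero-depth positions in the denominator is something I'm not certain of, so this sketch simply uses every position it is given.

```python
from collections import Counter


def percent_of_reference(depths):
    """Map each observed depth to the percentage of positions at that depth.

    depths: one depth value per reference base. All positions passed in
    count toward the denominator (an assumption of this sketch).
    """
    total = len(depths)
    if total == 0:
        return {}
    counts = Counter(depths)
    return {d: 100.0 * n / total for d, n in sorted(counts.items())}


# The example from the text: a 10,000-base reference in which exactly
# 100 bases have 30X coverage. The other 9,900 bases are given 25X
# purely as filler so the list has the right length.
depths = [30] * 100 + [25] * 9900
pct = percent_of_reference(depths)

print(f"{pct[30]:.1f}% of the reference has exactly 30X coverage")
```

Plotting depth on the X axis against these percentages on the Y axis would give a "% of reference" coverage distribution of the kind described here.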