There are a number of Biostars posts on how to calculate coverage and read depth, and what they mean, but I'm still confused.
This is how I currently understand things:
Depth (or Read Depth) at a BP coordinate
The number of "hits" on that coordinate resulting from alignment. In other words, the number of aligned reads that land on that coordinate. The height of the bar over that position in a genome browser.
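To make the definition concrete, here is a toy sketch (hypothetical intervals, not real BAM records): each aligned read is modeled as a half-open [start, end) interval on the reference, and depth at a coordinate is the number of intervals containing it. A real pipeline would get this from a BAM, e.g. with samtools depth.

```python
def depth_at(reads, pos):
    """Number of aligned reads whose interval covers coordinate pos."""
    return sum(1 for start, end in reads if start <= pos < end)

# Hypothetical aligned reads as (start, end) intervals on the reference
reads = [(0, 5), (2, 7), (4, 9)]

print(depth_at(reads, 4))  # 3: all three reads overlap coordinate 4
print(depth_at(reads, 0))  # 1: only the first read covers coordinate 0
```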
Total Read Depth
The sum of all the depths across a particular coordinate range such as a gene or a peak. Over the whole reference, this is NOT exactly the same as
[read length] × [num reads, i.e. fastq lines ÷ 4]
because not all reads in the fastq will get aligned, or will align only partially. (Dividing that product by the reference length in bp gives the expected average depth, not the total.)
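A quick sanity check of this definition, again on hypothetical half-open [start, end) intervals rather than real BAM records: summing per-base depths over a region gives the same number as summing each read's aligned overlap with that region. This also shows why the fastq-based estimate overcounts, since a partially aligned read contributes only its aligned portion.

```python
def total_depth(reads, region_start, region_end):
    # Sum of depths at every coordinate in [region_start, region_end)
    return sum(
        sum(1 for s, e in reads if s <= pos < e)
        for pos in range(region_start, region_end)
    )

def total_depth_by_overlap(reads, region_start, region_end):
    # Equivalent: sum over reads of their overlap length with the region
    return sum(max(0, min(e, region_end) - max(s, region_start))
               for s, e in reads)

reads = [(0, 5), (2, 7), (4, 9)]
print(total_depth(reads, 0, 10))             # 15
print(total_depth_by_overlap(reads, 0, 10))  # 15, computed the other way
```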
Coverage
Same as Total Read Depth.
Average Read Depth
[Total Read Depth] ÷ [number of base pairs (coordinates) in the aligned region or reference]
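For example, with three hypothetical aligned intervals over a 10 bp region (toy numbers only):

```python
reads = [(0, 5), (2, 7), (4, 9)]  # hypothetical (start, end) alignments
region_start, region_end = 0, 10

# Total read depth = sum of each read's overlap with the region
total = sum(max(0, min(e, region_end) - max(s, region_start))
            for s, e in reads)               # 5 + 5 + 5 = 15

avg = total / (region_end - region_start)    # 15 / 10 = 1.5
print(avg)
```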
For example, I want to calculate the differential binding between ChIP-seq samples. This is roughly
[coverage under peak in treatment 1] ÷ [coverage under peak in treatment 2]
However, I should really normalize each by the average read depth of its treatment, so the ratio becomes
[coverage under peak in treatment 1 ÷ average read depth in treatment 1 bam]
÷
[coverage under peak in treatment 2 ÷ average read depth in treatment 2 bam]
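The normalization can be sketched like this, on made-up interval data (the function names and numbers are hypothetical; real ChIP-seq normalization would typically be done with dedicated tools such as deepTools or DiffBind):

```python
def avg_depth(reads, ref_len):
    # Total aligned bases over the reference, divided by reference length
    return sum(e - s for s, e in reads) / ref_len

def coverage_in_peak(reads, peak_start, peak_end):
    # Total read depth summed over the peak interval
    return sum(max(0, min(e, peak_end) - max(s, peak_start))
               for s, e in reads)

ref_len = 100                                # toy 100 bp reference
t1 = [(10, 60), (20, 70), (30, 80)]          # hypothetical treatment 1 alignments
t2 = [(15, 65), (25, 75)]                    # hypothetical treatment 2 alignments
peak = (30, 60)

ratio = ((coverage_in_peak(t1, *peak) / avg_depth(t1, ref_len)) /
         (coverage_in_peak(t2, *peak) / avg_depth(t2, ref_len)))
print(ratio)  # 1.0: after depth normalization the treatments look the same
```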
Is this correct?