I'm using the equation
Coverage = Num_Reads * Avg_Read_Length / Genome_Size
to calculate coverage. Here are my questions:
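For reference, here is a minimal sketch of that formula as a function. The function name and the numbers in the example are hypothetical, chosen only to illustrate the arithmetic.

```python
def estimate_coverage(num_reads: int, avg_read_length: float, genome_size: int) -> float:
    """Mean coverage = Num_Reads * Avg_Read_Length / Genome_Size (hypothetical helper)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * avg_read_length / genome_size

# Hypothetical numbers: 1,000,000 reads of 100 bp on a 5 Mb genome
print(estimate_coverage(1_000_000, 100, 5_000_000))  # 20.0
```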
1. Should I consider only mapped bases in the query when calculating read length, i.e. only the aligned operations (M and =) in the CIGAR string? For example, if a read has the CIGAR string 60M10S, would its read length be 60, since the 10 soft-clipped bases were not aligned?
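If you do count only aligned bases, one way to get that number per read is to parse the CIGAR string and sum the lengths of the operations that pair a query base with a reference base. This is a sketch under the assumption that "aligned" means the M, = and X operations. Soft clips (S), hard clips (H), insertions (I), deletions (D), skips (N) and padding (P) are not counted. The function name is hypothetical.

```python
import re

# Split a CIGAR string into (length, op) pairs, e.g. "60M10S" -> [("60", "M"), ("10", "S")]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Assumption: these operations align a query base to a reference base
ALIGNED_OPS = {"M", "=", "X"}

def aligned_length(cigar: str) -> int:
    """Count query bases aligned to the reference, ignoring clips and indels."""
    if cigar == "*":
        return 0  # CIGAR unavailable (e.g. unmapped read)
    ops = CIGAR_RE.findall(cigar)
    # Reject strings the regex could only partially parse
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in ops if op in ALIGNED_OPS)

print(aligned_length("60M10S"))      # 60
print(aligned_length("5S30=2X30M"))  # 62
```

Averaging this value over mapped reads, instead of the full read length, gives an Avg_Read_Length that excludes clipped bases.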
2. Following on from question 1, why does my coverage calculation differ from samtools mpileup? Wouldn't the average of column 4 (the depth) in the mpileup output equal the average coverage?
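One thing that may matter here is the denominator. By default samtools mpileup prints only positions covered by at least one read (the -a option also outputs zero-depth positions), and it applies default read and base-quality filters. So a plain mean of column 4 averages over covered positions, while the formula above divides by the whole genome size. The sketch below computes both versions from mpileup text. The function name and the toy rows are hypothetical and for illustration only.

```python
def mpileup_mean_depth(lines, genome_size=None):
    """Average column 4 (depth) of samtools mpileup text output.

    If genome_size is given, divide by it instead of by the number of printed
    rows, so positions missing from the output are treated as depth 0.
    """
    total = 0
    rows = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        total += int(fields[3])  # column 4 (1-based) is the depth
        rows += 1
    denom = genome_size if genome_size is not None else rows
    return total / denom if denom else 0.0

# Hypothetical toy rows: chrom, pos, ref base, depth, read bases, base qualities
toy = [
    "chr1\t1\tA\t3\t...\tIII",
    "chr1\t2\tC\t5\t.....\tIIIII",
    "chr1\t3\tG\t4\t....\tIIII",
]
print(mpileup_mean_depth(toy))                  # 4.0 (covered positions only)
print(mpileup_mean_depth(toy, genome_size=10))  # 1.2 (whole toy "genome")
```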
Thanks for any help.