I may be asking a very basic question, but it has been quite a discussion topic for me!
What should we understand by "per-gene coverage"? I am asking in the context of whole-genome sequencing.
1. For each gene in question, how many reads aligned completely or partially to it? This could be calculated with simple tools like HTSeq (written in Python). Obviously, it is possible that 10 out of 10 reads map to one particular region of the gene, so a read count alone is not coverage in the true sense.
2. For each gene in question, count the number of N's (gaps); coverage would then be calculated as ((gene_length - number_of_gaps) / gene_length) * 100. A small Perl/Python script or an awk one-liner would be enough. This gives the percentage of the gene that was covered by at least one read.
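To make the difference between the two options concrete, here is a minimal Python sketch. It assumes the gene and the aligned reads are already available as 0-based, half-open intervals on the same reference, and that a consensus sequence with N's marking uncovered bases exists for option 2. The function names `gene_coverage` and `breadth_from_consensus` are made up for this illustration and are not part of HTSeq or any other tool.

```python
def gene_coverage(gene_start, gene_end, reads):
    """Summarise coverage of one gene.

    gene_start, gene_end: 0-based, half-open gene interval (hypothetical input).
    reads: iterable of (start, end) aligned-read intervals in the same coordinates.
    """
    gene_length = gene_end - gene_start
    depth = [0] * gene_length  # per-base depth across the gene
    read_count = 0
    for start, end in reads:
        # Clip the read to the gene so partial overlaps are handled.
        s = max(start, gene_start)
        e = min(end, gene_end)
        if s >= e:
            continue  # read does not overlap the gene
        read_count += 1
        for pos in range(s, e):
            depth[pos - gene_start] += 1
    covered = sum(1 for d in depth if d > 0)
    return {
        "read_count": read_count,                      # option 1
        "breadth_pct": 100.0 * covered / gene_length,  # option 2
        "mean_depth": sum(depth) / gene_length,        # average reads per base
    }


def breadth_from_consensus(seq):
    """Option 2 computed from a consensus sequence where N marks a gap."""
    gaps = seq.upper().count("N")
    return 100.0 * (len(seq) - gaps) / len(seq)


# Ten reads that all pile up on the first 20 bases of a 100-base gene.
stats = gene_coverage(0, 100, [(0, 20)] * 10)
print(stats)
print(breadth_from_consensus("ACGTNNNNAC"))
```

In this toy case the read count is 10, yet only 20% of the gene has at least one read on it, which is exactly the situation described in option 1.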
Or am I entirely misunderstanding the key concept of per-gene coverage (i.e., is it neither 1 nor 2)?
Any insights and ways to calculate the same will be really helpful.