Question: Understanding the key concept of per-gene coverage in the context of WGS
lakhujanivijay (5.0k), 3.4 years ago, wrote:

I may be asking a very basic question, but it has been quite a discussion topic for me!

What do we mean when we say per-gene coverage? I am asking this in the context of whole genome sequencing.

  1. For each gene in question, how many reads aligned completely or partially to it? This could be calculated with simple tools such as HTSeq (written in Python). However, it is possible that 10 out of 10 reads map to one particular region of the gene, so a read count is not coverage in the true sense.

  2. For each gene in question, count the number of N's (gaps), then calculate coverage as ( ( gene_length - no. of gaps ) / gene_length ) * 100. A small Perl/Python script or an awk one-liner would be enough. This gives the percentage of the gene in question that was covered by at least one read.
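
The formula in option 2 can be sketched in a few lines of Python. This is only an illustration: it assumes the per-base depth for the gene is already available as a list of integers (for example, parsed from the output of any depth-reporting tool), and the names `breadth_of_coverage` and `min_depth` are made up for this example rather than taken from a library.

```python
def breadth_of_coverage(depths, min_depth=1):
    """Percent of positions in a gene covered by at least min_depth reads.

    depths: per-base read depth, one integer per position of the gene
            (hypothetical input; produce it with any depth tool).
    """
    gene_length = len(depths)
    if gene_length == 0:
        raise ValueError("gene has zero length")
    # Positions below the threshold play the role of the "gaps" in option 2.
    gaps = sum(1 for d in depths if d < min_depth)
    return (gene_length - gaps) / gene_length * 100


# Toy gene of 10 bases where 4 positions have no reads at all.
toy_depths = [0, 0, 3, 5, 5, 4, 2, 1, 0, 0]
print(breadth_of_coverage(toy_depths))  # 60.0
```

Raising `min_depth` gives the stricter "percent of the gene covered at N× or more" figure.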

Or am I entirely misunderstanding the key concept of per-gene coverage (that is, it is neither 1 nor 2)?

Any insights and ways to calculate the same will be really helpful.

per gene coverage wgs htseq • 1.1k views
Brian Bushnell (17k), Walnut Creek, USA, 3.4 years ago, wrote:

There might not be a useful universal definition. I can think of many that would be situationally useful, though. Particularly for RNA-seq, it might be interesting to measure the highest depth of any exon in a gene, and consider that the gene's depth. If all isoforms share a certain subset of exons, then the average coverage across those exons might be used as the depth. Otherwise, one could simply average the coverage across all exons and call that the depth. Or, "the gene is covered by 15,000 reads" - that sounds like a useful statement, and is not affected by differential splicing or read length, which is convenient. Usually I think of coverage as equivalent to depth, though.
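
To make two of these definitions concrete, here is a rough Python sketch. It assumes each exon of a gene has already been summarized as a `(length, mean_depth)` pair; that input format and the function name `gene_depth_summaries` are hypothetical. The average is length-weighted here, which is one possible reading of "average the coverage across all exons"; an unweighted mean of exon depths would be another.

```python
def gene_depth_summaries(exons):
    """Summarize gene depth from per-exon values.

    exons: list of (exon_length, mean_depth) tuples for one gene
           (hypothetical input format).
    """
    if not exons:
        raise ValueError("no exons given")
    total_length = sum(length for length, _ in exons)
    # Definition A: the deepest exon stands in for the whole gene.
    max_exon_depth = max(depth for _, depth in exons)
    # Definition B: length-weighted mean, i.e. total aligned bases
    # divided by total exonic length.
    weighted_mean = sum(length * depth for length, depth in exons) / total_length
    return {"max_exon_depth": max_exon_depth, "mean_exon_depth": weighted_mean}


# Two exons: 100 bp at 10x and 300 bp at 30x.
print(gene_depth_summaries([(100, 10.0), (300, 30.0)]))
# {'max_exon_depth': 30.0, 'mean_exon_depth': 25.0}
```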

For whole-genome DNA sequencing, it's less clear to me where "per gene coverage" is relevant, but I'd probably calculate it by counting the number of read bases that align to exonic bases (counting only match/mismatch/noref, not indels) and dividing by the sum of the length of said exons.
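
A minimal sketch of that calculation in Python, working directly from alignment start positions and CIGAR strings. It assumes 0-based, half-open, non-overlapping exon intervals on a single reference sequence, and counts only the CIGAR operations M, = and X as aligned bases; insertions, deletions, skips and clipping are not counted. The function names are invented for this example.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_bases_in_exons(read_start, cigar, exons):
    """Count read bases aligned (M, =, X) to exonic reference positions.

    read_start: 0-based leftmost reference position of the alignment.
    exons: non-overlapping half-open (start, end) intervals, 0-based.
    """
    count = 0
    ref_pos = read_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":
            block_start, block_end = ref_pos, ref_pos + length
            for ex_start, ex_end in exons:
                overlap = min(block_end, ex_end) - max(block_start, ex_start)
                if overlap > 0:
                    count += overlap
            ref_pos = block_end
        elif op in "DN":
            # Deletions and skips advance along the reference
            # but contribute no read bases.
            ref_pos += length
        # I, S, H and P do not advance the reference and are not counted.
    return count


def mean_exonic_depth(alignments, exons):
    """Aligned exonic read bases divided by total exon length.

    alignments: iterable of (read_start, cigar) pairs on the gene's reference.
    """
    exonic_length = sum(end - start for start, end in exons)
    total = sum(aligned_bases_in_exons(s, c, exons) for s, c in alignments)
    return total / exonic_length


exons = [(100, 200), (300, 400)]
alignments = [(90, "50M"), (180, "10M2I10M5D20M"), (290, "30M")]
print(mean_exonic_depth(alignments, exons))  # 0.4
```

In the toy example the three reads contribute 40, 20 and 20 exonic bases respectively, giving 80 bases over 200 bp of exon.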
