Question: Coverage for sample with many species
biobio (United States) wrote, 4.9 years ago:


I have sequencing data from grapevines and I'm trying to see which viruses are present. I was able to assemble most of a viral genome (15000 out of 18000 bp in one contig). I'm trying to estimate the coverage. Here's a breakdown of what I did:

raw reads -> trim adapters -> map to grape reference and remove mapped reads -> assemble trimmed, unmapped reads.

To estimate the coverage, I took the virus with the largest contig (grapevine leafroll-associated virus 3) and mapped the trimmed, unmapped reads to it. I started with about 7 million reads, and 1.3 million of them mapped to the viral genome. The average read length was 50 bp and the total genome size is 18 kbp. Using the usual coverage equation, I get:

Coverage = Length of read * number of reads / haploid genome length

Coverage = 50 * 1.3x10^6 / 1.8x10^4

Coverage = 3611x? Could that be right?
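
As a quick check of the arithmetic above, here is a minimal Python sketch of the same equation. The function name `estimated_coverage` is made up for illustration; the inputs are the numbers from the post (50 bp reads, 1.3 million mapped reads, 18 kbp genome).

```python
def estimated_coverage(read_length_bp, n_mapped_reads, genome_length_bp):
    """Average depth = read length * number of mapped reads / genome length."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return read_length_bp * n_mapped_reads / genome_length_bp


# Values taken from the post: 50 bp reads, 1.3 million mapped reads, 18 kbp genome.
coverage = estimated_coverage(50, 1_300_000, 18_000)
print(f"{coverage:.0f}x")  # prints "3611x"
```

This gives only the average depth across the genome; it says nothing about how evenly the reads are spread.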

Michele Busby (United States) wrote, 4.9 years ago:

For viruses, we often get that kind of coverage.

To sanity check it, I would open up the alignment in a viewer like IGV and see if it looks about right.  

RNA viruses have really uneven coverage, so you should expect some regions with very high coverage and (alas) some with none. I don't know about DNA viruses. RNA secondary structure may be what makes it uneven.
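
Besides looking at the alignment in a viewer, one way to put numbers on the unevenness is to summarize per-base depth. The sketch below is only an illustration and rests on some assumptions: the depths were exported as tab-separated text with reference name, position, and depth per line (the layout `samtools depth -a` produces for a single BAM file), the function name `summarize_depth` is made up, and the example records are invented rather than taken from the data in this thread.

```python
def summarize_depth(lines):
    """Summarize per-base depth from tab-separated 'reference, position, depth' lines."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _ref, _pos, depth = line.split("\t")[:3]
        depths.append(int(depth))
    if not depths:
        raise ValueError("no depth records found")
    n = len(depths)
    return {
        "positions": n,
        "mean": sum(depths) / n,
        "min": min(depths),
        "max": max(depths),
        # Fraction of positions with no reads at all.
        "zero_fraction": sum(1 for d in depths if d == 0) / n,
    }


# Invented example records, only to show the input format.
example = ["virus\t1\t0", "virus\t2\t10", "virus\t3\t5000", "virus\t4\t2"]
summary = summarize_depth(example)
print(summary)
```

A large gap between the minimum and maximum, or a non-zero `zero_fraction`, would match the uneven pattern described above even when the average depth looks high. For real data, the same function could be given an open file handle in place of the list.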
