Coverage for sample with many species
9.2 years ago
biobio ▴ 50


I have sequencing data from grapevines and I'm trying to see which viruses are present. I was able to assemble most of a viral genome (15,000 out of 18,000 bp in one contig). I'm trying to estimate the coverage. Here's a breakdown of what I did:

raw reads -> trim adapters -> map to grape reference and remove mapped reads -> assemble trimmed, unmapped reads.

To estimate the coverage, I took the virus with the largest contig (grapevine leafroll-associated virus 3) and mapped the trimmed, unmapped reads to it. I started with about 7 million reads, and 1.3 million of them mapped to the viral genome. The average read length was 50 bp and the total genome size was 18 kbp. Using the equation presented in this: I get

Coverage = Length of read * number of reads / haploid genome length

Coverage = 50 * 1.3x10^6 / 1.8x10^4

Coverage = 3611x? Could that be right?
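
The arithmetic above can be double-checked with a few lines of Python. This is only a minimal sketch of the formula as written in the post; the function name `estimate_coverage` is made up for illustration, and it assumes every mapped read is the full 50 bp and that the genome length is 18,000 bp.

```python
def estimate_coverage(read_length_bp, mapped_reads, genome_length_bp):
    """Mean coverage = read length * number of mapped reads / genome length."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return read_length_bp * mapped_reads / genome_length_bp


# Values from the post: 50 bp reads, 1.3 million mapped reads, 18 kbp genome.
coverage = estimate_coverage(50, 1_300_000, 18_000)
print(f"Estimated mean coverage: {coverage:.0f}x")
```

Note that this gives a genome-wide average only; it says nothing about how evenly the reads are spread along the genome.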

coverage Assembly • 2.1k views
9.2 years ago
Michele Busby ★ 2.2k

For viruses, we often get that kind of coverage.

To sanity check it, I would open up the alignment in a viewer like IGV and see if it looks about right.

RNA viruses have really uneven coverage, so you should expect some regions with really high coverage and (alas) some with none. I don't know about DNA viruses. It may be the RNA secondary structure that makes it uneven.
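
If you want numbers to go with the visual check, one option is to summarize per-base depth. The sketch below assumes you have per-position depth in the three-column, tab-separated layout that `samtools depth -a` writes (reference name, position, depth); the function name `summarize_depth` and the toy records are hypothetical, not from your data.

```python
def summarize_depth(lines):
    """Summarize per-base depth from 'ref<TAB>pos<TAB>depth' records."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _ref, _pos, depth = line.split("\t")
        depths.append(int(depth))
    if not depths:
        raise ValueError("no depth records found")
    n = len(depths)
    return {
        "positions": n,
        "mean": sum(depths) / n,
        "min": min(depths),
        "max": max(depths),
        # Fraction of positions with no reads at all.
        "zero_fraction": sum(1 for d in depths if d == 0) / n,
    }


# Toy records for illustration only (not real data).
example = ["virus\t1\t0", "virus\t2\t10", "virus\t3\t5000", "virus\t4\t90"]
summary = summarize_depth(example)
print(summary)
```

A large gap between the minimum and maximum, or a non-zero `zero_fraction`, would be consistent with the uneven coverage described above, even when the mean looks very high.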

