Question

Coverage for sample with many species

9.8 years ago

biobio ▴ 50

Hi,

I have sequencing data from grapevines and I'm trying to see which viruses are present. I was able to assemble most of one viral genome (15,000 of 18,000 bp in a single contig), and now I'm trying to estimate its coverage. Here's a breakdown of what I did:

raw reads -> trim adapters -> map to grape reference and remove mapped reads -> assemble trimmed, unmapped reads.

To estimate the coverage, I took the virus with the largest contig (grapevine leafroll-associated virus 3) and mapped the trimmed reads that did not map to grape against it. I started with about 7 million reads, and 1.3 million of them mapped to the viral genome. The average read length was 50 bp and the total genome size was 18 kbp. Using the equation from this Illumina technical note (http://res.illumina.com/documents/products/technotes/technote_coverage_calculation.pdf), I get:

Coverage = Length of read * number of reads / haploid genome length

Coverage = 50 * 1.3x10^6 / 1.8x10^4

Coverage = 3611x? Could that be right?
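
For anyone who wants to re-check the arithmetic, here is a minimal Python sketch of the same formula. The function name `mean_coverage` is only an illustrative choice, and it assumes every mapped read contributes its full 50 bp to the 18 kbp genome.

```python
def mean_coverage(read_length_bp, n_mapped_reads, genome_length_bp):
    """Average depth = read length * number of reads / genome length."""
    return read_length_bp * n_mapped_reads / genome_length_bp


# Numbers from the post: 50 bp reads, 1.3 million mapped reads, 18 kbp genome.
cov = mean_coverage(50, 1.3e6, 1.8e4)
print(f"{cov:.0f}x")  # prints "3611x"
```

This is only the average depth over the whole genome; it says nothing about how evenly the reads are spread along it.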

coverage Assembly • 2.2k views

Ram · Accepted Answer · 2014-07-02

9.8 years ago

Michele Busby ★ 2.2k

For viruses, we often get that kind of coverage.

To sanity check it, I would open up the alignment in a viewer like IGV and see if it looks about right.

RNA viruses have really uneven coverage, so you should expect some regions with very high coverage and (alas) some with none. I don't know about DNA viruses. It may be the RNA secondary structure that makes it uneven.
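
If you want numbers to go with the visual check, one option is to summarize per-position depth. The sketch below assumes you have exported depth as tab-separated lines of reference name, position, and depth (the format `samtools depth -a` writes, where `-a` includes zero-depth positions). The function name `summarize_depth` and the toy records are made up for illustration, not real data.

```python
def summarize_depth(lines):
    """Summarize per-position depth from tab-separated lines.

    Each line is expected to hold: reference name, 1-based position, depth.
    """
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _ref, _pos, depth = line.split("\t")[:3]
        depths.append(int(depth))
    if not depths:
        raise ValueError("no depth records found")
    n = len(depths)
    return {
        "positions": n,
        "mean": sum(depths) / n,
        "min": min(depths),
        "max": max(depths),
        # Fraction of positions with no reads at all.
        "zero_fraction": sum(d == 0 for d in depths) / n,
    }


# Toy, made-up records just to show the input format.
toy = ["virus\t1\t0", "virus\t2\t10", "virus\t3\t5000", "virus\t4\t0"]
stats = summarize_depth(toy)
print(stats)
```

A high mean together with a large `zero_fraction` would be the "really high and no coverage" pattern described above. With a real file you could pass the open file handle directly, since the function just iterates over lines.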
