Question: Total Length Of Assembled Scaffolds Is Greater Than Genome Length
5.6 years ago, AW (United Kingdom) wrote:


I would greatly appreciate some help with my problem.

I have just assembled a genome de novo from Illumina 100 bp paired-end reads, using SOAPdenovo2 followed by GapCloser.

My total scaffold length is 1,062,995,336 bp (from 207,528 scaffolds) and my haploid genome is approximately 1.2 Gb. From this I calculate a percentage coverage of 104%.

Have I calculated coverage incorrectly, or should I have filtered out short scaffolds? I am unsure why the coverage is greater than 100%.

Thanks very much for any help


genome assembly coverage denovo • 2.2k views

How did you calculate 104%? From what you've said, your assembly is 1.06 Gb in size and you are expecting 1.2 Gb, so wouldn't your coverage be 88% (1.06/1.2)?

Comment written 5.6 years ago by cts
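
The arithmetic in this comment can be checked with a few lines of Python. This is only a sketch using the two figures quoted in the question; the 1.2 Gb genome size is the poster's own estimate, and the function name `assembly_fraction` is made up for illustration.

```python
# Figures quoted in the question; the genome size is the poster's estimate.
assembly_bp = 1_062_995_336   # total scaffold length
genome_bp = 1.2e9             # "approximately 1.2 Gb"


def assembly_fraction(assembly_bp, genome_bp):
    """Return the assembled length as a fraction of the expected genome size."""
    return assembly_bp / genome_bp


fraction = assembly_fraction(assembly_bp, genome_bp)
print(f"{fraction:.1%}")  # prints 88.6%
```

With these inputs the assembly comes out below the expected genome size, not above it.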
5.6 years ago, Gabriel R. (Center for Geogenetik, Københavns Universitet) wrote:

I would align the raw reads back to your scaffolds and then genotype to compute your coverage.
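
As a rough illustration of the read-depth side of this suggestion, the sketch below skips the genotyping step and only summarises per-base depth. It assumes the reads have already been aligned and the depth exported as three tab-separated columns (scaffold, position, depth), which is the layout `samtools depth` writes. The function name `summarize_depth` and the toy records are made up for the example.

```python
import io


def summarize_depth(lines):
    """Return (mean depth, fraction of listed positions with depth > 0)."""
    total = covered = positions = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _scaffold, _pos, depth = line.split("\t")
        depth = int(depth)
        positions += 1
        total += depth
        if depth > 0:
            covered += 1
    if positions == 0:
        raise ValueError("no depth records found")
    return total / positions, covered / positions


# Toy input standing in for a real depth file (hypothetical values).
example = io.StringIO(
    "scaffold1\t1\t30\n"
    "scaffold1\t2\t28\n"
    "scaffold1\t3\t0\n"
    "scaffold2\t1\t32\n"
)
mean_depth, breadth = summarize_depth(example)
print(mean_depth, breadth)  # prints 22.5 0.75
```

The breadth figure only makes sense if zero-depth positions are present in the input; with `samtools depth` that generally means running it with `-a`.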

5.6 years ago, ugly.betty77 (United States) wrote:

For one assembly I am currently working on, I ran into a similar problem with SGA. Jared Simpson recommended that I remove anything shorter than 2x the read length, to avoid polymorphic or repetitive sequence being over-counted. After I removed those short scaffolds, the total assembly size came close to what I got from other assemblers.
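
A minimal sketch of that length filter, assuming 100 bp reads as in the question and scaffolds in plain FASTA format. The helper names (`read_fasta_lengths`, `total_length`) and the toy records are made up for illustration; on real data you would pass an open file handle in place of the list.

```python
READ_LENGTH = 100              # read length from the question
MIN_LEN = 2 * READ_LENGTH      # drop anything shorter than 2x read length


def read_fasta_lengths(lines):
    """Yield (name, length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:].strip(), 0
        elif name is not None:
            length += len(line)  # sequences may span several lines
    if name is not None:
        yield name, length


def total_length(lengths, min_len=0):
    """Sum the lengths of all records that are at least min_len long."""
    return sum(n for _, n in lengths if n >= min_len)


# Toy scaffolds of 250, 150 and 200 bp (hypothetical data).
fasta = [">s1", "A" * 250, ">s2", "C" * 150, ">s3", "G" * 100, "G" * 100]
lengths = list(read_fasta_lengths(fasta))

print(total_length(lengths))           # prints 600
print(total_length(lengths, MIN_LEN))  # prints 450
```

Comparing the two totals shows how much of the assembly size comes from short scaffolds.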


