Question: Lower Assembly Quality Despite Higher Coverage?
3 months ago, GTR wrote:

Hi everyone,

I wanted to measure the effect of coverage on assembly quality and find the point of diminishing returns. I am using paired-end reads only (2 x 101 bp with a 520 bp insert size), no mate pairs or long reads. Higher coverage is normally expected to increase NGA50, but instead the contig NGA50 has gone down while the LGA50 has gone up. I have five levels of coverage: 10X, 15X, 20X, 25X and 30X.

15X has the highest contig NGA50 and lowest LGA50, while 30X has the lowest NGA50 and highest LGA50. Ranked from highest to lowest NGA50: 15X (best), 10X, 20X, 25X, 30X (worst).
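For readers unfamiliar with the two metrics, here is a minimal sketch of how NG50 and LG50 are derived from contig lengths and a reference genome size. QUAST's NGA50 and LGA50 follow the same logic but use aligned blocks (contigs broken at misassemblies) instead of raw contigs, a step this sketch does not attempt. The function name and the contig lengths are hypothetical, chosen only for illustration.

```python
def ng50_lg50(contig_lengths, genome_size):
    """Return (NG50, LG50) for a list of contig lengths and a genome size.

    NG50: length of the contig at which the running total (longest first)
    first reaches half the genome size.
    LG50: number of contigs needed to reach that point.
    Returns (0, None) if the contigs never cover half the genome.
    """
    target = genome_size / 2
    running = 0
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    return 0, None


# Hypothetical contig sets for a 1,000 bp genome
fewer_longer = [400, 300, 200, 100]
more_shorter = [150, 150, 150, 150, 100, 100, 100, 100]

print(ng50_lg50(fewer_longer, 1000))  # (300, 2)
print(ng50_lg50(more_shorter, 1000))  # (150, 4)
```

A more fragmented assembly therefore shows a lower NG50 together with a higher LG50, which is the pattern described above for the 30X run.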

The assembly was run with SOAPdenovo2 using a k-mer size of 25, and low-frequency k-mers were not discarded.

I used QUAST to evaluate the assembly.

What explanation(s) could there be for these results?

Thank you.

3 months ago, h.mon wrote:

"low-frequency k-mers were not discarded."

Low-frequency k-mers are errors most of the time. If you increase coverage, you also increase the number of sequencing errors, and with them the number of erroneous k-mers; this may be why NGA50 gets worse as coverage increases. Did you perform error correction before assembly? You could try error-correcting the reads or removing the bad k-mers.
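To make the argument concrete, here is a rough sketch under stated assumptions. The first function estimates how many k-mer occurrences contain at least one wrong base, assuming independent, uniform base errors at a 1% rate; that rate and the 1 Mbp genome size are illustrative assumptions, not values from this thread. The other two functions show a toy low-frequency filter. All names are hypothetical, and the sketch ignores reverse complements, which real k-mer counters normally handle.

```python
from collections import Counter


def expected_error_kmers(coverage, genome_size, read_len=101, k=25, error_rate=0.01):
    """Expected number of k-mer occurrences containing at least one wrong base.

    Assumes independent, uniformly distributed base errors (a simplification).
    """
    n_reads = coverage * genome_size / read_len
    kmers_per_read = read_len - k + 1
    p_bad = 1 - (1 - error_rate) ** k  # chance that a k-mer overlaps >= 1 error
    return n_reads * kmers_per_read * p_bad


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def drop_low_frequency(counts, min_count):
    """Keep only k-mers seen at least min_count times."""
    return {kmer for kmer, n in counts.items() if n >= min_count}


# Erroneous k-mers grow linearly with coverage, while the number of distinct
# genuine k-mers is capped by the genome size.
for cov in (10, 20, 30):
    print(cov, round(expected_error_kmers(cov, genome_size=1_000_000)))

# Toy data: three correct reads and one read with a single substitution.
reads = ["ACGTAC"] * 3 + ["ACGAAC"]
counts = count_kmers(reads, k=4)
print(sorted(drop_low_frequency(counts, min_count=2)))  # ['ACGT', 'CGTA', 'GTAC']
```

In the toy data a single wrong base creates three k-mers that each occur once, and the frequency filter removes all of them while keeping the genuine ones.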

Even so, there is no simple answer to your question. Taking only one or two measures (NGA50 and LGA50) as the overall measure of assembly quality is not advisable. Your coverage range is also very narrow: many assemblers need 50x-100x coverage. In addition, the interplay between assembly quality and coverage depends on the assembler, genome complexity, and other factors.
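One way to see how narrow the range is: a de Bruijn graph assembler works with k-mer coverage, which is lower than base coverage. A commonly used approximation is C_k = C * (L - k + 1) / L, where L is the read length. The sketch below applies it to the read length (101) and k-mer size (25) given in the question; the function name is hypothetical.

```python
def kmer_coverage(base_coverage, read_len=101, k=25):
    """Approximate k-mer coverage from base coverage: C_k = C * (L - k + 1) / L."""
    return base_coverage * (read_len - k + 1) / read_len


for c in (10, 15, 20, 25, 30):
    print(f"{c}X base coverage -> {kmer_coverage(c):.1f}X k-mer coverage")
# 10X base coverage -> 7.6X k-mer coverage
# 15X base coverage -> 11.4X k-mer coverage
# 20X base coverage -> 15.2X k-mer coverage
# 25X base coverage -> 19.1X k-mer coverage
# 30X base coverage -> 22.9X k-mer coverage
```

Under this approximation even the 30X data set gives only about 23X k-mer coverage, so all five levels sit at the low end of what many assemblers expect.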
