I am trying to assemble a genome of ~5 Mb from 250 bp paired-end reads. I first tried assembling with ABySS:
abyss-pe k=64 name=novo in='r1.fastq r2.fastq'
This resulted in a scaffold N50 of 50 kb and a total assembly size of 5 Mb (with k=128, the N50 is 60 kb and the total assembly size is 4.5 Mb). I wanted to see if I could improve on this using SPAdes, so I ran it like this:
spades.py -1 r1.fastq -2 r2.fastq --careful -k 21,33,55,77,99,127 -o spades_assembly
And then used QUAST to get assembly stats like so:
quast-5.0.2/quast.py scaffolds.fasta -o report
Surprisingly, the resulting stats are a lot worse (an N50 of only 1,455 bp and a very high total length of about 18 Mb), so I have to think that something went wrong or that I am missing something. The --careful flag runs error correction, which is not done in ABySS, but I don't think this is the reason? The full QUAST output is below:
Assembly                      scaffolds
# contigs (>= 0 bp)           32842
# contigs (>= 1000 bp)        2918
# contigs (>= 5000 bp)        170
# contigs (>= 10000 bp)       85
# contigs (>= 25000 bp)       50
# contigs (>= 50000 bp)       34
Total length (>= 0 bp)        25940092
Total length (>= 1000 bp)     11070707
Total length (>= 5000 bp)     6648867
Total length (>= 10000 bp)    6094453
Total length (>= 25000 bp)    5557460
Total length (>= 50000 bp)    4920491
# contigs                     14063
Largest contig                446405
Total length                  18374693
GC (%)                        42.66
N50                           1455
N75                           708
L50                           1327
L75                           6199
# N's per 100 kbp             84.79
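Looking at the table, the 34 scaffolds of at least 50 kb already add up to about 4.9 Mb, so the extra length seems to sit in thousands of short scaffolds. To check how much those drive the N50, here is a minimal sketch that recomputes the scaffold count, total length and N50 for scaffolds at or above a chosen minimum length. It is my own helper, not part of QUAST or SPAdes; the function name n50_stats is made up for illustration, and it assumes a plain FASTA file and the usual N50 definition (length of the shortest sequence in the set of longest sequences covering at least half of the total):

```shell
# n50_stats FASTA MIN_LEN
# Prints "count total_length N50" for sequences of at least MIN_LEN bp.
# Hypothetical helper for illustration; assumes a plain-text FASTA file.
n50_stats() {
  # First pass: emit one length per sequence, summing wrapped lines.
  awk -v min="$2" '
    /^>/ { if (len > 0 && len >= min) print len; len = 0; next }
    { len += length($0) }
    END { if (len > 0 && len >= min) print len }
  ' "$1" | sort -rn | awk '
    # Second pass: lengths arrive longest first; walk down until
    # the running sum reaches half of the total length.
    { lens[NR] = $1; total += $1 }
    END {
      if (NR == 0) { print "0 0 0"; exit }
      for (i = 1; i <= NR; i++) {
        run += lens[i]
        if (run * 2 >= total) { print NR, total, lens[i]; exit }
      }
    }'
}
```

Running it as n50_stats scaffolds.fasta 0 and then n50_stats scaffolds.fasta 1000 should show how strongly the short scaffolds pull the N50 down. I believe QUAST's --min-contig option applies a similar cutoff to the main report, but I have not tried it here.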
Thanks for any input!