Question: SPAdes vs Velvet: why do I get many more contigs with SPAdes?
anna wrote:

Hi all,

I am running SPAdes assemblies of Illumina reads for several bacterial genomes. I used this command:

spades.py -1 ...R1.fastq -2 ...R2.fastq --careful -t 3 -m 30 -o

The same genomes were previously assembled with Velvet (exactly same raw data).

When I looked at the results, I found many more contigs with SPAdes! Any idea why? Is there something I can do after running SPAdes to improve the assembly quality of my genomes? Here are two examples of the differences I get between the two assemblers (output of "seqkit stats"):

file             format  type  num_seqs    sum_len  min_len  avg_len  max_len   Q1   Q2     Q3  sum_gap     N50  assembler
scaffolds.fasta  FASTA   DNA        877  2,223,301       56  2,535.1  116,526  111  216    297       17  24,875  SPAdes
Velvet.fa        FASTA   DNA        313  2,108,423      197  6,736.2  116,024  240  414  7,858        0  24,461  Velvet

scaffolds.fasta  FASTA   DNA      1,234  2,319,934       56    1,880  132,700  168  223    295       18  25,849  SPAdes
Velvet.fa        FASTA   DNA        332  2,122,470      193    6,393  132,734  234  344  6,301        0  26,473  Velvet

Thanks for any possible help! Anna

assembly • written 6 months ago by anna

Your N50s are basically the same; it's just that SPAdes has kept many of the smaller contigs. Presumably Velvet filters out short contigs more strictly unless told otherwise.
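To see why the N50 barely moves when an assembler keeps lots of short contigs, here is a minimal Python sketch. The contig lengths are invented toy values, not the real assemblies, and `n50` is just an illustrative helper name.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


# Toy contig lengths (made up for illustration only).
big_contigs = [120_000, 60_000, 30_000, 25_000, 20_000, 10_000]
small_contigs = [200] * 100  # 100 short contigs of 200 bp each

n50_without_small = n50(big_contigs)
n50_with_small = n50(big_contigs + small_contigs)

print(len(big_contigs), n50_without_small)                  # 6 60000
print(len(big_contigs + small_contigs), n50_with_small)     # 106 60000
```

The contig count jumps from 6 to 106, but the short contigs add so little total length that the N50 stays where it was.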

Your SPAdes assembly is not necessarily worse than the Velvet one, as both N50s are good. SPAdes has actually given you more data, but now you have to decide what to do with the potentially lower-quality sequences. The first thing you could do is discard anything smaller than a kilobase (for example) and then run seqkit again; I suspect the numbers will start to look a lot more like Velvet's.
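If you'd rather not install anything extra for the length filter, something like the following would do it in plain Python. This is only a sketch: `read_fasta` and `filter_fasta` are hypothetical helper names, the 1000 bp cutoff is just the example value from above, and the output file name in the usage note is made up.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(in_path, out_path, min_len=1000):
    """Write only records with length >= min_len; return (kept, dropped)."""
    kept = dropped = 0
    with open(out_path, "w") as out:
        for header, seq in read_fasta(in_path):
            if len(seq) >= min_len:
                # Sequences are written unwrapped, one line per record.
                out.write(f">{header}\n{seq}\n")
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

You would call it as `filter_fasta("scaffolds.fasta", "scaffolds.min1kb.fasta", min_len=1000)`, do the same for Velvet.fa, and then rerun seqkit stats on the filtered files.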

written 4 weeks ago by jrj.healey
h.mon wrote:

It seems a different minimum contig length was applied to each assembly, with a lower threshold for SPAdes. This increases the contig count without improving assembly quality. In fact, the opposite is probably true: these small contigs are most likely noise (unresolved repeats, low-quality or low-coverage contigs, etc.).

Apply the same filter to both assemblies before comparing. And, most importantly, remember that length metrics alone are not a good indicator of assembly quality.

written 6 months ago by h.mon

Thanks for your suggestion. However, I could not find a way to set the minimum contig size in SPAdes. Any idea if this is possible, and how?

written 6 months ago by anna

Why don't you filter the assembly fasta? Suggestions on how to do this here, and here. You can also use reformat.sh from the BBTools package.

written 6 months ago by h.mon
Carambakaracho wrote:

I'd recommend using QUAST to compare the assemblies. It gives you the number of contigs above certain length thresholds, the N50, plus some overview graphs. Definitely worth trying.
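To give an idea of what a "contigs above threshold" table tells you, here is a rough Python stand-in. It is not QUAST and does not reproduce its output; the thresholds, assembly names and contig lengths are all invented for illustration, and `threshold_summary` is a hypothetical helper.

```python
def threshold_summary(lengths, thresholds=(0, 500, 1000, 5000)):
    """For each threshold, return (contig count, total bp) over contigs
    whose length is >= that threshold."""
    summary = {}
    for t in thresholds:
        kept = [n for n in lengths if n >= t]
        summary[t] = (len(kept), sum(kept))
    return summary


# Toy contig lengths (made up; one assembly keeps short contigs, one does not).
assemblies = {
    "assembler_A": [50_000, 20_000, 8_000, 900, 300, 250, 200, 150],
    "assembler_B": [50_000, 21_000, 7_500, 1_200],
}

for name, lengths in assemblies.items():
    for t, (count, total) in threshold_summary(lengths).items():
        print(f"{name}\t>={t} bp\t{count} contigs\t{total} bp")
```

With these toy numbers, assembler_A has 8 contigs at the 0 bp threshold but only 3 at 1000 bp, against 4 for assembler_B, which is the kind of like-for-like view you want before deciding one assembly is more fragmented.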

written 6 months ago by Carambakaracho