Question: SPAdes vs Velvet: why do I get many more contigs with SPAdes?
2.1 years ago by
anna10 wrote:

Hi all,

I am running SPAdes assemblies of Illumina reads for several bacterial genomes. I used these options: -1 ...R1.fastq -2 ...R2.fastq --careful -t 3 -m 30 -o

The same genomes were previously assembled with Velvet (exactly the same raw data).

When I looked at the results, I have many more contigs with SPAdes! Any idea why? Is there something I can do after running SPAdes to improve the assembly quality of my genomes? Here is an example of the differences I get with the two assemblers for two of the genomes (output of "seqkit stats"):

file             format  type  num_seqs    sum_len  min_len  avg_len  max_len   Q1   Q2     Q3  sum_gap     N50  assembler
scaffolds.fasta  FASTA   DNA        877  2,223,301       56  2,535.1  116,526  111  216    297       17  24,875  SPAdes
Velvet.fa        FASTA   DNA        313  2,108,423      197  6,736.2  116,024  240  414  7,858        0  24,461  Velvet

file             format  type  num_seqs    sum_len  min_len  avg_len  max_len   Q1   Q2     Q3  sum_gap     N50  assembler
scaffolds.fasta  FASTA   DNA      1,234  2,319,934       56    1,880  132,700  168  223    295       18  25,849  SPAdes
Velvet.fa        FASTA   DNA        332  2,122,470      193    6,393  132,734  234  344  6,301        0  26,473  Velvet

Thanks for any help! Anna

modified 19 months ago by Biostar ♦♦ 20 • written 2.1 years ago by anna10

Your N50s are basically the same; it's just that SPAdes has kept many of the smaller contigs. Presumably Velvet is stricter in its filtering by default, unless told otherwise.

Your SPAdes assembly is not necessarily worse than the Velvet one, as both N50s are good. SPAdes has actually given you more data, but now you have to consider what to do with the potentially lower-quality contigs. The first thing you could do is simply discard anything shorter than a kilobase (for example) and then run seqkit again; I suspect the numbers will start to look a lot more like Velvet's.
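
If you would rather not install anything extra, the length filter described above can be sketched in a few lines of plain Python. This is only a minimal sketch: the function names (read_fasta, filter_by_length, write_fasta) and the 1000 bp default are illustrative assumptions, not part of SPAdes, Velvet or seqkit.

```python
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(handle: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an open FASTA file or any iterable of lines."""
    header = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], min_len: int = 1000) -> List[Record]:
    """Keep only contigs with at least min_len bases (1000 is an arbitrary example cutoff)."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 60) -> None:
    """Write records back out as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Example usage (file names are placeholders):
# with open("scaffolds.fasta") as fin, open("scaffolds.min1kb.fasta", "w") as fout:
#     write_fasta(filter_by_length(read_fasta(fin), min_len=1000), fout)
```

Running "seqkit stats" on the filtered file and on the Velvet file should then give a fairer side-by-side comparison.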

modified 19 months ago • written 19 months ago by Joe16k

You could try the --careful option when running SPAdes; it may reduce the number of mismatches and short indels.

written 10 weeks ago by meowz40
2.1 years ago by
h.mon29k wrote:

It seems you are applying a different minimum contig length to each assembly, with a lower threshold for SPAdes. This increases the contig count without improving assembly quality. In fact, the opposite is probably true: these small contigs are most likely noise (unresolved repeats, low-quality or low-coverage contigs, etc.).

Apply the same filter to both assemblies before comparing. And, most importantly, remember that length metrics alone are not a good indicator of assembly quality.
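
To illustrate what "the same filter" means in practice, here is a rough Python sketch that recomputes contig count, total length and N50 after applying one shared minimum length. The helper names (n50, summarize) and the toy length lists are made up for illustration; they are not the poster's real contigs.

```python
from typing import Dict, Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Return N50: the contig length at which the running total, counted
    from the longest contig down, first reaches half of the assembly size."""
    ordered: List[int] = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def summarize(lengths: Iterable[int], min_len: int = 0) -> Dict[str, int]:
    """Contig count, total length and N50 after dropping contigs shorter than min_len."""
    kept = [n for n in lengths if n >= min_len]
    return {"num_seqs": len(kept), "sum_len": sum(kept), "N50": n50(kept)}


# Toy contig lengths, invented for this example only.
assembly_a = [12000, 8000, 300, 250, 200, 150, 100]
assembly_b = [12000, 7500, 1200]

for name, lengths in (("A", assembly_a), ("B", assembly_b)):
    print(name, summarize(lengths, min_len=1000))
```

With a shared cutoff, the contig counts of the two toy assemblies become directly comparable, while the N50 barely moves because short contigs contribute little to total length.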


Thanks for your suggestion. However, I could not find a way to set the minimum contig size in SPAdes. Any idea whether this is possible, and how?

written 2.1 years ago by anna10

Why don't you filter the assembly FASTA? There are suggestions on how to do this here, and here. You can also use a tool from the BBTools package.

written 2.1 years ago by h.mon29k
2.1 years ago by
Carambakaracho2.0k wrote:

I'd recommend using QUAST to compare the assemblies. It gives you the number of contigs above certain length thresholds, the N50, plus some overview graphs. Definitely worth trying.
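
To give a feel for the "number of contigs above certain thresholds" part, here is a tiny Python sketch of that kind of tally. It is not QUAST's code: the function name and the thresholds are assumptions chosen for illustration, and the example lengths are invented.

```python
from typing import Dict, Iterable, Sequence


def contigs_at_thresholds(
    lengths: Iterable[int],
    thresholds: Sequence[int] = (0, 1000, 5000, 10000),
) -> Dict[int, int]:
    """Count how many contigs are at least as long as each threshold.
    The thresholds here are illustrative, not any tool's official defaults."""
    lengths = list(lengths)
    return {t: sum(1 for n in lengths if n >= t) for t in thresholds}


# Invented contig lengths, for illustration only.
example_lengths = [56, 300, 1200, 6000, 116526]
for threshold, count in contigs_at_thresholds(example_lengths).items():
    print(f"contigs >= {threshold} bp: {count}")
```

Two assemblies that differ mostly in very short contigs will show very different counts at the 0 bp threshold but similar counts at the higher ones.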
