Question

Spades is running over a week

2


2.2 years ago

Princy ▴ 60

Hi, I am using SPAdes on bitter melon WGS FASTQ data. It has now been running for one week and no contig.fasta file has been created yet. So far only these files and folders have been created:

ls
assembly_graph_after_simplification.gfa  contigs.paths       K21   mismatch_corrector  run_spades.yaml  tmp
assembly_graph.fastg                     corrected           K33   params.txt          scaffolds.fasta
assembly_graph_with_scaffolds.gfa        dataset.info        K55   pipeline_state      scaffolds.paths
before_rr.fasta                          input_dataset.yaml  misc  run_spades.sh       spades.log

The command I used:

spades.py  -t 30  --careful -1 S1_1.fastq -2  S1_2.fastq  -o S681

It is still running.

Fastq Spades WGS Assembly • 1.4k views


1


There are a lot of caveats and recommendations on the SPAdes (St. Petersburg genome assembler) GitHub site. Have you tried any variations, or tried getting subcomponents of the pipeline to run successfully? Does the process have sufficient memory and disk (i.e., is your computer swapping)? Have you ever run it successfully in the past, maybe on a smaller genome, to see how it performs in your environment?
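A quick way to check the resource questions above on a Linux node is sketched below. It assumes the standard `nproc`, `free`, and `df` utilities are installed. The helper name `check_threads` is hypothetical; it only compares a requested thread count (such as the value passed to `-t`) with the number of cores available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: warn when more threads are requested than cores exist.
check_threads() {
    local requested=$1 available=$2
    if [ "$requested" -gt "$available" ]; then
        echo "oversubscribed: $requested threads on $available core(s)"
    else
        echo "ok: $requested threads on $available core(s)"
    fi
}

# Compare the -t value from the question (30) with the cores visible here.
# Falls back to 1 if nproc is not installed.
check_threads 30 "$(nproc 2>/dev/null || echo 1)"

# Memory and swap usage; heavy swap use suggests too little RAM.
if command -v free >/dev/null; then
    free -h
fi

# Free space on the file system that holds the current directory.
df -h .
```

On a cluster, the core count reported inside a job may or may not reflect the scheduler's allocation, so it is worth comparing it with what was actually requested from the scheduler.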

Reply, 2.2 years ago by seidel 11k

score 2 · Answer 1 · 2022-02-17

2


2.2 years ago

Mensur Dlakic ★ 27k

First, using the --careful switch will make your assembly slower in general, so you are already pushing it towards longer run times. Next, three K?? directories have been created, which tells you how many k-mer sizes have been started so far. You are now on the third k-mer and will have 3-4 more to go, so you are roughly halfway through.

Why is it so slow? Most likely because you have a large dataset (large number of reads) or a slow computer. Or both. Impossible to tell until you let us know about your computer and your dataset.

If you type tail -f spades.log, it will give you a running summary of the process. You can leave it like that and it will keep adding new lines as it goes along, so you can see that things are happening. Opening the same file in a text editor will show you, somewhere near the top, which k-mer lengths will be used, so you can gauge how much time is left.
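The progress check described above can be sketched as a small shell helper. It assumes the output directory is the one passed to `-o` (`S681` in the question) and that one `K<size>` subdirectory appears per k-mer size that has been started, as in the listing in the question. The function name `list_k_dirs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the K<size> subdirectories found in an output directory.
list_k_dirs() {
    local outdir=$1 d
    for d in "$outdir"/K[0-9]*/; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$d" ] || continue
        basename "$d"
    done
}

# Show which k-mer sizes have been started so far.
list_k_dirs S681

# Print the most recent log lines once, without following the file.
if [ -f S681/spades.log ]; then
    tail -n 5 S681/spades.log
fi
```

The names are printed in the shell's glob order, which is alphabetical rather than numeric, so a directory such as K127 would be listed before K21.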


0


Thank you so much, sir, for your kind comment. I am working on a cluster using 1 node and 1 ppn. The total file size is 13 Gb. I need the contig.fasta file; will it be created at the end?

tail -f spades.log
106:33:34.054   781M / 789M  INFO   DatasetProcessor         (dataset_processor.cpp     : 118)   processed 44000000reads, flushing
110:05:39.557   781M / 789M  INFO   DatasetProcessor         (dataset_processor.cpp     : 118)   processed 45000000reads, flushing
114:42:24.199   781M / 793M  INFO   DatasetProcessor         (dataset_processor.cpp     : 118)   processed 46000000reads, flushing
118:55:20.147   781M / 793M  INFO   DatasetProcessor         (dataset_processor.cpp     : 118)   processed 47000000reads, flushing
122:24:26.209   781M / 793M  INFO   DatasetProcessor         (dataset_processor.cpp     : 118)   processed 48000000reads, flushing
127:05:21.575   781M / 793M  INFO   DatasetProcessor         (dataset_processor.cpp     : 118)   processed 49000000reads, flushing
130:53:51.666   781M / 793M  INFO   DatasetProcessor         (dataset_processor.cpp     : 118)   processed 50000000reads, flushing
134:14:40.983   781M / 793M  INFO   DatasetProcessor         (dataset_processor.cpp     : 118)   processed 51000000reads, flushing
138:50:25.776   781M / 793M  INFO   DatasetProcessor         (dataset_processor.cpp     : 118)   processed 52000000reads, flushing
142:40:18.478   781M / 793M  INFO   DatasetProcessor         (dataset_processor.cpp     : 118)   processed 53000000reads, flushing

Reply, 2.2 years ago by Princy ▴ 60

2


I need the contig.fasta file; will it be created at the end?

At some point you will get a contigs.fasta file, but it is difficult to say how long it will take. It depends on the -k argument used on the command line. By default, SPAdes will use the k-mers 21,33,55,77,99,127, and right now you are halfway through (k-mer 55).

With longer k-mers, SPAdes should need less time to resolve the assembly graph into a final assembly, so the remaining steps should take less than a week.

Finally, for a 300 Mb genome (bitter melon), it is quite normal for SPAdes to take this long. The algorithm is not well optimized for large genomes.
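Although the total time is hard to predict, the pace of the current step can be estimated from the "processed N reads" lines in the log excerpt posted earlier in the thread. The sketch below assumes the first column of each log line is elapsed time as hours:minutes:seconds and that the read count directly follows the word "processed", as in that excerpt. The function name `reads_rate` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hours per million reads between the first and last
# "processed N reads" lines of a log (file argument or standard input).
reads_rate() {
    awk '/processed [0-9]+/ {
        # Convert the elapsed-time column (H:MM:SS.mmm) to hours.
        split($1, t, ":")
        hours = t[1] + t[2] / 60 + t[3] / 3600
        # The read count is the field after "processed"; adding 0 keeps
        # only its leading digits (e.g. "44000000reads," becomes 44000000).
        for (i = 1; i < NF; i++) if ($i == "processed") n = $(i + 1) + 0
        if (!seen) { h0 = hours; n0 = n; seen = 1 }
        h1 = hours; n1 = n
    }
    END {
        if (seen && n1 > n0)
            printf "%.2f hours per million reads\n", (h1 - h0) / ((n1 - n0) / 1e6)
    }' "$@"
}

if [ -f S681/spades.log ]; then
    reads_rate S681/spades.log
fi
```

For the ten lines shown in the thread (44 to 53 million reads between 106:33:34 and 142:40:18), this works out to roughly 4 hours per million reads. Turning that into a remaining-time estimate would also require the total number of reads, which is not stated in the thread.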

Reply, 2.2 years ago by andres.firrincieli 3.6k