I am assembling a bacterial genome of roughly 7 Mbp from approximately 20 million 101 bp paired-end reads, which should give excellent coverage. Velvet completes this assembly in approximately 5 minutes with an acceptable N50, but SOAPdenovo2 has been running on the same file for more than 6 hours without getting past the pregraph step. The same thing happens with both the 63mer and the 127mer binaries. The output says it is on something like the 10 billionth read, which makes no sense for an input of about 20 million reads. The server has plenty of RAM (120 GB) and 8 cores, and SOAPdenovo2 is barely using any of that RAM, so memory does not appear to be the bottleneck. The command I'm currently running is:
all -s /data/config -K 63 -R -F -o graph_prefix 1>ass.log 2>ass.err
and the config file is:
#maximal read length
max_rd_len=101
[LIB]
#average insert size
avg_ins=300
#if sequence needs to be reversed
reverse_seq=0
#in which part(s) the reads are used
asm_flags=3
#use only first 100 bps of each read
rd_len_cutoff=101
#in which order the reads are used while scaffolding
rank=1
# cutoff of pair number for a reliable connection (at least 3 for short insert size)
pair_num_cutoff=3
#minimum aligned length to contigs for a reliable read location (at least 32 for short insert size)
map_len=32
#fastq file for single reads
p=assembly.fastq
Here, assembly.fastq is an interleaved paired-end reads file. Does anyone know what I might be doing wrong to cause such a long run time?
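
For context, here is the back-of-the-envelope arithmetic behind my coverage claim, together with a simple read counter that can be compared against the read number SOAPdenovo2 reports. This is only a rough sketch: the function names are my own (they are not part of any assembler), and the counter assumes a plain FASTQ file with exactly four lines per record and no wrapped sequence lines.

```python
def expected_coverage(num_reads: int, read_len_bp: int, genome_size_bp: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    return num_reads * read_len_bp / genome_size_bp


def count_fastq_reads(path: str) -> int:
    """Count records in a FASTQ file, assuming 4 lines per record (no line wrapping)."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


if __name__ == "__main__":
    # Figures from the post: ~20 million reads of 101 bp, ~7 Mbp genome.
    coverage = expected_coverage(20_000_000, 101, 7_000_000)
    print(f"{coverage:.0f}x")  # prints "289x"
```

Running count_fastq_reads("assembly.fastq") should return about 20 million if the file is what I think it is, which is nowhere near the roughly 10 billion reads the log seems to report.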