Question

Very slow SOAPdenovo2 assembly

9.4 years ago

jamesT ▴ 30

I am assembling a bacterial genome of roughly 7 Mbp from approximately 20 million 101 bp paired-end reads, which should give me excellent coverage. Velvet completes this assembly and gives an okay N50 in approximately 5 minutes, but SOAPdenovo2 has been running on the file for 6+ hours without even getting past the pregraph step. The same thing happened with both the 63mer and the 127mer programs. The output says it's on something like the 10 billionth read, which makes no sense for an input of about 20 million reads. The server has plenty of RAM (120 GB) and 8 cores, and SOAPdenovo2 is barely using any of that RAM, so memory is clearly not the issue. The command I'm currently running is:

all -s /data/config -K 63 -R -F -o graph_prefix 1>ass.log 2>ass.err

and the config file is:

#maximal read length
max_rd_len=101
[LIB]
#average insert size
avg_ins=300
#if sequence needs to be reversed
reverse_seq=0
#in which part(s) the reads are used
asm_flags=3
#use only first 100 bps of each read
rd_len_cutoff=101
#in which order the reads are used while scaffolding
rank=1
# cutoff of pair number for a reliable connection (at least 3 for short insert size)
pair_num_cutoff=3
#minimum aligned length to contigs for a reliable read location (at least 32 for short insert size)
map_len=32
#fastq file for single reads
p=assembly.fastq

Here, assembly.fastq is an interleaved paired-end reads file. Does anyone know what I might be doing wrong to cause such a long assembly time?
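
As a quick sanity check on the figures above, the expected coverage and the actual number of records in the FASTQ file can be computed directly. The sketch below is only illustrative: it assumes a plain, uncompressed FASTQ with exactly four lines per record, and the function names are made up for this example. It also treats the stated 20 million as the total number of reads; if that figure means read pairs, the coverage doubles.

```python
def expected_coverage(num_reads, read_len, genome_size):
    """Average depth = total sequenced bases / genome size."""
    return num_reads * read_len / genome_size


def count_fastq_records(path):
    """Count records in an uncompressed FASTQ file (assumes 4 lines per record)."""
    with open(path) as handle:
        num_lines = sum(1 for _ in handle)
    if num_lines % 4 != 0:
        raise ValueError(f"{path}: {num_lines} lines is not a multiple of 4")
    return num_lines // 4


if __name__ == "__main__":
    # Figures from the question: ~20 million reads, 101 bp each, ~7 Mbp genome.
    print(f"{expected_coverage(20_000_000, 101, 7_000_000):.0f}x")  # prints 289x
```

If `count_fastq_records("assembly.fastq")` returns something close to 20 million while the assembler log reports billions of reads, the input is probably being parsed in the wrong format rather than being genuinely too large.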

genome Assembly • 3.1k views

updated 2.1 years ago by Ram 43k • written 9.4 years ago by jamesT ▴ 30

I'm not sure that this will be the solution to your problem, but for SE reads you should use "q" instead of "p". In your case:

q=assembly.fastq
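
Since the file in the question is described as interleaved paired-end FASTQ, another option that keeps the pairing information is to split it into two mate files and list them with `q1=` and `q2=`, the keys the SOAPdenovo2 config uses for paired FASTQ input. A minimal sketch, assuming uncompressed four-line records with mates strictly alternating (read 1, then read 2); the function name and the output file names are hypothetical:

```python
from itertools import islice


def deinterleave_fastq(interleaved_path, out1_path, out2_path):
    """Split an interleaved FASTQ into two mate files; return the number of pairs."""
    pairs = 0
    with open(interleaved_path) as src, \
         open(out1_path, "w") as out1, \
         open(out2_path, "w") as out2:
        while True:
            # One pair = 8 lines: 4 for read 1 followed by 4 for read 2.
            block = list(islice(src, 8))
            if not block:
                break
            if len(block) != 8:
                raise ValueError("incomplete read pair at end of file")
            out1.writelines(block[:4])
            out2.writelines(block[4:])
            pairs += 1
    return pairs
```

For example, after calling `deinterleave_fastq("assembly.fastq", "assembly_1.fastq", "assembly_2.fastq")`, the `p=` line in the `[LIB]` section would be replaced by `q1=assembly_1.fastq` and `q2=assembly_2.fastq`. Whether this resolves the slow pregraph step is untested here.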

9.3 years ago by iraun 6.2k