I've been working on genome assemblies for strains of Bacillus subtilis using 10 kb PacBio reads. I have about 1.1 million 10 kb reads for each strain, which is far more coverage than I need for a good assembly, so I randomly sampled 100,000 reads per strain and use those for the assembly. One of the strains was evolved during a competition experiment rather than constructed, as the other two were. I've had no problem using Flye to complete the assemblies for the two constructed strains. However, when I try to assemble the .fastq reads for the evolved strain, the assembly always gets stuck on one particular step, 'Aligning reads to the graph'. The graph in question should be the repeat graph that the program builds a few steps earlier. I've never been able to get past this step; the job always times out after about 12 hours (because I only allotted 12 hours in our job scheduler).
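For reference, here is a minimal sketch of the kind of random subsampling described above. It is not necessarily the exact method I used; it assumes plain, uncompressed FASTQ with four lines per record, and the function names are just illustrative.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped 4-line records)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def reservoir_sample(records, k, seed=None):
    """Return k records chosen uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(rec)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = rec
    return sample


def subsample_fastq(in_path, out_path, k, seed=None):
    """Write k randomly chosen reads from in_path to out_path; return the count written."""
    with open(in_path) as fin:
        sample = reservoir_sample(read_fastq(fin), k, seed)
    with open(out_path, "w") as fout:
        for rec in sample:
            fout.writelines(rec)
    return len(sample)
```

Something like `subsample_fastq("evolved.fastq", "evolved.100k.fastq", 100_000, seed=1)` would produce the subsample (the file names are placeholders). Fixing the seed makes the subsample reproducible, and changing it draws a different set of reads.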
I know these assemblies can take a lot of time, but my other assemblies took only about 70 minutes to complete, and in those runs the 'Aligning reads to the graph' step took only about 5 minutes according to the Flye logs. Flye also prints a progress indicator during that step, noting every 10% of the process, and I've never even seen the 0% indicator for the assembly of the evolved strain.
Does anyone know what the problem could be or how I could work around it? I tried sampling more reads (200,000) for the assembly and got the same result. I'm currently running a job with a smaller sample of reads (50,000, which should still be plenty of coverage, >80X), but I'm not sure whether that will work.
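As a rough check on the coverage figures, here is a back-of-the-envelope estimate. It assumes a B. subtilis genome of about 4.2 Mb (my assumption, not something measured from these strains) and treats every read as exactly 10 kb, which is only approximately true.

```python
def estimated_coverage(n_reads, mean_read_len, genome_size):
    """Expected depth of coverage: total sequenced bases divided by genome size."""
    return n_reads * mean_read_len / genome_size


GENOME_SIZE = 4_200_000   # assumed B. subtilis genome size in bp
MEAN_READ_LEN = 10_000    # nominal read length in bp

for n_reads in (50_000, 100_000, 200_000):
    depth = estimated_coverage(n_reads, MEAN_READ_LEN, GENOME_SIZE)
    print(f"{n_reads:>7,} reads: ~{depth:.0f}X")
```

Under those assumptions, even the 50,000-read subsample comes out at roughly 119X, so low coverage seems unlikely to be the cause of the stall.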