I tried to use SSPACE-LongRead to scaffold the output of Platanus using 13X PacBio coverage. The genome of the organism is around 1 Gbp. The input scaffolds from Platanus contain 521597 entries totaling 945 Mbp, with an N50 of 22573. I ran the program like this:
nice perl ~/src/SSPACE-LongRead_v1-1/SSPACE-LongRead.pl \
  -c /tmp/LvPtA_gapClosed.fa \
  -p /tmp/Lv_pacbio.all.fasta \
  -t 40 -k 1 >run1.log 2>&1 &
It ran for about 3 days, slowly adding entries to scaffolds.fasta until it was up to 347Mb in size. (It was single threaded after blasr completed.) Then there was no change for 24 hours, even though it was spinning on 1 CPU at 100%. The end of the log file (which, like all the other files, had not changed in all of that time) was:
Used adding = 9068\n 2.f: Extend f113171,f481259,f44020,f35044,f487885,f77079 with (72888, f77079) ,f62008,f317278 with 1 links and best = 1 and total = 1\n 2.r: Extend f113171,f48125
EOLs are shown as \n to illustrate that there wasn't one on the final line. It looks like the program just hung somewhere mid-write. I didn't notice anything in the logfile it created to indicate what the problem might be, but since that was half a Gb, it would have been easy to miss it.
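With a log that size, one way to avoid reading it by eye is to filter it for suspicious lines. Below is a minimal Python sketch of that idea. The keyword list and the function names (scan_log, scan_file) are my own assumptions, not anything SSPACE-LongRead documents, so its real error messages, if any, may not match these patterns.

```python
import re

# Hypothetical keyword list: a guess at what a failure message might contain.
# SSPACE-LongRead's actual messages may use different wording.
SUSPECT = re.compile(
    r"error|warn|fail|cannot|can't|died|killed|out of memory",
    re.IGNORECASE,
)


def scan_log(lines, pattern=SUSPECT, max_hits=20):
    """Return up to max_hits (line_number, text) pairs whose text matches pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            hits.append((number, line.rstrip("\n")))
            if len(hits) >= max_hits:
                break
    return hits


def scan_file(path, **kwargs):
    """Stream a log file line by line so a half-gigabyte log never sits in memory."""
    with open(path, errors="replace") as handle:
        return scan_log(handle, **kwargs)
```

Calling scan_file("run1.log") would return the first 20 matching lines with their line numbers. An empty result only means none of the guessed keywords appear, not that the log is clean.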
Has anybody else managed to get this program to fully scaffold an organism this size? Or experienced this odd hang and found a way around it?
Too bad that it failed, because the scaffolds it built seem to be very good: they line up nicely against the few BACs we have. The N50 for that file is 94058 bp, and it had only 3843 entries yet was about 1/3 the size of the organism. The program was slow, but I would happily have let it run 3X as long if it had finished!
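For anyone who wants to compare their own runs, N50 and entry counts like the ones quoted above can be computed from a FASTA file along these lines. This is only a sketch using the usual N50 definition (the length L such that sequences of length L or longer hold at least half of the total bases). The function names are made up for illustration, and it is not necessarily how the figures in this post were produced.

```python
def read_fasta_lengths(lines):
    """Yield the sequence length of each FASTA record from an iterable of lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if it is empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for n in ordered:
        running += n
        if running >= half:
            return n
    return 0


def fasta_stats(path):
    """Return (entry_count, total_bases, n50) for the FASTA file at path."""
    with open(path) as handle:
        lengths = list(read_fasta_lengths(handle))
    return len(lengths), sum(lengths), n50(lengths)
```

Running fasta_stats("scaffolds.fasta") on the partial output would give the entry count, total size, and N50 in one pass. Note that this counts N gap characters as bases, which is the usual convention for scaffold statistics.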