Question: Assembler suggestions for ~90Mb genome (using single-end Illumina & PacBio data only)
14 months ago
jmartin0 wrote:

I have a large pool of single-end Illumina data (HiSeq 2500) and PacBio RS data, and I am looking for suggestions for the best assemblers to use in this situation.

I was first interested in MaSuRCA because it runs a hybrid assembly using both the short reads and the long reads in the same assembly operation (as opposed to building an assembly with Illumina and using the long reads to close gaps via some other tool). But I think MaSuRCA needs paired-end Illumina data to work. I am not 100% sure of that, since they also give instructions for using single-end data, which is entered as 'paired end' data in the config. So it's possible they actually meant that Illumina data of some kind is the requirement (i.e., it can't work on just long reads). But I think that's wishful thinking on my part.

So, are there any good assemblers that can use short Illumina SE + long PacBio RS data? What is the best assembly strategy for this kind of input data? Unfortunately I am not in a position to generate PE data for this organism.

14 months ago
h.mon (31k) wrote:

The fastest, and probably best, strategy in this case is to assemble using only the PacBio reads (with wtdbg2 or Flye, for example), then polish the assembly for several rounds with the Illumina reads, or with Illumina + PacBio reads.
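As a rough illustration of that assemble-then-polish strategy, here is a minimal Python sketch that only builds the command lines and does not run them. It assumes Flye for the long-read assembly and minimap2 + Racon for polishing with the single-end Illumina reads. The tool choice for polishing, the file names, the `asm` output directory, three rounds, eight threads, and the helper names `Step` and `build_plan` are all illustrative assumptions, not something specified in the answer above. Check each tool's `--help` for your installed version before running anything.

```python
import shlex
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    argv: List[str]         # command and its arguments
    stdout: Optional[str]   # file that should capture stdout, if any


def build_plan(pacbio: str, illumina_se: str, out_dir: str = "asm",
               rounds: int = 3, threads: int = 8,
               genome_size: str = "90m") -> List[Step]:
    """Return the commands for a PacBio-only assembly plus SE polishing rounds."""
    t = str(threads)
    # 1) Long-read-only assembly (assumption: Flye, raw PacBio reads).
    steps = [Step(["flye", "--pacbio-raw", pacbio,
                   "--genome-size", genome_size,
                   "--out-dir", f"{out_dir}/flye",
                   "--threads", t], None)]
    draft = f"{out_dir}/flye/assembly.fasta"

    # 2) Each round: map short reads to the current draft, then polish it.
    for i in range(1, rounds + 1):
        paf = f"{out_dir}/round{i}.paf"
        polished = f"{out_dir}/round{i}.fasta"
        steps.append(Step(["minimap2", "-x", "sr", "-t", t,
                           "-o", paf, draft, illumina_se], None))
        # Racon writes the polished contigs to stdout.
        steps.append(Step(["racon", "-t", t, illumina_se, paf, draft],
                          polished))
        draft = polished  # the next round polishes this round's output
    return steps


if __name__ == "__main__":
    for step in build_plan("pacbio.fastq", "illumina_se.fastq"):
        line = shlex.join(step.argv)
        if step.stdout:
            line += " > " + shlex.quote(step.stdout)
        print(line)
```

Running the script prints one shell command per line, so the plan can be reviewed or pasted into a job script. Each round takes the previous round's FASTA as its target, which is what "several rounds" of polishing amounts to.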

14 months ago
Hannover Medical School
colindaven (2.3k) wrote:

h.mon's answer seems very reasonable.

I would also just assemble with PacBio, perhaps using Canu or wtdbg2.

Even SPAdes might be worth a try since the genome is not huge. I would forget the hybrid assemblers and just use the Illumina reads for polishing with Racon or a similar tool, if the SE reads are useful for that.

