Assembler suggestions for ~90Mb genome (using single-end Illumina & PacBio data only)
4.7 years ago
jmartin • 0

I have a large pool of single-end Illumina data (HiSeq 2500) and PacBio RS data, and I am looking for suggestions for the best assemblers to use in this situation.

I was first interested in MaSuRCA because it runs a hybrid assembly that uses both the short reads and the long reads in the same assembly operation (as opposed to building an assembly with the Illumina data and then using the long reads to close gaps with some other tool). But I think MaSuRCA needs paired-end Illumina data to work. I am not 100% sure about that, because they also give instructions for using single-end data, which is entered as 'paired end' data in the config. So it's possible they actually meant that Illumina data is the requirement (i.e., it can't work on long reads alone). But I suspect that's wishful thinking on my part.

So, are there any good assemblers that can use short Illumina SE + long PacBio RS data? What is the best assembly strategy for this kind of input data? Unfortunately I am not in a position to generate PE data for this organism.

Assembly • 966 views
4.7 years ago
h.mon 35k

The fastest (and in this case probably the best) strategy is to assemble with the PacBio reads only (using wtdbg2 or Flye, for example), then polish the assembly with several rounds of Illumina, or Illumina + PacBio, reads.
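A minimal sketch of the PacBio-only assembly step, assuming raw (CLR) PacBio RS reads and a genome size of roughly 90 Mb. The file names, thread count and output prefixes are hypothetical. The sketch only builds the command lines (Flye in `--pacbio-raw` mode; wtdbg2 with its `rs` preset followed by `wtpoa-cns` for the consensus) so they can be inspected before running. Option names can change between releases, so check them against `--help` for your installed versions.

```python
import shlex


def flye_cmd(reads, out_dir, genome_size="90m", threads=16):
    """Build the argv for a PacBio-only Flye assembly of raw (CLR) reads."""
    return ["flye", "--pacbio-raw", reads,
            "--genome-size", genome_size,
            "--out-dir", out_dir,
            "--threads", str(threads)]


def wtdbg2_cmds(reads, prefix, genome_size="90m", threads=16):
    """Build the two wtdbg2 steps: layout with the RS preset, then consensus."""
    assemble = ["wtdbg2", "-x", "rs", "-g", genome_size,
                "-i", reads, "-t", str(threads), "-fo", prefix]
    # wtdbg2 writes the contig layout to <prefix>.ctg.lay.gz; wtpoa-cns
    # turns that layout into consensus contigs in FASTA format.
    consensus = ["wtpoa-cns", "-t", str(threads),
                 "-i", f"{prefix}.ctg.lay.gz", "-fo", f"{prefix}.ctg.fa"]
    return [assemble, consensus]


if __name__ == "__main__":
    # Hypothetical input file; print the commands instead of running them.
    print(shlex.join(flye_cmd("pacbio.fastq", "flye_out")))
    for cmd in wtdbg2_cmds("pacbio.fastq", "wtdbg2_asm"):
        print(shlex.join(cmd))
```

Running either assembler on the same reads and comparing the resulting contig sets is a cheap way to choose a draft before investing in polishing.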

4.7 years ago

h.mon's answer seems very reasonable.

I would also just assemble with PacBio, perhaps using Canu or wtdbg2.

Even SPAdes might be worth a try, since the genome is not huge. I would forget the hybrid assemblers and just use the Illumina reads for polishing with Racon etc., if the SE reads turn out to be useful.
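The Illumina polishing step could look like the sketch below. It assumes Racon with minimap2 as the read mapper, and single-end Illumina reads in one FASTQ file. The choice of three rounds, the thread count and all file names are illustrative assumptions, not values from this thread. In each round, minimap2's short-read preset (`-x sr`) maps the reads to the current draft and writes PAF overlaps. Racon (`racon [options] <reads> <overlaps> <target>`) then writes a polished FASTA to stdout, and that FASTA becomes the draft for the next round.

```python
import shlex
import subprocess


def racon_rounds(draft, reads, rounds=3, threads=16, preset="sr"):
    """Plan `rounds` of minimap2 + Racon polishing as (argv, stdout_path) steps.

    minimap2 writes PAF overlaps to stdout; Racon writes the polished
    FASTA to stdout, which becomes the target of the following round.
    """
    steps = []
    target = draft
    for i in range(1, rounds + 1):
        paf = f"round{i}.paf"
        polished = f"round{i}.racon.fa"
        steps.append((["minimap2", "-x", preset, "-t", str(threads),
                       target, reads], paf))
        steps.append((["racon", "-t", str(threads), reads, paf, target],
                      polished))
        target = polished
    return steps


def run(steps):
    """Execute the plan, redirecting each tool's stdout to its output file."""
    for argv, out_path in steps:
        with open(out_path, "w") as out:
            # check=True stops the pipeline if any tool exits with an error.
            subprocess.run(argv, stdout=out, check=True)


if __name__ == "__main__":
    # Hypothetical file names; this only prints the plan (a dry run).
    for argv, out_path in racon_rounds("assembly.fasta", "illumina_se.fastq"):
        print(shlex.join(argv), ">", out_path)
```

Keeping planning separate from execution makes it easy to dry-run the steps. Because `run` uses `check=True`, a failing round stops the pipeline instead of passing an empty FASTA to the next round.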
