Question: De novo plant genome assembly
2.3 years ago by
alslonik150 wrote:

Hello clever community!

I need your advice. I am working on a de novo plant genome assembly of ~400 Mb. I have Chromium 10x data, which was assembled with Supernova. I also have Illumina paired-end reads. Now I have additional PacBio reads at roughly 120x coverage. The genome is diploid and I am thinking about using Falcon.

What do you think should be the best strategy:

  1. Assembling PacBio reads and then using a tool to integrate the two assemblies? Is there anything like this? Which tool would you use?

  2. Using a tool that can assemble the genome from both the Chromium and the PacBio reads? Is there anything like it?

  3. Assembling the PacBio reads and using the Chromium 10x and Illumina data for polishing? If I assemble with Falcon, what tool should I use for polishing?

  4. Anything else that I am missing to get the best out of the data I have?

Thank you very much in advance! Alex

Have a look at BioNano optical maps and their use in getting an assembled genome.

What do the results of the chromium assembly look like? What about the Illumina PE reads? Have you tried to assemble them? It would be useful to see some stats of what those two assemblies look like.
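As a rough illustration of the kind of stats meant here, the sketch below computes contig count, total size, longest contig and N50 from a FASTA file. It is only a minimal example: the function names `contig_lengths` and `assembly_stats` are made up for this post, and a dedicated tool would report far more.

```python
from typing import Dict, Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence found in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # stays None until the first '>' header is seen
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths: List[int]) -> Dict[str, int]:
    """Compute contig count, total size, longest contig and N50."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    # N50: length of the contig at which the running sum of the
    # longest-first contig lengths reaches half of the total size.
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_bp": total,
        "longest_bp": ordered[0] if ordered else 0,
        "n50_bp": n50,
    }


# Toy input standing in for a real FASTA file (hypothetical data).
toy = [">a", "ACGTACGTAC", ">b", "ACGT", "AC", ">c", "AAA"]
print(assembly_stats(contig_lengths(toy)))
```

For a real assembly, pass an open file handle instead of the toy list, e.g. `assembly_stats(contig_lengths(open("assembly.fasta")))`, where `assembly.fasta` is a placeholder name.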

Here is a nice tutorial about how to polish PacBio assemblies: Polish PacBio assembly with latest PacBio tools : an affordable solution for everyone

Thanks. Re: 10x assembly: it is ~200 Mb in size after ordering with ALLMAPS, which is 2/3 of the expected size. BUSCO shows 88% complete plant BUSCOs. The Illumina reads were never successfully assembled. I am actually thinking of doing it now.

DBG2OLC is a hybrid assembler.

Thanks, Ric. Will check it out.

2.3 years ago by
VIB, Ghent, Belgium
lieven.sterck8.7k wrote:

Falcon is not a bad choice. An alternative might be Canu (if you have the computational resources for it).

1) MEDUSA (as well as QuickMerge) is one of those integrating assembly/scaffolding tools

3) Pilon, Arrow, and there will be others I guess

4) Canu, but with the same remark as Carambakaracho for MaSuRCA
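To make the Pilon option in 3) a bit more concrete, here is a small sketch that only builds the command line, without running it. The file names, the `pilon.jar` location and the memory setting are placeholders, and it assumes the Illumina reads have already been aligned to the PacBio contigs and stored as a sorted, indexed BAM. Since the genome is diploid, the sketch passes Pilon's `--diploid` flag; whether that is enough to protect heterozygous sites is not something this example can settle.

```python
from typing import List


def pilon_command(
    contigs_fasta: str,
    illumina_bam: str,
    out_prefix: str,
    pilon_jar: str = "pilon.jar",  # placeholder path to the Pilon jar
    java_mem: str = "32G",         # placeholder heap size
    threads: int = 8,
) -> List[str]:
    """Build (but do not run) a Pilon polishing command."""
    return [
        "java", f"-Xmx{java_mem}", "-jar", pilon_jar,
        "--genome", contigs_fasta,   # assembly to be polished
        "--frags", illumina_bam,     # paired-end alignments (sorted, indexed BAM)
        "--output", out_prefix,      # prefix for the output files
        "--changes",                 # also write a list of the edits made
        "--diploid",                 # sample comes from a diploid organism
        "--threads", str(threads),
    ]


cmd = pilon_command("falcon_contigs.fasta", "illumina_vs_falcon.bam", "falcon_pilon")
print(" ".join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.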

2.3 years ago by
France / Toulouse / GeT-Plage
Rox1.2k wrote:

Hello again, alslonik!

Here I'll add my little pinch of salt and recommend having a look at that great manual: . Of course, it was not tested on plant genomes, but it can help guide your choice of assembly strategy depending on the technology you used and your sequencing depth.

I saw that you wanted to give quickmerge a try, so you may have already seen that manual. I tried quickmerge myself with two different PacBio-only assemblies, and I was really satisfied with the result in terms of contiguity and completeness. As you did a Falcon assembly, you could try merging a Falcon assembly with a Canu assembly. It may give some improvement as well, if you have the time to try that, of course!



2.3 years ago by
Carambakaracho2.2k wrote:

I don't have any experience with Falcon, so I can't help on that, but this is my advice on your other questions. In any case, your PacBio coverage is quite decent, so you might expect relatively good results from the Falcon assembly.

  1. Integration of assemblies is usually not trivial - though I might just lack a good reference.
  2. SPAdes can integrate all that data afaik. You can use the Chromium contigs as "trusted contigs".
  3. No experience, but my guess is that you risk polishing out any heterozygosity.
  4. MaSuRCA can handle both PacBio and Illumina - however, you won't be able to use the Chromium data directly.
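
For point 2, the idea could be sketched roughly as below. This only assembles a SPAdes command line and does not run anything. All file names are placeholders, and the helper name `spades_command` is made up for illustration. Whether SPAdes copes well with a ~400 Mb plant genome is a separate question that this sketch does not address.

```python
from typing import List, Optional


def spades_command(
    pe_forward: str,
    pe_reverse: str,
    out_dir: str,
    pacbio_reads: Optional[str] = None,
    trusted_contigs: Optional[str] = None,
    threads: int = 16,
) -> List[str]:
    """Build (but do not run) a SPAdes hybrid assembly command."""
    cmd = [
        "spades.py",
        "-1", pe_forward,       # Illumina forward reads
        "-2", pe_reverse,       # Illumina reverse reads
        "-t", str(threads),
        "-o", out_dir,
    ]
    if pacbio_reads:
        cmd += ["--pacbio", pacbio_reads]              # long reads for gap closing / repeat resolution
    if trusted_contigs:
        cmd += ["--trusted-contigs", trusted_contigs]  # e.g. the Supernova contigs
    return cmd


cmd = spades_command(
    "illumina_R1.fastq.gz",
    "illumina_R2.fastq.gz",
    "spades_out",
    pacbio_reads="pacbio.fastq.gz",
    trusted_contigs="supernova_contigs.fasta",
)
print(" ".join(cmd))
```

Running it for real would be `subprocess.run(cmd, check=True)` with actual paths in place of the placeholders.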
