Hi Everyone,
I am working with Fusarium oxysporum genomes (size: ~50-60 Mb), and we are sequencing them. The main goal is to produce de novo genome assemblies for downstream analysis.
Goal: Get chromosome-level, near-chromosome-level, or the longest possible scaffolds in the genome assembly, so that we can compare strains and identify core and accessory chromosomes.
Background information:
- 45 samples in total, sequenced with Illumina short reads at 100x
- 12 of these samples also sequenced with Nanopore long reads at 75x
Assembly methodology I have in mind:
- Illumina short reads: primary assembly with SPAdes (also with MaSuRCA, then combine both assemblies with quickmerge).
- Nanopore reads: hybrid assembly using Nanopore + Illumina reads together in SPAdes and MaSuRCA.
In publications, I see that authors use different methodologies and tools for genome assembly. My questions are:
- Is there a best practice for eukaryotic genome assembly?
- At the specified coverage, is hybrid assembly a good approach?
- Is quickmerge (which merges multiple assemblies together) a good approach to get longer scaffolds?
Any help or pointers toward resources will be helpful.
For the short-read strains:
I would recommend just using SPAdes.
For the strains with long-reads:
I think SPAdes and MaSuRCA as hybrid long + short read pipelines can essentially be forgotten now.
Assembling only the long reads with an assembler like Flye will likely give you the most contiguous assembly (or very close to it), assuming a decent read length.
If you need the short reads for accuracy, use them to polish the long-read assembly afterwards.
I think it is very hard to give a generic pipeline. However, you can evaluate all your assemblies using BUSCO/OMARK/Merqury to see whether the long-read assemblies are doing as well as the short-read and public assemblies.
Luckily, F. oxysporum already has plenty of contiguous assemblies, so you can compare against them.
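Alongside BUSCO/OMARK/Merqury, a quick contiguity check is a simple way to put your own assemblies next to the public ones. The following is only a minimal sketch: the function names are made up for illustration, and it assumes plain (uncompressed) FASTA input. It reports contig count, total size, largest contig, and N50.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in a FASTA stream."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Basic contiguity statistics for one assembly."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "largest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

Usage would be along the lines of `summarize(read_fasta_lengths(open("assembly.fasta")))` for each assembly (the file name is a placeholder). Contiguity says nothing about correctness, so it complements rather than replaces the completeness and k-mer based checks.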
Hi, thanks for your suggestions.
Previously, I used Pilon for polishing and did 10 rounds (polishing over and over until there were only a few entries left in the changes file generated by Pilon). Is this a good approach? I have more samples now, and running this many rounds will take a lot of time.
I think 10 rounds is excessive. I usually only run 1-3 rounds (as long-read accuracy has improved, I have used fewer and fewer rounds), and the first round does the vast majority of the work regardless.
You can compare accuracy metrics against an Illumina-only assembly of the same strain, or against a reference genome, to see whether the changes actually improve anything.
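If you still want a stopping rule based on the changes file, it can be made explicit instead of eyeballed. This is a rough sketch, assuming Pilon was run with `--changes` so that each round writes a `.changes` file with one line per edit. The 5% threshold and the cap of 3 rounds are arbitrary illustrative values, not recommendations from any tool.

```python
from typing import List


def count_changes(path: str) -> int:
    """Count non-empty lines in a Pilon .changes file (one line per edit)."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def rounds_worth_keeping(
    change_counts: List[int],
    min_fraction: float = 0.05,  # assumed cutoff relative to round 1
    max_rounds: int = 3,         # assumed hard cap on rounds
) -> int:
    """Return how many polishing rounds to keep.

    Stops at the first round whose change count drops below
    min_fraction of the first round's count, or at max_rounds.
    """
    if not change_counts:
        return 0
    first = change_counts[0]
    for round_number, count in enumerate(change_counts[:max_rounds], start=1):
        if first == 0 or count < min_fraction * first:
            return round_number
    return min(len(change_counts), max_rounds)
```

For example, counts of 12000, 900 and 40 edits over three rounds would stop at round 3, since 40 is below 5% of 12000. A shrinking changes file only shows that Pilon has converged, not that the assembly got better, so it is still worth checking against an independent metric.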
Should I use both Racon and Pilon for polishing? Racon uses long reads and Pilon uses short reads. If both should be used, is there any preferred order?
You can check the stats to see whether it helps with the polishing. You can also look at Medaka for Nanopore long-read polishing. Generally, pipelines polish with the short reads last.
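To make the ordering concrete, here is a sketch that only builds the list of commands in the order described (long-read assembly, then long-read polishing, then short-read polishing last) without running anything. Treat the details as assumptions: the sample and file names are placeholders, `pilon` is assumed to be available as a wrapper command (otherwise it is `java -jar pilon.jar`), Flye's `--nano-raw` may need to be `--nano-hq` for recent basecalls, and Medaka may need an explicit model for your chemistry.

```python
from typing import List, Tuple


def build_plan(
    sample: str,
    long_reads: str,
    r1: str,
    r2: str,
    pilon_rounds: int = 1,
    threads: int = 8,
) -> List[Tuple[str, str]]:
    """Return (stage, shell command) pairs in the order they would be run."""
    steps: List[Tuple[str, str]] = []

    # 1. Long-read-only assembly
    steps.append((
        "assemble",
        f"flye --nano-raw {long_reads} --out-dir {sample}_flye --threads {threads}",
    ))
    draft = f"{sample}_flye/assembly.fasta"

    # 2. Long-read polishing
    steps.append((
        "long-read polish",
        f"medaka_consensus -i {long_reads} -d {draft} -o {sample}_medaka -t {threads}",
    ))
    draft = f"{sample}_medaka/consensus.fasta"

    # 3. Short-read polishing last, each round mapping to the latest draft
    for n in range(1, pilon_rounds + 1):
        bam = f"{sample}_round{n}.bam"
        steps.append((
            "map short reads",
            f"bwa index {draft} && bwa mem -t {threads} {draft} {r1} {r2} "
            f"| samtools sort -o {bam} - && samtools index {bam}",
        ))
        steps.append((
            "short-read polish",
            f"pilon --genome {draft} --frags {bam} --output {sample}_pilon{n} --changes",
        ))
        draft = f"{sample}_pilon{n}.fasta"

    return steps
```

Printing the plan for one sample, e.g. `for stage, cmd in build_plan("s1", "s1_ont.fastq.gz", "s1_R1.fastq.gz", "s1_R2.fastq.gz"): print(stage, cmd)`, gives something you can review or paste into a job script before committing compute time to 12 samples.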