Question

Best practices in Fungal Genome Assembly

0

8 weeks ago

Umer ▴ 100

Hi Everyone,

I am working with Fusarium oxysporum genomes (size: ~50-60 Mb) and we are sequencing them. The main goal is to produce de novo genome assemblies for downstream analysis.

Goal: Get chromosome-level or near-chromosome-level assemblies, or at least the longest possible scaffolds, so that I can compare genomes and identify core and accessory chromosomes.

Background information:

45 samples in total, all sequenced with Illumina short reads at 100x coverage
12 of these samples also sequenced with Nanopore long reads at 75x coverage

Assembly methodology I have in mind:

Illumina short reads: primary assembly with SPAdes (also with MaSuRCA, then combine both assemblies with quickmerge).
Nanopore reads: hybrid assembly using the Nanopore and Illumina reads together in SPAdes and MaSuRCA.

In publications, I see that authors use different methodologies and tools for genome assembly. My questions are:

Is there any best practice for eukaryotic genome assembly?
At the specified coverage, is hybrid assembly a good approach?
Is quickmerge (which merges multiple assemblies together) a good approach to get longer scaffolds?

Any help or pointers to resources would be helpful.

illumina assembly nanopore genome • 606 views

ADD COMMENT • link updated 8 weeks ago by samuel.a.odonnell ▴ 560 • written 8 weeks ago by Umer ▴ 100

1

For the short-read strains:
I would recommend just using SPAdes.

For the strains with long reads:
I think SPAdes and MaSuRCA as hybrid long+short-read pipelines can essentially be forgotten now.
Assembling only the long reads with an assembler like Flye will likely give you the most contiguous assembly (or very close to it), assuming a decent read length.
If you need the short reads for accuracy, polish the long-read assembly with them afterwards.

I think it is very hard to give a generic pipeline. However, you can evaluate all your assemblies using BUSCO/OMArk/Merqury to see whether the long-read assemblies are doing as well as the short-read and public assemblies.
Luckily, F. oxysporum already has plenty of contiguous assemblies, so you can compare with them.
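
Alongside BUSCO/OMArk/Merqury, a quick contiguity check can help when comparing your assemblies with the public ones. The following is a minimal Python sketch, not part of any of those tools: the function names are hypothetical, and it only reads contig lengths from FASTA text and reports contig count, total size, largest contig and N50.

```python
from typing import Dict, Iterable, List


def read_contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence found in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # stays None until the first header line is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def contiguity_summary(lengths: List[int]) -> Dict[str, int]:
    """Collect the basic contiguity numbers for one assembly."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "largest_bp": max(lengths, default=0),
        "n50_bp": n50(lengths),
    }
```

For a file on disk you could pass an open file handle to `read_contig_lengths` (any path you use there is your own; none is assumed here). Contiguity alone says nothing about correctness, so treat these numbers as a complement to the completeness and accuracy checks above.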

ADD REPLY • link 8 weeks ago by samuel.a.odonnell ▴ 560

0

Hi, thanks for your suggestions.

polish the long-read assembly afterwards

Previously, I used Pilon for polishing and ran 10 rounds (I kept polishing until the changes file generated by Pilon had only a few entries left). Is this a good approach? I have more samples now, and running this many rounds will take a lot of time.

ADD REPLY • link 8 weeks ago by Umer ▴ 100

0

I think 10 rounds is excessive. I usually run only 1-3 rounds (as long-read accuracy has improved, I have used fewer and fewer rounds), and the first round does the vast majority of the work regardless.
You can compare accuracy metrics against an Illumina-only assembly of the same strain, or against a reference genome, to see whether the changes actually improve anything.
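
If you want a rule of thumb instead of a fixed number of rounds, one option is to stop once a round makes only a small fraction of the edits that the first round made. This is a minimal sketch under stated assumptions: the per-round edit counts are something you collect yourself (for instance, the number of entries in the changes file mentioned earlier in the thread), and the function name and the 5% cutoff are illustrative choices, not a recommendation from any polishing tool.

```python
from typing import List


def rounds_worth_keeping(changes_per_round: List[int], min_fraction: float = 0.05) -> int:
    """Return how many polishing rounds to keep.

    A round is kept while it makes more than `min_fraction` of the edits
    made in round 1. The 5% default is an arbitrary illustrative cutoff.
    """
    if not changes_per_round or changes_per_round[0] == 0:
        return 0  # nothing was polished, or round 1 changed nothing
    threshold = changes_per_round[0] * min_fraction
    kept = 0
    for changes in changes_per_round:
        if changes <= threshold:
            break  # this round barely changed anything; stop before it
        kept += 1
    return kept
```

With made-up counts such as `[12000, 900, 150, 140, 135]`, the function returns 2: the third round already falls below 5% of the first. Counts that plateau without reaching zero, as in that example, are a hint that further rounds are mostly noise, which is why checking accuracy metrics is a better final arbiter than the edit count itself.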

ADD REPLY • link 8 weeks ago by samuel.a.odonnell ▴ 560

0

Should I use both Racon and Pilon for polishing? Racon uses long reads and Pilon uses short reads. If both should be used, is there any preferred order?

ADD REPLY • link 8 weeks ago by Umer ▴ 100

0

You can check the stats to see whether it helps with the polishing. You can also look at Medaka for Nanopore long-read polishing. Generally, pipelines polish with the short reads last.

ADD REPLY • link 8 weeks ago by samuel.a.odonnell ▴ 560

score 1 · Answer 1 · 2024-05-28

The authors of the Vertebrate Genomes Project published a very useful guide for assembling complex genomes. It should be helpful for you and covers the best practices and important considerations part of your question.

At the specified coverage, is hybrid assembly a good approach ?

This is impossible to say without more detailed knowledge of the complexity of the genome. Sometimes hybrid approaches make a huge difference, but it all comes down to whether the long reads are long enough and provide enough coverage in the right areas to overcome those complexities. TL;DR: you'll have to wait and see for F. oxysporum.
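
One thing you can check up front is how much of the long-read coverage comes from genuinely long reads. This is a minimal sketch, assuming you have already extracted the read lengths from your Nanopore data; the function name and the length thresholds are hypothetical choices for illustration, and genome-wide coverage says nothing about whether the difficult regions specifically are covered.

```python
from typing import Dict, Iterable, List


def coverage_by_min_length(
    read_lengths: List[int],
    genome_size_bp: int,
    thresholds: Iterable[int] = (0, 10_000, 20_000, 50_000),
) -> Dict[int, float]:
    """Fold coverage contributed by reads at least as long as each threshold."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return {
        threshold: sum(n for n in read_lengths if n >= threshold) / genome_size_bp
        for threshold in thresholds
    }
```

For F. oxysporum you would pass a genome size in the 50-60 Mb range given in the question, for example `genome_size_bp=60_000_000`. If most of the nominal 75x disappears once you require longer reads, that suggests the long reads may struggle to span the repeats that matter.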

Is quickmerge (merges multiple assemblies together) a good approach to get longer scaffolds?

Personally, I think this is a bad idea. By merging assemblies you can end up adding errors to your assembly. The differences between two or more assemblies may be real and biologically relevant, so by merging them you not only inherit any assembly errors but can also remove rare and biologically relevant features of the assembly.