Question

Fungal Genome Assembly: In a messed up situation

10 weeks ago

SomeOne ▴ 250

Hello.

Background:

I have to assemble some fungal genomes that are known to have accessory chromosomes.
Initially we tried Nanopore 1D + Illumina 2x150 bp sequencing, hoping that I would be able to assemble complete accessory chromosomes, but the results were fragmented, especially for the accessory chromosomes.
I tried multiple assemblers, including Flye, Canu, NextDenovo, and SMARTdenovo.
Flye gave me somewhat better assemblies, with fewer contigs and good BUSCO scores (e.g. for one of my samples: C:99.8%[S:98.9%,D:0.9%],F:0.0%,M:0.2%,n:4494).
So I followed the standard procedure and performed polishing:
- 1st: 2 rounds of Racon with ONT data
- 2nd: 1 round of Medaka with ONT data
- 3rd: 3 rounds of Pilon with Illumina data
I got almost complete core chromosomes, but the accessory chromosomes were still fragmented.
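
To compare assemblers on the same footing, a BUSCO summary line like the one quoted above can be parsed into numbers. This is a minimal sketch in Python; the function names and the ranking rule (more complete, then fewer duplicates, then fewer contigs) are illustrative assumptions, not part of BUSCO itself.

```python
import re

# Matches the one-line BUSCO short summary, e.g.
# C:99.8%[S:98.9%,D:0.9%],F:0.0%,M:0.2%,n:4494
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary: str) -> dict:
    """Return the BUSCO percentages as floats and n as an int."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {summary!r}")
    result = {k: float(v) for k, v in match.groupdict().items() if k != "n"}
    result["n"] = int(match.group("n"))
    return result


def rank_key(busco: dict, n_contigs: int) -> tuple:
    """Hypothetical sort key: smaller is better.

    Prefers higher completeness, then lower duplication, then fewer contigs.
    """
    return (-busco["C"], busco["D"], n_contigs)
```

Sorting a list of (BUSCO summary, contig count) pairs with `rank_key` gives a rough first ordering of candidate assemblies; it does not replace a proper QC.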

Now:

We sequenced some of the samples again with PacBio HiFi.
So far I have assembled the HiFi reads using the Flye, HiCanu, hifiasm, and Verkko assemblers.
The assemblies seem to be good (I am still performing assembly QC).

My Questions

Do I need to polish HiFi assemblies too? If yes, with which tool and which data?
What are the next steps in a PacBio HiFi assembly? Should I just move on to repeat annotation and genome annotation?
How do I clean the HiFi assemblies? I tried funannotate clean with a 1000 bp cutoff. All contigs in my assemblies are >100bp, but funannotate clean also aligns the contigs against each other with minimap2 to identify duplicates, and it removed a lot of them because they had >95% identity and coverage (e.g. in one sample hifiasm generated 261 contigs and funannotate clean removed 101 of them).
Is there any way to make hybrid assemblies using all 3 data types (HiFi + ONT + Illumina)? I was looking at the Flye assembler, but in its documentation and GitHub issues I found that mixing HiFi and ONT is not a good idea because their error rates differ too much.
Any thoughts on merging the assemblies with a tool like Quickmerge?
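
To see why so many contigs get removed, the duplicate check described above can be approximated from a minimap2 self-alignment in PAF format. This is a rough sketch in Python, not funannotate's actual implementation: the function name is made up, the 95% thresholds are taken from the question, and it judges each alignment on its own rather than summing several alignments per contig.

```python
def contained_contigs(paf_lines, min_identity=0.95, min_coverage=0.95):
    """Map each contig that looks like a duplicate to the contig covering it.

    PAF columns used (0-based): 0 query name, 1 query length, 2 query start,
    3 query end, 5 target name, 6 target length, 9 matching bases,
    10 alignment block length.
    """
    flagged = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        qname, qlen = fields[0], int(fields[1])
        qstart, qend = int(fields[2]), int(fields[3])
        tname, tlen = fields[5], int(fields[6])
        n_match, aln_len = int(fields[9]), int(fields[10])
        if qname == tname or aln_len == 0:
            continue
        # Only flag the shorter contig of a pair (ties broken by name),
        # so two near-identical contigs are never both removed.
        if (qlen, qname) > (tlen, tname):
            continue
        identity = n_match / aln_len
        coverage = (qend - qstart) / qlen
        if identity >= min_identity and coverage >= min_coverage:
            flagged.setdefault(qname, tname)
    return flagged
```

If most flagged contigs are short and each maps to one long contig, they are likely haplotigs or collapsed repeats rather than unique accessory sequence, which is worth checking before discarding them.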

Any help is highly appreciated. Thank you.

Illumina Pacbio Nanopore assembly HiFi • 13k views

ADD COMMENT • link 6 weeks ago by SomeOne ▴ 250

Out of curiosity: why do you consider this a "messed up situation"?

It sounds like a rather common, nearly everyday situation :)

ADD REPLY • link 10 weeks ago by lieven.sterck 16k

Perhaps this is related to a prior post from OP here --> Fungal Annotation Comparison

New addition this time appears to be PacBio data.

ADD REPLY • link 10 weeks ago by GenoMax 154k

Yes. Previously I got up to annotation with the NP+Illumina data. But now I am starting from scratch again ;(

ADD REPLY • link 10 weeks ago by SomeOne ▴ 250

Actually, as I now have so much data, I would rather not drop any of it entirely. We sequenced the same samples with PacBio that were sequenced with Nanopore. Option 1 is to use only the PacBio data and leave the Nanopore data out.

But I feel that both can be used. The messed up situation is that I can't seem to find an answer to "how should I use them" :)

ADD REPLY • link 10 weeks ago by SomeOne ▴ 250

Sounds like you now have HiFi data (which should be the best of the lot) and it may be adequate to generate assemblies (which you seem to indicate are already good).

You can later align the nanopore/Illumina data to the assemblies to see where they fell short (if they did). If the nanopore reads are longer, you may be able to identify structural variants that were missed in your assemblies (if applicable).
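
One way to look for such shortfalls is to scan per-base coverage of the mapped reads for stretches with little or no support. This is a minimal sketch in Python, assuming the reads were already mapped (for example with minimap2) and that the input is the three-column text that `samtools depth -a` prints (contig, 1-based position, depth). The function name and the depth threshold are illustrative choices.

```python
def low_coverage_regions(depth_lines, min_depth=5):
    """Return (contig, start, end) intervals where depth < min_depth.

    Coordinates are 1-based and inclusive, as in `samtools depth` output.
    """
    regions = []
    current = None  # [contig, start, end] of the interval being extended
    for line in depth_lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        contig, pos, depth = parts[0], int(parts[1]), int(parts[2])
        if depth < min_depth:
            if current and current[0] == contig and pos == current[2] + 1:
                current[2] = pos  # extend the open interval
            else:
                if current:
                    regions.append(tuple(current))
                current = [contig, pos, pos]
        elif current:
            regions.append(tuple(current))
            current = None
    if current:
        regions.append(tuple(current))
    return regions
```

Regions that are poorly supported by one read type but well covered by another are good candidates for a closer look in a genome browser.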

ADD REPLY • link 10 weeks ago by GenoMax 154k

Hi, thank you for the suggestions.

Technically, with Nanopore I got reads with a median length of 8000-9000 bp per sample, but with HiFi we got about double that (15-18 kb median read length).

So the question remains: should I just drop the Nanopore data and focus only on HiFi, or is there another way to use them together?
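
Since the median alone can hide a long tail of reads, it may help to compare the read N50 of both data sets as well. This is a small sketch in Python that works on a list of read lengths; the function names are illustrative and extracting the lengths from the FASTQ files is left out.

```python
from statistics import median


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def summarize(lengths):
    """Basic read length summary for one data set."""
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "median": median(lengths),
        "n50": n50(lengths),
    }
```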

ADD REPLY • link 9 weeks ago by SomeOne ▴ 250

score 1 · Answer 1 · 2025-08-23

If the Nanopore reads are from the 10.4.1 flow cell, I suggest assembling them with Hifiasm using the "--ont" option.

Usually, assemblies made from HiFi reads are left unpolished, but some tools exist (for example, https://github.com/Nextomics/NextPolish2) that can slightly improve the HiFi assembly accuracy by polishing.

If by cleaning you mean removal of haplotypic duplication, a good choice is Purge_dups (https://github.com/dfguan/purge_dups).

I would try making assemblies with Nanopore and PacBio reads together. One solution is to combine them into a single file and provide it to Flye and Hifiasm as a file containing Nanopore reads. It's quite possible that this would lead to a better assembly than if using only reads of one type.
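
Combining the reads can be as simple as concatenating the FASTQ files. Below is a minimal sketch in Python, assuming plain four-line FASTQ records (optionally gzip-compressed); the function names and file names are placeholders. The combined file would then be passed to the assembler through its Nanopore input option (for example `--ont` for Hifiasm, as mentioned above, or one of Flye's `--nano-*` modes); which mode suits a mixed read set is something to test rather than assume.

```python
import gzip


def open_maybe_gzip(path, mode):
    """Open a text file, transparently handling a .gz suffix."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def combine_fastq(inputs, output):
    """Concatenate FASTQ files into one file and return the record count."""
    n_lines = 0
    with open_maybe_gzip(output, "wt") as out:
        for path in inputs:
            with open_maybe_gzip(path, "rt") as handle:
                for line in handle:
                    # Guard against a missing newline at the end of a file,
                    # which would otherwise fuse two records together.
                    out.write(line if line.endswith("\n") else line + "\n")
                    n_lines += 1
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; truncated FASTQ?")
    return n_lines // 4
```

For example, `combine_fastq(["ont.fastq.gz", "hifi.fastq.gz"], "combined.fastq.gz")` writes one gzip-compressed file containing both read sets.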

In my experience, Quickmerge introduces errors when applied to low-quality assemblies (see "combine hifiasm and hifiasm-ont assemblies"). However, if your assemblies are of high quality, Quickmerge is probably worth trying.