Fungal Genome Assembly: In a messed up situation
1
0
Entering edit mode
23 days ago
SomeOne ▴ 240

Hello.

Background:

  • I have to assemble some fungal genomes, which are known to have accessory chromosomes.
  • Initially we tried Nanopore 1D + Illumina 2x150 bp sequencing, hoping that I would be able to assemble complete accessory chromosomes, but the results were fragmented, especially the accessory chromosome part.
  • I tried multiple assemblers, including Flye, Canu, NextDenovo and SMARTDenovo.
  • Flye gave me somewhat better assemblies, with fewer contigs and good BUSCO scores (e.g. for one of my samples: C:99.8%[S:98.9%,D:0.9%],F:0.0%,M:0.2%,n:4494).
  • So I followed the standard procedure and performed polishing:
    • 1st: 2 rounds of Racon with ONT data
    • 2nd: 1 round of Medaka with ONT data
    • 3rd: 3 rounds of Pilon with Illumina data
  • Although I got almost complete core chromosomes, the accessory chromosomes were still fragmented.
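
When comparing several assemblers on the same sample, it can help to turn the one-line BUSCO summary quoted above into numbers. The following is a minimal sketch, assuming the summary has exactly the `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout shown in the list; the function name `parse_busco` is hypothetical and not part of BUSCO itself.

```python
import re

# Matches the one-line BUSCO summary as quoted in the post, e.g.
# C:99.8%[S:98.9%,D:0.9%],F:0.0%,M:0.2%,n:4494
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary: str) -> dict:
    """Parse a one-line BUSCO summary into a dict (hypothetical helper).

    Percentages are returned as floats, the number of searched
    BUSCO groups (n) as an int.
    """
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {summary!r}")
    scores = {key: float(value)
              for key, value in match.groupdict().items() if key != "n"}
    scores["n"] = int(match.group("n"))
    return scores


if __name__ == "__main__":
    print(parse_busco("C:99.8%[S:98.9%,D:0.9%],F:0.0%,M:0.2%,n:4494"))
```

With the scores in a dict, assemblies can be ranked by completeness (C) while keeping an eye on duplication (D), which matters for the contig-cleaning question further down.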

Now:

  • We sequenced some of the samples again with PacBio HiFi.
  • So far I have assembled the HiFi reads using the Flye, HiCanu, HiFiasm and Verkko assemblers.
  • The assemblies seem to be good (I am still performing assembly QC).

My Questions

  1. Do I need to polish HiFi assemblies too? If yes, with which tool and which data?

  2. What are the next steps in PacBio HiFi assembly? Should I just move on to repeat annotation and genome annotation?

  3. How should I clean the HiFi assemblies? I tried Funannotate-Clean with a 1000bp cutoff. Although all contigs in my assemblies are >100bp, funannotate-clean aligns the contigs to each other using minimap2 to identify duplicates, and it removed a lot of them because they had >95% percent identity and percent coverage (e.g. in one sample hifiasm generated 261 contigs and funannotate-clean removed 101 of them).

  4. Is there any way to make hybrid assemblies using all 3 data types (HiFi + ONT + Illumina)? I was looking at the Flye assembler, but in the documentation/issues on GitHub I found that it is not a good idea to mix HiFi and ONT because their error rates differ too much.

  5. Any thoughts on merging the assemblies with a tool like Quickmerge?
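
For question 3, it may be useful to see which contigs would be dropped before actually removing them. Below is a rough sketch of the general idea (flag a contig when one self-alignment covers most of it at high identity), assuming the contigs were aligned against each other with minimap2 and the output is in PAF format. This is not funannotate-clean's actual implementation; the function name and both thresholds are hypothetical, and only single alignments are considered rather than chains of several.

```python
def flag_contained_contigs(paf_lines, min_identity=95.0, min_coverage=95.0):
    """Return names of contigs that look like duplicates of another contig.

    paf_lines: iterable of PAF lines from an all-vs-all contig alignment.
    A query contig is flagged when a single alignment to a different,
    longer (or equally long) contig reaches both thresholds (in percent).
    """
    flagged = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:          # PAF has 12 mandatory columns
            continue
        qname, qlen = fields[0], int(fields[1])
        qstart, qend = int(fields[2]), int(fields[3])
        tname, tlen = fields[5], int(fields[6])
        matches, aln_len = int(fields[9]), int(fields[10])

        if qname == tname or qlen == 0 or aln_len == 0:
            continue                  # skip self hits and empty records
        if qlen > tlen:
            continue                  # only the shorter contig is a candidate
        if qlen == tlen and qname < tname:
            continue                  # tie: keep one contig of an equal pair

        identity = 100.0 * matches / aln_len
        coverage = 100.0 * (qend - qstart) / qlen
        if identity >= min_identity and coverage >= min_coverage:
            flagged.add(qname)
    return flagged


if __name__ == "__main__":
    demo = [
        "ctgB\t1000\t0\t990\t+\tctgA\t50000\t100\t1090\t980\t990\t60",
        "ctgC\t2000\t0\t500\t+\tctgA\t50000\t0\t500\t495\t500\t60",
    ]
    print(sorted(flag_contained_contigs(demo)))
```

Listing the flagged contigs first makes it possible to check whether they are short redundant pieces or larger sequences (for example repeat-rich accessory regions) that should not be discarded blindly.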

Any help in this regard is highly appreciated. Thank you.

Illumina Pacbio Nanopore assembly HiFi • 12k views

Out of curiosity: why do you consider this a "messed up situation"?

Sounds like a rather common, nearly everyday situation :)


Perhaps this is related to a prior post from OP here --> Fungal Annotation Comparison

New addition this time appears to be PacBio data.


Yes. Previously I went up to annotation with the NP+Illumina data. But now I am starting from scratch again ;(


Actually, as I now have so much data, I personally would rather not drop any of it entirely. We sequenced with PacBio the same samples that were sequenced with Nanopore. Option 1 is to just use PacBio and leave Nanopore out.

But I kind of feel that both can be used. The messed up situation is that I can't seem to find an answer to "how should I use them" :)


Sounds like you now have HiFi data (which should be the best of the lot) and it may be adequate to generate assemblies (which you seem to indicate are already good).

You can use the nanopore/illumina data later to align to the assemblies and see where it fell short (if it did). If the reads with nanopore are longer then you may be able to identify structural variants that may be missed in your assemblies (if applicable).
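
One simple way to "see where it fell short" is to look for stretches of the assembly with little or no read support after aligning the Nanopore or Illumina reads to it. The sketch below assumes a three-column, tab-separated per-position depth table (contig, 1-based position, depth), such as the one `samtools depth -a` writes; the function name and the `min_depth` threshold are hypothetical and would need tuning to the actual sequencing depth.

```python
def low_coverage_regions(depth_lines, min_depth=5):
    """Collapse per-position depths into low-coverage intervals.

    depth_lines: iterable of "contig<TAB>position<TAB>depth" lines.
    Returns a list of (contig, start, end) tuples, 1-based and inclusive,
    covering consecutive positions whose depth is below min_depth.
    """
    regions = []
    current = None                      # [contig, start, end] being extended
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, pos, depth = line.split("\t")
        pos, depth = int(pos), int(depth)
        if depth < min_depth:
            if current and current[0] == contig and pos == current[2] + 1:
                current[2] = pos        # extend the open interval
            else:
                if current:
                    regions.append(tuple(current))
                current = [contig, pos, pos]
        elif current:
            regions.append(tuple(current))
            current = None
    if current:
        regions.append(tuple(current))
    return regions


if __name__ == "__main__":
    demo = ["c1\t1\t30", "c1\t2\t2", "c1\t3\t0", "c1\t4\t25", "c2\t1\t1"]
    for region in low_coverage_regions(demo):
        print(region)
```

Intervals reported near contig ends or inside accessory-chromosome contigs would be natural places to inspect the alignments by eye.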


Hi, thank you for the suggestions.

Technically, with Nanopore I got reads 8000-9000 bp long (median read length per sample), but with HiFi we got more than double that value (15-18 kb median read length).

So the question remains: should I just drop the Nanopore data and focus only on HiFi, or is there any other way to utilize them together?
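
For comparing the two read sets, the median quoted above can be complemented with the read N50, which weights long reads more heavily. A minimal sketch, assuming plain FASTQ input with exactly four lines per record (no wrapped sequences); both helper names are hypothetical.

```python
from statistics import median


def read_lengths_from_fastq(lines):
    """Yield read lengths from FASTQ lines (assumes 4 lines per record)."""
    for index, line in enumerate(lines):
        if index % 4 == 1:              # second line of each record = sequence
            yield len(line.strip())


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0                            # empty input


if __name__ == "__main__":
    fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGTAC", "+", "IIIIII"]
    lengths = list(read_lengths_from_fastq(fastq))
    print("median:", median(lengths), "N50:", n50(lengths))
```

Running the same two functions on the Nanopore and the HiFi reads gives directly comparable numbers for deciding how much extra span the Nanopore data could contribute.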

21 days ago
shelkmike ★ 1.7k

If the Nanopore reads are from the 10.4.1 flow cell, I suggest assembling them with Hifiasm using the "--ont" option.

Usually, assemblies made from HiFi reads are left unpolished, but some tools exist (for example, https://github.com/Nextomics/NextPolish2) that can slightly improve the HiFi assembly accuracy by polishing.

If by cleaning you mean removal of haplotypic duplication, a good choice is Purge_dups (https://github.com/dfguan/purge_dups).

I would try making assemblies with Nanopore and PacBio reads together. One solution is to combine them into a single file and provide it to Flye and Hifiasm as a file containing Nanopore reads. It's quite possible that this would lead to a better assembly than if using only reads of one type.
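
Combining the two read sets into a single file, as suggested above, can be done with a plain byte-level concatenation. This is only a sketch under stated assumptions: all inputs are FASTQ files of the same kind (all uncompressed and ending in a newline, or all gzip-compressed, since concatenated gzip members form a valid gzip stream), and the function name and file names are hypothetical.

```python
import shutil


def combine_read_files(input_paths, output_path):
    """Concatenate read files byte-for-byte into one output file.

    Assumes every input is the same kind of file (all plain FASTQ or
    all gzip-compressed FASTQ); no format validation is performed.
    """
    with open(output_path, "wb") as combined:
        for path in input_paths:
            with open(path, "rb") as source:
                shutil.copyfileobj(source, combined)
    return output_path


if __name__ == "__main__":
    # Hypothetical file names; replace with the real read files.
    combine_read_files(["ont_reads.fastq", "hifi_reads.fastq"],
                       "ont_plus_hifi.fastq")
```

The combined file would then be passed to the assembler in place of the Nanopore-only read file.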

In my experience, Quickmerge introduces errors when applied to low-quality assemblies (see the post "combine hifiasm and hifiasm-ont assemblies"). However, if your assemblies are of high quality, Quickmerge is probably worth trying.
