Hello.
Background:
- I have to assemble some fungal genomes, which are known to have accessory chromosomes.
- Initially we tried Nanopore 1D + Illumina 2x150 bp sequencing, hoping that I would be able to assemble complete accessory chromosomes, but the results were fragmented, especially the accessory chromosomes.
- I tried multiple assemblers, including Flye, Canu, NextDenovo and SMARTdenovo. Flye gave me somewhat better assemblies, with fewer contigs and good BUSCO scores (e.g. for one of my samples C:99.8%[S:98.9%,D:0.9%],F:0.0%,M:0.2%,n:4494).
- So I followed the standard procedure and performed polishing:
  - 1st: Racon, 2 rounds with ONT data
  - 2nd: Medaka, 1 round with ONT data
  - 3rd: Pilon, 3 rounds with Illumina data
- Although I got almost complete core chromosomes, the accessory chromosomes were fragmented.
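For reference, the polishing order described above (Racon x2, Medaka x1, Pilon x3) could be written down roughly as follows. This is only a sketch that builds the shell commands as strings without running them. The function name `polishing_plan`, all file names, and the choice of `minimap2`, `bwa` and `samtools` for the intermediate alignments are assumptions for illustration, not necessarily what was actually run; options such as threads or the Medaka model are left out.

```python
def polishing_plan(draft, ont_reads, r1, r2, racon_rounds=2, pilon_rounds=3):
    """Return (commands, final_fasta) for Racon -> Medaka -> Pilon polishing.

    Commands are plain shell strings; nothing is executed here.
    All file names are hypothetical placeholders.
    """
    cmds = []
    current = draft

    # Racon rounds: map ONT reads to the current assembly, then polish.
    for i in range(1, racon_rounds + 1):
        paf = f"racon{i}.paf"
        out = f"racon{i}.fasta"
        cmds.append(f"minimap2 -x map-ont {current} {ont_reads} > {paf}")
        cmds.append(f"racon {ont_reads} {paf} {current} > {out}")
        current = out

    # One Medaka round on the Racon output (writes consensus.fasta in the output dir).
    cmds.append(f"medaka_consensus -i {ont_reads} -d {current} -o medaka")
    current = "medaka/consensus.fasta"

    # Pilon rounds: map paired-end Illumina reads, then polish.
    for i in range(1, pilon_rounds + 1):
        bam = f"pilon{i}.bam"
        cmds.append(f"bwa index {current}")
        cmds.append(f"bwa mem {current} {r1} {r2} | samtools sort -o {bam} -")
        cmds.append(f"samtools index {bam}")
        cmds.append(f"pilon --genome {current} --frags {bam} --output pilon{i}")
        current = f"pilon{i}.fasta"

    return cmds, current
```

Calling `polishing_plan("flye.fasta", "ont.fastq", "R1.fq", "R2.fq")` returns the command list plus the name of the last polished FASTA, so each round always takes the previous round's output as input.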
Now:
- We sequenced some of the samples again with PacBio HiFi.
- So far I have assembled the HiFi reads using the Flye, HiCanu, hifiasm and Verkko assemblers.
- The assemblies seem to be good (I am still performing assembly QC).
My Questions
Do I need to polish the HiFi assemblies too? If yes, with which tool and which data?
What are the next steps in a PacBio HiFi assembly? Should I just move on to repeat annotation and genome annotation?
How do I clean the HiFi assemblies? I tried Funannotate-Clean with a 1000bp cutoff. Although all contigs in my assemblies are >100bp, funannotate-clean aligns the contigs against each other using minimap2 to identify duplicates, and it removed a lot of them because they had >95% percent identity and percent coverage (e.g. in one sample hifiasm generated 261 contigs and funannotate-clean removed 101 of them).
Is there any way to make hybrid assemblies using all 3 data types (HiFi + ONT + Illumina)? I was looking at the Flye assembler, but in the documentation/issues on GitHub I found that it is not a good idea to mix HiFi and ONT because their error rates differ too much.
Any thoughts on merging the assemblies with a tool like Quickmerge?
Any help in this regard is highly appreciated. Thank you.
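To make the cleaning question above more concrete, here is a minimal sketch of the kind of duplicate check it describes: take an all-vs-all minimap2 alignment of the contigs in PAF format and flag any contig that a single alignment places almost entirely inside a longer contig at high identity. This is not funannotate's actual code; the function name `redundant_contigs`, the way identity and coverage are computed from the PAF columns, and the tie-break for equal-length contigs are assumptions for illustration.

```python
def redundant_contigs(paf_lines, min_identity=95.0, min_coverage=95.0):
    """Flag query contigs nearly contained in a longer target contig.

    paf_lines: iterable of PAF records (tab-separated, at least 12 columns).
    identity = matching bases / alignment block length (PAF columns 10 and 11).
    coverage = aligned query span / query length.
    """
    flagged = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated records
        qname, qlen = fields[0], int(fields[1])
        qstart, qend = int(fields[2]), int(fields[3])
        tname, tlen = fields[5], int(fields[6])
        matches, block = int(fields[9]), int(fields[10])

        # Only test the shorter contig of each pair; the name breaks ties,
        # and the same comparison also skips self-hits.
        if (qlen, qname) >= (tlen, tname) or block == 0:
            continue

        identity = 100.0 * matches / block
        coverage = 100.0 * (qend - qstart) / qlen
        if identity > min_identity and coverage > min_coverage:
            flagged.add(qname)
    return flagged
```

Listing the flagged contigs together with the contig they map to, before deleting anything, would show whether the 101 removed contigs are real duplicates or something worth keeping.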
Out of curiosity: why do you consider this a "messed up situation"?
It sounds like a rather common, nearly everyday situation :)
Perhaps this is related to a prior post from OP here --> Fungal Annotation Comparison
New addition this time appears to be PacBio data.
Yes. Previously I went up to annotation with the Nanopore + Illumina data, but now I am starting from scratch again ;(
Actually, as I now have so much data, I would personally rather not drop any of it entirely. We sequenced the same samples with PacBio that were sequenced with Nanopore. Option 1 is to just use PacBio and leave out Nanopore.
But I kind of feel that both can be used. The messed-up situation is that I can't seem to find an answer to "how should I use them" :)
Sounds like you now have HiFi data (which should be the best of the lot) and it may be adequate to generate assemblies (which you seem to indicate are already good).
You can use the nanopore/Illumina data later by aligning it to the assemblies to see where the assemblies fall short (if they do). If the nanopore reads are longer, you may be able to identify structural variants that were missed in your assemblies (if applicable).
Hi, thank you for the suggestions.
Technically, with Nanopore I got reads 8000-9000 bp long (median read length per sample), but with HiFi we got about double that (15-18 kb median read length).
So the question remains: should I just drop the Nanopore data and focus only on HiFi, or is there any other way to utilize them together?
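Since the comparison above rests on median read length alone, a small sketch like the following could be used to put the median and the read N50 of the two datasets side by side. It assumes plain or gzipped FASTQ with exactly four lines per record (HiFi reads delivered as BAM would need converting first); the function names are made up for illustration.

```python
import gzip
from statistics import median


def read_lengths(fastq_path):
    """Collect read lengths from a 4-line-per-record FASTQ (optionally .gz)."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    lengths = []
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence line of each record
                lengths.append(len(line.strip()))
    return lengths


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(lengths):
    """Small summary dict for one read set."""
    return {"reads": len(lengths), "median": median(lengths), "n50": n50(lengths)}
```

Running `summarize(read_lengths(path))` on the Nanopore and the HiFi FASTQ files would show whether the Nanopore set still contains a tail of reads longer than the HiFi reads, which the median by itself does not reveal.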