Hello,
I am currently generating genome assemblies for fungal samples from PacBio HiFi data. What I have done so far:
- Generated assemblies using the Flye assembler and ran QUAST + BUSCO to see assembly stats and completeness. Flye gave me really good assemblies, but some of the chromosomes were still split into 2 scaffolds when compared to a reference genome.
- Generated assemblies using the Hifiasm assembler and ran QUAST + BUSCO to see assembly stats and completeness. Hifiasm also gave me good assemblies, and the split scaffolds came up as single chromosomes, but these assemblies had too many extra scaffolds.
- So I ran RagTag-Scaffold, keeping the Flye assemblies as query input and the Hifiasm assemblies as reference input. This resulted in some really good assemblies, and I got down to a really good number of chromosomes. QUAST and BUSCO stats look really good.
Now I was wondering if there is any other way to evaluate the assemblies to see if there are any misassemblies, collapsed repeat regions, or anything else that should be checked in the genome assemblies.
I have a vague idea that reads are aligned back to the assembly to generate BAM files (which I have done using minimap2 -x ava-hifi), but I am not sure what to look for in these BAM files, or how to evaluate the assemblies further.
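To make the read-alignment idea concrete, here is a rough sketch of one check I believe is commonly done: scanning read depth in windows. This is an assumption on my part and not something I have run on these assemblies. It expects the three-column table (contig, position, depth) that `samtools depth -a` prints for a sorted BAM; the function names, the 10 kb window, and the 2x / 0.25x thresholds are all hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import median


def read_depth_file(path):
    """Yield (contig, position, depth) from a `samtools depth -a` style table."""
    with open(path) as handle:
        for line in handle:
            contig, pos, depth = line.rstrip("\n").split("\t")[:3]
            yield contig, int(pos), int(depth)


def window_depths(depth_rows, window=10_000):
    """Mean depth per fixed-size window, keyed by (contig, window index)."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for contig, pos, depth in depth_rows:
        key = (contig, (pos - 1) // window)  # positions are 1-based
        sums[key] += depth
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def flag_windows(means, high=2.0, low=0.25):
    """Label windows whose mean depth is far from the median window depth.

    The thresholds are illustrative assumptions, not established cutoffs.
    """
    med = median(means.values())
    flags = {}
    for key, mean in sorted(means.items()):
        if mean >= high * med:
            flags[key] = "high"  # candidate collapsed repeat
        elif mean <= low * med:
            flags[key] = "low"   # candidate misjoin or weakly supported region
    return flags
```

Usage would be something like `flag_windows(window_depths(read_depth_file("depth.tsv")))`, where `depth.tsv` is a hypothetical file name. Flagged windows are only candidates to inspect by eye (for example in a genome browser), not confirmed errors.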
Any ideas/hints in this regard will be really helpful.
Regards
Since this saga has been ongoing for a long time, it would be helpful to add a comment on how you finally got to this point of what seem to be good assemblies.
It sounds like you only used PacBio HiFi data in this final iteration. It would be helpful for others to know for sure as they decide what type of data to generate (not everyone will have the means to get Illumina/ONT/HiFi data like you seem to have used over time, if I recall right).
Hi,
I am not sure if I can call them good assemblies. Based on QUAST and BUSCO stats everything looks almost too good, but I still have doubts, which is why I wanted to know how assemblies are evaluated further by aligning raw reads to the assembly. (If you can point out some hints, that would be great.)
So, using just the HiFi data, I generated assemblies with Flye, which gave 30-50 contigs for the initial assemblies, with BUSCO completeness (against fungi_odb12) of 99.7-99.8%. Hifiasm, on the other hand, gave 100 or more contigs, but BUSCO completeness was similar at >99.5%.
I tested Quickmerge to merge the assemblies, but it did not work out for me. So I tested RagTag-scaffold, which seems to work. BUSCO stats were the same, but in QUAST the number of contigs was < 30 and N50 was ~4.4 Mb.
Now I am at the point of analyzing the assemblies further.
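For anyone unsure what the N50 figure means: it is the contig length at which contigs of that length or longer cover at least half of the total assembly. A minimal sketch of that definition follows; the function name `n50` is my own, and QUAST itself may filter short contigs before computing it, so numbers can differ slightly from a naive calculation.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Walk the contigs from longest to shortest and return the length of the
    contig at which the running total first reaches half the assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")
```

Because N50 only summarizes contiguity, it goes up after scaffolding whether or not the joins are correct, which is part of why I want an independent check.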
If you can also point out something, that will be great.