Hello,
I am currently generating genome assemblies for fungal samples from PacBio HiFi data. What I have done so far:
- Generated assemblies using the Flye assembler and ran QUAST + BUSCO to check assembly stats and completeness.
- Generated assemblies using the Hifiasm assembler and ran QUAST + BUSCO to check assembly stats and completeness.
- Flye gave me really good assemblies, but some of the chromosomes were still split into two scaffolds when compared to a reference genome.
- Hifiasm also gave me good assemblies, and the chromosomes that Flye split came out as single scaffolds, but these assemblies had too many extra scaffolds.
- So I ran RagTag scaffold, using the Flye assemblies as the query input and the Hifiasm assemblies as the reference input. This resulted in really good assemblies, and I got down to a good number of chromosomes. The QUAST and BUSCO stats look really good.
Now I was wondering if there is any other way to evaluate the assemblies, to check for misassemblies, collapsed repeat regions, or anything else that should be evaluated in a genome assembly.
I have a vague idea that reads are aligned back to the assembly to generate BAM files (which I have done using minimap2 -x ava-hifi), but I am not sure what to look for in these BAM files or how to evaluate the assemblies further.
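On the BAM question: for mapping reads back to an assembly, the usual minimap2 preset is `map-hifi` (e.g. `minimap2 -ax map-hifi assembly.fa reads.fastq.gz | samtools sort -o aln.bam -`); as far as I know, the `ava-*` presets are meant for all-vs-all read overlaps, not read-to-assembly alignment. Read depth is one common signal to look at. Assuming a roughly haploid fungal genome, windows at about twice the median depth can point to collapsed repeats, and windows with very low depth can point to misjoins, gaps, or poorly supported sequence. The Python sketch below reads the output of `samtools depth -a aln.bam` (contig, position, depth) and flags such windows; the window size and the 1.8x/0.2x thresholds are arbitrary assumptions to tune, not established cutoffs.

```python
"""Sketch: flag unusually high/low coverage windows from `samtools depth -a` output."""
import statistics
import sys
from collections import defaultdict


def window_depths(depth_lines, window=10_000):
    """Mean depth per (contig, window_start), window_start being 0-based."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for line in depth_lines:
        if not line.strip():
            continue  # ignore blank lines
        contig, pos, depth = line.split("\t")[:3]
        key = (contig, (int(pos) - 1) // window * window)  # pos is 1-based
        sums[key] += int(depth)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def flag_windows(means, high=1.8, low=0.2):
    """Compare each window to the median window depth (thresholds are assumptions)."""
    median = statistics.median(means.values())
    flagged = []
    for (contig, start), mean in sorted(means.items()):
        if mean >= high * median:
            flagged.append((contig, start, mean, "high: possible collapsed repeat"))
        elif mean <= low * median:
            flagged.append((contig, start, mean, "low: possible misjoin or gap"))
    return median, flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        median, flagged = flag_windows(window_depths(fh))
    print(f"median window depth: {median:.1f}")
    for contig, start, mean, note in flagged:
        print(f"{contig}\t{start}\t{mean:.1f}\t{note}")
```

Flagged windows are only candidates: they are worth opening in a genome browser to see whether reads are clipped or split at the same position, which would support a misassembly rather than, say, a genuinely high-copy repeat such as rDNA.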
Any ideas/hints in this regard would be really helpful.
Regards
Since this saga has been ongoing for a long time, it would be helpful to add a comment on how you finally got to this point of what seem to be good assemblies.
It sounds like you only used PacBio HiFi data in this final iteration, but a summary would be helpful for others as they decide what type of data to generate (not everyone will have the means to get Illumina/ONT/HiFi data like you seem to have used over time, if I recall correctly).