Question

Genome Assembly QC from BAM files

8 hours ago

SomeOne ▴ 240

Hello,

I am currently generating genome assemblies for fungal samples from PacBio HiFi data. What I have done so far:

1. Generated assemblies using the Flye assembler and ran QUAST + BUSCO to check assembly stats and completeness.
   - Flye gave me really good assemblies, but some of the chromosomes were still split into 2 scaffolds when compared to a reference genome.
2. Generated assemblies using the Hifiasm assembler and ran QUAST + BUSCO to check assembly stats and completeness.
   - Hifiasm also gave me good assemblies, and the split scaffolds came out as single chromosomes, but these assemblies had too many extra scaffolds.
3. Ran RagTag-Scaffold with the Flye assemblies as query input and the Hifiasm assemblies as reference input.
   - This resulted in some really good assemblies, and I got down to a really good number of chromosomes. QUAST and BUSCO stats look really good.

Now I was wondering whether there is any other way to evaluate the assemblies, to see if there are any misassemblies, collapsed repeat regions, or anything else that should be checked in the genome assemblies.

I have a vague idea that reads are aligned back to the assembly to generate BAM files (which I have done using minimap2 -x ava-hifi), but I am not sure what to look for in these BAM files, or how to evaluate the assemblies further.

Any ideas/hints in this regard would be really helpful.

Regards

QC assembly BAM HiFi • 283 views

3 hours ago by SomeOne ▴ 240

Since this saga has been ongoing for a long time, it would be helpful to add a comment on how you finally got to this point of what seem to be good assemblies.

It sounds like you only used PacBio HiFi data in this final iteration. It would be helpful for others to know for sure as they decide what type of data to generate (not everyone will have the means to get Illumina/ONT/HiFi data like you seem to have used over time, if I recall right).

3 hours ago by GenoMax 153k

Hi,

I am not sure I can call them good assemblies. Based on the QUAST and BUSCO stats everything looks almost too good, but I still have doubts, which is why I wanted to know how assemblies are evaluated further by aligning the raw reads back to the assembly. (If you can point out some hints as well, that would be great.)

Our initial attempt used ONT + Illumina sequencing for some samples. This gave us good core chromosomes, but the accessory chromosomes were too fragmented, so we decided to go for PacBio HiFi now that it was cheaper.
For my own curiosity I wanted to try assemblies based on at least HiFi + ONT data, but the results were not so good, as the ONT read N50 and read lengths were much lower than those of the HiFi data (HiFi ~15 kb, ONT ~8-9 kb).

So, using just the HiFi data, I generated assemblies with Flye, which gave 30-50 contigs for the initial assemblies, with BUSCO completeness (against fungi_odb12) of 99.7-99.8%. Hifiasm, on the other hand, gave 100 or more contigs, but BUSCO completeness was similar at >99.5%.

I tested Quickmerge to merge the assemblies, but it did not work out for me, so I tested RagTag-Scaffold, which seems to work. BUSCO stats were the same, but in QUAST the number of contigs was < 30 and the N50 ~4.4 Mb.

And now I am at the point of analyzing further.

If you can point out something as well, that would be great.

3 hours ago by SomeOne ▴ 240

score 2 · Accepted Answer · 2025-09-23

8 hours ago

colindaven 7.9k

You can have a look at the tools in the PAQman pipeline (https://github.com/SAMtoBAM/PAQman), and maybe also Inspector (https://github.com/Maggi-Chen/Inspector), to evaluate assembly quality.
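
As for what to look for in the BAM files themselves, one common first check is read depth along each contig. This is only a rough sketch, not what PAQman or Inspector do internally. It assumes the HiFi reads were mapped to the final assembly (minimap2's read-to-assembly preset for HiFi is `map-hifi`), that the BAM is coordinate-sorted, and that per-base depth was exported with `samtools depth -a`, which prints contig, position and depth as tab-separated columns. The function names, the 10 kb window and the 0.5x / 1.75x cut-offs are arbitrary choices for illustration and would need tuning per genome.

```python
from collections import defaultdict
from statistics import median


def window_depths(depth_lines, window=10_000):
    """Average per-base depth into fixed-size windows.

    depth_lines: iterable of 'contig<TAB>pos<TAB>depth' strings (1-based pos),
    i.e. the three-column layout printed by `samtools depth -a`.
    Returns {(contig, window_start): mean_depth}, window_start is 0-based.
    """
    sums = defaultdict(lambda: [0, 0])  # (contig, start) -> [depth total, bases]
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, pos, depth = line.split("\t")[:3]
        start = (int(pos) - 1) // window * window
        acc = sums[(contig, start)]
        acc[0] += int(depth)
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def flag_windows(windows, low=0.5, high=1.75):
    """Label windows whose mean depth deviates from the genome-wide median.

    low / high are multiples of the median window depth (assumed thresholds).
    Returns a sorted list of (contig, window_start, mean_depth, label).
    """
    if not windows:
        return []
    med = median(windows.values())
    flagged = []
    for (contig, start), depth in sorted(windows.items()):
        if depth > high * med:
            flagged.append((contig, start, round(depth, 1), "high"))
        elif depth < low * med:
            flagged.append((contig, start, round(depth, 1), "low"))
    return flagged
```

To use it, pass any iterable of depth lines, for example `flag_windows(window_depths(sys.stdin))` with the `samtools depth -a` output piped in. Windows flagged "high" (roughly double the median or more) are candidates for collapsed repeats, while "low" windows, especially near the joins RagTag introduced, are worth inspecting in a genome browser for a possible misjoin or a region not supported by reads. Clusters of clipped reads at the same position are another signal worth checking alongside depth.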