Question

Scaffolding hifiasm assembly with CLR?

7 months ago

Phlupp ▴ 20

Hi all!

For a plant genome assembly project, I have 55x CLR, 28x HiFi and 14x Illumina reads.

Which workflow would get the most out of this data?

Hifiasm alone produced 417 contigs (L90: 58, N50: 20 Mb) with a BUSCO score of 96.8%.

The best scaffolding result so far is 307 scaffolds (L90: 46, N50: 30 Mb, gaps: 0.015%) with a BUSCO score of 96.7%. To get there, I corrected and trimmed the CLR reads with Canu, polished them with the HiFi reads using Racon, and then used them to scaffold the hifiasm assembly with LRScaf. Finally, I polished the scaffolded assembly with Racon and Pilon.
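For anyone comparing the numbers above, here is a minimal Python sketch of how N50 and L90 can be computed from a list of contig or scaffold lengths. It assumes the usual definitions (Nx is the length of the sequence at which the cumulative sum of the sorted lengths first reaches x% of the total; Lx is how many sequences that takes). The function name `nx_lx` is my own and does not come from any assembly tool.

```python
def nx_lx(lengths, percent):
    """Return (Nx, Lx) for a list of sequence lengths and an integer percent (e.g. 50 or 90)."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")

    ordered = sorted(lengths, reverse=True)  # longest sequence first
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count


if __name__ == "__main__":
    toy_lengths = [40, 30, 20, 10]  # toy values, not real assembly data
    print(nx_lx(toy_lengths, 50))   # N50 and L50
    print(nx_lx(toy_lengths, 90))   # N90 and L90
```

A lower L90 together with a higher N50, as in the scaffolded result, means that fewer and longer sequences cover the same share of the assembly.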

Is this already the best approach? Or is it possible that I introduced errors into the hifiasm assembly by adding the CLR information? I know that Hi-C would improve the hifiasm assembly much more, but my task is to test everything that is possible with the provided data.

I would be very thankful for your advice and thoughts on this!

hifiasm scaffolding CLR HiFi • 1.6k views

ADD COMMENT • link updated 7 months ago by lieven.sterck 15k • written 7 months ago by Phlupp ▴ 20

GenoMax · Answer 1 · 2025-02-06

7 months ago

lieven.sterck 15k

Sounds like a decent and acceptable approach to me, and the stats of the result look fine. I assume you have a few very large, Mb-sized scaffolds and a lot of very small ones? That is not uncommon.

You could consider switching from hifiasm to Flye for the assembly step, but you should not expect miracle improvements from doing so.

What is the expected genome size of your species, by the way?

ADD COMMENT • link 7 months ago by lieven.sterck 15k

Thank you for your reply lieven.sterck!

It is nice to know that I am on the right track.

Yes, the scaffold length histogram shows that I have around 30 large scaffolds (10-60 Mb) and the rest are small (<10 Mb).

The estimated genome size is 1.2 Gb, with 15 chromosomes.

Okay, I am running a CLR-only Flye assembly right now to test it. Or do you mean a Flye assembly with the HiFi reads?

ADD REPLY • link 7 months ago by Phlupp ▴ 20

I think Flye accepts all those inputs (or can be tricked into accepting them all ;) ), so I would personally start off by running an assembly with all available data.

Flye has quite decent documentation and 'protocols', so go and have a look at it.

ADD REPLY • link 7 months ago by lieven.sterck 15k

Ah interesting, thanks for this tip lieven.sterck!

After reading through some closed issues about hybrid assemblies, the FAQ and the docs, I am now running this hybrid approach:

flye --pacbio-raw $HiFireads $CLRreads $ILreads1 $ILreads2 \
  --iterations 0
...

I will then resume Flye polishing with the HiFi reads only, as recommended in the FAQ (under hybrid assembly with HiFi and ONT).
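In case it helps others, here is a rough Python sketch of the two-stage run described above. It only builds and prints the two command lines and does not execute anything. The helper name `flye_two_stage`, the file names, the output directory and the thread count are all hypothetical placeholders. Using `--pacbio-hifi` as the read-type flag in the polishing stage is my assumption, so please check the Flye FAQ for the exact recommendation.

```python
import shlex


def flye_two_stage(all_reads, polish_reads, out_dir, threads=16):
    """Build the two Flye command lines: assembly without polishing, then resumed polishing."""
    # Stage 1: assemble with all long reads; --iterations 0 skips the polishing rounds.
    assemble = [
        "flye", "--pacbio-raw", *all_reads,
        "--iterations", "0",
        "--out-dir", out_dir,
        "--threads", str(threads),
    ]
    # Stage 2: resume in the same output directory at the polishing stage,
    # this time passing only the reads that should be used for polishing.
    # ASSUMPTION: --pacbio-hifi as the read-type flag for this stage.
    polish = [
        "flye", "--pacbio-hifi", *polish_reads,
        "--resume-from", "polishing",
        "--out-dir", out_dir,
        "--threads", str(threads),
    ]
    return assemble, polish


if __name__ == "__main__":
    # Placeholder file names, not my real paths.
    stage1, stage2 = flye_two_stage(
        all_reads=["hifi.fastq.gz", "clr.fastq.gz"],
        polish_reads=["hifi.fastq.gz"],
        out_dir="flye_hybrid",
    )
    print(shlex.join(stage1))
    print(shlex.join(stage2))
```

Both stages have to point at the same output directory, because the second run picks up the intermediate files written by the first.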

ADD REPLY • link updated 7 months ago by GenoMax 153k • written 7 months ago by Phlupp ▴ 20