Question

Help interpreting a haploid assembled genome analyzed with BUSCO

4.5 years ago

Antonio R. Franco ★ 5.2k

We have assembled PacBio HiFi reads with a haplotype-aware assembler (hifiasm). It is a tree genome.

We ran BUSCO on the more than 7,900 contigs we obtained (expected genome size: 1.8 Gb). The results show a very high number of duplicated sequences:

[Image: BUSCO analysis]

Since the assembly was produced with a haplotype-aware assembler, we are wondering whether this high number of duplicated sequences is a consequence of the two haploid genomes both being present in the analysis, or a true indication that our genome is full of duplicated and/or repetitive sequences.

Any hint or clue?

BUSCO assembling • 1.4k views

updated 4.5 years ago by lieven.sterck 15k • written 4.5 years ago by Antonio R. Franco ★ 5.2k

score 2 · Answer 1 · 2021-01-12

From this picture alone it will be hard to tell, I'm afraid.

A few things anyway: if both haplotypes were present in your assembly, the total assembly size should be nearly double what you would expect. I assume the genome size you mentioned is the haploid size (i.e. the 1C value)?
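As a quick way to run this size check, here is a minimal Python sketch that sums the contig lengths in a FASTA file and divides by the expected haploid size. The function names `assembly_length` and `size_ratio` are made up for illustration, and the 1.8 Gb figure is taken from the question on the assumption that it is the 1C value.

```python
from typing import Iterable


def assembly_length(fasta_lines: Iterable[str]) -> int:
    """Sum the sequence lengths in a FASTA file, skipping header lines."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def size_ratio(assembly_bp: int, expected_haploid_bp: float) -> float:
    """Ratio of assembled bases to the expected haploid (1C) genome size."""
    return assembly_bp / expected_haploid_bp


# Tiny in-memory example; for a real assembly pass an open file handle
# instead, e.g. `with open("assembly.fa") as fh: assembly_length(fh)`
# ("assembly.fa" is a placeholder path).
toy_fasta = [">contig1", "ACGTAC", "GT", ">contig2", "GGCC"]
toy_total = assembly_length(toy_fasta)

# Assumption: 1.8 Gb is the haploid (1C) size mentioned in the question.
EXPECTED_HAPLOID_BP = 1.8e9
```

A ratio near 1 would suggest a mostly collapsed (haploid) assembly, while a ratio approaching 2 would be consistent with both haplotypes being represented. Intermediate values are harder to interpret and would need a closer look.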

It could very well also be that this particular genome is duplicated (recently, or with the duplicates well retained), which would give a similar BUSCO result. Is there any other indication of duplication (e.g. a Ks value analysis or similar)? If you made this plot for the poplar genome, I believe it would look similar to what you present here. If you are working with haplotypes there is, of course, a thin line between allelic regions and duplicated regions.
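One rough way to look at where the duplicated BUSCOs sit is to read BUSCO's `full_table.tsv` and check whether the copies of each duplicated BUSCO lie on the same contig or on different contigs. The sketch below assumes the usual tab-separated layout (BUSCO id, status, sequence name, then coordinates) with `#` comment lines; column details may differ between BUSCO versions, so check your own file first. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def duplicated_busco_contigs(table_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each BUSCO id with status 'Duplicated' to the contigs holding its copies."""
    contigs: Dict[str, Set[str]] = defaultdict(set)
    for line in table_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        # Assumed columns: 0 = BUSCO id, 1 = status, 2 = sequence (contig) name
        if len(fields) >= 3 and fields[1] == "Duplicated":
            contigs[fields[0]].add(fields[2])
    return dict(contigs)


def split_by_location(dups: Dict[str, Set[str]]) -> Tuple[List[str], List[str]]:
    """Return (ids with all copies on one contig, ids with copies on several contigs)."""
    same = sorted(b for b, c in dups.items() if len(c) == 1)
    different = sorted(b for b, c in dups.items() if len(c) > 1)
    return same, different
```

This does not settle the question by itself: copies on different contigs are compatible with both retained haplotigs and genuine duplications, which is exactly the thin line mentioned above. It only tells you where to look more closely, for example by comparing the copies or by a Ks analysis.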