Help interpreting a haploid assembled genome analyzed with BUSCO
9 months ago

We have assembled PacBio HiFi reads with a haplotype-aware assembler (hifiasm). It is a tree genome.

The assembly has over 7,900 contigs (expected genome size: 1.8 Gb). Running BUSCO on it gave the results in the figure, which show a very high number of duplicated sequences.

[Figure: BUSCO analysis]

Since the assembly was produced with a haplotype-aware assembler, we are wondering whether this high number of duplicated sequences results from both haplotypes being present in the analyzed assembly, or whether it truly indicates that our genome is full of duplicated and/or repetitive sequences.

Any hint or clue?

9 months ago

From this picture alone it will be hard to tell, I'm afraid.

A few things anyway: if both haplotypes were present in your assembly, the total assembly size should be nearly double what you would expect. I assume the genome size you mentioned is the haploid size (the 1C value)?
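A quick way to check the size argument is to sum the contig lengths and divide by the expected haploid size. The sketch below is only an illustration: the function names and the toy FASTA are made up, and on real data you would open your own assembly file and use your own 1C estimate (1.8 Gb in the question).

```python
import io


def assembly_size(fasta_handle):
    """Sum the lengths of all sequences in a FASTA stream."""
    total = 0
    for line in fasta_handle:
        line = line.strip()
        # Skip empty lines and header lines; everything else is sequence.
        if line and not line.startswith(">"):
            total += len(line)
    return total


def size_ratio(total_bp, expected_haploid_bp):
    """Ratio of assembly size to the expected haploid (1C) genome size."""
    return total_bp / expected_haploid_bp


# Toy example (hypothetical data): two contigs, 16 bp in total,
# compared against a made-up expected haploid size of 8 bp.
toy_fasta = io.StringIO(">ctg1\nACGTACGT\nACGT\n>ctg2\nACGT\n")
total = assembly_size(toy_fasta)
ratio = size_ratio(total, 8)
print(total, ratio)
```

A ratio close to 1 would suggest a single haplotype (or a collapsed assembly), while a ratio approaching 2 would suggest that both haplotypes are in the file. Values in between are harder to interpret and may reflect partially retained haplotigs.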

It could also very well be that this particular genome is duplicated (recently, or with the duplicates well retained), which would give a similar BUSCO result. Is there any other indication of duplication (e.g. a Ks analysis or similar)? If you made this plot for the poplar genome, I believe it would look similar to what you present here. If you are working with haplotypes there is, of course, a thin line between allelic regions and duplicated regions.
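One rough extra indication can be taken from the BUSCO output itself: for each duplicated BUSCO, check whether its copies sit on the same contig or on different contigs. The sketch below assumes the usual tab-separated `full_table.tsv` layout (BUSCO id, status, sequence name, then coordinates), which may differ between BUSCO versions, so check your own file first. The function names and the toy table are hypothetical. This does not separate allelic copies from true duplicates; it only shows how the duplicates are distributed.

```python
import csv
import io
from collections import defaultdict


def duplicated_busco_contigs(table_handle):
    """Map each duplicated BUSCO id to the set of contigs holding a copy.

    Assumes columns: BUSCO id, status, sequence name, ... (tab-separated),
    with comment/header lines starting with '#'.
    """
    contigs = defaultdict(set)
    for row in csv.reader(table_handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if len(row) >= 3 and row[1] == "Duplicated":
            contigs[row[0]].add(row[2])
    return contigs


def summarize(contigs):
    """Count duplicated BUSCOs whose copies share one contig vs. several."""
    same = sum(1 for c in contigs.values() if len(c) == 1)
    different = sum(1 for c in contigs.values() if len(c) > 1)
    return {"same_contig": same, "different_contigs": different}


# Toy table (hypothetical data, not real BUSCO output).
toy_table = io.StringIO(
    "# Busco id\tStatus\tSequence\tGene Start\tGene End\n"
    "1at\tDuplicated\tctg1\t10\t200\n"
    "1at\tDuplicated\tctg2\t50\t240\n"
    "2at\tDuplicated\tctg3\t10\t200\n"
    "2at\tDuplicated\tctg3\t900\t1100\n"
    "3at\tComplete\tctg1\t500\t700\n"
    "4at\tMissing\n"
)
summary = summarize(duplicated_busco_contigs(toy_table))
print(summary)
```

If most duplicated BUSCOs have their copies on different contigs, retained haplotigs are one possible explanation, although an ancient whole-genome duplication would look similar. Many copies on the same contig point more towards real (for example tandem) duplication.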


