Help interpreting a haploid genome assembly analyzed with BUSCO
3.3 years ago

We have assembled PacBio HiFi reads with a haplotype-aware assembler (hifiasm). It is a tree genome.

The assembly has over 7,900 contigs (expected genome size: 1.8 Gb). Running BUSCO on these contigs gave the results in the attached plot (BUSCO analysis), which shows a very high number of duplicated sequences.

Since the assembly was produced with a haplotype-aware assembler, we are wondering whether this high number of duplicated sequences is a consequence of both haploid genomes being present in the analysis, or a true indication that our genome is full of duplicated and/or repetitive sequences.

Any hints or clues?

BUSCO assembling • 1.1k views
3.3 years ago

From this picture alone it will be hard to tell, I'm afraid.

A few things anyway: if both haplotypes were present in your assembly, the total assembly size should be nearly double what you would expect. I assume the genome size you mentioned is the haploid size (i.e. the 1C value)?
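A quick way to check the size argument is to sum the contig lengths and divide by the expected haploid size. The snippet below is only a minimal sketch: the function names and the file name `assembly.fa` in the usage note are made up for illustration, and the 1.8 Gb figure is taken from the question on the assumption that it is the 1C value. A ratio near 1 would suggest a mostly collapsed or single-haplotype assembly; a ratio approaching 2 would suggest that both haplotypes are largely present.

```python
import io


def assembly_size(fasta_handle):
    """Sum the sequence lengths in a FASTA stream.

    Header lines (starting with '>') are skipped; surrounding whitespace
    on sequence lines is not counted.
    """
    total = 0
    for line in fasta_handle:
        if not line.startswith(">"):
            total += len(line.strip())
    return total


def size_ratio(total_bp, expected_haploid_bp):
    """Ratio of assembled bases to the expected haploid (1C) genome size."""
    return total_bp / expected_haploid_bp


# Expected haploid size from the question (1.8 Gb); assumed to be the 1C value.
EXPECTED_HAPLOID_BP = 1_800_000_000

# Toy in-memory FASTA so the sketch runs without any input file.
toy_fasta = io.StringIO(">ctg1\nACGTACGT\nACGT\n>ctg2\nACGTAC\n")
toy_total = assembly_size(toy_fasta)
print("toy assembly size (bp):", toy_total)
```

On a real assembly you would open the contig FASTA instead of the toy string, e.g. `with open("assembly.fa") as fh: total = assembly_size(fh)`, and then look at `size_ratio(total, EXPECTED_HAPLOID_BP)`.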

It could also very well be that this particular genome is duplicated (recently, or with the duplicates well retained), which would give a similar BUSCO result. Is there any other indication of duplication (e.g. a Ks analysis or similar)? If you made this plot for the poplar genome, I believe it would look similar to what you present here. If you are working with haplotypes there is, of course, a thin line between allelic regions and duplicated regions.
