How important is it to reduce uncollapsed heterozygosity in a genome assembly before proceeding to gene annotation?
By uncollapsed heterozygosity I mean the existence of alternative contigs (haplotigs) for the same region of the genome, in an organism that carries multiple copies of each chromosome (diploid, triploid, tetraploid, etc.).
I have heard that uncollapsed heterozygosity is harmful for scaffolding, but I don't know whether it also affects gene annotation.
I use the duplication rate in the BUSCO results as a proxy for uncollapsed heterozygosity. But there is a tradeoff: removing haplotigs reduces duplication, yet it also risks removing genes, which then show up as missing BUSCOs.
BUSCO results for the assembly:

Stage                              Complete     Single-copy  Duplicated  Fragmented  Missing
Before eliminating some haplotigs  2070 (98 %)  1646 (78 %)  424 (20 %)  28 (1 %)    23 (1 %)
After eliminating some haplotigs   2065 (97 %)  1710 (81 %)  355 (17 %)  25 (1 %)    31 (1 %)
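
To make the tradeoff concrete, here is a minimal sketch in plain Python that compares the two BUSCO runs using the counts from the table above. It does not call BUSCO or any library; the dictionary keys and the function name `tradeoff` are my own, hypothetical choices, and the total number of BUSCOs is simply taken as the sum of the four categories.

```python
# BUSCO counts copied from the results above.
# Keys are illustrative names, not BUSCO output fields.
before = {"single": 1646, "duplicated": 424, "fragmented": 28, "missing": 23}
after = {"single": 1710, "duplicated": 355, "fragmented": 25, "missing": 31}


def tradeoff(before, after):
    """Summarize what haplotig removal gained and cost, in BUSCO counts."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    # Both runs must be scored against the same BUSCO lineage set.
    if total_before != total_after:
        raise ValueError("runs use different numbers of BUSCOs")

    complete_before = before["single"] + before["duplicated"]
    complete_after = after["single"] + after["duplicated"]
    return {
        "total": total_before,
        # Duplicated BUSCOs that are no longer duplicated (the gain).
        "duplicates_removed": before["duplicated"] - after["duplicated"],
        # Complete BUSCOs that are no longer complete (the cost).
        "complete_lost": complete_before - complete_after,
        # Additional BUSCOs reported as missing.
        "missing_gained": after["missing"] - before["missing"],
    }


print(tradeoff(before, after))
# {'total': 2121, 'duplicates_removed': 69, 'complete_lost': 5, 'missing_gained': 8}
```

So with these numbers, eliminating haplotigs resolved 69 duplicated BUSCOs at the cost of 5 complete ones (8 more missing, 3 fewer fragmented).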