Hi,
I have a diploid, outcrossing plant genome assembly of ~1.1 Gb. The original assembly was generated from PacBio reads. After genome annotation with the Mercator software, the original assembly returned a gene count of 59,615.
The diploid genome assembly was then phased with Hifiasm using Hi-C data. After running the annotation on the two haplotype assemblies, the gene count was 75,936 for haplotype 1 and 67,233 for haplotype 2.
Then, we sent the PacBio assembly together with the haplotypes to Dovetail Genomics for scaffolding. After scaffolding with HiRise and annotation with Mercator, I got the following gene counts for the three assemblies:
Scaffolded_assembly: 75,874
Scaffolded_haplotype_1: 67,243
Scaffolded_haplotype_2: 86,465
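To put the changes in numbers, here is a minimal Python sketch that compares the counts before and after scaffolding. It uses only the gene counts quoted in this post; the dictionary labels and the function name `count_changes` are my own, and I am assuming the haplotype labels refer to the same haplotypes before and after scaffolding.

```python
# Gene counts quoted in this post; the dictionary keys are my own labels.
# Assumption: "haplotype_1"/"haplotype_2" refer to the same haplotypes
# before and after scaffolding.
before = {"assembly": 59_615, "haplotype_1": 75_936, "haplotype_2": 67_233}
after = {"assembly": 75_874, "haplotype_1": 67_243, "haplotype_2": 86_465}


def count_changes(before, after):
    """Return {label: (absolute change, percent change)} for each assembly."""
    changes = {}
    for label, old in before.items():
        delta = after[label] - old
        changes[label] = (delta, 100.0 * delta / old)
    return changes


changes = count_changes(before, after)
for label, (delta, pct) in changes.items():
    print(f"{label}: {before[label]:,} -> {after[label]:,} ({delta:+,}, {pct:+.1f}%)")
```

So the primary assembly and haplotype 2 both gain genes after scaffolding, while haplotype 1 loses genes.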
Can anyone please shed some light on why there is such a large increase in gene count? Also, the gene counts for the haplotypes seem to be reversed after scaffolding: haplotype 1 had the higher count before scaffolding, but haplotype 2 has the higher count afterwards. Do you have any ideas? I do not know how many of these are high-confidence genes, low-confidence genes, or splice variants.
It would be nice if you could give some feedback or point me to any paper explaining such phenomena. Thanks.