Question

How to understand improvement in genome annotation?

2.2 years ago

anikcropscience ▴ 230

Hi, I have a genome assembly of a diploid, outcrossing plant, about 1.1 Gb in size. The original assembly was generated from PacBio reads. Annotating it with the Mercator software returned a gene count of 59,615. The diploid assembly was then phased with Hifiasm using Hi-C data. Annotating the two haplotype assemblies gave gene counts of 75,936 for haplotype 1 and 67,233 for haplotype 2.

Then, we sent the PacBio assembly together with the haplotypes to Dovetail Genomics for scaffolding. After scaffolding with HiRise and annotation with Mercator, I got the following gene counts for the three assemblies:

Scaffolded_assembly: 75,874
Scaffolded_haplotype_1: 67,243
Scaffolded_haplotype_2: 86,465
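To make the comparison easier to read, here is a minimal Python sketch that tabulates the change in gene count for each assembly before and after scaffolding. The numbers are the ones reported in this post. The dictionary keys and the `changes` helper are names made up for illustration.

```python
# Gene counts as reported in this post (before and after HiRise scaffolding).
before = {"assembly": 59615, "haplotype_1": 75936, "haplotype_2": 67233}
after = {"assembly": 75874, "haplotype_1": 67243, "haplotype_2": 86465}


def changes(before, after):
    """Return {name: (absolute change, percent change)} for each assembly."""
    result = {}
    for name, old in before.items():
        delta = after[name] - old
        result[name] = (delta, round(100 * delta / old, 1))
    return result


for name, (delta, pct) in changes(before, after).items():
    print(f"{name}: {delta:+,d} genes ({pct:+.1f}%)")
```

One thing the numbers alone show: the scaffolded assembly count (75,874) is within 62 genes of the pre-scaffolding haplotype 1 count (75,936), and the scaffolded haplotype 1 count (67,243) is within 10 genes of the pre-scaffolding haplotype 2 count (67,233). That may be a coincidence, but it could be worth checking that the assembly labels were not shifted somewhere along the way.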

Can anyone please shed some light on why the gene count increased so much? Also, the haplotype gene counts seem to have swapped after scaffolding: haplotype 1 dropped from 75,936 to 67,243, while haplotype 2 rose from 67,233 to 86,465. Do you have any ideas? I do not know how many of these are high-confidence genes, low-confidence genes, or splice variants.
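On the splice-variant question, one rough check is to count gene features and transcript features separately. The sketch below assumes the gene models are available as a GFF3 file with `gene` and `mRNA` feature types in column 3, which this post does not state. The `count_features` helper and the tiny example records are made up for illustration and are not real data from these assemblies.

```python
from collections import Counter


def count_features(gff3_lines):
    """Count GFF3 feature types (column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Made-up example: one gene with two transcripts, one gene with one transcript.
example = """##gff-version 3
chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=g1
chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1
chr1\tdemo\tmRNA\t100\t800\t.\t+\t.\tID=g1.t2;Parent=g1
chr1\tdemo\tgene\t2000\t2500\t.\t-\t.\tID=g2
chr1\tdemo\tmRNA\t2000\t2500\t.\t-\t.\tID=g2.t1;Parent=g2
""".splitlines()

counts = count_features(example)
print(counts["gene"], "genes;", counts["mRNA"], "transcripts")
```

For a real file, the same function can be given an open file handle, for example `count_features(open("annotation.gff3"))`, where the file name is hypothetical. If the reported counts are closer to the transcript total than to the gene total, splice variants may be inflating them.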

It would be great if you could give some feedback or point me to a paper explaining this phenomenon. Thanks.

Hi-C genome pacbio diploid annotation