Is genome heterozygosity a problem for gene annotation?
4.3 years ago

Quick question:

How important is it to reduce uncollapsed heterozygosity in a genome assembly before proceeding to gene annotation?

By uncollapsed heterozygosity I mean the existence of alternative contigs (haplotigs) for the same region of the genome, in an organism that carries multiple copies of each chromosome (diploid, triploid, tetraploid, etc.).

I have heard that uncollapsed heterozygosity is harmful for scaffolding attempts, but I don't know about gene annotation.

I use duplication in BUSCO results as a proxy for uncollapsed heterozygosity, but there is a tradeoff between reducing duplication and avoiding missing genes.

BUSCO results for the assembly:

Stage                              Complete     Single-copy  Duplicated  Fragmented  Missing
Before eliminating some haplotigs  2070 (98 %)  1646 (78 %)  424 (20 %)  28 (1 %)    23 (1 %)
After eliminating some haplotigs   2065 (97 %)  1710 (81 %)  355 (17 %)  25 (1 %)    31 (1 %)
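To put a number on the tradeoff, here is a minimal sketch that recomputes the percentages from the counts above and compares the two runs. The counts are copied from the BUSCO results in this post; the function names (`summarize`, `tradeoff`) are made up for illustration and are not part of BUSCO.

```python
# Counts copied from the BUSCO results above.
# "Complete" is not stored because it equals single-copy + duplicated.
before = {"single": 1646, "duplicated": 424, "fragmented": 28, "missing": 23}
after = {"single": 1710, "duplicated": 355, "fragmented": 25, "missing": 31}


def summarize(counts):
    """Return the total number of BUSCOs searched and the key percentages."""
    total = sum(counts.values())
    complete = counts["single"] + counts["duplicated"]
    return {
        "total": total,
        "complete": complete,
        "complete_pct": 100 * complete / total,
        "duplicated_pct": 100 * counts["duplicated"] / total,
        "missing_pct": 100 * counts["missing"] / total,
    }


def tradeoff(before, after):
    """Compare two runs: duplicates resolved versus genes lost."""
    return {
        # Duplicated BUSCOs that are no longer duplicated after purging.
        "duplicates_removed": before["duplicated"] - after["duplicated"],
        # BUSCOs that became missing after purging.
        "newly_missing": after["missing"] - before["missing"],
        # Net change in complete BUSCOs (negative means genes were lost).
        "complete_change": summarize(after)["complete"] - summarize(before)["complete"],
    }


if __name__ == "__main__":
    for label, counts in (("before", before), ("after", after)):
        s = summarize(counts)
        print(f"{label}: D={s['duplicated_pct']:.1f}% M={s['missing_pct']:.1f}% "
              f"C={s['complete_pct']:.1f}% (n={s['total']})")
    print(tradeoff(before, after))
```

With these numbers, purging resolved 69 duplicated BUSCOs at the cost of 8 additional missing ones, so the tradeoff is real but so far fairly mild.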

Cheers, Ricardo

heterozygosity diploid genome gene annotation • 910 views
4.3 years ago

Purely for the technical aspect of gene prediction: no, I would say. You might encounter some issues with RNA-seq data (if you're using it) showing a higher multi-mapping rate than it should, but other than that I don't really see any issue.

Interpretation of the results will be a different matter, though. For example, the final number of predicted genes will of course not accurately reflect the number truly present in the genome.

The proxy you're using is perhaps also not the best one: if in your species the genome (or some regions of it) is genuinely duplicated, you will overestimate the heterozygosity.
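If you want to check the multi-mapping concern mentioned above, one rough way is to compare the multi-mapping rate of the same RNA-seq reads aligned to the assembly before and after haplotig removal. The sketch below assumes your aligner writes the standard SAM `NH:i:` tag (number of reported alignments for the read); records without the tag are treated as uniquely mapped, which is an assumption. The function name `multimap_rate` and the toy records are made up for illustration.

```python
def multimap_rate(sam_lines):
    """Fraction of primary mapped records whose NH tag is greater than 1.

    Assumes the aligner writes the NH:i: tag; records lacking it are
    counted as uniquely mapped.
    """
    mapped = 0
    multi = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800)
        # records so each mapped read end is counted once.
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        nh = 1
        # Optional tags start at the 12th column.
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        mapped += 1
        if nh > 1:
            multi += 1
    return multi / mapped if mapped else 0.0


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    toy = [
        "@HD\tVN:1.6",
        "r1\t0\tctg1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
        "r2\t0\tctg1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
        "r2\t256\tctg2\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(multimap_rate(toy))
```

A clearly lower rate after purging would suggest the haplotigs were indeed inflating multi-mapping; a similar rate would suggest they matter little for your RNA-seq evidence.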
