What's a good method to distinguish segmental duplications from high levels of heterozygosity in a genome assembly? I've mapped my reads back to the genomic contigs (unscaffolded) and plotted a distribution of coverage:
This shows that a portion of the contigs (~1/4 of the assembly) has double the coverage (100X) of the other portion (50X). I interpret this as either:
- The contigs at double the coverage are duplicated in the genome and have been collapsed into a single copy in the assembly. I guess this would have to be a recent duplication?
- The contigs at half the coverage are extremely heterozygous, so the two haplotypes were assembled separately, which would mean most of the genome is very heterozygous.
I guess it could also be a mix of both cases.
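
One sanity check I can think of is to compare the genome size implied by each interpretation against the experimental estimate. Here is a minimal sketch, assuming I have a (length, mean coverage) pair per contig and that the two peaks sit near 50X and 100X. The function names and the toy contigs are made up for illustration and are not my real data:

```python
def classify_contigs(contigs, low_peak=50.0, high_peak=100.0):
    """Sum contig lengths by whichever coverage peak each contig is closer to."""
    totals = {"low": 0, "high": 0}
    midpoint = (low_peak + high_peak) / 2
    for length, coverage in contigs:
        key = "low" if coverage < midpoint else "high"
        totals[key] += length
    return totals


def haploid_size_estimates(totals):
    """Haploid genome size implied by each interpretation of the two peaks."""
    low, high = totals["low"], totals["high"]
    return {
        # Low peak is single-copy; high-peak contigs are collapsed two-copy
        # duplications, so they count twice.
        "duplication": low + 2 * high,
        # High peak is both haplotypes collapsed; low-peak contigs are separate
        # haplotypes of the same locus, so they count half.
        "heterozygosity": high + low / 2,
    }


# Hypothetical toy input: (contig length in bp, mean read coverage).
toy_contigs = [(3000, 48.0), (1000, 103.0), (2000, 52.0), (2000, 97.0)]
totals = classify_contigs(toy_contigs)
estimates = haploid_size_estimates(totals)
print(totals)
print(estimates)
```

Whichever estimate lands closer to the experimentally measured size would favour that interpretation, although a mix of both cases would fall somewhere in between.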
The contig assembly was performed with ABySS. It's a diploid arthropod with a genome size experimentally estimated at 3.6 Gb.