Question: k-mer distribution to estimate the heterozygosity of my assembly
12 weeks ago, pablo160 wrote:


I had PacBio CCS reads that I assembled with hifiasm/v12. I got a 1.9 Gb assembly, whereas I expected a genome of 1.3 Gb, which suggests a possibly high heterozygosity rate (this is a plant genome, so that is plausible).
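As a rough check on these numbers, the excess assembly size can be expressed as the fraction of the haploid genome that would be present twice. This is a minimal Python sketch, assuming (hypothetically) that the entire excess comes from retained alternative haplotypes rather than contamination, repeats or assembly errors.

```python
# Sizes from the post, in gigabases (Gb).
assembly_gb = 1.9   # hifiasm assembly size
expected_gb = 1.3   # expected haploid genome size

excess_gb = assembly_gb - expected_gb
ratio = assembly_gb / expected_gb

# Assumption: all of the excess is duplicated haplotype sequence (haplotigs),
# i.e. regions where both haplotypes were kept as separate contigs.
duplicated_fraction = excess_gb / expected_gb

print(f"excess: {excess_gb:.1f} Gb")
print(f"assembly / expected: {ratio:.2f}")
print(f"fraction of haploid genome present twice: {duplicated_fraction:.0%}")
```

Under that assumption, a bit under half of the genome would be represented by two haplotype copies in the unpurged assembly.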

To check this, I ran GenomeScope and got the following k-mer distribution:


We can see two peaks, which correspond to a diploid genome, and a heterozygosity rate of 3.13%, which is pretty high.
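The two-peak signature can also be located directly in the histogram. This is a minimal Python sketch, not GenomeScope's model-fitting approach: it assumes a histogram of (coverage, count) pairs such as the two-column output of `jellyfish histo`, and the function name `find_peaks`, its parameters and the synthetic data are hypothetical. In a diploid genome the heterozygous peak sits at roughly half the coverage of the homozygous peak.

```python
import math


def find_peaks(hist, min_cov=5, window=3):
    """Return (coverage, smoothed_count) for each local maximum of a k-mer histogram.

    hist:    dict mapping k-mer coverage -> number of distinct k-mers
    min_cov: coverages below this are skipped, to ignore the error-k-mer spike
    window:  half-width of the moving average used to damp noise
    """
    covs = sorted(c for c in hist if c >= min_cov)
    smoothed = {}
    for c in covs:
        vals = [hist.get(c + d, 0) for d in range(-window, window + 1)]
        smoothed[c] = sum(vals) / len(vals)
    peaks = []
    # A point is a peak if it rises above its left neighbour and is not below its right one.
    for prev, cur, nxt in zip(covs, covs[1:], covs[2:]):
        if smoothed[cur] > smoothed[prev] and smoothed[cur] >= smoothed[nxt]:
            peaks.append((cur, smoothed[cur]))
    return peaks


def gaussian(x, mu, sd, height):
    return height * math.exp(-((x - mu) ** 2) / (2 * sd ** 2))


# Synthetic histogram (made-up numbers): heterozygous peak at 25X, homozygous peak at 50X.
hist = {c: round(gaussian(c, 25, 5, 4e6) + gaussian(c, 50, 8, 6e6)) for c in range(1, 151)}

peaks = find_peaks(hist)
main_cov, _ = max(peaks, key=lambda p: p[1])
for cov, count in peaks:
    print(f"peak at {cov}X, ratio to main peak = {cov / main_cov:.2f}")
```

A ratio close to 0.5 for the secondary peak is the diploid signature; real histograms are noisier, so the smoothing window and `min_cov` would need tuning.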

Then I used purge_dups to remove the heterozygous contigs from my assembly. The purged assembly is 1.2 Gb, which is close to the expected genome size. I also checked the k-mer distribution:


We can see a very high peak, but also a small hump at 35X coverage. Does this hump correspond to the diploid peak, which would mean purge_dups did not work well on my assembly? Or is it an artefact, with the high peak being the only real one, which would mean my assembly is now haploid, purged of the heterozygous sequences?
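Whether the 35X hump is a heterozygous peak depends on where the main peak sits: heterozygous k-mers appear at about half the main peak's coverage, while two-copy (duplicated) k-mers appear at about twice it. The sketch below encodes that rule of thumb in Python; `classify_hump`, the 15% tolerance and the main-peak coverages in the usage loop are hypothetical, since the exact values from the plot are not given here.

```python
def classify_hump(hump_cov, main_cov, tol=0.15):
    """Describe a secondary k-mer peak by its coverage relative to the main peak.

    tol is a hypothetical relative tolerance around the expected ratios 0.5 and 2.0.
    """
    r = hump_cov / main_cov
    if abs(r - 0.5) <= tol * 0.5:
        return "half of main peak: consistent with heterozygous k-mers"
    if abs(r - 2.0) <= tol * 2.0:
        return "twice the main peak: consistent with duplicated (two-copy) k-mers"
    return "not at a simple multiple of the main peak"


# Hypothetical main-peak coverages, for illustration only.
for main in (70, 17.5, 50):
    print(f"main peak {main}X -> {classify_hump(35, main)}")
```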


Tags: pacbio, kmer, assembly, purge_dups

