Hello,
The haploid genome size estimated for my diploid plant is about 480 Mb, which is consistent with the k-mer-based estimate from GenomeScope2. However, my assembly, produced from PacBio HiFi reads, is about 780 Mb.
My first guess was that the assembly contains uncollapsed haplotypes. To check, I looked at the coverage distribution from one of my BAM files (Illumina reads mapped to the assembly). I see a single peak around 1.5 (on a log10 scale), which corresponds to a coverage of about 30X and is what I would expect for my sample. If the assembly contained uncollapsed haplotypes, I would expect a second peak around 15X, but there is none. Where could this discrepancy come from, and what other tests could I run to check?
Some numbers: the total length of scaffolds with coverage between 10X and 20X is only about 23 Mb, and the total length of scaffolds with no reads mapped to them is only about 6 Mb. The genome is repeat-rich: using RepeatModeler and RepeatMasker, about 50% of the assembly was masked as repeats.
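To make the reasoning above concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted in this post and assumes (this is my simplification, not something I have verified) that Illumina reads split evenly between the two copies of any uncollapsed region, so both copies would sit at about half coverage. The function name is just illustrative.

```python
# Figures quoted in the post (all in Mb).
HAPLOID_MB = 480.0             # haploid genome size estimate
ASSEMBLY_MB = 780.0            # HiFi assembly size
OBSERVED_HALF_COV_MB = 23.0    # scaffolds with 10X-20X coverage


def expected_half_coverage_mb(haploid_mb: float, assembly_mb: float) -> float:
    """Assembly length expected at half coverage if ALL of the excess
    came from uncollapsed haplotypes.

    Assumption: each uncollapsed region is present twice in the assembly,
    and reads are shared evenly between the two copies, so both copies
    (2 x excess) would show about half the main-peak coverage.
    """
    excess_mb = assembly_mb - haploid_mb
    return 2.0 * excess_mb


excess_mb = ASSEMBLY_MB - HAPLOID_MB
expected_mb = expected_half_coverage_mb(HAPLOID_MB, ASSEMBLY_MB)

print(f"Excess over haploid estimate: {excess_mb:.0f} Mb")
print(f"Expected at ~half coverage if fully uncollapsed: {expected_mb:.0f} Mb")
print(f"Observed in the 10X-20X bin: {OBSERVED_HALF_COV_MB:.0f} Mb")
```

Under that assumption, the half-coverage scaffolds I actually observe account for only a small part of the extra assembly length, which is why uncollapsed haplotypes alone do not seem to explain the difference.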
Thank you!