I am facing a problem estimating genome size with jellyfish. We have Illumina reads for a shrimp and have run k-mer analyses with k from 17 up to 32. Every histogram shows two peaks, but when we compare them across k, the second peak does not shift with k-mer size. So we treated the second peak as the homozygous peak, took the depth at that peak as the k-mer coverage, and calculated the genome size from it. However, this grossly underestimates the genome size compared with the flow cytometry estimate.
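The calculation described above can be sketched as follows. This is a minimal sketch, assuming the standard k-mer approach (genome size is approximately the total number of non-error k-mers divided by the peak depth) and the two-column `depth count` output of `jellyfish histo`. The error cutoff `min_depth` and the peak window `lo`..`hi` are hypothetical parameters you would read off your own histogram; they are not jellyfish options.

```python
from typing import Iterable, List, Tuple

# One (depth, number_of_distinct_kmers) pair per histogram line.
Histogram = List[Tuple[int, int]]


def parse_histo(lines: Iterable[str]) -> Histogram:
    """Parse jellyfish 'histo' output: one 'depth count' pair per line."""
    hist: Histogram = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            hist.append((int(parts[0]), int(parts[1])))
    return hist


def find_peak(hist: Histogram, lo: int, hi: int) -> int:
    """Return the depth with the largest count inside [lo, hi] (one peak window)."""
    window = [(d, c) for d, c in hist if lo <= d <= hi]
    return max(window, key=lambda dc: dc[1])[0]


def genome_size(hist: Histogram, peak_depth: int, min_depth: int) -> float:
    """Total k-mers at depth >= min_depth divided by the chosen peak depth.

    Depths below min_depth are treated as sequencing-error k-mers and dropped.
    """
    total_kmers = sum(d * c for d, c in hist if d >= min_depth)
    return total_kmers / peak_depth


def estimate_from_file(path: str, min_depth: int, lo: int, hi: int) -> Tuple[int, float]:
    """Read a histo file and return (peak_depth, genome_size_estimate)."""
    with open(path) as fh:
        hist = parse_histo(fh)
    peak = find_peak(hist, lo, hi)
    return peak, genome_size(hist, peak, min_depth)
```

For example, calling `estimate_from_file` once with a window around the first peak and once with a window around the second peak gives the two competing estimates side by side; the file name and window bounds are placeholders to be chosen from your own plots.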
So now we are unsure whether we should ignore the first peak entirely. Please suggest an approach or formula that gives a reasonably accurate estimate.
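For reference, the same estimate written as a formula. Here n(d) is the number of distinct k-mers observed d times in the histo, d_min is the error cutoff, d_peak is the depth of whichever peak is taken as homozygous, and d_1, d_2 are the depths of the first and second peaks (notation introduced here only for illustration):

```latex
\hat{G} \approx \frac{\sum_{d \ge d_{\min}} d \, n(d)}{d_{\text{peak}}},
\qquad
\frac{\hat{G}_{\text{first}}}{\hat{G}_{\text{second}}}
  = \frac{d_2}{d_1}
```

With the numerator held fixed, switching the homozygous peak from the second peak (depth d_2) to the first (depth d_1) scales the estimate by d_2/d_1, which is roughly a factor of two when d_2 is about 2 d_1.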