I have a question about k-mer frequency distributions and genome complexity.
I have the k-mer distribution (from KmerGenie) for a newly sequenced lizard genome. Lizards are known to be highly polymorphic and to have highly repetitive genomes. The distribution shows peaks at several k-mer frequency values, for instance 19, 25, 29, 37, 49, 59, 69, 79, 83, and 89. I do not understand these multiple peaks and sub-peaks, including sub-peaks at half the frequency of a bigger peak.
I have read that a k-mer frequency peak below 20 (around 17) indicates bacterial contamination, and that a sub-peak at half the value of a main peak indicates polymorphism.
http://arxiv.org/pdf/1308.2012.pdf - Heterozygosity and half peak
https://groups.google.com/forum/#!topic/bgi-soap/xKS39Nz4SCE - Polymorphism with multiple peaks
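To make the half-peak idea concrete, here is a minimal Python sketch that takes the peak positions I listed and reports pairs where one peak sits close to half of another. The function name `find_half_peak_pairs` and the 10% tolerance are my own illustrative choices, not anything provided by KmerGenie, and a matching pair would only be a hint of heterozygosity, not proof of it.

```python
from typing import List, Tuple


def find_half_peak_pairs(peaks: List[float], tolerance: float = 0.1) -> List[Tuple[float, float]]:
    """Return (sub_peak, main_peak) pairs where sub_peak is within
    `tolerance` (as a fraction) of half of main_peak.

    `tolerance` is an assumed, arbitrary cutoff; tune it to your data.
    """
    ordered = sorted(peaks)
    pairs = []
    for main in ordered:
        half = main / 2.0
        for sub in ordered:
            if sub >= main:
                break  # only peaks below the main peak can be its half peak
            if abs(sub - half) <= tolerance * half:
                pairs.append((sub, main))
    return pairs


if __name__ == "__main__":
    # Peak positions as reported in this post
    observed = [19, 25, 29, 37, 49, 59, 69, 79, 83, 89]
    for sub, main in find_half_peak_pairs(observed):
        print(f"peak {sub} is roughly half of peak {main}")
```

With closely spaced peaks like mine, several pairs can match by chance alone, so I am unsure how much weight to give such a check.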
But I cannot make sense of the pattern I have right now. Does it suggest contamination? Or is it that the genome is heterozygous, polymorphic, and repetitive? Or is there something I am missing?