Most of us working on large, repetitive genomes are probably familiar with k-mer distribution analysis of raw short reads. There we see a peak for the diploid portion of the genome, a hump in the case of polyploidy, and a smaller, flatter peak for the duplicated portion of the genome. This is usually done to estimate genome size from k-mers.
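For reference, the usual read-based estimate divides the total number of k-mer occurrences by the k-mer depth at the main peak. Below is a minimal sketch of that arithmetic. It assumes a two-column "multiplicity count" histogram (the kind of text that `jellyfish histo` writes). The error cutoff `min_mult=5` is a hypothetical value you would normally pick by eye from the valley after the error peak.

```python
from typing import Dict, Iterable


def parse_histo(lines: Iterable[str]) -> Dict[int, int]:
    """Parse two-column 'multiplicity count' lines into {multiplicity: count}."""
    hist: Dict[int, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            hist[int(fields[0])] = int(fields[1])
    return hist


def find_main_peak(hist: Dict[int, int], min_mult: int) -> int:
    """Multiplicity with the most distinct k-mers, ignoring the error tail below min_mult."""
    candidates = {m: c for m, c in hist.items() if m >= min_mult}
    return max(candidates, key=candidates.get)


def estimate_genome_size(hist: Dict[int, int], min_mult: int = 5) -> float:
    """Genome size ~ total k-mer occurrences / k-mer depth at the main peak.

    min_mult is a hypothetical hand-picked cutoff: k-mers seen fewer times
    are treated as sequencing errors and excluded from the total.
    """
    peak = find_main_peak(hist, min_mult)
    total = sum(m * c for m, c in hist.items() if m >= min_mult)
    return total / peak


if __name__ == "__main__":
    toy = parse_histo(["1 1000", "2 50", "10 100", "20 10"])
    print(estimate_genome_size(toy))  # (10*100 + 20*10) / 10
```

This sketch takes the tallest peak above the cutoff. That choice is only appropriate when the tallest peak is the one made of single-copy, homozygous k-mers. In a highly heterozygous or polyploid genome, the peak to divide by has to be chosen deliberately.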
My question is: does it make sense to look at the distribution of k-mers in already assembled sequences? And if it does, is it better to use long or short k-mers?
I looked at the 7-mer, 21-mer, 55-mer and 155-mer distributions in an assembly of beet (plant, eudicot). Each is "just" a peakless, descending curve in which a hump is sometimes distinguishable. On a biological level, is this informative at all?
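To make the comparison concrete, here is a minimal sketch of computing the same kind of spectrum directly from assembled sequences at several k. There is no read coverage in an assembly, so the multiplicity here is the number of times a canonical k-mer occurs in the assembled sequence itself. The sketch assumes plain FASTA input and skips k-mers that contain non-ACGT characters, such as `N` gaps in scaffolds. The file name `assembly.fa` is hypothetical. Because all k-mers are held in a Python dict, this is only practical for small test sets. For a whole plant genome, a dedicated counter such as Jellyfish or KMC would be the realistic choice.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def read_fasta(path: str) -> Iterator[str]:
    """Yield each sequence in a FASTA file as one uppercase string."""
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    yield "".join(chunks).upper()
                chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        yield "".join(chunks).upper()


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_spectrum(seqs: Iterable[str], k: int) -> Dict[int, int]:
    """Map multiplicity -> number of distinct canonical k-mers with that count."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= _VALID:  # skip k-mers spanning N or other ambiguity codes
                counts[canonical(kmer)] += 1
    # Histogram of the counts themselves, sorted by multiplicity
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    contigs = list(read_fasta("assembly.fa"))  # hypothetical input file
    for k in (7, 21, 55):
        print(k, kmer_spectrum(contigs, k))
```

Canonical k-mers are used so that a sequence and its reverse complement are counted as the same k-mer, as read-based counters typically do when run in canonical mode.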