Regarding k-mer analysis
3.6 years ago
DL ▴ 30

Hello everyone,

I am going to do a genome assembly, so I ran a k-mer analysis to see which k-mer value is best for it. The genome is heterozygous, so I found two peaks in the k-mer analysis. I ran the analysis with k-mer values of 21, 32 and 64, and now I am confused about which value is best for the assembly. At k-mer value 64, I got a large genome size. Please explain which parameters I should look at to judge a good assembly. (Figure: K_mer_analysis)

genome Assembly next-gen sequence • 2.1k views
From my perspective, everything in your figure says to me that the best assembled genome out of the three is the one with k-mer=64. I looked at each statistic and k-mer=64 has the best result for each, including the narrowest gap between min and max. A larger k-mer is ideal as it essentially provides for more 'uniqueness' in the base sequences that will form the assembled genome. The downside to using a large k-mer is the computational expense associated with it, as the resulting de Bruijn graph that is generated using such a large k-mer would be complex.
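To illustrate the 'uniqueness' point, here is a minimal sketch on a made-up toy sequence (not your data). The sequence, the 12-base repeat and the function names are all hypothetical and only serve the illustration. It counts k-mers on the forward strand and reports what fraction of distinct k-mers occur exactly once. Once k is longer than the repeat, every k-mer covering it must include flanking sequence, so the two copies become distinguishable.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (forward strand only)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def unique_fraction(seq, k):
    """Fraction of distinct k-mers that occur exactly once in seq."""
    counts = kmer_counts(seq, k)
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


# Hypothetical toy sequence: a 12-base repeat placed twice between
# unrelated flanking sequences.
repeat = "ACGTTGCAAGCT"
toy = "GATTACAGGC" + repeat + "TTCCGGAATG" + repeat + "CAGTCAGGTA"

for k in (7, 12, 13):
    print(k, round(unique_fraction(toy, k), 3))
```

Real assemblers also have to deal with reverse complements, sequencing errors and coverage, none of which this sketch models. In particular, larger k-mers need more read depth to be observed often enough, so the biggest k is not automatically the best choice.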

Some questions for you:

  • Why did you choose these k-mers in particular?
  • How are the alignment rates if you re-align your sample(s) to each assembled genome? I assume that the highest alignment rate is at k-mer=64.
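If it helps with the second question, below is a rough sketch of how an alignment rate could be computed from uncompressed SAM text, assuming you have already aligned the reads back to each assembly with an aligner of your choice. The function name and the example records are made up. The flag bits used (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM format. In practice `samtools flagstat` reports this kind of summary for you.

```python
def alignment_rate(sam_lines):
    """Return the fraction of primary read records that are mapped."""
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Ignore secondary (0x100) and supplementary (0x800) records so
        # that each read (or mate) is counted once.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if not flag & 0x4:  # 0x4 set means the read is unmapped
            mapped += 1
    return mapped / total if total else 0.0


# Hypothetical records: two mapped reads, one unmapped read, and one
# secondary alignment that is not counted.
example = [
    "@SQ\tSN:contig1\tLN:1000",
    "r1\t0\tcontig1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t16\tcontig1\t50\t60\t5M\t*\t0\t0\tTTGCA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGG\tIIIII",
    "r1\t256\tcontig1\t400\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
]
print(alignment_rate(example))
```

Comparing this number across the three assemblies, using the same reads and aligner settings each time, gives one more line of evidence alongside the assembly statistics.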
Thank you, Kevin, for your reply.

I have one question: what should I do if the genome is heterozygous? Is a k-mer of 63 OK in that case?

For that, I cannot comment - apologies. I just know that by the standard measures, the k-mer=63 genome looks better.

You may find this thread of use: Recommendations For Heterozygous Genome Assembly Software

I also found this recent (2014) publication:

Apologies that I cannot assist further!


