Question: How To Estimate Genome Size Using K-Mer Coverage
GAO Yang wrote:

Hi, I have just finished a de novo genome assembly, and I want to estimate the genome size. According to some published papers, this can be done using k-mer coverage, but I don't quite follow how. How do I cut the genome into k-mers of a chosen length? And how do I summarize the k-mer abundance and plot a Poisson distribution, as in the papers?

Could anybody suggest software, a Perl module, or even some pseudocode? Thanks for your help!

Tags: genome, coverage
Question written 6.9 years ago by GAO Yang

Can you mention the paper you are referring to?

Reply written 6.9 years ago by Gjain

Sure, for example "The genome of the domesticated apple", Nature Genetics 2010, supplementary page 9.

Reply written 6.9 years ago by GAO Yang

There is no mention of k-mers in the supplement; I believe you confused k-mers with reads. In that case, you should read about Lander-Waterman statistics. The only difficult part is fitting the Poisson distribution mentioned in the article.

Reply written 6.9 years ago by Michael Dondrup
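To make the Lander-Waterman pointer concrete: if N reads of length L are sampled uniformly from a genome of size G, the expected coverage is C = NL/G, and per-base depth is approximately Poisson with mean C (so roughly a fraction e^(-C) of bases stays uncovered). Rearranged, G = NL/C. The minimal Python sketch below assumes you already have an estimate of C, for example from the peak of the depth distribution. The function names are hypothetical, and the plain weighted-mean Poisson fit is only valid for a clean, untruncated histogram, which is exactly why fitting real data is the difficult part.

```python
import math


def lander_waterman_coverage(n_reads: int, read_len: int, genome_size: float) -> float:
    """Expected coverage C = N * L / G under the Lander-Waterman model."""
    return n_reads * read_len / genome_size


def genome_size_from_coverage(n_reads: int, read_len: int, coverage: float) -> float:
    """The same relation rearranged: G = N * L / C."""
    return n_reads * read_len / coverage


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases hit by no read: Poisson P(depth = 0) = exp(-C)."""
    return math.exp(-coverage)


def poisson_pmf(depth: int, mean: float) -> float:
    """Poisson probability of a given depth, computed in log space so that
    large depths do not overflow (mean must be > 0)."""
    return math.exp(depth * math.log(mean) - mean - math.lgamma(depth + 1))


def poisson_mean_mle(depth_hist: dict) -> float:
    """Maximum-likelihood Poisson mean for a histogram {depth: n_positions}:
    the weighted mean depth. Assumes an untruncated histogram with no
    error or repeat contamination."""
    total = sum(depth_hist.values())
    return sum(d * n for d, n in depth_hist.items()) / total


# Hypothetical numbers: 10 million 100 bp reads at an estimated 20x coverage.
print(genome_size_from_coverage(10_000_000, 100, 20.0))  # 50000000.0
```

The expected Poisson curve for plotting against an observed histogram is just `poisson_pmf(d, C)` scaled by the number of positions.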

+1. Even though I wrote about k-mers, this article seems to say nothing about them.

Reply written 6.9 years ago by Arun

Sorry about that, I mixed them up. But please check this one: "Genome sequencing reveals insights into physiology and longevity of the naked mole rat", supplementary page 3, doi:10.1038/nature10533

Reply written 6.9 years ago by GAO Yang

In that case, note that the first link I pointed to, which explains how k-mer coverage relates to genome size, is for a tool (Quake) that gives you everything you've asked for.

Reply written 6.9 years ago by Arun
Arun (Germany) wrote:

Regarding your first question, about k-mer coverage and genome size: different software packages seem to use different methods/algorithms. EDIT: The general idea is explained very well here. I don't follow what you mean by "how to cut the genome to chosen K-mer"; could you please elaborate? As for the k-mer distribution, you obtain it by building a histogram/density plot that bins k-mers by their coverage (the number of times each k-mer occurs). You'll see a smooth curve that resembles a Poisson distribution. If there is bias in your k-mer distribution (for example, from sequencing errors), you'll normally see an initial peak like this, from which you can decide the coverage cut-off needed to get rid of that bias.

Answer written 6.9 years ago by Arun (modified 6.9 years ago)
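The procedure in the answer above (cut every read into overlapping k-mers, count them, bin k-mers by count, drop the low-coverage error peak, read off the main peak) can be sketched in Python as below. This is a minimal illustration, not Quake's implementation: the function names are hypothetical, k-mers are counted from reads rather than from the assembly, and genome size is estimated with the common approximation G ≈ (total k-mers above the error cutoff) / (depth of the main peak). The cutoff rule assumes the usual histogram shape, with a large error peak at low depth followed by the main peak. An in-memory `Counter` is only practical for small data; real genomes need a dedicated k-mer counter such as Jellyfish.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Complement table for A/C/G/T; any other character passes through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement, so both
    strands of the same sequence are counted as one k-mer."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Slide a window of length k along each read and count canonical k-mers,
    skipping any window that contains an ambiguous base (N)."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts: Counter) -> Dict[int, int]:
    """Map each depth (times a k-mer was seen) to the number of distinct
    k-mers seen that many times."""
    return dict(Counter(counts.values()))


def error_cutoff(hist: Dict[int, int]) -> int:
    """Depth at the first valley, where the histogram starts rising again
    after the error peak. Returns 0 if it never rises (no valley found)."""
    for d in range(1, max(hist)):
        if hist.get(d + 1, 0) > hist.get(d, 0):
            return d
    return 0


def estimate_genome_size(hist: Dict[int, int]) -> Tuple[float, int, int]:
    """Return (genome size estimate, peak depth, error cutoff), using
    genome size ~ total k-mers above the cutoff / peak depth."""
    cutoff = error_cutoff(hist)
    solid = {d: n for d, n in hist.items() if d > cutoff}
    peak = max(solid, key=solid.get)
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak, peak, cutoff
```

With `reads` a list of read sequences, `estimate_genome_size(kmer_histogram(count_kmers(reads, 21)))` runs the whole pipeline (k = 21 is only an illustrative choice). Plotting the histogram, depth on the x-axis and number of distinct k-mers on the y-axis, gives the Poisson-like curve described above; the naive valley and peak picking can be thrown off by noisy histograms, which is where fitting a model pays off.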

Yes, this is what I need! Thanks, I'll get going with it. @_@

Reply written 6.9 years ago by GAO Yang

By the way, do you know how to apply this software to color-space reads (SOLiD output)? Maybe I need to post another question about it. :)

Reply written 6.9 years ago by GAO Yang
Powered by Biostar version 2.3.0