Question: Is there a consensus on which k-mers should be counted in a histogram of k-mer multiplicity vs. frequency to estimate genome size?
Asked 23 months ago by Tom (United States):

I'm trying to develop my own algorithm to count the true k-mers that are a part of the genome and to exclude the unique/singleton k-mers that result from sequencing errors (or SNPs). The only question is where the most accurate cutoff would be. I want to start counting at the first local minimum of the "camel hump" histogram; however, some of the k-mers at or just above that threshold are still considered "noise" k-mers.
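
For concreteness, the cutoff idea described above can be sketched as follows. This is only a minimal illustration, not a recommended method: it assumes the histogram is a plain list `hist` where `hist[m]` is the number of distinct k-mers seen exactly `m` times, that there is a single main coverage peak, and that genome size is approximated as the total number of retained k-mer occurrences divided by the peak multiplicity. The function names are hypothetical, and repeats, heterozygosity, and noisy (non-smooth) histograms are ignored.

```python
def first_valley(hist):
    """Return the first multiplicity m that is a local minimum of the histogram.

    hist[m] = number of distinct k-mers observed exactly m times (hist[0] is unused).
    """
    for m in range(2, len(hist) - 1):
        if hist[m] <= hist[m - 1] and hist[m] < hist[m + 1]:
            return m
    raise ValueError("no local minimum found; the histogram may have no error valley")


def main_peak(hist, valley):
    """Return the multiplicity with the most distinct k-mers at or above the valley."""
    return max(range(valley, len(hist)), key=lambda m: hist[m])


def estimate_genome_size(hist):
    """Hard-cutoff estimate: (k-mer occurrences at or above the valley) / peak depth."""
    valley = first_valley(hist)
    peak = main_peak(hist, valley)
    # Total k-mer occurrences that survive the cutoff.
    retained = sum(m * hist[m] for m in range(valley, len(hist)))
    return valley, peak, retained / peak


if __name__ == "__main__":
    # Toy histogram: an error spike at m=1, a valley at m=4, a peak at m=8.
    toy = [0, 1000, 200, 50, 20, 60, 150, 300, 400, 300, 150, 60, 20, 5]
    print(estimate_genome_size(toy))
```

Note that everything at or above the valley is counted here, so error k-mers that happen to occur several times are still included, which is exactly the weakness raised in the question.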

 

Also, is there a term to distinguish k-mers that should be counted as part of the genome? I've just been calling them "true k-mers", but I haven't been able to find a formal term for them yet.

 

reference: http://pritchardlab.stanford.edu/publications/pdfs/Melsted11.pdf

Tags: jellyfish, kmer, kmergenie, genome

I call them "genomic k-mers" and "error k-mers". Whether or not there is a consensus, the best place to draw the line depends on your specific dataset and on the relative impact of false positives versus false negatives for your particular purpose. The very concept of drawing a line at some specific threshold already forces a lot of assumptions on the data.

Comment written 23 months ago by Brian Bushnell
Answered 22 months ago by trausch (Germany):

Instead of using a cutoff, you may want to model the k-mer count distribution as a mixture of Poisson distributions for genomic k-mers and artificial (error) k-mers, as proposed by sga preqc. The preprint is available at http://arxiv.org/pdf/1307.8026v1.pdf; it also discusses strategies for selecting a suitable k.
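
To illustrate the general idea of a mixture model (this is not the sga preqc implementation, whose exact model may differ), here is a minimal sketch that fits a two-component Poisson mixture to a k-mer count histogram with EM. It assumes the histogram is a dict `{multiplicity: number of distinct k-mers}`, uses crude starting values, and ignores the fact that k-mers with count zero are never observed, so the fitted error rate is biased upward. All function names are hypothetical.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_two_poisson_mixture(hist, n_iter=200):
    """Fit an error + genomic Poisson mixture to a k-mer histogram with EM.

    hist: dict {multiplicity: number of distinct k-mers with that multiplicity}.
    Returns (w_err, lam_err, lam_gen, resp), where resp[m] is the posterior
    probability that a k-mer seen m times belongs to the error component.
    """
    mults = sorted(hist)
    total = sum(hist.values())
    mean = sum(m * n for m, n in hist.items()) / total
    # Crude starting values (an assumption): errors near 1x, genomic well above the mean.
    w_err, lam_err, lam_gen = 0.5, 1.0, max(2.0, 2.0 * mean)
    resp = {}
    for _ in range(n_iter):
        # E-step: posterior probability of the error component for each multiplicity.
        for m in mults:
            a = math.log(w_err) + poisson_logpmf(m, lam_err)
            b = math.log(1.0 - w_err) + poisson_logpmf(m, lam_gen)
            top = max(a, b)  # subtract the max for numerical stability
            pa, pb = math.exp(a - top), math.exp(b - top)
            resp[m] = pa / (pa + pb)
        # M-step: re-estimate the mixing weight and both Poisson means.
        n_err = max(sum(resp[m] * hist[m] for m in mults), 1e-12)
        n_gen = max(total - n_err, 1e-12)
        lam_err = max(sum(resp[m] * hist[m] * m for m in mults) / n_err, 1e-9)
        lam_gen = max(sum((1.0 - resp[m]) * hist[m] * m for m in mults) / n_gen, 1e-9)
        w_err = min(max(n_err / total, 1e-9), 1.0 - 1e-9)
    return w_err, lam_err, lam_gen, resp


if __name__ == "__main__":
    toy = {1: 1000, 2: 200, 3: 50, 4: 20, 5: 60, 6: 150, 7: 300,
           8: 400, 9: 300, 10: 150, 11: 60, 12: 20, 13: 5}
    w_err, lam_err, lam_gen, resp = fit_two_poisson_mixture(toy)
    print(w_err, lam_err, lam_gen)
```

Rather than a single hard threshold, `resp[m]` gives each multiplicity a probability of being an error k-mer, so downstream steps can weight k-mers or pick a threshold that matches their own tolerance for false positives and false negatives.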

 
