Question: In simple words, what is a k-mer?
3.0 years ago, alisa.kazarina21 wrote:

Hi everyone! I'm quite new to the NGS field. At the moment I'm working with 16S rRNA sequencing on Ion Torrent and trying to find a way to analyze my data. Everything is going more or less OK, but during alignment and taxonomic classification in Mothur I receive many messages about my sequences that look like this:

> 1read-161813 is bad. It has no kmers of length 8.
> [WARNING]: 1read-161813 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.

And for one particular sample, due to this error, "unclassified" turned out to be 68 000 reads out of 160 000, which seems to me like a lot.

I've searched the internet to understand what a kmer is, but I'm not sure I understand it completely. Is there anyone here who could explain to me what is going on? >.< Can I just remove these sequences? Or should I change the kmer length from 8 to, say, 6 and try again?

Thank you!!

modified 3.0 years ago by Martombo • written 3.0 years ago by alisa.kazarina21

From the k-mer entry on Wikipedia:

all the possible substrings of length k that are contained in a string

written 3.0 years ago by genomax

You already know many specific k-mers: a hexamer is a k-mer of length 6, a dimer is a k-mer of length 2, etc.

modified 3.0 years ago • written 3.0 years ago by Thibault D.

Dimer, trimer, pentamer, hexamer, heptamer, octamer for sure. Nonamer? Decamer?

written 3.0 years ago by i.sudbery

Also read "Oligonucleotide Vs K-Mer", one of my favorite Biostar questions.

written 3.0 years ago by Jeremy Leipzig
3.0 years ago, i.sudbery (Sheffield, UK) wrote:

A kmer is just a nucleotide sequence of a certain length. For instance a dinucleotide is a kmer where k=2.

When we talk about all kmers, we mean all the possible sequences of that length. So for example, when k=2, all the possible kmers are: AA AT AC AG TA TT TC TG CA CT CC CG GA GT GC GG

K is usually bigger than 2, so we can talk about all 4mers (256 of them), all 6mers (4096 of them), all 7mers (16,384 of them) etc.
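To make the counts above concrete, here is a minimal Python sketch that enumerates every possible kmer over the four bases. The function name `all_kmers` is made up for this illustration; the number of kmers is simply 4 to the power k.

```python
from itertools import product

def all_kmers(k, alphabet="ATCG"):
    """Return every possible kmer of length k over the given alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

# The 16 dinucleotides, in the same order as listed above.
print(" ".join(all_kmers(2)))

# The number of possible kmers grows as 4**k.
for k in (4, 6, 7):
    print(k, len(all_kmers(k)), 4 ** k)
```

Because the count grows exponentially with k, listing all kmers like this is only practical for small k.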

3.0 years ago, Nicolas Rosewick (Belgium, Brussels) wrote:

From Wikipedia: "The term k-mer typically refers to all the possible substrings of length k that are contained in a string."

https://en.wikipedia.org/wiki/K-mer

Check these 68,000 reads. What are their lengths? What do their sequences look like?
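As a rough illustration of the definition and of the length check suggested above, the following Python sketch extracts the k-mers contained in a sequence and tallies read lengths. The helper `kmers_in` and the example reads are hypothetical; in practice the reads would come from your own FASTA/FASTQ file.

```python
from collections import Counter

def kmers_in(seq, k):
    """Return the k-mers (substrings of length k) of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Hypothetical reads, for illustration only.
reads = ["ACGTACGTAC", "ACGTA", "ACGTACGTACGT"]

print(kmers_in("ACGTAC", 4))  # ['ACGT', 'CGTA', 'GTAC']

# How many reads of each length are there?
length_counts = Counter(len(r) for r in reads)
print(sorted(length_counts.items()))

# A read of length L contains L - k + 1 k-mers, so none at all if L < k.
too_short = [r for r in reads if not kmers_in(r, 8)]
print(len(too_short), "read(s) shorter than 8 nt")
```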

3.0 years ago, H.Hasani (Freiburg, Germany) wrote:

Hi,

K-mer content can indicate low quality or contamination in your sequences. Usually you compute it by checking whether any string of length k occurs in the reads more often than expected by chance. Tools like FastQC can also give you a visual representation of that and of what kind of sequence you are seeing. The benefit of such a QC measurement is that sometimes the adapter is not fully removed and therefore escapes the direct test. Regarding the length, maybe this post can answer a thing or two.
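A very rough sketch of the "more often than chance" idea, in Python. This is not FastQC's actual test; it only compares each observed k-mer count with the count expected if all 4^k k-mers were equally likely, which is a strong simplifying assumption. The function name `kmer_enrichment` and the example reads are made up for illustration.

```python
from collections import Counter

def kmer_enrichment(reads, k):
    """Map each observed k-mer to its observed/expected count ratio.

    'Expected' assumes every one of the 4**k k-mers is equally likely.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    expected = total / 4 ** k
    return {kmer: n / expected for kmer, n in counts.items()}

# Hypothetical reads, for illustration only.
reads = ["ACGTACGTAC", "TTACGTACGG", "ACGTACGTTT"]
ratios = kmer_enrichment(reads, 4)
for kmer, ratio in sorted(ratios.items(), key=lambda kv: -kv[1])[:3]:
    print(kmer, round(ratio, 1))
```

A k-mer with a ratio far above 1 across many reads (for example a leftover adapter fragment) would be worth a closer look.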

hth

3.0 years ago, Martombo (Seville, ES) wrote:

Since nobody commented on what to do with these sequences: if a read doesn't have a kmer of length 8, it has to be shorter than 8 nucleotides (or maybe it contains Ns that are removed), which means it's not going to be very informative for your analysis (even if you change the kmer length) and can be discarded.
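If you want to check this on your own data, here is a minimal Python sketch that flags reads without a single usable 8-mer. It assumes that a usable kmer is a run of k consecutive unambiguous bases (A, C, G, T); whether mothur treats ambiguous bases exactly this way is an assumption, and the read names and sequences below are made up.

```python
import re

def has_valid_kmer(seq, k=8):
    """True if seq contains at least k consecutive unambiguous bases."""
    pattern = "[ACGT]{%d}" % k
    return re.search(pattern, seq.upper()) is not None

# Hypothetical reads, for illustration only.
reads = {
    "read_a": "ACGTACGTAC",       # long enough, no Ns
    "read_b": "ACGTA",            # shorter than 8 nt
    "read_c": "ACGTNNACGTNNACG",  # Ns break every window of 8
}

kept = {name: seq for name, seq in reads.items() if has_valid_kmer(seq)}
dropped = sorted(set(reads) - set(kept))
print("kept:", sorted(kept))
print("dropped:", dropped)
```

Looking at the dropped reads this way also shows whether they are mostly very short or mostly full of Ns.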

modified 3.0 years ago • written 3.0 years ago by Martombo