Question: My k-mer distribution is weird
Picasa (530) wrote, 3.7 years ago:


I used KmerGenie and SGA preqc to evaluate my data before assembly.

However, my k-mer distribution graphs look a bit weird, and I am not sure how to interpret them.

I know that low-count k-mers typically contain sequencing errors, and that there should be a peak somewhere.
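To make the "errors at low counts, then a peak" idea concrete, here is a minimal sketch of how a k-mer spectrum is built and where the coverage peak is looked for. It is illustrative only: the function names are hypothetical, it counts k-mers on one strand only (real counters usually merge each k-mer with its reverse complement), and it is not how KmerGenie or SGA preqc are implemented.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Return a histogram mapping multiplicity -> number of distinct k-mers."""
    counts = Counter()
    for read in reads:
        # A read of length L contributes L - k + 1 k-mers (none if L < k).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def find_coverage_peak(hist):
    """Return the multiplicity of the peak after the error trough, or None."""
    if not hist:
        return None
    max_mult = max(hist)
    # freq[m] = number of distinct k-mers seen exactly m times.
    freq = [hist.get(m, 0) for m in range(max_mult + 2)]
    # Walk down the monotonically decreasing low-count (error) tail.
    m = 1
    while m < max_mult and freq[m + 1] <= freq[m]:
        m += 1
    if m >= max_mult:
        return None  # the curve never rises again: no coverage peak
    # The peak is the most populated multiplicity after the trough.
    return max(range(m, max_mult + 1), key=lambda x: freq[x])
```

With 20 copies of the same read plus one read full of unique "error" k-mers, the spectrum has a spike at multiplicity 1 and a peak at 20. A spectrum that only decreases from multiplicity 1, which is what the plots in the question seem to show, returns `None`.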

But here I have no peak. Do you have any idea what is going on?

[Figure: SGA preqc k-mer distribution plot (image not available)]



Did you by any chance pre-filter your data by quality? Usually, the absence of a peak indicates either insufficient coverage for your species or contamination in your samples.

If you did filter it, try running it on the unfiltered data, or trim by quality more leniently (e.g. a threshold of Q10 instead of the usual Q20 or Q30).
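To see why a stricter quality threshold removes more sequence, here is a simplified sliding-window trimmer, loosely modelled on the idea behind Trimmomatic's SLIDINGWINDOW step. It is an assumption-laden sketch, not Trimmomatic's actual code (which may place the cut point differently): it assumes Phred+33 quality strings and cuts the read at the start of the first window whose mean quality falls below the threshold.

```python
def phred33(qual):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_trim(seq, qual, window=4, threshold=20):
    """Cut the read at the first window whose mean quality is below threshold.

    Simplified illustration only; real trimmers handle the cut point,
    adapters and minimum length in their own ways.
    """
    q = phred33(qual)
    for start in range(max(len(q) - window + 1, 0)):
        if sum(q[start:start + window]) / window < threshold:
            return seq[:start], qual[:start]
    return seq, qual  # no window failed: keep the whole read
```

On a read whose quality decays from Q40 to Q10 and then Q2, a Q30 threshold keeps 6 bases while a Q10 threshold keeps 9. Over a whole dataset, that difference shortens reads and lowers the effective coverage.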

Reply written 3.7 years ago by Rohit (1.4k)

I used Trimmomatic to filter my data with Q>30 and a minimum read length of 40.

Those graphs are for the raw reads. However, the trimming step discards only 5% of the reads, so the graphs for the trimmed data look very similar.

Reply written 3.7 years ago by Picasa (530)

If they are for the raw reads, then it is probably just a coverage problem, i.e. you need much more coverage to sequence your species. Check how much coverage you have by computing TotalBases / GenomeSize.
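A minimal sketch of the TotalBases / GenomeSize check, assuming standard 4-line FASTQ records (no wrapped sequence lines), optionally gzip-compressed. The genome size is not in the data and must come from a related species or an independent estimate. The function names and any file paths you pass in are hypothetical.

```python
import gzip


def count_bases(lines):
    """Sum sequence lengths over 4-line FASTQ records (line 2 of each record)."""
    return sum(len(line.rstrip("\n")) for i, line in enumerate(lines) if i % 4 == 1)


def total_bases(fastq_path):
    """Total number of bases in a plain or .gz FASTQ file."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as fh:
        return count_bases(fh)


def estimated_coverage(n_bases, genome_size):
    """Average base coverage: total sequenced bases divided by genome size."""
    return n_bases / genome_size
```

For paired-end data, add up `total_bases()` for both files before dividing. For example, 3 Gb of sequence for a genome assumed to be 100 Mb gives roughly 30x.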

If you have run these on trimmed reads: your minimum length is 40 but k=51, so reads shorter than 51 bp contribute no 51-mers at all and a significant amount of data might effectively be lost. Just increase the minimum length to 52. Also, Q>30 is already too strict.
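The point about k=51 can be quantified. A read of length L yields L − k + 1 k-mers (zero if L < k), so the k-mer coverage seen in the spectrum is lower than the base coverage: C_k = C · (L − k + 1) / L. The sketch below assumes a single, uniform read length for the coverage conversion; the function names are hypothetical.

```python
def kmers_per_read(read_len, k):
    """Number of k-mers in a read of length read_len (0 if shorter than k)."""
    return max(read_len - k + 1, 0)


def kmer_coverage(base_coverage, read_len, k):
    """Convert base coverage C to k-mer coverage C_k = C * (L - k + 1) / L."""
    return base_coverage * kmers_per_read(read_len, k) / read_len


def fraction_without_kmers(read_lengths, k):
    """Fraction of reads too short to contribute any k-mer."""
    lengths = list(read_lengths)
    return sum(1 for n in lengths if n < k) / len(lengths)
```

For example, 30x base coverage with 100 bp reads and k=51 gives only 15x k-mer coverage. That halving can push a weak peak down into the error tail, and every read trimmed to 40 to 50 bp adds nothing to the 51-mer counts.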

Reply written 3.7 years ago by Rohit (1.4k)