Question: puzzled about "-kmer" options during de novo assembly
Yingzi Zhang (Jeddah) wrote, 23 months ago:

Hi all, I am puzzled about "-kmer" options during de novo assembly.

First, I did k-mer frequency analysis.

Reported:

For P(x): Possible peaks including: 100 the unique peak is 100

For F(x): Possible peaks including: 10 103 the unique peak is 103

Raw kmer depth estiamtion:

Curve peak expect_depth

k-mer species 100 100.687

k-mer individuals 103 102.643

So I concluded that the k-mer depth of my data is about 101, and I thought I should use this value in the subsequent analysis.

Then I began correcting sequencing errors and trimming reads that contain singleton k-mers, using bfc. I got advice from my boss: he said I just need to set the -kmer value to 61 (my data is 2 x 100 bp). I also once read a paper that set -kmer to 61. So is it right to just set the k-mer value to 61? Does it have nothing to do with my own data? Why? Thank you.

Yingzi


You can also use KmerGenie to find the optimal k-mer range for the assembly.

Generally speaking, people tend to use about 2/3 of the read length as the k-mer size. However, it is always better to run multiple assemblies with different k-mer sizes and evaluate them.
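As a rough illustration of "evaluating" several assemblies, here is a minimal Python sketch that compares them by N50, one commonly used contiguity metric. The contig lengths and the k values in the labels are made up for the example, and N50 alone is not a complete quality assessment.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the contig length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths from two assemblies of the same reads.
assemblies = {
    "k=51": [900, 700, 400, 300, 200],
    "k=61": [1500, 600, 250, 150],
}
n50_by_k = {name: n50(lengths) for name, lengths in assemblies.items()}
for name, value in n50_by_k.items():
    print(name, "N50 =", value)
```

In practice the contig lengths would be read from each assembly's FASTA output, and N50 would be looked at together with total assembly size and other checks.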

written 22 months ago by harish
lieven.sterck (VIB, Ghent, Belgium) wrote, 23 months ago:

The k-mer size you use for the k-mer frequency analysis does not have to be related to the k-mer size you use for the actual assembly, and the assembly k-mer size certainly has nothing to do with the peak value of your frequency analysis.

The rule of thumb is to set it to approximately 2/3 of your read length (at least initially), so in that sense 61 is probably not a bad choice. It certainly can NOT be bigger than your read length!

However, the k-mer story is much more complex than this: it also depends on your data quality, the heterozygosity level of your species, etc.
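The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name `starting_kmer` is made up here, and rounding up to an odd value is an added assumption (odd k is commonly preferred so that a k-mer cannot be its own reverse complement), not something stated in this thread.

```python
def starting_kmer(read_length, fraction=2 / 3):
    """Suggest an odd starting k close to `fraction` of the read length.

    This is only a first guess for an assembly; it is not derived
    from the k-mer frequency peak.
    """
    if read_length < 3:
        raise ValueError("read length is too short to pick a k-mer size")
    k = round(read_length * fraction)
    if k % 2 == 0:
        k += 1  # assumption: prefer odd k
    # k can never be bigger than the read length
    if k > read_length:
        k = read_length if read_length % 2 == 1 else read_length - 1
    return k


# For 100 bp reads this suggests a value in the same neighbourhood as 61.
print(starting_kmer(100))
```

For 100 bp reads the sketch suggests 67, so 61 sits in the same range; either is a reasonable place to start before trying neighbouring values.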


Cool. Additionally, could you please explain a little where the k-mer peak value should be used? I know k-mer frequency analysis can help estimate genome size and the extent of heterozygosity; is that where the peak value is used? Also, I don't know how to evaluate the heterozygosity level (unfortunately I have to, because some options depend on it). If the report looks like this, is the heterozygosity level low enough?

for hybrid: a[1/2]=0.226337 a1=0.728825

kmer-species heterozygous ratio is about 0.12761

for hybrid: b[1/2]=0.167228 b1=0.569748

kmer-individual heterozygous ratio is about 0.0912432

written 23 months ago by Yingzi Zhang

The k-mer peak value is (to my knowledge) indeed only used in genome size estimation.
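To make the genome size use concrete, here is a minimal Python sketch of the usual back-of-the-envelope estimate: total number of counted k-mers divided by the peak k-mer depth. The peak of 101 is the value discussed in the question; the total k-mer count is a hypothetical number chosen for illustration, and ideally low-depth error k-mers would be excluded from that total first.

```python
def estimate_genome_size(total_kmers, peak_depth):
    """Rough genome size estimate: total k-mers / peak k-mer depth."""
    if peak_depth <= 0:
        raise ValueError("peak depth must be positive")
    return total_kmers / peak_depth


# Hypothetical total number of k-mers counted in the reads.
total_kmers = 50_500_000_000
# Peak k-mer depth of about 101, as reported in the question.
peak_depth = 101

genome_size = estimate_genome_size(total_kmers, peak_depth)
print(f"estimated genome size: {genome_size / 1e6:.0f} Mb")
```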

You can always upload your k-mer count table to, for instance, the GenomeScope website, which will give you a nice overview (with graphs) of what your data looks like, including a heterozygosity estimate.

Minor EDIT: some assembly software (e.g. ABySS) does use this k-mer frequency plot information, for instance to find the lower-bound coverage (below which data is considered noise).
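To illustrate the idea of a lower-bound coverage, here is a naive Python sketch that looks for the first valley in a k-mer frequency histogram, i.e. the dip between the low-depth error k-mers and the main peak. This is not ABySS's actual algorithm; the function name and the toy histogram are made up, and real histograms are noisy enough that some smoothing would probably be needed.

```python
def noise_cutoff(histogram):
    """Return the depth of the first local minimum in a k-mer histogram.

    histogram[i] is the number of distinct k-mers seen (i + 1) times.
    Returns None if the histogram has no interior local minimum.
    """
    for i in range(1, len(histogram) - 1):
        if histogram[i] < histogram[i - 1] and histogram[i] <= histogram[i + 1]:
            return i + 1  # convert list index to depth
    return None


# Toy histogram for depths 1..11: an error spike at depth 1,
# a valley at depth 5 and the main peak at depth 9.
toy_histogram = [1000, 400, 120, 40, 25, 30, 60, 150, 300, 280, 100]
cutoff = noise_cutoff(toy_histogram)
print("k-mers below depth", cutoff, "would be treated as noise")
```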

written 23 months ago by lieven.sterck

Wonderful. Many thanks.

written 23 months ago by Yingzi Zhang