Question: kmergenie does not show all k-mers in diploid mode
Asked 2.5 years ago by robin.vanvelzen.wur (Netherlands):

Dear all, 

I am trying out kmergenie to determine optimal k-mer values for a plant genome assembly. With the default settings I get a histogram and an estimate for every k value tested, but with the --diploid parameter the output is truncated and several k values are missing. See the .dat outputs below.

It seems that many of the k-mer histograms have no associated model fit (this is apparent in the HTML report, not shown here). Do you know what may be going wrong?

Many thanks for any advice! 

Robin

 

Default (haploid) model:

$ kmergenie filelist.txt -k 85 -t 8 -o kmergenie

k genomic.kmers cov.cutoff
15 135570989 1
25 373492340 1
35 425306591 1
45 460648430 1
55 480548886 1
59 487082292 1
61 486570123 1
63 486719928 1
65 488075925 1
67 485561924 1
69 484863969 1
71 483620710 1
75 468859001 1
85 1932760 22
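As I understand the tool, kmergenie's predicted best k is the k that maximises the estimated number of genomic k-mers. A minimal sketch that parses a .dat table in the three-column format above and picks that k; the helper names parse_dat and best_k are hypothetical, and the parser assumes whitespace-separated columns with the header line shown.

```python
def parse_dat(text):
    """Parse a kmergenie .dat table into {k: (genomic_kmers, cov_cutoff)}."""
    rows = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "k":  # skip blank lines and the header
            continue
        k, kmers, cutoff = (int(x) for x in fields[:3])
        rows[k] = (kmers, cutoff)
    return rows


def best_k(rows):
    """Return the k with the largest estimated number of genomic k-mers."""
    return max(rows, key=lambda k: rows[k][0])


# Haploid .dat output copied from the post above.
HAPLOID_DAT = """
k genomic.kmers cov.cutoff
15 135570989 1
25 373492340 1
35 425306591 1
45 460648430 1
55 480548886 1
59 487082292 1
61 486570123 1
63 486719928 1
65 488075925 1
67 485561924 1
69 484863969 1
71 483620710 1
75 468859001 1
85 1932760 22
"""

rows = parse_dat(HAPLOID_DAT)
print("best k:", best_k(rows))  # prints "best k: 65" for this table
```

Applied to the diploid table, the same two functions give k = 55.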

Diploid model (note that estimates for k = 15, k = 45 and all k > 57 are missing):

$ kmergenie filelist.txt --diploid -k 85 -t 8 -o kmergeniediploid

k genomic.kmers cov.cutoff
25 349086825 1
35 390717507 1
51 425391521 1
53 425679437 1
55 426942015 1
57 426544332 1
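A quick way to confirm which k values the diploid run dropped, and which it added, is a set difference over the k columns of the two tables. A minimal sketch with the values copied from the outputs above:

```python
# k values listed in the two .dat outputs above
haploid_k = {15, 25, 35, 45, 55, 59, 61, 63, 65, 67, 69, 71, 75, 85}
diploid_k = {25, 35, 51, 53, 55, 57}

missing_in_diploid = sorted(haploid_k - diploid_k)
only_in_diploid = sorted(diploid_k - haploid_k)

print("missing in diploid run:", missing_in_diploid)
# -> [15, 45, 59, 61, 63, 65, 67, 69, 71, 75, 85]
print("only in diploid run:", only_in_diploid)
# -> [51, 53, 57]
```

The extra values 51 to 57 presumably come from kmergenie refining its k grid around the best k of a first pass, which for the diploid table is 55 rather than 65.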

 


Can you please send both HTML reports to kmergenie@cse.psu.edu?

The diploid model is more constrained, so it is more likely to fail to fit a given histogram than the less constrained haploid model.

(Comment by Rayan Chikhi, 2.5 years ago)
Answer by Rayan Chikhi (France, Lille, CNRS), 2.5 years ago:

Thanks for sending me the histograms by email.

The coverage is very low (about 30x 15-mer coverage in homozygous regions), and heterozygosity looks low too. One can barely see the peak that would correspond to heterozygous k-mers, yet that peak is exactly what the diploid model expects. For histograms like these the haploid model should work much better, so I recommend using it.
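The link between base coverage and k-mer coverage helps explain why the heterozygous peak is hard to see. With base coverage C and read length L, the expected k-mer coverage is C_k = C * (L - k + 1) / L, and k-mers spanning a heterozygous site come from only one haplotype, so they are expected at roughly half the homozygous k-mer coverage. The sketch below back-calculates C from the ~30x 15-mer coverage quoted above; the read length of 150 bp (READ_LEN) is a hypothetical assumption, since the thread does not state it.

```python
READ_LEN = 150  # hypothetical read length; not stated in the thread


def kmer_coverage(base_cov, k, read_len=READ_LEN):
    """Expected k-mer coverage C_k = C * (L - k + 1) / L."""
    return base_cov * (read_len - k + 1) / read_len


# Back-calculate base coverage from ~30x homozygous 15-mer coverage.
base_cov = 30 * READ_LEN / (READ_LEN - 15 + 1)

for k in (15, 25, 35, 45, 55, 65, 75, 85):
    hom = kmer_coverage(base_cov, k)
    het = hom / 2  # heterozygous k-mers are carried by one haplotype only
    print(f"k={k:2d}  homozygous ~{hom:5.1f}x  heterozygous ~{het:5.1f}x")
```

Under the assumed 150 bp reads, homozygous k-mer coverage falls from 30x at k = 15 to about 14.6x at k = 85, putting the expected heterozygous peak near 7x, which plausibly makes it even harder to fit at large k. A different READ_LEN changes these numbers, so treat them as illustrative only.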

On a side note, I expect heterozygous regions to assemble poorly and homozygous regions to assemble better. The k predicted by kmergenie looks quite reasonable given the shape of the histograms.

Thanks for spotting a bug in the documentation; I have corrected it.


Thanks for the help and the advice!

Heterozygosity is indeed low (that was one of the criteria for selecting the sample for sequencing). So, if I understand correctly, the diploid model requires a substantial level of heterozygosity to work. It may be worth mentioning that requirement in the documentation.
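Whether a histogram carries enough heterozygosity for the diploid model can be checked roughly by looking for a local maximum near half the homozygous peak. The function below is a hypothetical heuristic for doing that on a k-mer abundance histogram (hist[m] = number of distinct k-mers seen m times); it is not how kmergenie decides whether its diploid model fits, and the tolerance and min_ratio defaults are arbitrary assumptions.

```python
def has_het_peak(hist, hom_peak, tolerance=0.25, min_ratio=0.1):
    """Heuristically check for a heterozygous peak in a k-mer histogram.

    hist[m] is the number of distinct k-mers seen exactly m times and
    hom_peak is the multiplicity of the main (homozygous) peak. Looks for
    a local maximum within +/- tolerance of hom_peak / 2 whose height is
    at least min_ratio times the homozygous peak height.
    """
    target = hom_peak / 2
    # Keep lo >= 1 and hi <= len(hist) - 2 so both neighbours exist.
    lo = max(1, int(target * (1 - tolerance)))
    hi = min(len(hist) - 2, int(target * (1 + tolerance)) + 1)
    for m in range(lo, hi + 1):
        is_local_max = hist[m] >= hist[m - 1] and hist[m] >= hist[m + 1]
        if is_local_max and hist[m] >= min_ratio * hist[hom_peak]:
            return True
    return False
```

A False result on a histogram like the one described in this thread would be consistent with the advice to stay with the haploid model, though the thresholds would need tuning against real data.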

I will use the haploid model except for samples with higher levels of heterozygosity. 

(Reply by robin.vanvelzen.wur, 2.5 years ago)