Question: kmergenie does not predict a best k value
2.3 years ago, StudentBio wrote:

Hi, I am trying out KmerGenie to determine the optimal k-mer value for a Phoenix dactylifera genome assembly, and I get the error below. Any suggestions, please?

./kmergenie /home/vshare/outils/trimmed.fastq --diploid

running histogram estimation

Setting maximum kmer length to: 133 bp

computing histograms (from k=21 to k=121): 21 31 41 51 61 71 81 91 101 111 121 

ntCard wall-clock time over all k values: 2172 seconds 

fitting model to histograms to estimate best k

could not fit histograms-k101.histo

could not fit histograms-k111.histo

could not fit histograms-k121.histo

could not fit histograms-k21.histo

could not fit histograms-k31.histo

could not fit histograms-k41.histo

could not fit histograms-k51.histo

could not fit histograms-k61.histo

could not fit histograms-k71.histo

could not fit histograms-k81.histo

could not fit histograms-k91.histo

could not predict a best k value

No best k found
kmergenie assembly • 870 views
modified 2.3 years ago by h.mon • written 2.3 years ago by StudentBio

What is the expected genome size and ploidy, and target sequencing coverage? Did you check for contaminants (bacterial, human, whatever) and did you remove sequencing adapters?

written 2.3 years ago by h.mon

I'm sorry, but I don't know how to estimate the genome size, ploidy, and target sequencing coverage. That is why I am trying to find the best k: I want to use GenomeScope, which detects the genome characteristics. According to the FastQC report, the sequence length is 20-397 and the %GC is 42.

About my reads: I trimmed them using sickle.

(I use the --diploid option because, according to a study, the Phoenix dactylifera genome contains 18 pairs of chromosomes.)

written 2.3 years ago by StudentBio

According to another study, the genome size should be around 670 Mb. You can calculate the target sequencing coverage using this genome size estimate. These considerations are important for designing the best sequencing strategy and choosing an appropriate assembler.
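As a rough illustration of that coverage calculation, here is a minimal Python sketch. It assumes an uncompressed FASTQ file with exactly four lines per record, and it uses the ~670 Mb figure mentioned above as the genome size. The function names and the 40.2 Gb example total are hypothetical and chosen only for illustration; they are not measurements from the poster's data.

```python
def count_bases(fastq_path):
    """Sum the lengths of all sequence lines in an uncompressed FASTQ file.

    Assumes the common layout of exactly four lines per record
    (header, sequence, '+', qualities), so the sequence is every
    line whose zero-based index is 1 modulo 4.
    """
    total = 0
    with open(fastq_path) as handle:
        for line_index, line in enumerate(handle):
            if line_index % 4 == 1:
                total += len(line.strip())
    return total


def estimate_coverage(total_bases, genome_size_bp):
    """Return average depth of coverage: sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return total_bases / genome_size_bp


# Genome size estimate quoted in this thread (~670 Mb).
GENOME_SIZE_BP = 670_000_000

# Hypothetical total of trimmed bases, for illustration only.
example_total_bases = 40_200_000_000

print(f"{estimate_coverage(example_total_bases, GENOME_SIZE_BP):.1f}x")  # prints 60.0x
```

On real data, the total would come from something like count_bases("trimmed.fastq") instead of the made-up number. A very low result would be one possible reason for a k-mer histogram having no clear peak to fit.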

Why do you want to assemble if a reference genome is available? If all the data you have at hand are these short reads (length 20-397), your assembly will most likely be worse than the published genome. What analyses do you intend to perform downstream? I have the feeling that mapping to this reference genome will be a better approach.

written 2.3 years ago by h.mon