Question: Question about k-mer genie output
stacy734 wrote, 9 months ago:

Hi everyone,

I installed k-mer genie and ran it on a 120GB Illumina dataset.

I get this output:


running histogram estimation
Extrapolating number of distinct kmers
70000| processing | [---------------------------------------------------
Total time Wallclock 20664.4 s
Linear estimation: ~23513 M distinct 71-mers are in the reads
K-mer sampling: 1/4484
going to estimate histograms for values of k: 121 111 101 91 81 71 61 51 41 31 21
fitting model to histograms to estimate best k

Caught exception in fit_histogram worker thread (histfile = histograms-k121.histo):
Traceback (most recent call last):
  File "/panfs/pan1.be-md.ncbi.nlm.nih.gov/refseq_wgs/kmer_genie/kmergenie-1.7016/scripts/decide", line 63, in fit_histogram
    rc, stdout, stderr = run(command)
  File "/panfs/pan1.be-md.ncbi.nlm.nih.gov/refseq_wgs/kmer_genie/kmergenie-1.7016/scripts/decide", line 44, in run
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "/opt/python-2.7env/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/opt/python-2.7env/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied

(similar lines deleted)

Execution of 'scripts/decide' failed (return code 1). If this is a fresh Kmergenie install, try running 'make check'.


Note:
1. make check was "OK".
2. Rscript did not produce an error.

Question:

Since the output indicated "Linear estimation: ~23513 M distinct 71-mers are in the reads", does that mean that 71 was the best value of k? If that is the case I don't really need to see the histogram.

Any advice on what the problem might be, or the interpretation of the output I did get would be appreciated.


First off, I do not believe that KmerGenie rests on solid theoretical ground, and I do not recommend using it to pick a k-mer length: there is no reason to believe that the k-mer length with the greatest number of unique k-mers will yield the best assembly. The best approach is to run multiple assemblies and pick the one that comes out best. You can do that very quickly with BBMap's tadwrapper.sh program, but for optimal results (time permitting), you should do it with the actual assembler you plan to use.
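As a rough illustration of "pick the one that comes out best", here is a minimal sketch that compares contig FASTA files from assemblies at several k values. It assumes N50 as the comparison criterion, which is only one possible metric and is not something the answer above prescribes; the function names and the file layout (one FASTA file per k) are hypothetical.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all


def contig_lengths(fasta_path):
    """Read a FASTA file and return the length of each sequence, in file order."""
    lengths = []
    current = 0
    seen_header = False
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    lengths.append(current)
                current = 0
                seen_header = True
            elif seen_header:
                current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def best_k_by_n50(assemblies):
    """assemblies maps k -> path of the contigs FASTA produced with that k.

    Returns (best_k, scores), where scores maps k -> N50.
    Assumption: a higher N50 is treated as "better"; real comparisons
    usually also look at total length, misassemblies, and so on.
    """
    scores = {k: n50(contig_lengths(path)) for k, path in assemblies.items()}
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

With hypothetical files `contigs_k31.fa` and `contigs_k51.fa`, a call would look like `best_k_by_n50({31: "contigs_k31.fa", 51: "contigs_k51.fa"})`.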

If you want to use KmerGenie anyway and are encountering errors, please make sure you are using the latest version. That said, the traceback fails inside subprocess.Popen, which means the script could not launch an external program. It looks like you do not have permission to execute one of the executable files (or to search a directory on its path), so check the permissions of the installation directory and of the directory you are running it in, and make sure Python is installed correctly.
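To narrow down an `[Errno 13] Permission denied` raised by `subprocess.Popen`, a small check like the following can help. It is only a sketch: `exec_problems` is a hypothetical helper, and which program `scripts/decide` actually tries to launch is not shown in the traceback, so the paths to test have to be supplied by hand.

```python
import os
import stat


def exec_problems(path):
    """Return a list of reasons why launching `path` as a program could fail.

    An empty list means no obvious permission problem was found.
    """
    if not os.path.exists(path):
        return ["does not exist (or a parent directory is not searchable)"]
    if os.path.isdir(path):
        return ["is a directory, not a program"]
    mode = os.stat(path).st_mode
    if not mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        return ["no execute bit set (try: chmod +x)"]
    if not os.access(path, os.X_OK):
        # The bit is set for someone, but the current user still may not run it.
        return ["execute bit is set, but not usable by the current user "
                "(or the filesystem is mounted noexec)"]
    return []
```

Calling it on each candidate file, for example `exec_problems("scripts/decide")` from inside the installation directory, prints nothing alarming when the list comes back empty; any non-empty result points at the file to fix.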

written 9 months ago by Brian Bushnell

Thank you for the sane advice.

I will proceed with multiple assemblies with different k-mers.

written 9 months ago by stacy734