Question about k-mer genie output
7.3 years ago
stacy734 ▴ 40

Hi everyone,

I installed k-mer genie and ran it on a 120GB Illumina dataset.

I get this output:


running histogram estimation
Extrapolating number of distinct kmers
70000| processing | [---------------------------------------------------
Total time Wallclock 20664.4 s
Linear estimation: ~23513 M distinct 71-mers are in the reads
K-mer sampling: 1/4484
going to estimate histograms for values of k: 121 111 101 91 81 71 61 51 41 31 21
fitting model to histograms to estimate best k

Caught exception in fit_histogram worker thread (histfile = histograms-k121.histo):
Traceback (most recent call last):
  File "/panfs/pan1.be-md.ncbi.nlm.nih.gov/refseq_wgs/kmer_genie/kmergenie-1.7016/scripts/decide", line 63, in fit_histogram
    rc, stdout, stderr = run(command)
  File "/panfs/pan1.be-md.ncbi.nlm.nih.gov/refseq_wgs/kmer_genie/kmergenie-1.7016/scripts/decide", line 44, in run
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "/opt/python-2.7env/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/opt/python-2.7env/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied

(similar lines deleted)

Execution of 'scripts/decide' failed (return code 1). If this is a fresh Kmergenie install, try running 'make check'.


Note:
1. make check was "OK".
2. Rscript did not produce an error.

Question:

Since the output indicated "Linear estimation: ~23513 M distinct 71-mers are in the reads", does that mean that 71 was the best value of k? If that is the case I don't really need to see the histogram.

Any advice on what the problem might be, or the interpretation of the output I did get would be appreciated.


First off, I do not believe that Kmer Genie is based on solid theoretical ground, and I do not recommend using it to pick a k-mer length; there is no reason to believe that the k-mer length with the greatest unique k-mer count would yield the best assembly. The best approach is to run multiple assemblies with different k-mer lengths and pick the one that comes out best. You can do that very quickly with BBMap's tadwrapper.sh program, but for optimal results (time permitting), you should do it with the actual assembler you plan to use.
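As a rough illustration of "pick the one that comes out best", here is a minimal Python sketch that compares assemblies by N50. Using N50 as the only criterion is an assumption made for this example; in practice you would also look at total assembly size, misassemblies, and so on. The function names and the contig lengths are made up for illustration and do not come from any assembler.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def best_k_by_n50(contig_lengths_by_k):
    """Return the k whose assembly has the highest N50 (hypothetical criterion)."""
    return max(contig_lengths_by_k, key=lambda k: n50(contig_lengths_by_k[k]))


# Made-up contig lengths for three hypothetical assemblies.
example = {
    31: [500, 400, 300, 200, 100],
    51: [900, 300, 200, 100],
    71: [700, 700, 100],
}

for k in sorted(example):
    print(k, n50(example[k]))
print("best k by N50:", best_k_by_n50(example))
```

In a real run, the lengths for each k would be read from the contig FASTA files that the assembler writes.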

If you want to use Kmer Genie anyway and are encountering errors, first make sure you are using the latest version. That said, the traceback ends in "OSError: [Errno 13] Permission denied" raised from subprocess.Popen, so it looks like scripts/decide was not allowed to execute one of the programs it launches, or you lack permission to write somewhere. Check that the files in the Kmer Genie directory are executable by your user, check the permissions of the directory you are running it in, and make sure Python is installed correctly.
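To narrow down a permission problem like this, a small Python check along the following lines may help. It is only a sketch: the traceback does not show which command scripts/decide tried to launch, so the path you pass in is up to you, and the helper name explain_exec_permission is invented for this example. It assumes a Unix-like system.

```python
import os
import sys


def explain_exec_permission(path):
    """Return a list of reasons why `path` could not be executed (empty if none found)."""
    path = os.path.abspath(path)
    if not os.path.exists(path):
        return ["%s does not exist" % path]

    problems = []
    if not os.access(path, os.X_OK):
        problems.append("%s is not executable by the current user" % path)

    # Every directory leading to the file needs search (execute) permission.
    parent = os.path.dirname(path)
    while True:
        if not os.access(parent, os.X_OK):
            problems.append("directory %s is not searchable by the current user" % parent)
        next_parent = os.path.dirname(parent)
        if next_parent == parent:
            break
        parent = next_parent
    return problems


# Example: check the running Python interpreter itself.
for problem in explain_exec_permission(sys.executable):
    print(problem)
```

You would point it at each script or binary in the Kmer Genie install directory in turn. An empty list does not prove everything is fine (it does not check the interpreter named in a script's first line, for instance), but a non-empty one tells you where to look.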


Thank you for the sane advice.

I will proceed with multiple assemblies with different k-mers.
