Question about k-mer genie output

7.3 years ago

stacy734 ▴ 40

Hi everyone,

I installed k-mer genie and ran it on a 120GB Illumina dataset.

I get this output:

running histogram estimation
Extrapolating number of distinct kmers 70000| processing | [---------------------------------------------------
Total time Wallclock 20664.4 s
Linear estimation: ~23513 M distinct 71-mers are in the reads
K-mer sampling: 1/4484
going to estimate histograms for values of k: 121 111 101 91 81 71 61 51 41 31 21
fitting model to histograms to estimate best k

Caught exception in fit_histogram worker thread (histfile = histograms-k121.histo):
Traceback (most recent call last):
  File "/panfs/pan1.be-md.ncbi.nlm.nih.gov/refseq_wgs/kmer_genie/kmergenie-1.7016/scripts/decide", line 63, in fit_histogram
    rc, stdout, stderr = run(command)
  File "/panfs/pan1.be-md.ncbi.nlm.nih.gov/refseq_wgs/kmer_genie/kmergenie-1.7016/scripts/decide", line 44, in run
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "/opt/python-2.7env/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/opt/python-2.7env/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied

(similar lines deleted)

Execution of 'scripts/decide' failed (return code 1). If this is a fresh Kmergenie install, try running 'make check'.

Note:
1. make check was "OK".
2. Rscript did not produce an error.

Question:

Since the output indicated "Linear estimation: ~23513 M distinct 71-mers are in the reads", does that mean that 71 was the best value of k? If that is the case I don't really need to see the histogram.

Any advice on what the problem might be, or the interpretation of the output I did get would be appreciated.

kmer k-mer kmer genie • 2.2k views


First off, I do not believe that Kmer Genie is based on solid theoretical ground, and do not recommend using it to pick a kmer length; there is no reason to believe the kmer length with the greatest unique kmer count would yield the best assembly. The best approach is to do multiple assemblies and pick the one that comes out the best. You can do that very quickly with BBMap's tadwrapper.sh program, but for optimal results (time permitting), you should do it with the actual assembler you plan to use.
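To make "pick the one that comes out the best" concrete, here is a minimal sketch that compares assemblies built at different k values. It assumes each assembly is a FASTA file of contigs and uses contig N50 as the quality score; N50 is only one possible criterion and is an assumption here, not something prescribed above. The function names and the file names in the usage note are hypothetical.

```python
def contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file."""
    lengths = []
    current = 0
    seen_header = False
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts: store the previous one, if any.
                if seen_header:
                    lengths.append(current)
                seen_header = True
                current = 0
            elif seen_header:
                current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def best_k_by_n50(assemblies):
    """assemblies maps k -> path to a contig FASTA; returns (best k, {k: N50})."""
    scores = {k: n50(contig_lengths(path)) for k, path in assemblies.items()}
    return max(scores, key=scores.get), scores
```

For example, `best_k_by_n50({31: "contigs_k31.fa", 61: "contigs_k61.fa"})` would return the k whose contigs have the higher N50, along with the scores. In practice you would want to look at more than one metric (total assembly size, number of contigs, misassemblies) before settling on a k.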

If you want to use Kmer Genie anyway, and are encountering errors, please make sure you are using the latest version. That said, the OSError: [Errno 13] raised inside subprocess.Popen suggests that you do not have permission to execute one of the program's files, or possibly permission to write somewhere. Check the permissions of the Kmer Genie install directory and of the directory you are running it in, and make sure Python is installed correctly.
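One way to check the permission side of this is a small script that lists files the current user cannot execute. This is a generic sketch using only the Python standard library; the helper names are made up for illustration, and I am not asserting which file Kmer Genie actually tries to launch.

```python
import os


def list_files(directory):
    """Return every regular file under a directory, recursively, in sorted order."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)


def find_non_executable(paths):
    """Return the paths that are regular files the current user cannot execute."""
    return [p for p in paths if os.path.isfile(p) and not os.access(p, os.X_OK)]
```

Running `find_non_executable(list_files("scripts"))` from inside the install directory would list the candidates. Not every file in there needs the execute bit, so treat the result as a list of things to look at rather than a list of errors. Also note that `os.access` only checks permission bits for the current user; a filesystem mounted with `noexec` can still refuse to run a file that passes this check.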

ADD REPLY • link 7.3 years ago by Brian Bushnell 20k

Thank you for the sane advice.

I will proceed with multiple assemblies with different k-mers.

ADD REPLY • link 7.3 years ago by stacy734 ▴ 40