Question: DSK error on k-mer lengths up to, say, 160
Asked 2.5 years ago by s.vandenhurk (Netherlands):

I am looking for a k-mer analysis tool that can handle k-mers longer than 31. I thought I had hit the jackpot when I read about DSK, but so far I have had no luck. Is there any other easy-to-use tool that can handle k-mers of size 91 or more?

I tried the Linux package and got it working up to 31-mers. Then I installed from source and also got it working up to 31-mers. Then I read the entire manual (I should have done so in the first place). I ran:

rm -Rf CMake* && cmake -Dk4=160 .. && make

and got some errors; 31-mers still worked fine. Then I changed my source installation from:

cmake

to:

cmake -Dk4=150 ..

and got the same errors. So now I am wondering if you could help me fix this.

I have gcc version 4.8.2

The errors I get:

[100%] Building CXX object utils/CMakeFiles/dsk2ascii.dir/dsk2ascii.cpp.o
In file included from /home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/gatb_core.hpp:38:0,
                 from /home/SHU/Desktop/DSK/dsk-2.0.2-Source/utils/dsk2ascii.cpp:3:
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/tools/collections/impl/BagFile.hpp: In instantiation of ‘gatb::core::tools::collections::impl::BagCountCompressedFile<Item>::~BagCountCompressedFile() [with Item = gatb::core::kmer::impl::Kmer<>::Count]’:
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/utils/dsk2ascii.cpp:72:1:   required from here
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/tools/collections/impl/BagFile.hpp:179:189: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘u_int64_t {aka long unsigned int}’ [-Wformat=]
         printf("In %llu B  (%llu MB ) Out %llu  B  (%llu MB ) ratio  %f \n",_sizeInput,_sizeInput/(1024LL*1024LL), _sizeOutput,_sizeOutput/(1024LL*1024LL), _sizeInput / (float) _sizeOutput);
                                                                                                                                                                                             ^
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/tools/collections/impl/BagFile.hpp:179:189: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 4 has type ‘u_int64_t {aka long unsigned int}’ [-Wformat=]
cc1plus: warning: unrecognized command line option "-Wno-ambiguous-member-template" [enabled by default]

from /home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/bank/impl/BankBinary.cpp:20:
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/system/api/Exception.hpp: In constructor ‘gatb::core::system::ExceptionErrno::ExceptionErrno(const char*, ...)’:
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/system/api/Exception.hpp:140:47: warning: ignoring return value of ‘char* strerror_r(int, char*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]
             strerror_r (errno, buffer, BUFSIZ);
                                               ^
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/bank/impl/BankBinary.cpp: In function ‘bool gatb::core::bank::impl::checkMagic(FILE*)’:
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/bank/impl/BankBinary.cpp:54:43: warning: ignoring return value of ‘size_t fread(void*, size_t, size_t, FILE*)’, declared with attribute warn_unused_result [-Wunused-result]
     fread (&value, sizeof(value), 1, file);
                                           ^

The error I get when running dsk2ascii:

EXCEPTION: Type 'LargeInt<1>' has too low precision (64 bits) for the required 51 kmer size
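
A rough way to read this message, assuming each nucleotide is packed into 2 bits (an assumption about the storage format, not something stated in this thread): a 51-mer would need 102 bits, which does not fit in the single 64-bit word that 'LargeInt<1>' reports. The helper names below are made up for illustration and are not part of DSK or GATB.

```python
BITS_PER_NUCLEOTIDE = 2  # assumption: A, C, G, T packed into 2 bits each
WORD_BITS = 64           # width quoted in the 'LargeInt<1>' message


def bits_for_kmer(k: int) -> int:
    """Raw bits needed to store one k-mer under the 2-bit assumption."""
    return k * BITS_PER_NUCLEOTIDE


def fits_in_words(k: int, words: int) -> bool:
    """True if a k-mer fits in `words` 64-bit words (raw arithmetic only)."""
    return bits_for_kmer(k) <= words * WORD_BITS


for k in (31, 51, 160):
    print(k, bits_for_kmer(k), fits_in_words(k, 1))
```

This raw arithmetic does not model where DSK draws its exact cutoffs; it only shows why a 51-mer cannot fit in 64 bits.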

 


Hello,

Just to be sure: did you actually manage to build "dsk" with your k4 setting? From the traces you gave, it seems that the compilation worked (you got only warnings).

However, I think you have pointed out a potential issue in the "dsk2ascii" binary for k-mer sizes >= 128. I am going to provide a fix and will tell you when it is available.

Note that "dsk" itself should work; only "dsk2ascii" has the issue.

Reply written 2.5 years ago by edrezen

dsk itself seems to work just fine, that is true, but dsk2ascii does not. It works perfectly with 31-mers, but not with k-mers of length 32 or more.

Reply written 2.5 years ago by s.vandenhurk
Answer written 2.5 years ago by edrezen (France):

Hello,

I have put a new version of DSK with a fix for dsk2ascii here: http://gatb-tools.gforge.inria.fr/versions/src/dsk-2.0.3-Source.tar.gz

Can you tell me if it works?


It works, thank you so much.

Reply written 2.5 years ago by s.vandenhurk

I will let you know if it works when I get back to work on Monday. I don't have access to the network from home. Thanks in advance.

Reply written 2.5 years ago by s.vandenhurk

It works up to a k-mer size of 101. Beyond that, the partitioning step gets stuck at 0% with a reported CPU usage of -1%. Any ideas?

Reply written 2.4 years ago by s.vandenhurk

In order to investigate, could you tell us:

  • what is the size of your bank?
  • do all the sequences in the bank have the same size? If not, do you have an idea of the min/max sizes of the sequences?

Is it possible for you to provide the bank? It would make it easier for us to find out what is happening.
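
If the bank cannot be shared, one way to answer the min/max question is to scan the FASTA file once and track the sequence lengths. The following is only a minimal sketch in plain Python (not part of DSK; the function names are made up for illustration). It assumes ordinary FASTA input where each record starts with a ">" header line and the sequence may span several lines.

```python
from typing import Iterable, Iterator, Optional, Tuple


def sequence_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each record found in FASTA-formatted lines."""
    length: Optional[int] = None  # None until the first header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # previous record is complete
            length = 0
        elif line and length is not None:
            length += len(line)  # sequence data, possibly wrapped
    if length is not None:
        yield length  # last record in the input


def min_max_lengths(lines: Iterable[str]) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (number of sequences, shortest length, longest length)."""
    count, lo, hi = 0, None, None
    for n in sequence_lengths(lines):
        lo = n if lo is None else min(lo, n)
        hi = n if hi is None else max(hi, n)
        count += 1
    return count, lo, hi


# Tiny in-memory example: r1 has 8 bases over two lines, r2 has 3.
example = [">r1", "ACGT", "ACGT", ">r2", "ACG"]
print(min_max_lengths(example))  # (2, 3, 8)
```

For a real file, an open file handle can be passed instead of the list, so the input is read line by line rather than loaded into memory at once.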

Reply written 2.4 years ago by edrezen

I am not 100% sure, but I believe all sequences in my file are 101 bp long. The FASTA file I want to count is 11 GB. Unfortunately, I can't share the actual file because it is company property.

/Desktop/DSK/dsk-2.0.3-Source/build$ ./dsk -file 'input/location.fasta' -kmer-size 160
[counting kmers]  0    %   elapsed:   0 min 0  sec    estimated remaining:   0 min 0  sec   cpu:   -
[DSK: Collecting stats on read sample   ]  0    %   elapsed:   0 min 0  sec    estimated remaining: 
[DSK: Collecting stats on read sample   ]  2    %   elapsed:   0 min 0  sec    estimated remaining: 
etc
[DSK: Collecting stats on read sample   ]  97   %   elapsed:   0 min 19 sec    estimated remaining: 
[DSK: Collecting stats on read sample   ]  98   %   elapsed:   0 min 19 sec    estimated remaining: 
[DSK: Collecting stats on read sample   ]  99   %   elapsed:   0 min 19 sec    estimated remaining: 
[DSK: Collecting stats on read sample   ]  100  %   elapsed:   0 min 20 sec    estimated remaining: 
[DSK: Collecting stats on read sample   ]  100  %   elapsed:   0 min 20 sec    estimated remaining:   0 min 0  sec   cpu:   21.9 %   mem: [ 98,  98,  98] MB 
[DSK: Pass 1/10328094, Step 1: partitioning    ]  0    %   elapsed:   0 min 0  sec    estimated remaining:   0 min 0  sec   cpu:   -1.0 %   mem: [ 98,  98,  99] MB

It remains stuck at this step.

Reply modified 2.4 years ago • written 2.4 years ago by s.vandenhurk

Ok, thanks for the information.

In fact, dsk cuts each sequence into k-mers, so a sequence of length N yields N-K+1 k-mers, where K is the k-mer size. In your case, you are trying to use a k-mer size of 160, which is longer than sequences of 101 bp, so such sequences yield no k-mers at all. In other words, you should not use a k-mer size bigger than the length of your sequences.

Nevertheless, we have found a flaw in dsk that shows up when there are many sequences of the same size and a few sequences of a much bigger size. The consequence is that the number of passes (see 10328094 in the output you gave) is wrongly computed, which may lead to strange behavior.

For now, I suggest using a k-mer size that is not too big (99, for instance, in your case), so that it is certainly smaller than the length of your sequences.
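
To make the N-K+1 rule concrete, here is a minimal Python sketch (not DSK code; the function name is made up for illustration). It assumes reads of 101 bp, as reported earlier in this thread.

```python
def kmers_per_sequence(seq_len: int, k: int) -> int:
    """Number of k-mers in one sequence of length seq_len: N - K + 1, or 0 if k > N."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0, seq_len - k + 1)


READ_LENGTH = 101  # assumed read length, taken from the thread

for k in (31, 99, 101, 160):
    print(f"k={k}: {kmers_per_sequence(READ_LENGTH, k)} k-mers per read")
# k=31: 71, k=99: 3, k=101: 1, k=160: 0
```

With k = 160, every 101 bp read contributes zero k-mers, so there is nothing to count.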

Reply written 2.4 years ago by edrezen

Yeah, I'm not sure what I was thinking, trying k-mers longer than 101. I should map my reads, create contigs, and do a k-mer analysis on those contigs... Thanks for waking me up.

Reply written 2.4 years ago by s.vandenhurk