Question: kmergenie gives k value larger than read size
Illinu (Belgium) wrote 2.9 years ago:

Hello,

KmerGenie is supposed to support several libraries in the same run, and the manual says to include all paired-end libraries that will be used for the assembly. I estimated the best k for two paired-end libraries: one has reads of 90-99 bp and the other has reads of 290-299 bp. The reported best k is 103, which cannot work for the first library because it is longer than its reads.
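To make the objection concrete: a read of length L contains L - k + 1 k-mers, and none at all once k exceeds L. Here is a minimal sketch of that arithmetic for the two libraries described above. The function name `kmers_per_read` is made up for this illustration and is not part of KmerGenie.

```python
def kmers_per_read(read_length: int, k: int) -> int:
    """Number of k-mers in one read: L - k + 1, or 0 when k exceeds L."""
    return max(0, read_length - k + 1)


k = 103
libraries = [("short library", (90, 99)), ("long library", (290, 299))]
for label, (shortest, longest) in libraries:
    # Evaluate the shortest and longest read length reported for each library.
    low = kmers_per_read(shortest, k)
    high = kmers_per_read(longest, k)
    print(f"{label}: {shortest}-{longest} bp reads give {low}-{high} k-mers each at k={k}")
```

At k=103, every k-mer that gets counted comes from the 290-299 bp reads; the 90-99 bp reads contribute none.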

Any ideas?

Rayan Chikhi (France, Lille, CNRS) wrote 2.9 years ago:

Hi,

Yes. Perhaps you have so much coverage in your 290 bp library that using it alone, with a high k, is enough to get a very good assembly, and that works better than setting a low k just so the 90 bp library can be used. Could you try it?

Rayan

Illinu (Belgium) wrote 2.9 years ago:

Hi Rayan,

The funny thing is that when I run KmerGenie on the longer-read PE library alone, I get a best k of 81...


Ohh that is odd. Could you please send me both HTML reports?

Reply written 2.9 years ago by Rayan Chikhi

Hi Rayan, when I run KmerGenie on the cluster, the HTML report is not generated. I tried running it on my desktop, but it takes forever. Is there any alternative?

Reply written 2.9 years ago by Illinu

It might be sufficient to copy-paste the .dat file here. If possible, could you also send the .histo/.pdf files to kmergenie@cse.psu.edu?

To get reports, you can ask your cluster administrator to install Ghostscript. KmerGenie uses it to generate reports on machines where X is not running, such as clusters.
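Before contacting the administrator, it may be worth checking whether Ghostscript is already on the cluster's PATH. A minimal sketch, assuming the executable is called `gs` (its usual name on Linux; whether KmerGenie looks it up under that name is an assumption here). The helper `ghostscript_path` is made up for this example.

```python
import shutil


def ghostscript_path(executable: str = "gs"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


path = ghostscript_path()
if path is None:
    print("Ghostscript not found on PATH; it probably needs to be installed.")
else:
    print(f"Ghostscript found at {path}")
```

Run it on the same node type that runs the jobs, since login and compute nodes do not always have the same software installed.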

Reply written 2.9 years ago by Rayan Chikhi

I sent you everything by email. Thanks

Reply written 2.9 years ago by Illinu

I've replied to Illinu by email, but let me copy my response here in case anyone is interested. Also note that his organism is diploid.

Thanks very much for the data; it's very interesting.

It seems that, for the long reads alone, a k value of 180 would work as
well as it does for the short+long reads. To see this, notice that the
histogram (.pdf) of the long reads at k=180 looks very similar to the
short+long reads histogram at k=180. However, for the long reads alone,
KmerGenie failed to fit the diploid model and hence could not predict the
number of genomic k-mers.
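For readers unfamiliar with these histograms: a k-mer abundance histogram records, for each abundance value, how many distinct k-mers occur that many times in the reads. The toy sketch below shows the idea only. It is not KmerGenie's implementation: it does exact counting on a few short strings and ignores reverse complements, and the name `abundance_histogram` is made up for this example.

```python
from collections import Counter


def abundance_histogram(reads, k):
    """Map each abundance to the number of distinct k-mers seen that many times."""
    kmer_counts = Counter(
        read[i:i + k]
        for read in reads
        # The range is empty when a read is shorter than k, so such reads add nothing.
        for i in range(len(read) - k + 1)
    )
    return Counter(kmer_counts.values())


# Toy input: two 8 bp reads and one 3 bp read that is too short for k=4.
reads = ["ACGTACGT", "CGTACGTA", "ACG"]
print(sorted(abundance_histogram(reads, 4).items()))
```

Reads shorter than k drop out of the count entirely, which is consistent with the long-read and short+long histograms looking alike at k=180.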

Anyhow, I think that the k=81 prediction for the long reads alone is
probably not the best here.

It seems that the diploid fit in KmerGenie could be improved to handle this dataset, but I don't really know how right now.

Anyhow, a best k value larger than the read length of the shorter-read library is still very likely here.

Reply written 2.9 years ago by Rayan Chikhi (modified 2.9 years ago)