SPAdes assembler: is a k-mer size > 127 (the default maximum) possible?
7.7 years ago
Anand Rao

This is cross-posted from "Increasing kmer limit in SPAdes". That post is more than a year old and may never receive a reply, so I am re-posting the question here.

Based on the assembly-statistics comparison between k-mer = 20 and higher values in the thread "choice of k-mer size for metagenomic assembly", I'd like answers to these questions:

1. Is it possible to increase k beyond 127 in SPAdes?

2. Is it NECESSARY to test k-mer values > 127 with SPAdes? [Trimming before the SPAdes assembly yielded a read-length distribution of 41-150 bp.]

I have 2x150 bp paired-end data from fungal spores (haploid genome). I want to try higher k-mer values to see whether assembly contiguity and completeness improve, so sharing your experience would help me. Thank you!
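To compare several k values in one run, SPAdes takes a comma-separated list through its -k option. The Python sketch below only builds the spades.py argument list and checks it against the rules the SPAdes manual gives for -k (odd values, below 128 in a stock build, ascending order). Several pieces are assumptions for illustration: the helper names validate_k_list and spades_command are hypothetical, the max_k parameter is meant to model a rebuilt binary, the strict-ascending/no-duplicates check is a guess, and the file names are hypothetical.

```python
from typing import List, Sequence

# Stock SPAdes limit: -k values must be odd, less than 128, ascending.
# Treating the limit as a parameter for rebuilt binaries is an assumption.
DEFAULT_MAX_K = 127


def validate_k_list(ks: Sequence[int], max_k: int = DEFAULT_MAX_K) -> None:
    """Hypothetical check: raise ValueError if ks breaks the rules above."""
    if not ks:
        raise ValueError("at least one k is required")
    for k in ks:
        if k % 2 == 0:
            raise ValueError(f"k={k} is even; SPAdes needs odd k")
        if k > max_k:
            raise ValueError(f"k={k} exceeds max_k={max_k}")
    # Assumed: values strictly ascending with no duplicates.
    if list(ks) != sorted(set(ks)):
        raise ValueError("k values must be strictly ascending")


def spades_command(r1: str, r2: str, ks: Sequence[int], outdir: str,
                   max_k: int = DEFAULT_MAX_K) -> List[str]:
    """Build (but do not run) a spades.py argument list for paired-end reads."""
    validate_k_list(ks, max_k)
    return ["spades.py", "-1", r1, "-2", r2,
            "-k", ",".join(str(k) for k in ks),
            "-o", outdir]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = spades_command("reads_R1.fastq", "reads_R2.fastq",
                         [21, 33, 55, 77, 99, 127], "spades_k127")
    print(" ".join(cmd))
```

The returned list could be passed to subprocess.run once the paths are real. Validating first means an out-of-range k fails immediately instead of partway through a long assembly.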

Tags: Genome assembly, SPAdes, k-mer
7.7 years ago

With 150 bp reads, it is very unlikely that you will benefit from K > 127, which is convenient because you can't do that with SPAdes. You can extend reads by merging them or with Tadpole, though, which will often improve the assembly. If you download BBMap, a suggested procedure for this is in bbmap/pipelines/assemblyPipeline.sh.
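The reasoning is simple arithmetic. A read of length L contains L - k + 1 k-mers, and none at all once k > L. The minimal Python sketch below counts the k-mers that a set of reads contributes at each k. The helper names are hypothetical, and the read lengths are made-up values within the 41-150 bp range from the question, not real data.

```python
from typing import Dict, Iterable


def kmers_in_read(read_len: int, k: int) -> int:
    """Number of k-mers a read of length read_len contributes (0 if read_len < k)."""
    return max(0, read_len - k + 1)


def kmer_yield(read_lengths: Iterable[int], ks: Iterable[int]) -> Dict[int, int]:
    """Total k-mers contributed by all reads, for each k."""
    lengths = list(read_lengths)
    return {k: sum(kmers_in_read(n, k) for n in lengths) for k in ks}


if __name__ == "__main__":
    # Hypothetical trimmed read lengths within the 41-150 bp range.
    lengths = [41, 80, 120, 140, 150, 150, 150]
    for k, total in kmer_yield(lengths, [21, 77, 127, 149]).items():
        print(f"k={k}: {total} k-mers")
```

With these lengths, only the full-length 150 bp reads still contribute at k = 149, and each contributes just two k-mers. This illustrates why raising k above 127 gains little with 150 bp reads unless the reads are first extended.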

Incidentally, Tadpole supports arbitrarily large values of K, but SPAdes will still generally give a substantially more contiguous assembly.


OK, good to know k>127 is not possible with SPAdes.

This is possibly a premature question, since I first need to read your "suggested procedure for this process in bbmap/pipelines/assemblyPipeline.sh" :)

But is your recommendation to extend the reads with Tadpole and THEN run SPAdes on the extended reads?

I'll take a look at BBTools 37.5 once the admins of my university's HPCC install it. I am assuming the merging you mention does not use bbmerge.sh, or else you might have suggested that instead...


No, the assembler options in that script are just there to show the usage syntax, because I tend to forget it; you only need to use one of them. The choice of assembler depends somewhat on the dataset.

As for BBMerge, yes, I am talking about that when I say "merge" :)
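Merging helps because an overlapping pair from an insert of size I collapses into a single read of length I. That read can be longer than 150 bp and can therefore support longer k-mers. The sketch below is a deliberately naive illustration of the idea: it finds an exact-match overlap between read 1 and the reverse complement of read 2. It is not BBMerge's algorithm. Unlike a real merger, it does not tolerate sequencing errors, ignores quality scores, and does not handle inserts shorter than the read length. The helper names are hypothetical.

```python
from typing import Optional

# Complement table; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def naive_merge(r1: str, r2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge a read pair whose ends overlap exactly, or return None.

    r2 is reverse-complemented so both reads are on the same strand. The
    longest exact suffix(r1) == prefix(rc(r2)) overlap of at least
    min_overlap bases is used to join them.
    """
    rc2 = reverse_complement(r2)
    max_olap = min(len(r1), len(rc2))
    for olap in range(max_olap, min_overlap - 1, -1):
        if r1[-olap:] == rc2[:olap]:
            return r1 + rc2[olap:]
    return None


if __name__ == "__main__":
    # Toy 30 bp fragment read from both ends as 20 bp reads (10 bp overlap).
    fragment = "AAGGCTTACCGTAGCATGCAATCCGGTTAC"
    r1 = fragment[:20]
    r2 = reverse_complement(fragment[10:])
    merged = naive_merge(r1, r2)
    print(merged, len(merged) if merged else 0)
```

In the toy example, two 20 bp reads become one 30 bp sequence, so a k of up to 29 could now be used on this fragment.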

7.0 years ago
ikangkim

https://github.com/ablab/spades/issues/40

The GitHub issue above may be helpful. I tried this with SPAdes 3.11.1 and it worked.

In brief,

(1) To use k > 127, you have to download the SPAdes source code and compile it yourself.

(2) Edit the cmake line in the "spades_compile.sh" file as below, then run the script.

cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX="$PREFIX" -DSPADES_MAX_K=251 $* "$BASEDIR/src"

(3) After compiling, edit the "options_storage.py" file in /SPAdes-3.xx.x/share/spades/spades_pipeline as below (probably around line 40, in the "#other constants" section).

MAX_K = 251

(Note that 251 was just my preference.)
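Step (3) amounts to changing one assignment in options_storage.py. A small Python helper like the hypothetical set_constant below edits the assignment by name rather than by line number. That can help, because the line number is not fixed between versions (around line 40 in 3.11.1 here, line 59 in the v3.15.2 reply). The helper assumes the constant sits on its own top-level line of the form NAME = value, and the companion patch_file helper (also hypothetical) keeps a .bak copy of the original file.

```python
import re
from pathlib import Path


def set_constant(source: str, name: str, value: str) -> str:
    """Return source with the line `NAME = ...` replaced by `NAME = value`.

    Raises ValueError unless exactly one such top-level line exists.
    """
    pattern = re.compile(rf"^{re.escape(name)}\s*=.*$", re.MULTILINE)
    # A function replacement avoids backslash interpretation in value.
    new_source, count = pattern.subn(lambda _m: f"{name} = {value}", source)
    if count != 1:
        raise ValueError(f"expected one '{name} = ...' line, found {count}")
    return new_source


def patch_file(path: Path, name: str, value: str) -> None:
    """Edit NAME in place, saving the original as <file>.bak first."""
    original = path.read_text()
    patched = set_constant(original, name, value)  # fails before writing anything
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(patched)


if __name__ == "__main__":
    demo = "# other constants\nMIN_K = 1\nMAX_K = 127\n"  # hypothetical excerpt
    print(set_constant(demo, "MAX_K", "251"))
```

On a real install, the call would look something like patch_file(Path("/SPAdes-3.xx.x/share/spades/spades_pipeline/options_storage.py"), "MAX_K", "251"). This edit only raises the limit that the Python wrapper accepts. The compiled binaries still need the -DSPADES_MAX_K change from step (2).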


Thanks, these comments are very useful. We ran into installation and memory problems with this process (SPAdes v3.15.2), but we eventually got around them. The full process is outlined below in case anyone encounters similar problems.

(i) Line 224 in the spades_compile.sh script was replaced with:

cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX="$PREFIX" -DSPADES_MAX_K=251 $* "$BASEDIR/src"

(ii) CMake (v3.15.3) was loaded in our local Linux environment, and the modified compilation script was run following the standard installation instructions.

(iii) Compilation created the Python script "options_storage.py" in /share/spades/spades_pipeline/. This script was then edited at line 59 to read "MAX_K = 251", so that longer k-mers would be accepted when running SPAdes. Line 74 was changed to "K_MERS_250 = [21, 33, 55, 77, 99, 127, 249]", so that a longer k-mer was automatically included when SPAdes built its combined assembly from different k-mer sizes. (Note: for memory reasons, a value of 249 is used here, just under the new maximum limit of 251.)

(iv) In initial testing, these changes for longer k-mers caused runs to fail because of memory issues. To fix this, lines 69 and 70 of "options_storage.py" were also changed to "THREADS = 32" and "MEMORY = 800" respectively. These are the default thread count and the memory limit in Gb, which can also be set per run with the -t and -m options of spades.py. With these changes, the assembly ran to completion.
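One rough way to see why longer k-mers cost memory is to assume k-mers are packed at 2 bits per base into 64-bit words. This is a common scheme, but that SPAdes stores them exactly this way is an assumption here. Under that assumption, the number of words needed per k-mer grows with k. The Python sketch below does that arithmetic with hypothetical helper names.

```python
def words_per_kmer(k: int, bits_per_base: int = 2, word_bits: int = 64) -> int:
    """Machine words needed to store one k-mer packed at bits_per_base."""
    # Integer ceiling division: ceil(k * bits_per_base / word_bits).
    return (k * bits_per_base + word_bits - 1) // word_bits


def bytes_per_kmer(k: int) -> int:
    """Bytes per k-mer, assuming 64-bit (8-byte) words."""
    return 8 * words_per_kmer(k)


if __name__ == "__main__":
    for k in (21, 55, 127, 249, 251):
        print(f"k={k}: {words_per_kmer(k)} words, {bytes_per_kmer(k)} bytes")
```

Under this assumption, a k-mer takes 4 words (32 bytes) at k = 127 but 8 words (64 bytes) at k = 249. Raw per-k-mer storage therefore roughly doubles before any graph or index overhead is counted, which is consistent with the memory failures described above.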

I hope this is of use to others who run into similar issues during this process.
