K-mer selection for bacterial WGS de novo assembly using SPAdes or SOAPdenovo
5.0 years ago

Hi friends,

We have WGS data for a bacterial sample: 75 bp paired-end reads at more than 200X coverage. After trimming, read lengths range from 20 to 75 bp.

I am going to try de novo assembly with SPAdes 3.13.1 and SOAPdenovo.

What criteria should be used to select the k-mer sizes for assembly?

Assembly SPAdes Soap-denovo WGS bacterial • 2.4k views
5.0 years ago
h.mon 35k

For SPAdes, it is best to let the assembler pick the kmer sizes automatically.

For SOAPdenovo, a good starting point is 2/3 of the maximal read length. There are some BioStars posts on the subject, by the way:

How To Choose The K Value Of Kmer In Soapdenovo?

Kmergenie k-mer estimate and multiple k-mers

using soap de novo assembly

Guidelines to choose K-mer size for De bruijn graph based assembly (2nd generation sequencing reads)?
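
The 2/3 rule of thumb above can be sketched as a small shell helper. This is only an illustration: `suggest_k` is a hypothetical function name, not part of SOAPdenovo, and stepping down to an odd value is an assumption based on the common convention that de Bruijn graph assemblers use odd k (so a k-mer cannot be its own reverse complement). Treat the result as a starting point to test against neighbouring values, not a definitive choice.

```shell
# Hypothetical helper: suggest a starting k of roughly 2/3 of the
# maximum read length, stepped down to the nearest odd number.
suggest_k() {
    sk_read_len=$1
    sk_k=$(( sk_read_len * 2 / 3 ))      # integer division
    if [ $(( sk_k % 2 )) -eq 0 ]; then   # even value: step down by one
        sk_k=$(( sk_k - 1 ))
    fi
    echo "$sk_k"
}

# For 75 bp reads: 75 * 2 / 3 = 50, stepped down to 49
suggest_k 75
```

Reads shorter than k contribute no k-mers, so with trimmed reads as short as 20 bp a fraction of the data will go unused at this k.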


Hi h.mon, I am currently using SPAdes with default settings, where the k-mer sizes are chosen automatically. I have more than 430 million reads for a bacterial sample. Do you know of any tool to subset/down-sample them to a lower coverage?


Don't subset; use digital normalization, which is a better technique for reducing coverage without losing information. Several packages perform digital normalization; I use BBNorm (from the BBTools package) when I need to.

If you really want to down-sample, you can use reformat.sh (from the same BBTools package). For example, to down-sample to 10% of the original reads:

reformat.sh samplerate=0.1 in=original.fastq out=downsampled.fastq
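
If the current and desired coverage are roughly known, the value for samplerate is simply their ratio. The sketch below is a minimal illustration: `sample_rate` is a hypothetical helper, the coverage figures are example inputs, and the final line only prints the reformat.sh command rather than running it.

```shell
# Hypothetical helper: fraction of reads to keep when going from
# the current coverage ($1) to the target coverage ($2).
sample_rate() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN { printf "%.4f\n", tgt / cur }'
}

# Example inputs: from about 10000X down to about 1000X
rate=$(sample_rate 10000 1000)

# Print (do not run) the corresponding command; file names as in the example above
echo "reformat.sh samplerate=$rate in=original.fastq out=downsampled.fastq"
```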

Hi h.mon, I am aiming to reduce the coverage from 10000X to 1000X, so in my case I need digital normalization with BBNorm rather than down-sampling with reformat.sh:

bbnorm.sh in=reads.fq out=normalized.fq target=1000 min=30

What "min" is reasonable to get 1000X coverage?


I have asked a different set of questions, related to this one, in another post: "Should I consider contigs.fa or scaffolds.fa from SPAdes output for downstream analyses?"


Hi h.mon,

Using BBNorm, can we down-sample to a specific read coverage? I saw that the target option in BBNorm refers to k-mer coverage. What value should I set for target in order to get 100X read coverage?


As far as I know, you can't down-sample straight to a target read coverage without an assembled genome, so you have to settle for k-mer coverage. Use target=100 and, after assembly, map the reads and check whether you got the expected coverage, then adjust target as needed, though I expect it will be close enough. As reads may contain errors, I expect target=100 will end up with slightly higher read coverage.

However, why do you want to do this? De Bruijn graph assemblers measure coverage in k-mers, not reads.
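
For a rough idea of how k-mer coverage relates to read coverage, the commonly used idealized relation is Ck = C * (L - k + 1) / L, where C is read coverage, L the read length and k the k-mer size. The sketch below inverts it. It is an approximation only: it assumes error-free reads of uniform length, `read_cov_from_kmer_cov` is a hypothetical helper, and k=31 is assumed here to be the k-mer size BBNorm uses (check the value in your own run). Mapping the reads after assembly, as suggested above, remains the actual check.

```shell
# Hypothetical helper: approximate read coverage from k-mer coverage,
# using C = Ck * L / (L - k + 1).
# Arguments: k-mer coverage, read length, k-mer size.
read_cov_from_kmer_cov() {
    awk -v ck="$1" -v len="$2" -v k="$3" \
        'BEGIN { printf "%.1f\n", ck * len / (len - k + 1) }'
}

# k-mer coverage 100, 75 bp reads, k=31 (assumed): 100 * 75 / 45
read_cov_from_kmer_cov 100 75 31
```

With trimmed reads shorter than 75 bp the gap between the two measures widens, since shorter reads yield fewer k-mers per base.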
