Question

Jellyfish for transcriptome assembly

0

7.8 years ago

SJ Basu ▴ 50

Hello,

I have 2x150 bp paired-end reads from a plant transcriptome and would like to assemble them with the Oases/Velvet pipeline, but I need to provide a k-mer length, which I have been estimating with Jellyfish. My question: how do I estimate an "appropriate" value for the -m option of jellyfish count?

PS: I used -m 21 to estimate the k-mer size for 2x250 genomic data from a bacterium and used it for the Velvet assembly. It worked wonders there, but it is not working in this case.

RNA-Seq Assembly jellyfish K-mer velvet • 2.6k views

updated 7.8 years ago by Brian Bushnell 20k • written 7.8 years ago by SJ Basu ▴ 50

0

KmerGenie

KmerGenie estimates the best k-mer length for genome de novo assembly. Given a set of reads, KmerGenie first computes the k-mer abundance histogram for many values of k. Then, for each value of k, it predicts the number of distinct genomic k-mers in the dataset, and returns the k-mer length which maximizes this number. Experiments show that KmerGenie's choices lead to assemblies that are close to the best possible over all k-mer lengths. KmerGenie predictions can be applied to single-k genome assemblers (e.g. Velvet, SOAPdenovo 2, ABySS, Minia). However, multi-k genome assemblers (e.g. SPAdes, IDBA) generally perform better with default parameters (using multiple k values), rather than the single best k predicted by KmerGenie.
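To make the idea above concrete, here is a toy Python sketch of the "histogram per k, then pick the k with the most distinct k-mers" loop. It is not KmerGenie's actual method: KmerGenie fits a statistical model to each histogram to estimate the number of genomic k-mers, whereas this sketch assumes a crude stand-in (k-mers seen at least `min_abundance` times). All function names and the threshold are hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads (no reverse-complement merging)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_histogram(counts):
    """Map abundance -> number of distinct k-mers with that abundance."""
    return Counter(counts.values())


def count_solid_kmers(counts, min_abundance=2):
    """Crude proxy (an assumption): distinct k-mers seen at least min_abundance times."""
    return sum(1 for c in counts.values() if c >= min_abundance)


def pick_k(reads, k_values, min_abundance=2):
    """Return the k with the most 'solid' distinct k-mers; ties keep the earlier k."""
    best_k, best_n = None, -1
    for k in k_values:
        n = count_solid_kmers(kmer_counts(reads, k), min_abundance)
        if n > best_n:
            best_k, best_n = k, n
    return best_k


if __name__ == "__main__":
    toy_reads = ["ACGTACGT", "ACGTACGT"]  # made-up reads, for illustration only
    for k in (3, 4, 5):
        print(k, dict(abundance_histogram(kmer_counts(toy_reads, k))))
    print("picked k:", pick_k(toy_reads, [3, 4, 5]))
```

On real data the counting step is what `jellyfish count -m <k>` does far more efficiently; the sketch only shows how the per-k histograms feed the choice of k.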

7.8 years ago by Medhat 9.7k

score 4 · Accepted Answer · 2016-07-20

7.8 years ago

Brian Bushnell 20k

For 2x150 bp reads, depending on your coverage, I suggest trying a few values of K between about 60 and 100 and seeing which gives the best assembly. Methods for estimating the best k-mer length for genomes do not work well on transcriptomes because transcript coverage is highly variable.
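A rough Python sketch of that "try a few K values and compare" loop might look like the following. The answer does not say how to judge the "best" assembly, so N50 is used here purely as an assumed stand-in (it is a debatable metric for transcriptomes). The function names and the contig lengths in the example are hypothetical. Only odd values are generated, since Velvet works with odd k-mer lengths.

```python
def candidate_ks(lo, hi, step=10):
    """Odd k values from lo to hi inclusive; step must be even to stay odd."""
    if step % 2 != 0:
        raise ValueError("step must be even so every candidate stays odd")
    start = lo if lo % 2 == 1 else lo + 1
    return list(range(start, hi + 1, step))


def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length
    return 0  # no contigs at all


def best_k_by_n50(lengths_by_k):
    """Pick the k whose assembly has the highest N50 (assumed quality metric)."""
    return max(lengths_by_k, key=lambda k: n50(lengths_by_k[k]))


if __name__ == "__main__":
    print(candidate_ks(60, 100))
    # Made-up contig lengths per k, standing in for real assembly output.
    fake_assemblies = {61: [100, 200, 300], 71: [500, 500], 81: [50] * 10}
    print("best k by N50:", best_k_by_n50(fake_assemblies))
```

In practice each candidate k would be run through the assembler separately, and the contig lengths read from the resulting FASTA files, before comparing.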