What's the best k-mer size for Sailfish?
7.5 years ago
Ann ★ 2.3k


Does anyone know the best (or recommended) k-mer size to use with Sailfish for Arabidopsis thaliana RNA-seq data?

Background: To use Sailfish, you must first build a transcript index from a FASTA file of your transcripts. Building the index requires choosing a k-mer size.
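
For readers who have not built an index before, here is a minimal sketch of where the k-mer size enters. It only assembles the command line and does not run it. The file and directory names are hypothetical, and the `-t`, `-o`, and `-k` options are what I believe `sailfish index` accepts, so check `sailfish index --help` for your installed version.

```python
import shlex


def sailfish_index_command(transcripts_fasta, index_dir, k=31):
    """Assemble (but do not run) a `sailfish index` command line.

    Assumption: `-t` is the transcript FASTA, `-o` the output index
    directory, and `-k` the k-mer size. Verify against your version.
    """
    return [
        "sailfish", "index",
        "-t", transcripts_fasta,
        "-o", index_dir,
        "-k", str(k),
    ]


# Hypothetical file names, for illustration only.
cmd = sailfish_index_command("athaliana_transcripts.fa", "athaliana_index", k=31)
print(shlex.join(cmd))
```

To actually build the index, the list could be passed to `subprocess.run(cmd, check=True)` once Sailfish is on your `PATH`.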

Also, if anyone needs a copy of the Arabidopsis transcript FASTA sequences, let me know. I can provide the same for other plants if you want. They are easy to generate with `getfasta` from bedtools and a few other command-line utilities.
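
As a rough illustration of the bedtools step, the sketch below assembles a `bedtools getfasta` command for a BED12 file of transcript models. The file names are hypothetical, and this is not necessarily the exact pipeline used for the files offered above. The flags `-fi`, `-bed`, `-s`, `-split`, `-name`, and `-fo` exist in bedtools, but the FASTA header produced by `-name` differs between bedtools versions, so inspect the output before indexing.

```python
import shlex


def getfasta_command(genome_fasta, transcripts_bed12, out_fasta):
    """Assemble (but do not run) a `bedtools getfasta` command line.

    -split joins the BED12 blocks (exons) into one spliced sequence,
    -s reverse-complements features on the minus strand,
    -name labels each record using the BED name column.
    """
    return [
        "bedtools", "getfasta",
        "-fi", genome_fasta,
        "-bed", transcripts_bed12,
        "-s", "-split", "-name",
        "-fo", out_fasta,
    ]


# Hypothetical file names, for illustration only.
getfasta_cmd = getfasta_command("genome.fa", "transcripts.bed", "transcripts.fa")
print(shlex.join(getfasta_cmd))
```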


6.0 years ago
Rob 5.0k

Now that Sailfish uses quasi-mapping rather than independent k-mer counting, the k parameter has a very different interpretation (and, generally, a less significant effect). The answer to this question should depend primarily on the length and quality of the reads you plan to use for quantification. If you're dealing with reads > 75bp and of "reasonable" quality, it's probably safe to stick with the default of k = 31. For shorter reads, you may want to reduce the value of k.
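
The rule of thumb above can be sketched as a small helper. Only the "reads > 75bp, keep k = 31" part comes from the advice itself. How far to reduce k for shorter reads is not specified here, so the "largest odd k no bigger than half the read length" rule and the floor of 15 are purely illustrative assumptions, not a Sailfish recommendation. Odd values are used because an odd-length k-mer cannot be its own reverse complement.

```python
def suggest_k(read_length, default_k=31, safe_read_length=75, min_k=15):
    """Suggest a k-mer size for indexing, given the read length in bp.

    Reads longer than `safe_read_length` keep the default k.
    ASSUMPTION (illustrative only): for shorter reads, use the largest
    odd k that is at most half the read length, capped at `default_k`
    and never below `min_k`.
    """
    if read_length > safe_read_length:
        return default_k
    k = min(default_k, read_length // 2)
    if k % 2 == 0:
        k -= 1  # step down to the nearest odd value
    return max(k, min_k)


for length in (100, 50, 36):
    print(length, suggest_k(length))
```

Whatever value you settle on, it is worth quantifying a sample or two at a couple of different k values and comparing mapping rates before committing to one.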


