A paper on alignment-free isoform quantification from RNA-seq data, with accompanying software called sailfish, was recently posted to arXiv. Discussion here, here and here. According to the paper, the method is "18 to 29 times faster than the next fastest method while providing expression estimates of equal accuracy" because it uses counts of k-mers instead of mapping reads.
Looks promising - any thoughts or experience with this? How quickly, if at all, should we be switching from our current expression quantification pipelines?
Well, I tried it out! It still looks promising. I ran it on some yeast data and compared the results to manually calculated RPKMs and to RPKMs from cufflinks. It was really fast and the RPKMs look good -- cufflinks is the oddball here, while the sailfish RPKMs are closer to the manually calculated ones (from tophat counts of uniquely aligned reads). The bias correction looks a bit odd for one sample, so I'd look into that more before relying on it.
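For anyone wanting to repeat the "manual" comparison, here is a minimal sketch of the standard RPKM formula (reads per kilobase of transcript per million mapped reads). The gene names, counts and lengths are made-up toy values, and the function name `rpkm` is just for illustration; this is not the exact script I used.

```python
def rpkm(counts, lengths_bp):
    """Return RPKM per gene from raw read counts and gene lengths in bp.

    RPKM = count * 1e9 / (length_bp * total_mapped_reads)
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total)
        for gene, count in counts.items()
    }


# Hypothetical toy input: uniquely aligned read counts and gene lengths.
counts = {"geneA": 500, "geneB": 1500}
lengths_bp = {"geneA": 1000, "geneB": 3000}

values = rpkm(counts, lengths_bp)
for gene in sorted(values):
    print(gene, values[gene])
```

In this toy case geneB has three times the reads but is also three times as long, so both genes end up with the same RPKM, which is the point of the length normalization.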
I assume the researchers we work with might feel more comfortable using something like this once it's actually published and everything, but it seems worth a try to me. Especially for cases where people are interested in a fairly simple known gene expression type of analysis, this could offer a nice (quick) quantification.
Of course, anytime you switch to a new method, it may be good to run both for a while so you notice if anything is really different.
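One simple way to run both side by side is to flag the genes where the two sets of estimates disagree the most. The sketch below is only an illustration under my own assumptions: the function name `flag_discordant`, the pseudocount of 1 and the 2-fold cutoff are arbitrary choices, and the expression values are invented.

```python
import math


def flag_discordant(a, b, min_log2_diff=1.0, pseudocount=1.0):
    """List (gene, log2 difference) for genes where two methods disagree.

    Only genes present in both dicts are compared. The pseudocount
    avoids log(0) and damps differences between very low values.
    """
    flagged = []
    for gene in sorted(set(a) & set(b)):
        diff = math.log2(a[gene] + pseudocount) - math.log2(b[gene] + pseudocount)
        if abs(diff) >= min_log2_diff:
            flagged.append((gene, diff))
    return flagged


# Hypothetical expression estimates from the current and the new pipeline.
old = {"g1": 10.0, "g2": 100.0, "g3": 0.0}
new = {"g1": 11.0, "g2": 24.0, "g3": 0.5}

discordant = flag_discordant(old, new)
for gene, diff in discordant:
    print(gene, round(diff, 2))
```

Here only g2 is reported, since it differs by roughly 4-fold between the two pipelines while the others are within the cutoff.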
Edited to add -- The following was my fault, do not worry about it. Note: I initially tried it on some human data, but building the index was taking too long, so I switched to the yeast data set. Perhaps I was being too polite with our shared server and it would have gone faster if I had asked for more threads, but I let it run for 5 days or so before I gave up. Of course, this step only has to be done once for a given transcriptome and k-mer size.
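To give a feel for why the index depends only on the transcriptome and the k-mer size, here is a toy sketch of the general idea of indexing transcript k-mers and then counting them in reads. This is not sailfish's actual data structure or algorithm: the names `build_index` and `count_read_kmers` are made up, the sequences are invented, reverse complements are ignored, and the step that turns k-mer counts into isoform abundances is not shown.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(transcripts, k):
    """Map each k-mer to the set of transcripts that contain it.

    Depends only on the transcript sequences and k, so it can be
    built once and reused for every sample.
    """
    index = {}
    for name, seq in transcripts.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def count_read_kmers(reads, index, k):
    """Count occurrences of indexed k-mers in the reads (no alignment)."""
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in index:
                counts[kmer] += 1
    return counts


# Hypothetical toy transcriptome and reads.
transcripts = {"t1": "ACGTAC", "t2": "GTACGG"}
reads = ["CGTAC", "TTTTT"]
k = 4

index = build_index(transcripts, k)
kmer_counts = count_read_kmers(reads, index, k)
print(sorted(kmer_counts.items()))
```

Changing k or the transcript set changes every key in the index, which is why it has to be rebuilt in those cases but not when only the reads change.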