As always, there are more pipelines and tools available for quantifying gene/transcript expression than ever before.
What has changed is the sheer variety of algorithms and strategies, which affect both how reads are aligned (alignment, pseudoalignment, lightweight alignment, etc.) and how results are analyzed (whether or not the output is compatible with general-purpose downstream tools).
For instance, there are numerous crossroads, each with a significant impact:
A. Gene expression: gene counts vs. transcript-level quantification (summed for each gene)
B. Transcripts: de novo vs. reference transcriptome
C. Alignment vs. the new, super-fast k-mer-based approaches (Sailfish, Kallisto, RNA-Skim, Salmon)
and so on.
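For the transcript-level route in item A, the "summation for each gene" step can be sketched as below. This is a minimal illustration, assuming you already have estimated counts per transcript and a transcript-to-gene mapping table; the function name and IDs are hypothetical, and the sketch deliberately ignores effective-length corrections or rounding that a particular downstream DE tool might expect.

```python
from collections import defaultdict


def summarize_to_genes(tx_counts, tx2gene):
    """Sum transcript-level estimated counts into gene-level counts.

    tx_counts: dict mapping transcript ID -> estimated count
    tx2gene:   dict mapping transcript ID -> gene ID
    """
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        gene = tx2gene.get(tx)
        if gene is None:
            # Transcripts missing from the mapping are silently skipped here;
            # a real pipeline might warn or stop instead.
            continue
        gene_counts[gene] += count
    return dict(gene_counts)


if __name__ == "__main__":
    # Hypothetical transcript/gene IDs and counts, for illustration only.
    tx_counts = {"tx1": 10.5, "tx2": 4.5, "tx3": 7.0}
    tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
    print(summarize_to_genes(tx_counts, tx2gene))
    # {'geneA': 15.0, 'geneB': 7.0}
```

Whether such summed estimates are appropriate input for count-based DE tools is exactly one of the open questions raised in the threads listed further down.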
There are many tools available, and many modes for each tool:
- Count-based methods (e.g. HTSeq): counts over the whole gene region, counts over exons only, counts over meta-transcripts, etc.
- Transcript quantification methods: BitSeq, Cufflinks, eXpress, IsoEM, RSEM, etc.
- k-mer-based approaches (such as those listed under C above)
- Hybrid approaches (e.g. Salmon run on aligned reads)
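To make the k-mer idea concrete, here is a toy sketch of how a read can be assigned to the set of transcripts it is compatible with, by intersecting the transcript sets of all its k-mers. This is closest in spirit to kallisto's pseudoalignment but is not any tool's actual implementation: real tools add much more (kallisto, for example, builds a transcriptome de Bruijn graph, and all of them must handle reverse complements and sequencing errors). The sequences, IDs and k value below are made up for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield all overlapping k-mers of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(transcripts, k):
    """Map each k-mer to the set of transcript IDs that contain it."""
    index = defaultdict(set)
    for tx_id, seq in transcripts.items():
        for km in kmers(seq, k):
            index[km].add(tx_id)
    return index


def compatible_transcripts(read, index, k):
    """Return the transcripts compatible with every k-mer of the read.

    An empty set means the read is unassigned: one of its k-mers is not
    in the index, or the read is shorter than k.
    """
    compatible = None
    for km in kmers(read, k):
        hits = index.get(km)  # .get() does not insert into the defaultdict
        if not hits:
            return set()
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


if __name__ == "__main__":
    # Two hypothetical transcripts sharing a common prefix.
    transcripts = {"tA": "ACGTACGTTG", "tB": "ACGTACGAAC"}
    idx = build_index(transcripts, k=4)
    print(sorted(compatible_transcripts("ACGTAC", idx, 4)))  # ['tA', 'tB']
    print(sorted(compatible_transcripts("TACGTT", idx, 4)))  # ['tA']
```

Reads falling into multi-transcript sets (like the first one) are what make the subsequent abundance estimation step (typically an EM algorithm, in kallisto's case) necessary.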
Then, of course, there is the question of how the results are used downstream for DE analysis: in a "standard" pipeline (e.g. DESeq, edgeR, limma) or in the accompanying tool, if one is available (e.g. sleuth for kallisto, or BitSeq's own DE pipeline).
Since I'm currently using many of these tools and combinations (mostly to keep up to date), I was wondering: what are the weapons of choice for fellow Biostars users now?
What is currently your favorite combination for fundamental tasks, including:
- Gene expression quantification for the identification of DE genes
- Transcript quantification for DE transcripts
- Source of the reference transcriptome (which, of course, affects everything)
Of course, many people (myself included) use more than one combination to suit different needs (highest accuracy, speed, poorly annotated species, etc.).
For reference, interested colleagues can find several great related Biostars discussions, such as:
- "Transcript to gene level count for DEseq(2) use- Salmon/Sailfish/Kallisto etc."
- "Can Kallisto be followed by DESeq, EdgeR or Cuffdiff?", on whether you can/should use standard DE tools after a k-mer-based expression estimation approach
- "Using rna-seq, why map reads to the genome but calculate gene expression by aligning to the transcriptome?", on what is lost when aligning against the transcriptome
As for publications, the most up-to-date comparisons are, of course, in the most recent papers, including:
- Sailfish: http://www.nature.com/nbt/journal/v32/n5/full/nbt.2862.html
- RNA-Skim: http://bioinformatics.oxfordjournals.org/content/30/12/i283.full
- Kallisto: http://arxiv.org/abs/1505.02710v2 [v2, pre-submission]
- Salmon: http://biorxiv.org/content/early/2015/06/27/021592 [pre-submission]
- BitSeq: I'll also throw in the comparisons performed in the upcoming BitSeq paper (RSEM, BitSeq, eXpress, kallisto, casper, Sailfish, Cufflinks and TIGAR2): http://arxiv.org/pdf/1412.5995v3.pdf [pre-submission]