Has Anyone Tried RUM For Aligning/Counting Illumina RNA-Seq Data?
6
6
11.2 years ago
Travis ★ 2.8k

The publication for this has just arrived: http://rna-seqblog.com/data-analysis/splicing-junction/rna-seq-unified-mapper-rum/

Based on speed and accuracy, it looks like a contender: it is comparable to GSNAP in accuracy while being considerably faster.

I'm wondering if anyone has trialled it yet? I ran a simulated data set through it just to see how it behaved, and it ran quickly and smoothly. I am unsure how the read counts are calculated for closely related isoforms (i.e. how it distinguishes between them). I contacted the author about it but got no response.

Has anyone else looked into this? Can you offer any thoughts?

rna alignment • 3.8k views
0

I would also be interested in reading people's results on their trials

5
10.9 years ago
Greg ▴ 50

Hi, I'm the RUM developer. I also used our simulated data to benchmark Cufflinks and Scripture. Cufflinks had a false positive rate of around 99%, while Scripture had a false positive rate of around 99.9%. As such, I would not use either algorithm; I would look for differential expression at the exon level and then drill down on those genes to figure out what is happening. The isoform expression problem is a holy grail. The popular solutions do not work; in my opinion they were just the ones that the authors were willing to hype. I think the problem is solvable, but it's not there yet. Scripture is particularly bad because it is based on peak calling and TopHat instead of a decent aligner and junctions. I'll check back here if anybody has any further questions. Sorry if I didn't reply to all the emails; I got a bit swamped after the paper came out. A new version should be out soon. Thanks for your interest. -Greg

3
11.2 years ago

I have, and it was pretty easy to install and use. The results looked good as well - the mapping rate was high compared to other tools I tried. I don't have an answer to your question about distinguishing between isoforms, though. Given that I did not have a "gold standard" data set to compare with (which is usually the case!), it's hard to say how good the expression values were, but at least they had good correlations to values obtained from TopHat/Cufflinks and CLC Bio.

1
11.2 years ago
Travis ★ 2.8k

I spoke to the author who told me:

RUM just sums the quantifications over the exons which is really not reliable, however there is no good solution to the transcript isoform problem...

He recommends working on the junction or individual exon level.

The software certainly does seem to install easily and run nicely - I recommend anyone working in this area have a look.

0

Hm, saying "there is no good solution to the transcript isoform problem" is a bit debatable. It is certainly possible to try to address it, e.g. with expectation-maximization approaches such as those used by Cufflinks and Avadis, or by solving equation systems as rQuant does. All of those methods have their drawbacks (for Cufflinks, for example, you may get different results from run to run because of the Monte Carlo sampling approach used in the EM calculation), but that doesn't mean you should just give up.
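To make the expectation-maximization idea concrete, here is a toy sketch in Python. It is a generic textbook-style EM over reads that are compatible with more than one isoform, not the actual procedure used by Cufflinks, Avadis, or rQuant. The function name `em_read_fractions`, the two-isoform gene, and the read counts are all made up for illustration, and the model assumes reads are sampled uniformly along each transcript.

```python
def em_read_fractions(compat_counts, lengths, n_iter=200):
    """Estimate the fraction of reads that come from each isoform.

    compat_counts: {frozenset of isoform names: number of reads compatible
                    with exactly that set of isoforms}
    lengths:       {isoform name: transcript length in bases}
    """
    names = sorted(lengths)
    total_reads = sum(compat_counts.values())
    # Start from a uniform guess.
    theta = {name: 1.0 / len(names) for name in names}

    for _ in range(n_iter):
        expected = {name: 0.0 for name in names}
        # E-step: split each group of ambiguous reads between its compatible
        # isoforms in proportion to theta / length.
        for iso_set, count in compat_counts.items():
            weights = {name: theta[name] / lengths[name] for name in iso_set}
            norm = sum(weights.values())
            for name, weight in weights.items():
                expected[name] += count * weight / norm
        # M-step: re-estimate the read fractions from the expected counts.
        theta = {name: expected[name] / total_reads for name in names}
    return theta


# Hypothetical gene: "long" has a 500-base exon that "short" skips.
lengths = {"long": 2000, "short": 1500}
compat_counts = {
    frozenset(["long"]): 25,            # reads in the exon only "long" contains
    frozenset(["long", "short"]): 375,  # reads in exons shared by both
}
theta = em_read_fractions(compat_counts, lengths)
print(theta)  # approaches long = 0.25, short = 0.75
```

The 25 reads that can only come from the long isoform are what let the algorithm apportion the 375 shared reads. When isoforms differ by very little sequence, there are few such informative reads and the estimates become correspondingly unstable.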

0

Good to know, though, that RUM uses the "exon union" method in the terminology of this review paper: http://www.nature.com/nmeth/journal/v8/n6/abs/nmeth.1613.html. By the way, this paper says that it can be shown that this way of quantification will underestimate the expression of alternatively spliced genes.
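A small worked example of why the "exon union" approach can underestimate expression, as a sketch only. The gene, exon coordinates, and read count below are invented for illustration, and this is not RUM's actual code. The idea is that reads are divided by the length of the union of all annotated exons, even when the isoform that is actually expressed is shorter than that union.

```python
def merge_intervals(intervals):
    """Merge overlapping half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def union_length(isoforms):
    """Total number of bases covered by any exon of any isoform."""
    exons = [exon for exon_list in isoforms.values() for exon in exon_list]
    return sum(end - start for start, end in merge_intervals(exons))


# Hypothetical gene with two annotated isoforms (half-open coordinates).
isoforms = {
    "long": [(0, 500), (1000, 1500), (2000, 3000)],  # 2000 bases
    "short": [(0, 500), (2000, 3000)],               # 1500 bases, skips exon 2
}

# Assume only the short isoform is expressed and it yields 150 reads.
reads = 150
short_length = 1500

true_density = reads / short_length               # reads per base of the real transcript
union_density = reads / union_length(isoforms)    # reads per base of the exon union

print(true_density, union_density)
```

Here the union is 2000 bases long, so the same 150 reads give a density of 0.075 reads per base instead of the 0.1 reads per base of the transcript that is really present.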

0
8.3 years ago
ruansun1983 ▴ 30

It is much easier and friendlier to install and use than TopHat.

RUM's status check is also well designed: you can easily see what it is doing and whether there are any errors.

I haven't tested the performance yet, but will do so soon.

0

RUM requires the read1 and read2 files to have exactly the same file size, so I have to pad the trimmed reads back out with N after quality control with Prinseq, which is a stupid step.

I don't think this restriction is necessary. The RUM author should consider removing it.
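For anyone hitting the same issue, a minimal sketch of the padding workaround in Python. The function names, the `#` quality character used for padded bases, and the target length are assumptions for illustration. Whether RUM handles reads padded this way as intended has not been checked here.

```python
def pad_record(seq, qual, target_len, pad_base="N", pad_qual="#"):
    """Pad one read and its quality string out to target_len."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) > target_len:
        raise ValueError("read is longer than the target length")
    missing = target_len - len(seq)
    return seq + pad_base * missing, qual + pad_qual * missing


def pad_fastq_lines(lines, target_len):
    """Pad every 4-line FASTQ record in a list of lines.

    Assumes plain 4-line records (header, sequence, '+', quality);
    a truncated final record raises ValueError when unpacked.
    """
    lines = [line.rstrip("\n") for line in lines]
    padded = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        seq, qual = pad_record(seq, qual, target_len)
        padded.extend([header, seq, plus, qual])
    return padded


# Example: a read trimmed to 3 bases, padded back to 5.
example = ["@read1/1\n", "ACG\n", "+\n", "III\n"]
print(pad_fastq_lines(example, 5))
```

Running the same function with the same target length over both the read1 and read2 files gives every record the same sequence and quality length in both files.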