Forum: Best RNA-Seq aligner: A comparison of mapping tools
gravatar for David Langenberger
22 months ago by
David Langenberger9.5k wrote:

RNA-Seq has replaced microarrays for many applications in the area of biomarker discovery. The prices have been fallen substantially in recent years. The sequence data allows to extract more information than gene expression only. And there is no requirement that a reference genome must exist. However, the analysis of the resulting data is much more challenging and requires more ressources than other approaches.

So the question is: How to choose between the available analysis tools? read more ...

enter image description here

ADD COMMENTlink written 22 months ago by David Langenberger9.5k

Interesting! Is there a reason your comparison is missing Tophat(2) and Hisat?

Also, I believe one of the ten commandments was "thou shalt not chop the y-axis to visually inflate differences which are actually minor" (referencing to bar chart of on-target hits)... ;-)

ADD REPLYlink written 22 months ago by WouterDeCoster44k

We actually chopped the y-axis to visualise these minor differences, since people might be interested in exactly these. I think it is pretty clear that the y-axis is chopped and that the differences are minor. But thanks for highlighting it. We will add an extra note to the plot, to assure that the graph is not misleading.

Regarding Tophat(2) and Hisat: There are many other tools that are also missing. This is on-going work. We encourage all readers to add a particular NGS mapper to this comparison. Please feel free to download the test set, do the analysis, send us the results and we will add it to the benchmarking. A detailed explanation on how to run the benchmark can be found here (goto 3. Read aligner comparison)

ADD REPLYlink written 22 months ago by David Langenberger9.5k

True. When I just scroll through your post I see a large difference between e.g. bwa-mem and STAR, but indeed when I spend more attention it is clear that the y-axis is chopped. This is just my opinion, but I'd say that plots have to be visually clear immediately. And in that context, the on-target hits is maybe not even worth showing given that they're all that similar.

I mentioned TopHat because it has been extremely widely used, and while superseded by other aligners is still one of the most used tools in 'high-impact publications'. It's more or less the "standard" go-to tool. We have been pushing back on people using TopHat ad infinitum, but it would be interesting to see if that's justified.

ADD REPLYlink written 22 months ago by WouterDeCoster44k

I completely see your point, but sometimes it is of high interest to detect the 2% which can not be found by some mappers. 2% of some billion reads is actually quite a number. And in these cases, I would go for a special mapping tool? So, I think it is important to highlight these minor differences and not hide them by not chopping the axis. But that is just my opinion and I'm in 99% of cases absolutely with you.... please do not chop the axis.

ADD REPLYlink written 22 months ago by David Langenberger9.5k

In addition to Wouter's comment, I find these types of reports misrepresentative of the tools that you are using. Only 1 sample was used in your report, for example, whereas these tools were designed using different types of input data. I could likely do my own report and flip all of your findings the other way. You have neither shown the exact parameter configurations of your commands used for each tool. Each has defaults that differ.

As a practical example: I recently saw a report on variant callers from the Broad / Google group. They naturally found that GATK and DeepVariant were 'better' than other variant callers. From my experience, for germline single nucleotide variants, samtools/bcftools mpileup has higher sensitivity/specificity to the gold standard than all others. For indels, this is not the case.

The point is that each tool has its own niche area, some overlapping of course.

A disclaimer: I am neither currently nor never have been involved in any way in the development of any of these listed tools

ADD REPLYlink modified 22 months ago • written 22 months ago by Kevin Blighe67k
  1. If you follow the links, you find the exact parameters for every single tool.
  2. The question we asked is pretty straight forward: 'If a full sensitive mapper finds a hit for a read, is a mapping tool able to locate a hit at the same position.' Regardless of the design of the tool, it should be able to find it, no?
  3. Feel free to redo the analysis and show with your own dataset that the graph look highly different. I'm curious actually.
ADD REPLYlink modified 21 months ago by RamRS30k • written 22 months ago by David Langenberger9.5k

Thank you for elaborating on my comments, David.

ADD REPLYlink written 22 months ago by Kevin Blighe67k

And the discussion with n=1 we already had. I see your point. :) But still, all scripts and calls are given. We really really encourage all curious researchers to do the benchmarking themselves, since you really start to understand these mapping tools, when you do that. Based on experience, most non-bioinformaticians just take a tool that was used in a publication and use it with default parameters. That's in many cases just wrong.

ADD REPLYlink written 22 months ago by David Langenberger9.5k

most non-bioinformaticians just take a tool that was used in a publication and use it with default parameters

Yes, that is correct, and not a good practice of course. I appreciate your efforts with your report, in this regard. I should add that a good report is one that encourages discussion / debate :)

ADD REPLYlink written 22 months ago by Kevin Blighe67k

I think there is no perfect benchmark possible. And all tools mentioned are really good. We use all of them during courses and/or in-house.

And, of course, you’re absolutely right: the discussions here are awesome and always very fruitful. But that’s due to the fact that this forum is full of highly intelligent people and real experts in this field. That’s why I spend quite some time here. You always learn something new. Even when you thought you know everything in a specific field, there are people to show you better ways to do something.

ADD REPLYlink modified 22 months ago • written 22 months ago by David Langenberger9.5k

If you follow the links, you find the exact parameters for every single tool.

Perhaps I am blind but I am not sure where these are on the blog page?

And all tools mentioned are really good.

If that is the case what is the reason for publishing this comparison?

ADD REPLYlink modified 22 months ago • written 22 months ago by genomax92k
  1. You have to follow the links. :) If you click on the ‚here‘ of ‘Please find more information in the benchmark details here.’, you will be forwarded to another page and at the bottom of this is a link to the proceedings. There is everything then: scripts, calls, datasets, etc.

  2. Sometimes you need a tool that is very sensitive, sometimes you need a tool that is very fast, sometimes you need a tool that is very memory efficient ... there might be some situations where you might actually want to use the results of such a benchmark to make a decision.

At least, it makes some people think about the differences of mappers. I think it is an interesting benchmark. Is it a perfect benchmark? No.

ADD REPLYlink modified 21 months ago by RamRS30k • written 22 months ago by David Langenberger9.5k

If I follow the links the page I reach is this. One link I see on there is for a paper from 2014. Other takes me to this page, which does not appear to contain information about the current study published here (I am only seeing STAR, blat and segmehl command lines). Would you mind providing a direct link if I am overlooking it?

Benchmark serves as a standard by which other things are measured. None of the tools here can be considered a benchmark. So I feel that while very valuable this is just a comparison of aligners. As you mention above people will need to make a choice based on their use specific case and requirements.

One point that is missing from your comparison is information on ability of tools to run on multiple platforms. BBMap for example is able to run on Windows, macOS and unix as long as Java is available. How many of the other tools are able to do that? Also these tools are written various languages, use different methodologies and as a result they are going to have inherent differences in their resource use/execution times.

Disclaimer: I am not a tool developer and am not associated with development of tools mentioned. I am a trainer (just like you) and OS/program agnostic as far as practical. I do use a lot of BBTools.

ADD REPLYlink modified 22 months ago • written 22 months ago by genomax92k

The point with the different languages and platforms is a good one. I always think everyone is using Linux, but obviously for quite a number a Windows solution is of interest. We should add this information to our next update.

And also the exact calls of the tools we compared. It should be directly on the page. It’s on the todo list. Thanks for the input.

ADD REPLYlink written 22 months ago by David Langenberger9.5k

Just another benchmark to add

enter image description here

Baruzzo G, Hayer KE, Kim EJ, Di Camillo B, FitzGerald GA, Grant GR. (2016) Simulation-based comprehensive benchmarking of RNA-seq aligners. Nat Methods.

ADD REPLYlink written 22 months ago by Lars760

STAR looks similar and Tophat2 not so good. Now I'm curious about Tophat2, too. :)

ADD REPLYlink written 22 months ago by David Langenberger9.5k

Authors of Tophat begged to stop using it. It's a very outdated tool.

ADD REPLYlink written 22 months ago by predeus1.4k

I have definitely seen this before, and I feel kind of bad always tagging a comment to this (but I think the counter-point is kind of important).

With all due respect, I strongly disagree about the conclusion about not using TopHat. In terms of the general idea, I think you need to plan to have at least some testing for each project (most likely, requiring a substantial amount of time for analysis and discussion - and, hopefully, discovering something new and unexpected). However, in terms of the less controversial topic of how TopHat can be useful:

  1. I know of at least one study (currently unpublished, as far as I know), where I needed to use TopHat to recover a known splicing event

  2. I've had situations where the more conservative alignment and/or unaligned reads from TopHat were useful

  3. For gene expression, if I see something potentially strange with a TopHat alignment, I have yet to find an example where the trend was noticeably different (there probably are some examples for some genes, but I would guess the gene expression quantification with htseq-count are usually similar for TopHat and STAR single-end alignments for most genes)

(please note that I copied this from A: Tophat problem - R1 and R2 reads mapped to different strandeds of viral genome )

You might also want to check out Tophat2/Bowtie failing to align some examples of partial intron retention / exon loss

To be fair, when I was in Michigan, I essentially recommended people to use minfi over COHCAP. However, I think I was being unfairly hard on myself (although there are a lot of post-2016 updates, when I returned to City of Hope). While I am not necessarily saying this is not representative of Lior's current opinion, it is important to keep in mind that opinions can change over time (and it is important to take time to develop your own opinion, especially in the context of your own dataset).

Also, I think an important positive note is that you can still have some amount of support for popular open-source software through discussions like this forum. In other words, even if the software is not officially supported, it can still be very useful (and I would have a problem if this software was not continued to be made freely available to download, even without direct support from the developer).

ADD REPLYlink modified 21 months ago by RamRS30k • written 22 months ago by Charles Warden7.9k

Each tool will definitely have situations in which one is a better choice than the others, but I don't think most users make this comparison. At the same time it is pointless to do this comparison over and over again while comprehensive benchmarks have been published.

What we generally try to push back against is new users blindly picking Tophat for their project, only because it has so many citations and is part of so many tutorials.

ADD REPLYlink written 22 months ago by WouterDeCoster44k

If people encounter problems, it is very important to know that other methods are available for troubleshooting.

Not spending enough time on a project can be a serious issue, and I believe people need to gradually expand their skill sets (and plan for a substantial amount of time for analysis of high-throughput data).

So, I think performing some of your own benchmarks is important to help gain an intuition about the analysis (and what results are most robust for your particular dataset - or, if there are differences, trying to understand the underlying cause of those differences).

In other words, in the most polite way possible, I disagree with the statement "it is pointless to do this comparison over and over again while comprehensive benchmarks have been published," but I think I otherwise agree with you (in saying you shouldn't immediately use the first result you get in a publication).

ADD REPLYlink written 22 months ago by Charles Warden7.9k

I think there is a difference between using tophat to detect some very special splice sites in very special cases (and probably loosing a lot of others) and tophat being still one of the most cited tool (with knowing that all these people using it will most likely loose a lot of splice sites).

ADD REPLYlink written 22 months ago by David Langenberger9.5k

I believe this paper (or at least some figures) were discussed using an IGC Bioinformatics meeting.

Figure 3 is for the "T3" sequences with more mutations - the T1 and T2 are probably more typical for your experiment.

While I have seen some situations where more divergent sequence aligned with STAR but not TopHat, that could be either a "bug" or a "feature" depending upon your project. For example, if that divergent sequence is from a not 100% pure mouse Xenograft (or some other unintended sequence), perhaps noticing the different mutation / splicing results could help you see the need to revise your analysis strategy (such as having a joint genome reference, where at the locus where the different alignments actually was more similar to the TopHat alignment, or the desired species). On the flip side, if you were using a well annotated genome for analysis in another species, perhaps that would be a disadvantage.

In terms of gene expression (and overall alignment), I think Supplemental Figure 4 T1/T2 matches my experience that the TopHat alignment is usually OK. However, it's also possible that something else about the simulated data may be affecting performance.

Either way, it is extremely important to take your time and review your data carefully before publication (with several iterations of analysis and discussion). I would say this should involve some testing of different methods for each project. However, understanding the concept that the TopHat alignments may be less mutation tolerant (with default parameters) is worth considering.

P.S. If you happened to look at this before my last update, I did change the Supplemental Figure that I link to (Supplemental Figure 11 looks similar, but represents something different).

ADD REPLYlink modified 22 months ago • written 22 months ago by Charles Warden7.9k

Sorry, but I don't understand your arguments.. The main difference for well established genomes and transcriptomes would be speed. And Tophat2 is so, so slow. As far as other things you mentioned,

1) In STAR, you can provide GTF file when generating reference, so when if you really want that known splice site to be there, it will be.

2) Having a more conservative (which is not a clearly defined term) alignment is a matter of adjusting options, not changing aligners. And both hisat2 and STAR can save un-aligned reads separately.

3) Seeing something "strange" is really not a good way to assess an alignment program in general. Simulated datasets are probably the most consistent and reproducible way. And htseq-count should also not be used, it doesn't count multimapping reads, and is also very slow. If you insist on simple quantification, featureCounts would give the same result a lot faster.

ADD REPLYlink modified 22 months ago • written 22 months ago by predeus1.4k

With 50 bp SE reads, I don't think the speed difference isn't so bad (at least if limited to 4 cores and 8 GB for each method). However, I think speed is more of an issue with PE reads.

1) In this situation, there was an event with an exogenous sequence, not in the reference genome. It seems like you could potentially still see the difference in frequencies with the difference in annotated frequencies (which is essentially what I did, but I believe there was some sort of issue with that).

2) I think there is value in having multiple open-source aligners (and, if TopHat2 was free, and STAR/HISAT cost money, I would say TopHat2 is good enough to start with for many projects - although I am very thankful that is not the case). I suspect that there situations where you would not get completely identical results that can be achieved with different parameters for different programs, but you have a good point about taking time to become more familiar with becoming more familiar with the parameters available for the different programs.

3a) I think one issue about the simulations is whether they are realistic for what is typically encountered in actual samples. Also, I suspect no one strategy for simulating data will work equally well for all projects.

3b) I seem to remember there was something a little different about PE versus SE in reads (for htseq-count versus featureCounts, although maybe I am thinking about exonic rather than gene quantifications). For SE reads, you can use featureCounts to get faster unique reads (however, if you are quantifying samples in parallel, I don't think this is a huge issue).

3c) I would typically be choosing to use htseq-count (or the most similar set of parameters for featureCounts) with the intention of focusing on the unique reads. While I do consider use of a transcriptome quantification as a troubleshooting option, I think having replicates (which I would always recommend) can be useful if transcripts with less accurate assignments of multi-mapped reads show more variability between replicates (so, if the multi-mapped read assignments are less robust, presumably the p-values will be higher; I would argue that is good for helping filter some false positives at either the gene or transcript level, but that would also be an example where it may be better to use unique reads for a gene quantification).

ADD REPLYlink modified 22 months ago • written 22 months ago by Charles Warden7.9k

I think many people (including myself, until I read it more carefully) may be misinterpreting that tweet:

Strictly speaking, Lior is talking about TopHat1 (he is even saying TopHat2 is greatly improved over TopHat1).

While I still think there is value in having the code for previous versions of TopHat (such as TopHat1), the tweet is not exactly part of the TopHat2 versus STAR versus HISAT/HISAT2 debate.

I have tried to collect my thoughts in the matter here:

ADD REPLYlink modified 22 months ago • written 22 months ago by Charles Warden7.9k
gravatar for Gordon Smyth
21 months ago by
Gordon Smyth2.0k
Gordon Smyth2.0k wrote:

Subread is another key tool that should at least be included in the comparison, and might well win it. See our recent paper:

Liao et al (2019). The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. Nucleic Acids Research

The comparisons in the article are with TopHat2 and STAR, but our unpublished comparisons show that subread outperforms HISAT2 as well on the same tests.

The article focuses on the R package implementation of subread, but the same performance metrics would hold for the unix command line version.

ADD COMMENTlink modified 21 months ago • written 21 months ago by Gordon Smyth2.0k

I don't think it is quite right to think of there being an overall "winner" - I think each user needs to be able to develop their own opinion (in the context of their own experiment). So, I believe comparisons like this can be helpful, but I would still say there needs to be several rounds of analysis / discussion for each project to critically assess the results (which should probably involve at least some methods testing for every project).

However, thank you for sharing the link, and I certainly think it is helpful to have Rsubread as an option!

ADD REPLYlink written 21 months ago by Charles Warden7.9k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1712 users visited in the last hour