STAR or TopHat?
10.3 years ago
lkmklsmn

Hi,

I am analyzing an RNA-seq experiment and would like to hear what you guys think about the STAR and TopHat alignment programs. Which one do you prefer, and why? What are the pros and cons of each?

After 7 years, I would say STAR.

10.3 years ago

STAR is better in most ways, from mapping accuracy to speed. The big caveat is that STAR needs a good bit of RAM. For a nice, objective look at STAR and other RNA-seq aligners, I would recommend a quick read through this recent and very thorough comparison from the RNA-seq Genome Annotation Assessment Project in Nature Methods (there's a similar comparison by the same collaboration for transcript reconstruction in the same issue).
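To put "a good bit of RAM" into rough numbers, here is a back-of-the-envelope sketch. It assumes the rule of thumb I recall from the STAR manual, roughly 10 bytes of RAM per base of genome; both that multiplier and the ~3 Gb human genome size are assumptions to check against the manual for your STAR version, and `estimate_star_ram_gb` is a hypothetical helper.

```python
def estimate_star_ram_gb(genome_size_bases: int, bytes_per_base: float = 10.0) -> float:
    """Rough RAM estimate in GB; bytes_per_base=10 is an assumed rule of thumb."""
    return genome_size_bases * bytes_per_base / 1e9


HUMAN_GENOME_BASES = 3_000_000_000  # approximate size of the human genome
print(f"~{estimate_star_ram_gb(HUMAN_GENOME_BASES):.0f} GB")  # ~30 GB
```

Under that assumption, a human-sized genome works out to about 30 GB of RAM, far more than a typical desktop has.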

BTW, the take-home message from that paper can probably be summed up by Figure 3 (the paper is open access, so this is a direct link):

[Figure 3: Mapping accuracy comparison from Engström et al. 2013]

Edit: Have a look at IV's answer as well. I hadn't mentioned GSNAP, but I can also say that it has always produced very good results if you have an annotation (this seems to be confirmed in the review I linked to).


TopHat2 (especially with annotations) looks quite good to me based on that figure alone. I'll have to re-read the paper to remember what "partly correctly mapped" means and whether that could cause problems.


Yeah, TopHat2 is still a pretty good all-around option. The biggest downside is how long it takes to run.


On our hardware, STAR can map 60 million reads in about 4 minutes. We have had TopHat2 take about 2-6 days on the same data.
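The timings in that reply translate into throughput with some simple arithmetic. This sketch uses only the numbers quoted there (60 million reads, about 4 minutes for STAR, 2-6 days for TopHat2); `throughput` is a hypothetical helper, and real run times depend heavily on hardware, thread count, and settings.

```python
# Wall-clock figures quoted in the reply; everything else is plain arithmetic.
N_READS = 60_000_000
STAR_SECONDS = 4 * 60          # "about 4 minutes"
TOPHAT_DAYS = (2, 6)           # "about 2-6 days"
SECONDS_PER_DAY = 24 * 60 * 60


def throughput(n_reads: int, seconds: float) -> float:
    """Reads aligned per second of wall-clock time."""
    return n_reads / seconds


print(f"STAR: {throughput(N_READS, STAR_SECONDS):,.0f} reads/s")
for days in TOPHAT_DAYS:
    seconds = days * SECONDS_PER_DAY
    print(f"TopHat2, {days} days: {throughput(N_READS, seconds):,.0f} reads/s "
          f"(STAR ~{seconds / STAR_SECONDS:,.0f}x faster)")
```

On those numbers STAR aligns about 250,000 reads/s, roughly 720x to 2,160x faster than the quoted TopHat2 runs.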

10.3 years ago
IV

In my opinion, these are some of the most important pros and cons:

TopHat

Pros

  • Widely used, with a huge community to ask questions in fora
  • Works without fuss with Cufflinks and the other Tuxedo pipeline tools
  • A large share of published results are based on this aligner, so it is widely accepted
  • Provides a ready-to-use junction file

Cons

  • Very slow responses from the support email address
  • Doesn't do read clipping for partial read alignment (which is really useful in many scenarios)
  • The mate inner distance and its standard deviation have to be estimated beforehand for optimal performance
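For the last point, one way to get these values is from a pilot alignment of a subset of read pairs. The sketch below assumes equal-length mates and uses the SAM TLEN field (column 9) as the fragment length; the inner distance is the fragment length minus both read lengths, which matches how the TopHat manual describes `-r/--mate-inner-dist` (300 bp fragments with 50 bp ends give 200). `mate_inner_distance` and the TLEN values are hypothetical. Pairs that span an intron have inflated TLEN on a genome alignment, so a pilot alignment to transcript sequences is usually the safer input.

```python
import statistics


def mate_inner_distance(tlens: list[int], read_length: int) -> tuple[int, int]:
    """Estimate mean and SD of the mate inner distance from SAM TLEN values.

    Assumes both mates have length `read_length` and that `tlens` holds the
    TLEN of properly paired reads. Only positive TLEN values are kept, so each
    pair (leftmost mate positive, rightmost negative) is counted once.
    """
    inner = [t - 2 * read_length for t in tlens if t > 0]
    if len(inner) < 2:
        raise ValueError("need at least two pairs to estimate a standard deviation")
    return round(statistics.mean(inner)), round(statistics.stdev(inner))


# Hypothetical TLEN values for a handful of pairs of 50 bp reads.
tlens = [300, -300, 310, -310, 290, -290, 305, 295]
mean_inner, sd_inner = mate_inner_distance(tlens, read_length=50)
print(f"-r {mean_inner} --mate-std-dev {sd_inner}")
```

With these made-up values the sketch prints `-r 200 --mate-std-dev 8`, which would then be passed to TopHat along with the usual index and read arguments.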

STAR

Pros

  • Super fast
  • The latest versions score very well in benchmark comparisons
  • Can do read clipping
  • Has an output mode compatible with Cufflinks
  • Provides a ready-to-use junction file
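As an example of the Cufflinks-compatible output mode: for unstranded data, Cufflinks expects spliced alignments to carry the XS strand attribute, and STAR adds it with `--outSAMstrandField intronMotif`. This sketch only builds the command line (it does not run STAR); the index path and FASTQ names are placeholders, and `star_cufflinks_command` is a hypothetical helper.

```python
import shlex


def star_cufflinks_command(genome_dir: str, read_files: list[str], threads: int = 8) -> list[str]:
    """Build (but do not run) a STAR command whose SAM output includes the XS
    strand attribute that Cufflinks expects for unstranded spliced reads."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *read_files,
        "--outSAMstrandField", "intronMotif",
    ]


# Placeholder paths; substitute your own index directory and FASTQ files.
cmd = star_cufflinks_command("star_index/", ["sample_R1.fastq", "sample_R2.fastq"])
print(shlex.join(cmd))
```

Returning an argument list rather than a single string means it could be handed to `subprocess.run` without shell quoting issues.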

Cons

  • The first versions had many issues
  • Not as many users as other aligners, though I think it has a strong user base, especially since its use in ENCODE

I'll also add GSNAP to the list, since we use it widely in the lab.

GSNAP

Pros

  • Always one of the best in any comparison article out there (usually 1st or 2nd)
  • Can handle partial matches with clipping and indels
  • Not that resource-intensive
  • The developer really supports the aligner and answers emails very fast
  • Detects splice junctions really well
  • Has a mode compatible with Cufflinks
  • Can write multiple SAM outputs split by category (concordant, paired, halfmapping, unique, multimapping, etc.)

Cons

  • The latest version (with the suffix array) is a lot faster than any previous version, but still slower than STAR, unless it is run within the maximum number of mismatches allowed by the ultrafast exhaustive mode
  • Not as many users as TopHat (although you can also get really good feedback from Trinity users)
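The mismatch limit for GSNAP's fast exhaustive mode can be computed from the rule of thumb quoted in this thread: mismatches <= ((readlength + 2)/kmer - 2), with kmer usually 15. This sketch just evaluates that expression; rounding down to a whole number of mismatches is an assumption here, and `gsnap_exhaustive_max_mismatches` is a hypothetical helper, not part of GSNAP. Check the GSNAP documentation for the exact bound in your version.

```python
def gsnap_exhaustive_max_mismatches(read_length: int, kmer: int = 15) -> int:
    """Evaluate ((read_length + 2) / kmer) - 2, rounded down to whole mismatches.

    The formula is the rule of thumb quoted in the thread; the rounding is an assumption.
    """
    return (read_length + 2) // kmer - 2


for length in (50, 76, 100, 150):
    print(f"{length} bp reads: up to {gsnap_exhaustive_max_mismatches(length)} mismatches")
```

For 100 bp reads this gives 4 mismatches, and for 150 bp reads 8.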

Forgot to mention:

  1. In TopHat it's better to provide an estimate of the mean mate inner distance and its standard deviation, which takes some time to calculate. This has been a very frequent question in blogs and fora. From what I've seen so far, most people run with the default settings.
  2. In TopHat and STAR you get an output file with the junctions, but in GSNAP you have to run a script afterwards to get them. I know it's not much of a fuss, but it's one more step in the pipeline.
  3. In GSNAP there is a superfast exhaustive mode that can be used when the number of allowed mismatches is less than or equal to ((readlength + 2)/kmer - 2), where kmer is usually 15. From what I remember, the search is exhaustive within these settings, and it runs in a small fraction of the usual run time.
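For the GSNAP junction step, here is a minimal sketch of what such a script does: it walks each alignment's CIGAR string and records every `N` (skipped reference region) operation as an intron. This is not GSNAP's own script; `junctions_from_sam` is a hypothetical helper that reads plain SAM text, skips unmapped and secondary alignments, and reports 1-based inclusive intron coordinates without strand or splice-motif information.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def junctions_from_sam(lines):
    """Count spliced reads per intron, keyed by (chrom, start, end), 1-based inclusive."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or flag & 0x100 or cigar == "*":
            continue  # unmapped or secondary alignment
        ref_pos = pos  # reference position of the next aligned base
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op == "N":
                # The skipped region is the intron: ref_pos .. ref_pos + length - 1
                counts[(chrom, ref_pos, ref_pos + length - 1)] += 1
            if op in REF_CONSUMING:
                ref_pos += length
    return counts


sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t50M200N50M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1\t120\t60\t30M200N70M\t*\t0\t0\t*\t*",
]
print(junctions_from_sam(sam))  # Counter({('chr1', 150, 349): 2})
```

Both toy reads skip the same 200 bp region, so they are counted as two reads supporting one junction.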

I'll add these to the list too, for completeness.


Forgot to mention GSNAP's persistent segmentation faults. I've used many different versions, and each one eventually segfaults or writes a faulty CIGAR string.
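A faulty CIGAR string can often be caught before it breaks downstream tools by checking it against a few rules from the SAM specification: the operations that consume read bases (M, I, S, =, X) must add up to the length of SEQ, hard clips (H) may only appear at the ends, and soft clips (S) may only have hard clips between them and the ends. `cigar_problem` is a hypothetical helper implementing just those checks; it is not a complete validator.

```python
import re

# Whole-string syntax: one or more <length><operation> pairs.
CIGAR_SYNTAX = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # operations that consume read bases


def cigar_problem(cigar: str, seq: str):
    """Return a short description of a CIGAR/SEQ inconsistency, or None if none is found."""
    if cigar == "*":
        return None  # no alignment to check
    if not CIGAR_SYNTAX.fullmatch(cigar):
        return f"malformed CIGAR {cigar!r}"
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

    # Hard clips may only be the first and/or last operation.
    if any(op == "H" for _, op in ops[1:-1]):
        return "hard clip not at either end"

    # Soft clips may only have hard clips between them and the ends.
    unclipped = [op for _, op in ops if op != "H"]
    if "S" in unclipped[1:-1]:
        return "soft clip not at either end"

    # Read-consuming operations must add up to the length of SEQ.
    query_len = sum(n for n, op in ops if op in QUERY_CONSUMING)
    if seq != "*" and query_len != len(seq):
        return f"CIGAR consumes {query_len} read bases but SEQ has {len(seq)}"
    return None


print(cigar_problem("50M200N50M", "A" * 100))  # None
print(cigar_problem("50M200N49M", "A" * 100))  # CIGAR consumes 99 read bases but SEQ has 100
print(cigar_problem("10M5S10M", "A" * 25))     # soft clip not at either end
```

Running a check like this over a sample of the output is a cheap way to find out whether a given aligner version is producing bad records.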

10.3 years ago
Ming Tommy Tang

STAR is much faster than TopHat.

10.3 years ago

TopHat is more widely used, so if you need help with it, there are a lot more users who can help (compare how many posts use the TopHat tag versus the STAR tag).

10.0 years ago
super

STAR is much faster than TopHat, but I don't know which gives more reliable results. I think both are OK, though.
