Question: Star Or Tophat?
15
6.9 years ago by
lkmklsmn930
United States
lkmklsmn930 wrote:

Hi,
I am analyzing an RNA-seq experiment and would like to hear what you think about the STAR and TopHat alignment programs. Which one do you prefer, and why? What are the pros and cons of each?

rnaseq tophat alignment • 27k views
modified 6.6 years ago by super60 • written 6.9 years ago by lkmklsmn930
33
6.9 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

STAR is better in most ways, from mapping accuracy to speed. The big caveat with STAR is that it needs a good bit of RAM. For a nice objective look at STAR and other RNA-seq aligners, I would recommend that you have a quick read through this recent and very thorough comparison from the RNA-seq Genome Annotation Assessment Project in Nature Methods (there's a similar comparison by the same collaboration for transcript reconstruction in the same issue).

BTW, the take-home message from that paper can probably be summed up from Figure 3 (the paper is open access, so this is a direct link): Mapping accuracy comparison from Engström et al. 2013

Edit: Have a look at IV's answer as well. I hadn't mentioned GSNAP, but I can also say that it has always produced very good results if you have an annotation (this seems to be confirmed in the review that I linked to).

modified 6.9 years ago • written 6.9 years ago by Devon Ryan97k

TopHat2 (especially with annotations) looks quite good to me based on just that figure. I'll have to re-read the paper to remember what "partly correctly mapped" means and whether that could cause problems.

written 6.9 years ago by brentp23k
1

Yeah, TopHat2 is still a pretty good all-around option. The biggest downside is how long it takes to run.

written 6.9 years ago by Devon Ryan97k
3

On our architecture, STAR can map 60 million reads in about 4 minutes. We have had TopHat2 take about 2-6 days on the same data.

modified 11 months ago by _r_am30k • written 6.6 years ago by Alastair Kerr5.3k
19
6.9 years ago by
IV1.3k
USA
IV1.3k wrote:

In my opinion, some of the most important pros and cons:

TopHat

Pros

  • Widely used + huge community to ask questions in fora
  • No-fuss connection to Cufflinks and any other Tuxedo pipeline tool
  • A large share of published results are based on this aligner, and it is widely accepted
  • Provides a ready to use junction file

Cons

  • Really slow response rates from the relevant helpdesk email
  • Doesn't do read clipping for partial read alignment (which is really useful in many scenarios)
  • Inner mate distance and its standard deviation have to be calculated beforehand for optimal performance
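
The inner mate distance in the last point is the fragment length minus the length of both reads. A minimal Python sketch of that calculation, assuming you already have a sample of fragment lengths (for example from a pilot alignment of a subset of reads); the function name and the sample values are made up for illustration:

```python
import statistics


def inner_distance_stats(fragment_lengths, read_length):
    """Return (mean, sd) of the inner mate distance.

    inner distance = fragment length - 2 * read length
    """
    inner = [frag - 2 * read_length for frag in fragment_lengths]
    # stdev() is the sample standard deviation and needs at least 2 values.
    return statistics.mean(inner), statistics.stdev(inner)


# Hypothetical fragment lengths (bp) for 2 x 100 bp reads.
sample = [280, 300, 320, 290, 310]
mean_inner, sd_inner = inner_distance_stats(sample, read_length=100)
print(round(mean_inner), round(sd_inner))
```

The two rounded values are what you would then hand to TopHat's -r/--mate-inner-dist and --mate-std-dev options.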

STAR

Pros

  • Super fast
  • The latest versions get really good statistics in comparisons
  • Can do read clipping
  • It has an output mode compatible with Cufflinks
  • Provides a ready to use junction file
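
As a rough illustration of what "ready to use" means for STAR's junction file (SJ.out.tab), here is a small Python sketch that reads it and keeps junctions with enough uniquely mapped reads. It assumes the usual nine tab-separated columns (chromosome, intron start, intron end, strand code, intron motif, annotated flag, unique read count, multi-mapped read count, maximum overhang); the class name, function name, threshold, and example lines are hypothetical:

```python
import csv
import io
from typing import NamedTuple


class Junction(NamedTuple):
    chrom: str
    start: int         # first base of the intron (1-based)
    end: int           # last base of the intron (1-based)
    strand: str        # "+", "-" or "." when undefined
    annotated: bool
    unique_reads: int
    multi_reads: int


# STAR encodes strand as 0 (undefined), 1 (+), 2 (-).
STRAND = {"0": ".", "1": "+", "2": "-"}


def read_junctions(handle, min_unique=1):
    """Yield junctions supported by at least min_unique uniquely mapped reads."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 9:
            continue  # skip blank or truncated lines
        junction = Junction(row[0], int(row[1]), int(row[2]), STRAND[row[3]],
                            row[5] == "1", int(row[6]), int(row[7]))
        if junction.unique_reads >= min_unique:
            yield junction


# Two made-up lines in SJ.out.tab layout.
example = ("chr1\t14830\t14969\t2\t2\t1\t25\t3\t38\n"
           "chr1\t15039\t15795\t2\t2\t1\t0\t4\t30\n")
kept = list(read_junctions(io.StringIO(example)))
print(kept)
```

With a real run you would pass an open file handle for SJ.out.tab instead of the in-memory example.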

Cons

  • The first versions had many issues.
  • Fewer users than other aligners, but I think there is a strong user base, especially after ENCODE

I'll also add GSNAP to the list, which we use widely in the lab.

GSNAP

Pros

  • Always one of the best in any comparison article out there (usually 1st or 2nd)
  • Can handle partial matches with clipping and indels
  • Not that resource intensive
  • The creator really supports the aligner and answers emails really fast
  • Gets splicing junctions really well
  • Has a mode compatible with Cufflinks
  • Provides multiple SAM outputs (concordant, halfmapping, paired, unique, multimapping, etc.)

Cons

  • The latest version (with the suffix array) is a lot faster than any previous version but still slower than STAR [unless it is run in ultrafast mode, i.e. within the maximum number of mismatches that mode allows]
  • Fewer users than TopHat (although you can also get really good feedback from Trinity users)
modified 5.7 years ago by Leonor Palmeira3.7k • written 6.9 years ago by IV1.3k
1

Forgot to mention:
1) In TopHat it's better to provide an estimate of the mean mate inner distance and its standard deviation, which takes some time to calculate. This has been a very frequent question in blogs and fora. From what I've seen so far, most people run with default settings.
2) In TopHat and STAR you get an output file with the junctions, but in GSNAP you have to run a script afterwards to get them. I know that it's not much of a fuss, but it's one more step in the pipeline.
3) In GSNAP there is a superfast exhaustive mode that can be run when the number of mismatches is less than or equal to ((readlength+2)/kmer - 2). kmer is usually 15. From what I remember, the search is exhaustive within these settings and it runs in a small fraction of the usual run time.
I'll add those to the list too for completeness.
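
To make the GSNAP threshold in point 3 concrete, here is a small Python sketch of the arithmetic. It assumes the division is integer division and clamps the result at zero for very short reads; both are my assumptions, and the function name is hypothetical:

```python
def ultrafast_max_mismatches(read_length, kmer=15):
    """Mismatch limit from the formula (readlength + 2) / kmer - 2.

    Assumes integer division; negative results are clamped to 0.
    """
    return max((read_length + 2) // kmer - 2, 0)


for length in (50, 75, 100):
    print(length, ultrafast_max_mismatches(length))
```

Under those assumptions, 100 bp reads with kmer 15 give (102 // 15) - 2 = 4 mismatches, and 50 bp reads give 1.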

modified 6.9 years ago • written 6.9 years ago by IV1.3k

Forgot to mention GSNAP's persistent segmentation fault errors. I've used many different versions and each one eventually segfaults or writes a faulty CIGAR string.

written 5.0 years ago by informatics bot640
1
6.9 years ago by
Ming Tang2.6k
Houston/MD Anderson Cancer Center
Ming Tang2.6k wrote:

STAR is much faster than TopHat.

written 6.9 years ago by Ming Tang2.6k
1
6.9 years ago by
swbarnes29.2k
United States
swbarnes29.2k wrote:

TopHat is more widely used, and if you need help with it, there are a lot more users who can help (see how many people use the TopHat tag compared with the STAR tag).

written 6.9 years ago by swbarnes29.2k
1
6.6 years ago by
super60
super60 wrote:

STAR is much faster than TopHat, but I don't know which result is more reliable. I think both are OK.

written 6.6 years ago by super60