I am analyzing an RNA-seq experiment and I would like to hear what you think about the STAR and TopHat alignment programs. Which one do you prefer, and why? What are the pros and cons of each?
STAR is better in most ways, from mapping accuracy to speed. The big caveat with STAR is that it needs a good bit of RAM. For a nice objective look at STAR and other RNA-seq aligners, I would recommend a quick read through this recent and very thorough comparison from the RNA-seq Genome Annotation Assessment Project in Nature Methods (there's a similar comparison by the same collaboration for transcript reconstruction in the same issue).
BTW, the take-home message from that paper can probably be summed up by Figure 3 (the paper is open access, so this is a direct link).
Edit: Have a look at IV's answer as well. I hadn't mentioned GSNAP, but I can also say that it has always produced very good results if you have an annotation (this seems to be confirmed in the review that I linked to).
TopHat2 (especially with annotations) looks quite good to me based on just that figure. I'll have to re-read the paper to remember what "partly correctly mapped" means and whether that could cause problems.
The latest version (with the suffix array) is a lot faster than any previous version but still slower than STAR [unless it is run within the ultrafast algorithm's maximum allowed mismatches].
It does not have as many users as TopHat (although you can also get really good feedback from Trinity users).
written 9.0 years ago by IV, updated 22 months ago by Ram
Forgot to mention:
In TopHat it's better to provide an estimate of the mean mate inner distance and its standard deviation, which takes some time to calculate. This has been a very frequent question in blogs and fora. From what I've seen so far, most people run with the default settings.
In TopHat and STAR you get an output file with the junctions, but in GSNAP you have to run a script afterwards to get them. I know it's not much of a fuss, but it's one more step in the pipeline.
In GSNAP there is a superfast exhaustive mode that can be run when the number of mismatches is less than or equal to ((readlength+2)/kmer - 2), where kmer is usually 15. From what I remember, the search is exhaustive within these settings and it runs in a small fraction of the usual run time.
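To get a feel for the bound quoted above, here is a minimal Python sketch that just does the arithmetic for a few common read lengths. The function name is made up for illustration, and using floor division is my assumption (the mismatch count is a whole number, so the largest value that satisfies the inequality is the floor of the bound). It is not taken from the GSNAP source.

```python
def max_exhaustive_mismatches(read_length: int, kmer: int = 15) -> int:
    """Largest whole number m with m <= (read_length + 2) / kmer - 2.

    Hypothetical helper for illustration only; the formula is the one
    quoted in the comment above, with kmer defaulting to the usual 15.
    A negative result means no mismatch setting satisfies the bound.
    """
    if read_length <= 0 or kmer <= 0:
        raise ValueError("read_length and kmer must be positive")
    # Assumption: floor division, because mismatches are counted in whole numbers.
    return (read_length + 2) // kmer - 2


# Print the bound for a few typical read lengths.
for length in (36, 50, 75, 100, 150):
    print(length, max_exhaustive_mismatches(length))
```

So with 100 bp reads and kmer 15 the bound works out to 4 mismatches, while for short reads it drops to 0 or 1.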
I'll add those to the list too, for completeness.
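For the mate inner distance point above, a rough sketch of the calculation, assuming you already have a sample of fragment (insert) lengths, for example from aligning a subset of read pairs first. The inner distance is taken as fragment length minus twice the read length, which assumes both mates have the same length. The function name and the numbers are made up for illustration. The results would go to TopHat's `-r/--mate-inner-dist` and `--mate-std-dev` options.

```python
from statistics import mean, stdev


def mate_inner_distance_stats(fragment_lengths, read_length):
    """Return (mean, standard deviation) of the mate inner distance, rounded.

    Hypothetical helper: inner distance = fragment length - 2 * read length,
    assuming both mates have the same length.
    """
    if len(fragment_lengths) < 2:
        raise ValueError("need at least two fragment lengths")
    inner = [f - 2 * read_length for f in fragment_lengths]
    # Rounded to whole numbers, since the values are passed as integers.
    return round(mean(inner)), round(stdev(inner))


# Made-up fragment lengths for 2 x 100 bp reads, for illustration only.
example_fragments = [290, 300, 310, 295, 305]
inner_mean, inner_sd = mate_inner_distance_stats(example_fragments, read_length=100)
print(inner_mean, inner_sd)
```

In practice you would want thousands of pairs rather than five, and probably a trimmed estimate, since a few pairs spanning long introns can inflate the standard deviation.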
written 9.0 years ago by IV, updated 22 months ago by Ram
Forgot to mention GSNAP's persistent segmentation fault errors. I've used many different versions and each one eventually segfaults or writes a faulty CIGAR string.
TopHat is more widely used, and if you need help with it, there are a lot more users who can help (see how many people use the TopHat tag compared with the STAR tag).
After 7 years I would say: STAR