Is Tophat The Only Mapper To Consider For RNA-Seq Data?
11.3 years ago
Lisa ▴ 330

Hi. This is probably a ridiculous question, but I'm just getting confused the more I read up on it.

Basically, when mapping reads generated by RNA-seq against a genome, is there a specific mapping software you should use? I thought that it was ok to use any of them, like either Tophat, BWA or bowtie, but I was reading other articles and posts that say only Tophat should be used on RNA-seq data. So is this the case?

I should probably point out that my data is single end and strand specific RNA-seq of a yeast species.

I just want to know if I'm doing things right, as I was going to go with BWA because it gave me a slightly higher mapping percentage than Tophat.

Thanks, I really appreciate any pointers you can provide.

bwa tophat rna-seq • 33k views

I agree with all the excellent answers below, but I was curious: which species of yeast are you working with? Apparently only a small percentage (3-4%) of S. cerevisiae genes are spliced. That could explain why you were "getting away" with a non-splice-aware aligner like bwa. You probably still want to interrogate those spliced genes with an aligner like tophat though. :-)


I was working with Candida parapsilosis. I went with Tophat in the end because I wanted to be aware of the possible exon/intron junctions.

11.3 years ago
lh3 33k

First of all, bwa/bwa-sw is NOT aware of splicing. If you want to perform typical RNA-seq analysis, bwa/bwa-sw is not the right choice.

In addition to tophat, there are several other RNA-seq mappers that are supposed to have the same functionality as tophat. SOAP-splice, STAR and gsnap are among them, and there are more. Some of them are claimed to be better than tophat, but I do not really know which is best. Evaluating the performance of splice mapping on real data is very difficult.

For typical RNA-seq analysis (e.g. investigating expression levels, discovering new splice forms, etc.), a splice-aware mapper is preferred. However, for some special analyses, we would prefer a blast-like local aligner instead of a splice-aware aligner. Most splice-aware aligners introduce complex heuristics. When you find something wrong or interesting, it is not always easy to tell whether it is caused by artifacts or true biology. Simple ascertainment will give you a more accurate picture. An example is Joe Pickrell's work on the large number of unannotated splice junctions. In that application, we would like to make sure most unannotated splice junctions are correct. Plain tophat mapping may not give us that level of confidence. I guess for RNA-editing and allele-specific expression, generic local alignment also plays an important role.

Splice-aware aligners are not the best choice for every application; tophat is not necessarily the best among splice-aware aligners. Choose your tools based on your needs. There is no single rule for everything.

11.3 years ago
Kanne ▴ 450

I suspect the reason articles and papers say that you should use Tophat is that it is a spliced read aligner, which is a critical feature when aligning RNA-seq reads to a reference genome (a spliced read aligner maps reads across exon/intron junctions). There are other aligners which do this too, but bowtie is not one of them (tophat actually uses bowtie to do its alignment, but that's another story...)

Does BWA map reads over introns? I did a brief search but couldn't find anything to suggest it does. If it does not, you should consider carefully whether it is a good choice. Mapping RNA-seq reads without a way of mapping over junctions will result in exon 'islands' and will not map any read which spans an intron. Unless this really isn't a problem for the aim of your experiment (rare), you should go with a spliced read aligner like Tophat.
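To make the "exon islands" point concrete, here is a minimal toy sketch in Python. It is not how Tophat, BWA or any real aligner works: the sequences are invented, matching is exact-string only, and the helper names `find_contiguous` and `find_spliced` are hypothetical. It only illustrates that a read spanning a junction has no contiguous match in the genome, while a search that allows one gap can place both halves.

```python
def find_contiguous(genome: str, read: str) -> int:
    """Return the 0-based position of an exact, ungapped match, or -1."""
    return genome.find(read)


def find_spliced(genome: str, read: str, min_anchor: int = 3):
    """Toy split search: try each split point and return
    (left_pos, right_pos, split) for the first split whose two halves both
    match exactly, with the right half downstream of the left half.
    Only the first occurrence of the left half is considered.
    Returns None if no split works."""
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        if left_pos == -1:
            continue
        # Require at least one skipped base between the two halves.
        right_pos = genome.find(right, left_pos + split + 1)
        if right_pos != -1:
            return left_pos, right_pos, split
    return None


# Invented toy locus: two exons separated by one intron.
exon1 = "ATGGCTAGCTTC"
intron = "GTAAGTTTTTTTTTCAG"
exon2 = "CCGATTACGGAT"
genome = exon1 + intron + exon2
transcript = exon1 + exon2        # the intron is absent from the mRNA

exon_read = transcript[0:10]      # lies entirely inside exon 1
junction_read = transcript[6:18]  # 6 bases of exon 1 + 6 bases of exon 2

print(find_contiguous(genome, exon_read))      # maps: read is within one exon
print(find_contiguous(genome, junction_read))  # -1: no contiguous match
print(find_spliced(genome, junction_read))     # both halves placed
```

With only the contiguous search, reads inside exons map and junction-spanning reads are lost, which is what produces coverage "islands" over exons.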

Personally, I use Tophat to map RNA-seq reads and BWA to map genomic DNA-derived reads (from ChIPseq).

Does this answer your question?


Yes, it does answer my question, thanks. I thought it was going to be something to do with the intron junctions; I just wanted to make sure. Thanks.


Hi Kanne/Lisa

I have been working a lot with RNA-seq data, and my analysis so far shows that bowtie, if used alone as an aligner, never gives you spliced alignments, i.e. a CIGAR string containing "N". Since RNA-seq is all about splicing, bowtie is out of the question. TOPHAT uses bowtie, but since it is a splice junction mapper it gives accurate results. I still have to check bwa. There are other mappers for RNA-seq, such as HMMsplicer.
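The "N" check described above can be sketched in a few lines of Python. In the SAM format, the CIGAR operation `N` marks a skipped region on the reference, which for RNA-seq alignments is normally an intron. The function names `parse_cigar` and `is_spliced` are illustrative, not part of any library, and the example CIGAR strings are made up.

```python
import re

# One CIGAR operation: a length followed by one of the SAM operation codes.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")


def parse_cigar(cigar: str):
    """Split a CIGAR string into (length, op) tuples.
    '*' (CIGAR unavailable) yields an empty list."""
    if cigar == "*":
        return []
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in _CIGAR_OP.findall(cigar)]


def is_spliced(cigar: str) -> bool:
    """True if the alignment skips a reference region (an 'N' operation)."""
    return any(op == "N" for _, op in parse_cigar(cigar))


print(is_spliced("50M"))         # False: one contiguous block
print(is_spliced("20M300N30M"))  # True: 300 reference bases skipped
```

If no record in a SAM file passes this check, the aligner that produced it did not report any spliced alignments for that dataset.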

11.3 years ago

TopHat is the most popular mapper for RNA-seq data, but others exist. The RUM paper evaluated several:


With GSNAP you need to parse the SAM/BAM file using bedtools bamToBed to get a junctions file.
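As a rough illustration of what deriving junctions from alignments involves, the Python sketch below computes intron coordinates from a SAM record's POS and CIGAR fields and counts how many reads support each junction. This is not the logic of GSNAP or bedtools; the `junctions` helper and the example alignments are assumptions made for the illustration. It relies only on the SAM convention that POS is 1-based and that the operations M, D, N, = and X consume reference bases.

```python
import re
from collections import Counter

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def junctions(pos: int, cigar: str):
    """Return the 1-based, inclusive (start, end) reference interval of
    every skipped region ('N' operation) in one alignment."""
    found = []
    ref = pos  # reference coordinate of the next base to be consumed
    for length, op in _CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            found.append((ref, ref + length - 1))
        if op in _REF_CONSUMING:
            ref += length
    return found


# Made-up (POS, CIGAR) pairs standing in for records from a SAM file.
alignments = [(100, "20M300N30M"), (110, "10M300N40M"), (500, "50M")]

support = Counter(
    junction for pos, cigar in alignments for junction in junctions(pos, cigar)
)
for (start, end), n_reads in support.items():
    print(start, end, n_reads)
```

Counting support per junction is a common first filter, since a junction seen in a single read is more likely to be an alignment artifact than one seen in many.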

11.3 years ago
Fred ▴ 780

Other spliced read aligners exist; for example, you may also try STAR, which seems to give quite good mappings.


Cool, thanks for that, I'll look into it.


+multithreaded and very fast. ~3-4 mins on my server for 8M reads against hg19


Indeed, at the expense of more RAM (~30GB for human)


Yeah, although this can (and should) be loaded into shared memory so you can align multiple datasets with only one hit to RAM usage. I find it so fast that it is worth it. I haven't thoroughly compared it to TopHat yet in terms of results, but early indications from my work and from other people I know who are testing it are that it does at least as good a job.

11.1 years ago

Another splice-aware alignment tool that was recently published is:

CRAC: an integrated approach to the analysis of RNA-seq reads

Provisional abstract: "A large number of RNA-sequencing studies set out to predict mutations, splice junctions or fusion RNAs. We propose a method, CRAC, that integrates genomic locations and local coverage to enable such predictions to be made directly from RNA-seq read analysis. A k-mer profiling approach detects candidate mutations, indels and splice or chimeric junctions in each single read. CRAC increases precision compared with existing tools, reaching 99.5% for splice junctions, without losing sensitivity. Importantly, CRAC predictions improve with read length. In cancer libraries, CRAC recovered 74% of validated fusion RNAs and predicted novel recurrent chimeric junctions. CRAC is available at http://crac.gforge.inria.fr."


Yes, CRAC is super accurate and very sensitive. Way better than Tophat!

11.3 years ago
kun ▴ 180

Hi. Of course not; SOAP-splice can also do this work. However, Tophat is the most popular software because it can handle splicing and is easy to use. If you use bwa directly, alternative splicing will be ignored. If you want a higher mapping percentage, you can use some bowtie parameters, such as --very-sensitive. Also, tophat2 should be better than tophat. I hope this helps.


Thanks. I used the --very-sensitive parameter too. I am also using tophat 2; sorry, I meant to say that.


Hi Kun

Do you mean that bowtie alone can be used as an aligner for RNA-seq data (without TOPHAT)? I ask because I used bowtie2 alone as an aligner and did not get spliced reads for RNA-seq (no N in the CIGAR string).


Neither bowtie nor bowtie2 is splice-junction aware.

11.0 years ago
Shaojiang Cai ▴ 100

I would recommend the RNA-seq blog. It tracks publications regarding RNA-seq, and you can find most mapping tools there.

9.1 years ago
n124080 ▴ 10

There is a new tool called StringTie; the paper, "StringTie enables improved reconstruction of a transcriptome from RNA-seq reads", was published in Nature Biotechnology on 18 February 2015.

In our lab, there has been some discussion about StringTie versus Tophat, but we haven't made a final decision.


I think you might want to look into HISAT (http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3317.html). As NicoBxl pointed out, StringTie isn't a spliced aligner; however, HISAT may serve your purposes well.


StringTie is a transcriptome assembly tool, not a mapper.
