How To Build A Basic RNA-Seq Pipeline
12.8 years ago
Travis ★ 2.8k

Hi all,

I've been looking around and there doesn't seem to be much information on the development of RNA-Seq pipelines for differential expression analysis. I am about to start work on setting up a basic skeleton pipeline.

As a very high level overview, how do the following steps look? Can anyone comment on/add/remove steps? I have also added some questions regarding the steps to aid in my own understanding.

1) Align reads to genome using TopHat/Bowtie
(perhaps use the new TopHat-Fusion to find fusion transcripts? I also guess it is important to ensure that the genome we use matches our preferred annotation source downstream, e.g. if we prefer Ensembl, we should use GRCh37 (NCBI build 37) rather than UCSC hg19 to keep chromosome names consistent?)

2) Mark/remove duplicate reads.

3) Use Cufflinks to assemble transcripts.

4) Run Cuffdiff to assess differential expression.

5) Annotate transcripts
(unsure how exactly this is done - can anyone comment? For example, what program might be used, and what happens when we attempt to annotate novel-spliced or fully novel transcripts? Will these be recognised somehow?)

6) At this point I guess we have an annotated matrix that could be used in next gen or classical visualisation programs? Any suggestions on how to view?

next-gen sequencing rna gene • 15k views
12.8 years ago

Some comments on your steps:

1) TopHat is fine, but Bowtie (or BWA) only makes sense if you are mapping directly against the transcriptome (IMO). Mapping against the transcriptome may be a good idea for many applications, although mapping against the genome is much more common and I haven't seen any in-depth comparison of which one is more sensitive/specific. Apart from TopHat, there are other good spliced mapping methods such as MapSplice, SpliceMap and (especially) RUM.

Yes, you should take care to keep your reference genome and annotation "in sync".

2) Yes, at least for paired-end it's a good idea to remove duplicates.

3) In my opinion, assembling the transcripts with Cufflinks only makes sense if you don't have a good annotation. If you are sequencing human RNA, I would just run Cufflinks with a GTF file to quantify the expression of annotated transcripts. If you want to run DESeq or other count-based methods for differential expression later, you would use HTSeq or something similar here instead of Cufflinks.

4) Cuffdiff is probably OK, or you could use e.g. DESeq, which works on read counts.

5)-6) Not sure I understand the questions.
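On keeping the reference genome and annotation "in sync": one quick sanity check is to compare the sequence names in the genome FASTA with the first column of the GTF. The sketch below is only an illustration, not part of any of the tools mentioned; the function names and the toy records are made up for the example.

```python
import io


def fasta_names(handle):
    """Collect sequence names from FASTA headers (text after '>' up to the first whitespace)."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_names(handle):
    """Collect the sequence names used in column 1 of a GTF file."""
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def unmatched_annotation_names(fasta_handle, gtf_handle):
    """Return GTF sequence names that have no counterpart in the FASTA."""
    return sorted(gtf_names(gtf_handle) - fasta_names(fasta_handle))


# Toy data (hypothetical): a UCSC-style genome and a GTF that mixes naming styles.
genome = io.StringIO(">chr1 toy sequence\nACGT\n>chr2\nACGT\n")
annotation = io.StringIO(
    '1\tensembl\texon\t1\t4\t.\t+\t.\tgene_id "g1";\n'
    'chr2\tensembl\texon\t1\t4\t.\t+\t.\tgene_id "g2";\n'
)
missing = unmatched_annotation_names(genome, annotation)
print(missing)
```

Any name reported here (in the toy case, the Ensembl-style "1") would have no matching sequence in the genome, which is the mismatch the question worries about.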


@Travis: BWA does gapped alignment, but the gaps are on the order of 1-10 bp; BWA does not handle gaps the size of introns. You need to use a splice-aware aligner when aligning to the genome. See my answer and Mikael's above for some aligner suggestions.
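To make the difference between a small gap and an intron-sized gap concrete: in SAM output, short deletions appear as `D` operations in the CIGAR string, while spliced aligners report a skipped intron as an `N` operation. The following is a minimal sketch that summarises a CIGAR string; the function name and the two example CIGAR strings are invented for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize_cigar(cigar):
    """Return the reference span, the longest deletion and the intron lengths of a CIGAR string."""
    ref_span = 0
    longest_deletion = 0
    introns = []
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op in "MDN=X":  # operations that consume reference bases
            ref_span += length
        if op == "D":
            longest_deletion = max(longest_deletion, length)
        elif op == "N":  # skipped region, i.e. an intron in RNA-Seq alignments
            introns.append(length)
    return {"ref_span": ref_span, "longest_deletion": longest_deletion, "introns": introns}


# Hypothetical 75 bp reads: one with a 2 bp deletion, one spliced across a 1500 bp intron.
small_gap = summarize_cigar("40M2D35M")
spliced = summarize_cigar("30M1500N45M")
print(small_gap)
print(spliced)
```

The first alignment is the kind of gapped alignment described above for BWA; the second needs a splice-aware aligner, because the read covers 1575 bp of genome with only 75 bp of sequence.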


Why do Bowtie or BWA only make sense if mapping to the transcriptome?


In the genome, exons are separated by introns, and Bowtie and BWA cannot align a read that crosses an exon-exon junction.


But BWA does do gapped alignment, doesn't it?


Thanks for that. Off the cuff, it makes me wonder why anyone would use bowtie for RNA-Seq!


The more I think about this, the more I have to ask - do aligners like Bowtie/Eland discard intron-spanning reads?


Yes, but Solexa reads used to be shorter, so junction-spanning reads were not common at ~30 bp. Software is always fighting the last war.
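A rough way to see the read-length effect is to count, for a spliced transcript, how many possible read start positions produce a read that crosses at least one exon-exon junction. This is a toy calculation under simplifying assumptions (reads sampled uniformly along one transcript, no minimum overhang); the exon lengths are hypothetical.

```python
def junction_spanning_fraction(exon_lengths, read_length):
    """Fraction of read start positions on a spliced transcript whose read crosses a junction."""
    transcript_length = sum(exon_lengths)
    n_starts = transcript_length - read_length + 1
    if n_starts <= 0:
        return 0.0  # read is longer than the transcript

    # Offsets in the spliced transcript at which a new exon begins.
    boundaries = []
    offset = 0
    for length in exon_lengths[:-1]:
        offset += length
        boundaries.append(offset)

    spanning = 0
    for start in range(n_starts):
        end = start + read_length  # exclusive end of the read
        # The read crosses a junction if it has bases on both sides of a boundary.
        if any(start < b < end for b in boundaries):
            spanning += 1
    return spanning / n_starts


# Hypothetical transcript with five 200 bp exons.
exons = [200] * 5
short_reads = junction_spanning_fraction(exons, 30)
long_reads = junction_spanning_fraction(exons, 100)
print(round(short_reads, 3), round(long_reads, 3))
```

For this toy transcript the junction-spanning fraction grows several-fold between 30 bp and 100 bp reads, which is why an unspliced aligner loses far more data with longer reads.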

12.8 years ago

See this blog post for a quick start. See this question and consider alternative alignment algorithms. Consider reading some of these review articles.


Can you walk us through your GSNAP-based pipeline?


GSNAP is used for the alignment step only. After that, the workflow can be similar to the one used with TopHat or any other aligner, and could include the Cufflinks suite, DESeq, etc.
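As a rough illustration of an aligner-independent downstream stage, the sketch below only builds and prints command lines for Cufflinks (quantifying against a known annotation with `-G`) and Cuffdiff (with `-L` for condition labels); it does not run anything. The function name, file paths and condition labels are hypothetical, and the options should be checked against the manual of whichever Cufflinks version is installed.

```python
import shlex


def downstream_commands(bams_by_condition, gtf, out_dir):
    """Build Cufflinks/Cuffdiff command lines for sorted BAM files produced by any spliced aligner.

    bams_by_condition maps a condition label to its list of replicate BAM paths.
    """
    commands = []
    # One Cufflinks run per sample, restricted to annotated transcripts via -G.
    for label, bams in bams_by_condition.items():
        for i, bam in enumerate(bams, start=1):
            sample_dir = f"{out_dir}/cufflinks/{label}_{i}"
            commands.append(["cufflinks", "-o", sample_dir, "-G", gtf, bam])

    # One Cuffdiff run; replicates of a condition are passed as a comma-separated list.
    labels = list(bams_by_condition)
    cuffdiff = ["cuffdiff", "-o", f"{out_dir}/cuffdiff", "-L", ",".join(labels), gtf]
    cuffdiff += [",".join(bams_by_condition[label]) for label in labels]
    commands.append(cuffdiff)
    return commands


# Hypothetical input files: two conditions with two replicates each.
samples = {
    "control": ["aln/control_1.bam", "aln/control_2.bam"],
    "treated": ["aln/treated_1.bam", "aln/treated_2.bam"],
}
commands = downstream_commands(samples, "annotation/genes.gtf", "results")
for command in commands:
    print(shlex.join(command))
```

Keeping the commands as lists makes it easy to pass them to `subprocess.run` later, or to swap the Cufflinks steps for a count-based route such as HTSeq followed by DESeq.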


How do you divvy up hits between different overlapping transcripts?

12.8 years ago

SEQanswers.com has published an interesting post describing a basic pipeline that you can refine later. I forgot the link to the post, but I found the guide on the RNA-Seq Blog:

CLICK HERE

Hope that helps

Radhouane
