Rna-Seq Pipeline
8
45
11.1 years ago
brentp 23k

So, there are papers on designing an RNA-seq experiment and on normalizing the data (Bullard et al. and the recent Genetics paper are good reads), but what do folks do for the actual pipeline?

I'm looking at

1. filter on quality. (what are your quality/parameter cutoffs?)
2. any other pre-processing?
3. tophat
5. repeat 1-4 for different set of reads and find differentially expressed genes (cuffdiff)

First, any steps I should add?

Second, there doesn't seem to be much written about how to do this. I can read the manuals and execute the commands (steps 3 and 4 seem to be no problem), but I'm looking for any pointers to either:

1. fully documented pipelines with an explanation of the processing at each step
2. shell script(s) for going from reads to differentially expressed genes.
3. pubs where this is documented.

I realize each set of data will be different, but it'd be nice to base it on something.
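
To make the question concrete, here is roughly the kind of scripted version I have in mind for steps 3 to 5. It is only a sketch under assumptions: the file names, the `build_commands` helper, and the one-FASTQ-per-condition layout are made up for illustration, and I am assuming the step between tophat and cuffdiff is cufflinks. It only prints the command lines (a dry run) rather than running anything.

```python
import shlex
from pathlib import Path


def build_commands(samples, index, gtf, out_root="out", threads=4):
    """Return TopHat, Cufflinks and Cuffdiff command lines as argument lists.

    samples maps a condition label to one single-end FASTQ file
    (a hypothetical layout, chosen only to keep the sketch short).
    """
    out_root = Path(out_root)
    commands = []
    bams = []
    for label, fastq in samples.items():
        tophat_dir = out_root / label / "tophat"
        bam = tophat_dir / "accepted_hits.bam"  # TopHat's default alignment file
        # Step 3: spliced alignment against a Bowtie index.
        commands.append(["tophat", "-p", str(threads), "-G", gtf,
                         "-o", str(tophat_dir), index, fastq])
        # Step 4 (assumed to be Cufflinks): per-sample abundance estimation.
        commands.append(["cufflinks", "-p", str(threads), "-G", gtf,
                         "-o", str(out_root / label / "cufflinks"), str(bam)])
        bams.append(str(bam))
    # Step 5: differential expression between the conditions.
    commands.append(["cuffdiff", "-p", str(threads), "-L", ",".join(samples),
                     "-o", str(out_root / "cuffdiff"), gtf, *bams])
    return commands


if __name__ == "__main__":
    cmds = build_commands({"control": "control.fastq", "treated": "treated.fastq"},
                          index="genome_index", gtf="genes.gtf")
    for cmd in cmds:
        print(shlex.join(cmd))  # dry run: print instead of executing
```

Quality filtering and other pre-processing (steps 1 and 2) are deliberately left out, since those are exactly the parts I am unsure about.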

rna pipeline next-gen sequencing rna-seq • 28k views
11
11.1 years ago
Dstan ▴ 160

We're getting ready to publish a study in which we use RNA-seq, and we used a piece of software called GNUMAP. We did not apply any filtering on the read qualities, as we found that lower-quality reads simply didn't map as well. As far as the post-mapping analysis, we're still waiting to hear back from our statistics colleagues on the model they've developed.

As far as an out-of-the-box solution for RNA-seq, I'm not sure how much you'll be able to find.

2

hadn't heard of GNUMAP, checking it out now. i'm not expecting an out-of-the-box solution, just trying to make use of existing knowledge.

10
11.1 years ago
Wjeck ▴ 480

No idea about where these steps exist as a well documented whole, but I can pass on our experience. We're doing a pretty massive amount of RNA-seq at our institution as part of The Cancer Genome Atlas, and our methods are along the lines you describe.

Bowtie/Tophat for mapping has been our best bet for spliced sequence alignment. I know the group working on this tried other techniques with mapping onto a reference "transcriptome" that has some advantages in terms of mapping but can be harder to deconvolute in cases where transcripts overlap.

0

thanks, at least it's good to know you decided on a similar overall pipeline after looking around.

6
11.0 years ago

I think one important step that is missing here could be removing identical reads in the filtering step. A large number of reads could be, e.g., artifacts from a PCR step in the wet-lab pipeline. Identical reads can be collapsed, e.g., with the tool FASTA collapser from the FASTX tools. For a quantitative approach I would prefer this, but I guess it's controversial. Any experiences with that?
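
To illustrate what collapsing identical reads means, here is a minimal Python sketch. It is not how the FASTX tools implement it; the `collapse_reads` name and the toy reads are made up for illustration, and only exact sequence matches are merged.

```python
from collections import Counter


def collapse_reads(sequences):
    """Collapse identical sequences, keeping each once with its copy number."""
    counts = Counter(sequences)
    # most_common() sorts by descending count; ties keep first-seen order.
    return counts.most_common()


reads = ["ACGT", "TTGA", "ACGT", "ACGT", "GGCC", "TTGA"]
for seq, n in collapse_reads(reads):
    print(seq, n)
```

Keeping the copy number next to each unique sequence means the information is not lost, so one can still decide later whether to count a read once or n times.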

Another filtering step can be to clip the reads, removing low-quality regions, instead of only discarding whole reads.
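
A rough sketch of such clipping, assuming that low quality shows up mainly at the 3' end and that qualities are Phred+33 encoded (older Illumina files use an offset of 64). The function name and the cutoff of 20 are illustrative choices, not a recommendation.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Clip low-quality bases from the 3' end of a read.

    qual is the FASTQ quality string; offset=33 assumes Sanger-style encoding.
    """
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the cutoff.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
print(trim_3prime("ACGTACGT", "IIIII###"))
```

A read that ends up too short after clipping would probably still have to be dropped, with a length cutoff that depends on the aligner.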

2

My understanding is that removing identical reads is a step that is typical for DNA analysis, but more controversial when it comes to RNA-Seq because the rationale for it is less clear here (are we only removing PCR artifacts, or also introducing a quantitative bias?).

0

Note, I wrote this almost 3 years ago. Now, I wouldn't do it anymore for a differential analysis, with the argument that on average PCR-artifacts should equally affect both conditions. That's possibly still controversial.

0

I'll admit that I didn't see the date of the original answer :)

5
5.8 years ago

We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki.

This material was released alongside this publication:

Malachi Griffith, Jason R. Walker, Nicholas C. Spies, Benjamin J. Ainscough, Obi L. Griffith. 2015. Informatics for RNA-seq: A web resource for analysis on the cloud. PLoS Comput Biol. 11(8):e1004393.

The Supplementary Information for this publication includes an extensive review of RNA-seq wet lab and analysis concepts, existing tools, common questions, etc.

All materials associated with this publication, including high resolution and original figure files, supplementary tables, etc. are available here

This publication was inspired by workshops that we have taught at CBW, CSHL, and NYGC over the last few years. These workshops are ongoing and we hope to maintain and expand the content in the coming years.

2
8.0 years ago

For anyone still interested in this type of thing:

If using TopHat/Cufflinks, the authors generally do not recommend removing poor-quality reads, since their process will simply down-weight the alignments of poor-quality reads, and sometimes those reads can actually help.

As for 3-5:

I have recently written a pipeline called Blacktie to do just this, plus do some automated analysis with cummeRbund.

Installation via pip:

[sudo] pip install -U blacktie

0

Could you give a source for the top statement about pre-filtering reads for tophat? I've been trying to learn about this topic and haven't found a whole lot honestly.

1
8.0 years ago
Biojl ★ 1.7k

You may want to take a look at The Simple Fool’s Guide to Population Genomics via RNA-Seq, put together at the Palumbi lab. It's a functional, fully documented pipeline from scratch.

http://sfg.stanford.edu/guide.html

Edit, PS: OK, yes, I didn't see that this post was from 3 years ago.

0
8.0 years ago
xiangwulu ▴ 100
1. FastQC could be used for quality control.
2. Adapters may need to be removed before alignment, in case a long adapter affects the alignment result.
3. & 4. Other aligners may be worth a look, depending on the read length (BWA, Bowtie, BFAST).
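
As a rough illustration of point 2, here is a minimal Python sketch of 3' adapter clipping. It handles exact matches only, which real adapter trimmers do not limit themselves to; the `clip_adapter` name, the adapter sequence, and the `min_overlap` default are assumptions for the example.

```python
def clip_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter from a read (exact matches only)."""
    # Case 1: the full adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with only the first k bases of the adapter.
    start = min(len(adapter) - 1, len(read))
    for k in range(start, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


adapter = "AGATCGGAAGAGC"  # illustrative adapter sequence
print(clip_adapter("ACGTACGTAGATCGGAAGAGC", adapter))  # full adapter present
print(clip_adapter("ACGTACGTAGATC", adapter))          # partial adapter at the end
```

A smaller `min_overlap` removes more adapter remnants but also clips more genuine bases that happen to match the start of the adapter.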

list of alignment software:

http://en.wikipedia.org/wiki/List_of_sequence_alignment_software

http://elements.eaglegenomics.com/

0
6.3 years ago
Czh3 ▴ 190

try this:

https://github.com/Czh3/NGSTools