Question: How Do I Get Started Working With RNA-Seq Data
6.2 years ago by
HNK120 wrote:

Hey everyone, I have to start working on RNA-seq and I am totally new to this approach. I have to work on data given by the neurological department. The data has 96 samples (reads in FASTQ files), and the samples were derived from formalin-fixed paraffin-embedded (FFPE) tissue. I have to determine somatic variation, gene expression, SNVs, and fusion genes between subgroups from the RNA-seq data. Can anyone help me out with how I should start my work, and how to analyse the RNA-seq and cancer genome data?

gene-expression rna-seq • 15k views
modified 4.2 years ago by dnaseiseq200 • written 6.2 years ago by HNK120

Welcome to Biostar! This is not a great question, as there are many guides to RNA sequence analysis online just a search away. I'd recommend finding one, starting to follow it, and if you get stuck, then come back and ask specific questions. You're more likely to get useful responses that way. Look at Section 6 here for more details. (Do your homework before posting)

modified 6.1 years ago • written 6.1 years ago by Chris Miller21k

I recently found this book, which I think does a good job of giving an overview of the RNA-Seq tools available and what they do. It also gives code snippets to explain how to execute them.

As slides:

on amazon:

modified 4 months ago by RamRS26k • written 4.5 years ago by matthew.sapio20
5.0 years ago by
Michele Busby2.1k
United States
Michele Busby2.1k wrote:

We have a blog post here that goes over basic concepts in RNA-Seq:

I have to edit it (I've been told by complaining readers) to add something on normalization and I also want to add stuff on biases and complexity.

Since those are FFPE samples, some are likely crappy, so you will have a lot of biases, etc. That means just running the data through an existing pipeline may not be optimal, though it may be a good first step. For example, you may need to do something like principal component analysis to see what your confounders are. It's not trivial, but others have done it.
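
A minimal sketch of the kind of principal component check mentioned above, not the method of any particular pipeline. It assumes a samples-by-genes matrix of raw read counts. The function name `pca_scores`, the counts-per-million plus log2 transform, and the synthetic data with a made-up batch effect are all illustrative assumptions, not something taken from the blog post.

```python
import numpy as np


def pca_scores(counts, n_components=2):
    """Return per-sample PCA scores and explained-variance ratios.

    counts: array of shape (n_samples, n_genes) holding raw read counts.
    """
    counts = np.asarray(counts, dtype=float)
    # Scale each sample to counts per million so library size does not dominate.
    lib_sizes = counts.sum(axis=1, keepdims=True)
    cpm = counts / lib_sizes * 1e6
    # Log-transform to tame the long right tail of expression values.
    logged = np.log2(cpm + 1.0)
    # Centre each gene, then use SVD to obtain the principal components.
    centred = logged - logged.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Synthetic data (hypothetical): 6 samples x 200 genes, where the last
# three samples carry an artificial 4x "batch" effect on the first 50 genes.
rng = np.random.default_rng(0)
base = rng.gamma(shape=2.0, scale=50.0, size=200)
counts = rng.poisson(base, size=(6, 200)).astype(float)
counts[3:, :50] *= 4.0

scores, explained = pca_scores(counts)
for i, (pc1, pc2) in enumerate(scores, start=1):
    print(f"sample{i}\tPC1={pc1:.2f}\tPC2={pc2:.2f}")
print("explained variance ratio:", np.round(explained, 3))
```

If the samples separate along PC1 or PC2 by something other than biology (block age, extraction batch, sequencing lane), that variable is a candidate confounder worth including in the downstream model.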


written 5.0 years ago by Michele Busby2.1k
6.1 years ago by
Carlos Borroto1.9k
Washington Metropolitan Area
Carlos Borroto1.9k wrote:

I would start by reading this paper from the authors of the Tuxedo pipeline.

Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks.

written 6.1 years ago by Carlos Borroto1.9k

This is a really good point; I always tell people the same thing. There is probably no better way to get started: just work through a paper or two.

written 6.1 years ago by Istvan Albert ♦♦ 83k

Thank you so much. I have started reading this paper.

modified 6.1 years ago • written 6.1 years ago by HNK120
4.2 years ago by
United Kingdom
dnaseiseq200 wrote:


Just published:

A survey of best practices for RNA-seq data analysis

Genome Biology 2016, 17:13 doi:10.1186/s13059-016-0881-8

modified 4 months ago by RamRS26k • written 4.2 years ago by dnaseiseq200
4.6 years ago by
Washington University School of Medicine, St. Louis, USA
Malachi Griffith18k wrote:

We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at

This material was released alongside this publication:

Malachi Griffith*, Jason R. Walker, Nicholas C. Spies, Benjamin J. Ainscough, Obi L. Griffith*. 2015. Informatics for RNA-seq: A web resource for analysis on the cloud. PLoS Computational Biology 11(8):e1004393.

The Supplementary Information for this publication includes an extensive review of RNA-seq wet lab and analysis concepts, existing tools, common questions, etc.

All materials associated with this publication, including high resolution and original figure files, supplementary tables, etc. are available here:

This publication was inspired by workshops that we have taught at CBW, CSHL, and NYGC over the last few years. These workshops are ongoing and we hope to maintain and expand the content in the coming years.

written 4.6 years ago by Malachi Griffith18k
6.1 years ago by
Charles Warden7.6k
Duarte, CA
Charles Warden7.6k wrote:

Somatic variation is really meant for DNA-Seq data. Although you can look for RNA-editing events with paired DNA-Seq and RNA-Seq data, I think you will have a hard time distinguishing true variants from tumor-specific RNA-editing events if you are comparing two RNA-Seq samples (or calling SNVs in an RNA-Seq sample against a reference genome).

For gene expression, I've included some benchmarks here (which I ran using paired tumor-normal RNA-Seq data):

I don't think there is a gold standard for gene fusion events, but I've liked chimerascan the best. TopHat-fusion is probably the most popular option.

modified 6.1 years ago • written 6.1 years ago by Charles Warden7.6k
5.0 years ago by
Czh3190 wrote:

RNA-seq pipeline:

written 5.0 years ago by Czh3190