Hey everyone, I have to start working on RNA-seq and I am totally new to this approach. The data were given to me by the neurological department: 96 samples (reads as FASTQ files), all derived from formalin-fixed paraffin-embedded (FFPE) tissue. I have to determine somatic variation, gene expression, SNVs and fusion genes between subgroups from the RNA-seq data. Can anyone help me out with how I should start? How do I analyse the RNA-seq and cancer genome data?
We have a blog post here that goes over basic concepts in RNA Seq:
I have to edit it (I've been told by complaining readers) to add something on normalization and I also want to add stuff on biases and complexity.
Since those are FFPE samples, some are likely crappy, so you will have a lot of biases, etc. That means just running the data through an existing pipeline may not be optimal, though it may be a good first step. For example, you may need to do something like principal component analysis to see what your confounders are. It's not trivial, but others have done it.
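To make the PCA suggestion a bit more concrete, here is a minimal sketch in Python using only NumPy. It assumes you already have a genes x samples matrix of raw read counts. The function name `pca_samples`, the log2(CPM + 1) transform and the simulated "batch" effect are all illustrative assumptions, not part of any particular pipeline.

```python
import numpy as np


def pca_samples(counts, n_components=2):
    """PCA of samples from a genes x samples matrix of raw read counts.

    Returns (scores, explained): per-sample coordinates on the leading
    components and the fraction of variance each one explains.
    """
    counts = np.asarray(counts, dtype=float)
    # Scale each sample to counts per million, then log-transform so a
    # handful of very highly expressed genes do not dominate.
    lib_sizes = counts.sum(axis=0)
    log_cpm = np.log2(counts / lib_sizes * 1e6 + 1.0)
    # Center each gene across samples.
    centered = log_cpm - log_cpm.mean(axis=1, keepdims=True)
    # SVD of the samples x genes matrix gives the principal components.
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Simulated data (purely hypothetical): 200 genes, 8 samples, where the
# last 4 samples carry an artificial shift in the first 50 genes.
rng = np.random.default_rng(0)
base = rng.gamma(2.0, 50.0, size=(200, 1))
expected = base * np.ones((200, 8))
expected[:50, 4:] *= 4.0
counts = rng.poisson(expected)

scores, explained = pca_samples(counts)
print(scores[:, 0])   # PC1 coordinate of each sample
print(explained)      # fraction of variance explained by PC1 and PC2
```

With real data you would colour the samples on a PC1 vs PC2 plot by known covariates (block age, extraction batch, RNA quality, subgroup) and see which ones line up with the leading components.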
I would start by reading this paper from the authors of the Tuxedo pipeline.
We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki.
This material was released alongside this publication:
Malachi Griffith*, Jason R. Walker, Nicholas C. Spies, Benjamin J. Ainscough, Obi L. Griffith*. 2015. Informatics for RNA-seq: A web resource for analysis on the cloud. PLoS Comput Biol. 11(8):e1004393.
The Supplementary Information for this publication includes an extensive review of RNA-seq wet lab and analysis concepts, existing tools, common questions, etc.
All materials associated with this publication, including high resolution and original figure files, supplementary tables, etc. are available here: https://github.com/griffithlab/rnaseq_tutorial
This publication was inspired by workshops that we have taught at CBW, CSHL, and NYGC over the last few years. These workshops are ongoing and we hope to maintain and expand the content in the coming years.
Somatic variant calling is really meant for DNA-Seq data. Although you can look for RNA-editing events with paired DNA-Seq and RNA-Seq data, I think you will have a hard time distinguishing true variants from tumor-specific RNA-editing events if you are only comparing two RNA-Seq samples (or calling SNVs in an RNA-Seq sample against a reference genome).
For gene expression, I've included some benchmarks here (which I ran using paired tumor-normal RNA-Seq data):
I don't think there is a gold standard for gene fusion events, but I've liked chimerascan the best. TopHat-fusion is probably the most popular option.