Genome-free RNA-seq assembly and transcriptome analysis leveraging Trinity
Berlin, 11th-15th June 2018
Instructors:
Brian Haas (The Broad Institute of MIT & Harvard, USA)
Dr. Nicolas Delhomme (Umeå Plant Science Center, SE)
Workshop overview
RNA-Seq technology has been transformative in our ability to explore gene content and gene expression in all realms of biology, and de novo transcriptome assembly has enabled opportunities to expand transcriptome analysis to non-model organisms. This workshop provides an overview of modern applications of transcriptome sequencing and popular tools and algorithms for exploring transcript reconstruction and expression analysis in a genome-free manner, leveraging the Trinity software and analysis framework. Attendees will perform quality assessment of Illumina RNA-Seq data, assemble a transcriptome using Trinity, quantify transcript expression, leverage Bioconductor tools for differential expression analysis, and apply Trinotate to functionally annotate transcripts. Additional methods will be explored for characterizing the assembled transcriptome and revealing biological findings.
Intended audience
This workshop is aimed primarily at biologists who have basic bioinformatics skills and are pursuing RNA-Seq projects in non-model organisms. Attendees will gain the skills needed to successfully approach transcriptome sequencing, de novo transcriptome assembly, expression analysis, and functional annotation as applied to organisms lacking a high-quality reference genome sequence. Attendees are also invited to bring a subset of their own data.
Teaching format
The workshop will be delivered over the course of four and a half days, with each session entailing lectures followed by practical hands-on sessions. Almost all computing will be done in the cloud. Attendees will use their own laptop computers, with the Google Chrome web browser providing all the necessary interfaces to the cloud computing environment, including the Linux command terminal.
Assumed background for the participants
Basic experience with the Linux command line and with running bioinformatics tools would be helpful. We will begin the course with a review of basic Linux commands and operations as a refresher. No programming or scripting knowledge is required.
Session content
Monday 11th - Classes from 09:30 to 17:30
Session 1 - Intro to the Trinity RNA-Seq workshop
- Intro to RNA-Seq
- Intro to next-gen sequence analysis
- Overview of Unix and workshop setup
- Practical: exploring the computational infrastructure
- Read quality assessment and trimming
- Practical: using FastQC and Trimmomatic
Tuesday 12th - Classes from 09:30 to 17:30
Session 2 - Trinity de novo assembly, expression quantitation, and assembly QC
- Overview of Trinity de novo transcriptome assembly
- Practical: assemble RNA-Seq data using Trinity
- Intro to expression quantification using RNA-Seq
- Practical: quantify expression for Trinity assembly
- Initial data exploration: assembly quality, and QC of samples and replicates
- Practical: using IGV
- Practical: replicate correlation matrix and PCA
Wednesday 13th - Classes from 09:30 to 17:30
Session 3 - Differential expression analysis
- Overview of statistical methods for differential expression (DE).
- Practical: using Bioconductor tools for DE analysis.
- Transcript clustering and expression profiling
- Practical: generating heatmaps and extracting transcript clusters.
Thursday 14th - Classes from 09:30 to 17:30
Session 4 - Functional annotation and functional enrichment studies
- Overview of methods for functional annotation
- Practical: applying Trinotate to find coding regions in transcripts and predict biological function.
- Overview of functional enrichment analysis
- Practical: applying GOseq to identify significantly enriched Gene Ontology categories among transcript clusters.
Friday 15th - Classes from 09:30 to 17:30
Session 5 - Review and custom data analyses