Course: Analysis of RNA sequencing data with R/Bioconductor
Where: Freie Universität Berlin (Germany)
When: 22-26 June 2020
This course gives biologists and bioinformaticians the practical statistical skills needed for rigorous analysis of RNA-seq data with R and Bioconductor. It assumes basic familiarity with genomics but no prior statistical training. It covers the statistical concepts needed to design experiments and analyze high-throughput data generated by next-generation sequencing, including exploratory data analysis, principal components analysis, clustering, differential expression, and gene set analysis.
Session 1 - Introduction
Monday - 09:30 to 17:30
Lecture 1: Data distributions
- random variables
- distributions
- population and samples
Hands-On 1: Introduction to R
Lecture 2: Creating high-quality graphics in R
- Visualizing data in one, two, and more than two dimensions
- Heatmaps
- Data transformations
Hands-On 2: Graphics with base R and ggplot2
Session 2 - Hypothesis testing
Tuesday - 09:30 to 17:30
Lecture 1: Hypothesis testing theory
- type I and type II errors, and statistical power
- multiple hypothesis testing: false discovery rate, family-wise error rate
- exploratory data analysis (EDA)
Hands-On 1: Standard tests & EDA
Lecture 2: Hypothesis testing in practice
- hypothesis tests for categorical variables (chi-square, Fisher's exact)
- Monte Carlo simulation
- Permutation tests
Hands-On 2: Permutation tests
Session 3 - Bioconductor
Wednesday - 09:30 to 17:30
Lecture 1: Introduction to Bioconductor
- Incorporating Bioconductor in your data analysis
- ExpressionSet / SummarizedExperiment
- Annotation resources
Hands-On 1: Leveraging Bioconductor annotation resources
Lecture 2: Genomic intervals
- Introduction to genomic region algebra
- Basic operations: construction, intra- and inter-region operations
- Finding overlaps
Hands-On 2: Solving common bioinformatic challenges with GenomicRanges
Session 4 - Next-generation sequencing data
Thursday - 09:30 to 17:30
Lecture 1: High-throughput count data
- Characteristics of count data
- Exploring count data
- Modeling count data
Hands-On 1: Analyzing next-generation sequencing data
Lecture 2: Clustering and Principal Components Analysis
- Measures of similarity
- Hierarchical clustering
- Dimension reduction
- Principal components analysis (PCA)
Hands-On 2: Clustering & PCA
Session 5 - Differential expression and gene set analysis
Friday - 09:30 to 17:30
Lecture 1: Differential expression analysis
- Normalization
- Experimental designs
- Generalized linear models
Hands-On 1: Performing differential expression analysis with DESeq2
Lecture 2: Gene set analysis
- A primer on terminology, existing methods & statistical theory
- GO/KEGG overrepresentation analysis
- Functional class scoring & permutation testing
- Network-based enrichment analysis
Hands-On 2: Performing gene set enrichment analysis with EnrichmentBrowser