## Course: Analysis of RNA sequencing data with R/Bioconductor

Where: Freie Universität Berlin (Germany)

When: 22-26 June 2020

This course gives biologists and bioinformaticians the practical statistical skills needed for rigorous analysis of RNA-seq data with R and Bioconductor. It assumes basic familiarity with genomics but no prior statistical training. It covers the statistical concepts necessary to design experiments and analyze high-throughput data generated by next-generation sequencing, including exploratory data analysis, principal components analysis, clustering, differential expression, and gene set analysis.

### Session 1 - Introduction

Monday - 09:30 to 17:30

Lecture 1: Data distributions

- Random variables
- Distributions
- Populations and samples

Hands-On 1: Introduction to R

Lecture 2: Creating high-quality graphics in R

- Visualizing data in one, two, and more than two dimensions
- Heatmaps
- Data transformations

Hands-On 2: Graphics with base R and ggplot2

### Session 2 - Hypothesis testing

Tuesday - 09:30 to 17:30

Lecture 1: Hypothesis testing theory

- Type I and type II errors and power
- Multiple hypothesis testing: false discovery rate, family-wise error rate
- Exploratory data analysis (EDA)

Hands-On 1: Standard tests & EDA

Lecture 2: Hypothesis testing in practice

- Hypothesis tests for categorical variables (chi-square, Fisher's exact)
- Monte Carlo simulation
- Permutation tests

Hands-On 2: Permutation tests

### Session 3 - Bioconductor

Wednesday - 09:30 to 17:30

Lecture 1: Introduction to Bioconductor

- Incorporating Bioconductor in your data analysis
- ExpressionSet / SummarizedExperiment
- Annotation resources

Hands-On 1: Leveraging Bioconductor annotation resources

Lecture 2: Genomic intervals

- Introduction to genomic region algebra
- Basic operations: construction, intra- and inter-region operations
- Finding overlaps

Hands-On 2: Solving common bioinformatic challenges with GenomicRanges

### Session 4 - Next-generation sequencing data

Thursday - 09:30 to 17:30

Lecture 1: High-throughput count data

- Characteristics of count data
- Exploring count data
- Modeling count data

Hands-On 1: Analyzing next-generation sequencing data

Lecture 2: Clustering and Principal Components Analysis

- Measures of similarity
- Hierarchical clustering
- Dimension reduction
- Principal components analysis (PCA)

Hands-On 2: Clustering & PCA

### Session 5 - Differential expression and gene set analysis

Friday - 09:30 to 17:30

Lecture 1: Differential expression analysis

- Normalization
- Experimental designs
- Generalized linear models

Hands-On 1: Performing differential expression analysis with DESeq2

Lecture 2: Gene set analysis

- A primer on terminology, existing methods & statistical theory
- GO/KEGG overrepresentation analysis
- Functional class scoring & permutation testing
- Network-based enrichment analysis

Hands-On 2: Performing gene set enrichment analysis with EnrichmentBrowser