Applied Statistics and Bioinformatics with R and Bioconductor
Dates
22-26 January 2018
Where
Berlin
Overview
This course will provide biologists and bioinformaticians with practical statistical and data analysis skills to perform rigorous analysis of high-throughput biological data. The course assumes some familiarity with genomics and with R programming, but does not assume prior statistical training. It covers the statistical concepts necessary to design experiments and analyze high-dimensional data generated by high-throughput sequencing, including exploratory data analysis, principal components analysis, unsupervised clustering, batch effects, linear modeling for differential expression, and gene set analysis.
Preparation
Come to the first class with the following installed:
- R and Bioconductor: www.bioconductor.org/install
- RStudio: https://www.rstudio.com/products/rstudio/download3/
- GitHub desktop client (or any other Git client): https://desktop.github.com/
Additionally, please create an account at www.github.com, and use it to introduce yourself at https://github.com/waldronlab/AppStatTrento/issues.
Labs
Each day will include a hands-on lab session that students should attempt and hand in before the following day by committing to the course GitHub repository. A selection of labs will be reviewed the following day.
Program
Monday 22nd - Classes from 09:30 to 17:30
Session 1 - Introduction
Lecture 1: Data distributions
- random variables
- distributions
Lecture 2: Statistical inference and sampling
- populations and samples
- Central Limit Theorem
- t-distribution
Lab 1: Introduction to R and Bioconductor
Lab 2: Creating graphics
Tuesday 23rd - Classes from 09:30 to 17:30
Session 2 - Hypothesis testing
Lecture 1: hypothesis testing concepts
- type I and II error and power
- confidence intervals
- multiple hypothesis testing: false discovery rate, familywise error rate
Lecture 2: hypothesis testing in practice
- hypothesis tests for categorical variables (chi-square, Fisher's exact)
- Monte Carlo simulation
- permutation tests
- bootstrap simulation
- exploratory data analysis
Lab: bootstrap simulation and permutation tests
Wednesday 24th - Classes from 09:30 to 17:30
Session 3 - Linear modeling
Lecture 1: linear modeling
- linear regression and multiple regression
- model matrix and model formulae
Lecture 2: generalized linear models for count data
- intro to generalized linear models
- logistic regression and log-linear models
- Poisson and Negative Binomial error models
- Zero-inflated models
Lab: RNA-seq differential expression workflow
Thursday 25th - Classes from 09:30 to 17:30
Session 4 - Unsupervised methods
Lecture 1: distances and PCA
- distance in high dimensions
- singular value decomposition
- principal components analysis and multidimensional scaling
Lecture 2: unsupervised clustering
- unsupervised clustering
- batch effects
Lab 1: applications of unsupervised methods to shotgun metagenomics microbiome data analysis
Lab 2: option to work on students' own data.
Friday 26th - Classes from 09:30 to 17:30
Session 5 - Gene set and multi-omics data analysis
Lecture 1: gene set enrichment analysis
- background on gene set testing
- types and interpretations of gene set tests
- advantages and pitfalls of gene set testing
Lab 1: gene set analysis with applications to gene expression and multi-omics experiments
Lab 2: multi-omics data analysis
Lab 3: option to work on students' own data.
For more information about the course, please visit our WEBSITE