Applied Statistics and Bioinformatics with R and Bioconductor
22-26 January 2018, Berlin (Germany)
https://www.physalia-courses.org/courses-workshops/course19/
Instructors
Dr. Levi Waldron
Dr. Ludwig Geistlinger
Waldron lab for computational biostatistics, CUNY School of Public Health, New York City (http://waldronlab.org/)
Overview
This course will provide biologists and bioinformaticians with practical statistical and data analysis skills to perform rigorous analysis of high-throughput biological data. The course assumes some familiarity with genomics and with R programming, but does not assume prior statistical training. It covers the statistical concepts necessary to design experiments and analyze high-dimensional data generated by high-throughput sequencing, including exploratory data analysis, principal components analysis, unsupervised clustering, batch effects, linear modeling for differential expression, and gene set analysis.
Labs
Each day will include a hands-on lab session that students should attempt and hand in before the following day by committing to the course GitHub repository. A selection of labs will be reviewed the following day.
Program
Monday 22nd - Classes from 09:30 to 17:30
Session 1 - Introduction
Lecture 1: Data distributions
- random variables
- distributions
Lecture 2: Statistical inference and sampling
- populations and samples
- Central Limit Theorem
- t-distribution
Lab 1: Introduction to R and Bioconductor
Lab 2: Creating graphics
Tuesday 23rd - Classes from 09:30 to 17:30
Session 2 - Hypothesis testing
Lecture 1: hypothesis testing concepts
- type I and II error and power
- confidence intervals
- multiple hypothesis testing: false discovery rate, familywise error rate
Lecture 2: hypothesis testing in practice
- hypothesis tests for categorical variables (chi-square, Fisher's exact)
- Monte Carlo simulation
- permutation tests
- bootstrap simulation
- exploratory data analysis
Lab: bootstrap simulation and permutation tests
Wednesday 24th - Classes from 09:30 to 17:30
Session 3 - Linear modeling
Lecture 1: linear modeling
- linear regression and multiple regression
- model matrix and model formulae
Lecture 2: generalized linear models for count data
- intro to generalized linear models
- logistic regression and log-linear models
- Poisson and Negative Binomial error models
- Zero-inflated models
Lab: RNA-seq differential expression workflow
Thursday 25th - Classes from 09:30 to 17:30
Session 4 - Unsupervised methods
Lecture 1: distances and PCA
- distance in high dimensions
- singular value decomposition
- principal components analysis and multidimensional scaling
Lecture 2: unsupervised clustering
- unsupervised clustering
- batch effects
Lab 1: applications of unsupervised methods to shotgun metagenomics microbiome data analysis
Lab 2: option to work on students' own data.
Friday 26th - Classes from 09:30 to 17:30
Session 5 - Gene set and multi-omics data analysis
Lecture 1: gene set enrichment analysis
- background on gene set testing
- types and interpretations of gene set tests
- advantages and pitfalls of gene set testing
Lab 1: gene set analysis with applications to gene expression and multi-omics experiments
Lab 2: multi-omics data analysis
Lab 3: option to work on students' own data.
Packages available
- Course-only: includes course material and refreshments (530 euros; VAT incl.)
- All-inclusive: includes course material, refreshments, meals (breakfast, lunch and dinner), accommodation (795 euros; VAT incl.)
Registration deadline: December 20th, 2017.