Applied Statistics and Bioinformatics with R and Bioconductor
Dates
22-26 January 2018
Where
Berlin
Overview
This course will provide biologists and bioinformaticians with practical statistical and data analysis skills to perform rigorous analysis of high-throughput biological data. The course assumes some familiarity with genomics and with R programming, but does not assume prior statistical training. It covers the statistical concepts necessary to design experiments and analyze high-dimensional data generated by high-throughput sequencing, including exploratory data analysis, principal components analysis, unsupervised clustering, batch effects, linear modeling for differential expression, and gene set analysis.
Preparation
Come to the first class with the following installed:
● R and Bioconductor: www.bioconductor.org/install
● RStudio: https://www.rstudio.com/products/rstudio/download3/
● GitHub Desktop client (or any other GitHub client): https://desktop.github.com/
Additionally, please create an account at www.github.com, and use it to introduce yourself at https://github.com/waldronlab/AppStatTrento/issues.
Labs
Each day will include a hands-on lab session, which students should attempt and hand in before the following day by committing to the course GitHub repository. A selection of labs will be reviewed the following day.
Program
Monday 22nd – Classes from 09:30 to 17:30
Session 1 – Introduction
Lecture 1: Data distributions
● random variables
● distributions
Lecture 2: Statistical inference and sampling
● populations and samples
● Central Limit Theorem
● t-distribution
Lab 1: Introduction to R and Bioconductor
Lab 2: Creating graphics
Tuesday 23rd – Classes from 09:30 to 17:30
Session 2 – Hypothesis testing
Lecture 1: Hypothesis testing concepts
● type I and type II errors, and power
● confidence intervals
● multiple hypothesis testing: false discovery rate, familywise error rate
Lecture 2: Hypothesis testing in practice
● hypothesis tests for categorical variables (chi-square, Fisher's exact)
● Monte Carlo simulation
● permutation tests
● bootstrap simulation
● exploratory data analysis
Lab: bootstrap simulation and permutation tests
Wednesday 24th – Classes from 09:30 to 17:30
Session 3 – Linear modeling
Lecture 1: Linear modeling
● linear regression and multiple regression
● model matrix and model formulae
Lecture 2: generalized linear models for count data
● intro to generalized linear models
● logistic regression and log-linear models
● Poisson and Negative Binomial error models
● Zero-inflated models
Lab: RNA-seq differential expression workflow
Thursday 25th – Classes from 09:30 to 17:30
Session 4 – Unsupervised methods
Lecture 1: Distances and PCA
● distance in high dimensions
● singular value decomposition
● principal components analysis and multidimensional scaling
Lecture 2: unsupervised clustering
● unsupervised clustering
● batch effects
Lab 1: applications of unsupervised methods to shotgun metagenomics microbiome data analysis
Lab 2: option to work on students’ own data.
Friday 26th – Classes from 09:30 to 17:30
Session 5 – Gene set and multi-omics data analysis
Lecture 1: Gene set enrichment analysis
● background on gene set testing
● types and interpretations of gene set tests
● advantages and pitfalls of gene set testing
Lab 1: gene set analysis with applications to gene expression and multi-omics experiments
Lab 2: multi-omics data analysis
Lab 3: option to work on students’ own data.
For more information about the course, please visit our website.