9-13 March 2020
Freie Universität Berlin; Habelschwerdter Allee 45, 14195 Berlin
This course will provide a thorough introduction to the application of metabarcoding techniques in microbial ecology. The topics covered by the course range from bioinformatic processing of next-generation sequencing data to the most important approaches in multivariate statistics. Using a combination of theoretical lectures and hands-on exercises, the participants will learn the most important computational steps of a metabarcoding study from the processing of raw sequencing reads down to the final statistical evaluations. After completing the course, the participants should be able to understand the potential and limitations of metabarcoding techniques as well as to process their own datasets to answer the questions under investigation.
This course is designed for researchers and students with strong interests in applying novel high-throughput DNA sequencing technologies to answer questions in the area of community ecology and biodiversity. The course will mainly focus on the analysis of phylogenetic markers to study bacterial, archaeal and fungal assemblages in the environment, but the theoretical concepts and computational procedures can be equally applied to any taxonomic group or gene of interest.
Participants should have a basic background in biology and understand the central role of DNA in biodiversity studies. No programming or scripting expertise is required, and a basic introduction to UNIX-based command-line applications will be provided on the first day. However, some experience with the command line and/or R is clearly an advantage, as not all the basics can be covered thoroughly in such a short time.
All hands-on exercises will be carried out using the QIIME2 platform (https://qiime2.org/). No previous knowledge of computer science is required, but a basic knowledge of “bash” would allow participants to focus more on the microbial analysis.
- Understanding the concept, potential and limitations of microbial metabarcoding techniques.
- Learning how to process raw sequencing reads to obtain meaningful information.
- Gaining experience in statistically evaluating and visualizing your data.
- Being able to make informed decisions on best practices for your own data.
Monday from 09:30 to 17:30
Lecture 1 – Introduction to NGS in microbial ecology
• Key concepts (metabarcoding, metagenomics, single-cell sequencing) • Sequencing platforms (core concepts, read length, read numbers, error rates) • In-depth example of sequencing with Illumina platforms (over- and under-loading, sequencing process) • Genetic markers for metabarcoding (markers, primer selection & evaluation) • Experimental design (library preparation, replication, multiplexing, coverage, costs) • Understanding data formats (FASTQ, FASTA, others) • Core concepts of a computational pipeline for amplicons • Introduction to the QIIME2 suite
Lab 1 – Introduction to compute lab
• Introduction to the BASH command line (e.g. basic UNIX commands, batch processing) • Check functionality of computational environment with demo data • Checking basic characteristics of datasets (number of reads, read length, read quality)
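As an illustration of the dataset checks listed for Lab 1, here is a minimal Python sketch that counts reads and reports the mean read length and mean base quality of a FASTQ file. It is not part of the course material: the function names `fastq_records` and `summarise` are hypothetical, the demo reads are made up, and the sketch assumes strictly four-line FASTQ records with Phred+33 quality encoding.

```python
import io


def fastq_records(handle):
    """Yield (header, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) ends the iteration
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def summarise(handle, offset=33):
    """Return read count, mean read length and mean per-base Phred score."""
    reads = 0
    bases = 0
    phred_sum = 0
    for _, sequence, quality in fastq_records(handle):
        reads += 1
        bases += len(sequence)
        phred_sum += sum(ord(ch) - offset for ch in quality)
    return {
        "reads": reads,
        "mean_length": bases / reads if reads else 0.0,
        "mean_quality": phred_sum / bases if bases else 0.0,
    }


# Made-up demo data: one high-quality read and one very low-quality read.
demo = "@read1\nACGT\n+\nIIII\n@read2\nACGTAC\n+\n!!!!!!\n"
print(summarise(io.StringIO(demo)))
```

For a real file, the same function could be called on an open file handle, for example `summarise(open("sample.fastq"))`, where the file name is a placeholder.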
Tuesday from 09:30 to 17:30
Lecture 2 – Quality control of NGS reads
• Pre-PCR noise (under-sampling, DNA extraction bias, sample storage, contamination, metadata collection) • PCR-dependent noise (single nucleotide mis-incorporations, PCR chimeras, primer dimers, unspecific amplification, preferential amplification, template concentrations) • Sequencing-dependent noise (filtering/trimming poor base calls, dealing with substitution, insertion/deletion errors, index cross-talk, amplicon carry-over)
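To make the idea of filtering and trimming poor base calls concrete, the sketch below converts Phred scores Q into error probabilities, p = 10^(-Q/10), sums them into the expected number of errors per read, and truncates a read at its first low-quality base. This is only an illustration under stated assumptions (Phred+33 encoding, hypothetical function names, an arbitrary threshold of Q20); tools such as DADA2 and VSEARCH provide their own, more complete filtering options.

```python
import math


def phred_scores(quality, offset=33):
    """Decode an ASCII quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores(quality, offset))


def truncate_at_low_quality(sequence, quality, min_q=20, offset=33):
    """Cut the read just before the first base whose Phred score is below min_q."""
    for i, q in enumerate(phred_scores(quality, offset)):
        if q < min_q:
            return sequence[:i], quality[:i]
    return sequence, quality


# '+' encodes Q10 (error probability 0.1), 'I' encodes Q40 (0.0001).
print(expected_errors("++++"))
print(truncate_at_low_quality("ACGTAC", "IIII+I"))
```

A read would then be kept or discarded by comparing its expected errors against a user-chosen maximum.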
Lecture 3 – Binning into operational taxonomic units (OTUs) vs exact sequence variants (ESVs)
• Core concepts of OTUs and ESVs • OTU binning strategies (de-novo vs. reference-based, impact of alignment strategies, hierarchical clustering algorithms, seed-based clustering algorithms, model-based clustering algorithms) • OTUs versus ESVs
Lab 2 – Sequence quality control and clustering into operational taxonomic units
• Denoising, OTU binning, and ESV calling (e.g. paired-end merging, sequence filtering, dereplication, OTU clustering, chimera removal, target verification)
Tools: DADA2, VSEARCH
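Of the steps listed for Lab 2, dereplication is the simplest to show in a few lines: identical sequences are collapsed into unique sequences with an abundance count, usually sorted by decreasing abundance. The Python sketch below illustrates only the concept. It is not how DADA2 or VSEARCH implement the step, and the function name and example reads are hypothetical.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into (sequence, abundance) pairs.

    Pairs are sorted by decreasing abundance; ties are broken
    alphabetically so that the output order is reproducible.
    """
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up reads for illustration.
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
for sequence, abundance in dereplicate(reads):
    print(sequence, abundance)
```

Working on unique sequences with abundances, rather than on every read, is what keeps the later clustering and chimera-removal steps tractable for large datasets.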
Wednesday from 09:30 to 17:30
Lecture 4 – Taxonomic Classification
• Core concepts of taxonomic classification • Reference databases (INSDC, SILVA, RDP, Greengenes, UNITE) • Classification algorithms (similarity-based, composition-based, phylogeny-based) • Popular assignment approaches (Naïve Bayesian Classifier, BLAST)
Lab 3 – Taxonomic classification
• Finishing Lab 2 if required • Taxonomic classification using Naïve Bayesian Classifiers and VSEARCH taxonomy implemented in QIIME2 • Preparing custom reference databases from NCBI for any genetic marker
Thursday from 09:30 to 17:30
Lecture 5 – Multivariate analysis of ecological communities
• Traits of Alpha and Beta Diversity (richness, evenness, dispersion) • Ordination Techniques (Constrained vs Unconstrained) • Multivariate Tests for differences in microbial community composition
Lab 4 – Multivariate Statistics
• Data import & preparation (normalisations, transformations, metadata) • Alpha Diversity (indices of diversity, rarefaction curves) • Heatmaps to visualise microbial community differences • Unconstrained and Constrained Ordination (PCoA, NMDS, CCA, DCA) • Multivariate tests for differences in community composition (PERMANOVA, PERMDISP) • Taxon-level responses (ANCOM, DESeq2) • Core concept of alpha and beta diversity (indices, distance and dissimilarity metrics) • Unconstrained and constrained ordination techniques • Multivariate tests to infer structural differences • Statistical tests to assess taxon-level responses
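Two of the quantities named above can be written out in a few lines: the Shannon index as a measure of alpha diversity, and the Bray-Curtis dissimilarity as a measure of beta diversity between two samples. The following Python sketch uses made-up count vectors and hypothetical function names; it assumes both samples list the same taxa in the same order and applies no normalisation or rarefaction.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / (sum(a) + sum(b))."""
    numerator = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    denominator = sum(sample_a) + sum(sample_b)
    return numerator / denominator


# Made-up counts for four taxa in two samples.
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]

print(shannon(even_sample))    # a perfectly even sample gives ln(4)
print(shannon(skewed_sample))  # a dominated sample gives a lower value
print(bray_curtis(even_sample, skewed_sample))
```

Both samples have the same richness (four taxa), so the difference in the Shannon index here reflects evenness alone.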
Friday from 09:30 to 17:30
Lecture 6 – GLMs and Mixed Models for Microbiome Data
• Using Traits of Microbiome structure in GLMs and Mixed Models • Model selection for GLMs and (G)LMMs • Combining Microbiome data and life history data
Lab 5 – Mixed Models
• Fitting GLMs and (G)LMMs in R • Model Selection and presentation of results • Plotting effects
Lecture 7 – Quantifying Taxon-level Changes in Abundance
• Absolute vs Relative Abundance • Indicator Species and Community composition • Differential Abundance testing
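The distinction between absolute and relative abundance can be illustrated with a small, made-up example: a taxon whose absolute count stays constant still appears to decline in relative terms when another taxon increases. The Python sketch below uses hypothetical taxon names and counts and is meant only to show why relative data need care in differential abundance testing.

```python
def relative_abundance(counts):
    """Convert absolute counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Made-up absolute counts: taxon_A is unchanged, taxon_B triples.
before = {"taxon_A": 100, "taxon_B": 100}
after = {"taxon_A": 100, "taxon_B": 300}

print(relative_abundance(before)["taxon_A"])  # 0.5
print(relative_abundance(after)["taxon_A"])   # 0.25
```

The apparent halving of taxon_A comes entirely from the change in taxon_B.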
Lab 6 – Taxon-level statistics
• DESeq2 • Indicator Analysis • ANCOM