A Beginner's Guide to RNA-Seq Data Analysis
Quality Control, Read Mapping, Visualization and Downstream Analyses
1 - 5 February 2016
iad Pc-Pool, Rosa-Luxemburg-Straße 23, Leipzig, Germany
Scope and Topics
The purpose of this workshop is to provide a deeper understanding of Next-Generation Sequencing (NGS), with a special focus on bioinformatics issues. In addition, all participants should be able to perform important NGS data analysis tasks themselves afterwards.
The first workshop module is an introduction to data analysis using Linux, ensuring that all participants are able to follow the practical parts. The second module discusses the advantages and disadvantages of current sequencing technologies and their implications for data analysis. The most important NGS file formats (fastq, sam/bam, bigWig, etc.) are introduced, followed by first hands-on analyses (QC, mapping, visualization). You will learn how to read and interpret QC plots, clip adapter sequences and/or trim low-quality read ends, acquire the bioinformatics background of read mapping and understand its problems (dynamic programming, alignment visualization, NGS mapping heuristics, etc.), compute your own mapping statistics and visualize your data in different ways (IGV, UCSC, etc.). The last module addresses a specific application of NGS: RNA-Seq data analysis and the detection of differentially expressed genes.
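To give a flavour of what trimming low-quality read ends means, here is a minimal Python sketch. It is not the course material or the method of any particular tool. It assumes Phred+33 encoded qualities and a simple rule: remove bases from the 3' end until one reaches a threshold. The function names, the threshold of 20 and the example read are hypothetical.

```python
def parse_fastq(lines):
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def trim_low_quality_end(seq, qual, min_phred=20, offset=33):
    """Drop bases from the 3' end until one has at least min_phred quality.

    Assumes Phred+offset encoded quality characters (offset 33 here).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_phred:
        end -= 1
    return seq[:end], qual[:end]


# Made-up record: 'I' encodes Phred 40, '#' encodes Phred 2.
example = [
    "@read1",
    "ACGTACGTAC",
    "+",
    "IIIIIIII##",
]

for name, seq, qual in parse_fastq(example):
    trimmed_seq, trimmed_qual = trim_low_quality_end(seq, qual)
    print(name, trimmed_seq, trimmed_qual)
```

Dedicated trimming tools typically use more robust rules (for example sliding windows or running-sum criteria) and also handle adapter clipping, so this sketch only illustrates the basic idea.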
This workshop has been redesigned and adapted to the needs of beginners in the field of NGS bioinformatics and comprises these three course modules:
- Linux for Bioinformatics: This module introduces the essential tools and file formats required for NGS data analysis. It helps participants overcome the first hurdles of working with this operating system, which is unavoidable for NGS analyses.
- Introduction to NGS data analysis: Different NGS methods will be explained, the most important notation will be introduced, and first analyses will be performed. This course covers essential knowledge for analysing data from many different NGS applications.
- RNA-Seq Data Analyses: RNA-Seq for model organisms
Target Audience
- biologists or data analysts with no or little experience in analyzing RNA-Seq data
Included in the Course
- Course materials
- Conference Dinner
Trainers
- Gero Doose (University of Leipzig) found and published several circularized RNAs in various RNA-Seq experiments. He specialized in split-read analysis some years ago and has strong expertise in downstream analyses.
- Dr. Christian Otto (CCR BioIT) is one of the developers of the split-read mapping tool segemehl and is an expert in implementing efficient algorithms for HTS data analyses.
- Dr. David Langenberger (ecSeq Bioinformatics) started working with small non-coding RNAs in 2006. Since 2009 he has used HTS technologies to investigate these short regulatory RNAs as well as other targets. He has been part of several large HTS projects, for example the International Cancer Genome Consortium (ICGC).
- Dr. Mario Fasold (ecSeq Bioinformatics) has worked on the analysis of microarray data since 2007 and has developed several bioinformatics tools, such as the Bioconductor package AffyRNADegradation and the Larpack program package. Since 2011 he has specialized in HTS data analysis and has helped analyse sequencing data for several large consortium projects.
- Opening Date of Registration: 1 July 2015
- Closing Date of Registration: 15 January 2016
- Workshop: 1 - 5 February 2016 (8 am - 5 pm)
- Location: iad Pc-Pool, Rosa-Luxemburg-Straße 23, Leipzig, Germany
- Language: English
- Available seats: 24 (first-come, first-served)
Registration fee: 1,390 EUR (without VAT)
For the course we will use a full C. elegans genome and a real Illumina MiSeq RNA-Seq sample. We chose C. elegans for the introductory part because read mapping is quite time- and memory-consuming. The genome and the MiSeq run are small enough to be processed on 'normal' machines (4 CPUs and 8 GB RAM) without running out of memory or waiting forever.
BUT: All tools, calls and pipelines can be used in exactly the same way for all other genomes, such as human, bacteria or plants (for some species, some parameters should be adapted).
For the RNA-Seq part, we work with real human data but limit the analysis to one chromosome. Again, the analysis can be done in exactly the same way for all chromosomes; it just takes a bit longer on a weak machine.
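As an illustration of what limiting an analysis to one chromosome can look like, the following Python sketch filters SAM-formatted text so that only alignments to a single reference sequence remain. It is not the workflow used in the course. It relies only on the SAM layout (header lines start with '@', the third tab-separated column holds the reference name). The chromosome name "chr1", the function name and the toy records are hypothetical.

```python
def filter_sam_by_chromosome(lines, chrom):
    """Yield SAM header lines and the alignments mapped to `chrom`.

    @SQ header lines for other reference sequences are dropped so that
    the header stays consistent with the remaining alignments.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            if fields[0] == "@SQ" and f"SN:{chrom}" not in fields:
                continue
            yield line
        elif len(fields) > 2 and fields[2] == chrom:
            # Column 3 (index 2) of an alignment line is the reference name.
            yield line


# Toy SAM content with made-up reads and reference lengths.
sam_lines = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chr2\tLN:2000",
    "r1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr2\t20\t60\t4M\t*\t0\t0\tTTTT\tIIII",
]

kept = list(filter_sam_by_chromosome(sam_lines, "chr1"))
for line in kept:
    print(line)
```

Passing a different reference name, or skipping the filter entirely, processes other chromosomes in the same way. For real, large BAM files an indexed region query with a dedicated tool such as samtools is the more usual route.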