GENOMIC DATA VISUALISATION AND MANIPULATION USING PYTHON
This course is being delivered by Dr Martin Jones, an expert in Python and author of two textbooks:
Python for Biologists [http://www.amazon.com/Python-Biologists-complete-programming-beginners/dp/1492346136/]
and Advanced Python for Biologists [http://www.amazon.com/Advanced-Python-Biologists-Martin-Jones/dp/1495244377/].
COURSE OVERVIEW: One of the strengths of the Python language is the availability of mature, high-quality libraries for working with scientific data. Integration between the most popular libraries has led to the concept of a “scientific Python stack”: a collection of packages which are designed to work well together. In this workshop we will see how to use these libraries to work efficiently with large volumes of data and to visualize them.
INTENDED AUDIENCE: This workshop is aimed at researchers and technical workers with a background in biology and a basic knowledge of Python (if you’ve taken the Introductory Python course then you have the Python knowledge; if you’re not sure whether you know enough Python to benefit from this course then just drop us an email).
TEACHING FORMAT: The workshop is delivered over nine half-day sessions. Each session consists of a roughly one-hour lecture followed by two hours of practical exercises, with breaks at the organiser’s discretion. Each session uses examples and exercises that build on material from the previous one, so it’s important that students attend all sessions. The last session will be kept free for students to work on their own datasets with the assistance of the instructor.
ASSUMED COMPUTER BACKGROUND: Students should have some basic Python experience (the Introduction to Python course will fulfil this requirement). Students should be familiar with the use of lists, loops, functions and conditions in Python and have written at least a few small programs from scratch.
Curriculum:
Day 1
Module 1: Introduction and datasets
Jupyter (formerly IPython) is a programming environment that is rapidly becoming the de facto standard for scientific data analysis. In this session we’ll learn why Jupyter is so useful, covering its ability to mix notes and code, to render inline plots, charts and tables, to use custom styles and to create polished web pages. We’ll also take a look at the datasets that we’ll be investigating during the course and discuss the different types of data we encounter in bioinformatics work.
Module 2: Introduction to pandas
In this session we introduce the first part of the scientific Python stack: the pandas data manipulation package. We’ll learn about DataFrames (the core data structure that much of the rest of the course will rely on) and how they allow us to quickly select, sort, filter and summarize large datasets. We’ll also see how to extend existing DataFrames by writing functions to create new columns, as well as how to deal with common problems like missing or inconsistent values in datasets. We’ll get our first look at data visualization by using pandas’ built-in plotting ability to investigate basic properties of our datasets.
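To give a flavour of the operations named above, here is a minimal sketch. It is not taken from the course materials: the table, its column names and the GC threshold are all made up for illustration.

```python
import pandas as pd

# Hypothetical toy table of genes; names and values are invented for this example.
genes = pd.DataFrame({
    "gene": ["geneA", "geneB", "geneC", "geneD"],
    "chromosome": ["1", "1", "2", "X"],
    "length": [1200, 450, None, 3100],
    "gc_content": [0.41, 0.55, 0.48, 0.39],
})

# Deal with a missing value: fill the missing length with the column median.
genes["length"] = genes["length"].fillna(genes["length"].median())

# Extend the DataFrame: create a new column by applying a function to an existing one.
# The 0.45 cutoff is an arbitrary choice for the example.
def classify_gc(gc):
    return "high" if gc >= 0.45 else "low"

genes["gc_class"] = genes["gc_content"].apply(classify_gc)

# Filter rows on a condition, then sort the result.
long_genes = genes[genes["length"] > 1000].sort_values("length", ascending=False)
print(long_genes[["gene", "length", "gc_class"]])
```

In a Jupyter notebook, calling something like genes["length"].plot(kind="hist") would then give a first look at the distribution using pandas’ built-in plotting.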
Day 2
Module 3: Grouping and pivoting with pandas
This session continues our look at pandas with advanced uses of DataFrames that allow us to answer more complicated questions. We’ll look at two very powerful tools: grouping, which allows us to aggregate information in datasets, and pivoting/stacking, which allows us to flexibly rearrange data (a key step in preparing datasets for visualization). In this session we’ll also go into more detail about pandas’ indexing system.
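As a rough illustration of grouping and pivoting (a sketch with invented data, not an excerpt from the course), the following aggregates replicate measurements and then rearranges them into a wide table. The column names are assumptions.

```python
import pandas as pd

# Hypothetical long-format expression measurements (made-up values).
expr = pd.DataFrame({
    "gene": ["geneA", "geneA", "geneA", "geneB", "geneB", "geneB"],
    "tissue": ["leaf", "leaf", "root", "leaf", "root", "root"],
    "expression": [10.0, 14.0, 3.0, 7.0, 20.0, 24.0],
})

# Grouping: aggregate replicates into one mean value per gene/tissue pair.
mean_expr = expr.groupby(["gene", "tissue"])["expression"].mean().reset_index()

# Pivoting: rearrange into a gene-by-tissue table, a shape that suits heatmaps.
wide = mean_expr.pivot(index="gene", columns="tissue", values="expression")
print(wide)
```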
Module 4: Advanced manipulation with pandas
In this final session on the pandas library we’ll look at a few common types of data manipulation: binning data (very useful for working with time series), carrying out principal component analysis, and creating networks. We’ll also cover some features of pandas designed for working with specific types of data like timestamps and ordered categories.
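Binning and ordered categories can be sketched together with pandas.cut, as below. This is only an illustrative example under assumed data: the lengths, bin edges and labels are arbitrary.

```python
import pandas as pd

# Hypothetical sequence lengths (made-up values).
lengths = pd.Series([120, 480, 950, 1500, 2600, 700, 300])

# Bin the continuous values into labelled intervals.
# With the default right=True the bins are (0, 500], (500, 1000], (1000, 5000].
size_class = pd.cut(
    lengths,
    bins=[0, 500, 1000, 5000],
    labels=["short", "medium", "long"],
)

# pd.cut returns an ordered categorical, so sorting the counts by index
# follows the category order (short < medium < long), not the alphabet.
counts = size_class.value_counts().sort_index()
print(counts)
```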
Day 3
Module 5: Introduction to seaborn
This session introduces the seaborn charting library by showing how we can use it to investigate relationships between different variables in our datasets. Initially we concentrate on showing distributions with histograms, scatter plots and regressions, as well as a few more exotic chart types like hexbins and KDE plots. We also cover heatmaps, in particular looking at how they lend themselves to displaying the type of aggregate data that we can generate with pandas.
Module 6: Categories in seaborn
This session is devoted to seaborn’s primary use case: visualizing relationships across multiple categories in complex datasets. We see how we can use colour and shape to distinguish categories in single plots, and how these features work together with the pandas tools we have already seen to allow us to very quickly explore a dataset. We continue by using seaborn to build small multiple or facet plots, separating categories by rows and columns. Finally, we look at chart types that are designed to show distributions across categories: box and violin plots, and the more exotic swarm and strip plots.
Day 4
Module 7: Customization with seaborn
For the final session on seaborn, we go over some common types of customization that can be tricky. To achieve very fine control over the style and layout of our plots, we’ll learn how to work directly with axes and chart objects to implement things like custom heatmap labels, log axis scales, and sorted categories.
Module 8: Matplotlib
In the final teaching session, we look at the library that both pandas and seaborn rely on for their charting tools: matplotlib. We’ll see how by using matplotlib directly we can do things that would be impossible in pandas or seaborn, such as adding custom annotations to our charts. We’ll also look at using matplotlib to build completely new, custom visualizations by combining primitive shapes.
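A small sketch of what working with matplotlib directly can look like: one custom annotation and one primitive shape added to a plain line chart. The data, label text and coordinates are invented for the example and are not from the course.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# Hypothetical read coverage along a short region (made-up values).
positions = [1, 2, 3, 4, 5]
coverage = [12, 15, 40, 14, 11]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(positions, coverage, marker="o")

# Custom annotation: an arrow and label pointing at the highest point.
peak_index = coverage.index(max(coverage))
ax.annotate(
    "coverage peak",
    xy=(positions[peak_index], coverage[peak_index]),
    xytext=(4, 35),
    arrowprops={"arrowstyle": "->"},
)

# Primitive shape: a translucent rectangle shading the region around the peak.
ax.add_patch(Rectangle((2.5, 0), 1.0, 45, alpha=0.2))

ax.set_xlabel("position")
ax.set_ylabel("coverage")
```

In a Jupyter notebook the matplotlib.use("Agg") line can be dropped and the figure will render inline.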
Day 5
Module 9: Data workshop
The session on the final day is set aside for a data workshop. Students can practice applying the tools they’ve learned to their own datasets with the help of an instructor, or continue to work on exercises from the previous day. There may also be time for some demonstrations of topics of particular interest, such as interactive visualization tools and animations.
Please send inquiries to firstname.lastname@example.org or visit the website www.prinformatics.com
Please feel free to distribute this information anywhere you think suitable.
PR INFORMATICS Upcoming courses - email for details email@example.com
INTRODUCTION TO BIOINFORMATICS USING LINUX #IBUL 16th – 20th October, Scotland, Dr. Martin Jones http://www.prstatistics.com/course/introduction-to-bioinformatics-using-linux-ibul02/
INTRODUCTION TO PYTHON FOR BIOLOGISTS #IPYB 27th Nov – 1st Dec, Wales, Dr. Martin Jones http://www.prinformatics.com/course/introduction-to-python-for-biologists-ipyb04/
INTRODUCTION REMOTE SENSING AND GIS APPLICATIONS FOR ECOLOGISTS #IRMS 27th Nov – 1st Dec, Wales, Dr Duccio Rocchini, Dr. Luca Delucchi http://www.prstatistics.com/course/introduction-to-remote-sensing-and-gis-for-ecological-applications-irms01/
GENOMIC DATA VISUALISATION AND MANIPULATION USING PYTHON #DVMP 11th – 15th December 2017, Wales, Dr. Martin Jones http://www.prinformatics.com/course/data-visualisation-and-manipulation-using-python-dvmp01/
EUKARYOTIC METABARCODING 23rd – 27th July 2018, Scotland, Dr. Owen Wangensteen http://www.prinformatics.com/course/eukaryotic-metabarcoding-eukb01/
CODING, DATA MANAGEMENT AND SHINY APPLICATIONS USING RSTUDIO FOR EVOLUTIONARY BIOLOGISTS AND ECOLOGISTS #CDSR Dr. Aline Quadros
BIOINFORMATICS FOR GENETICISTS AND BIOLOGISTS #BIGB Scotland, Dr. Nic Blouin, Dr. Ian Misner
PR STATISTICS Upcoming courses - email for details firstname.lastname@example.org
ECOLOGICAL NICHE MODELLING USING R #ENMR 16th – 20th October 2017, Scotland, Dr. Neftali Sillero http://www.prstatistics.com/course/ecological-niche-modelling-using-r-enmr01/
REPRODUCIBLE DATA SCIENCE FOR POPULATION GENETICS #RDPG 23rd – 27th October 2017, Wales, Dr. Thibaut Jombart, Zhian Kavar https://www.prstatistics.com/course/reproducible-data-science-for-population-genetics-rdpg01/
STRUCTURAL EQUATION MODELLING FOR ECOLOGISTS AND EVOLUTIONARY BIOLOGISTS USING R #SEMR 23rd – 27th October 2017, Wales, Prof Jarrett Byrnes, Dr. Jon Lefcheck http://www.prstatistics.com/course/structural-equation-modelling-for-ecologists-and-evolutionary-biologists-semr01/
LANDSCAPE (POPULATION) GENETIC DATA ANALYSIS USING R #LNDG 6th – 10th November 2017, Wales, Prof. Rodney Dyer http://www.prstatistics.com/course/landscape-genetic-data-analysis-using-r-lndg02/
APPLIED BAYESIAN MODELLING FOR ECOLOGISTS AND EPIDEMIOLOGISTS #ABME 20th - 25th November 2017, Scotland, Dr. Matt Denwood http://www.prstatistics.com/course/applied-bayesian-modelling-ecologists-epidemiologists-abme03/
ADVANCING IN STATISTICAL MODELLING USING R #ADVR 11th – 15th December 2017, Wales, Dr. Luc Bussiere, Dr. Tom Houslay, Dr. Ane Timenes Laugen http://www.prstatistics.com/course/advancing-statistical-modelling-using-r-advr07/
INTRODUCTION TO BAYESIAN HIERARCHICAL MODELLING #IBHM 29th Jan – 2nd Feb 2018, Scotland, Dr. Andrew Parnell http://www.prstatistics.com/course/introduction-to-bayesian-hierarchical-modelling-using-r-ibhm02/
PHYLOGENETIC DATA ANALYSIS USING R (TBC) #PHYL 28th Jan – Feb 2nd 2018 Dr. Emmanuel Paradis – Date and location to be confirmed https://www.prstatistics.com/course/introduction-to-phylogenetic-analysis-with-r-phyg-phyl02/
ANIMAL MOVEMENT ECOLOGY #ANME 19th – 23rd February 2018, Wales, Dr Luca Borger, Dr. John Fieberg
GEOMETRIC MORPHOMETRICS USING R #GMMR 19th – 23rd February 2018, Scotland, Prof. Dean Adams, Prof. Michael Collyer, Dr. Antigoni Kaliontzopoulou http://www.prstatistics.com/course/geometric-morphometrics-using-r-gmmr01/
FUNCTIONAL ECOLOGY FROM ORGANISM TO ECOSYSTEM: THEORY AND COMPUTATION #FEER 5th – 9th March 2018, Scotland, Dr. Francesco de Bello, Dr. Lars Götzenberger, Dr. Carlos Carmona http://www.prstatistics.com/course/functional-ecology-from-organism-to-ecosystem-theory-and-computation-feer01/
SPATIAL PRIORITIZATION USING MARXAN 5th - 9th March 2018, Wales, Jennifer McGowan https://www.prstatistics.com/course/introduction-to-marxan-mrxn01/
ECOLOGICAL NICHE MODELLING USING R #ENMR 12th - 16th March 2018, Scotland, Dr. Neftali Sillero http://www.prstatistics.com/course/ecological-niche-modelling-using-r-enmr02/
MULTIVARIATE ANALYSIS OF ECOLOGICAL COMMUNITIES USING THE VEGAN PACKAGE #VGNR 23rd – 27th April 2018, Scotland, Dr. Peter Solymos, Dr. Guillaume Blanchet https://www.prstatistics.com/course/multivariate-analysis-of-ecological-communities-in-r-with-the-vegan-package-vgnr01/
NETWORK ANALYSIS FOR ECOLOGISTS USING R #NTWA 9th – 13th April 2018, Scotland, Dr. Marco Scotti https://www.prstatistics.com/course/network-analysis-ecologists-ntwa02/
QUANTITATIVE GEOGRAPHIC ECOLOGY: MODELING GENOMES, NICHES, AND COMMUNITIES 30th April – 4th May 2018, Scotland, Dr. Dan Warren, Dr. Matt Fitzpatrick
INTRODUCTION TO MIXED MODELS FOR ECOLOGISTS #IMMR 28th May – 1st June 2018, Canada, Prof Subhash Lele, Dr. Guillaume Blanchet
MODEL BASE MULTIVARIATE ANALYSIS OF ABUNDANCE DATA USING R #MBMV0 8th – 12th July 2018, Prof David Warton https://www.prstatistics.com/course/model-base-multivariate-analysis-of-abundance-data-using-r-mbmv02/
ADVANCES IN MULTIVARIATE ANALYSIS OF SPATIAL ECOLOGICAL DATA USING R #MVSP Prof. Pierre Legendre, Dr. Olivier Gauthier - Date and location to be confirmed
STABLE ISOTOPE MIXING MODELS USING SIAR, SIBER AND MIXSIAR #SIMM Dr. Andrew Parnell, Dr. Andrew Jackson – Date and location to be confirmed
MULTIVARIATE ANALYSIS OF SPATIAL ECOLOGICAL DATA #MASE Prof. Subhash Lele, Dr. Peter Solymos - Date and location to be confirmed
TIME SERIES MODELS FOR ECOLOGISTS USING R (JUNE 2017) #TSME Dr. Andrew Parnell - Date and location to be confirmed
META-ANALYSIS IN ECOLOGY, EVOLUTION AND ENVIRONMENTAL SCIENCES #METR01 Prof. Jason Matthiopoulos – Date and location to be confirmed
META-ANALYSIS IN ECOLOGY, EVOLUTION AND ENVIRONMENTAL SCIENCES #METR0 Prof. Julia Koricheva, Prof. Elena Kulinskaya – Date and location to be confirmed
Oliver Hooker PhD. PR statistics
3/1 128 Brunswick Street Glasgow G1 1TF
+44 (0) 7966500340