Question

Job: Lead Engineer for Site Reliability and Automation - University of Chicago

8.8 years ago

Danielle ▴ 310

We're looking for a problem solver with a highly technical background to work closely with our development and system infrastructure teams to build out and refine the automation methods for our large-scale, data-intensive systems. You will join the team as the primary engineer leading this work, and will soon have an opportunity to build out your group as we continue to grow. Elevate your career with this opportunity to work with a number of automation tools across the full stack and use the latest technologies. You will join a team of innovative engineers and intelligent research scientists who will keep you challenged in our demanding environment.

This role focuses on the Genomic Data Commons, which by its nature lies at the intersection of cutting-edge research and production systems, both in terms of the bioinformatics and the computer science principles being utilized. The Genomic Data Commons is one of the world's largest collections of harmonized cancer genomics data. Developing a deep technical and quantitative understanding of the system, software, and security architecture will be critical to success in this role.

You will focus on system availability, performance, and capacity monitoring, along with installation, configuration, and operations procedures. You will be given broadly defined goals and expected to work collaboratively across functional teams to determine best methods for achieving objectives. You will be expected to use quantitative models for understanding and improving the overall performance of the system. You will identify, establish, and manage proof of concept environments and report on design outcomes to inform rapid technology advancement.

Key responsibilities:

Automation Frameworks - Build out and maintain automation frameworks across systems, software, data management, and security aspects of a complex platform across on-premise and public cloud environments with a mix of best practices and custom solutions

Production Support - Triage, research, communicate, and address production incidents

Production Monitoring - Wrangle disparate system monitoring assets and develop common analytics to inform optimization, define benchmarks and confidence intervals, and forecast to proactively mitigate production incidents

Build Monitoring - Troubleshoot source code management and deployment issues and participate in continuous delivery objectives

Security Automation - Assist with the automation of our security and compliance procedures.

Technical Writing - Contribute written knowledge and expertise to system documentation, security documentation, scientific manuscripts, reporting, grant proposals and reports, and presentation materials.

Stay abreast of broad technical knowledge of existing and emerging technologies, including public cloud offerings from Amazon Web Services, Microsoft Azure, and Google Cloud.

Qualifications

Required

Master's degree in computer science, mathematics, statistics, engineering, or a quantitative field required.
Minimum of two (2) years' experience in designing and developing infrastructure, configuration, and/or deployment automation at large scale and high complexity required. Hands-on scripting experience (Bash, Python, or other dynamic language) required.
Unix/Linux programming or system administration experience required.

Preferred

PhD in computer science, mathematics, statistics, engineering, or a relevant quantitative field preferred.
Experience with AWS (EC2/S3/Glacier) preferred.
Internal cloud (OpenStack) experience preferred.
Experience with a configuration management utility (Chef, Puppet, Ansible) preferred.
Experience with F5 or other load balancing technologies preferred.
Experience with source control and build systems (SVN, Git, Jenkins, etc.) preferred.
Experience with virtualization (Hyper-V, Docker) preferred.
Experience with log aggregation tools (ELK stack, Splunk) preferred.
Experience with security frameworks (FISMA, NIST, FIPS) preferred.
Experience leading in an agile environment preferred.
Experience overseeing and/or mentoring early career engineers preferred.

About the Genomic Data Commons

The Genomic Data Commons (GDC) is a comprehensive computational facility to centralize and harmonize cancer genomic data generated from NCI-funded programs. The GDC is the foundation for a genomic precision medicine platform and will enable the development of a knowledge system for cancer. The GDC will provide an open-source, scalable, modern informatics framework that uses community standards to make raw and processed genomic data broadly accessible. This will enable previously infeasible collaborative efforts between scientists.

About the Center for Data Intensive Science

The Center for Data Intensive Science at the University of Chicago is developing the emerging field of data science with a focus on applications to problems in biology, medicine, and health care. Our vision is a world in which researchers have ready access to the data and tools required to make discoveries that lead to deeper understanding and improved quality of life. We democratize access, speed discovery, create new knowledge, and foster innovation through implementation using data at scale. Our scientific data clouds and commons include the Genomic Data Commons, Bionimbus Protected Data Cloud, and Open Science Data Cloud.

Apply under Requisition #101252 at jobopportunities.uchicago.edu

system-analytics system-automation devops • 2.5k views

updated 2.3 years ago by Ram 45k • written 8.8 years ago by Danielle ▴ 310