Duke University's Center for Genomic and Computational Biology (GCB) is seeking a Scientific Applications Programmer to design, create, and deploy informatics solutions, including front-end interfaces, applications, databases, and other informatics tools that enable the Center’s scientists to work effectively with the diverse and massive amounts of data generated or utilized by its research, education, and core facility operations.
GCB was launched in July 2014 with the mission to foster innovative multi-disciplinary data-intensive research and education in the fields of genomic and computational biology. The Center employs state-of-the-art high-performance computing, networking, and storage infrastructure at a massive scale, and its faculty collaborates in teams that integrate approaches from different disciplines to solve complex problems not easily addressable within traditional departments.
The incumbent will design, create, and enhance front-end interfaces, workflows, databases, and other informatics tools to enable more efficient and reproducible capture, tracking, moving, distribution, sharing, integration, and analysis of a variety of genomic, genetic, phenotypic, clinical, and related data ranging in volume from small to tens of terabytes. The incumbent will frequently collaborate with others in the GCB Informatics team, and with Center scientists and Core Facility staff. The Center’s Informatics group is actively involved in initiatives to promote data and software skills as well as best practices for more productive and more reproducible computational genomics science. The incumbent will have opportunities to put these into practice in collaboration with researchers from GCB’s labs and core facilities. The incumbent will also participate in identifying, evaluating, and recommending new and emerging technologies to continually improve the data management, integration, querying, and analysis capabilities of the Center.
Specific responsibilities and activities include the following.
- Work with Center scientists and Core Facility staff to identify, document, and refine requirements for working with genomic and other data effectively, scalably, and reproducibly.
- Design and create front-ends to tools, APIs, and compute infrastructure that are highly usable and that empower end-users to increasingly self-service their compute, data distribution, and sharing needs.
- Create tools that enable scalable and reproducible management, tracking, distribution, sharing, analysis, and archival of data, including tools that efficiently move data between data stores and high-performance computing environments.
- Design and implement data models, and deploy corresponding data stores that best meet users’ needs, including relational, key/value, document, and graph data stores.
- Participate in projects that evaluate, recommend, and adopt emerging technologies and best practices for improving the data informatics capabilities of the Center, including identifying and recommending candidate technologies.
The position reports to the Center’s Director of Informatics. GCB’s Informatics group has a strong commitment to open source and open science (see our GitHub organization for ongoing projects), and software developed by the group will be released as open source wherever possible.
We are looking for someone who is passionate about applying their software engineering skills to empower scientists to do more and better science, who derives energy from working in a team, and who is curious to explore, acquire, and share new skills and information science technology know-how. Specific qualifications include the following:
- Demonstrated ability to gather requirements from users and to translate these into technical software requirements and specifications.
- Experience with and strong knowledge of programming data-centered tools, interfaces, and workflows in languages frequently used in scientific computing, ideally in Python, Perl, Ruby, or Scala.
- Knowledge of implementing and querying relational data stores, in particular PostgreSQL (or MySQL, Oracle).
- Experience with developing and deploying software tools for Unix, in particular Linux.
- Demonstrated ability to work independently as well as in teams, and to collaborate and communicate effectively with diverse groups of people ranging from technical IT staff to academic researchers and students.
In addition to the above, some combination of the following is desirable:
- B.S. degree in Information Science, Computer Science, Information Technology, Bioinformatics, or a related field, and at least 3 years of relevant professional experience.
- Experience with NoSQL data stores (such as CouchDB, MongoDB, or Redis), graph databases, or RDF triple stores (such as Neo4j, OpenLink Virtuoso, or Blazegraph®).
- Experience in developing tools or data stores for biological big data, in particular genomic, genetic, next-generation sequencing, and other large-volume data.
- Experience with implementing data management, processing, and analysis workflows and tools for massively parallel or distributed execution on high-performance computational infrastructure.
- Experience contributing to open-source and collaborative software projects, and working with distributed version control (in particular Git).
How to Apply: Please submit your application at http://www.hr.duke.edu/jobs/, requisition #400959866. For inquiries about the position, please contact Hilmar Lapp, Director of Informatics (GCB), at email@example.com.