Job: Genomic Database Engineer, sema4genomics, Stamford, CT, USA

6.5 years ago

christine.fulton • 0

Sema4, a health information company, is seeking talented, self-motivated individuals to join its Bioinformatics R&D department and take part in leading-edge work in big data analysis, clinical diagnostics, and translational bioinformatics. Successful applicants will be part of an interdisciplinary team that develops computational databases and methods to annotate and interpret large-scale human genome and exome sequencing data, with the goal of better understanding cancer mutations and the genetics of Mendelian and complex diseases. They will also play a role in developing systems that integrate novel informatics and genomic tools and methodologies into clinical practice.

Responsibilities:

Build and maintain comprehensive variant databases from a wide variety of public repositories.
Identify new data sources and databases from the literature.
Build and maintain a comprehensive variant store from over 100,000 exomes.
Assist bioinformatics scientists to integrate different types of genetic, functional, and clinical data to discover causal variants and genes for cardiovascular diseases, Alzheimer's disease, cancer, and other genetic diseases.

Requirements:

Must have a strong genomic research background.
Extensive experience with RDBMS, SQL programming (especially schema design), and ETL processes.
Strong coding proficiency in Python, R, and Perl programming languages in a Linux environment.
Hands-on experience building biomedical databases from public repositories, such as UniProt, dbSNP, MEDLINE, GTEx, 1000 Genomes, UK10K, ClinVar, and COSMIC.
Domain knowledge in genetics and genomics, especially data representation and conventions for exchanging information about genetic variants.
Hands-on experience working with NGS and genotyping tools and data/file formats, especially VCF.
2 years of post-graduate experience in the above categories.

Desirable experience:

Experience with Hadoop (Impala/Parquet, Spark/Shark) and programming in Java/Scala.
Experience with clinical genetic testing is a plus.
Developing codebases using distributed version control tools (especially Git or Mercurial) and software issue tracking systems (especially JIRA).
Deploying jobs/pipelines on a high-performance Linux computing cluster.

Location - Stamford, CT

Contact: Christine.fulton@sema4genomics.com

Database Big-Data-Analytics Genomics Cancer • 2.4k views

Updated 11 months ago by Ram 43k • written 6.5 years ago by christine.fulton • 0