Sema4, a health information company, is seeking talented, self-motivated individuals to join its Bioinformatics R&D department and take part in leading-edge work in big data analysis and clinical diagnostics in translational bioinformatics. Successful applicants will be part of an interdisciplinary team that develops computational databases and methods to annotate and interpret large-scale human genome and exome sequencing data, with the aim of better understanding cancer mutations and the genetics of Mendelian and complex diseases. They will also help develop systems for integrating novel informatics and genomic tools and methodologies into clinical practice.
· Build and maintain comprehensive variant databases from a wide variety of public repositories.
· Identify new data sources and databases from the literature.
· Build and maintain a comprehensive variant store from over 100,000 exomes.
· Assist bioinformatics scientists in integrating different types of genetic, functional, and clinical data to discover causal variants and genes for cardiovascular diseases, Alzheimer’s disease, cancer, and other genetic diseases.
· Must have a strong background in genomic research.
· Extensive experience with RDBMS, SQL programming (especially schema design), and ETL processes.
· Strong coding proficiency in Python, R, and Perl in a Linux environment.
· Hands-on experience building biomedical databases from public repositories such as UniProt, dbSNP, MEDLINE, GTEx, 1000 Genomes, UK10K, ClinVar, and COSMIC.
· Domain knowledge in genetics and genomics, especially data representation and conventions for exchanging information about genetic variants.
· Hands-on experience working with NGS and genotyping tools and data/file formats, especially VCF.
· Two years of post-graduate experience in the above areas.
· Experience with Hadoop (Impala/Parquet, Spark/Shark) and programming in Java/Scala.
· Experience with clinical genetic testing is a plus.
· Experience developing codebases with distributed version control tools (especially Git or Mercurial) and software issue tracking systems (especially JIRA).
· Experience deploying jobs and pipelines on a high-performance Linux computing cluster.
Location: Stamford, CT
Contact: Christine.email@example.com