Machine Learning Pipeline Engineer (Nextflow + Omics)
Location: Remote (U.S. only)
About PreOncology
PreOncology is building next-generation cancer risk models that integrate clinical, genetic, and longitudinal data to enable earlier detection and prevention. We are seeking a Machine Learning Pipeline Engineer to build scalable workflows and deploy ML models that make a real clinical impact.
The Role
You will design and optimize Nextflow pipelines for large-scale genomics and risk-modeling workflows, and integrate machine learning and deep-learning models into production-ready systems. The role is fully remote and ideal for someone who enjoys bridging bioinformatics, data engineering, and ML implementation.
What You’ll Do
• Build and maintain Nextflow pipelines for large-scale genomics and ML workflows.
• Train, tune, and validate ML models (Cox, DeepSurv, RSF, gradient boosting, CNNs).
• Engineer genomic and longitudinal features (PRS, rare variants, trajectories).
• Run workflows on cloud platforms (AWS preferred).
• Deploy reproducible pipelines with Docker or Singularity.
Must-Haves
• 2+ years building production pipelines in Nextflow.
• Strong Python skills for data processing and ML integration.
• Experience with omics data (ideally cancer).
• Demonstrated experience training and validating ML models.
• Authorized to work in the U.S. now and in the future (no sponsorship available).
How to Apply
Email your resume to Luke.Stetson@preoncology.com and include brief answers (1–2 sentences each) to the following:
- The largest Nextflow pipeline you have built.
- Your omics experience.
- The machine learning or deep learning models you have trained and how they were applied.