Machine Learning Scientist - Medicinal Chemistry & Lead Optimization at Flagship Pioneering, Inc.
Flagship Pioneering, Inc. · Cambridge, United States · Onsite
- Senior
- Optional office in Cambridge
🚀 About Lila
Lila Sciences is the world’s first scientific superintelligence platform and autonomous lab for life sciences, chemistry, and materials science. We are pioneering a new age of boundless discovery by building the capabilities to apply AI to every aspect of the scientific method. We are introducing scientific superintelligence to solve humankind’s greatest challenges, enabling scientists to bring forth solutions in human health, climate, and sustainability at a pace and scale never experienced before. Learn more about this mission at www.lila.ai
If this sounds like an environment you’d love to work in, even if you only have some of the experience listed below, we encourage you to apply.
🌟 Your Impact at Lila
Join our Drug Discovery group to build and deploy ligand-based AI that turns noisy, real-world assay data into decisive design guidance for hit-to-lead and lead optimization. You’ll create QSAR models, retrosynthesis-aware generative design tools, and active-learning loops, working alongside medicinal chemists to deliver better compounds, faster. This role complements our structure-based docking team by focusing on assay-driven, synthesis-constrained optimization, even when structures are uncertain or unavailable, with the goal of accelerating design–make–test–analyze (DMTA) cycles and improving candidate quality.
🛠️ What You'll Be Building
- Ligand-based QSAR modeling: Develop multi-task and transfer-learned models for potency, selectivity, and developability (e.g., solubility, permeability, clearance, CYP/hERG, safety liabilities) using graph/message-passing networks and conformer-aware features. Handle activity cliffs, define applicability domains, and calibrate predictions for robust decision-making.
- Assay-driven hit triage and prioritization: Build models that learn from HTS, DEL, and follow-up assays, supported by robust curve fitting (4PL/5PL), plate/batch-effect correction, dose–response QC, and time-split/scaffold-split evaluations that ensure prospective reliability.
- Closed-loop DMTA and MPO: Create active learning and Bayesian optimization strategies to propose the next best analogs under multi-parameter objectives (potency, selectivity, exposure, safety, IP). Incorporate uncertainty, diversity, and experimental cost to maximize information gain per cycle.
- Synthesis-aware design and retrosynthesis: Integrate template-based and template-free retrosynthesis with reaction prediction, condition and yield modeling, building-block availability, and cost/time/risk scoring. Make design suggestions that are directly makeable and prioritize routes compatible with internal/partner capabilities.
- Generative and enumerative libraries: Build BRICS/RECAP/fragment-linking enumerations and property-conditioned generative models (diffusion/RL/flow) that respect synthetic constraints and matched molecular pair (MMP) rules for local SAR exploration and scaffold hopping.
- SAR mining and explainability: Automate MMP analysis, local SAR maps, and substructure attributions to surface chemist-actionable insights; link assay deltas to specific modifications and highlight potential bioisosteres and de-risking moves.
- Data foundations: Establish cheminformatics pipelines for standardization (tautomer/salt/charge), deduplication, structure normalization, and assay/ELN/LIMS ingestion; define ontologies and metadata for traceability and reproducibility.
- Rigorous evaluation and deployment: Design leakage-safe splits (scaffold, temporal, series-aware), apply conformal prediction for calibrated decisions, and run prospective tests. Ship APIs and tools that integrate with medchem workflows, procurement, and automated synthesis.
- Cross-functional partnership: Work closely with medicinal chemists, DMPK, biology, and automation to translate target product profiles (TPPs) into modeling objectives and to operationalize model recommendations in real make–test cycles. Collaborate with structure-based colleagues to fuse physics- and assay-derived signals where beneficial.
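To make "conformal prediction for calibrated decisions" concrete, here is a minimal split-conformal sketch in plain Python. It is illustrative only, not Lila's implementation. It assumes a regression model (for example, one predicting pIC50) whose predictions on a held-out calibration set are already available, and it uses absolute residuals as the nonconformity score. The function names and the "whole interval clears a cutoff" decision rule are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple


def conformal_threshold(cal_true: Sequence[float],
                        cal_pred: Sequence[float],
                        alpha: float = 0.1) -> float:
    """Split-conformal half-width from a held-out calibration set.

    Nonconformity score = |y - y_hat|. The threshold is the
    ceil((n + 1) * (1 - alpha))-th smallest score, which gives marginal
    coverage of at least 1 - alpha when calibration and new compounds are
    exchangeable (an assumption that scaffold or temporal shift can break).
    """
    if len(cal_true) != len(cal_pred) or len(cal_true) == 0:
        raise ValueError("calibration arrays must be non-empty and equal length")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return scores[rank - 1]


def prediction_interval(y_hat: float, q: float) -> Tuple[float, float]:
    """Symmetric conformal interval around a point prediction."""
    return (y_hat - q, y_hat + q)


def confidently_above(preds: Sequence[float], q: float,
                      cutoff: float) -> List[int]:
    """Indices of compounds whose entire interval clears a potency cutoff."""
    return [i for i, y_hat in enumerate(preds) if y_hat - q >= cutoff]
```

For example, with seven calibration compounds all predicted at 6.0 and observed at 6.25, 5.5, 6.125, 7.0, 5.75, 6.5, and 6.0, `alpha=0.25` gives a half-width of 0.5, so a new prediction of 7.0 becomes the interval (6.5, 7.5). Prioritizing on the interval's lower bound rather than the point estimate is one simple way to let uncertainty drive which analogs get made.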
🧰 What You’ll Need to Succeed
- Strong proficiency in Python and modern ML (PyTorch/JAX/TF, scikit-learn, XGBoost/CatBoost), with experience training at scale and deploying end-to-end pipelines.
- Deep experience in ligand-based modeling (QSAR/QSPR, multi-task learning, uncertainty and applicability domain, calibration) and ADMET prediction for medicinal chemistry.
- Solid grasp of medicinal chemistry principles: SAR development, bioisosteres, property tuning (pKa/logD/PSA), selectivity design, and liability mitigation (CYP, hERG, reactivity, permeability, solubility).
- Cheminformatics and data tooling: RDKit, Chemprop/DeepChem, conformer generation, fingerprints/descriptors, ELN/LIMS integration, and assay data processing/curve-fitting.
- Retrosynthesis and synthesis planning: Familiarity with template-based/template-free methods, route scoring, reaction/yield/condition prediction, building block catalogs, and makeability constraints.
- Active learning and design-of-experiments: Bayesian optimization, diversity sampling, and portfolio-aware selection under experimental and synthesis budgets.
- Ability to design rigorous, leakage-controlled benchmarks and prospective validations; experience with scaffold/time splits and activity-cliff-aware evaluation.
- Strong self-starter with excellent attention to detail and clear communication; able to collaborate tightly with chemists and biologists.
- Demonstrated industry experience or academic achievement.
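The scaffold and time splits named above can be sketched in a few lines of standard-library Python. This is a hedged illustration, not a prescribed benchmark protocol. It assumes each compound already has a scaffold key (for instance a Bemis–Murcko scaffold SMILES, which RDKit's `MurckoScaffold` module can produce) and an ISO-8601 registration date. The function names, the largest-groups-first assignment heuristic, and the cutoff-date rule are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def scaffold_split(scaffolds: Sequence[str], train_frac: float = 0.8
                   ) -> Tuple[List[int], List[int]]:
    """Group-wise split so that no scaffold appears in both train and test.

    `scaffolds[i]` is a precomputed scaffold key for compound i. Groups are
    visited largest first (ties broken by key for determinism); a group goes
    to train if it fits in the remaining budget, otherwise to test, so rarer
    chemotypes tend to end up in the test set.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    budget = train_frac * len(scaffolds)
    train: List[int] = []
    test: List[int] = []
    for _, members in ordered:
        if len(train) + len(members) <= budget:
            train.extend(members)
        else:
            test.extend(members)
    return sorted(train), sorted(test)


def time_split(dates: Sequence[str], cutoff: str
               ) -> Tuple[List[int], List[int]]:
    """Train on compounds registered before `cutoff`, test on the rest.

    Dates are ISO-8601 strings ("YYYY-MM-DD"), which sort chronologically
    when compared as plain strings.
    """
    train = [i for i, d in enumerate(dates) if d < cutoff]
    test = [i for i, d in enumerate(dates) if d >= cutoff]
    return train, test
```

Because whole scaffold groups move together, a model cannot score well simply by memorizing close analogs of its test compounds. The time split mimics the prospective setting, where a model trained on past series must predict compounds that have not been made yet. A series-aware split could follow the same grouping pattern with a series label as the key.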
✨ Bonus Points For
- PhD in Chemoinformatics, Medicinal Chemistry, Computational Chemistry, Computer Science, or related field with a strong publication record in ML/drug discovery venues.
- Experience building synthesis-aware generative models and integrating retrosynthesis into design loops; familiarity with tools like ASKCOS/AiZynth-style planners or equivalent.
- Track record improving DMTA cycle time and MPO outcomes in live programs; integration with procurement and automated synthesis platforms.
- Expertise with MMPA, activity-cliff handling, conformal prediction, and applicability-domain diagnostics in production.
- Experience triaging HTS/DEL data, PAINS/aggregator/covalent liability filters, and off-target/polypharmacology prediction.
- MLOps for cheminformatics: data versioning, experiment tracking, model serving/monitoring, and cloud/HPC scaling.
🌈 We’re All In
Lila Sciences is committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or veteran status.
🤝 A Note to Agencies
Lila Sciences does not accept unsolicited resumes from any source other than candidates. The submission of unsolicited resumes by recruitment or staffing agencies to Lila Sciences or its employees is strictly prohibited unless the agency has been contacted directly by Lila Sciences’ internal Talent Acquisition team. Any resume submitted by an agency in the absence of a signed agreement will automatically become the property of Lila Sciences, and Lila Sciences will not owe any referral or other fees with respect thereto.