Software Engineer Intern, Machine Learning at DatologyAI

DatologyAI · Redwood City, United States of America · Hybrid

About the Company

Companies want to train their own large models on their own data. The current industry standard is to train on a random sample of that data, which is inefficient at best and actively harmful to model quality at worst. There is compelling research showing that smarter data selection can train better models faster; we know because we did much of this research. Given the high cost of training, this presents a huge market opportunity. We founded DatologyAI to translate this research into tools that help enterprise customers identify the right data to train on, yielding better models at lower cost. Our team has pioneered deep learning data research, built startups, and created tools for enterprise ML. For more details, check out our recent blog posts sharing our high-level results for text models and image-text models.

We've raised over $57M in funding from top investors like Radical Ventures, Amplify Partners, Felicis, Microsoft, Amazon, and notable angels like Jeff Dean, Geoff Hinton, Yann LeCun and Elad Gil. We're rapidly scaling our team and computing resources to revolutionize data curation across modalities.

This role is based in Redwood City, CA. We are in the office four days a week.

About the Role

We’re looking for a Software Engineer Intern to join our Machine Learning team at DatologyAI. You’ll work alongside experienced engineers and researchers to design, prototype, and deploy the ML systems that power our data curation platform. Our work sits at the intersection of large-scale infrastructure, machine learning research, and data quality — building systems that make model training faster, cheaper, and smarter.

As an intern, you’ll take on impactful projects that touch real production workflows, from scaling data pipelines to implementing experimental research ideas. You’ll gain hands-on experience building ML systems in a fast-paced, deeply technical environment while learning how world-class teams approach AI infrastructure.

What You'll Work On

  • Build and improve components of our ML training and data curation pipelines

  • Prototype and evaluate algorithms that identify informative data samples at scale

  • Work with researchers to bring new data selection and model evaluation techniques into production

  • Contribute to reliable and efficient distributed ML systems

  • Learn how to take an idea from research to real-world deployment

About You

  • Pursuing a BS, MS, or PhD in Computer Science, Electrical Engineering, or a related field

  • Strong programming skills in Python and familiarity with libraries like PyTorch, TensorFlow, or JAX

  • Solid understanding of data structures, algorithms, and ML fundamentals

  • Curious about large-scale training systems, data curation, and the infrastructure behind AI models

  • Eager to learn from experienced engineers and contribute to production-quality code

  • Collaborative, detail-oriented, and driven by curiosity

Compensation

This is a paid internship with a standard monthly stipend. If you are not currently located in the Bay Area, we provide a relocation stipend to help cover travel and living expenses during your three months on-site.

We offer a comprehensive benefits package to support our employees' well-being and professional growth:

  • 100% covered health benefits (medical, vision, and dental).

  • 401(k) plan with a generous 4% company match.

  • Unlimited PTO policy.

  • Annual $2,000 wellness stipend.

  • Annual $1,000 learning and development stipend.

  • Daily lunches and snacks provided in our office.

  • Relocation assistance for employees moving to the Bay Area.
