ML Infrastructure Engineer at Gridmatic

Gridmatic · Cupertino, United States of America · Hybrid

2025-10-10 14:00:00.0

$174,000.00 - $231,000.00

SQL

Kubernetes Container Platforms

Microsoft Excel

Software Engineer

Professional
Office in Cupertino

The Company

Gridmatic Inc. is a high-growth startup with offices in the Bay Area and Houston that is accelerating the clean energy transition by applying our expertise in data, machine learning, and energy to power markets. We are the rare startup that has multiple years of profitability without raising venture capital. At Gridmatic, we foster a collaborative and inclusive culture where learning and growth are constant. We move quickly, solve problems with integrity, and balance environmental responsibility with data-driven excellence.

We are looking for a Machine Learning Infrastructure Engineer to accelerate the decarbonization of the electricity system by building and optimizing the backbone of our ML platform. The ideal candidate will have solid expertise in machine learning, distributed systems, and GPU-based training, and will design scalable, high-performance infrastructure for training, inference, and evaluation. They will push the boundaries of throughput and efficiency on large-scale time-series and weather datasets, while shaping the long-term vision of our ML platform and generalizing solutions for broader use. A successful candidate will thrive on continuous learning across engineering, ML systems, and energy markets, while contributing to a collaborative, mission-driven team. The ideal candidate must have strong deep learning fundamentals in addition to strong software engineering skills.

Own a significant piece of our ML platform while rapidly building and iterating scalable, robust distributed infrastructure for ML training, inference, and evaluation on large-scale time-series and weather datasets.

Optimize throughput and cost by supporting model training and deployment across multiple clusters and clouds.

Improve the efficiency of machine learning models and other workloads by optimizing latency, throughput, and memory consumption. This involves pushing the boundaries of current hardware capabilities through techniques like GPU performance engineering.

Help define the long-term vision for Gridmatic’s ML platform.

Play a key role in mentoring junior engineers and interns, contributing to a collaborative, innovative, and growth-oriented team culture.

You might be a good fit if you are:

A strong engineer with 3+ years of experience who is committed to technical excellence. You possess a deep understanding of the codebases you work in and write readable, scalable code.

Experienced in researching and implementing deep learning models.

Experienced in distributed training and inference of large models on GPU clusters, utilizing core libraries and frameworks such as PyTorch, PyTorch Lightning, and Ray.

Comfortable with large-scale data storage infrastructure and formats, e.g. Zarr, SQL, and feature stores.

A self-starter with a strong sense of independence and ownership, and the capability to engineer large, robust systems from the initial design and conceptualization to productionization.

A mission-driven individual who is enthusiastic about working toward a renewable grid and diving into the intersection of ML and energy. No prior energy experience required, but curiosity and a willingness to learn are must-haves!

Nice to haves:

End-to-end proficiency in building, maintaining, and debugging cluster infrastructure, utilizing Kubernetes and Terraform.

Expertise in identifying performance bottlenecks and designing and writing high-performance code for large-scale ML workloads.

Experience with at least one of: torch.profiler, TorchDynamo, TorchInductor, Triton, or other deep learning compiler stacks.

Knowledge of cluster communication libraries such as NCCL or Gloo.

Experience working with any of the following: weather data, energy systems, time-series forecasting, electricity markets, or financial trading.

Join our team and make a difference! Click below or email us at [email protected].

Apply Now
