Product Manager (Lighthouse) at FluidStack
FluidStack · San Francisco, United States of America · Onsite
- Professional
- Office in San Francisco
About FluidStack
Fluidstack is the AI Cloud Platform. We build GPU supercomputers for top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more.
Our team is small, highly motivated, and focused on providing a world-class supercomputing experience. We put our customers first in everything we do, working hard not just to win the sale, but to win repeat business and customer referrals.
We hold ourselves and each other to high standards. We expect you to care deeply about the work you do, the products you build, and the experience our customers have in every interaction with us.
You must work hard, take ownership from inception to delivery, and approach every problem with an open mind and a positive attitude. We value effectiveness, competence, and a growth mindset.
About the Role
We're looking for a Product Manager to lead Lighthouse, our MLOps and observability platform. You'll own the complete product lifecycle—from strategy and roadmap to execution and customer success.
You will work directly with our engineering and infrastructure teams and collaborate closely with customers to ensure that we're providing ML developers with the metrics that matter. You will have the opportunity to partner with top-tier AI labs to increase their utilization and performance, and to scale our infrastructure to hundreds of thousands of GPUs.
Focus
- Build and execute on the roadmap for Lighthouse.
- Partner with engineering to translate customer requirements into technical specifications and guide implementation.
- Create alerting rules for GPU cluster health, job failures, and resource bottlenecks.
- Design dashboards for ML-specific KPIs (training loss curves, inference latency, batch processing metrics).
- Collaborate with sales and customer success teams to drive adoption, gather feedback, and ensure customer satisfaction.
- Engage directly with AI labs and enterprises to understand their observability challenges and shape the product roadmap accordingly.
About You
- 3-5+ years of experience building developer tools or cloud infrastructure, ideally in the observability space. 
- Deep experience with the LGTM stack, Alertmanager, or proprietary observability tools such as Datadog.
- Understanding of the metrics that matter to an AI/ML customer, including infrastructure availability, performance, and utilization, as well as application-level metrics like Model FLOPs Utilization (MFU).
- Understanding of GPU monitoring tools (DCGM, nvidia-smi, GPU exporters for Prometheus). 
- Knowledge of Infrastructure-as-Code (IaC) tools (e.g. Terraform, Pulumi) to standardize and simplify the deployment of the observability stack. 
- Comfortable writing SQL queries. 
- Understanding of SLA and SLO frameworks and error budget management.
- Experience with ML-specific monitoring tools (Weights & Biases, ClearML, etc.). 
Benefits
- Competitive total compensation package (salary + equity). 
- Retirement or pension plan, in line with local norms. 
- Health, dental, and vision insurance. 
- Generous PTO policy, in line with local norms. 