SRE & MLOps Engineer (Platform Reliability & AI Operations) at Blue Machines
Blue Machines · Bengaluru, India · Hybrid
- Professional
- Office in Bengaluru
About Blue Machines
Blue Machines powers large-scale, real-time Voice AI and Agentic Workflows across BFSI,
Healthcare, HRTech, and Global Enterprises.
Role: SRE & MLOps Engineer (2–5 Years Experience)
Location: Bangalore (Hybrid)
What You Will Own
1. Platform Uptime & Reliability
- Maintain 99.9%+ uptime.
- Monitor and optimize latency for voice agents.
2. Observability, Monitoring & Incident Response
- Build and maintain monitoring dashboards.
- Configure alerts and act as first responder for incidents.
3. MLOps & Model Provider Reliability
- Monitor speech-to-text (STT), text-to-speech (TTS), and LLM providers.
- Manage failovers and latency SLAs.
4. Kubernetes & Infrastructure
- Manage GKE clusters, autoscaling, and deployments.
5. Internal Platform Tooling
- Build automation around scaling, canaries, and logs.
6. Security & Compliance
- Enforce encryption and network policies, and support audits.
Requirements
You Are a Great Fit If You…
- Have 2–5 years of SRE/DevOps/MLOps experience.
- Are strong with Kubernetes, Prometheus, ELK, Redis, and Pub/Sub.
- Understand streaming, SIP, and WebSockets.
- Communicate well and take ownership of incidents.
Preferred Skills
- Experience with LLM pipelines, telephony, GPUs, and GCP.
Why Blue Machines
- Build India’s most advanced Voice AI platform.
- High-scale, low-latency engineering.
- Work with the CTO’s office on reliability.