SRE & MLOps Engineer (Platform Reliability & AI Operations) at Blue Machines

Blue Machines · Bengaluru, India · Hybrid

About Blue Machines

Blue Machines powers large-scale, real-time Voice AI and Agentic Workflows across BFSI, Healthcare, HRTech, and Global Enterprises.

Role: SRE & MLOps Engineer (2–5 Years Experience)

Location: Bangalore (Hybrid)

What You Will Own

1. Platform Uptime & Reliability

- Maintain 99.9%+ uptime.

- Monitor and optimize latency for voice agents.

2. Observability, Monitoring & Incident Response

- Build and maintain monitoring dashboards.

- Configure alerts and act as first responder for incidents.

3. MLOps & Model Provider Reliability

- Monitor STT/TTS/LLM providers.

- Manage failovers and latency SLAs.

4. Kubernetes & Infrastructure

- Manage GKE clusters, autoscaling, and deployments.

5. Internal Platform Tooling

- Build automation around scaling, canaries, and logs.

6. Security & Compliance

- Enforce encryption and network policies; support audits.

Requirements

You Are a Great Fit If You…

- Have 2–5 years of SRE/DevOps/MLOps experience.

- Are strong with Kubernetes, Prometheus, ELK, Redis, and Pub/Sub.

- Understand streaming, SIP, and WebSockets.

- Communicate well and take ownership of incidents.

Preferred Skills

- Experience with LLM pipelines, telephony, GPUs, and GCP.

Why Blue Machines

- Build India’s most advanced Voice AI platform.

- High-scale, low-latency engineering.

- Work with the CTO’s office on reliability.
