Site Reliability Engineer at Albatross

Albatross · Germany · Remote

2026-01-24 05:53:42.0

Kubernetes Container Platforms

DevOps Automation Tools

Description

Location
Remote, with the right to work and travel in Europe.

Albatross
At Albatross, we’re building the second pillar of AI: a perception layer that understands how users actually experience content, in real time. Trained on live user interactions, Albatross learns and reasons on the fly. Our technology powers in-session discovery by adapting to evolving user interests as they change. We have raised significant funding, and our platform already operates at scale, with billions of events processed and hundreds of millions of predictions served.

The Role
We’re looking for a Site Reliability Engineer to own the reliability and observability of our platform. This is a hands-on leadership role in which you’ll design, build, and maintain our observability stack, lead incident response, oversee releases, and establish the processes and standards that allow the team to ship quickly and confidently. More specifically, you will:

Observability & Monitoring:
Own and evolve our observability stack (Prometheus, Grafana, Loki, Jaeger), including dashboards, alerts, and SLOs.
Instrument services for meaningful metrics and tracing, reducing noise and improving signal.
Reliability & Incident Response:
Lead incident response and establish blameless postmortems, runbooks, and automated remediation.
Define, track, and improve SLIs/SLOs to proactively reduce reliability risk.
Release Management:
Own the release process end-to-end, improving deployment speed, safety, and recovery.
Implement progressive rollouts, feature flags, and rollback strategies.
Platform & Tooling:
Embed observability into the development lifecycle in close collaboration with engineering.
Maintain and evolve our Kubernetes-based platform, adopting new tools when they add real value.

Requirements

5–7+ years in SRE, platform engineering, DevOps, or similar roles.
Strong production experience with Kubernetes and modern observability stacks (Prometheus, Grafana, Loki, Jaeger/OpenTelemetry).
Proven track record of leading incident response and building monitoring systems that teams actually use.
Deep distributed systems knowledge and production debugging experience.
Pragmatic approach to tooling and alerting that teams trust.
Clear communicator across engineering, product, and leadership.
STEM degree (Computer Science, Engineering, Mathematics, or similar).
Plus: contributions to open-source observability projects and background in high-scale or high-availability environments.

Benefits

Remote-first, async-friendly culture.
Ownership and autonomy: you'll shape how we do reliability.
A team that cares about building things right.
