Senior Site Reliability Engineer - Observability (x/f/m) at Doctolib
Doctolib · Berlin, Germany · Remote
Your Impact
What you'll do
- Lead the observability strategy across the platform, with an emphasis on building scalable, developer-friendly logging and tracing capabilities
- Identify and lead large-scale cross-cutting reliability initiatives, including improvements to our incident detection, response, and postmortem analysis capabilities
- Take part in the on-call rotation, and actively contribute to improving our on-call experience by refining alerting, reducing noise, and ensuring actionable telemetry
Who you are
- Have solid hands-on experience (3+ years) on a large-scale production platform
- Have proven experience with cloud platforms such as AWS, Azure or Google Cloud
- Have a solid understanding of containerization and orchestration technologies (Docker and Kubernetes)
- Have a strong understanding of Helm for managing Kubernetes manifests and ArgoCD for GitOps workflows
- Have deep expertise in observability tooling and architecture, such as:
  - Logging: Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Logstash, Vector
  - Tracing: OpenTelemetry or proprietary APMs
  - Metrics: Prometheus, Thanos, Datadog, or equivalent
- Have proficiency in at least one programming language (Ruby, Python, Go, Java, etc.) and a deep understanding of infrastructure as code principles
- Have experience with monitoring and observability tools
- Like troubleshooting performance issues in complex environments
- Are fluent in English
- Have experience contributing to open-source observability projects
- Have worked in a high-growth tech environment
- Are passionate about developer experience and platform engineering
Life at Doctolib Tech
- Our solutions are built on a single fully cloud-native platform that supports web and mobile app interfaces, multiple languages, and is adapted to country and healthcare specialty requirements.
- Our stack is composed of Rails, TypeScript, Java, Python, Kotlin, Swift, and React Native.
- We leverage AI ethically across our products to empower patients and health professionals.
What we offer
- A Deutschlandticket (Germany-wide public transport pass) fully paid for by Doctolib
- 28 vacation days + 1 additional day for each full calendar year of employment (up to a maximum of 30 days)
- Work from abroad for up to 10 days per year thanks to our flexibility days policy
- Company health insurance with great supplementary benefits through our partner Allianz
- Company pension scheme (bAV) through Allianz with an employer subsidy of 40% (15% within the probationary period)
- The Doctolib Parent Care program, which includes one additional month of parental leave and much more
- Enrollment in Doctolib's long-term employee value sharing plan called DoctoGrowth
- Free mental health and coaching services through our partner Moka.care
- Subsidized sports membership through our partner Urban Sports Club
- A flexible workplace policy offering both hybrid and office-based modes
- A subsidized meal benefit, alongside healthy snacks and our regular breakfast buffet
- For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
- Relocation support in case of international mobility
- Access to the best AI tools for coding and development, plus dedicated training
Our interview process
- Recruiter Interview
- Technical SRE Interview
- System Design Interview
- Behavioral Interview
- At least one reference check
Job details
- Permanent position
- Tech stack: Kubernetes, Prometheus, OpenTelemetry, Loki, ArgoCD, Ruby, Python, Go
- Full-time
- Berlin, Germany
- Hybrid work setup (up to 2 remote days per week)
- Start date: as soon as possible