- Professional
- Optional office in Hyderabad
Your role
As a Senior SRE, you will be responsible for ensuring the availability, performance, and resilience of our platforms. You’ll collaborate with engineering, infrastructure, and security teams to build systems that are secure, observable, and automated, while championing a culture of operational excellence.
Key Responsibilities
• Design, implement, and maintain highly available and fault-tolerant systems in a financial environment.
• Define and monitor Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to ensure system reliability and customer satisfaction.
• Identify, measure, and reduce toil, proactively eliminating repetitive manual tasks through automation.
• Lead incident response, post-mortems, and root cause analysis for production issues.
• Collaborate with development teams to embed reliability into the software development lifecycle.
• Integrate systems and services with observability platforms (e.g., Prometheus, Grafana, ELK, Datadog) to ensure end-to-end visibility.
Join us
From gaining new experiences in different roles to acquiring fresh knowledge and skills, we know that great work is never done alone. We know that it's our people, with their unique backgrounds, skills, experience levels and interests, who drive our ongoing success. Together we’re more than ourselves. Ready to be part of #teamUBS and make an impact?
Your team
We value transparency, shared responsibility, and continuous learning. You’ll work alongside talented engineers who are passionate about building reliable systems and solving complex problems.
Your expertise
✅ Essential Experience & Skills
• Proven expertise in Site Reliability Engineering, with a background in software engineering, infrastructure, or operations.
• Hands-on experience with cloud platforms (e.g., Azure), operating systems (e.g., Linux RHEL 7+), and networking fundamentals.
• Solid understanding of networking and storage technologies (e.g. NFS, SAN, NAS).
• Strong working knowledge of authentication and naming services (e.g. DNS, LDAP, Kerberos, Centrify).
• Proficiency in scripting and automation (e.g., Python, Go, Bash).
• Practical experience with infrastructure as code tools (e.g., Terraform, Ansible).
• Demonstrated ability to define and manage SLIs, SLOs, and SLAs, and to systematically reduce toil.
• Ability to integrate systems with observability platforms to ensure end-to-end visibility.
• A metrics- and automation-driven mindset, with a strong focus on measurable reliability.
• Calm under pressure, especially during incidents and outages, with a structured approach to incident response and post-mortems.
• Strong collaboration and communication skills, with the ability to work across engineering and business teams.
• A proactive, ownership-driven attitude, always seeking opportunities to improve systems and processes.
✨ Desirable Additions
• Experience with chaos engineering, resilience testing, or disaster recovery planning.
• Familiarity with financial transaction systems, real-time data pipelines, or core banking platforms.
• An understanding of CI/CD pipelines, containerization, and orchestration (e.g., Kubernetes, AKS).
About us
We have a presence in all major financial centers in more than 50 countries.