Analyst, Site Reliability Engineer, Site Reliability Engineering
Hyderabad, India · Onsite
- Professional
- Optional office in Hyderabad
Business Function:
Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.
Responsibilities:
· 24/7 System Monitoring & Recovery: Perform continuous system monitoring and execute Standard Operating Procedures (SOPs) for incident detection and recovery to maintain high availability, participating in an on-call rotation to provide 24/7 operational support.
· Site Reliability Engineering (SRE): Champion and integrate SRE principles into our operational practices and system designs.
· Service Lifecycle Management: Oversee the deployment, ongoing support, and monitoring of new and existing services, platforms, and application stacks.
· Incident Reduction & Performance: Proactively improve system monitoring and alerting mechanisms to significantly reduce incident resolution times.
· Release & Deployment Management: Manage the end-to-end patching, release, and deployment functions, ensuring seamless delivery.
· Environmental Optimization: Collaborate with engineering and application development teams to enhance system performance through strategic environment upgrades and continuous improvements.
· Review and provide continuous feedback on automation in the change and release areas.
Requirements:
· Experience in managing VPC, OpenShift, Kubernetes, Docker, RHEL.
· At least 2 years of general experience with DevOps CI/CD tools and their management.
· Able to work and lead in a dynamic, changing environment with 24/7 support, with the right attitude to learn and implement.
· Solid experience in container image deployment and release management with OpenShift and Kubernetes.
· Must have strong automation and scripting skills: proficiency in Shell, Groovy, and Python.
· Good knowledge of monitoring tools: Prometheus, Grafana, and ELK.
· Background in large-scale system administration and familiarity with SRE principles and Release Engineering.
· Advanced Linux system administration skills and advanced skills with configuration management systems.
· In-depth knowledge of infrastructure areas such as virtual server technologies, networking, firewalls, and internet protocols.