
Analyst, Site Reliability Engineer, Site Reliability Engineering

Hyderabad, India · Onsite


Business Function:

Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.

Responsibilities:

·       24/7 System Monitoring & Recovery: Perform continuous system monitoring and execute Standard Operating Procedures (SOPs) for incident detection and recovery to maintain high availability, participating in an on-call rotation to provide 24/7 operational support.

·       Site Reliability Engineering (SRE): Champion and integrate SRE principles into our operational practices and system designs.

·       Service Lifecycle Management: Oversee the deployment, ongoing support, and monitoring of new and existing services, platforms, and application stacks.

·       Incident Reduction & Performance: Proactively improve system monitoring and alerting mechanisms to significantly reduce incident resolution times.

·       Release & Deployment Management: Manage the end-to-end patching, release, and deployment functions, ensuring seamless delivery.

·       Environmental Optimization: Collaborate with engineering and application development teams to enhance system performance through strategic environment upgrades and continuous improvements.

·       Change & Release Automation: Review automation in the change and release areas and provide continuous feedback on it.

Requirements:

·       Experience managing VPC, OpenShift, Kubernetes, Docker, and RHEL.

·       At least 2 years of general experience with DevOps CI/CD tools and their management.

·       Able to work and lead in a dynamic, fast-changing 24/7 support environment, with the right attitude to learn and implement.

·       Solid experience in container image deployment and release management with OpenShift and Kubernetes.

·       Strong automation and scripting skills, with proficiency in Shell, Groovy, and Python.

·       Good knowledge of monitoring tools such as Prometheus, Grafana, and ELK.

·       Background in large-scale system administration and familiarity with SRE principles and release engineering.

·       Advanced Linux system administration skills and advanced skills with configuration management systems.

·       In-depth knowledge of infrastructure areas such as virtual server technologies, networking, firewalls, and internet protocols.
