- Senior
 - Office in Hyderabad
 
We are looking for a self-driven SRE engineer with a software engineering mindset to:
- Drive new shift-left activities that apply Site Reliability Engineering (SRE) and quality assurance principles within the application design and project roadmap, enabling resilient outcomes.
 - Apply a pre-emptive approach in production to minimize business impact, using SRE-driven orchestration that connects all components of the ecosystem, diagnoses anomalies before users are affected, and remediates them through automation.
 
This role is a critical enabler for achieving high resiliency during operations and for continuous improvement through design during the software development lifecycle.
The Lead SRE design & support engineer is an integral part of the global team. The role's main purpose is to provide a delightful customer experience for users of the global consumer, commercial, supply chain and enablement functions in the PepsiCo digital products application portfolio of 260+ applications, enabling a full SRE practice incident prevention / proactive resolution model.
The scope of this role is focussed on cloud-architected full stack application development, B2B pepsiconnect, Direct to Customer and other S&T roadmap applications.
The role ensures that PepsiCo DPA applications deliver the service performance, reliability and availability expected by our customers and internal groups.
It requires a blend of technical expertise in SRE tools, modern application cloud architecture (i.e. full stack), IT operations experience, and analytics and influencing skills.
Responsibilities:
- Ensure ecosystem availability and performance in production environments, proactively preventing P1, P2 and potential P3 incidents.
 - Engage and influence product and engineering teams during the design and development phases to embed reliability and operability into new services, defining and enforcing event, logging, monitoring, and observability standards across applications.
 - Be accountable for ensuring that non-functional requirements (NFRs), including SLA/SLO/SLI and error budgets, are embedded early into the product offerings as part of the engineering solution.
 - Lead the team in diagnosing anomalies before users are affected and drive the necessary remediations across the teams involved in the end-to-end availability, performance and consumption of the cloud-architected application ecosystem, leveraging SRE orchestration solutions.
 - Collaborate with engineering and support teams, including participation in escalations and blameless postmortems.
 - Work closely with customer-facing support teams to empower them with SRE insights and tooling.
 - Observe, diagnose and improve the end-to-end ecosystem performance of the modern architected application portfolio, which requires a technical understanding of the interactions within a full stack application, working alongside peer SRE team members.
 - Continuously optimize the L2/support operations work via SRE workflow automation
 - Shape the SRE orchestration platform design with inputs from Production Operations, Business usage & Product and engineering teams.
 - Actively engage and drive AI Ops adoption across teams
 
- 9-13 years of work experience evolving into an SRE engineer role, with 3-5 years of experience in continuously improving and transforming IT operations ways of working
 - Bachelor’s degree in Computer Science, Information Technology or a related field
 - Proven experience as an SRE in designing event diagnostics, performance measures and alert solutions to meet SLAs/SLOs/SLIs.
 - The ideal engineer will be highly quantitative, have great judgment, be able to connect the dots across ecosystems, and work efficiently across teams to ensure SRE orchestration solutions meet customer/end-user expectations.
 - The candidate will take a pragmatic approach to resolving incidents, including the ability to systematically triangulate root causes and work effectively with external and internal teams to meet objectives.
 - Strong expertise in SRE (Site Reliability Engineering) and IT Service Management (ITSM) processes, with a track record of improving service offerings: proactively resolving incidents, providing a seamless customer/end-user experience, and identifying and mitigating areas of risk.
 - Hands-on experience in Python, SQL/NoSQL (MySQL, MongoDB, Cassandra, PostgreSQL), AppDynamics, ELK Stack, Grafana, Splunk, Dynatrace, Kafka and any SRE Ops toolsets.
 - A firm understanding of cloud architecture for distributed environments.
 - Front-end technologies: HTML, CSS, JavaScript, and frameworks like React, Angular, or Vue.js.
 - Back-end technologies: server-side languages and frameworks (Java, Spring Boot, and related technologies) that build the server-side logic, APIs, and database interaction with MySQL, MongoDB, Cassandra, Couchbase.
 - Infrastructure: Azure/AWS cloud platforms and/or client/server environments.
 - Prior experience in shaping transformation by developing SRE solutions would be a plus.