Application Sustain & Operations Engineer at PepsiCo
PepsiCo · Plano, United States of America · Onsite
- Professional
- Optional office in Plano
This role is responsible for ensuring the overall stability of production applications, including the reliability, availability, scalability, and efficiency of our production systems and platforms. The Operations Engineer will collaborate with cross-functional teams (including Software Engineering, Service Reliability, Infrastructure, and Business Operations) to streamline processes, manage day-to-day operations, monitor system health, and quickly resolve incidents.
The ideal candidate must be skilled in problem-solving, process automation, and root cause analysis, with a passion for operational excellence and continuous improvement.
System Reliability & Availability:
- Ensure production systems, applications, and infrastructure are reliable, performant, and available within agreed SLAs/OLAs.
Incident & Problem Management:
- Lead troubleshooting of critical incidents and drive timely resolution as part of Incident Management. Ensure that Root Cause Analysis is performed and help coordinate the implementation of permanent fixes in a timely manner.
- Analyze priority incidents to generate insights and identify gaps in the alerting mechanisms.
- Analyze market-specific issues and conduct comparative studies to determine why certain problems occur only in specific markets.
Monitoring & Alerting:
- Partner with the Service Reliability Engineering team to identify, develop and maintain proactive monitoring, alerting, and health checks to detect and prevent issues before business impact.
- Assist the SRE team in identifying critical health checks for order flow, order journeys, and user journeys to enable dedicated notifications for key steps.
Deployment & Change Operations:
- Partner with the Software Engineering team to support safe, efficient deployments and configuration changes, ensuring minimal disruption to business operations.
- Provide insights on system performance and capacity trends, and recommend improvements for scalability and efficiency to the Software Engineering team.
Automation & Continuous Improvement:
- Identify manual operational tasks and automate processes to increase efficiency, reduce errors, and improve response times.
- Identify recurring data anomalies through analysis and assist in determining effective technical and process-related solutions.
- Review the L2 team’s manual processes to uncover automation opportunities and implement technology-specific solutions aimed at improving productivity.
Collaboration with Engineering & Product Teams:
- Partner with development, infrastructure, and reliability engineering teams to design and deliver operable, scalable, and resilient solutions.
Operational Excellence & Documentation:
- Maintain runbooks, SOPs, and technical documentation; uphold IT controls, compliance, and audit readiness.
Risk & Security Management:
- Enforce operational security best practices, support vulnerability remediation, and contribute to disaster recovery and business continuity planning.
Compensation and Benefits:
- The expected compensation range for this position is between $89,000 and $149,000.
- Location, confirmed job-related skills, experience, and education will be considered in setting actual starting salary. Your recruiter can share more about the specific salary range during the hiring process.
- Bonus based on performance and eligibility; target payout is 10% of annual salary, paid out annually.
- Paid time off subject to eligibility, including paid parental leave, vacation, sick, and bereavement.
- In addition to salary, PepsiCo offers a comprehensive benefits package to support our employees and their families, subject to elections and eligibility: Medical, Dental, Vision, Disability, Health, and Dependent Care Reimbursement Accounts, Employee Assistance Program (EAP), Insurance (Accident, Group Legal, Life), Defined Contribution Retirement Plan.
Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field (or equivalent experience).
- 5+ years of experience in operations engineering, site reliability engineering, or systems administration.
- Strong knowledge of Linux/Unix and/or Windows server environments.
- Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog, Splunk, Nagios, AppDynamics, FullStory, Ignio).
- Proficiency in at least one scripting/programming language (e.g., Python, Bash, PowerShell).
- Familiarity with CI/CD pipelines, deployment automation, and configuration management (e.g., Jenkins, Ansible, Puppet, Chef).
- Experience with databases (e.g., MySQL, MongoDB, Cassandra, Couchbase).
- Understanding of networking fundamentals (DNS, TCP/IP, load balancing, firewalls).
- Hands-on experience with cloud platforms (AWS, Azure, GCP).
- Experience working with ServiceNow.
Our Company will consider for employment qualified applicants with criminal histories in a manner consistent with the requirements of the Fair Credit Reporting Act, and all other applicable laws, including but not limited to, San Francisco Police Code Sections 4901-4919, commonly referred to as the San Francisco Fair Chance Ordinance; and Chapter XVII, Article 9 of the Los Angeles Municipal Code, commonly referred to as the Fair Chance Initiative for Hiring Ordinance.
All qualified applicants will receive consideration for employment without regard to age, race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, or disability status.
PepsiCo is an Equal Opportunity Employer: Female / Minority / Disability / Protected Veteran / Sexual Orientation / Gender Identity / Age.
If you'd like more information about your EEO rights as an applicant under the law, please download the available "EEO is the Law" and "EEO is the Law Supplement" documents. View the PepsiCo EEO Policy.
Please view our Pay Transparency Statement.