Delivery Manager - SRE at Principal Global Services
Principal Global Services · Pune, India · Hybrid
- Senior
- Optional office in Pune
Job Description: Site Reliability Engineering (SRE) Manager – Observability & ITOM
Indicative years of total experience: 14 – 16 years
Location:
Pune/Hyderabad
Department:
Engineering / IT Operations
Reporting relationship:
This role reports to the Program Manager.
Job Type:
Full-Time (Hybrid)
Job Summary:
We are seeking a seasoned SRE Manager to lead our Observability & Reliability Engineering team, with a strong focus on IT Operations Management (ITOM) practices. This role will be responsible for driving end-to-end reliability, performance, and operational excellence across our infrastructure and applications. The ideal candidate will also oversee the ServiceNow ITOM module, ensuring seamless integration and automation of IT operations workflows.
Key Responsibilities:
Leadership & Strategy
- Lead and mentor a team of SREs and Observability Engineers.
- Define and drive the strategic roadmap for reliability, observability, and ITOM practices.
- Collaborate with cross-functional teams (DevOps, Platform Engineering, Application Development, and ITSM) to align reliability goals with business objectives.
Observability & Monitoring
- Own the observability stack including metrics, logs, traces, and dashboards.
- Implement and manage tools like Prometheus, Grafana, ELK, Splunk, Datadog, or similar.
- Drive proactive monitoring, alerting, and anomaly detection to reduce MTTR and improve system health.
Reliability Engineering
- Champion SRE principles such as SLIs, SLOs, and error budgets.
- Lead incident response and postmortem processes to ensure continuous improvement.
- Automate operational tasks and improve system resilience through chaos engineering and fault injection.
ITOM Practice Management
- Oversee the implementation and optimization of ServiceNow ITOM modules (Discovery, Event Management, Orchestration, CMDB).
- Ensure accurate and up-to-date CMDB data to support incident, problem, and change management processes.
- Drive automation of IT operations workflows using ServiceNow and other orchestration tools.
Process & Governance
- Establish and enforce best practices for change management, incident management, and problem resolution.
- Ensure compliance with internal and external audit requirements related to IT operations.
Stakeholder Engagement
- Act as a key liaison between engineering, operations, and business stakeholders.
- Provide regular updates and reports on system reliability, performance, and operational KPIs.
Qualifications:
Required Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 10+ years of experience in IT operations, DevOps, or SRE roles.
- 3+ years in a leadership or managerial role.
- Hands-on experience with observability tools and practices.
- Strong expertise in ServiceNow ITOM modules and CMDB management.
- Excellent communication, leadership, and stakeholder management skills.
Preferred Skills:
- Certifications in SRE, ServiceNow ITOM, and cloud platforms (AWS, Azure, GCP).
- Experience with infrastructure as code (Terraform, Ansible).
- Familiarity with containers and container orchestration (Docker, Kubernetes).
- Knowledge of ITIL processes and frameworks.
Required Behavioral Competencies:
- Make sound business decisions
- Embrace change
- Build strong partnerships
- Get results
- Act strategically
- Lead and cultivate talent