- Senior
- Bangalore office
Description
- Direct and grow a global team of engineers who analyze and maintain service stability by documenting policies and best practices in a 24x7x365 operation
- Own the day-to-day health, uptime and reliability of all network, server, storage, and ancillary infrastructure to fulfill the mission of unyielding site stewardship
- Develop, prioritize, and implement process improvement initiatives across the organization. Determine factors for compliance and enforcement
- Work closely with cross-functional teams to negotiate requirements, specifications, schedules, quality, and technical acceptance criteria for onboarding new features, functionality, services, etc.
- Work closely with engineering, operations, product, and project management peers to develop innovative technical solutions that meet TiVo’s needs for functionality, performance, scalability, and reliability. This covers tools and processes, the speed of identifying and resolving outages and incidents, and the onboarding of new customers, products, and services
- Work with regional leads to establish organizational goals and meet recruiting objectives
- Participate in recovery from and forensic examination of major site incidents
- Develop reports and analytics, and incorporate feedback to inform technical solutions and drive innovation and efficiencies
- Familiarity with, and the ability to grow team members in, incident management, change management, problem management, and availability management
- Publish metrics and KPI dashboards to continually measure and improve the NOC processes
- Responsible for managing the end-to-end lifecycle of problems arising from incidents across corporate and customer facing environments (B2B and B2C)
- Review and approve external communications from the NOC team to B2B customers regarding incidents
- Provide guidance in Failed Change Review/Root Cause Analysis sessions held with cross-functional teams to identify the root cause of failed changes/problems and prevent recurrence
- Partner with Customer Success leaders to understand customer requirements, contractual obligations, and expectations, and educate NOC team members on them, so that Xperi customers receive exceptional service and communications in areas of responsibility where working directly with customers is required (high-priority incident management and event management)
- This is an onsite role reporting into the Xperi office:
- Full time in office 5 days a week
- May require working holidays or off-hours
- Must be adaptable to Operational Requirements
- A minimum of 7 years of experience managing an operations organization in a 24x7 global infrastructure, as well as a record of individual technical achievement
- 3+ years of experience working in an operations environment focused on resolving outages, ideally as an Incident Manager
- 2+ years of experience working with a formal process methodology (the ITIL V3 framework)
- Bachelor's degree in Information Technology, Computer Science, or related field.
- A natural team leader who can motivate, mentor, coach and encourage personal advancement
- Excellent project management skills and the ability to work in a fast-paced and hectic work environment
- Capable of technical deep-dives into code, networking, systems and storage with experienced engineers
- Capable of leading a discussion with executive management
- Demonstrated experience in network and large scale Linux system troubleshooting and maintenance
- Demonstrated experience with root cause analysis and process improvements that prevent future outages
- Superb written and oral communication skills, including full fluency in written and spoken English; proven incident leadership and the ability to work with both internal and external customers
- Familiarity with Consumer Electronics and/or video-based products
- Familiarity with ServiceNow a plus
- Familiarity with monitoring systems such as PagerDuty, Splunk, Zabbix, AWS, etc.
- Familiarity with ticketing tools such as Jira, ServiceNow, PagerDuty, etc.
- Understanding of standard server hardware and architecture
- DNS and common troubleshooting tools (e.g. ping, traceroute, netstat, route, nslookup, dig)
- Distributed computing (load balancers, service clusters, server pools, communication via API, etc.)
- Competitive compensation (salary, equity and bonuses) and comprehensive benefits designed to foster work-life balance, care for your health, protect your finances and help you save and invest for the future.
- Generous paid time away from work, including flexible time off, holidays and sick time, health and wellness initiatives, and a charitable match program to help you give back to your community.
- Great perks, which vary by location and can be site-specific: employee discounts, transportation reimbursements, subsidized cafes and fitness facilities.
- A flexible, hybrid work environment combining the best of in-office collaboration and community-building along with the benefits of working from home.