DevOps / Site Reliability Engineer at Calix
Calix · Bangalore, India · Hybrid
- Professional
- Office in Bangalore
Company Culture: At Calix, we pride ourselves on fostering a culture of continuous improvement, diversity, and inclusion. Our "better, better, never best" philosophy drives us to constantly evolve and attract the best and brightest talent. We have a longstanding commitment to a flexible work-from-home culture, allowing us to hire top talent regardless of location. Our team members are encouraged to share their ideas and contribute to our collective success, creating an open and collaborative environment.
A Site Reliability / DevOps Engineer focuses on ensuring the reliability and scalability of Calix's services. They collaborate with platform and software engineering teams, automate tasks, and respond to incidents, with a strong emphasis on continuous learning and improvement. They participate in a level 1 on-call rotation that is responsible for responding to and resolving alerts.
Qualifications:
Bachelor’s degree in computer/information science engineering or any other stream.
2 to 5 years of hands-on experience as a Site Reliability Engineer (SRE) / DevOps Engineer.
Proficiency in one or more scripting languages (e.g., Python, Shell).
Experience with cloud platforms and services. GCP is preferred, but strong experience with AWS and/or Azure will also be considered.
Experience with Kubernetes (GKE and/or EKS).
Strong Linux command line skills.
Understanding of networking concepts such as TCP/IP, DNS, and routing.
Experience with automation tools like Terraform.
Familiarity with monitoring/logging tools like Prometheus and Grafana.
Experience with incident management processes and tools.
Willingness to participate in an on-call rotation.
Responsibilities:
Respond to and resolve alerts. Assist in the investigation and resolution of service outages and degradations, often working with our senior engineers.
Build and maintain dashboards that provide insight into platform health and identify areas for improvement. Continuously implement and maintain monitoring systems to proactively detect and address issues affecting system reliability.
Be obsessive about automating routine tasks to improve efficiency and reduce manual effort, enhancing reliability and scalability.
Create runbooks for alert response/resolution and for maintenance operations.
Assist with infrastructure provisioning, code deployments, access control, and configuration changes.
Work to improve reliability, observability, monitoring, instrumentation, and automation of systems, with a focus on scalability and resilience.
Continuously learn and expand your knowledge of system operations and reliability practices.
Location: India (flexible hybrid work model: work from the Bangalore office for 20 days per quarter)