- Professional
- Office in Chicago
The Observability Engineer II is responsible for designing, implementing, and maintaining observability solutions that ensure full visibility into infrastructure, applications, and services. This includes integrating data sources, developing dashboards, managing alerts, and enabling proactive incident response and performance optimization. While the scope includes multiple observability platforms, this role has a specialty focus on Grafana and its ecosystem.
Essential Duties & Responsibilities
- Deploy, configure, and maintain observability platforms and tools, with emphasis on Grafana and its related components (e.g., Prometheus, Loki, Tempo, Alertmanager).
- Design, build, and optimize dashboards, visualizations, and reports to provide actionable insights for both technical and business stakeholders.
- Integrate and manage data sources across metrics, logs, traces, databases, and cloud environments to ensure comprehensive monitoring coverage.
- Define, implement, and refine alerting rules and incident workflows to support timely detection, triage, and resolution of issues.
- Monitor and optimize observability platform performance, scalability, and cost efficiency, including upgrades and patching.
- Implement authentication, RBAC, and secure data practices in compliance with organizational and regulatory standards.
- Collaborate with DevOps, SRE/Operations, developers, and business partners to capture monitoring requirements and improve observability capabilities.
- Automate provisioning, configuration, and ongoing tasks through scripting and Infrastructure as Code.
- Maintain documentation, runbooks, and best practice guides; provide training and enablement to end users.
- Perform other duties and responsibilities as required to support business needs.
Minimum Required Qualifications
- Bachelor’s degree in Computer Science, Information Systems, Engineering, or equivalent practical experience.
- Experience with observability practices (metrics, logs, traces, application performance monitoring).
- Hands-on knowledge of Grafana and related technologies (e.g., Prometheus, Loki, Tempo) as well as adjacent stacks such as ELK.
- Familiarity with cloud platforms (AWS, Azure, GCP), containers (Docker), and orchestration (Kubernetes).
- Scripting or automation experience (Python, Bash, PowerShell) and use of IaC/configuration management tools (Terraform, Ansible).
- Ability to implement SSO, RBAC, and security best practices within observability systems.
- Strong troubleshooting, analytical, and communication skills.
Preferred Qualifications
- Grafana Labs certifications or related observability certifications.
- Experience developing CI/CD integrations for observability assets and workflows.
- Background in incident response and post-incident analysis.
- Contributions to open-source monitoring or observability projects.
- Knowledge of ITIL or similar frameworks.
Physical Requirements
- This role may be performed in an office, remote, or hybrid work environment, depending on business needs and applicable work arrangements. Standard working hours apply, though flexibility may be required to accommodate deadlines or collaboration across time zones. Occasional travel may be required based on business demands.