- Senior
- Office in Guadalajara
We are a Digital Product Engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale, across all devices and digital mediums, and our people are everywhere in the world (18,000+ experts across 36 countries, to be exact). Our work culture is dynamic and non-hierarchical. We are looking for great new colleagues. That is where you come in!
Job Description:
- Manage end-to-end data and infrastructure operations, from writing SQL queries to creating and optimizing CI/CD pipelines and handling VM and cloud-based deployments
- Drive incident and request management through ServiceNow, ensuring SLA compliance, ownership, and proactive issue resolution
- Implement and refine monitoring and observability frameworks using Datadog, Grafana, and Prometheus to maintain uptime, identify bottlenecks, and enhance system reliability
- Collaborate across global teams (including Data Engineering, Product, and IT Infrastructure) to resolve production issues, improve deployment practices, and optimize system performance
- Conduct root cause analyses and contribute to blameless post-incident reviews and preventive action plans
- Collaborate with security and compliance teams to uphold operational standards and data protection practices
- Contribute to automation and continuous improvement initiatives through scripting (Python, Shell) and infrastructure-as-code (Terraform, Ansible) principles
- Support the data lifecycle, ensuring accuracy, integrity, and accessibility of data pipelines and dashboards across analytics platforms
- Collaborate with Data Engineering teams to ensure data pipelines, ETL processes, and analytics platforms are performant, reliable, and production-ready
- Collaborate on capacity planning, scaling, and performance optimization to ensure reliability during growth and high-load scenarios
- Use operational metrics (MTTR, uptime, failure rate, latency) to drive service reliability improvements
- Participate in Agile ceremonies within a Scrum/Kanban model, aligning with delivery squads to ensure cross-functional visibility and operational excellence
Experience:
- 6+ years in DataOps, DevOps, infrastructure operations, site reliability engineering, or analytics platform support
- Intermediate SQL for data extraction, transformation, and diagnostics
- Strong understanding of CI/CD pipelines (Jenkins, Azure DevOps, Git-based version control)
- Proficiency in monitoring and observability tools (Datadog, Grafana, Prometheus)
- Hands-on with Python or Shell scripting for automation and diagnostics
- Familiarity with containerization (Docker, Kubernetes) and cloud platforms (AWS, Azure, GCP). Knowledge of AWS services is a must
- Solid grasp of infrastructure-as-code concepts (Terraform, Ansible)
- Proven record in incident management, escalation handling, and maintaining SLAs/SLIs/SLOs for critical systems in enterprise environments
- Analytical Mindset: Ability to interpret system and data metrics, identify trends, and recommend performance improvements
- Collaboration: Strong communication skills with cross-functional, global teams across technical and non-technical domains
- Agility: Comfort working in dynamic, fast-paced environments, maintaining composure and prioritization under pressure
Must-have skills: Docker (Strong), Kubernetes (Strong), DevOps - AWS (Strong), Terraform.
Good to have: ETL, Python, Shell scripting.