Senior Engineer, Defect Management & DevOps

Thinkahead · Gurugram, India · Remote

Description

As a Senior Engineer (L3) specializing in Defect Management & DevOps, you will play a critical role in driving operational excellence, ensuring defect-free delivery pipelines, and strengthening reliability across cloud-native platforms. You will collaborate closely with engineering, QA, SRE, and product teams to manage end-to-end defect processes, streamline automation, and enhance service observability. The role demands deep analytical capability, strong DevOps experience, and the ability to influence cross-functional improvements through data-driven insights and advanced troubleshooting. 

You will act as a subject matter expert (SME) in DevOps and cloud platforms (primarily AWS, with GCP exposure), overseeing end-to-end release processes, governance, and delivery pipelines. This role requires leadership, deep technical knowledge, and excellent communication skills.

Core Responsibilities

  • Serve as the Subject Matter Expert (SME) for cloud platforms, primarily AWS (GCP exposure is a plus), providing guidance on cloud best practices, architectural decisions, and solution design. 
  • Support customers with core Managed Services technologies, including Cloud, Automation, Terraform, CI/CD, and containerization. 
  • Design, implement, and optimize cloud-native and DevOps solutions aligned with customer and organizational objectives. 
  • Lead technical discussions, demos, and customer engagements while effectively communicating complex technical concepts to both technical and non-technical stakeholders. 
  • Assist in growing the team through interviewing, onboarding, and aligning technical resources. 
  • Provide technical leadership, coaching, and mentorship to junior team members. 
  • Maintain strong project and situational awareness to ensure deliverables meet timelines and organizational expectations. 
  • Develop high-quality documentation including architectures, workflows, runbooks, and other written deliverables. 
  • Act as a technical expert in internal knowledge-sharing initiatives and external client interactions. 
  • Influence cloud governance, operational policies, best practices, and process improvements across teams and customer environments. 
  • Ensure precision, accuracy, and strong attention to detail across all tasks and deliverables. 

Defect Management & DevOps Responsibilities

  • Act as the SME for Defect Management processes, governance, tooling, and reporting. 
  • Own and manage the full defect lifecycle, including logging, triage, prioritization, root cause analysis (RCA), corrective actions, and closure. 
  • Partner with Development, QA, SRE, and Product teams to ensure timely resolution of high-impact issues. 
  • Establish and maintain defect dashboards, KPIs, and trend analytics to drive quality and process improvements. 
  • Develop standardized runbooks, escalation workflows, and operational procedures for defect handling. 
  • Lead cross-team Root Cause Analysis (RCA) investigations and drive Corrective and Preventive Actions (CAPA) implementations. 
  • Improve operational readiness through enhanced monitoring, alerting, and structured incident-to-defect workflows. 
  • Provide guidance on CI/CD optimization, automation strategies, infrastructure stability, and reliability engineering. 
  • Mentor junior engineers in DevOps principles, tooling, defect analysis techniques, and troubleshooting best practices. 

Requirements

Defect Management Expertise:
  • Full ownership of the defect lifecycle, ensuring SLA adherence. 
  • Deep understanding of SDLC, change management, and ITIL best practices. 
  • Ability to analyze defect patterns, severity trends, root causes, and long-term systemic issues. 
  • Conduct structured RCA using the 5 Whys, Fishbone diagrams, and Fault Tree Analysis. 
  • Define and enforce severity, categorization, and prioritization standards. 
  • Create dashboards and quality metrics to drive continuous improvement. 

Tools & Skills:
  • Strong JIRA workflow, automation rule, dashboard, and reporting expertise. 
  • Ability to visualize defect trends and quality metrics effectively. 

Observability, Monitoring & SIEM Tools:
  • Hands-on experience with Dynatrace, Datadog, Prometheus, Grafana, CloudWatch, or similar tooling. 
  • Skilled in APM analysis, log correlation, anomaly detection, service mapping, and performance troubleshooting. 
  • Build and maintain dashboards and alert frameworks. 
  • Integrate monitoring insights with DevOps and operational workflows. 
  • Exposure to SIEM event analysis for operational and security correlation. 

Core DevOps Responsibilities

  • Build, enhance, and support CI/CD pipelines across multiple environments using AWS CodePipeline, CodeBuild, CodeDeploy, and Git-based workflows. 
  • Collaborate on automation initiatives using Terraform, CloudFormation, AWS CDK, or equivalent IaC tools to standardize and streamline deployments. 
  • Deploy and manage AWS cloud-native services including EKS, ECS, Lambda, API Gateway, S3, IAM, and supporting architectures. 
  • Work with containers and orchestration platforms such as Kubernetes, EKS, ECS, and AKS (where required). 
  • Implement deployment best practices such as blue/green deployments, rolling updates, and automated rollback strategies to ensure safe, repeatable releases. 
  • Troubleshoot complex deployment issues, environment drift, infrastructure failures, performance bottlenecks, and service-level degradations. 
  • Implement and maintain observability using CloudWatch, Prometheus, Grafana, Datadog, Dynatrace, or equivalent monitoring stacks. 
  • Ensure AWS workloads adhere to resiliency, compliance, security, and operational excellence guidelines. 
  • Strong hands-on, production-grade DevOps experience in AWS (primary cloud). 
  • Deep expertise in Kubernetes, containerized workloads, microservices, autoscaling, and cloud networking. 
  • Advanced troubleshooting across AWS services, distributed systems, CI/CD pipelines, and API-driven workflows. 
  • Knowledge of AWS cost optimization, tagging, FinOps alignment, and resource lifecycle governance. 
  • Exposure to building or maintaining CI/CD pipelines within GCP ecosystems (Cloud Build, GKE, Artifact Registry, etc.). 
  • Ability to work with GCP cloud-native services where required, ensuring consistency across hybrid/multi-cloud deployments. 
  • Familiarity with GCP IAM, VPC architecture, and core compute/storage/networking components is a plus. 

General Qualifications

  • Strong communication, leadership, and mentoring capabilities. 
  • 6–10+ years of experience in DevOps, SRE, QA Engineering, or Cloud Operations. 
  • Expert-level AWS knowledge (GCP exposure would be a plus). 
  • Strong command of IaC tools such as Terraform, CloudFormation, and CDK. 
  • Experience with CI/CD systems: Jenkins, GitLab CI, AWS CodePipeline. 
  • Proficiency with Docker, Kubernetes, and container orchestration. 
  • Experience with monitoring technologies: Datadog, Grafana, Prometheus. 
  • Experience with JIRA workflows and project tracking. 
  • Ability to excel in dynamic, fast-paced environments. 

Expectations

  • Demonstrate deep expertise across DevOps, cloud platforms, automation, and engineering practices. 
  • Balance hands-on delivery with leadership responsibilities and strategic initiatives. 
  • Continuously assess, refine, and enhance processes, documentation, and operational workflows. 
  • Adapt effectively to evolving customer requirements, project priorities, and technology landscapes. 
  • Engage confidently with senior stakeholders, providing clear communication and technical guidance. 
  • Lead scoping, planning, and methodology definition for major technical initiatives and transformations. 
  • Contribute to the development of new engineering standards, frameworks, and best practices across teams. 
  • Take senior-level ownership of critical defects, escalations, and operational issues, driving them to resolution. 
  • Influence and drive cross-team improvements in tooling, quality, automation, and operational efficiency. 
  • Ensure prevention mechanisms, automation guardrails, and reliability practices are embedded early in delivery cycles. 
  • Lead initiatives focused on defect prevention, observability enhancements, and overall DevOps maturity uplift. 
  • Participate in on-call rotations and provide Tier-3 technical expertise for complex issues. 
  • Continuously propose, design, and implement enhancements across tooling, automation, and operational frameworks.