Hybrid [Coupang Pay] Staff II, Back-end Engineer (Observability) at Coupang

Coupang ·  Seoul, South Korea, United States Of America · Hybrid

Please be sure to complete and submit the attached Internal Transfer Request Form.

Please make sure to apply with your Coupang e-mail address.

 


 

About Coupang: At Coupang, we are building the future of e-commerce. Born out of an obsession to make shopping, eating, and living easier than ever, we’re collectively disrupting the multi-billion-dollar e-commerce industry from the ground up. We exist to wow our customers. We know we’re doing the right thing when we hear our customers say, “How did we ever live without Coupang?” We are one of the fastest-growing e-commerce companies, and we have established an unparalleled reputation as a dominant and reliable force in South Korean commerce.

 

About Coupang Pay: Coupang Pay delivers innovative payment and financial services solutions to everyone who uses the Coupang app, from customers buying products on Coupang.com to marketplace vendors and restaurants that offer their services through Coupang Eats. We build on our latest technical innovations to serve the growing needs of Coupang’s customers in Korea and Taiwan. Our offerings include Coupay, an online wallet with a proprietary one-touch payment capability.

 

About the role: As a Staff Site Reliability Engineer (SRE) at Coupang Pay, you will play a pivotal role in ensuring the reliability, scalability, and performance of our critical systems and services. As a technical leader, you will drive the design, implementation, and optimization of complex systems that meet the demands of a high-availability environment. This role requires deep expertise in the Observability Engineering (OE) stack (including Mimir, Loki, Tempo, and Grafana) and Terraform-based automation, along with experience setting up, tuning, and scaling observability platforms that support business-critical services with high reliability and performance. You will collaborate with cross-functional teams to architect solutions, identify and resolve system bottlenecks, and establish best practices for operational excellence. With a focus on automation, observability, and incident management, you will also mentor junior engineers, foster a culture of reliability, and contribute to the strategic direction of our product engineering initiatives. This is a unique opportunity to make a significant impact on the stability and scalability of our technology ecosystem.

 

Responsibilities

System Reliability and Performance

  • Ensure the reliability, availability, and performance of critical systems and services.
  • Proactively identify and address system bottlenecks, failures, and performance issues.

Technical Leadership

  • Lead the design, implementation, and optimization of scalable and fault-tolerant architectures.
  • Provide guidance and mentorship to junior engineers, fostering technical growth.

Automation and Tooling

  • Develop and enhance automation tools to streamline operational processes and improve efficiency.
  • Champion automation-first principles to reduce manual toil and operational overhead.

Observability and Incident Management

  • Build and operate the OE stack, and drive performance tuning, cost optimization, and observability initiatives that best serve the interests of the business.
  • Drive incident response, root cause analysis, and post-incident reviews to improve systems.

Collaboration and Best Practices

  • Partner with cross-functional teams (e.g., development, product, and infrastructure) to build robust systems.
  • Define and implement best practices for reliability engineering, including CI/CD pipelines and infrastructure as code.

Strategic Contributions

  • Influence the strategic direction of infrastructure and platform engineering initiatives.
  • Evaluate and implement new technologies to enhance system resilience and operational capabilities.

Operational Excellence

  • Drive continuous improvement in operational processes, reducing time to resolution for incidents.
  • Promote a culture of accountability, innovation, and reliability throughout the engineering organization.

 

Preferred Qualifications

  1. Strong proficiency in programming languages such as Python, Go, or similar.
  2. In-depth knowledge of Linux/Unix systems, networking, and distributed systems.
  3. Experience with cloud platforms (AWS, GCP, or Azure) and containerization and orchestration tools (e.g., Docker, Kubernetes).
  4. Strong understanding of observability tools (e.g., Prometheus, Grafana, or Datadog).
  5. Proficiency in Infrastructure as Code (IaC) using Terraform.
  6. Expertise in scaling and tuning Mimir and Loki for high-throughput workloads, and familiarity with distributed tracing using Tempo.
  7. Knowledge of performance optimization techniques for high-availability systems.
  8. Strong collaboration skills and the ability to work effectively with cross-functional teams.
  9. Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
  10. 8+ years of experience in Site Reliability Engineering or related roles in high-availability environments.

 

 


 
