- Senior
- Office in Mountain View
We exist to wow our customers. We know we’re doing the right thing when we hear our customers say, “How did we ever live without Coupang?” Born out of an obsession to make shopping, eating, and living easier than ever, we’re collectively disrupting the multi-billion-dollar e-commerce industry from the ground up. We are one of the fastest-growing e-commerce companies and have established an unparalleled reputation as a dominant and reliable force in South Korean commerce.
We are proud to have the best of both worlds: a startup culture with the resources of a large global public company. This fuels us to keep growing and launching new services at the pace we have kept since our inception. We are all entrepreneurs, surrounded by opportunities to drive new initiatives and innovations. At our core, we are bold and ambitious people who like to get our hands dirty and make a hands-on impact. At Coupang, you will see yourself, your colleagues, your team, and the company grow every day.
Our mission to build the future of commerce is real. We push the boundaries of what’s possible to solve problems and break traditional tradeoffs. Join Coupang now to create an epic experience in this always-on, high-tech, and hyper-connected world.
Job Overview:
As a Staff Systems Engineer (FinOps, Kubernetes) in Developer Platform, you will partner with leaders across platform teams to deliver cost-efficient, reliable, and scalable application runtimes.
You will engineer and develop Kubernetes-native automation that translates workload demand into optimal capacity via time-series forecasting, policy-based controls (HPA/VPA), and AI-driven decision systems. You will own the full loop from data to recommendations to safe automation, converting performance gains into measurable dollar savings at scale (millions of pods, thousands of clusters).
Key Responsibilities:
- Build and productionize time-series forecasting for resource demand (CPU, memory, I/O, network), handling seasonality, bursts, outliers, and holidays; deliver actionable confidence bounds for safe automation.
- Deliver recommendation systems for rightsizing, HPA/VPA targets, and SLO-aware scale-out/scale-in; surface guardrails and exceptions with server-side apply and strategic merge patch for safe rollouts.
- Apply reinforcement learning/bandits and Bayesian optimization to tune multi-objective tradeoffs (cost, latency, error budgets), integrating with HPA, VPA, and cluster autoscaler.
- Design cost-aware scaling and placement across multi-cluster/multi-region: spot vs on-demand/reserved, preemption and eviction handling, topology and AZ/zone awareness, budget adherence.
- Build data pipelines for metrics and billing: Prometheus/Mimir/Thanos, OpenCost, and cloud billing (AWS CUR/GCP Billing/Azure); feature engineering, backfills, and drift monitoring in Python.
- Implement anomaly detection, budget tracking, and change impact analysis; drive showback/chargeback and unit economics ($/RPS, $/GB, $/message) with executive-ready reporting.
- Run controlled experiments (canary/A-B), simulation (replays/Monte Carlo), and offline policy evaluation; automate rollback and safety stops.
- Partner with product and platform teams to convert complex cost/performance goals into simple, reusable, and extensible control-plane primitives.
- Champion observability and continuous improvement with outcome-focused metrics; document playbooks and create dashboards for engineers and leaders.
Qualifications:
- Bachelor’s degree in Computer Science, Electrical Engineering, Math, or a closely related field (or equivalent experience)
- 10+ years in backend/software engineering or platform/infra roles
- 3+ years building and operating ML/time-series systems in production
- Strong Python expertise for data/ML (pandas, NumPy, SciPy, scikit-learn)
- Proven track record applying statistics/ML to optimize cost/performance
- Experience with Kubernetes and cloud platforms (AWS, Azure, or GCP)
Preferred:
- Time-series forecasting: ARIMA/SARIMA, Prophet, Holt-Winters, state space models; advanced methods (XGBoost, LSTM/TFT) and feature engineering for seasonality and events
- Reinforcement learning and online learning: multi-armed bandits, policy gradient, Bayesian optimization; offline/online evaluation, regret analysis
- Kubernetes scaling: HPA (resource/custom/external), VPA, KEDA; external metrics adapters; autoscaler tuning and safe rollout patterns
- Kubernetes API and controllers: server-side apply, strategic merge patch, informers/workqueues, idempotent reconciliation; CRDs and admission webhooks
- FinOps stack: OpenCost, cloud billing exports (AWS CUR, GCP BigQuery Billing, Azure Cost Management), cost allocation, showback/chargeback
- Observability and data: Prometheus (cardinality, histograms/quantiles, exemplars), Mimir/Thanos; data pipelines with Kafka/Spark/Flink; Jupyter/MLflow for experimentation
- Programming: Python (primary) and Go for controllers/operators and services; gRPC/protobuf; JSON/YAML
- Statistics: experimental design (A/B, DOE), causal inference basics, uncertainty quantification, confidence/credible intervals
- Leadership: ability to prioritize in fast-paced environments and lead delivery of large-scale, cost-optimized services with measurable business impact
Pay & Benefits
Our compensation reflects the cost of living across several US geographic markets. At Coupang, your base pay is one part of your total compensation.
The base pay for this position ranges from $138,000/year in our lowest geographic market to $200,000/year in our highest geographic market. Pay is based on several factors including market location and may vary depending on job-related knowledge, skills, and experience.
General Description of All Benefits
- Medical/Dental/Vision/Life, AD&D insurance
- Flexible Spending Accounts (FSA) & Health Savings Account (HSA)
- Long-term/Short-term Disability
- Employee Assistance Program (EAP)
- 401(k) Plan with Company Match
- 18-21 days of Paid Time Off (PTO) a year, based on tenure
- 12 Paid Holidays
- XX weeks of Paid Parental leave
- Pre-tax commuter benefits
- MTV - [Free] Electric Car Charging Station
General Description of Other Compensation
“Other Compensation” includes, but is not limited to, bonuses, equity, or other forms of compensation that would be offered to the hired applicant in addition to their established salary range or wage scale.
Details to Consider
- This job posting may be closed prior to the stated end date for application if all openings are filled.
- Coupang has the right to rescind an offer of employment if a candidate is found to have submitted false information as part of the application process.
- Those eligible for employment protection (recipients of veteran’s benefits, the disabled, etc.) may receive preferential treatment for employment in accordance with applicable laws.
Privacy Notice
- Your personal information will be collected and managed by Coupang as stated in the Application Privacy Notice located below: https://www.coupang.jobs/privacy-policy/