Senior Reliability Engineer (Backend Focus) at Accrue
Accrue · New York City, United States of America · Onsite
- Senior
- Office in New York City
About our team
Accrue is a fintech company that powers modern customer loyalty at the payment layer. Centered around its branded stored value wallet solution, Accrue’s platform helps brands take control of their payments and refunds, reward customer loyalty, and bypass traditional payment giants.
Build a deeper, more durable customer base while optimizing your bottom line – all with Accrue.
About the Role
We're looking for a senior backend engineer who builds reliability through elegant, production-ready code architecture. You'll have significant authority to rearchitect critical systems, replacing homegrown solutions with industry-standard tooling and patterns that handle 10k+ requests/second.
This is primarily a programming role focused on building robust, observable systems through code. You'll spend most of your time architecting and implementing reliability improvements, not managing infrastructure.
What makes this role unique:
- Architectural Authority: Drive decisions on adopting technologies like Temporal.io for durable execution vs. maintaining custom retry logic
- Production Scale: Design systems that handle high-throughput payment and loyalty processing with strict SLA requirements
- Code-First Reliability: Improve system reliability by writing better application code, not just adding monitoring
- Industry Standards Over NIH: Replace internal implementations with proven, production-ready solutions
What you'll do
System Architecture & Reliability Engineering
- Rearchitect core reliability patterns: Replace custom retry mechanisms with durable execution engines like Temporal.io
- Implement robust event processing: Migrate direct webhook handling to reliable delivery systems like Hookdeck with proper delivery semantics
- Build behavioral monitoring: Integrate time-series databases to detect and alert on changes in system behavior
- Eliminate technical debt: Systematically replace "not invented here" solutions with industry-standard, battle-tested alternatives
High-Scale Backend Development
- Design and implement systems that maintain performance and reliability at 10k+ requests/second
- Write production-grade code for payment processing, wallet operations, and loyalty program mechanics
- Build comprehensive error handling, circuit breakers, and graceful degradation patterns
- Implement distributed system patterns for fault tolerance and observability
Production Excellence
- Instrument deep observability into application code using existing frameworks (Datadog)
- Design monitoring that provides actionable insights into system behavior and business metrics
- Build alerting that proactively identifies reliability issues before they impact users
- Lead incident response with a focus on permanent architectural fixes rather than band-aid solutions
Technical Leadership
- Evaluate and recommend new technologies and architectural patterns for production readiness
- Collaborate with product engineering teams to embed reliability patterns into new feature development
- Drive technical decisions around system architecture, scaling, and reliability patterns
- Mentor engineers on production best practices and reliable system design
What you'll need
Required
- 5+ years of backend engineering experience building high-throughput production systems (10k+ req/sec)
- Strong programming skills in modern languages - our stack uses TypeScript, but we value polyglot engineers
- Production architecture experience with distributed systems, microservices, and reliability patterns
- Systems thinking: Ability to identify when to build vs. buy vs. adopt existing solutions
- Cloud-native development with AWS services (ECS, RDS, ELB) and modern deployment patterns
- Technical leadership: Experience making architectural decisions and driving technical improvements independently
Highly Valued
- Experience with durable execution systems (Temporal.io, Step Functions, etc.)
- Background in fintech, payments, or high-reliability systems
- Knowledge of event-driven architectures and reliable message processing
- Experience with time-series databases and behavioral analytics
- Track record replacing legacy systems with modern, scalable alternatives
- Startup or high-growth experience where you've scaled systems through rapid growth
What We're NOT Looking For
- Traditional "infrastructure-first" SRE background
- Focus on Kubernetes administration or infrastructure provisioning
- Scripting-heavy operational work
- Basic monitoring setup (connecting Datadog to ECS is table stakes, not the role)
Tech Stack & Scale
- Backend: TypeScript/Node.js, REST APIs, high-throughput transaction processing
- Infrastructure: AWS (ECS, RDS, ELB), Cloudflare
- Observability: Datadog (existing), custom instrumentation and analytics
- Scale: 10k+ requests/second, real-time payment and loyalty processing
- Architecture: Distributed microservices, event-driven systems
Types of Challenges You'd Tackle
- Identifying and replacing fragile custom implementations with industry-standard solutions
- Architecting reliable event processing where current approaches show brittleness
- Building proactive monitoring for behavioral changes that current systems miss
- Designing fault-tolerant patterns for high-throughput transaction processing
- Evaluating and implementing durable execution patterns for complex workflow reliability
- Creating robust delivery semantics for webhook and event-driven architectures
Benefits & Perks
- Robust health benefits packages (1 of 4 medical plans costs $0 for full family coverage)
- Company-contributed 401(k) plan
- An empathetic team that values mental wellness and work/life balance
- A brand new NYC office!
- Ground-floor opportunity to bring a product to market that helps people in a way that doesn't exist today
Salary Information
The range listed below is just one component of Accrue's total compensation package. This role will receive a competitive salary, benefits, and equity. The salary range is for US-based employees located in the listed market. Other benefits include those listed above, such as healthcare and a 401(k) with a 3% contribution.
Accrue is an equal opportunity employer committed to fostering an inclusive, innovative environment. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. If you have a disability or special need that requires accommodation, please contact us at [email protected].