Site Reliability Engineer at BetaNXT

BetaNXT · Cary, United States of America · Hybrid

Level/Function: Lead Associate, Software Engineering

Title: Site Reliability Engineer

About BetaNXT:

BetaNXT is a leading provider of frictionless wealth management infrastructure, real-time data solutions, and an enhanced advisor experience. We invest in platforms, products, and partnerships to accelerate growth for the ecosystem we serve. Our connective approach empowers our clients to deliver a comprehensive, end-to-end advisor and investor experience.

BetaNXT is a premier provider of technology, data, and operations as services to a rich client base of wealth managers, institutional wealth firms, and digital brokers. It comprises three industry-leading businesses that, combined, provide end-to-end solutions across the investment lifecycle.

Overview of the Site Reliability Engineer:

We are seeking a skilled Site Reliability Engineer to join our Engineering team and ensure the reliability, scalability, and performance of our production systems. You will bridge the gap between Development and Product Support teams, applying software engineering principles to infrastructure and operations challenges while maintaining high availability of our services. This is a hybrid position based out of our Cary, NC office location with an in-office expectation of 3 days a week.

Duties and Responsibilities of the Site Reliability Engineer:

  • Triage production issues and maintain highly available, scalable, and resilient systems that support our growing user base. You will be responsible for monitoring system performance, implementing automated solutions, and responding to incidents to minimize downtime and service disruptions.
  • Lead incident response efforts, conduct post-incident reviews, and implement preventive measures to avoid similar issues in the future. You will quickly diagnose and resolve production issues as they arise.
  • Monitor system metrics, logs, and alerts to proactively identify potential issues before they impact users. Build and maintain observability tools, dashboards, and alerting systems that provide visibility into system health.
  • Investigate and resolve SQL database errors. Diagnose and troubleshoot database-API integration issues, including connection failures, data synchronization errors, timeout problems, authentication bottlenecks, and performance degradation in API-database communication pathways. Ensure seamless data flow between database systems and application programming interfaces in production environments.
  • Systematically identify, prioritize, and remediate technical debt across legacy systems to modernize application architecture and service infrastructure. Implement incremental improvements that enhance system reliability, reduce maintenance overhead, and minimize production downtime through strategic refactoring, dependency updates, code consolidation, and migration to current technology stacks.
  • Work closely with Development and Product Support teams to improve system reliability by implementing best practices, participating in code reviews, and identifying the critical logging needed to assist in issue triage.

Skills and Experience of the Site Reliability Engineer:

  • 3+ years of experience in SRE, DevOps, or a similar role supporting large-scale production systems.
  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience.
  • Excellent programming skills in languages and frameworks such as Python, Java, React, or similar, with experience building production-grade software.
  • Solid understanding of Linux/Unix systems, networking concepts, and distributed systems.
  • Extensive experience with AWS services including EC2, RDS, S3, CloudWatch, IAM, and other core infrastructure components.
  • Proven SQL skills with experience managing and optimizing relational databases such as PostgreSQL, MySQL, or similar.
  • Proven ability to write complex queries, optimize database performance, and troubleshoot database-related issues.
  • Knowledge of monitoring and observability tools such as Prometheus, Grafana, ELK stack, or similar platforms. Familiarity with infrastructure as code tools like Terraform, Ansible, or CloudFormation.
  • Experience with version control systems, CI/CD pipelines, and automated testing frameworks. Understanding of database systems and their operational considerations in production environments.
  • Advanced AWS certifications and deep knowledge of AWS best practices and cost optimization.
  • Background in implementing chaos engineering practices and disaster recovery procedures. Knowledge of security best practices and compliance requirements in production environments.
  • Experience with performance optimization, capacity planning, and cost management in AWS environments. Strong analytical and problem-solving skills with the ability to work effectively during high-pressure situations.