Site Reliability Engineer at Ad Astra

Ad Astra · Overland Park, United States of America · Onsite

2025-10-20 14:00:00.0

Python

SQL

NoSQL

Docker Container Platforms

Kubernetes Container Platforms

AWS

Software Engineer

Cloud

Professional
Office in Overland Park

Competitive Compensation & Benefits Package
401(k) with Profit Sharing
Flexible Time Off
Office Dog!!

ABOUT US

By combining our unparalleled domain expertise with leading-edge technology, Ad Astra is helping higher education in its mission to advance timely student completions. We are building a cloud-based software platform that will provide the foundation for our next generation of industry-leading solutions and analytics. Simply put, we're helping students graduate faster.

OUR CORE VALUES

We recognize talent. We recognize and appreciate the unique God-given talents that our people bring to Ad Astra. Aligning these individual gifts with our work sets team members up to succeed.
We’re unpretentious. There’s no room for ego. We admit our imperfections and have the humility to know what we don’t know.
We’re passionate. We aren’t satisfied with the status quo. We’re on a mission together to protect the value of degree completion and to transform the higher education industry.
We’re pioneering. We aren’t afraid of failing; in fact, we celebrate it. We love it when our people boldly experiment with innovative solutions.
We love fun. The health of our relationships is strengthened by working with people who stretch our thinking—and by enjoying the lighter side of life together. We don’t take ourselves too seriously, but we do take fun seriously.
We have grit. Beyond talent and intelligence, our people have stick-to-itiveness. We push through challenges to make goals a reality.

POSITION SUMMARY

The Site Reliability Engineer (SRE) will ensure the performance, reliability, and scalability of our systems as we continue to grow. This role bridges the gap between software development and operations, applying software engineering principles to automate, optimize, and enhance the reliability of our infrastructure and production systems. The role includes identifying recurring failure patterns, implementing automated solutions, and continuously improving platform performance. Drawing on your intellectual curiosity and expertise in operations and development, you will also play a pivotal role in monitoring security and reliability threats and in advocating for effective solutions.

CORE RESPONSIBILITIES

Write automation and production code to improve system reliability and performance
Design, build, and maintain highly available, scalable systems across cloud environments (e.g., AWS, Azure, or GCP)
Maintain and extend logging, monitoring, and alerting systems to enhance observability and proactive incident response
Bridge development and operations by automating workflows, deployments, and infrastructure provisioning
Proactively monitor and respond to alerts and incidents, ensuring system uptime and performance
Collaborate with engineering, product, and operations teams to plan capacity and enhance the overall reliability and efficiency of our products
Support production systems, including participation in on-call rotations and performing limited after-hours maintenance
Lead and contribute to post-incident reviews, driving root cause analysis and long-term solutions
Document reliability patterns, runbooks, and learnings to build operational maturity
Other duties as assigned

POSITION REQUIREMENTS

Bachelor’s degree in Computer Science, Engineering, or related field preferred; equivalent experience in supporting distributed software systems accepted
2+ years of experience in Site Reliability Engineering or 4+ years of experience in Development or Systems Engineering roles
Strong understanding of networking concepts including load balancing, DNS, IPSec, and VPNs
Experience with source version control, CI/CD, and Infrastructure as Code tools (e.g., GitHub, Jenkins, Terraform, CloudFormation)
Working knowledge of Linux operating systems
Proficiency with relational or NoSQL database technologies (both preferred)
Proficiency in at least one scripting or programming language (Node.js, Python, Go, Bash, PowerShell, etc.)
Experience with containerization and orchestration (Docker, ECS, Kubernetes)
Familiarity with observability tools (Graylog, New Relic, Prometheus, Grafana, ELK Stack, etc.)
Strong collaboration, problem-solving, and communication skills

ESSENTIAL COMPETENCIES

Problem Solving
Collaborative Communication
Adaptability & Flexibility
Sense of Urgency with Quality
Attention to Detail
Creative Problem Solving
Technical Aptitude

ADDITIONAL PREFERRED QUALIFICATIONS

Expertise in Git, Docker, Terraform, Ansible, and AWS
Experience with blue/green or canary deployment strategies and zero-downtime releases
Understanding of security best practices in cloud-native environments
Background in automating large-scale infrastructure management
Experience working in an agile or SaaS-based environment

KEY MEASURES OF SUCCESS

Meaningful contributions to the SRE team's high-value stories
Timely response to infrastructure alerts and ensuring system reliability
Regular preventative maintenance
Contribution to the overall success of the Cloud Ops team
Incident Mean Time to Acknowledge (MTTA) under 15 minutes
Drive availability improvements to exceed 99.95% uptime
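
To make the last two measures concrete, the sketch below shows one way they could be computed. It is illustrative only: the posting does not say how Ad Astra measures either figure, so the 30-day window, the function names, and the sample incident timestamps are all assumptions. Under those assumptions, a 99.95% target leaves roughly 21.6 minutes of allowable downtime per 30 days.

```python
from datetime import datetime, timedelta


def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Maximum downtime allowed in `period` while still meeting the target."""
    return period * (1 - availability_pct / 100)


def mean_time_to_acknowledge(incidents):
    """Average delay between an alert firing and its acknowledgement.

    `incidents` is a list of (fired_at, acknowledged_at) datetime pairs.
    """
    delays = [acked - fired for fired, acked in incidents]
    return sum(delays, timedelta()) / len(delays)


# Assumed measurement window: 30 days (the posting does not specify one).
budget = downtime_budget(99.95, timedelta(days=30))

# Hypothetical incidents acknowledged after 5, 10, and 21 minutes.
sample_incidents = [
    (datetime(2025, 10, 1, 9, 0), datetime(2025, 10, 1, 9, 5)),
    (datetime(2025, 10, 8, 22, 30), datetime(2025, 10, 8, 22, 40)),
    (datetime(2025, 10, 15, 3, 0), datetime(2025, 10, 15, 3, 21)),
]
mtta = mean_time_to_acknowledge(sample_incidents)

print(f"30-day downtime budget: {budget.total_seconds() / 60:.1f} minutes")
print(f"MTTA: {mtta}, within 15-minute target: {mtta < timedelta(minutes=15)}")
```

Note that a mean can hide outliers: in the sample data one incident took 21 minutes to acknowledge, yet the average still meets the 15-minute target.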

Ad Astra is proud to be an equal opportunity employer. We are committed to fostering an inclusive workplace where all individuals are treated with respect and fairness—regardless of race, color, national origin, sex, gender identity or expression, sexual orientation, religion, age, political affiliation, disability, veteran status, or any other characteristic protected by law.

All applicants must be legally authorized to work in the United States. Please note that Ad Astra is unable to provide work visa sponsorship for this position.
