Site Reliability Engineer at Airspace Link Inc
Airspace Link Inc · Detroit, United States of America · Hybrid
- Professional
- Office in Detroit
Description
Job Summary: In a highly collaborative, distributed agile team environment, this role will ensure the scalability, reliability, and performance of Airspace Link’s systems and applications. In collaboration with the platform and software engineers, this role will also automate and improve operational processes.
Duties and Responsibilities
· Reliability and Performance: Design, implement and maintain systems to ensure reliability, high availability and performance
· Scalability: Optimize applications and infrastructure to handle growth
· Monitoring and Alerting: Implement monitoring systems and performance metrics to proactively identify and address issues before they impact end users
· Incident Response: Respond to incidents in a timely manner. Lead efforts to resolve critical issues, prevent recurrence, and run postmortems
· Automation: Develop tools and scripts to automate manual operational tasks, increasing efficiency
· Infrastructure Management: Manage infrastructure using Infrastructure as Code (IaC) tools such as Terraform and Ansible
· Capacity Planning: Analyze and forecast future infrastructure needs.
· Change Management: Implement practices to safely release code (CI/CD, canary releases, feature flags). Reduce risk while increasing deployment velocity
· Collaboration: Work closely with development teams to improve software quality and reliability
· Disaster Recovery: Create disaster recovery plans to mitigate system failures
· Security and Compliance: Implement security controls and conduct security audits and vulnerability assessments. Ensure systems adhere to industry standards and regulations through compliance audits and assessments
Position Type: Full-Time, 40 hours per week
Status: Exempt
Location: Hybrid
Requirements
· B.S. in Computer Science, or equivalent years of relevant experience or education
· 3+ years of professional experience in a similar SRE or DevOps role
· Strong programming skills (Python, Go, Java)
· Experience with cloud platforms (Azure, AWS, GCP)
· Experience with containerization technologies (Docker, Kubernetes)
· Experience with monitoring and logging tools (Prometheus, Grafana)
· Knowledge of system administration
· Strong problem-solving and analytical skills
· Great teamwork skills
· An eagerness to learn and adapt to the needs of a greenfield industry
· Part 107 certification or another pilot’s license is a plus