
Principal Site Reliability Engineer at Skyralio

Skyralio · London, United Kingdom · Hybrid

We Are Skyral:

We believe every decision maker can be empowered by technology. Skyral combines AI, leading-edge simulation technology and world-class expertise to transform the decision-making experience. Our products and services enable faster and more confident decisions in a complex, unforgiving world. We deploy practical, intuitive and efficient solutions to governments and enterprises, delivering outstanding outcomes at the speed of relevance.

What We Do: 

The Production Engineering team partners with development teams to ensure the end-to-end lifecycle and uptime of our production environments, the crucial demo systems that showcase our capabilities, and the support for customer deployments. Our team acts as a centre of excellence, providing expert consulting to project development teams to improve their production capabilities through better tooling and best practices.

This role is responsible for developing and implementing the technical strategy for operations and reliability, owning key architectural decisions, and mentoring our engineering teams to foster a culture of production excellence.

Please note that, due to the nature of our work, we can only accept applications from individuals who are able to obtain government SC clearance, which, as a minimum, requires a person to have resided in the UK for the past 5 years.



What We Are Looking For:
  • Extensive experience in a senior or principal role, such as Site Reliability Engineer, Production Engineer, or DevOps Engineer, with a proven track record of architectural and operational technical leadership.
  • Passionate about mentoring and teaching, with demonstrable experience up-skilling engineering teams in production best practices.
  • Strong software engineering background, with proficiency in a language like Go, C++, Rust or Python.
  • Expert in container orchestration with Kubernetes, with experience running it in a variety of production environments.
  • Deep understanding of observability, with hands-on experience building and running monitoring solutions (e.g., Prometheus, Grafana, Jaeger, Tempo).
  • Experienced with at least one major cloud provider (GCP, AWS, Azure) and comfortable working with on-premise and air-gapped systems.
  • Mindset for operational excellence and a passion for building scalable, automated systems that reduce operational toil.
  • Excellent communicator who can work effectively with both technical and non-technical stakeholders to drive change.
  • Beneficial to have experience working in defence, national security, or another highly regulated industry, and to have the ability to obtain Government SC Clearance.




What You'll Be Responsible For:
  • Own the system design, architecture, and implementation for our core modelling and simulation platform from a reliability, scalability, and operability perspective. You will drive this architectural vision across multiple engineering teams to ensure alignment and successful implementation.
  • Lead the technical strategy and implementation for the reliability of our production systems, ensuring they are scalable, resilient, and secure across both cloud and on-premise environments.
  • Act as a key leader and mentor for production engineering, up-skilling the entire engineering organisation. You will develop and deliver training, establish best practices, and actively coach engineers on reliability, observability, and operational excellence.
  • Lead by example, taking a hands-on role in designing, building, and operating our robust infrastructure and automation to support our distributed systems, leveraging technologies such as Kubernetes and Infrastructure as Code (IaC).
  • Take ownership of our observability strategy, evolving our in-house observability suite (SPORE) and integrating it with other monitoring tools to provide a unified, actionable view of system health.
  • Partner with our product and engineering teams to define and manage Service Level Objectives (SLOs) and error budgets that balance the need for innovation with the requirement for stability.
  • Lead incident response and conduct blameless post-mortems to ensure we learn from failures and build a more resilient platform.
  • Contribute to the team's on-call rotation to troubleshoot and resolve critical production incidents.


Note: Please feel empowered to apply for this position, even if you think you may only align with some of the qualities listed above. Your unique skills and perspectives could be just what we’re looking for.

    What We Can Offer You:

    🌴 Unlimited Paid Holiday - we value and support the need to maintain a strong work-life balance.

    🏡 Hybrid Working - we understand that a one-size-fits-all approach doesn’t suit everyone. We have an office based in Central London if you prefer working in an office.

    Flexible Working Hours - we’re not bound by the 9-to-5 model. Collaborate with your manager on determining a work schedule that suits you.

    🍼 Enhanced Parental Leave - we’re proud to offer 26 weeks maternity leave and 4 weeks paternity leave at full pay.
     
    🏥 Private Medical & Dental Insurance - offered through Bupa. 

    💲 Honest about Compensation - we maintain a well-defined salary range, which a member of the Talent Team will discuss with you during the first call.

    Healthy Snacks & Drinks Provided - If you decide to come into the office, we have a range of snacks and drinks for you to enjoy.

    At Skyral, we are committed to fostering a culture of diversity, equality and inclusion. We also ensure that individuals with disabilities have access to reasonable adjustments. If you require such adjustments during the job application process, please inform a member of our Talent Team.