- Professional
- Office in London
Formed in 2014 by a team of proven FinTech entrepreneurs, we are an FCA-regulated business providing global claim funds management and payment solutions. Operating one of the largest banking and payment settlement networks in the world, we give our customers direct access to 200 countries and currencies. Through a single integration, insurers can use this network to pay claims in as little as 45 seconds and deliver a superior claimant experience. Our market-leading treasury proposition gives insurers transparency and control over their claim funds, even when delegated to third parties, so that their money is in the right place, at the right time, to make that all-important payment when customers need it most.
 
With over 260 employees across our London headquarters, Europe, and the US, $93m in Series C funding secured, and more than £15bn in processed transactions, we are only just getting started.
 
We are collaborative and customer-centric, and we work with integrity, whilst partnering with some of the biggest insurance leaders, including Lloyd's of London and ManyPets. We take huge pride in our company culture, ensuring that everyone has a part to play, an opportunity to be heard and involved, and the ability to make a real difference. As we continue to scale up, we want like-minded humans to join us on this exciting journey.
Are you ready?
Your mission:
As a Site Reliability Engineer (SRE), you will play an important role in designing, building, and maintaining the infrastructure and tools necessary to support our software applications and services. You will collaborate closely with the product engineering squads, technical operations, and security teams to ensure the reliability, scalability, and security of our platform. Your responsibilities will include automating infrastructure provisioning, configuration management, and deployment pipelines, utilizing best practices and modern technologies to streamline processes and improve efficiency. You will also be responsible for monitoring system performance, identifying bottlenecks, and implementing solutions to enhance system reliability and performance.
Your responsibilities
- Cloud Platform Management: Using Azure/AWS to manage and optimize infrastructure components, ensuring scalability, reliability, and cost management.
- Infrastructure Design and Implementation: Designing, building, and maintaining the cloud-based infrastructure that supports our software applications and services.
- System Reliability: Ensuring the reliability, availability, and performance of systems and services by designing, implementing, and maintaining robust infrastructure.
- Infrastructure as Code (IaC): Implementing and maintaining tools for automation, monitoring, and deployment to improve efficiency and reduce manual intervention.
- Collaboration and Support: Working closely with product engineering to ensure efficient workflows and support continuous integration and delivery pipelines (CI/CD).
- Capacity Planning and Scalability: Assessing system capacity requirements and planning for future growth to ensure the system can scale and is cost efficient.
- Incident Response and Management: Monitoring system health, promptly responding to incidents, and assisting with the resolution process.
- Risk Management: Identifying potential risks and vulnerabilities in systems and implementing measures to mitigate these risks effectively.
- Monitoring and Observability: Implementing and overseeing monitoring tools to proactively detect and mitigate issues, ensuring high application and system availability.
- Documentation and Knowledge Sharing: Maintaining documentation and sharing knowledge with the team to ensure transparency and facilitate cross-functional collaboration.
Requirements
- 3+ years of experience in an SRE, Platform/Cloud Engineer, or similar role.
- Strong knowledge of and experience with cloud platforms; we primarily host in Azure and AWS but recognize that skills are transferable.
- Experience in running and maintaining highly available and scalable platforms.
- Expertise in containerisation tools like Docker and orchestration tools such as Kubernetes.
- Experience with infrastructure as code (IaC) tools such as Terraform, Ansible, or Chef for automation and configuration management.
- Strong understanding of monitoring and observability tools.
- Knowledge of networking, security principles, and best practices in a cloud environment. Cloudflare experience would be a bonus.
- Demonstrated experience with CI/CD tools such as GitHub Actions, GitLab CI/CD, or Azure DevOps.
- Problem-solving mindset and meticulous attention to detail.
- Strong collaboration and communication skills to work effectively with cross-functional, internationally distributed teams.
- Comfortable working in a fast-paced environment, handling incidents, and participating in on-call rotations.
- Adaptability to evolving technologies and eagerness to learn new tools and methodologies.
Benefits
- 25 days' holiday per year, plus bank holidays
- Hybrid working arrangements
- Contributory pension scheme
- Enhanced parental leave
- Cycle to Work scheme
- Private medical insurance through Vitality
- Access to Oliva, our mental health therapy partner
- Discounted gym membership
- Financial coaching with Octopus Wealth
- 2 days of volunteering leave per year
- Sabbatical after 5 years’ service
- Ongoing learning and development to help you reach your career goals
WE ARE AN EQUAL OPPORTUNITY EMPLOYER
We are committed to creating an inclusive environment that enables everyone to perform at their best, where we recognise the rights of all individuals to mutual respect and where there is an unbiased acceptance of others. Our policies and practices aim to promote an environment that is free from all forms of unfair discrimination and values the diversity of all people. At the heart of our policy, we seek to treat people fairly and with dignity and respect.