- Senior
- Office in London
Senior Site Reliability Engineer - London
About Finova
Finova is the UK’s largest mortgage and savings technology provider, powering one in every five mortgages across the country. Our agile, cloud-native solutions help over 60 banks, building societies, specialist lenders and equity release providers — plus a network of 2,400+ brokers — stay ahead of the market.
We offer a flexible, proven suite of software that covers the full customer journey — from mortgage and savings origination to servicing and CRM. Backed by an open architecture and a team with deep industry expertise, our platform is built to scale. Today, we process over £50 billion in loans each year, manage nearly £50 billion in savings, and support the digital servicing of more than 650,000 UK borrower accounts.
For Lenders
Finova offers a flexible, modular technology suite designed to help lenders move faster, scale efficiently, and deliver standout digital experiences. You can run your entire mortgage and savings business on Finova — or just use the modules you need, tailored by our team or configured in-house through our low-code platform.
Our solutions include:
- Lending – end-to-end mortgage and specialist lending software built for speed, flexibility, and scale.
- Decisioning – powerful tools for smarter, more personalised lending decisions.
- Servicing – intuitive workflows and automation to simplify day-to-day account management and customer servicing.
- Savings – configurable onboarding and customer engagement tools across all savings products.
- Intermediary Manager – broker relationship and compliance management, powered by real-time external data.
- Broker ID – fast, automated broker verification and compliance tracking using KYB, KYC, and live monitoring from public data sources.
Lenders use Finova to launch products faster, process applications up to 50% more efficiently, and reduce operational costs — all while staying fully compliant in a fast-moving market.
About the Role:
We are seeking an experienced SRE to spearhead the Site Reliability Engineering function. As an SRE, you will be responsible for the availability, scalability, and performance of our core systems, with a particular focus on monitoring our .NET applications deployed in cloud environments such as AKS, EKS, App Services, and VMs. You will work independently and collaboratively with various engineering teams to ensure our systems meet the highest standards of reliability and operational excellence.
What will you be doing?
Monitoring and Performance Optimization:
- Design, implement, and maintain robust monitoring and alerting systems for .NET applications running in AKS, EKS, App Services, and VMs.
- Analyse system performance metrics, establish baselines, identify bottlenecks, and implement improvements for scalability and efficiency.
- Set up, configure, and optimize observability tools (e.g., Prometheus, Grafana, Datadog) to monitor key system metrics, logs, and traces.
Reliability and Incident Management:
- Ensure high availability and disaster recovery for all critical systems.
- Lead incident response efforts and post-incident analysis to mitigate recurrence and improve system resilience.
- Develop and maintain SLOs, SLIs, and error budgets, ensuring services meet agreed-upon reliability targets.
Automation and Infrastructure Management:
- Automate routine tasks and processes to improve efficiency and reduce manual errors.
- Work with infrastructure-as-code tools (e.g., Terraform, Ansible, Bicep) to manage cloud resources effectively.
- Collaborate with DevOps and CloudOps teams to build and deploy infrastructure using CI/CD pipelines (e.g., Azure DevOps, GitLab CI).
Collaboration and Mentorship:
- Work closely with product development teams to ensure smooth application releases and system performance.
- Provide mentorship and guidance to junior SREs and engineers.
- Drive best practices in reliability, monitoring, and incident management across the engineering organization.
Continuous Improvement:
- Identify areas for improvement in our infrastructure, monitoring, and reliability practices.
- Stay up-to-date with industry trends, tools, and technologies to continuously improve our operational processes.
About You:
Your attitude is everything, but in terms of experience we'd particularly love to see:
- 5+ years of experience in Site Reliability Engineering, DevOps, or Systems Engineering, with a strong focus on monitoring, alerting and incident management.
- Hands-on experience monitoring .NET applications in production environments, preferably using tools like Grafana, Datadog and Azure Monitor.
- Extensive experience with AKS, EKS, App Services, and VMs in cloud environments (AWS, Azure).
- Proven ability to work independently and manage multiple projects in a fast-paced environment.
Technical Skills:
- Strong proficiency in cloud platforms (AWS, Azure), container orchestration (Kubernetes, AKS, EKS), and microservices architecture.
- Proficiency in infrastructure-as-code tools like Terraform, Azure Resource Manager, or similar.
- Experience with monitoring and observability tools such as Prometheus, Grafana, and Datadog.
- Strong scripting skills (e.g., PowerShell, Bash, Python).
Soft Skills:
- Excellent communication skills, both verbal and written, with the ability to convey complex technical concepts to non-technical stakeholders.
- Strong problem-solving abilities and the ability to troubleshoot complex systems under pressure.
- A proactive and collaborative approach to work, with the ability to lead by example.
Preferred Qualifications:
- Experience with monitoring and maintaining financial services or FinOps platforms.
- Certifications in cloud platforms (e.g., AWS Certified Solutions Architect, Azure DevOps, Certified Kubernetes Administrator).
- Experience with scaling and maintaining high-performance systems with large data throughput.
What We Offer:
Hybrid working: At Finova, we believe the best outcomes come from working together - and having the flexibility to work in a way that suits both our people and our business. We operate a hybrid working model, with most teams spending around three days a week in the office and with our customers. This time together helps us stay connected, collaborate more effectively, and solve complex challenges as a team. We also know that flexibility matters. Our approach is designed to support a healthy balance, combining in-person collaboration with the freedom to work remotely where it makes sense.
Holiday: 25 days' holiday plus bank holidays, bank holiday trading and holiday purchase options, and the opportunity to work from anywhere in the world for up to 4 weeks per year.
Looking After You: Life Assurance, Group Income Protection, Private Medical Insurance, a pension scheme via Salary Exchange, an Employee Assistance Programme, and access to a Virtual GP.
Family-Friendly Policies: Enhanced maternity and paternity pay, as well as paid time off for fertility treatments and pregnancy loss.
Extra Perks: Cycle to Work Scheme, discounts on shops, restaurants, and gym memberships, free fresh fruit daily, and opportunities to join colleague networks and social groups.
Giving Back: One paid volunteering day annually and the Give-As-You-Earn scheme to support your favourite charities.
Equal Opportunity Statement
We value diversity and are committed to creating an inclusive environment for all employees. If you’re passionate about this role but don’t meet all the criteria, please reach out—we’d love to discuss how your skills and experiences align with our needs.