
- Professional
- Office in London
- Team: Engineering
- Location: London or Manchester. Hybrid working supported, around 10-20% on-site requirement with the ability to commute into either Manchester or London office as necessary for collaboration, monthly in-person days, and quarterly department days.
- Full-time - flexible hours and working options available (please state in your application)
- Expected Start Date: ASAP
- Salary Range: We would like to pay in the range of £40,000-50,000 per annum for this role, depending on experience and location.
About Spektrix
How We Work
How We Connect
The Role
Accountabilities and Responsibilities
Accountabilities:
- Ensuring observability tools are configured to give visibility of the health of the estate.
- Applying and tracking usage of known workarounds, and advising the rest of the engineering team on improvements needed to the reliability and quality of the product.
- Ensuring problems and tasks are investigated thoroughly and solved accurately and methodically.
- Responding to incidents in a timely way, in line with our processes.
- Keeping our how-to guides and documentation up-to-date and concise.
- Working in line with our security and compliance policies and processes, particularly when working with customer data and production systems.
- Maintaining and improving the reliability, scalability, and quality of operational Spektrix services and systems.
- Documenting, reporting, resolving, and mitigating defects, problems, risks, and instances of nonconformance.
Responsibilities:
- Continuously improving how we document, investigate, and triage issues.
- Sharing what we learn through dashboards, incident reviews, updated documentation, and collaborative work such as coaching.
- Seeking opportunities to automate things and collaborating on internal improvement projects.
- Applying Lean principles, and using analysis and data to pinpoint where things are getting stuck. Identifying opportunities to eliminate waste and deliver more effectively and efficiently.
- Collaborating with Product, Engineering, and our First-Line Support teams to make sure we are prioritising the right things.
- Contributing to platform resilience strategies such as capacity planning, redundancy, failover, and disaster recovery.
- Ensuring the accuracy, relevance, and usefulness of our alerts, monitoring, and observability.
- Participating in or leading post-incident reviews, and identifying required actions.
- Designing and maintaining operational runbooks and readiness checklists.
Key Requirements
Skills and Experience:
- Experience of working in an operations or support team, managing Azure-hosted SaaS applications.
- Experience with SQL Server and Azure SQL.
- Experience with Log Analytics and Kusto Query Language (KQL).
- Able to read and understand logs and stack traces from C# .NET applications. You don’t need to be a software engineer, but familiarity with C# and TDD would be useful.
- Experience with a range of alerting and monitoring tools. We use Application Insights, Azure Monitor, PagerDuty, Grafana, Logz.io, and Cloudflare tools; experience with these particular tools is not essential, and similar experience is welcome.
- Familiarity with CI/CD pipelines, Azure DevOps, and Terraform would be beneficial but not essential.
Communications and Behaviours:
- Can calmly, confidently, and competently co-ordinate incident response; clearly communicating accurate, timely, and relevant information to a range of stakeholders across the organisation.
- Highly collaborative; can communicate fluently with engineers as well as client success teams. Can break down and document complicated technical requirements concisely.
- Giving and receiving feedback in an honest, kind, and reflective manner. Learning from mistakes and being imaginative about ways to improve things.
- Curious, and keen to learn new skills and technologies.
A Day In The Life Of…
- On a typical day, you'll be working closely with colleagues pairing in a virtual meeting room, collaborating on items from the team's Kanban board and identifying areas for improvement. The team aligns at daily standup on work in progress, current priorities, and any support or assistance needed.
- We review incoming work requests together to understand their context and urgency. Using self-organising principles, the team decides how to divide the work - whether pairing, mobbing, or working solo - based on what's most effective.
- If clients are putting high-demand tickets on sale today, you may need to scale a client out of their DB pool to ensure everything runs smoothly, and put everything back in place after it’s over.
- You’ll contribute to a range of activities including discovery, investigation and spikes, writing or refining tickets, fault-finding and fixing, testing, documentation, and build and release tasks. When it’s the team’s turn to handle the release, you’ll take part in the release process.
- Throughout the day, you’ll monitor alerts and investigate any that arise. If needed, you may join the Incident Room alongside a small group of cross-functional colleagues to calmly and methodically identify and resolve issues. This is done in close collaboration with customer-facing teams to ensure clarity and continuity.
- At other times, you'll participate in team sessions focused on reflecting, planning, and finding ways to improve how we work together and deliver on our goals.
Benefits
- Flexible working with support for WFH setup. Different teams may have different practices that require people in the office or online at specific times.
- NHS top-up scheme (covering dental, optical, therapy & counselling, prescription, and other health-related costs)
- Continuous development supported by your line manager and a learning budget
- Enhanced Maternity, Adoption & Shared Parental Leave
- 35 days' paid leave annually, inclusive of annual leave, bank holidays, and a birthday day off, all of which can be used flexibly
- 4 weeks' paid sabbatical after 5 years of service
- 2 volunteering days per year
- Company pension scheme of 4%
- Free snacks, drinks and breakfast items in all our offices
- Varied range of regular socials across all our offices
- Cycle to work & Season Ticket Loans
- Travel stipend for commuting
- A quiet working space at home where you can consistently take video calls without interruptions
- An internet connection that supports your participation in video calls and access to our systems and service.