Company:
Anomali is headquartered in Silicon Valley and is the leading AI-powered Security Operations Platform modernizing security operations. At its center is an omnipresent, intelligent, and multilingual Anomali Copilot that automates important tasks and empowers your team to deliver the requisite risk insights to management and the board in seconds. The Anomali Copilot navigates a proprietary cloud-native security data lake that consolidates legacy attempts at visibility and provides first-in-market speed, scale, and performance while reducing the cost of security analytics. Anomali combines ETL, SIEM, XDR, SOAR, and the largest repository of global intelligence in one efficient platform. Protect and drive your business with better productivity and talent retention. Do more with less. Be Different. Be the Anomali. Learn more at http://www.anomali.com.
Job Description
We're looking for a Senior Engineer, AI Evaluation & Reliability to lead the design and execution of evaluation, quality assurance, and release gating for our agentic AI features.
You'll develop the pipelines, datasets, and dashboards that measure and improve agent performance across real-world SOC workflows -- ensuring every release is safe, reliable, efficient, and production-ready.
You will ensure that our agentic AI features operate at full production scale, ingesting and acting on millions of SOC alerts per day, with measurable impact on analyst productivity and risk mitigation. This role partners closely with the Product team to deliver operational excellence and trust in every AI-driven capability.
Key Responsibilities:
o Define quality metrics: Translate SOC use cases into measurable KPIs (e.g., precision/recall, MTTR, false-positive rate, step success, latency/cost budgets).
o Build continuous evaluations: Develop offline/online evaluation pipelines, regression suites, and A/B or canary tests; integrate them into CI/CD for release gating (see the sketch after this list).
o Curate and manage datasets: Maintain gold-standard datasets and red-team scenarios; establish data governance and drift monitoring practices.
o Ensure safety, reliability, and explainability: Partner with Platform and Security Research to encode guardrails, policy enforcement, and runtime safety checks.
o Expand adversarial test coverage (prompt injection, data exfiltration, abuse scenarios).
o Ensure explainability and auditability of agent decisions, maintaining traceability and compliance of AI-driven workflows.
o Production reliability & observability: Monitor and maintain reliability of agentic AI features post-release -- define and uphold SLIs/SLOs, establish alerting and rollback strategies, and conduct incident post-mortems.
o Design and implement infrastructure to scale evaluation and production pipelines for real-time SOC workflows across cloud environments.
o Drive agentic system engineering: Experiment with multi-agent systems, tool-using language models, retrieval-augmented workflows, and prompt orchestration.
o Manage model and prompt lifecycle -- track versions, rollout strategies, and fallbacks; measure impact through statistically sound experiments.
o Collaborate cross-functionally: Work with Product, UX, and Engineering to prioritize high-leverage improvements, resolve regressions quickly, and advance overall system reliability.
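For illustration, here is a minimal Python sketch of the kind of release-gating evaluation check described in the responsibilities above. The results-file format, field names, and metric thresholds are assumptions made for the example, not a description of Anomali's actual pipeline.

```python
# Illustrative only: a minimal evaluation gate that could run as a CI step.
# The results-file format, metric budgets, and file name are assumptions.
import json
import sys


def precision_recall(results):
    """Precision and recall over binary escalate-vs-suppress decisions."""
    tp = sum(1 for r in results if r["predicted_escalate"] and r["should_escalate"])
    fp = sum(1 for r in results if r["predicted_escalate"] and not r["should_escalate"])
    fn = sum(1 for r in results if not r["predicted_escalate"] and r["should_escalate"])
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def main(results_path="eval_results.jsonl", min_precision=0.95, min_recall=0.90):
    # Each line pairs the agent's recorded decision with the gold-standard label, e.g.
    # {"alert_id": "a-123", "predicted_escalate": true, "should_escalate": true}
    with open(results_path) as f:
        results = [json.loads(line) for line in f]

    precision, recall = precision_recall(results)
    print(f"precision={precision:.3f} recall={recall:.3f} (n={len(results)})")

    # Exit non-zero so the CI pipeline blocks the release when either metric regresses.
    if precision < min_precision or recall < min_recall:
        sys.exit(1)


if __name__ == "__main__":
    main()
```

A check like this would typically run against a curated gold-standard dataset on every candidate release, with the thresholds treated as explicit quality budgets.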
Qualifications
Required Skills and Experience
o 5+ years building evaluation or testing infrastructure for ML/LLM systems or large-scale distributed systems.
o Proven ability to translate product requirements into measurable metrics and test plans.
o Strong Python skills (or similar language) and experience with modern data tooling.
o Hands-on experience running A/B tests, canaries, or experiment frameworks.
o Experience defining and maintaining operational reliability metrics (SLIs/SLOs) for AI-driven systems (see the sketch after this list).
o Familiarity with large-scale distributed or streaming systems serving AI/agent workflows (millions of events or alerts/day).
o Excellent communication skills -- able to clearly convey technical results and trade-offs to engineers, PMs, and analysts.
o This position is not eligible for employment visa sponsorship. The successful candidate must not now, or in the future, require visa sponsorship to work in the US.
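As a hedged illustration of the SLI/SLO work referenced above, the following sketch computes a success-rate SLI and the share of the error budget consumed; the 99% target and the event counts are assumed values, not actual service objectives.

```python
# Illustrative only: computing an SLI and remaining error budget for an agent feature.
# The 99% success-rate SLO, the measurement window, and the event counts are assumed.
def error_budget_status(successful: int, total: int, slo_target: float = 0.99):
    """Return the measured SLI and the fraction of the error budget consumed."""
    sli = successful / total if total else 1.0
    allowed_failures = (1.0 - slo_target) * total
    actual_failures = total - successful
    budget_consumed = actual_failures / allowed_failures if allowed_failures else 0.0
    return sli, budget_consumed


if __name__ == "__main__":
    # e.g. 2,970,000 successfully triaged alerts out of 3,000,000 in a 30-day window
    sli, burn = error_budget_status(successful=2_970_000, total=3_000_000)
    print(f"SLI={sli:.4f}, error budget consumed={burn:.0%}")
    # Burn at or above 100% would typically trigger alerting and a rollback review.
```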
Preferred Qualifications
o Experience evaluating or deploying agentic or tool-using AI systems (multi-agent orchestration, retrieval-augmented reasoning, prompt lifecycle management).
o Familiarity with LLM evaluation frameworks (e.g., model-graded evals, pairwise/rubric scoring, preference learning); see the sketch after this list.
o Exposure to AI safety testing, including prompt injection, data exfiltration, abuse taxonomies, and resilience validation.
o Understanding of explainability and compliance requirements for autonomous workflows, ensuring traceability and auditability of AI behavior.
o Background in security operations, incident response, or enterprise automation; comfortable interpreting logs, alerts, and playbooks.
o Startup experience delivering high-impact systems in fast-paced, evolving environments.
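To make the model-graded, pairwise evaluation idea above concrete, here is a small hypothetical sketch. The prompt wording, the judge callable, and the toy data are illustrative assumptions and do not correspond to any specific evaluation framework's API.

```python
# Illustrative only: pairwise model-graded comparison between two agent versions.
# judge() stands in for an LLM grading call; the prompt, labels, and toy data
# are assumptions, not any specific evaluation framework's API.
import random

PAIRWISE_PROMPT = """You are grading two SOC triage summaries of the same alert.
Answer with exactly "A" or "B" for the more accurate and actionable summary.

Alert: {alert}

Summary A: {a}
Summary B: {b}
"""


def pairwise_win_rate(cases, judge, trials_per_case=2):
    """Return the candidate-vs-baseline win rate, randomizing A/B order to reduce position bias."""
    wins, total = 0, 0
    for case in cases:
        for _ in range(trials_per_case):
            flipped = random.random() < 0.5
            a, b = (case["baseline"], case["candidate"]) if flipped else (case["candidate"], case["baseline"])
            verdict = judge(PAIRWISE_PROMPT.format(alert=case["alert"], a=a, b=b)).strip()
            candidate_label = "B" if flipped else "A"
            wins += int(verdict == candidate_label)
            total += 1
    return wins / total if total else 0.0


def toy_judge(prompt):
    # Toy stand-in for an LLM call: prefers the summary that mentions escalation.
    summary_a = prompt.split("Summary A:")[1].split("Summary B:")[0]
    return "A" if "escalate" in summary_a else "B"


if __name__ == "__main__":
    cases = [{
        "alert": "Multiple failed logins followed by a success from a new country",
        "candidate": "Likely credential stuffing; escalate and force a password reset.",
        "baseline": "A login event occurred.",
    }]
    print(f"candidate win rate: {pairwise_win_rate(cases, toy_judge):.2f}")
```

In practice the toy judge would be replaced by an actual LLM call, and win rates would be reported with confidence intervals before being used for release decisions.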
Equal Opportunities Monitoring
We are an Equal Opportunity Employer. It is our policy to ensure that all eligible persons have equal opportunity for employment and advancement on the basis of their ability, qualifications, and aptitude. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, pregnancy, genetic information, disability, status as a protected veteran, or any other protected category under applicable federal, state, and local laws.
If you are interested in applying for employment with Anomali and need special assistance or accommodation to apply for a posted position, contact our Recruiting team at [email protected].
Compensation Transparency
$140,000 - $190,000 USD
Please note that the annual base salary range is a guideline and, for candidates who receive an offer, the base pay will vary based on factors such as work location, as well as the knowledge, skills, and experience of the candidate. In addition to base pay, this position is eligible for benefits and may be eligible for equity.