ECS is seeking an Elastic Site Reliability Engineer to work in our Fairfax, VA office.
ECS is seeking talented professionals to join our successful and growing team in building the next-generation Continuous Diagnostics and Mitigation (CDM) cyber data solution. The CDM Program is the Cybersecurity and Infrastructure Security Agency's (CISA) dynamic approach to strengthening the cybersecurity of Federal networks and systems by improving awareness of, and visibility into, their security posture and cyber threats. ECS is responsible for designing, building, deploying, operating, and maintaining a complete 'Data Services' solution that includes the collection, normalization, visualization, and sharing of cyber data from more than 100 Federal agencies. The CDM Data Services product is an integrated suite of Commercial Off the Shelf (COTS) products, software configuration packages, and custom code that together operate as a single solution tailored to Department of Homeland Security (DHS) requirements.
We are seeking professionals who thrive in a dynamic, fast-paced, and highly collaborative environment where problem-solving, critical thinking, and a holistic approach to serving the mission are key. Our program operates within the Scaled Agile Framework (SAFe). An aptitude for, and enthusiasm about, continuous learning, improvement, and cybersecurity are a must!
ECS is currently seeking a skilled Elastic Site Reliability Engineer (SRE) to support the DHS CDM SIEM as a Service (SIEMaaS) Project. The CDM SIEMaaS project provides SIEM platform and integration services to participating agencies, allowing them to focus their security efforts on operationalizing their SIEM. The Elastic SRE will focus on maintaining and optimizing Elastic deployments in Elastic Cloud Hosted (ECH) and will ensure effective monitoring of cluster health, availability, performance, and cost.
The ideal Elastic SRE candidate must be able to find solutions independently and proactively, and also to work within a dynamic team structure to achieve program objectives. This person's primary duties are to:
- Monitor and maintain the health, uptime, and availability of Elastic deployments in Elastic Cloud Hosted (ECH) using an Elastic logging/observability cluster, ensuring compliance with service-level agreements (SLAs) and service-level objectives (SLOs).
- Analyze and optimize cluster performance (e.g., indexing, search latency, resource utilization) to meet business and tenant requirements.
- Implement cost optimization strategies (e.g., right-sizing nodes, optimizing storage tiers) to reduce operational costs while maintaining performance and reliability.
- Support Elastic SIEM Engineers in troubleshooting service degradation that affects SLAs or SLOs.
- Develop and maintain automation scripts and tools (e.g., Python scripts that call the ECH APIs) for cluster management and tenant onboarding to reduce manual effort.
- Forecast resource needs and plan cluster scaling within ECH to support growth in data volume and query load, ensuring scalability and resilience.
- Conduct gap analyses for prospective tenants’ Elastic environments to assess health, stability, adherence to Elastic best practices, and optimization opportunities, providing actionable recommendations.
- Collaborate with development, DevOps, and SIEM engineers to align Elastic configurations with application needs and business objectives.
- Create and maintain comprehensive documentation for cluster configurations and monitoring processes.