UMBRA - JDE-High - Principal Site Reliability Engineer at Clarity Innovations
Clarity Innovations · Fort Meade, United States of America · Onsite
- Senior
- Office in Fort Meade
Clarity Innovations is a trusted national security partner, dedicated to safeguarding our nation’s interests and delivering innovative solutions that empower the Intelligence Community (IC) and Department of Defense (DoD) to transform data into actionable intelligence, ensuring mission success in an evolving world.
Our mission-first software and data engineering platform modernizes data operations, utilizing advanced workflows, CI/CD, and secure DevSecOps practices. We focus on challenges in Information Warfare, Cyber Operations, Operational Security, and Data Structuring, enabling end-to-end solutions that drive operational impact.
We are committed to delivering cutting-edge tools and capabilities that address the most complex national security challenges, empowering our partners to stay ahead of emerging threats and ensuring the success of their critical missions. At Clarity, we are people-focused and set on being a destination employer for top talent, offering an environment where innovation thrives, careers grow, and individuals are valued. Join us as we continue to lead innovation and tackle the most pressing challenges in national security.
Position Overview
The Network Operations Center (NOC) Engineer assists the NOC Lead in managing and overseeing the daily operations of a classified cloud development environment (8am - 5pm EST), with a strong emphasis on maintaining Kubernetes-hosted services. The NOC Engineer is responsible for coordinating incident response, system monitoring, team leadership, and performance reporting, and for ensuring the development environment’s security and availability.
Key Responsibilities
Carry out day-to-day operations of the classified NOC, ensuring adherence to service level agreements and system uptime requirements
Perform monitoring and support of cloud-based systems, networks, and containerized applications in Kubernetes clusters
Coordinate incident response, troubleshooting, and escalation procedures
Ensure timely detection, resolution, and documentation of service-impacting events
When the NOC Lead is absent, act as the primary point of contact for cloud system alerts, outages, and classified network incidents; communicate status to stakeholders and leadership
Ensure 24/7 observability of network, platform, and container-level components using tools such as Prometheus, Grafana, Fluentd, and Elastic Stack
Draft technical guidance for NOC staff and collaborate with engineering, cybersecurity, and cloud teams
Maintain situational awareness of the system through dashboards, logs, and proactive monitoring tools
Develop and maintain standard operating procedures, incident response plans, runbooks, and shift logs
Assist the NOC Lead in conducting daily stand-ups, shift handovers, and weekly operations reviews
Generate operational metrics and performance reports
Ensure compliance with federal security policies and contribute to continuous accreditation of the cloud system under the Risk Management Framework (RMF)
Perform readiness drills, after-action reviews, and contribute to lessons-learned activities
Qualifications
Must be able to obtain and maintain a TS/SCI security clearance (note: only U.S. citizens are eligible for security clearances)
Expertise in cloud infrastructure (AWS GovCloud, Azure Government, or C2S/C2E/JWCC), virtualization, and hybrid environments
Understanding of secure networking, load balancers, DNS in cloud-native architectures, and inter-cluster communication
Operational experience with Kubernetes, containerized workloads, and supporting technologies (Docker, Helm, Fluentd, Kustomize)
Strong understanding of monitoring tools (e.g., Prometheus, Grafana, ELK Stack) and ticketing systems (e.g., osTicket, Jira)
Familiarity with GitOps workflows (e.g., Flux) and infrastructure as code (e.g., Terraform)
Familiarity with DoD/IC cybersecurity compliance standards, ATO processes, and classified system governance
Excellent communication skills and the ability to clearly brief complex operational topics to leadership and mission partners
Preferred Qualifications
Active US TS/SCI security clearance with CI polygraph or higher
5+ years of experience in IT operations or network/system administration