High-Performance Computing Systems Engineer at Nextonic Solutions LLC
Nextonic Solutions LLC · Rockville, United States of America · Onsite
- Professional
- Office in Rockville
Nextonic Solutions is seeking a High-Performance Computing (HPC) Systems Engineer to join our vibrant team at the National Institutes of Health (NIH), supporting the National Center for Advancing Translational Sciences (NCATS) in Rockville, MD.
The HPC Systems Engineer will support the Scientific Computing and Informatics (SCI) team at NCATS. The role focuses on the design, optimization, security, and maintenance of HPC and cloud-based infrastructure that enables cutting-edge biomedical research through scalable, secure, high-performance computing environments.
Responsibilities:
- Design, configure, and maintain scalable HPC clusters for optimal performance.
- Support documentation and ATO (Authority to Operate) processes.
- Ensure infrastructure design compliance with federal security standards and best practices.
- Implement monitoring tools such as XDMoD for transparency and user reporting.
- Integrate platforms such as JupyterHub and job schedulers (e.g., Slurm) for improved interactivity.
- Develop and manage AWS-based infrastructure using Terraform, Packer, and Ansible.
- Automate deployment workflows to streamline provisioning, updates, and scaling.
- Manage systems involved in AWS Secure Cloud Bridging (SCB) and STRIDES initiatives.
- Implement CIS benchmark-aligned system hardening using OpenSCAP.
- Administer optimized compute images (CPU/GPU) for scientific workflows.
- Leverage tools such as OpenHPC, Warewulf, and Ansible for environment management.
- Lead and coordinate quarterly patch cycles.
- Partner with researchers and external stakeholders on critical projects.
- Facilitate solution transitions to other NIH centers and collaborators.
- Contribute to publications and team objectives through deep technical engagement.
Qualifications:
- Experience with federal ATO processes (required)
- HPC architecture and performance optimization (required)
- Scientific software development and deployment
- High-speed network and parallel file system architecture
- Troubleshooting, diagnostics, and technical support
- Strong communication and multitasking skills
Programming & Scripting:
- Languages - Pascal, BASIC, Delphi, Visual Basic, C, C++
- Scripting - Bash, Perl, Python, Ruby, PEAR, Tcl
Systems & Network Administration:
- Linux – RHEL/CentOS, SUSE, Debian, Ubuntu
- Windows – 95–10; NT–Server 2016
- Networking – Active Directory, TCP/IP v4/v6, DHCP, DNS, WINS
- Legacy – Novell 3.1–5, VPN, Citrix, Terminal Services
Monitoring & Management Tools:
- Nagios, Ganglia, HP BAC, Precise i3
- SGI SMC, HP PCM, Bright Cluster Manager (incl. Data Analytics)
Infrastructure & Automation:
- Puppet, Cobbler, Ansible, Chef
- Red Hat Satellite, Kickstart, RPM optimization
File Systems & Archiving:
- Panasas (DirectFlow/panfs), DDN (GPFS), SGI DMF, StorHouse/RFS (FileTek)
HPC Tools & Job Scheduling:
- Moab/Maui, Torque, PBS Pro, Windows HPC Scheduler
Visualization & Remote Access:
- NICE DCV, EnginFrame, VNC, OpenText Exceed onDemand, Web Remote Desktop
Containerization & GPU:
- Docker, Kubernetes, Kubeflow, NVIDIA DGX-1 GPU systems
Databases:
- SQL Server (2000–2008), MySQL, Zope
High-Speed Networking:
- InfiniBand, Mellanox, OFED, Voltaire, Force10
Proven experience in:
- HPC architecture and performance tuning
- Cybersecurity in HPC/cloud environments
- Infrastructure as Code (AWS, Terraform, Ansible, Packer)
- Supporting scientific workflows in research environments