- Houston Office
Description
HPC Resident Engineer (Omega Environment)
Onsite – Houston, TX
Job Summary
We are seeking an experienced HPC Resident Engineer to provide on-site technical expertise and proactive operational support for a large-scale High-Performance Computing (HPC) environment running Omega software. This role supports one of the customer’s largest global HPC cluster installations, designed to handle intensive seismic processing workloads using a combination of CPU and GPU-accelerated nodes.
The Resident Engineer will be responsible for ensuring optimal cluster performance, supporting infrastructure refresh initiatives, and driving continuous improvements in reliability, efficiency, and manageability of the HPC environment. This position requires close collaboration with internal stakeholders, vendors, and SLB to support both current and next-generation infrastructure.
Key Responsibilities
- Provide hands-on, on-site operational support for large-scale HPC clusters running Omega software
- Ensure high availability, performance, and reliability of CPU- and GPU-based compute environments
- Monitor system health, analyze performance metrics, and proactively identify and mitigate potential issues
- Support infrastructure refresh initiatives, including:
  - Compute node upgrades and replacements
  - Storage migrations and platform transitions
- Lead and support the transition from existing storage platforms to PixStor on Dell
- Perform break/fix troubleshooting on current hardware and software components
- Coordinate with SLB and other vendors for support of out-of-warranty systems
- Provide recommendations to improve cluster management, operational workflows, and overall efficiency
- Document system configurations, procedures, and best practices
- Act as a trusted technical advisor to the customer’s engineering and operations teams
Requirements
- Proven experience supporting High-Performance Computing (HPC) environments in production
- Strong knowledge of Linux-based systems in large, clustered environments
- Experience supporting Omega or similar seismic processing applications
- Hands-on experience with:
  - CPU and GPU-accelerated compute nodes
  - Large-scale cluster architectures
  - High-performance storage solutions
- Strong troubleshooting skills across hardware, operating systems, and cluster components
- Experience working with vendors and managing escalations for hardware and software issues
- Ability to work independently in an on-site, customer-facing role
Preferred Qualifications
- Experience with PixStor and/or Dell HPC storage platforms
- Background supporting seismic processing or energy-sector HPC workloads
- Familiarity with cluster scheduling, performance tuning, and capacity planning
- Experience supporting infrastructure refresh or data center modernization projects
Work Environment
- 100% onsite role in Houston, TX
- Hands-on access to one of the largest HPC cluster environments for this customer globally
- Collaborative environment working with internal teams, customer stakeholders, and vendors
Benefits
Salary: 90-110K
Our comprehensive benefits package for full-time salaried employees takes effect immediately on the start date. It includes PPO medical coverage with a Health Savings Account (HSA) option, a vision plan, and dental insurance, with the base dental plan paid for by PGTEK. Premiums for life insurance, short- and long-term disability, and critical illness insurance are covered. PGTEK also offers a matching 401(k) plan and a discount on pet insurance through ASPCA Pet Insurance. An Employee Assistance Program is available at no cost to all employees. PGTEK offers generous PTO and holidays, and an Education Assistance Program is available after 12 months of employment.
About PGTEK
PGTEK is a consulting organization dedicated to helping clients achieve their business and technology objectives, drawing on decades of experience and business relationships. PGTEK invests in the educational advancement of our staff by providing the resources needed to complete professional and business certifications. Our company is our people, and we treat them like family.
EOE, including disability/veterans