HPC - Team Lead at SHI LOCUZ ENTERPRISE SOLUTIONS PVT LTD
SHI LOCUZ ENTERPRISE SOLUTIONS PVT LTD · Hyderabad, India · Onsite
- Professional
- Office in Hyderabad
- Experience architecting and maintaining HPC/AI systems.
- Linux system administration
- Cluster management
- System and software configuration management
- High speed networking
- Resource managers and schedulers
- High speed parallel storage
- Monitoring and alerting
- Strong understanding of HPC/AI architectures and concepts.
- Experience supporting and managing a group of HPC/AI Clusters.
- Excellent knowledge of prototyping and deploying HPC/AI clusters.
- Extensive experience in troubleshooting Linux OS, filesystems and cluster hardware.
- Good command of Linux scripting tools such as Bash, Perl, and Python.
- Experience implementing, maintaining, and verifying defined security policies.
- Willingness to maintain a flexible work schedule.
- A positive attitude and willingness to help lab users succeed.
- Excellent guidance and teamwork skills.
- Red Hat, Ubuntu, and SUSE operating systems
- Cluster tools (Bright, xCAT, Warewulf, OpenHPC, ROCKS, etc.)
- InfiniBand
- Lustre, BeeGFS and GPFS architecture and maintenance
- Configuration management software (Ansible, Puppet)
- Slurm/PBS/LSF/Grid Engine schedulers
- Spack package manager
- Experience deploying AI servers and software stacks.
- Experience with container technologies and orchestration tools: Docker, Singularity, Apptainer, Kubernetes.
- Hands-on with AI/ML tools: TensorFlow, PyTorch, Keras, ONNX, JAX.
- Experience in benchmarking and performance optimization of large-scale HPC/AI systems.
- Experience with Linux and/or Windows operating systems, including file management, scripting, editing, and security.
- Log consolidation and monitoring (Ganglia, Grafana, etc.)
- Lifecycle and patch management experience.
- Good logical reasoning and analytical skills
- Good communication skills
- Collaborative, cooperative, and committed mindset.
- Teamwork
- Excellent analytical and problem-solving skills.
- Ability to work independently and within cross-functional teams.
- Detail-oriented with good documentation practices.
- Excellent interpersonal, communication, customer-interaction, and documentation skills, along with sound decision-making ability.