- Professional
- Office in New York
The candidate should use a framework for monitoring tools, escalate key issues, and ensure timely service implementation. They will diagnose, troubleshoot, install, and repair software, hardware, and related components. They should be proficient in installing, configuring, and troubleshooting networking equipment such as routers and switches, and have a good understanding of the OSI model and the TCP/IP protocol suite (IP, ARP, ICMP, TCP, UDP, SMTP, FTP, TFTP). Responsibilities also include configuring terminal servers for out-of-band management, handling daily issues such as health checks of servers and processes, and working closely with end users, development teams, and infrastructure teams to prioritize, resolve, and mitigate outages.
The role also involves server and network installation and maintenance, site builds and refreshes that meet current quality standards, and interaction with onsite staff and vendors for hardware replacement, delivery, and diagnostics. The candidate will also perform operational tasks associated with data center implementation, migration, deployment, cabling, and rack and stack.
The candidate should have experience with cluster bring-up, driver loading, and end-to-end GPU testing in a cluster with InfiniBand, as well as with setting up GPU servers in a cluster. Proficiency in Linux environments, including shell scripting, is required, along with strong skills in installing, configuring, and troubleshooting Linux operating systems, experience in OpenStack cloud operations, and excellent data center organization skills with meticulous attention to detail. Familiarity with fiber and copper network cabling, including IP and SAN deployments, is essential. The candidate must also be able to maintain acceptable ticket loads and incident SLAs, follow documented escalation procedures, and sync with global teams on tasks and upcoming initiatives.
The candidate is also expected to understand and adhere to documented policies, processes, and procedures, assist with process improvement initiatives, and document policies, processes, and procedures, including runbooks. The candidate should be able to move 50 pounds or more.