
Senior Linux & Cloud Administrator - Root Cause Analysis (RCA) (f/m/d) - SAP ECS (St. Leon-Rot, DE, 68789) at SAP Software Solutions | Business Applications and Technology
SAP Software Solutions | Business Applications and Technology · St. Leon-Rot, Germany · Hybrid
- Senior
- Office in St. Leon-Rot
We help the world run better
At SAP, we enable you to bring out your best. Our company culture is focused on collaboration and a shared passion to help the world run better. How? We focus every day on building the foundation for tomorrow and creating a workplace that embraces differences, values flexibility, and is aligned to our purpose-driven and future-focused work. We offer a highly collaborative, caring team environment with a strong focus on learning and development, recognition for your individual contributions, and a variety of benefit options for you to choose from.
We are seeking a highly skilled and experienced Senior Linux Infrastructure Engineer with a focus on Root Cause Analysis (RCA) to join our team. The ideal candidate will possess an extensive technical background, superior problem-solving skills, and a passion for ensuring the robustness and resilience of our Linux server infrastructure. You must feel comfortable working in a fast-paced, dynamic, and flexible environment and operate effectively in a global, 24x7 setting.
What you'll do
• Perform thorough Root Cause Analysis (RCA) to identify, analyze, and resolve complex issues within Linux server infrastructure.
• Monitor, troubleshoot, and optimize the performance of Linux-based systems.
• Collaborate with cross-functional teams to gather data, replicate issues, and implement solutions.
• Create comprehensive RCA reports, system documentation, and knowledge base articles.
• Implement automation through scripting and configuration management tools to streamline diagnostic processes.
• Maintain security, compliance, and OS hardening across the infrastructure.
• Stay current with industry trends, technologies, and best practices to continuously improve systems and processes.
• Provide mentorship and detailed documentation to assist junior colleagues in implementing technical plans and adhering to best practices.
What you bring
• 10+ years of related professional experience with a focus on system diagnostics and Root Cause Analysis (RCA).
Technical Skills:
• Linux Systems: In-depth knowledge of Linux system internals, kernel architecture, process and memory management, filesystems, and system calls.
• Monitoring Tools: Proficiency with tools such as top, htop, vmstat, iostat, sar, ps, netstat, ss, etc.
• Logs and Tracing: Experience with journalctl, rsyslog, syslog-ng, dmesg, strace, lsof, etc.
• Networking: Advanced understanding of TCP/IP, network interfaces, routing, DNS, DHCP, firewalls, and diagnostic tools like ping, traceroute, tcpdump, Wireshark, iftop, netcat, nmap, etc.
• Performance Analysis: Proficiency with tools like perf, systemd-analyze, iotop, blktrace, and ioping, as well as with benchmarking tools.
• Security Incident Management: Knowledge of security principles, OS hardening, compliance, and tools for vulnerability scanning and intrusion detection.
• Scripting and Automation: Strong knowledge of Shell scripting, Python, Perl, or other scripting languages, and Infrastructure-as-Code tools like Ansible, Puppet, Chef, or Terraform.
• Cloud Infrastructure: Experience with AWS, Azure, or GCP, including services such as EC2, S3, IAM, VPC, security groups, and load balancers.
• Virtualization Technologies: Familiarity with Docker, Kubernetes, VMware, KVM, and other virtualization or containerization technologies.
Soft Skills:
• Analytical and Problem-Solving: Strong ability to analyze issues, identify root causes, and implement effective solutions systematically.
• Documentation: Ability to create clear and detailed RCA reports and technical documentation.
• Communication: Excellent communication and networking skills, with the ability to articulate findings and solutions to technical and non-technical stakeholders.
• Incident Management: Experience with ITIL or similar frameworks for incident management.
• Continuous Learning: Proactive in acquiring new knowledge and staying updated with the latest trends and technologies.
Language Skills:
• Fluency in English, with excellent communication skills and the ability to explain complex RCA findings clearly.
Tools and Technologies:
• Monitoring Tools: Prometheus, Grafana.
• Log Management: Splunk.
• Diagnostic Tools: top, htop, vmstat, iostat, sar, ps, netstat, ss, tcpdump, Wireshark, strace, lsof.