Master Principal Artificial Intelligence Architect

Oracle · Hybrid

About the job

Job Description

The JAPAC Cloud Engineering Center of Excellence (CE CoE) is a divisional organization across Japan and Asia Pacific whose role is to accelerate consumption of Oracle Cloud Services by providing a world-class customer experience on the Oracle Cloud. Customers will have access to cutting-edge, deep technical subject matter expertise and solutions across their technical lifecycle, from Incubate to Implement. The primary mission of the CoE teams is to help customers move their workloads to Oracle Cloud Infrastructure (OCI) and to act as their engineering partner.

We are seeking a Senior AI Architect & Infrastructure Specialist with at least 15-20 years of deep, hands-on technical expertise in designing and implementing AI and HPC infrastructure and scalable architectures to drive the next phase of our AI growth initiatives. You have great communication skills and can interact at all levels (CIO, CTO, Product Engineering, IT Architects and developers). The job requires you to engage cross-functional teams, including data scientists, AI engineers, and DevOps teams, to align AI infrastructure with evolving business and technical needs. You will be accountable and empowered to drive innovative experiences in a fast-paced, startup-like environment. Success is measured by providing the field and customers with world-class technical subject matter expertise and by driving the following outcomes:

  • Cloud consumption growth
  • New cloud customer acquisition
  • A high-performance, innovative, agile and collaborative team player
  • Continuous improvement of time to value


Career Level - IC5

Responsibilities

Why Join Us?

OCI (Oracle Cloud Infrastructure) AI Infrastructure is at the forefront of building a cutting-edge, ultra-high-performance GPU platform designed to support AI/ML/HPC workloads. The CoE gives you a platform to be part of the AI revolution, architecting customer-centric systems and solving real-world business problems with AI.

Requirements - Technical

You bring proven experience in three or more of the following areas. AI infrastructure experience and experience with LLMs are a MUST.

AI Infrastructure Design: Lead the architecture and implementation of AI and HPC infrastructure, including the use of GPUs/TPUs, high-performance networking, and scalable storage solutions to support GenAI/AI/ML workloads.

AI Deployment: Experience in deploying large models in production on public clouds (OCI, AWS, Azure, GCP) and in hybrid cloud environments, including the use of microservices and containerization (Docker, Kubernetes) to ensure smooth deployment, scaling, and monitoring of AI/ML models in production.

AI/ML Tools & Frameworks: Design and implement AI systems using industry-standard training, inferencing and deployment tools such as Kubeflow, Ray, CUDA, PyTorch, and TensorFlow, ensuring optimal performance in training and deployment. Exposure to scheduling and automation tools such as Slurm and Terraform is desirable.

Large Language Models (LLMs): Expertise in working with closed and/or open-source LLMs (e.g., GPT, BERT, Bloom, LLaMA) and understanding the full AI life cycle, including training, fine-tuning, and deploying these models for inference in production environments.

Performance Optimization: Drive the optimization of AI infrastructure and applications on OCI, focusing on efficiency improvements in computational speed and resource management.

Security & Compliance: Ensure all AI infrastructure and solutions are compliant with industry standards and organizational policies related to security, privacy, and data governance.

Operating Systems, Protocols and Tools: Strong Linux skills with hands-on experience in Oracle Linux/RHEL/CentOS, Ubuntu, and Debian distributions, including system administration, package management, and shell scripting. Strong knowledge of networking protocols (TCP/IP, InfiniBand, RDMA, UDP, HTTP) is a significant advantage. Experience with high-performance storage is desirable.

About Us

As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s problems. True innovation starts with diverse perspectives and various abilities and backgrounds.

When everyone’s voice is heard, we’re inspired to go beyond what’s been done before. It’s why we’re committed to expanding our inclusive workforce that promotes diverse insights and perspectives.

We’ve partnered with industry leaders in almost every sector and continue to thrive after 40+ years of change by operating with integrity.

Oracle careers open the door to global opportunities where work-life balance flourishes. We offer a highly competitive suite of employee benefits designed on the principles of parity and consistency. We put our people first with flexible medical, life insurance and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by calling +1 888 404 2494, option one.

Disclaimer:

Oracle is an Equal Employment Opportunity Employer*. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

* Which includes being a United States Affirmative Action Employer