AI Inference Engineer at quadric, Inc
quadric, Inc · Burlingame, United States of America · Hybrid
- Office in Burlingame
Description
Quadric has created an innovative general-purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware are targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the industry today, which can only accelerate a portion of a machine learning graph, the Quadric GPNPU executes both NN graph code and conventional C++ DSP and control code.
Role:
The AI Inference Engineer at Quadric is the key bridge between the world of AI/LLM models and Quadric's unique platforms. The engineer will [1] port AI models to the Quadric platform; [2] optimize model deployment for efficient inference; [3] profile and benchmark model performance. This senior technical role demands deep knowledge of AI model algorithms, system architecture and AI toolchains/frameworks.
Responsibilities:
- Quantize, prune and convert models for deployment
- Port models to the Quadric platform using the Quadric toolchain
- Optimize inference deployment for latency and throughput
- Benchmark and profile model performance and accuracy
- Develop tools to scale and speed up deployment
- Improve the SDK and runtime
- Provide technical support and documentation to customers and the developer community
Requirements
- Bachelor’s or Master’s in Computer Science or Electrical Engineering
- 5+ years of experience in AI/LLM model inference and deployment frameworks/tools
- Experience with model quantization (PTQ, QAT) and related tools
- Experience with model accuracy metrics
- Experience with model inference performance profiling
- Experience with at least one of the following frameworks: ONNX Runtime, PyTorch, vLLM, Hugging Face Transformers, Neural Compressor, llama.cpp
- Proficiency in C/C++ and Python
- Strong problem-solving, debugging and communication skills
Benefits
- Health Care Plan (Medical, Dental & Vision)
- Retirement Plan (401k, IRA)
- Life Insurance (Basic, Voluntary & AD&D)
- Paid Time Off (Vacation, Sick & Public Holidays)
- Family Leave (Maternity, Paternity)
- Short Term & Long Term Disability
- Training & Development
- Work From Home
- Free Food & Snacks
- Stock Option Plan