Remote ML Engineer at Shipium
Shipium · United States of America · Remote
- Professional
About the role
This position is responsible for solving complex problems related to the design, deployment, and continuous optimization of scalable machine learning platforms and production workflows. You will be responsible for architecting and scaling our ML systems to support a growing number of machine learning models and an increasing volume of real-time predictions. This role will also spearhead our initiatives in Generative AI, designing and implementing systems that leverage Large Language Models (LLMs) to translate model predictions into powerful user-facing insights and agents. You will have a significant impact on the performance, reliability, and scalability of our machine learning and AI solutions, directly contributing to the success of the Shipium platform. The ideal candidate has a proven record of building and managing large-scale ML platforms and leveraging expertise in machine learning, software engineering, Generative AI, and cloud technologies to optimize performance while collaborating effectively across teams.
What you'll do
- Architect and implement a scalable, high-performance machine learning platform to support model development, deployment, monitoring, and analysis for both predictive and Generative AI models.
- Lead the technical strategy and evaluation for our Generative AI infrastructure. This includes assessing the trade-offs between managed services and self-hosted open-source models, defining our LLM hosting strategy, and validating the end-to-end architectural approach for scalable, reliable AI features.
- Design and implement robust LLM orchestration for advanced applications, enabling the integration of our proprietary predictive models with LLMs to power new insights and workflows.
- Ensure the platform supports a wide range of ML use cases, including real-time prediction serving, batch processing, model experimentation, and advanced Generative AI applications.
- Optimize system performance and model latency to ensure robust, low‑latency inference across distributed systems, with a specific focus on the unique challenges of LLM serving.
- Identify bottlenecks, and evaluate and integrate new technologies and tools.
- Collaborate closely with data scientists to productionize both traditional predictive models and novel Generative AI solutions, focusing on systems that combine proprietary model outputs with LLMs to create actionable insights.
- Contribute to the overall quality of the codebase, ensuring maintainability and best practices.
- Drive ML/DS best practices and give technical recommendations on challenging problems.
Qualifications
- Core Programming & Machine Learning:
  - Proficiency in Python and deep experience with its data science and ML ecosystem (e.g., PyTorch, TensorFlow, scikit-learn, Pandas, NumPy).
  - Hands-on experience with Generative AI frameworks and libraries such as LangChain, LlamaIndex, or Hugging Face Transformers.
- MLOps & Infrastructure:
  - Expertise in building and maintaining MLOps infrastructure, including containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines for both traditional ML models and LLM-based applications.
  - Proven skill in managing cloud resources using Infrastructure as Code (Terraform).
- Cloud Platforms & Services:
  - Extensive hands-on experience with cloud platforms, particularly AWS. Experience with core services (S3, EC2, Lambda) and ML services (SageMaker) is required.
  - Direct experience with or deep knowledge of managed Generative AI services like AWS Bedrock, Amazon Titan, or equivalents (e.g., Google Vertex AI, Azure OpenAI Service).
- Data Systems & Storage:
  - Advanced proficiency in SQL for complex data extraction and transformation.
  - Experience with a variety of data storage solutions, including relational databases, NoSQL databases, and vector databases (e.g., Pinecone, Weaviate, ChromaDB).
- Master’s Degree in Computer Science, Software Engineering, or a related field and 4-5 years of experience building and managing production-level machine learning platforms and infrastructure, with a focus on model deployment, optimization, and scalability; demonstrated ability to improve the performance, reliability, and cost-efficiency of ML systems; strong experience with cloud-based ML infrastructure (AWS, GCP, Azure) and MLOps practices; and/or equivalent combination of education and experience such as a PhD in Computer Science, Data Science, Mathematics, Statistics, or related quantitative field and strong knowledge of machine learning.
At Shipium, employees enjoy full medical, dental & vision coverage (with 50% coverage for dependents), optional life insurance and long-term disability coverage, a 401(k) retirement plan, fully remote work-from-home options in 25* states, 8 paid weeks of parental leave, 12 paid holidays annually, self-managed vacation time, sick & safety leave, and volunteer time off.
Shipium is committed to creating a diverse environment and is proud to be an equal opportunity employer. Women, people of color, people with disabilities, and veterans are strongly encouraged to apply. We prohibit discrimination and harassment of any kind based on race, color, sex, religion, sexual orientation, national origin, disability, genetic information, pregnancy, or any other protected characteristic as outlined by federal, state, or local laws. If you need reasonable accommodation because of a disability for any part of the employment process, please email Human Resources ([email protected]) and let us know the nature of your request and your contact information.
This policy applies to all employment practices within our organization, including hiring, recruiting, promotion, termination, layoff, recall, leave of absence, compensation, benefits, training, and mentorships. Shipium makes hiring decisions based solely on qualifications, merit, and business needs.
*Although based out of Seattle, WA, Shipium is 100% remote in the following states: Arizona, California, Colorado, Connecticut, District of Columbia, Florida, Georgia, Idaho, Illinois, Indiana, Maryland, Massachusetts, Michigan, Montana, Missouri, Nevada, New Jersey, New York, North Carolina, Ohio, Oregon, South Carolina, Tennessee, Texas, Vermont, Washington & Wisconsin.
Shipium participates in E-Verify.
Must have a green card or be a U.S. citizen.
We do NOT work with OPTs or H-1B transfers.