Remote MLOps / DevSecOps Engineer – AI Infrastructure Team - 1658 at In All Media Inc
In All Media Inc · Colombia · Remote
- Professional
📌 Job Title: MLOps / DevSecOps Engineer – AI Infrastructure Team
Location: Remote from LATAM | Full-time
Company: Inallmedia.com
🚀 About the Role
We're looking for a hands-on MLOps / DevSecOps Engineer to own and automate infrastructure that powers our AI platforms, ensuring robustness, cost-efficiency, and compliance at scale. You’ll work at the intersection of CI/CD, observability, and security, enabling high-throughput pipelines for code, data, model, and prompt deployments in regulated environments.
You’ll help build secure, reproducible, and cost-aware AI infrastructure, with advanced control over scalability, rollback strategies, and model performance monitoring.
🛠️ Key Responsibilities
- Build reusable Infrastructure as Code (IaC) modules for GPU clusters, distributed storage, and Zero-Trust networks 
- Deploy and operate Kubernetes clusters with GPU-optimized node pools using Cluster Autoscaler or Karpenter 
- Orchestrate multi-stage GitOps pipelines (code, data, model) with Argo CD or Flux
- Implement advanced rollout/rollback strategies: shadow testing, canary, blue/green 
- Integrate security scanners (OWASP ZAP, Snyk, Veracode, Trivy) into CI/CD and produce actionable reports
- Set up observability for model drift, budget burn, SLOs, and hallucination metrics 
- Ensure compliance with SOC 2, ISO 27034, and internal audit requirements 
- Reproduce high-performance computing (HPC) environments via Terraform or AWS CDK
🧠 Ideal Candidate
- Solid background in Cloud DevOps and ML Infrastructure 
- Proven experience with reproducible GPU infrastructure in cloud environments 
- Hands-on expertise in:
  - Infrastructure as Code: Terraform, Pulumi, AWS CDK
  - Kubernetes: EKS, AKS, GKE, Cluster Autoscaler, Karpenter
  - CI/CD & GitOps tools: Argo CD, Flux, GitHub Actions, Azure DevOps, Kustomize
  - Observability: Prometheus, OpenTelemetry, Grafana, Arize AI, FinOps Exporter
  - Security & Compliance: Trivy, Snyk, OWASP ZAP, Veracode, Grype, Kyverno, OPA Gatekeeper
 
☁️ Infrastructure & Environment
- Remote-first across LATAM (6+ hrs CST/EST overlap) 
- VPN + SSO access via Okta or Azure Active Directory 
- Cloud IDE access or VS Code Dev Containers 
- GitHub Enterprise, Jira, Slack/Teams 
⚙️ Nice to Have
- Familiarity with tools such as MLflow, LangChain, LangSmith, Ray, DVC, Feast, BentoML 
- Background in budget-aware engineering or FinOps tagging strategies 
- Comfort with advanced CLI workflows (e.g., tmux, SSH multiplexing)
🔧 Recommended Stack (Expanded)
- IaC: Terraform, AWS CDK, Pulumi 
- Kubernetes: EKS/AKS/GKE, Cluster Autoscaler, Karpenter 
- CI/CD & GitOps: GitHub Actions, GitLab CI, Argo CD, Flux, Kustomize 
- Monitoring & Logging: Prometheus, Grafana, OpenTelemetry, ELK Stack, Datadog 
- Cloud Providers: AWS, Azure, GCP 
- Security Tools: Trivy, OWASP ZAP, Snyk, Veracode, Grype, Kyverno, OPA Gatekeeper 
- ML Frameworks: PyTorch, TensorFlow 
- Containerization & Automation: Docker, Ansible 