Home Office MLOps / DevSecOps Engineer – AI Infrastructure Team - 1658 at In All Media Inc
In All Media Inc · Brazil · Remote
- Professional
📌 Job Title: MLOps / DevSecOps Engineer – AI Infrastructure Team
Location: Remote from LATAM | Full-time
Company: Inallmedia.com
🚀 About the Role
We're looking for a hands-on MLOps / DevSecOps Engineer to own and automate infrastructure that powers our AI platforms, ensuring robustness, cost-efficiency, and compliance at scale. You’ll work at the intersection of CI/CD, observability, and security, enabling high-throughput pipelines for code, data, model, and prompt deployments in regulated environments.
You’ll help build secure, reproducible, and cost-aware AI infrastructure, with advanced control over scalability, rollback strategies, and model performance monitoring.
🛠️ Key Responsibilities
Build reusable Infrastructure as Code (IaC) modules for GPU clusters, distributed storage, and Zero-Trust networks
Deploy and operate Kubernetes clusters with GPU-optimized node pools using Cluster Autoscaler or Karpenter
Orchestrate multi-stage GitOps pipelines (code, data, model) with Argo CD or Flux
Implement advanced rollout/rollback strategies: shadow testing, canary, blue/green
Integrate security scanners (OWASP ZAP, Snyk, Veracode, Trivy) into CI/CD and produce actionable reports
Set up observability for model drift, budget burn, SLOs, and hallucination metrics
Ensure compliance with SOC 2, ISO/IEC 27034, and internal audit requirements
Reproduce high-performance computing (HPC) environments via Terraform or AWS CDK
🧠 Ideal Candidate
Solid background in Cloud DevOps and ML Infrastructure
Proven experience with reproducible GPU infrastructure in cloud environments
Hands-on expertise in:
Infrastructure as Code: Terraform, Pulumi, AWS CDK
Kubernetes: EKS, AKS, GKE, Cluster Autoscaler, Karpenter
CI/CD & GitOps: Argo CD, Flux, GitHub Actions, Azure DevOps, Kustomize
Observability: Prometheus, OpenTelemetry, Grafana, Arize AI, FinOps Exporter
Security & Compliance: Trivy, Snyk, OWASP ZAP, Veracode, Grype, Kyverno, OPA Gatekeeper
☁️ Infrastructure & Environment
Remote-first across LATAM (at least 6 hours of overlap with CST/EST)
VPN + SSO access via Okta or Azure Active Directory (now Microsoft Entra ID)
Cloud IDE access or VS Code Dev Containers
GitHub Enterprise, Jira, Slack/Teams
⚙️ Nice to Have
Familiarity with tools such as MLflow, LangChain, LangSmith, Ray, DVC, Feast, BentoML
Background in budget-aware engineering or FinOps tagging strategies
Comfortable with advanced CLI workflows (e.g., tmux, SSH multiplexing)
🔧 Recommended Stack (Expanded)
IaC: Terraform, AWS CDK, Pulumi
Kubernetes: EKS/AKS/GKE, Cluster Autoscaler, Karpenter
CI/CD & GitOps: GitHub Actions, GitLab CI, Argo CD, Flux, Kustomize
Monitoring & Logging: Prometheus, Grafana, OpenTelemetry, ELK Stack, Datadog
Cloud Providers: AWS, Azure, GCP
Security Tools: Trivy, OWASP ZAP, Snyk, Veracode, Grype, Kyverno, OPA Gatekeeper
ML Frameworks: PyTorch, TensorFlow
Containerization & Automation: Docker, Ansible