Data Engineer at DATAMAXIS

DATAMAXIS · Bengaluru, India · Onsite

Job Description
Cloud-Native Data Engineering on AWS

  • Strong, hands-on expertise in AWS-native data services: S3, Glue (Schema Registry, Data Catalog), Step Functions, Lambda, Lake Formation, Athena, MSK/Kinesis, EMR (Spark), SageMaker (incl. Feature Store)
  • Comfort designing and optimizing pipelines for both batch (Step Functions) and streaming (Kinesis/MSK) ingestion

Data Mesh & Distributed Architectures

  • Deep understanding of data mesh principles, including domain-oriented ownership, treating data as a product, and federated governance models
  • Experience enabling self-service platforms, decentralized ingestion, and transformation workflows

Data Contracts & Schema Management

  • Advanced knowledge of schema enforcement, evolution, and validation, preferably with the AWS Glue Schema Registry (JSON/Avro); a minimal sketch follows
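
To make the contract workflow concrete, here is a minimal boto3 sketch of registering and evolving an Avro schema in the Glue Schema Registry. The "orders-domain" registry, "order-events" schema, and record fields are assumptions for illustration, not part of this posting.

```python
# Minimal sketch (illustrative names, not from the posting): register an Avro
# data contract in the AWS Glue Schema Registry and evolve it under BACKWARD
# compatibility.
import json

import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_registry(RegistryName="orders-domain")  # assumed registry name

order_v1 = json.dumps({
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
})

# BACKWARD compatibility: consumers on the new schema must still be able to
# decode records written with the previous version.
glue.create_schema(
    RegistryId={"RegistryName": "orders-domain"},
    SchemaName="order-events",
    DataFormat="AVRO",
    Compatibility="BACKWARD",
    SchemaDefinition=order_v1,
)

# Adding an optional field with a default is a backward-compatible change,
# so this version passes the registry's compatibility check and registers.
order_v2 = json.dumps({
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "currency", "type": "string", "default": "USD"},
    ],
})
glue.register_schema_version(
    SchemaId={"RegistryName": "orders-domain", "SchemaName": "order-events"},
    SchemaDefinition=order_v2,
)
```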

Data Transformation & Modelling

  • Proficiency with a modern ELT/ETL stack: Spark (EMR), dbt, AWS Glue, and Python (pandas); see the transformation sketch below
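
As one example of that stack, a minimal PySpark job of the kind that runs on EMR. The S3 paths and column names are illustrative assumptions.

```python
# Minimal sketch: read raw JSON orders from S3, derive a daily aggregate,
# and write partitioned Parquet to a curated layer. Paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-elt").getOrCreate()

orders = spark.read.json("s3://example-raw/orders/")

daily = (
    orders
    .withColumn("dt", F.to_date("created_at"))
    .groupBy("dt", "customer_id")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("*").alias("order_count"),
    )
)

# Partitioning by date keeps downstream Athena scans cheap.
daily.write.mode("overwrite").partitionBy("dt").parquet(
    "s3://example-curated/daily_orders/")
```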


AI/ML Data Enablement

  • Designing and supporting vector stores (OpenSearch), feature stores (SageMaker Feature Store), and integrating with MLOps/data pipelines for AI, semantic search, and RAG-style workloads; a minimal sketch follows
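
For illustration, a minimal OpenSearch vector-store sketch of the kind used for semantic search or RAG retrieval. The host, index name, and 768-dimension embeddings are assumptions; authentication (e.g., SigV4 for a managed domain) is omitted for brevity.

```python
# Minimal sketch: create a k-NN vector index in OpenSearch, ingest one
# embedded document, and run a nearest-neighbour query.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="docs-embeddings",
    body={
        "settings": {"index": {"knn": True}},  # enable the k-NN plugin
        "mappings": {"properties": {
            "text": {"type": "text"},
            "embedding": {"type": "knn_vector", "dimension": 768},
        }},
    },
)

# Index a document whose embedding was computed upstream (e.g., SageMaker).
client.index(
    index="docs-embeddings",
    body={
        "text": "Quarterly revenue grew 12% year over year.",
        "embedding": [0.01] * 768,  # placeholder vector
    },
    refresh=True,
)

# Nearest-neighbour retrieval over the stored vectors.
hits = client.search(index="docs-embeddings", body={
    "size": 3,
    "query": {"knn": {"embedding": {"vector": [0.01] * 768, "k": 3}}},
})
print(hits["hits"]["hits"][0]["_source"]["text"])
```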

Metadata, Catalog, and Lineage

  • Familiarity with central cataloging, lineage solutions, and data discovery tools (Glue Data Catalog, Collibra, Atlan, Amundsen, etc.)
  • Implementing end-to-end lineage, auditability, and governance processes

Security, Compliance, and Data Governance

  • Design and implementation of data security: row/column-level security (Lake Formation), KMS encryption, role-based access using AuthN/AuthZ standards (JWT/OIDC), and GDPR/SOC 2/ISO 27001-aligned policies; a Lake Formation sketch follows
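
As a concrete example, a boto3 sketch of row- and column-level security via a Lake Formation data cells filter. The account ID, database, table, filter, and IAM role names are all placeholder assumptions.

```python
# Minimal sketch: restrict an analyst role to EU rows and hide a PII column
# using a Lake Formation data cells filter, then grant SELECT through it.
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")
account = "123456789012"  # placeholder account id

# Filter: only rows where region = 'EU', and the email column is excluded.
lf.create_data_cells_filter(TableData={
    "TableCatalogId": account,
    "DatabaseName": "analytics",
    "TableName": "customers",
    "Name": "eu_rows_no_pii",
    "RowFilter": {"FilterExpression": "region = 'EU'"},
    "ColumnWildcard": {"ExcludedColumnNames": ["email"]},
})

# Grant SELECT to the analyst role through the filter, not the raw table.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier":
               f"arn:aws:iam::{account}:role/analyst"},
    Resource={"DataCellsFilter": {
        "TableCatalogId": account,
        "DatabaseName": "analytics",
        "TableName": "customers",
        "Name": "eu_rows_no_pii",
    }},
    Permissions=["SELECT"],
)
```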

Orchestration & Observability

  • Experience with pipeline orchestration (AWS Step Functions, Apache Airflow/MWAA) and monitoring (CloudWatch, X-Ray) in large-scale environments; a minimal DAG sketch follows
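
For illustration, a minimal Airflow (MWAA) DAG that orchestrates a Glue job. The DAG ID, schedule, and "orders-elt" job name are assumptions; Glue publishes its metrics to CloudWatch, which covers the monitoring side.

```python
# Minimal sketch: a daily MWAA DAG that triggers a pre-existing Glue ETL job
# and retries on failure.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

with DAG(
    dag_id="daily_orders_elt",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    # Starts the Glue job and waits for it to reach a terminal state.
    transform_orders = GlueJobOperator(
        task_id="transform_orders",
        job_name="orders-elt",  # assumed Glue job name
        region_name="us-east-1",
    )
```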

APIs & Integration

  • API design for both batch and real-time data delivery (REST and GraphQL endpoints for AI, reporting, and BI consumption); a REST sketch follows
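
As one possible shape for such delivery, a small REST endpoint serving curated Athena data. FastAPI is an assumption (the posting names no framework), and the database, table, and S3 results path are placeholders.

```python
# Minimal sketch: a REST endpoint that runs an Athena query and returns rows
# as JSON for BI/AI consumers.
import time

import boto3
from fastapi import FastAPI

app = FastAPI()
athena = boto3.client("athena", region_name="us-east-1")

@app.get("/orders/daily")
def daily_orders(day: str):
    # Athena is asynchronous: start the query, poll until it finishes, then
    # read the results. Input validation is omitted for brevity.
    qid = athena.start_query_execution(
        QueryString=f"SELECT order_id, amount FROM orders WHERE dt = '{day}'",
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )["QueryExecutionId"]

    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(0.5)

    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    header, *data = rows  # the first row holds the column names
    cols = [c["VarCharValue"] for c in header["Data"]]
    return [
        dict(zip(cols, (c.get("VarCharValue") for c in row["Data"])))
        for row in data
    ]
```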

Job Responsibilities

  • Design, build, and maintain ETL/ELT pipelines to extract, transform, and load data from various sources into cloud-based data platforms.
  • Develop and manage data architectures, data lakes, and data warehouses on AWS (e.g., S3, Redshift, Glue, Athena).
  • Collaborate with data scientists, analysts, and business stakeholders to ensure data accessibility, quality, and security.
  • Optimize performance of large-scale data systems and implement monitoring, logging, and alerting for pipelines.
  • Work with both structured and unstructured data, ensuring reliability and scalability.
  • Implement data governance, security, and compliance standards.
  • Continuously improve data workflows by leveraging automation, CI/CD, and Infrastructure-as-Code (IaC); an IaC sketch follows
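
To illustrate the IaC point, a minimal AWS CDK (Python) stack; CDK is one IaC option among several (Terraform, CloudFormation), and the stack, bucket, and database names are assumptions.

```python
# Minimal sketch: define a raw S3 landing zone and a Glue database as code,
# so pipeline infrastructure is versioned and deployable via CI/CD.
import aws_cdk as cdk
from aws_cdk import aws_glue as glue
from aws_cdk import aws_s3 as s3
from constructs import Construct

class DataLakeStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Raw landing zone, versioned and encrypted at rest with KMS.
        s3.Bucket(
            self, "RawBucket",
            encryption=s3.BucketEncryption.KMS_MANAGED,
            versioned=True,
        )

        # Glue database so Athena and Glue jobs can discover the data.
        glue.CfnDatabase(
            self, "AnalyticsDb",
            catalog_id=self.account,
            database_input=glue.CfnDatabase.DatabaseInputProperty(
                name="analytics",  # assumed database name
            ),
        )

app = cdk.App()
DataLakeStack(app, "DataLakeStack")
app.synth()
```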
