● Solid foundation in building reliable, maintainable, and high-throughput data pipelines
● Hands-on experience with tools such as Databricks (including Delta Lake and Spark-based pipeline processing), Kafka, dbt, Airflow, Snowflake, or BigQuery
● Strong programming skills in Python and SQL; familiarity with data APIs, connectors, and versioning systems
● Experience working in cloud-native environments (AWS, GCP, or Azure)
● Comfortable working with unstructured and semi-structured data formats (JSON, XML, Parquet, etc.)
● Experience supporting data workflows for LLM and GenAI applications (vector stores, embedding pipelines, document loaders)
● Familiarity with MLOps and DataOps practices: pipeline CI/CD, monitoring, lineage, and governance
● Background working on consulting engagements, agile teams, or client-facing data delivery
● Strong documentation habits and communication skills for staying aligned with distributed teams
Requirements
● 5+ years in data engineering, backend systems, or data infrastructure
● Strong hands-on experience with Databricks and Spark, including Delta Lake and Workflows (MLflow optional)
● Familiarity with vector databases and embedding pipelines for AI/LLM systems
● Real-time data engineering experience with Kafka, Kinesis, or similar streaming platforms
● Experience building ML-ready datasets, feature stores, or analytical layers supporting intelligent applications
● Knowledge of CI/CD pipelines and experience working with DevOps teams to operationalize data workflows
● Understanding of secure data architecture: encryption, RBAC, anonymization, and auditability
● Prior work on consulting, system integration, or cross-functional delivery projects in complex enterprise settings
● Desired: experience integrating Databricks into secure enterprise cloud environments (e.g., Unity Catalog, private endpoints, secured clusters)