- Optional office in Noida
Job Responsibilities
- Design, implement, and optimize big data pipelines in Databricks.
- Develop scalable ETL workflows to process large datasets.
- Leverage Apache Spark for distributed data processing and real-time analytics.
- Implement data governance, security policies, and compliance standards.
- Optimize data lakehouse architectures for performance and cost-efficiency.
- Collaborate with data scientists, analysts, and engineers to enable advanced AI/ML workflows.
- Monitor and troubleshoot Databricks clusters, jobs, and performance bottlenecks.
- Automate workflows using CI/CD pipelines and infrastructure-as-code practices.
- Ensure data integrity, quality, and reliability in all pipelines.
Basic Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- 5+ years of hands-on experience with Databricks and Apache Spark.
- Proficiency in SQL, Python, or Scala for data processing and analysis.
- Experience with cloud platforms (AWS, Azure, or GCP) for data engineering.
- Strong knowledge of ETL frameworks, data lakes, and Delta Lake architecture.
- Experience with CI/CD tools and DevOps best practices.
- Familiarity with data security, compliance, and governance best practices.
- Strong problem-solving and analytical skills, with the ability to work in a fast-paced environment.
Preferred Qualifications
- Databricks certifications (e.g., Databricks Certified Data Engineer, Spark Developer).
- Hands-on experience with MLflow, Feature Store, or Databricks SQL.
- Exposure to Kubernetes, Docker, and Terraform.
- Experience with streaming data architectures (Kafka, Kinesis, etc.).
- Strong understanding of business intelligence and reporting tools (Power BI, Tableau, Looker).
- Prior experience working with retail, e-commerce, or ad-tech data platforms.