- Professional
- Office in Gurugram
This role is for one of Weekday’s clients
Min Experience: 5 years
Location: Gurugram (Gurgaon)
Job Type: Full-time
Requirements
We are seeking an experienced Data Engineer with strong expertise in AWS, Databricks, PySpark, and SQL to design, develop, and optimize scalable data pipelines. The ideal candidate will have a deep understanding of modern data architectures, ETL frameworks, and cloud-based data ecosystems. You will work closely with data scientists, analysts, and business stakeholders to ensure reliable, high-quality, and timely data delivery that powers key business decisions and analytics initiatives.
This role requires both hands-on technical capability and strategic thinking, as you’ll contribute to building robust data foundations and scalable data platforms.
Key Responsibilities
- Data Pipeline Development:
- Design, build, and maintain scalable ETL/ELT pipelines for both batch and streaming data processing using PySpark and Databricks (a minimal sketch follows this list).
- Optimize and automate data workflows to ensure efficient and reliable data movement across systems.
- Develop end-to-end data integration solutions from diverse data sources to centralized data lakes and warehouses.
- Cloud Data Engineering (AWS):
- Work extensively with AWS data services such as S3, Glue, Kinesis, and Redshift to design and manage modern data architectures.
- Implement data lake and data warehouse solutions ensuring scalability, security, and performance.
- Build workflow orchestration using Airflow, AWS Step Functions, or similar tools for automation and scheduling (an illustrative Airflow sketch also follows this list).
- Data Modeling & Optimization:
- Develop and optimize SQL queries for performance, scalability, and data quality.
- Design data models and schemas to support analytical workloads and reporting.
- Ensure data accuracy, consistency, and lineage through validation and quality checks.
- Collaboration & Cross-Functional Support:
- Work closely with data scientists, analysts, and product teams to understand data requirements and deliver relevant datasets for analytics and machine learning use cases.
- Partner with platform and DevOps teams to ensure smooth data pipeline deployment and monitoring.
- Translate complex technical concepts into business-friendly insights and documentation.
- Data Governance & Security:
- Implement data management best practices, including metadata management, access controls, and compliance with data governance standards.
- Ensure adherence to security and privacy guidelines across all data solutions.
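To give a concrete sense of the day-to-day pipeline work, below is a minimal PySpark batch ETL sketch of the kind this role would build and maintain. The bucket paths, column names, and aggregation are hypothetical placeholders, not the client's actual pipeline.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical paths for illustration only; real bucket names and
# schemas would come from the client's environment.
RAW_PATH = "s3://example-raw-bucket/orders/"
CURATED_PATH = "s3://example-curated-bucket/orders_daily/"

spark = SparkSession.builder.appName("orders_daily_etl").getOrCreate()

# Extract: read raw JSON events landed in S3.
raw = spark.read.json(RAW_PATH)

# Transform: basic cleansing plus a daily aggregate.
clean = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("created_at"))
)
daily = clean.groupBy("order_date").agg(
    F.count("order_id").alias("order_count"),
    F.sum("amount").alias("total_amount"),
)

# Load: write partitioned Parquet back to the curated zone (Delta Lake
# would be a common alternative on Databricks).
daily.write.mode("overwrite").partitionBy("order_date").parquet(CURATED_PATH)
```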
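For the orchestration responsibilities, a minimal Airflow DAG sketch is shown below, assuming Airflow 2.4+. The task commands are placeholders; in practice they would trigger Databricks job runs, Glue jobs, or similar.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical daily pipeline with retries; the echo commands stand in
# for real job submissions.
default_args = {"retries": 2, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="orders_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    transform = BashOperator(task_id="transform", bash_command="echo transform")
    load = BashOperator(task_id="load", bash_command="echo load")

    # Linear dependency chain: extract, then transform, then load.
    extract >> transform >> load
```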
Key Skills and Qualifications
- 5+ years of professional experience as a Data Engineer or similar role in a cloud-based environment.
- Expertise in PySpark for distributed data processing and transformation.
- Advanced proficiency in SQL, including query optimization, performance tuning, and large-scale data manipulation (see the example at the end of this section).
- Hands-on experience with Databricks for collaborative data development and pipeline orchestration.
- Strong understanding of AWS Data Stack, including S3, Glue, Kinesis, Lambda, and Redshift.
- Experience in building and maintaining data lakes and data pipelines (batch and streaming).
- Proficiency in workflow orchestration tools such as Airflow or AWS Step Functions.
- Familiarity with data versioning, CI/CD pipelines, and infrastructure-as-code (IaC) is a plus.
- Strong problem-solving skills, attention to detail, and ability to work in fast-paced, agile environments.
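As an illustration of the SQL tuning this role calls for, here is a small Spark SQL sketch. It assumes a hypothetical `events` table partitioned by `event_date`; the point is that filtering on the partition column lets the engine prune partitions rather than scan the whole table.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql_tuning_example").getOrCreate()

# Hypothetical table: assumes `events` exists and is partitioned by
# event_date. The WHERE clause on the partition column enables pruning,
# and selecting only the needed columns keeps the shuffle small.
daily_active = spark.sql("""
    SELECT event_date,
           COUNT(DISTINCT user_id) AS active_users
    FROM events
    WHERE event_date >= DATE '2024-01-01'
    GROUP BY event_date
""")

daily_active.explain()  # inspect the physical plan to confirm pruning
```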