We are looking for a Data Engineer to join our dynamic team.
Responsibilities
- Data Pipeline Development: Designing, building, and maintaining scalable data pipelines to collect, process, and store data from various sources.
- Data Integration: Integrating data from different sources, including databases, APIs, and third-party services, to create a unified view of the data.
- Database Management: Managing and optimizing relational and non-relational databases, ensuring they are performant, secure, and reliable.
- Data Modeling: Designing and implementing data models to support efficient querying and analysis, often involving creating data warehouses or data lakes.
- ETL Processes: Developing and maintaining Extract, Transform, Load (ETL) processes to convert raw data into a usable format for analytics and reporting.
- Performance Optimization: Monitoring and tuning database performance, optimizing queries, and ensuring that data systems handle high loads efficiently.
- Data Quality Assurance: Implementing processes and tools to ensure data accuracy, consistency, and reliability, including data validation and cleansing.
- Collaboration: Working closely with data scientists, analysts, and other stakeholders to understand data needs and provide the necessary data infrastructure and support.
- Security and Compliance: Ensuring that data is stored and processed in compliance with relevant regulations and industry standards, including implementing data encryption and access controls.
- Documentation and Reporting: Documenting data architecture, pipeline processes, and system configurations, and providing reports or dashboards to monitor system health and data usage.
Requirements
- Programming Languages: Proficiency in languages like Python, Java, or Scala for scripting, data manipulation, and building data pipelines.
- SQL and Database Management: Expertise in SQL for querying databases and managing relational databases such as PostgreSQL, MySQL, or Microsoft SQL Server, as well as knowledge of NoSQL databases like MongoDB or Cassandra.
- Data Warehousing Solutions: Experience with data warehousing technologies like Amazon Redshift, Google BigQuery, Snowflake, or traditional systems like Teradata.
- ETL Tools: Familiarity with ETL (Extract, Transform, Load) tools and frameworks such as Apache Airflow, Apache NiFi, Talend, or Informatica for building and managing data pipelines.
- Big Data Technologies: Knowledge of big data frameworks and tools like Hadoop, Apache Spark, or Apache Flink for handling large-scale data processing.
- Cloud Platforms: Proficiency in cloud computing platforms like AWS, Google Cloud Platform (GCP), or Microsoft Azure, including their data services and tools.
- Data Modeling: Skills in designing and implementing data models, including understanding of dimensional modeling, normalization, and denormalization.
- Data Integration: Ability to integrate data from diverse sources, including APIs, third-party services, and various data formats like JSON, XML, or CSV.
- Version Control: Experience with version control systems like Git for managing code changes and collaborating with other team members.
- Problem-Solving and Analytical Thinking: Strong problem-solving skills to troubleshoot and resolve data issues, optimize performance, and develop efficient solutions.
Experience:
- Apache: 3 years (Preferred)
- SQL: 4 years (Preferred)
- Data warehouse: 3 years (Preferred)
Benefits
- Work Location: Remote
- 5-day work week
Desired Skills and Experience
Statistical Analysis, Machine Learning, Python Programming, Data Visualization, Predictive Modeling, Big Data, SQL, Data Mining