Intermediate Data Engineer (2-5 years of experience)
Summary of the Role
As an Intermediate Data Engineer, you will be responsible for designing, building, and maintaining scalable, robust data pipelines. You will work with large datasets, integrate data from various sources, and ensure that data is accessible and usable for data scientists, analysts, and other stakeholders.
Required Skills
Excellent understanding of data warehouse design (e.g., dimensional modeling) and data mining.
In-depth knowledge of SQL and relational database design, along with proficiency in multiple programming languages.
Familiarity with machine learning algorithms and statistics.
Strong analytical skills for working with unstructured datasets.
Proficient in building and optimizing big data pipelines, architectures, and datasets.
Strong project management and organizational skills.
Ability to work in a fast-paced environment and manage multiple projects simultaneously.
Effective communication skills to interact with diverse groups of technical and non-technical stakeholders.
Qualifications
Bachelor's or Master's degree in Computer Science, Engineering, Mathematics, or a related field.
Proven experience as a Data Engineer or similar role (2-5 years).
Experience with big data tools: Hadoop, Spark, Kafka, etc.
Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
Experience with AWS cloud services: EC2, EMR, RDS, Redshift.
Experience with stream-processing systems: Storm, Spark Streaming, etc.
Experience with object-oriented/functional scripting languages: Python, Java, Scala, etc.
Responsibilities
Design and construct high-performance data processing systems using big data technologies.
Develop, construct, test, and maintain architectures such as databases and large-scale data processing systems.
Clean, prepare, and optimize data for ingestion and consumption.
Collaborate with data science and analytics teams to improve data models and enhance data quality.
Implement complex big data projects with a focus on collecting, parsing, managing, analyzing, and visualizing large sets of data.
Identify, design, and implement process improvements: automating manual processes, optimizing data delivery, and redesigning infrastructure for greater scalability.
Work with stakeholders, including the executive, product, data, and design teams, to support their data infrastructure needs and assist with data-related technical issues.
Ensure compliance with data governance and security policies.