We’re looking for a Senior Data Engineer who is passionate about data and the insights that large datasets can provide, to help us grow our Capacity Engineering Data Lake.
As our Senior Data Engineer, you will build ETL and analytics solutions that enable our internal customers to answer questions with data and drive critical improvements for the business.
You will use best practices in software engineering, data management, data storage, data compute, and distributed systems.
We are passionate about solving business problems with data.
The successful candidate will work with applied scientists, ML scientists, business analysts, product managers, and other stakeholders across the organization.
Our team is part of the EC2 Capacity Engineering organization, which is responsible for providing the elasticity EC2 customers need to scale up / down compute resources in a cost-efficient manner.
We predict customer usage across thousands of configuration combinations to deliver exactly what our customers require in just the right amount of time with just the right amount of capacity.
Develop and maintain automated ETL pipelines for big data using Python, Spark, and SQL, together with AWS services such as S3, Glue, Lambda, SNS, SQS, and KMS.
Example: ETL jobs that process a continuous flow of JSON source files and output the data in a business-friendly Parquet format that can be queried efficiently via Redshift Spectrum using SQL to answer business questions.
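A minimal, dependency-free sketch of the row-to-columnar reshaping such a job performs (the record fields and values here are hypothetical; a production job would use Spark and write real Parquet files, which store data in exactly this column-oriented layout):

```python
import json

# Hypothetical sample of the JSON source records an ETL job might receive.
raw_records = [
    '{"instance_type": "m5.large", "region": "us-east-1", "vcpus": 2}',
    '{"instance_type": "c5.xlarge", "region": "eu-west-1", "vcpus": 4}',
]

def to_columnar(json_lines):
    """Reshape row-oriented JSON records into a column-oriented layout,
    the same transformation a Parquet writer applies when storing data."""
    rows = [json.loads(line) for line in json_lines]
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

columnar = to_columnar(raw_records)
print(columnar["region"])  # ['us-east-1', 'eu-west-1']
```

Column-oriented storage is what lets engines like Redshift Spectrum scan only the columns a query touches, which is why the Parquet conversion matters for query cost and speed.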
Develop and maintain automated ETL monitoring and alarming solutions using Python or Scala, Spark, SQL, and AWS services such as CloudWatch and Lambda.
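As an illustration of the alarming logic such a monitor might apply, here is a minimal data-freshness check in plain Python (the six-hour SLA and the timestamps are hypothetical; in production this logic would typically run in a Lambda and publish a CloudWatch metric):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness SLA for an ETL pipeline's output table.
FRESHNESS_SLA = timedelta(hours=6)

def is_stale(last_success, now=None):
    """Return True when the last successful ETL run is older than the SLA,
    i.e. when the monitor should raise an alarm."""
    now = now or datetime.now(timezone.utc)
    return (now - last_success) > FRESHNESS_SLA

now = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
print(is_stale(datetime(2023, 5, 1, 4, 0, tzinfo=timezone.utc), now))  # True: 8h old
print(is_stale(datetime(2023, 5, 1, 8, 0, tzinfo=timezone.utc), now))  # False: 4h old
```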
Implement and support reporting and analytics infrastructure for internal business customers using AWS services such as Athena, Redshift, Redshift Spectrum, EMR, and QuickSight.
Develop and maintain data security and permissions solutions for enterprise-scale data warehouse and data lake implementations, including data encryption, database user access controls, and logging.
Develop data objects for business analytics using data modeling techniques.
Develop and optimize data warehouse and data lake tables using best practices for DDL, physical and logical table design, data partitioning, compression, and parallelization.
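For example, partition pruning in engines such as Athena, Spark, and Redshift Spectrum relies on Hive-style key layouts in object storage. This sketch builds such an S3 prefix (the bucket name and partition columns are hypothetical):

```python
from datetime import date

def partition_key(table_prefix, dt, region):
    """Build a Hive-style partitioned S3 key prefix; query engines can
    prune on these path components to skip irrelevant data entirely."""
    return f"{table_prefix}/dt={dt.isoformat()}/region={region}/"

key = partition_key("s3://example-bucket/capacity_usage", date(2023, 5, 1), "us-east-1")
print(key)  # s3://example-bucket/capacity_usage/dt=2023-05-01/region=us-east-1/
```

Choosing partition columns that match common query predicates (dates, regions) is what turns full-table scans into cheap targeted reads.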
Develop and maintain data warehouse and data lake metadata, data catalog, and user documentation for internal business customers.
Help internal business customers develop, troubleshoot, and optimize complex SQL and ETL solutions to solve reporting, metrics, and analytics problems.
Work with internal business customers and software development teams to gather and document requirements for data publishing and data consumption via data warehouse, data lake, and analytics solutions.
Develop, test, and deploy code using internal software development toolsets. This includes the code for deploying infrastructure and solutions for secure data storage, ETL pipelines, data catalog, and data query.
Bachelor’s degree in Computer Science or related technical field, or equivalent work experience.
6+ years of overall work experience spanning data engineering, database engineering, and business intelligence.
Proven experience with SQL on large data sets, data modeling, ETL development, and data warehousing, or comparable skills.
Experience with the AWS technology stack, including Lambda, Glue, Redshift, RDS, S3, and EMR, or a similar big data stack.
Proficiency in a scripting or programming language such as Python, Ruby, Scala, or Java.
Experience operating very large data warehouses or data lakes.
A real passion for technology. We are looking for someone who is keen to demonstrate their existing skills while trying new approaches.
Expertise in designing, coding, tuning, and optimizing big data ETL processes using Apache Spark or similar technologies.
Experience building data pipelines and applications that stream and process datasets at low latency.
Demonstrated rigor in handling data: tracking data lineage, ensuring data quality, and improving data discoverability.
Sound knowledge of distributed systems and data architectures (e.g., the Lambda architecture): designing and implementing batch and stream data processing pipelines, and knowing how to optimize the distribution, partitioning, and MPP execution of high-level data structures.
Knowledge of engineering and operational excellence practices using standard methodologies.
Amazon is an equal opportunities employer, and we value your passion to discover, invent, simplify and build. We welcome applications from all members of society irrespective of age, sex, disability, sexual orientation, race, religion or belief.
Amazon is strongly committed to diversity within its community and especially welcomes applications from South African citizens who are members of designated groups who may contribute to Employment Equity within the workplace and the further diversification of ideas.
In this regard, the relevant laws and principles associated with Employment Equity will be considered when appointing potential candidates.
We are required by law to verify your ability to work lawfully in South Africa. Amazon requires that you submit a copy of either your identity document or your passport and any applicable work permit if you are a foreign national, along with an updated curriculum vitae.