A Data Engineer is responsible for understanding the client's technical requirements and for designing and
building data pipelines to support them. In this role, the Data Engineer, besides developing the
solution, will also oversee other Engineers' development work. This role requires strong verbal and written
communication skills and the ability to communicate effectively with the client and the internal team. A strong
understanding of databases, SQL, cloud technologies, and modern data integration and orchestration
tools such as Azure Data Factory (ADF), Informatica, and Airflow is required to succeed in this role.
Play a critical role in the design and implementation of data platforms for AI products.
Develop productized and parameterized data pipelines that feed AI products leveraging GPUs and
CPUs.
Develop efficient data transformation code in Spark (in Python and Scala) and Dask.
Build workflows to automate data pipelines using Python and Argo.
Develop data validation tests to assess the quality of the input data.
Conduct performance testing and profiling of the code using a variety of tools and techniques.
Guide Data Engineers in delivery teams to follow best practices when deploying data pipeline
workflows.
Build data pipeline frameworks to automate high-volume and real-time data delivery for our data
hub.
Operationalize scalable data pipelines to support data science and advanced analytics.
Optimize customer data science workloads and manage cloud service costs and utilization.
Develop sustainable, data-driven solutions with current, new-generation data technologies to drive
our business and technology strategies.
Minimum Education:
o Bachelor's, Master's, or Ph.D. degree in Computer Science or Engineering.
Minimum Work Experience (years):
o 5+ years of experience programming with at least one of the following languages: Python,
Scala, Go.
o 5+ years of experience in SQL and data transformation.
o 5+ years of experience in developing distributed systems using open-source technologies
such as Spark and Dask.
o 5+ years of experience with relational databases or NoSQL databases running in Linux
environments (MySQL, MariaDB, PostgreSQL, MongoDB, Redis).
Key Skills and Competencies:
o Experience working with AWS, Azure, or GCP environments is highly desired.
o Experience with data models in the retail and consumer products industry is desired.
o Experience working on agile projects and understanding of agile concepts is desired.
o Demonstrated ability to learn new technologies quickly and independently.
o Excellent verbal and written communication skills, especially in technical communications.
o Ability to work and achieve stretch goals in a very innovative and fast-paced environment.
o Ability to work collaboratively in a diverse team environment.
o Ability to telework.
o Expected travel: Not expected.