Work on the data pipelines for capturing historical snapshots of both inputs and product outputs
Performance tuning of the pipelines using industry best practices
Design and develop batch orchestration in a way that minimizes disruption to system usage
Develop PySpark/Python code for data transformation and API data extraction in batch jobs
Contribute to overall product architecture and make it best-in-class from a performance and scalability standpoint
Learn new technologies as needed for product use cases
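The snapshot-capture responsibility above usually comes down to writing each batch run's inputs and outputs to a date-partitioned location. A minimal sketch in plain Python; the path layout, bucket name, and function name are illustrative assumptions, not the actual pipeline:

```python
from datetime import date

def snapshot_path(base: str, dataset: str, run_date: date) -> str:
    """Build a date-partitioned path for a historical snapshot,
    e.g. .../inputs/year=2024/month=01/day=15 (hypothetical layout)."""
    return (
        f"{base}/{dataset}"
        f"/year={run_date.year:04d}"
        f"/month={run_date.month:02d}"
        f"/day={run_date.day:02d}"
    )

# In a PySpark batch job this path would typically feed DataFrame.write, e.g.:
#   df.write.mode("overwrite").parquet(snapshot_path(base, "inputs", run_date))
print(snapshot_path("s3://bucket/snapshots", "inputs", date(2024, 1, 15)))
# -> s3://bucket/snapshots/inputs/year=2024/month=01/day=15
```

Partitioning by run date keeps every historical version addressable, so downstream jobs can reprocess any past snapshot without overwriting newer data.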
What you should have
4 years of coding experience in PySpark
Worked with different aspects of the Spark ecosystem, including Spark SQL, DataFrames, Datasets, and streaming data
4+ years of experience as a data engineer
Deep understanding of and experience with PySpark, plus some experience with data lakes and Delta tables
Skilled in big data tools, building data pipelines, ETL design, and implementation
Must have strong programming skills in Python; Scala is a plus
Familiar with Python (especially libraries such as pandas); able to performance-tune jobs and move data with PySpark
Experienced in writing production-level code, optimizing data processing, identifying performance bottlenecks and their root causes, and resolving defects
Collaborates effectively with cross-functional teams to achieve product goals
Familiar with software development best practices (Git, CI/CD, Unit Testing)
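The unit-testing expectation above is easiest to meet when transformation logic is written as pure functions that can be exercised without a Spark cluster. A hedged sketch; the function and field names are illustrative assumptions:

```python
def normalize_record(record: dict) -> dict:
    """Example transformation: trim an id string and coerce an amount
    to float. In a PySpark job the same logic could run inside
    mapPartitions or be re-expressed as DataFrame column expressions."""
    return {
        "id": str(record["id"]).strip(),
        "amount": float(record.get("amount", 0) or 0),
    }

def test_normalize_record():
    # Whitespace is stripped and a missing amount defaults to 0.0
    assert normalize_record({"id": " a1 "}) == {"id": "a1", "amount": 0.0}
    assert normalize_record({"id": "b2", "amount": "3.5"})["amount"] == 3.5

test_normalize_record()
```

Keeping the core logic Spark-free like this lets the same tests run in CI/CD on every commit, without provisioning a cluster.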
Job Classification
Industry: Film / Music / Entertainment
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Data Engineer
Employment Type: Full time