Job Description:
We are seeking a highly skilled and motivated Python/PySpark Developer to join our growing team. In this role, you will be responsible for designing, developing, and maintaining high-performance data processing pipelines using Python and the PySpark framework. You will work closely with data engineers, data scientists, and other stakeholders to deliver impactful data-driven solutions.
Responsibilities:
- Design, develop, and implement scalable and efficient data pipelines using PySpark.
- Write clean, well-documented, and maintainable Python code.
- Optimize data processing performance and resource utilization.
- Implement ETL (Extract, Transform, Load) processes to migrate and transform data across various systems (a minimal PySpark sketch follows this list).
- Collaborate with data scientists and analysts to understand data requirements and translate them into technical solutions.
- Troubleshoot and debug data processing issues.
- Stay up to date with the latest advances in big data technologies and best practices.
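
The sketch below illustrates the kind of PySpark ETL work described in the responsibilities above. It is only a minimal example: the bucket paths, column names, and aggregation logic are hypothetical placeholders, not part of any actual project.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical app name, paths, and columns, for illustration only.
    spark = SparkSession.builder.appName("orders_etl").getOrCreate()

    # Extract: read raw CSV data from object storage.
    orders = spark.read.option("header", True).csv("s3://example-bucket/raw/orders.csv")

    # Transform: cast types, drop invalid rows, and aggregate per customer and day.
    daily_totals = (
        orders
        .withColumn("amount", F.col("amount").cast("double"))
        .filter(F.col("amount") > 0)
        .groupBy("customer_id", "order_date")
        .agg(F.sum("amount").alias("total_amount"))
    )

    # Load: write the result as Parquet partitioned by date for downstream consumers.
    daily_totals.write.mode("overwrite").partitionBy("order_date").parquet(
        "s3://example-bucket/curated/daily_totals"
    )

    spark.stop()

Partitioning the output by date is one common choice for keeping downstream reads efficient; the right layout depends on how the data is actually queried.
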
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 3+ years of experience in Python development.
- 2+ years of experience with PySpark and the Spark ecosystem.
- Strong understanding of data structures, algorithms, and object-oriented programming.
- Experience with SQL and relational databases.
- Familiarity with cloud platforms such as AWS, Azure, or GCP (preferred).
- Excellent problem-solving and analytical skills.
- Strong communication and teamwork skills.
Bonus Points:
- Experience with data visualization tools (e.g., Tableau, Power BI).
- Knowledge of machine learning and data science concepts.
- Experience with containerization technologies (e.g., Docker, Kubernetes).
- Contributions to open-source projects.
Key Skills: PySpark, Cloud Technologies, SQL, Python