Design, develop, and maintain scalable ETL pipelines using PySpark.
Build and orchestrate data workflows using Apache Airflow.
Develop reusable Python modules for data ingestion and transformation.
Collaborate with data scientists and analysts to understand data needs and build robust solutions.
Optimize Spark jobs for performance and cost-efficiency.
Monitor and troubleshoot data pipeline failures and latency issues.
Required Skills
Strong hands-on experience in Python programming.
In-depth knowledge of PySpark and big data processing.
Proficiency in developing and scheduling DAGs in Apache Airflow.
Experience working with SQL, data lakes, and data warehouses.
Familiarity with Git, Linux, and CI/CD tools.
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Data Science & Analytics
Role Category: Data Science & Machine Learning
Role: Data Engineer
Employment Type: Full-time