Role: Python
Technical Expertise
- Expertise in PySpark, database migration, transformation, and integration for data warehousing.
- Strong knowledge of Apache Spark and Python programming.
- Experience in developing data processing tasks using PySpark (data reading, merging, enrichment, loading).
- Familiarity with deployment tools (e.g., Airflow, Control-M) and Unix/Linux shell scripting.
- Skills in advanced data modeling and processing unstructured data.
- Hands-on experience with Jupyter Notebook, Zeppelin, and PyCharm.
- Proficient in AWS S3 filesystem operations.
- Knowledge of Hadoop, Hive, and Cloudera/Hortonworks Data Platforms.
Contributing Responsibilities
- Extensive experience with the Spark processing framework (Spark 2.x/3.x), including Spark SQL and Streaming.
- Strong capabilities in RDBMS (Postgres, Oracle) and NoSQL databases.
- Familiarity with streaming platforms like Apache Kafka and Spark Streaming.
- Experience designing and executing data pipelines using ETL/ELT tools.
- In-depth knowledge of Big Data Hadoop, particularly HDP/CDH migration to the Cloudera CDP platform.
- Ability to optimize and troubleshoot PySpark applications for performance.
Technical & Behavioral Competencies
- Minimum 5 years of experience with PySpark, Kubernetes, and Docker.
- Strong design knowledge in data warehousing concepts.
- Proficient in Unix/Ubuntu scripting and tuning code for large data volumes.
- Capable of translating functional requirements into technical specifications.
- Involved in testing PySpark modules and ETL mappings, and in ensuring client satisfaction.
- Experienced in coding, implementing, debugging, and documenting complex programs.
- Responsible for technical documentation and business needs analysis.
- Provides technical guidance and resolves programming-related issues.
Keyskills: cloudera, hive, hdpe, kubernetes, pyspark, data warehousing, docker, scripting, operations, spark, design, linux, hadoop, big data hadoop, python, rdbms, ubuntu, database migration, transformation, cdp, framework, pycharm, kafka, data warehousing concepts, processing, aws, unix, nosql databases