Proficient in Apache Hive: writing complex queries, partitioning, bucketing, and performance tuning.
Strong programming experience with PySpark RDDs, DataFrames, Spark SQL, and UDFs.
Experience working with the Hadoop ecosystem (HDFS, YARN, Oozie, etc.).
Good understanding of distributed computing principles and data formats such as Parquet, Avro, and ORC.
Strong SQL and debugging skills.
Familiarity with version control tools like Git and workflow schedulers like Airflow or Oozie.
Preferred Skills:
Exposure to cloud-based big data platforms such as AWS EMR, Azure Data Lake, or GCP Dataproc.
Experience with performance tuning of Spark jobs and Hive queries.
Knowledge of Scala or Java is a plus.
Familiarity with data governance, data masking, and security best practices.
Experience with CI/CD pipelines, Docker, or container-based deployments is an advantage.
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Software Development - Other
Employment Type: Full time
Contact Details:
Company: Manvision Technologies
Location(s): Chennai