Senior PySpark Developer @ Victrix Systems

Job Description

Key Responsibilities:

- Design and develop scalable PySpark pipelines to ingest, parse, and process XML datasets with extreme hierarchical complexity.
- Implement efficient XPath expressions, recursive parsing techniques, and custom schema definitions to extract data from nested XML structures.
- Optimize Spark jobs through partitioning, caching, and parallel processing to handle terabytes of XML data efficiently.
- Transform raw hierarchical XML data into structured DataFrames for analytics, machine learning, and reporting use cases.
- Collaborate with data architects and analysts to define data models for nested XML schemas.
- Troubleshoot performance bottlenecks and ensure reliability in distributed environments (e.g., AWS, Databricks, Hadoop).
- Document parsing logic, data lineage, and optimization strategies for maintainability.

Qualifications:

- 5+ years of hands-on experience with PySpark and Spark XML libraries (e.g., `spark-xml`) in production environments.
- Proven track record of parsing XML data with 20+ levels of nesting using recursive methods and schema inference.
- Expertise in XPath, XQuery, and DataFrame transformations (e.g., `explode`, `struct`, `selectExpr`) for hierarchical data.
- Strong understanding of Spark optimization techniques: partitioning strategies, broadcast variables, and memory management.
- Experience with distributed computing frameworks (e.g., Hadoop, YARN) and cloud platforms (AWS, Azure, GCP).
- Familiarity with big data file formats (Parquet, Avro) and orchestration tools (Airflow, Luigi).
- Bachelor's degree in Computer Science, Data Engineering, or a related field.

Preferred Skills:

- Experience with schema evolution and versioning for nested XML/JSON datasets.
- Knowledge of Scala or Java for extending Spark XML libraries.
- Exposure to Databricks, Delta Lake, or similar platforms.
- Certifications in AWS/Azure big data technologies.

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Data Platform Engineer
Employment Type: Full time

Contact Details:

Company: Victrix Systems
Location(s): Mumbai


Key skills: PySpark, Java, Data Pipeline, Scala, Hadoop, Cloud, Big Data, Spark, Distributed Computing

₹ Not Disclosed

Victrix Systems

We are a vibrant systems engineering start-up based out of Pune, India, that is trusted for designing the best user-focused technology solutions. We deliver human-led tech-enabled decision systems for small and medium businesses worldwide. We build intelligent systems focusing on quality, performanc...