Data Engineering Lead
Company Name: Blackstraw.ai
Office Location: Chennai (Work from Office)
Job Type: Full-time
Experience: 10 - 15 Years
Candidates who can join immediately will be preferred.
Job Description:
As a Lead Data Engineer, you will oversee data architecture, ETL processes, and analytics
pipelines, ensuring efficiency, scalability, and quality.
Key Responsibilities:
Working with clients to understand their data.
Building the data structures and pipelines based on that understanding.
You will be working on the application end to end, collaborating with UI and other
development teams.
You will be working with various cloud providers such as Azure & AWS.
You will be engineering data using the Hadoop/Spark ecosystem.
You will be responsible for designing, building, optimizing and supporting new and existing data
pipelines.
Orchestrating jobs using tools such as Oozie, Airflow, etc.
Developing programs for cleaning and processing data.
You will be responsible for building the data pipelines to migrate and load the data into the HDFS
either on-prem or in the cloud.
Developing Data ingestion/process/integration pipelines effectively.
Creating Hive data structures and metadata, and loading the data into data lakes / big data
warehouse environments.
Optimizing (performance tuning) data pipelines to minimize cost.
Keeping code version control and the Git repository up to date.
You should be able to explain the data pipeline to internal and external stakeholders.
You will be responsible for building and maintaining CI/CD of the data pipelines.
You will be managing the unit testing of all data pipelines.
Tech Stack:
Minimum of 5 years' working experience with the Spark and Hadoop ecosystems.
Minimum of 4 years' working experience designing data streaming pipelines.
Should be an expert in at least one of Python, Scala, or Java.
Should have experience in data ingestion and integration into a data lake using Hadoop ecosystem
tools such as Sqoop, Spark, SQL, Hive, Airflow, etc.
Should have experience optimizing (Performance tuning) data pipelines.
Should have a minimum of 3 years' experience with NoSQL and Spark Streaming.
Knowledge of Kubernetes and Docker is a plus.
Should have experience with cloud services, either Azure or AWS.
Should have experience with on-prem distributions such as Cloudera/Hortonworks/MapR.
Basic understanding of CI/CD pipelines.
Basic knowledge of Linux environment and commands.
Preferred Qualifications:
Bachelor's degree in computer science or a related field.
Proven experience with big data ecosystem tools such as Sqoop, Spark, SQL, API, Hive, Oozie,
Airflow, etc.
Solid experience in all phases of the SDLC (plan, design, develop, test,
release, maintain, and support), with 10+ years of experience.
Hands-on experience using Azure's data engineering stack.
Should have implemented projects using programming languages such as Scala or Python.
Working experience with complex SQL data merging techniques, such as window functions, etc.
Hands-on experience with on-prem distributions such as Cloudera/Hortonworks/MapR.
Should have excellent communication, presentation and problem solving skills.
Key Traits:
Should have excellent communication skills.
Should be self-motivated and willing to work as part of a team.
Should be able to collaborate and coordinate with onshore and offshore teams.
Be a problem solver and proactively tackle the challenges that come your way.
Key Skills: PySpark, Azure Databricks, Spark, Databricks, Python