Job Description
About CloudKeeper
CloudKeeper is a cloud cost optimization partner that combines the power of
group buying & commitments management, expert cloud consulting & support,
and an enhanced visibility & analytics platform to reduce cloud costs and
help businesses maximize the value they get from AWS, Microsoft Azure, &
Google Cloud.
A certified AWS Premier Partner, Azure Technology Consulting Partner,
Google Cloud Partner, and FinOps Foundation Premier Member, CloudKeeper
has helped 400+ global companies save an average of 20% on their cloud bills,
modernize their cloud set-up, and maximize value, all while maintaining
flexibility and avoiding any long-term commitments or costs.
CloudKeeper was hived off from TO THE NEW, a digital technology services
company with 2500+ employees and an 8-time GPTW winner.
Position Overview:
We are looking for an experienced and driven Data Engineer to join our team.
The ideal candidate will have a strong foundation in big data technologies,
particularly Spark, and a basic understanding of Scala to design and implement
efficient data pipelines. As a Data Engineer at CloudKeeper, you will be
responsible for building and maintaining robust data infrastructure, integrating
large datasets, and ensuring seamless data flow for analytical and operational
purposes.
Key Responsibilities:
- Design, develop, and maintain scalable data pipelines and ETL processes to collect, process, and store data from various sources.
- Work with Apache Spark to process large datasets in a distributed environment, ensuring optimal performance and scalability.
- Develop and optimize Spark jobs and data transformations using Scala for large-scale data processing.
- Collaborate with data analysts and other stakeholders to ensure data pipelines meet business and technical requirements.
- Integrate data from different sources (databases, APIs, cloud storage, etc.) into a unified data platform.
- Ensure data quality, consistency, and accuracy by building robust data validation and cleansing mechanisms.
- Use cloud platforms (AWS, Azure, or GCP) to deploy and manage data processing and storage solutions.
- Automate data workflows and tasks using appropriate tools and frameworks.
- Monitor and troubleshoot data pipeline performance, optimizing for efficiency and cost-effectiveness.
- Implement data security best practices, ensuring data privacy and compliance with industry standards.
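As a rough illustration of the validation and cleansing work described above, here is a minimal, self-contained Python sketch. The record shape and rules are hypothetical, and in a real pipeline this logic would typically run inside a Spark/PySpark transformation rather than over plain lists:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical record shape for a cloud-billing row.
    account: str
    service: str
    cost_usd: float

def clean(records):
    """Drop invalid rows and normalize fields (hypothetical rules)."""
    return [
        # Normalize the service name: trim whitespace, lowercase.
        replace(r, service=r.service.strip().lower())
        for r in records
        # Validation: drop rows with an empty account or a negative cost.
        if r.account and r.cost_usd >= 0.0
    ]

raw = [
    UsageRecord("acct-1", " EC2 ", 12.5),
    UsageRecord("", "S3", 3.0),         # invalid: empty account
    UsageRecord("acct-2", "S3", -1.0),  # invalid: negative cost
]
print(clean(raw))  # only the first record survives, service normalized to "ec2"
```

In a Spark job the same predicate and normalization would be expressed as `filter` and `map`/`withColumn` steps over a DataFrame or Dataset, so the rules stay testable in isolation.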
Required Qualifications:
- 4-6 years of experience as a Data Engineer or in an equivalent role.
- Strong experience with Apache Spark and Scala for distributed data processing and big data handling.
- Basic knowledge of Python and its application in Spark for writing efficient data transformations and processing jobs.
- Proficiency in SQL for querying and manipulating large datasets.
- Experience with cloud data platforms, preferably AWS (e.g., S3, EC2, EMR, Redshift) or other cloud-based solutions.
- Strong knowledge of data modeling, ETL processes, and data pipeline orchestration.
- Familiarity with containerization (Docker) and cloud-native tools for deploying data solutions.
- Knowledge of data warehousing concepts and experience with tools like AWS Redshift, Google BigQuery, or Snowflake is a plus.
- Experience with version control systems such as Git.
- Strong problem-solving abilities and a proactive approach to resolving technical challenges.
- Excellent communication skills and the ability to work collaboratively within cross-functional teams.
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Data Science & Analytics
Role Category: Business Intelligence & Analytics
Role: Data Analyst
Employment Type: Contract
Contact Details:
Company: CloudKeeper
Location(s): Noida, Gurugram
Keyskills:
Data Engineering
Scala
PySpark
Scala Programming
Python Framework
SQL Queries
Spark
Python
SQL