The Cloud and Data Operations Specialist-I is responsible for the ongoing monitoring, support, and optimization of cloud infrastructure and data operations. This role focuses on ensuring smooth data operations and the reliability and performance of cloud systems and services. The specialist will collaborate closely with cloud and data engineers and business stakeholders to identify and resolve issues, follow up on the implementation of improvements, and maintain cloud services and data operations.
Responsibilities:
Diligently observe and interpret system alerts and dashboards for cloud and data operations.
Ensure the smooth operation of data pipelines running on systems such as Databricks or Spark.
Respond to and resolve incidents, ensuring minimal disruption and rapid recovery with the help
of CloudOps and DE/DS teams.
Escalate urgent and complex issues appropriately, ensuring compliance with SLAs and internal
KPIs.
Collaborate with internal departments and occasionally with customers to ensure smooth
operations.
Execute routine tasks according to schedules, including DataOps, system maintenance tasks, and
internal audits.
Work with cloud and data engineers and with development and operations teams to improve the
reliability, scalability, and performance of cloud systems, services, and data operations.
Maintain up-to-date documentation of cloud observability and data operations.
Identify opportunities to automate routine tasks and improve performance, and work with
cloud engineering and DE/DS teams to increase operational efficiency.
Minimum Education:
Bachelor's degree in computer science or a related field
Minimum Work Experience (years):
1+ years of experience working on an operations-style team such as NOC, data operations, or support
Key Skills and Competencies:
Experience with Linux, Windows OS, web services, networking, databases, public cloud platforms (Azure, AWS, and GCP), and data services.
Strong understanding of monitoring and observability concepts, including infrastructure and systems monitoring, APM, system availability, latency, performance, and end-to-end monitoring
Strong understanding of data pipelines, engineering, and operations concepts
Experience with data integration tools
Proficiency in SQL and at least one scripting language (Python, Bash).
Experience operating in distributed systems
Experience with ITIL and ITSM processes, including Incident, Problem, Change, Knowledge, and Event
Management.
Knowledge of enterprise and open-source monitoring and log management systems.
Strong problem-solving and analytical skills
Excellent attention to detail and organizational skills
Ability to perform tasks with minimal supervision and to work both independently and as part
of a team
A proactive approach to identifying issues and problems and collaborating to implement process improvements
Ability to multi-task, follow instructions properly, and adapt in a fast-paced environment
Good sense of ownership and commitment towards work, with adherence to standards and procedures
Critical and creative thinking skills, and a willingness to learn modern technologies and skills
Ability to work evening shifts, weekends, and holidays
Flexibility to adapt to shifting or changing schedules
Willingness to report for overtime or to work on rest days if needed
Ability to work extended shifts if necessary
Keyskills: analytical, web services, data warehousing, DBMS, data pipeline, dashboards, SQL, operations, GCP, Spark, Linux, ETL, scripting languages, data operations, Python, data services, data analysis, Microsoft Azure, engineering, monitoring, data engineering, Tableau, concepts, ITSM, Bash, AWS, ITIL