Job Description
Senior xOps Specialist: AIOps, MLOps & DataOps Architect
Location: Chennai, Pune
Employment Type: Full-time, Hybrid
Experience Required: 12-15 years
Job Summary:
We are seeking a Senior xOps Specialist to architect, implement, and optimize AI-driven operational frameworks across AIOps, MLOps, and DataOps.
The ideal candidate will design and enhance intelligent automation, predictive analytics, and resilient pipelines for large-scale data engineering, AI/ML deployments, and IT operations.
This role requires deep expertise in AI/ML automation, data-driven DevOps strategies, observability frameworks, and cloud-native orchestration.
Key Responsibilities
Design & Architecture
AIOps: AI-Driven IT Operations & Automation
- Architect AI-powered observability platforms, ensuring predictive incident detection and autonomous IT operations.
- Implement AI-driven root cause analysis (RCA) for proactive issue resolution and performance optimization.
- Design self-healing infrastructures leveraging machine learning models for anomaly detection and remediation workflows.
- Establish event-driven automation strategies, enabling autonomous infrastructure scaling and resilience engineering.
MLOps: Machine Learning Lifecycle Optimization
- Architect end-to-end MLOps pipelines, ensuring automated model training, validation, deployment, and monitoring.
- Design CI/CD pipelines for ML models, embedding drift detection, continuous optimization, and model explainability.
- Implement feature engineering pipelines, leveraging data versioning, reproducibility, and intelligent retraining techniques.
- Ensure secure and scalable AI/ML environments, optimizing GPU-accelerated processing and cloud-native model serving.
DataOps: Scalable Data Engineering & Pipelines
- Architect data processing frameworks, ensuring high-performance, real-time ingestion, transformation, and analytics.
- Build data observability platforms, enabling automated anomaly detection, data lineage tracking, and schema evolution.
- Design self-optimizing ETL pipelines, leveraging AI-driven workflows for data enrichment and transformation.
- Implement governance frameworks, ensuring data quality, security, and compliance with enterprise standards.
Automation & API Integration
- Develop Python- or Go-based automation scripts for AI model orchestration, data pipeline optimization, and IT workflows.
- Architect event-driven xOps frameworks, enabling intelligent orchestration for real-time workload management.
- Implement AI-powered recommendations, optimizing resource allocation, cost efficiency, and performance benchmarking.
Cloud-Native & DevOps Integration
- Embed AI/ML observability principles within DevOps pipelines, ensuring continuous monitoring and retraining cycles.
- Architect cloud-native solutions optimized for Kubernetes, containerized environments, and scalable AI workloads.
- Establish AIOps-driven cloud infrastructure strategies, automating incident response and operational intelligence.
Qualifications & Skills
xOps Expertise
- Deep expertise in AIOps, MLOps, and DataOps, designing AI-driven operational frameworks.
- Proficiency in automation scripting, leveraging Python, Go, and AI/ML orchestration tools.
- Strong knowledge of AI observability, ensuring resilient IT operations and predictive analytics.
- Extensive experience in cloud-native architectures, Kubernetes orchestration, and serverless AI workloads.
- Ability to troubleshoot complex AI/ML pipelines, ensuring optimal model performance and data integrity.
Preferred Certifications (Optional):
- AWS Certified Machine Learning - Specialty
- Google Cloud Professional Data Engineer
- Certified Kubernetes Administrator (CKA)
- DevOps Automation & AIOps Certification
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: DevOps Consultant / Architect
Employment Type: Full-time
Contact Details:
Company: CitiusTech
Location(s): Pune
Keyskills:
devops
Kubeflow
mlops
vLLM
aiops
xops
ML pipeline automation
observability
cka
KServe
Machine Learning Operations
dataops