Grid Dynamics wants to build a centralized, observable and secure platform for its ML, Computer Vision, LLM and SLM models. The company also wants to onboard a large number of AI agents covering multiple required skills, while maintaining control and security over their usage and availability. The platform must be vendor-agnostic, easy to extend to multiple types of AI applications, and flexible in terms of technologies, frameworks and data types.
This project focuses on establishing a centralized LLMOps capability in which every ML, CV and AI-enabled application is monitored, observed and secured, and logs all of its activity.
The solution consists of key building blocks that monitor every step in a RAG, Multimodal RAG or Agentic Platform, track performance and provide curated datasets for potential fine-tuning.
In alignment with business scenarios, Grid Dynamics also provides guardrails that allow or block user-to-agent, agent-to-agent or agent-to-user interactions. Guardrails will also enable predefined workflows, aimed at giving more control over sequences of LLM chains.
Essential functions
Job Role: Lead MLOps Engineer
Location: Hyderabad/Bangalore/Chennai
Experience: 7+ Years
Roles & Responsibilities:
- Chunking (primarily focused on VectorDB storage)
- Document Parsing and OCR
- Document Parsing with VLMs (Vision Language Models)
- Function Calling with LLMs
- Retrieval Augmented Generation
- Traditional Search (BM25, NER based parsers, Keyword based search index)
- Semantic Search (Embeddings, Embedding models)
- Fine Tuning using LoRA
- Merging multiple LoRA adapters using MergeKit
- Quantising LLMs
- Prompt Engineering techniques
- 4+ years with Azure (ML pipeline components), Azure Databricks, Azure DevOps.
- Proven experience designing and deploying end-to-end ML pipelines.
- Experience building infrastructure for classic DS models and/or LLMs/SLMs.
- 4+ years with orchestration (e.g., Kubeflow, Airflow, Azure Data Factory) and CI/CD for ML.
- Experience deploying containerized ML solutions (e.g., Docker/Kubernetes).
- Knowledge of model and data versioning (e.g., MLflow, DVC).
- Knowledge of MLSecOps (security in the context of MLOps).
- Experience with Infrastructure as Code (e.g., Terraform, CloudFormation).
- Knowledge of MLOps for LLMs/SLMs.
- Experience in ML system/architecture design (load balancing, caching, failover).
- Knowledge of building scalable, resilient ML architectures.
- Cross-team collaboration experience (data science, engineering, DevOps).
- Experience with monitoring/logging for production models (e.g., Prometheus, Grafana, ELK stack).
Qualifications
- 7+ years in MLOps / DevOps / data engineering
- Strong Python skills, plus experience with MLflow, Kubeflow, or Airflow
- Hands-on with Docker, Kubernetes, and cloud platforms
- Knowledge of data and model versioning tools (e.g., DVC, MLflow)
We offer
- Opportunity to work on bleeding-edge projects
- Work with a highly motivated and dedicated team
- Competitive salary
- Flexible schedule
- Benefits package - medical insurance, sports
- Corporate social events
- Professional development opportunities
- Well-equipped office