Job Description
Role overview:
We're building a next-gen LLMOps team at Fractal to industrialize GenAI implementation and shape the future of GenAI engineering. This is a hands-on technical leadership role for AI engineers with strong ML and DevOps skills, ideal for those who love building scalable systems from the ground up. You will design, deploy, and scale GenAI and Agentic AI applications with robust lifecycle automation and observability.
Required Qualifications:
- 10-14 years of experience working on ML projects, with a product-building mindset, strong hands-on skills, technical leadership, and experience leading development teams
- Model development, training, deployment at scale, and performance monitoring for production use cases
- Strong knowledge of Python, data engineering, FastAPI, and NLP
- Knowledge of LangChain, LlamaIndex, Langtrace, Langfuse, LLM evaluation, MLflow, and BentoML
- Should have worked on proprietary and open-source LLMs
- Experience with LLM fine-tuning, including PEFT/CPT
- Experience creating Agentic AI workflows using frameworks like CrewAI, LangGraph, AutoGen, and Semantic Kernel
- Experience in performance optimization, RAG, guardrails, AI governance, prompt engineering, evaluation, and observability
- Experience deploying GenAI applications to production at scale, on cloud and on-premises, using DevOps practices
- Experience in DevOps and MLOps
- Good working knowledge of Kubernetes and Terraform
- Experience with at least one cloud (AWS / GCP / Azure) for deploying AI services
- Team player with excellent communication and presentation skills
Must have skills:
- Product Thinking: Ideate, prototype, and scale internal accelerators and reusable components for LLMOps
- Architect and build scalable LLMOps platforms for enterprise-grade GenAI systems
- Design and manage end-to-end LLM pipelines, from data ingestion and embedding to evaluation and inference
- Drive LLM-specific infrastructure: memory management, token control, prompt chaining, and context optimization
- Lead scalable deployment frameworks for LLMs using Kubernetes and GPU-aware scaling
- Build agentic AI operations capabilities, including agent evaluation, observability, orchestration, and reflection loops
- GenAI Engineering: Productionize LLM-powered applications with modular, reusable, and secure patterns
- Pipeline Architecture: Create evaluation pipelines, including prompt orchestration, feedback loops, and fine-tuning workflows
- Prompt & Model Management: Design systems for versioning, AI governance, automated testing, and prompt quality scoring
- Scalable Deployment: Architect cloud-native and hybrid deployment strategies for large-scale inference
- Guardrails & Observability: Implement output filtering, context-aware routing, evaluation harnesses, metrics logging, and incident response
- DevOps & Platform Automation: Drive end-to-end automation with Docker, Kubernetes, GitOps, Terraform, etc.
Must-Have Technical Skills
- LLMOps frameworks: LangChain, MLflow, BentoML, Ray, Truss, FastAPI
- Prompt evaluation and scoring systems: OpenAI Evals, Ragas, Rebuff, Outlines
- Cloud-native deployment: Kubernetes, Helm, Terraform, Docker, GitOps
- ML pipelines: Airflow, Prefect, Feast (feature store)
- Data stack: Spark/Flink, Parquet/Delta, Lakehouse patterns
- Cloud: Azure ML, GCP Vertex AI, AWS Bedrock/SageMaker
- Languages: Python (must), Bash, YAML, Terraform HCL (preferred)
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Data Science & Analytics
Role Category: Data Science & Machine Learning
Role: NLP / DL Engineering / Architect
Employment Type: Full time
Contact Details:
Company: Fractal Analytics
Location(s): Pune
Keyskills:
MLOps
LLM
DevOps