Job Description
Location: Pune, India
About the Role
We are looking for a Principal DevOps Engineer to join our high-impact team in Pune, India.
You will lead the design and implementation of scalable, secure, and highly available
infrastructure across both cloud and on-premise environments. This role demands a deep
understanding of Linux systems, infrastructure automation, and performance tuning,
especially in high-performance computing (HPC) setups.
As a technical leader, you'll collaborate closely with development, QA, and operations teams
to drive DevOps best practices, tool adoption, and overall infrastructure reliability.
Key Responsibilities
Design, build, and maintain Linux-based infrastructure across cloud (primarily AWS) and physical data centers.
Implement and manage Infrastructure as Code (IaC) using tools such as CloudFormation, Terraform, Ansible, and Chef.
Develop and manage CI/CD pipelines using Jenkins, Git, and Gerrit to support continuous delivery.
Automate provisioning, configuration, and software deployments using tools such as Bash, Python, and Ansible.
Set up and manage monitoring/logging systems such as Prometheus, Grafana, and the ELK stack.
Optimize system performance and troubleshoot critical infrastructure issues related to networking, filesystems, and services.
Configure and maintain storage and filesystems including ext4, xfs, LVM, NFS, iSCSI, and potentially Lustre.
Manage PXE boot infrastructure using Cobbler/Kickstart, and create/maintain custom ISO images.
Implement infrastructure security best practices, including IAM, encryption, and firewall policies.
Act as a DevOps thought leader, mentor junior engineers, and recommend tooling and process improvements.
Maintain clear and concise documentation of systems, processes, and best practices.
Collaborate with cross-functional teams to ensure reliable and scalable application delivery.
Required Skills & Experience
9+ years of experience in DevOps, SRE, or Infrastructure Engineering.
Deep expertise in Linux system administration, especially around storage, networking, and process control.
Strong proficiency in scripting (e.g., Bash, Python) and configuration management tools (Chef, Ansible).
Proven experience in managing on-premise data center infrastructure, including provisioning and PXE boot tools.
Familiarity with CI/CD systems, Agile workflows, and Git-based source control (Gerrit/GitHub).
Experience with cloud services, preferably AWS, and hybrid cloud models.
Knowledge of virtualization (e.g., KVM, Vagrant) and containerization (Docker, Podman, Kubernetes).
Excellent communication, collaboration, and documentation skills.
Nice to Have
Hands-on experience with Lustre or other distributed/parallel filesystems.
Experience in HPC (High-Performance Computing) environments.
Familiarity with Kubernetes deployments in hybrid clusters.