SRE-II @ Mindtickle

Job Description

As an SRE II, you will play a key role in ensuring the reliability, performance, and scalability of our mission-critical systems. You will work closely with engineering teams to design, implement, and maintain infrastructure that supports high-volume, data-intensive applications. Your expertise in monitoring, troubleshooting, and automation will drive operational excellence across our distributed environment.
 
What's in it for you
  • Maintain and improve the reliability, availability, and performance of high-volume, data-intensive applications.
  • Design, implement, and enhance monitoring, logging, and alerting solutions at scale.
  • Collaborate with development teams to optimize system architecture and reliability.
  • Manage and troubleshoot distributed systems in a Linux-based production environment.
  • Leverage AWS cloud services to scale infrastructure efficiently.
  • Utilize Kubernetes for container orchestration, ensuring optimal resource utilization and deployment strategies.
  • Implement CI/CD pipelines using GitLab to automate deployments and operational tasks.
  • Use infrastructure as code (IaC) tools such as Terraform and CloudFormation for provisioning and managing cloud resources.
  • Implement observability best practices using Grafana, Prometheus, Thanos, and Loki.
  • Perform root cause analysis (RCA) and proactively address performance bottlenecks and system failures.
  • Ensure security best practices and compliance across all infrastructure components.
We'd love to hear from you if you:
  • Have 3+ years of experience in Site Reliability Engineering or related fields.
  • Possess strong Linux fundamentals with a deep understanding of system internals.
  • Have expertise in troubleshooting and problem-solving in distributed environments.
  • Have hands-on experience with logging and monitoring solutions at scale.
  • Are proficient in at least one programming language (preferably Python).
  • Have strong experience with AWS services and Kubernetes.
  • Have exposure to CI/CD pipelines, preferably using GitLab CI/CD.
  • Have experience with infrastructure as code (Terraform, CloudFormation).
  • Are familiar with observability tools such as Grafana, Prometheus, Thanos, and Loki.
Preferred Qualifications
  • Experience in performance tuning and capacity planning.
  • Knowledge of incident management and post-mortem analysis processes.
  • Familiarity with security best practices in cloud environments.
  • Experience in automating operational tasks using scripting and configuration management tools.

Job Classification

Industry: Software Product
Functional Area / Department: Engineering - Hardware & Networks
Role Category: IT Network
Role: System Administrator / Engineer
Employment Type: Full time

Contact Details:

Company: Mindtickle
Location(s): Pune


Key skills: Performance tuning, System architecture, Automation, Linux, Configuration management, Incident management, Distribution system, Monitoring, Python, Capacity planning

Salary: ₹ Not Disclosed

Mindtickle

MindTickle (Startup): MindTickle is the world's leading SaaS platform for sales readiness, founded in 2011 by graduates of IITs, ISB & Stanford. Today, our rapidly growing team consists of 250+ people working from Pune & California. We are proud to be one of the very few category-defining India...