Site Reliability Engineer @ PubMatic

Job Description

About The Role
The Ad Server and RTB (real-time bidding) Production Infrastructure is pivotal to ensuring the reliability, availability, and overall excellence of our software applications. As an SRE, you will be responsible for the Ad Server and RTB Production Infrastructure. Your essential duties include ensuring the seamless operation and optimal performance of large-scale distributed software applications. Your role revolves around maintaining a robust, high-performing environment, contributing to the reliability of our services, and developing innovative solutions to guarantee 24/7 availability. By applying your technical expertise and dedication, you will help maintain a seamless experience for our users while upholding the highest standards of operational excellence.
Your specific responsibilities include:
What You'll Do
Operational Support
  • Be a primary point of contact for operational support of multiple large-scale distributed software applications in the Ad Server environment.
  • Monitor application availability, promptly detect anomalies, analyze their impact, debug problems in production, and follow up on resolution by working closely with the engineering team.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Work diligently with the engineering team to expedite incident resolution and ensure a swift return to normal operations.
  • Be innovative in building dashboards, adding metrics, writing automation scripts to reduce operational toil, and streamlining processes to enhance system reliability and stability.
  • Design and build software and systems to effectively manage the Ad Serving platform, its underlying infrastructure, and its applications.
On-Call Availability and Support
  • Work in shifts to provide continuous on-call support for the production systems, resolving issues independently using predefined handbooks.
  • Show a sense of urgency for high-priority issues and set up war rooms to resolve them.
  • Provide timely updates on high-priority issues and perform handovers when a problem must be worked on around the clock (24x7).
  • Conduct post-incident reviews to identify root causes, recommend preventive measures, and contribute to a culture of learning and improvement.
We'd Love for You to Have
  • 3+ years of experience in software development
  • Ability to program in languages such as C or C++
  • Proficiency in scripting languages such as Shell or Python
  • Prior experience in technical engineering (good to have)
  • A proactive approach to identifying problems, performance bottlenecks, and areas of improvement
  • Must know: networking, database (MySQL), and Linux system concepts
  • Debugging and analyzing core dumps
  • Hands-on experience with monitoring and observability tools such as Grafana, Nagios, Influx, ELK, etc.
  • Familiarity with tools such as Docker and Grafana, and with incident management systems such as Zenduty
  • Excellent communication and collaboration skills, with the ability to work effectively across teams
  • A self-motivated, positive mindset when investigating incidents
  • Excellent interpersonal, written, and verbal communication skills
  • A bachelor's degree in engineering (CS/IT) or an equivalent degree from a well-known institute or university
Additional Information
Return to Office: PubMatic employees throughout the globe have returned to our offices via a hybrid work schedule (3 days "in office" and 2 days "working remotely") that is intended to maximize collaboration, innovation, and productivity among teams and across functions.
Benefits: Our benefits package includes the best of what leading organizations provide, such as paternity/maternity leave, healthcare insurance, and broadband reimbursement. In addition, when we're in the office, we all benefit from a kitchen loaded with healthy snacks and drinks, catered lunches, and much more!
Diversity and Inclusion: PubMatic is proud to be an equal opportunity employer; we don't just value diversity, we promote and celebrate it. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
About PubMatic
PubMatic is one of the world's leading scaled digital advertising platforms, offering more transparent advertising solutions to publishers, media buyers, commerce companies, and data owners, allowing them to harness the power and potential of the open internet to drive better business outcomes. Founded in 2006 with the vision that data-driven decisioning would be the future of digital advertising, we enable content creators to run a more profitable advertising business, which in turn allows them to invest back into the multi-screen and multi-format content that consumers demand.

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: PubMatic
Location(s): Pune


Key skills: software development, ELK, debugging, DBMS, shell scripting, Linux system, communication skills

₹ Not Disclosed

Similar positions

Cloud Platform Engineer

  • Accenture
  • 7 - 12 years
  • Hyderabad
  • 3 days ago
₹ Not Disclosed

Azure DevOps Engineer

  • Vlink
  • 3 - 8 years
  • Noida, Gurugram
  • 3 days ago
₹ 20-25 Lacs P.A.

Cloud Engineer

  • Cradlepoint
  • 4 - 8 years
  • Noida, Gurugram
  • 3 days ago
₹ Not Disclosed

Senior Cloud DevOps Engineer

  • NICE
  • 4 - 7 years
  • Pune
  • 3 days ago
₹ Not Disclosed

PubMatic

PubMatic is the automation solutions company for an open digital media industry. Featuring the leading omni-channel revenue automation platform for publishers and enterprise-grade programmatic tools for media buyers, PubMatic's publisher-first approach enables advertisers to access premium ...