Site Reliability Engineer @ Uplers

Site Reliability Engineer

Uplers
4 - 7 years
Pune
2 months ago

Job Description

Site Reliability Engineer
Experience: 4 - 7 years
Salary: Competitive
Preferred Notice Period: Within 30 days
Shift: 10:00 AM to 7:00 PM IST
Opportunity Type: Remote
Placement Type: Permanent
(*Note: This is a requirement for one of Uplers' clients.)

Must have skills required:
Azure DevOps, SRE concepts, TerraData, CDC, CDC tool, NEWREL
Good to have skills:
AWS CloudWatch

Reflections Info Systems (one of Uplers' clients) is looking for:
A Site Reliability Engineer who is passionate about their work, eager to learn and grow, and committed to delivering exceptional results. If you are a team player with a positive attitude and a desire to make a difference, we want to hear from you.

Role Overview
As a Site Reliability Engineer (SRE), you will be responsible for improving the overall reliability of applications by ensuring their availability, performance, and scalability. You should be able to gather technical requirements from the DevOps team and operational requirements from the Application Support team. Because the SRE role is at the heart of solving production problems, you should take a holistic approach to troubleshooting, delve deeply into technical details, and acquire the domain knowledge needed to troubleshoot and recover from outages, monitor applications in production, and build alerts as required.

Responsibilities include:
Work closely with the application support team.
Monitor critical applications and services to minimize downtime and ensure their availability.
Collaborate with DevOps teams to maintain and monitor CI/CD pipelines.
Deploy new versions to production environments.
Work with project teams to ensure the reliability and maintainability of new and modified releases.
Provide input to risk management practices that will anticipate reliability-related incidents that could adversely impact operations.
Document processes and monitor application performance metrics.
Continuously improve proactive monitoring, alert configuration, and incident response processes to increase reliability and reduce Mean Time to Recovery (MTTR).
Optimize performance and cost efficiency through continuous monitoring, trend analysis, and fine-tuning.
Monitor any abnormal usage that can impact the cost or performance and take corrective actions.
Proactively implement preventive measures to improve system reliability.
Maintain runbooks, Standard Operating Procedures (SOPs), diagrams, and documentation for swift incident response.
Conduct post-incident reviews to improve reliability and contribute to the development of resilience strategies.
Achieve Service Level Indicators (SLIs) that are set to meet reliability objectives.

Certifications:
Azure Solutions Architect Expert (Microsoft)
AWS Certified Solutions Architect (AWS)
Open Group Certified Enterprise Architect (TOGAF)
PMP or PRINCE2 in Project Management

Primary Skills:
Monitoring and Analysis
Continuously monitor CDC dashboards to track service performance and analyze reports.
Oversee production and DevOps infrastructure dashboards, ensuring system stability and identifying potential issues.
Observe alerts from New Relic and escalate them to the respective teams as needed.
Identify duplicated New Relic alerts and optimize alert configurations to reduce noise and improve efficiency.
Track daily alerts in production to enhance alert optimization strategies.
Maintain and update a list of dashboards monitored, including details such as widgets, metrics, and threshold values.
Create and manage dashboards for validating and monitoring CPU optimizations for Rapid and CDC services.
Perform sanity checks on Container Memory Utilization, Missing Pods, Container Restarts, Container CPU Utilization, Active Pods, Node Resource Consumption, and Pod Network Status to ensure system health.
Release and Deployment Management
Coordinate and execute weekly production releases, ensuring services are deployed with optimized CPU values.
Update central repositories with the latest service configurations and CPU requests.
Perform post-deployment sanity checks to validate service stability after production releases.
Redeploy CDC services with optimized CPU values, ensuring system performance improvements.
Monitor new CPU optimizations for Rapid and CDC services, tracking performance improvements and resource utilization.
Incident Management and RCA Documentation
Conduct incident analysis, identifying root causes and documenting findings for continuous improvement.
Maintain detailed Root Cause Analysis (RCA) documentation to track incidents and resolutions.
Provide reports on incident trends, helping improve response times and preventive measures.
Collaboration and Communication
Participate in daily sync-ups and internal meetings to discuss ongoing tasks, challenges, and improvements.
Sync up with the Network Operations Center (NOC) team to align on monitoring strategies and escalations.
Collaborate with the Database (DB) team for performance tuning and issue resolution.
Conduct knowledge transfer (KT) sessions on Rapid Resource Optimization and related best practices.
Optimization and Continuous Improvement
Track CPU optimization efforts, ensuring proper resource allocation and utilization for Rapid and CDC services.
Analyze performance data to refine resource allocation strategies and improve system efficiency.
Identify and implement best practices for reducing alert noise and optimizing monitoring configurations.

Secondary Skills:
Technical Knowledge
Fluent in key AWS services (EBS, S3, compute, storage, RDS, etc.).
Expertise in Kubernetes or any Container Orchestration System.
Knowledge of Infrastructure as Code (IaC).
Linux system administration knowledge.
Knowledge of RDBMS and Document databases.
Knowledge of monitoring tools, including AWS CloudWatch and New Relic.
Additional certification in Microsoft, Linux, Cisco, AWS, or similar technologies is a plus.

How to apply for this opportunity:
Easy 3-Step Process:
1. Click on Apply, then register or log in on our portal.
2. Upload your updated resume and complete the screening form.
3. Increase your chances of getting shortlisted and meet the client for the interview.

About Our Client:
Reflections, a Deloitte Technology Fast 50 company, is an AI-powered digital engineering company on a mission to become the trusted global technology transformation partner to the best brands in the world. It delivers ROI-driven Data Science & Analytics, AI/ML, Cloud, Hyper Automation, Cybersecurity, App and Product Development, Blockchain, and Metaverse solutions to customers across the globe in Retail, Banking and Financial Services, Healthcare, Logistics and Transportation, Automotive, and Media and Entertainment.

About Uplers:
Our goal is to make hiring and getting hired reliable, simple, and fast. Our role is to help all our talents find and apply for relevant product and engineering job opportunities and progress in their careers.

(Note: There are many more opportunities apart from this one on the portal.)

So, if you are ready for a new challenge, a great work environment, and an opportunity to take your career to the next level, don't hesitate to apply today. We are waiting for you!

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: Uplers
Location(s): Pune

Keyskills: Site Reliability Engineering, AWS CloudWatch, CDC tool, CDC, SRE concepts, NEWREL, TerraData, Azure DevOps


Uplers

Uplers is a one-stop digital services company delivering end-to-end web, design, digital marketing, and email production services to businesses and agencies across 52+ nations, backed by a team of 550+ digital experts.
