Job Category: Software Engineering
Job Description:
As an SRE SME, the engineer should ensure the reliability, availability, performance and scalability of the business applications. The role requires close coordination with business, IT and operations teams to manage operational challenges.
Key responsibilities:
- Reliability and uptime:
- Ensure high availability and reliability of critical business systems by implementing best practices in monitoring, alerting and incident management.
- Automation:
- Develop and implement automation tools to eliminate manual processes and improve system efficiency.
- Incident management:
- Respond to system outages, troubleshoot issues and resolve them quickly to minimize downtime and business impact.
- Monitoring and observability:
- Design and manage observability solutions using tools like Prometheus, Grafana or Datadog to monitor system performance.
- Performance and optimization:
- Identify and fix bottlenecks in applications and infrastructure to improve overall system performance.
- Capacity planning:
- Analyse system demands and ensure appropriate scaling to meet future requirements.
- Service Level Objectives:
- Define, measure and monitor SLOs and SLAs to maintain system health and reliability.
- Collaboration:
- Partner with development and operations teams to integrate reliability into the software lifecycle and CI/CD pipelines.
Skills & Qualifications:
- Bachelor’s degree in computer science, engineering or a related field.
- 7+ years of experience in SRE, DevOps or similar roles.
- Strong programming skills in Python, Golang, Java.
- Expertise with cloud platforms (AWS, Azure).
- Proficiency in DevOps tools: Kubernetes, Docker, Terraform, Jenkins.
- Strong knowledge of monitoring and observability tools (Prometheus, Grafana, ELK stack).
- Familiarity with incident management platforms.
- Good understanding of networking, load balancers and DNS.
Preferred:
- Experience with distributed systems and microservices architectures.
- Knowledge of database performance tuning (SQL / NoSQL).
Soft skills:
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration abilities.
- Ability to work in a fast-paced environment and handle multiple tasks effectively.
