Site Reliability Engineer (SN0330)

February 11, 2025

Job Description

We are seeking a dedicated Site Reliability Engineer (SRE) to enhance the reliability, scalability, and efficiency of our cloud-based systems.

The ideal candidate will possess a strong background in cloud infrastructure, automation, and incident management, with a focus on optimizing both system performance and developer productivity.

Key Responsibilities

  • Design, implement, and manage scalable and secure cloud infrastructure using Infrastructure as Code (IaC) methodologies.
  • Develop and uphold Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to ensure software reliability and performance.
  • Monitor infrastructure costs, providing transparency and implementing strategies for cost optimization.
  • Streamline development workflows to reduce cognitive load for engineers, enhancing efficiency and effectiveness.
  • Build and maintain robust Continuous Integration/Continuous Deployment (CI/CD) pipelines to expedite the delivery of code to customers.
  • Develop comprehensive observability solutions for end-to-end system monitoring, ensuring issues are detected and addressed promptly.
  • Lead and continuously improve the incident management process to minimize system downtime and impact.
  • Participate in the on-call rotation, acting as a first responder to swiftly address and resolve system issues.
  • Create and maintain incident response playbooks and conduct post-mortem analyses to prevent future occurrences.

Competencies

  • Adaptability
  • Ambition
  • Effective Communication
  • Mentorship
  • Ownership
  • Technical Proficiency
  • Productivity
  • Trustworthiness

Hiring Team Member

Avula Srivalli
Recruitment Coordinator