Job Description
We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. In this role, you will improve system reliability, automate processes, and keep our applications running smoothly. You will work closely with development teams to strengthen production stability and scalability, and you will continuously monitor and optimize system performance.
Responsibilities
- Design, implement, and automate deployment, monitoring, and scaling processes.
- Monitor application reliability and proactively address performance issues.
- Build and maintain robust CI/CD pipelines to streamline development and deployment workflows.
- Troubleshoot and resolve system incidents to minimize downtime and service disruptions.
- Collaborate with developers to enhance production stability and ensure smooth system operations.
Skills
- AWS, Azure, or Google Cloud Platform (GCP)
- Docker and Kubernetes
- Python and Bash
- Terraform or Ansible
- Strong problem-solving skills
- Excellent communication and teamwork skills