Johannesburg - Our client is looking to employ a Site Reliability Engineer with DevOps experience who can take responsibility for maintaining existing infrastructure and help the business move towards a more modern, cloud-native, high-availability architecture.
Responsibilities:
Maintaining our existing application and database servers.
Helping to pioneer our pursuit of a scalable, high-availability platform.
Monitoring server usage, implementing optimisations, and preventing unnecessary downtime.
Implementing improvements to the platform’s infrastructure.
Implementing and managing CI / CD pipelines, and empowering other teams to take advantage of these tools.
Being on call and available to respond to alerts from our systems, triage issues, and enforce an escalation policy when incidents and outages occur.
Implementing and managing failover and disaster recovery strategies.
Managing user access to infrastructure, and enforcing security best practices.
Skills:
Linux system administration: networking and firewalls; proficiency with the command line, specifically Bash; setting up Nginx web servers.
Python experience advantageous, specifically with Django and Gunicorn.
Cloud expertise: AWS / GCP / Azure (AWS preferred); cloud-native architecture; microservices architecture; hosted K8s; billing optimisation.
Experience with orchestration tools such as Salt Stack (preferred), Ansible (preferred), Chef, and Puppet.
Setting up monitoring and alerting systems such as Prometheus / Grafana, ELK Stack, and PagerDuty.
Setting up CI / CD pipelines.
Containerisation experience is advantageous: Docker (preferred), Kubernetes (preferred), Packer, Vagrant.
Database administration: PostgreSQL (preferred); any relational DB experience advantageous.
Documentation: ability to write clear and concise documentation in the form of how-to articles, post-mortems, changelogs, proposals for architectural changes and implementations, networking diagrams, and infrastructure diagrams. Knowing what is worth documenting.
Experience working in an agile work environment is advantageous. Ability to manage your own work and time through tickets.
Helping other teams throughout their two-week sprint cycle.