The ideal candidate will administer online platforms, ensure the optimum performance and availability of these platforms, and provide 3rd and 4th level troubleshooting support.

Key Responsibilities of the role

People
- Line management of an Agile team of Engineers.
- Perform the duties of a scrum master: gather requirements regarding the infrastructure needs of the team.
- Continuously keep up to date with the latest technology trends and engage in research and development activities to ensure that the team is using the most recent and relevant tools for the job.
- Attend to admin duties relating to the proper functioning of the team.
- Update project tickets in Jira and ensure that Confluence pages are regularly revised.

Process
- Continuous improvement of the team processes, quality of deliverables and technical standards.

Delivery
- Ensure the team is delivering excellent quality solutions, reporting progress and improving delivery if necessary.

Hands-on delivery
A proportion of your time will be spent contributing to the team's workload:
- Partake in the design and implementation of new setups.
- Assist in process improvement and automation.
- Test and design new systems / services and assess their suitability for possible use in production environments.
- Proactive monitoring and support of the live and staging infrastructure.
- Investigate and react to any live or staging issues that might arise.
- Prepare and take part in the periodic release of new software.
- Act as an escalation point for issues flagged by customer support.
- Handle service requests, incidents, problems and change requests, using ITIL best practice.
- Assist in intensive investigations into problems, including detailed log and metrics analysis, to provide resolution and future prevention of such issues.

Key attributes of the person for the role
- Great team player
- Passionate about technology
- Proactive and flexible
- Sound fault-finding skills
- Able to handle pressure
- Able to be available on call for some 3rd and 4th line support in problem situations
- A natural leader
- Handle day-to-day admin and team with priority and urgency

Desirable skills
- Experience managing a small team of technical individuals
- Strong leadership skills with respect among peers
- 10 years' experience in a Linux-based environment
- Excellent IT knowledge, especially Linux
- Excellent communicator, able to deliver information to both technical and non-technical staff
- Able to work independently and in a team
- Excellent knowledge of monitoring tools
- Excellent problem-solving ability
- Detect and analyse alarms to provide fault isolation and remote troubleshooting
- Taking accountability for complete resolution of problems
- Act as a senior technical resource for outages until restoration
- Ability to analyse logs for troubleshooting issues, including malicious activity or cybersecurity threats
- Understands how to direct, support and guide other team members and keep them motivated
- Production support or implementation of Kubernetes (preference for Google managed Kubernetes (GKE)) for at least 18 months

Technology Required
- Linux
- Any one of the following: Google GKE / Amazon EKS / Azure AKS / OpenShift / OKD
- Networking skills (including packet capture and analysis)
- Splunk / EFK
- Grafana, Graphite, Prometheus, Zabbix or similar
- Ansible, infrastructure orchestration system
- Terraform / Chef / Puppet / SaltStack
- Code repositories: GitHub / GitLab
- MongoDB
- CI / CD pipelines
- DNS / HTTPS certificates