Standard Bank is a firm believer in technical innovation, which helps us guarantee exceptional client service and leading-edge financial solutions.
Our growing global success reflects our commitment to the latest solutions, the best people, and a uniquely flexible and vibrant working culture.
To help us drive our success into the future, we are looking for an experienced Big Data Platform Engineer in Digital Platforms & Innovation to join our team at our Johannesburg offices.
Standard Bank is a leading African banking group focused on emerging markets globally. It has been a mainstay of South Africa's financial system for 150 years, and now spans 16 countries across the African continent.
The Big Data Platform Engineer is responsible for the enterprise-wide administration of the Hadoop and Data Science Workbench environments as they relate to the Big Data & Data Lake ecosystems.
This includes, but is not limited to, the Hadoop Data Platform and the Data Science Workbench, as well as developing frameworks in support of a DevSecOps culture.
Areas of competency include:
Solution and infrastructure
Data Management & Governance
User support & education
Must be eager to wear multiple hats and be capable of picking up new and open-source technologies at a fast pace.
Key Responsibilities / Accountabilities
Hadoop Environment Responsibilities:
Installation, Deployment and Configuration of HDP (Hortonworks Data Platform)
Kafka and NiFi Installations - Stand-alone & HDF (Ambari)
Setting up HA for HDFS, YARN, HBase, Hive and other services
Resource Management - Queues
Monitoring and Reporting
Adding Nodes, Decommissioning of Services
Service Debugging - Journal Node Sync, File System Health
DistCp / Backend DB Backups / Snapshots
Linux (Ubuntu, SLES, CentOS / RHEL) Responsibilities:
Administration, Installation, Automation, Optimization of OS Layer
Includes prerequisite Setup and Configuration for Hadoop Environments (Java, OS Package dependencies, File System Setup, MySQL, PostgreSQL, General Housekeeping Practices)
Cron, Ansible, Bash & Shell Scripting, and Automation where possible
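As a sketch of the housekeeping and scripting work above, a typical log-cleanup job might look like the following. The directory, file names and 14-day retention period are illustrative assumptions for this example, not an actual policy.

```shell
#!/usr/bin/env bash
# Illustrative housekeeping sketch: purge rotated service logs older than 14 days.
set -euo pipefail

LOG_DIR="$(mktemp -d)"                              # stand-in for e.g. /var/log/hadoop
touch -d '20 days ago' "$LOG_DIR/datanode.log.1"    # stale rotated log (should be deleted)
touch "$LOG_DIR/datanode.log"                       # current log (must survive)

# The cleanup rule itself: delete *.log* files not modified in the last 14 days.
find "$LOG_DIR" -type f -name '*.log*' -mtime +14 -delete
```

In production a script like this would typically be scheduled from cron, e.g. `0 2 * * * /opt/scripts/clean-logs.sh` (path hypothetical), or rolled out across nodes with Ansible.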
Log Debugging Enthusiast:
Distributed Data Science Workbench Responsibilities:
Setup and Configuration (Anaconda, Jupyter, RStudio / R Server / R Connect, Spark, Python, PySpark, TensorFlow)
Programming skills using Java, Scala, Python & R advantageous
Installation and Configuration / Dependency Management - issues and solutions galore
Learning graph DB implementations - Neo4j / JanusGraph
Data Science Debugging - Long-running jobs in a Kerberos-enabled environment
Production Model Support
Hive Maintenance Responsibilities:
Good SQL Knowledge, ODBC / JDBC Connections, Internal Hive DB recovery, partition repairs, MR / Tez optimizations
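As an illustration of the ODBC / JDBC connectivity above, a HiveServer2 JDBC URL using ZooKeeper-based service discovery (the standard pattern for HA Hive deployments) looks like the following; the hostnames are placeholders for this example.

```
jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
```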
Security Responsibilities:
Has knowledge of and experience with the tools and concepts below:
Kerberos (always fun)
AD Integration / LDAPS
SSL (Ambari, Ranger, NiFi, Atlas, Oozie, Knox, HBase)
Ranger / Ambari / Knox
Service and Elevated User Access
Governance Responsibilities:
User Management using tools like Ranger, Ambari & AD
Data Management using Atlas & Solr
3rd Party Service Integration
Configuration of connectors for Oozie, Power BI, Tableau, QlikView, Ab Initio, etc.
Networks - DNS & firewalls; beginner-level knowledge of networking infrastructure
Writing documentation for governance processes
Writing implementation guides for 3rd-party vendors and internal team members
Creating and maintaining Build documents and Solution design documents
Understanding and knowledge of Data Centre constraints and requirements
Security Design documents for AD
Collaboration with other teams
Preferred Qualification and Experience
Degree or Diploma in IT and the required certification
Certification in Administration, Cloud Administration and/or Development would be an advantage.
Minimum 1 year of Big Data experience
Experience in the following will be an advantage:
Programming skills using Java, Scala, Python & R
Data Governance principles
Knowledge of Big Data best practices
Knowledge / Technical Skills / Expertise
Understanding of the Big Data ecosystem.
Understanding of Hadoop architecture.
Moderate to good SQL knowledge
Knowledge and technical appreciation of the interconnectivities and interfaces between various technical platforms, operating systems and processes.