Human Capital Development for Big Data in South Africa

written by Jacquelene Friedenthal, Embassy of Switzerland in South Africa

Even though South Africa used to lag a little behind other countries in developing new technologies, it is now fully committed to catching up in certain areas. As described in this article, progress in research linked to Big Data has been tremendous thanks to the implementation of appropriate measures. For instance, the government is creating new higher education and training institutions to overcome the lack of qualified personnel for the development of Big Data. The authorities also support the implementation of a real policy for opening data gathered by the government. The private sector has picked up on these trends and has begun to invest in new tools that use big data to develop predictive models. At the same time, the government is making sure that privacy is preserved through the Protection of Personal Information Act. Read more about these subjects in this article.

South Africa's position in big data developed from humble beginnings, introduced by Rhodes University in 1988 with a focus on access for universities and academia. In 1997, 56kbps dial-up connections started to gain popularity. The South African communications parastatal, Telkom, followed with a 64kbps internet service. The broadband era arrived in South Africa in 2002 with the launch of the first commercial ADSL product, offering download speeds of 512kbps. Wireless broadband services emerged in 2004, offering speeds of 128kbps and 1Mbps, and today South Africans who can afford it enjoy fixed and wireless broadband speeds of over 100Mbps. However, most universities and research institutions experience a constant 100Mbps through the South African National Research Network (SANReN). The majority (57%) of South African internet traffic comes from mobile devices. For example, 13 million of South Africa's 54 million people are on Facebook, of whom 10 million connect through a mobile device. Smartphone usage in South Africa is expected to top 23.6 million users in 2015, up from 19 million in 2014. Reliable internet connectivity means big data, and ten years later South Africa is not only a creator of big data but is actively participating in big data research, networks and mining, such as international astronomy initiatives.

How is South Africa engaging in research using Big Data

South African researchers are increasingly relying on high performance computing, backed by a significant investment in cyberinfrastructure. The South African Department of Science and Technology (DST) launched the Centre for High Performance Computing (CHPC) in 2007. The CHPC engages in a multitude of big data activities, including the training of data scientists, support for research projects that use big data sets, and providing opportunities for researchers to participate in and collaborate with international networks. The South African National Research Network (SANReN), as illustrated in Figure 1, is a large-scale, high-capacity National Research and Education Network that is closely linked to the CHPC and provides cyberinfrastructure for big data among the South African universities. SANReN, in collaboration with the Tertiary Education Research Network of South Africa (TENET), has connected most South African universities to fibre cyberinfrastructure, leading to excellent Internet speeds for students, with a focus on research data.

Figure 1: South African University & Research Network at 100Mbps

South African researchers engage in big data partnerships that include international public and private institutions, e.g. the UK Cambridge partnership and membership of two experiments at CERN's Large Hadron Collider, ATLAS and ALICE.

Furthermore, South Africa occupies a niche in both optical and radio astronomy, with research facilities such as the Southern African Large Telescope (SALT) and the Square Kilometre Array (SKA), with the MeerKAT radio telescope as its precursor, all of them large research facilities producing big data.

A number of institution-based activities are being deployed in support of big data as an enabler of research. For example, the Human Heredity and Health in Africa (H3Africa) research platform focuses on bioinformatics in collaboration with Cambridge. The new Centre for Broadband Communication was launched at the Nelson Mandela Metropolitan University to conduct pioneering research on optical fibre data transport for the SKA. The UbuntuNet Alliance is also a platform for the AfricaConnect project, which connects African countries to Europe’s research network GÉANT. This allows South Africa to participate in large European science projects in pursuit of data-intensive scientific discovery.

What do governments do to develop the training of new data scientists

Training, or Human Capital Development (HCD), remains the key aspect of research, development and innovation in South Africa. Big data in its many forms and applications allows for ‘training’ in various settings, ranging from higher education and training institutions (HETI) and public and private partnerships to large research facilities. Almost all of these training avenues, instruments and partnerships are supported by the South African Government, mostly through the Department of Science and Technology (DST). The DST recorded a throughput of only 15 postgraduate students per year qualifying with a data science degree over the last three years, which is far from the estimated 200 required for the SKA.

Data scientists are a key factor in enabling competitive advantage. McKinsey projects that the demand for data scientists in 2018 may be as much as 60% more than the supply. The market is demanding data scientists at about three times the rate for statisticians and business intelligence analysts. The SKA is intended to position South Africa as a hub for big data analytics, but a shortage of big data skills remains a challenge for the SKA, with the fear that data will need to be shipped and hosted elsewhere. It is estimated that South Africa would require at least 200 data scientists to participate in the SKA alone. The SKA is collaborating with the Research Data Alliance (RDA) to assist in the training of data scientists.

IBM stated that “There are major gaps in tertiary institutions’ ability to meet current and future IT skills in South Africa, Africa and across the world.” IBM is positioning itself in South Africa to support skills training for big data, which has led to a partnership between IBM and the Dutch radio astronomy institution Astron on the Dome project. The Dome project will set the foundation for scientific data and for extracting optimal information related to societal challenges. IBM has recently launched its second major research, development and innovation laboratory in Africa, the first being in Kenya, focusing on advancing big data, cloud and mobile technologies with a view to boosting skills development.

In addition to the two international big data platforms, SKA and CERN, there is the Human Heredity and Health in Africa (H3Africa) research platform, which focuses on bioinformatics. H3Africa has partnered with Cambridge in the training of African scientists in big data analytics for bioinformatics.

The infrastructure and associated research programmes in South Africa not only provide data access but are also actively involved in human capital development (HCD) in the analytical aspects of big data. The Centre for High Performance Computing (CHPC) plays a critical role in HCD by supporting projects ranging from HIV-1 modelling to climate change, which in turn act as a vehicle for the training of scientists in big data. Scientists from Cambridge in the UK have partnered with the CHPC to develop training courses for lecturers in building, running and configuring HPC systems, in collaboration with DELL as a private partner that wishes to invest in HPC in South Africa.

The new Centre for Broadband Communication and the UbuntuNet Alliance act as facilities for the training of data scientists. The African-European Radio Astronomy Platform (AERAP) identifies opportunities for cooperation in big data with a focus on HCD. Universities, for example Wits University with its new School of Computer Science and Applied Mathematics, are starting to introduce big data at undergraduate and postgraduate levels. The DST and NRF complement this initiative through the training of specialists in high-end techniques.

How accessible should Big Data sets be

The South African constitution affords the right of access to any information held by the state, while the Promotion of Access to Information Act of 2000 provides a legal framework to realise this constitutional right (C4SA, 2014). The very nature of the constitution and the Act provides for a right to big data sets, though not without the restrictions imposed by the Protection of Personal Information Act.

Public accessibility of big data sets raises the questions of why and how. Many arguments favour accessibility, based on the notion of societal gain and political accountability. However, it simply does not make sense to provide unstructured big data sets to communities where the large majority of society is not skilled to analyse them and where internet connections do not allow the handling of big data sets. Structured big data sets in South Africa are accessible to limited stakeholders, e.g. through the South African Data Archive, an initiative of the National Research Foundation (NRF). This archive serves as a broker between data providers (for example, statistical agencies, government departments, opinion and market research companies and academic institutions) and the research community, excluding the broader public. The public has access to big data sets through pre-determined comparative analytical presentations provided by Statistics South Africa and drawn from the national census data. The City of Cape Town has established an open data portal, setting the benchmark for accessibility of data at municipal level, but again it offers structured presentations drawn from big data sets.

Considering the various forms of big data, especially those generated by individuals on social media platforms, and the societal gains they hold, one may ask how accessible big data sets should be, which raises a further question about confidentiality and ethical issues. The democratisation of data for use by everyone furthermore requires a culture of digital risk management across data ecosystems, e.g. data automation in dynamic environments.

One may then rather argue that South Africa favours an open data policy, not the accessibility of big data sets. This poses challenges for the “democratisation” of data, increasing the need for a culture of digital risk management across the data ecosystem, which is not yet well understood or regulated.

Is there a political will to implement a real open data policy 

South Africa is a member state of the Open Government Partnership, which promotes inclusion and consultation as key drivers of accountability, transparency and service delivery, but without any reference to linking government data in support of these objectives. Very little evidence is available to demonstrate the readiness of the South African government for Open Data across all departments or agencies. For example, the Open Data for Africa Portal lists 26 open datasets on South Africa, with only Statistics South Africa contributing. Very little activity is taking place at regional (canton) and municipal levels, with the City of Cape Town being the only major metropolitan city with an open data portal. However, the data presented in that portal is often outdated, bringing its validity into question. The open data movement in South Africa is nevertheless promising so far in sharing government and research data: a national meeting of stakeholders took place in 2014 in a drive for open data and the creation of an open data society.

An example of a South African initiative towards open data is the South African Data Archive, which serves as a broker between a range of data providers (for example, statistical agencies, government departments, opinion and market research companies and academic institutions) and the research community. The archive not only preserves data for future use, but also adds value to the collections. It safeguards datasets and related documentation and attempts to make them as easily accessible as possible for research and educational purposes. However, the South African Data Archive is for selected researchers and is certainly not accessible as open data.

Another example of open data in South Africa is that of the Centre for Higher Education Transformation (CHET), which developed an online, open data platform providing institutional-level data on South African higher education. CHET undertook a study to assess the use of this open data and found that the South African government’s higher education database is a closed and isolated data source in the data ecosystem, with concerns at both government and university levels about how data will be used and interpreted. The study makes it apparent that intermediaries could and should play a key role in the data ecosystem and have the potential to democratise the impacts and use of open data.

Who owns Big Data

3.144 billion Internet users, 957 million websites, 180 billion emails, 3.6 billion Google searches, 3 million blog posts, 679 million tweets, 7.4 billion videos viewed, 161 million photos uploaded to Instagram: all in one day in June 2015, generating big data produced by activities between people. Who owns, and who will own, all the world’s knowledge? Owning the knowledge is one thing, but the power to influence how society lives, socialises and seeks information will lie with the knowledge owner that has the capacity to map out the knowledge, visualise it and sell it through available infrastructure. That capacity is solely in the hands of the private sector. The aggregation of data and decision making is mostly in the domain of a few technology companies that have the infrastructure to accumulate, process and analyse the data, which is typically the situation in South Africa as well.

The private sector in South Africa is starting to open its big data sets, but through a process of visualisation with limited access to the data sets themselves. Initiatives undertaken by the South African financial sector focus on customer experience using big data, implementing predictive modelling solutions to integrate social media analytics into banking systems. Recently Nedbank became the first bank in South Africa to offer a data analytics tool that records customers’ shopping behaviour. The tool, Market Edge, will offer behavioural insights mined through big data on a web-based platform that provides customers’ spending patterns, income segmentation, and gender and age demographics. Market Edge is available to Nedbank’s card-accepting businesses and forms part of its focus on small, medium and large enterprises in South Africa. In this instance the ownership of big data is presented to Nedbank customers as a tool to enhance their business.

But of course, who owns your personal data, and what right do they have to use it? It remains to be seen whether the South African Protection of Personal Information Act will protect citizens against the private use of their information by Google and Facebook. The Court of Justice of the European Union ruled in 2014 that an individual has the right to request the removal of personal information from a search engine and that this right overrides economic interest.

What are the measures taken to ensure privacy rights

Recently South Africans experienced the exposure of personal data through ineffective data protection measures at one of the country's biggest banks and through a leak of personal data from a commuter rail carrier. It is believed that the increase in large-scale security incidents is a result of ‘big data’.

The South African constitution provides that everyone has the right to privacy, which includes a right to protection against the unlawful collection, retention, dissemination and use of personal information. South Africa took a bold step towards the protection of privacy, giving effect to the constitution, when President Jacob Zuma signed the Protection of Personal Information Act (POPI) into law on 27 November 2013. The Act requires the implementation of measures to consider how information is processed and secured. The POPI Act makes data hoarding illegal in South Africa. Data hoarding is seen as the gathering of data without a clear business reason or a security strategy to protect the underlying information. The POPI Act stipulates that data may only be processed as long as there is a clear and defined business purpose to do so. KPMG argues that, as a result of the POPI Act, organisations looking to gain competitive advantage from “Big Data” will be better placed to collect and harness the data in an ethical and legal manner. However, the POPI Act recognises that, in view of the constitutional values of democracy and openness, economic and social progress within the framework of the information society requires the removal of unnecessary impediments to the free flow of information, including personal information.

It is anticipated that the POPI Act will also require widespread reforms, which both the private and public sectors must introduce to ensure that the personal information and data they collect are protected. For example, the South African Police Services website was hacked in 2013 and 16 000 whistle blowers had their private details exposed. The POPI Act is explicit on what data can be obtained, how data can be used and the requirement that it be kept up to date. The public and private sectors generating the data are entirely responsible for the protection of individuals’ personal information, which is challenging as IT business processes account for 73% of the 48% outsourcing activity in South African companies.


It is evident that the South African public and private sectors have taken bold steps to embrace the use of big data for commercial and research purposes. What remains is the use of big data for development, which requires a partnership between the private and public sectors. Big data, especially in the public sector, is often seen as open data, with the argument that big data is accessible, which it is not. The most pressing factor in using big data for research is the training of data scientists, which is hampered by the very low mathematics throughput at school level. No doubt the SKA will be a major catalyst in the production of data scientists and in the positioning of South Africa as a global producer of big data.


