Introduction
Artificial intelligence (AI) is now a pervasive phenomenon in the digital world, offering a wide range of everyday uses. At the same time, it introduces new and unpredictable risks, including the potential invasion of privacy. This article analyses the implications of AI for data privacy and proposes approaches to mitigate these risks in a rapidly evolving digital environment.
The constitutional underpinning of data protection in Kenya is the right to privacy under Article 31 of the Constitution of Kenya, 2010. The Data Protection Act, 2019 (the Act) was enacted to give effect to this right. Under section 37 of the Act, the commercial use of personal data is expressly forbidden unless consent is obtained, the data is anonymised, or the use is authorised under written law.
Against this backdrop, AI systems typically involve the commercial use of data: they gather, store, and analyse vast amounts of personal data to generate commercially appealing outputs that can be sold to third parties. There is therefore a need to adopt ethical data management practices that forestall potential data breaches and guarantee the secure and responsible use of data.
Key Concern
The era of Big Data – characterised by the surge in data collection, creation, and storage that accompanied the expansion of the internet – is a key enabler of the rapid rise of AI. As AI continues to proliferate globally, the demand for data is expected to increase in step, pushing companies to collect more, and more diverse, types of data from data subjects. In this relentless pursuit of data, companies may bypass the principles of data protection set out in section 25 of the Act. This largely unchecked collection of data presents privacy risks that transcend individual concerns and escalate into societal-level threats. Furthermore, the Act, although comprehensive, falls short of addressing the complexities of AI development and the privacy issues that arise from it.
Issues Arising
Predictive AI, which refers to a computer program’s ability to recognise patterns, predict behaviours, and project future events using statistical analysis, relies on vast datasets to conduct advanced pattern analysis. Faced with this demand for data, AI developers such as OpenAI have had to seek alternative sources of data with which to build and train their models.
Generative AI models can also produce original output that resembles human creativity, such as text, images, music, or code, based on the data they have been trained on. These models have captured public attention through their widespread use and have sparked concerns about how they are trained, particularly the data they use and the privacy risks associated with interacting with them.
A major issue with these AI models is a lack of transparency around how companies acquire their training data, leading to significant privacy concerns. Real-life examples demonstrating the privacy risks posed by AI systems include the following:
- In 2024, a group of eight (8) newspapers sued ChatGPT maker OpenAI and Microsoft, accusing the tech giants of unlawfully using millions of copyrighted news articles without authorisation or compensation to train their AI chatbots.
- In 2024, a YouTuber sued OpenAI for transcribing and using his videos to train its artificial intelligence system.
- Closer to home, Vodacom Tanzania was sued for USD 4.3 million by Sayida Masanja, a businessman who claimed that the telecom operator fed his personal information to OpenAI’s ChatGPT without his consent, thereby infringing his privacy.
As AI technologies advance, new avenues for privacy violations are emerging, such as the potential for generative AI systems to infer personal information about individuals or to allow users to target others with defamatory or impersonating content. As such, data subjects in Kenya may in future institute product liability lawsuits against AI developers such as OpenAI.
Further, the data gathered can be exploited to deliberately target individuals for identity theft, fraud, and other cybercrimes. These systems also produce predictive or creative outputs which, through relational inferences, can affect people who were not part of the training datasets or who may have never used these systems. Research shows that when personal, confidential, or legally protected data is included in training datasets, AI systems can retain and later reveal this data as part of their outputs.
As technology becomes increasingly intertwined with our lives, automated decisions based on group membership can amplify social biases and stereotypes, leading to adverse outcomes for large segments of the population. People often engage with processes they may not perceive as highly technical, such as applying for a job, yet AI algorithms may determine whether their applications are ever reviewed. Another example of how pervasive AI has become is the healthcare sector, where AI systems are increasingly used to analyse patient data and to support diagnosis and treatment. These systems collect and examine sensitive medical information, which necessitates robust safeguards to maintain patient privacy.
Given the challenges AI poses to data privacy, as outlined above, it is concerning that we currently rely on AI companies to remove personal information from their training data. Despite the data subject’s rights to erasure and to be forgotten, developers can resist such requests by claiming that the provenance of the data used in training AI cannot be proven – or by ignoring the requests altogether. What is needed is a shift towards ensuring that data collection for AI training aligns with the principles of data protection enshrined under the Act.
Conclusion and Recommendations
Currently, Kenya lacks a dedicated or specific AI legal and regulatory framework. However, several existing regulations and initiatives are pertinent to AI development and usage. The Act serves as a foundational legislative piece for safeguarding data in Kenya. Additionally, the Computer Misuse and Cybercrimes Act, 2018 addresses offences related to digital platforms, which could encompass malicious applications of AI within the country.
In 2018, the Kenyan government also established the Blockchain and Artificial Intelligence Task Force which investigated the potential of AI in the public sector and recommended the creation of an AI policy and regulatory framework for Kenya.
While these measures represent significant progress in mitigating the risks associated with unrestrained data collection and commercialisation, the following recommendations can further support AI compliance with data privacy standards:
- Implementing legal frameworks that regulate data intermediaries, that is, data controllers and processors. Such frameworks can serve as a robust governance mechanism, establishing third parties with clearly defined fiduciary responsibilities aimed at protecting the interests of data subjects. The rationale for data intermediaries is that an exclusive focus on individual privacy rights may be too narrow, necessitating a more comprehensive and collective approach to data governance. In the training of Large Language Models – the models underlying tools such as ChatGPT – huge datasets are collected and generated, and it would be arduous for each individual linked to that data to negotiate their data rights on their own. Data intermediaries offer a collective solution by mediating the relationship between individuals and companies. These entities would function as cooperatives that aggregate data from various sources, thereby addressing the sheer volume of consents that would otherwise be required. They would be tasked with managing access to this data in a way that aligns with the values and priorities of the data subjects, ensuring that their interests are safeguarded throughout the AI development process (for example, through licensing agreements).
- Enactment of the proposed Kenya Robotics and Artificial Intelligence Bill, 2023 and implementation of the Artificial Intelligence Code of Practice. These two instruments would work in tandem to advance the responsible and ethical development of AI technologies by providing clear guidelines for organisations, with an emphasis on transparency, explainability, and controllability in AI systems. A robust legislative and regulatory framework would define the responsibilities of AI stakeholders throughout the AI lifecycle, requiring organisations to disclose AI data sources and to mitigate risks, particularly those related to data breaches. AI providers would be responsible for monitoring operations, overseeing model development and updates, assessing impacts on users and communities, and ensuring compliance with legal and ethical standards.
- Adopting a supply-chain approach to data privacy. AI output is shaped by the data on which models are trained, so accountability and transparency must be ensured throughout the data lifecycle, from input to output, looking broadly at the entire data ecosystem that feeds AI. It is therefore essential to embed data protection throughout the lifecycle of the technologies used to train AI models, ensuring that personal data is automatically safeguarded within these systems.