Developers of artificial intelligence (“AI”) continue to push the technology to the next level; not a week goes by without a news story claiming that the next big thing has arrived.
One popular recent example is OpenAI’s latest machine learning system called GPT-3, which was trained on 45TB of text data and has been pounced upon by fellow developers and commentators alike. Its potential is vast but we are still in the early days of the AI revolution.
However, as the AI available to be used grows more powerful, it becomes ever more important to consider the ethical and legal issues involved. In this context, the Information Commissioner’s Office (“ICO”), the independent authority in the UK for enforcing data protection laws, has released its “Guidance on AI and Data Protection”. This guidance follows and supplements its recent guidance on “Explaining decisions made with AI”, which we considered in a previous article here.
The ICO’s guidance is aimed at both those concerned with data protection compliance and technology specialists, and provides a helpful overview of many of the data protection issues that organisations using or developing AI are facing.
The guidance is a must-read but to whet your appetite we have pulled out the key themes that we have identified.
Consider DP early and build it into design
It is difficult to overstate the importance that the guidance places on organisations embedding data protection compliance by design and by default into their AI systems. The takeaway is that the guidance should be understood and actioned by all organisations at the design and build stage; the ICO does not shy away from the fact that this will require a substantial allocation of resources.
Central to an organisation’s approach will be the processes and procedures it puts in place to assess risks. Organisations need to assess the risks to the rights of individuals that arise as a result of the use of AI, assess the measures available to mitigate these risks, and evidence the consideration of less risky alternatives (and justify why they were not used). The ICO is emphatic that, in the vast majority of cases, AI will involve the processing of personal data that is likely to result in a high risk to the rights of individuals, so a data protection impact assessment (“DPIA”) will almost always be required. If the DPIA identifies a high risk that cannot be sufficiently reduced, the organisation must consult the ICO before the processing begins.
DPIAs: AI’s new best friend
The ICO sees the use of DPIAs as integral to an organisation showing that its use or development of AI is compliant with data protection laws. The guidance goes into great detail as to when they will be required and how to complete them. The obligations on content are not insignificant and the DPIA should be reviewed and updated regularly, such as where an AI system has experienced concept drift.
Applies to procurers of AI, not just developers
Procurers and purchasers of AI systems are not let off the hook. An organisation buying in AI will need to ensure its procurement due diligence is sufficiently robust to investigate the data protection position of the system it is procuring. In particular, it should ensure that the AI protects the rights of individuals and implements data minimisation techniques, and it should independently evaluate any trade-offs that have been made.
Ongoing monitoring is crucial
The ICO considers the monitoring of an AI system once it has been deployed to be important for a number of reasons, whether that is to maintain statistical accuracy, to review the success of anti-discrimination measures, to improve the security of the system, or to reduce the risk of white-box attacks. Organisations should clearly set out in their internal policies who has monitoring responsibilities and have development pipelines ready to implement any changes that are necessary.
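By way of illustration only, post-deployment monitoring of accuracy can be as simple as comparing a rolling window of recent outcomes against the accuracy measured at sign-off. The sketch below is not taken from the ICO guidance; the class name, the window size and the tolerance are all hypothetical choices that an organisation would need to set for itself.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks rolling accuracy and flags when it falls below a baseline.

    All thresholds here are illustrative assumptions, not ICO requirements.
    """

    def __init__(self, baseline_accuracy, window_size=100, tolerance=0.05):
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        # Only the most recent `window_size` outcomes are kept.
        self._outcomes = deque(maxlen=window_size)

    def record(self, prediction, actual):
        """Store whether a deployed prediction turned out to be correct."""
        self._outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        """Return accuracy over the current window, or None if it is empty."""
        if not self._outcomes:
            return None
        return sum(self._outcomes) / len(self._outcomes)

    def needs_review(self):
        """True once a full window shows accuracy below baseline minus tolerance."""
        if len(self._outcomes) < self._outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline_accuracy - self.tolerance
```

A flag raised by a check of this kind would be a prompt for the person with monitoring responsibility to investigate (and, where appropriate, revisit the DPIA), rather than a decision in itself.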
Identify controller / processor relationships
Development of AI often requires the collaboration of a number of different organisations at different stages in the AI development lifecycle. The usual guidance on controller and processor relationships applies, but the ICO flags that an organisation may be a controller in one phase or for one purpose (e.g. creating and training a model) and a processor in another phase or for a different purpose (e.g. processing data to make predictions for a client). This intricate relationship requires an organisation to have sufficient oversight over its relationships to identify its correct status. The ICO acknowledges this becomes even more complex in the context of cloud processing and plans to address this in its revision of its Cloud Computing Guidance in 2021.
Statistical accuracy is important
One way to achieve compliance with the fairness principle is to improve the statistical accuracy of the AI system. However, statistical accuracy is not by itself enough. An organisation’s records should indicate where the output of an AI system is a statistically informed guess rather than a fact, and an organisation should break down statistical accuracy into more helpful measures (e.g. using false positives and negatives). It is important to retrain the model on new data where that is necessary to maintain sufficient statistical accuracy, and measures should be in place to ensure sufficient monitoring post-deployment (by reference to the impact that an incorrect output would have on an individual).
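To make the point about breaking accuracy down more concrete, the short sketch below computes false positive and false negative rates alongside overall accuracy for a binary classifier. It is a minimal illustration under our own assumptions (labels encoded as 1 for positive and 0 for negative, and function names of our choosing), not a method prescribed by the ICO.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def accuracy_breakdown(y_true, y_pred):
    """Return overall accuracy plus the measures that a single figure hides."""
    c = confusion_counts(y_true, y_pred)

    def ratio(numerator, denominator):
        # Avoid dividing by zero when a class is absent from the sample.
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(c["tp"] + c["tn"], sum(c.values())),
        "precision": ratio(c["tp"], c["tp"] + c["fp"]),
        "recall": ratio(c["tp"], c["tp"] + c["fn"]),
        "false_positive_rate": ratio(c["fp"], c["fp"] + c["tn"]),
        "false_negative_rate": ratio(c["fn"], c["fn"] + c["tp"]),
    }
```

A system can score well on overall accuracy while still missing most of the positive cases, which is why the impact of each type of error on the individual should drive which measure matters most.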
Review of trade-offs
The ICO is clear: you cannot trade data protection away. However, in some cases a trade-off between competing considerations can be justified. An organisation should assess trade-offs when designing the AI system and should consider implementing available technical approaches to minimise the need for them. It should maintain detailed evidence of its assessment of trade-offs, including the risks to individuals, its methodology for its decision and how this fits with its risk appetite. Mathematical approaches to trade-offs, like differential privacy and constrained optimisation techniques, are favoured by some, but the ICO suggests that these should be supplemented with qualitative methods and that it is still necessary to assess the risks and justify the trade-off.
Training data (to address risks of bias and discrimination)
How to deal with data used to train AI systems is dealt with in great detail in the guidance. One issue that organisations should be alive to is the ever-present risk of bias and discrimination, which often occurs due to imbalanced training data or where training data reflects past discrimination. The ICO suggests that ‘algorithmic fairness’ methods may be appropriate and emphasises that removing protected characteristics is rarely enough. The ICO discusses when it may be appropriate to re-train the AI model with a diverse training dataset and when it is appropriate to process special category data to address discrimination in a system.
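One simple way to look for the kind of imbalance described above is to compare how often the system produces a positive outcome for each group. The sketch below is an assumption-laden illustration: the group labels, the function names and the choice of a "selection rate gap" as the measure are ours, and it is only one of many possible fairness checks rather than the one the ICO endorses.

```python
from collections import defaultdict


def selection_rates(groups, predictions):
    """Return the share of positive predictions (1) for each group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, prediction in zip(groups, predictions):
        totals[group] += 1
        if prediction == 1:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def selection_rate_gap(groups, predictions):
    """Difference between the highest and lowest group selection rates."""
    rates = selection_rates(groups, predictions)
    return max(rates.values()) - min(rates.values())
```

Note that a check like this needs to know which group each individual belongs to, which is one reason the guidance discusses when special category data may be processed to address discrimination.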
The necessary integration into AI models of third party code and reliance on external dependencies is very likely to increase the risk of security issues. The ICO recommends that the standard requirements for maintaining code and managing risks apply to the use of AI, but suggests that organisations review their risk management practices to ensure personal data is secure in an AI context. For example, de-identification techniques should be applied to training data before it is shared or used and there should be an audit trail of all movements and storage of personal data from one location to another, with intermediate files deleted when no longer required.
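As a rough illustration of de-identifying training data before it is shared, the sketch below replaces direct identifiers with keyed hashes using Python's standard `hmac` and `hashlib` modules. The field names and function names are hypothetical. This is pseudonymisation rather than anonymisation: whoever holds the key can re-link the tokens, so the output remains personal data and the key must be kept secure.

```python
import hashlib
import hmac


def pseudonymise(value, secret_key):
    """Replace an identifier with a keyed SHA-256 token (hex string)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify_records(records, identifier_fields, secret_key):
    """Return copies of the records with the named identifier fields tokenised.

    The input records are left untouched; other fields are copied as-is.
    """
    cleaned = []
    for record in records:
        copy = dict(record)
        for field in identifier_fields:
            if field in copy:
                copy[field] = pseudonymise(str(copy[field]), secret_key)
        cleaned.append(copy)
    return cleaned
```

Because the same identifier always maps to the same token under a given key, records about one individual can still be linked within the training set without exposing the underlying identifier.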
The ICO recognises that data minimisation may seem antithetical to AI but sets out in its guidance various techniques that organisations can use. These methods range from “standard feature selection” methods and achieving sufficient statistical accuracy with fewer data points, to implementing robust and proportionate data retention policies and using “perturbation” techniques (i.e. adding noise to make data less accurate at an individual level). The underlying theme is to only process the minimum amount of personal data that is necessary to fulfil the purpose you have identified.
Individual rights (explainability and building with individuals’ rights in mind from the start)
Organisations should be alive to improving the ‘explainability’ of their AI system to ensure compliance with the transparency principle. The ICO also discusses the rights of individuals under data protection laws in the context of AI, making the point that an individual may have rights in personal data that is in the training data, used to make a prediction at deployment (and the subsequent result) or even contained in the model itself. Interestingly, the ICO suggests that it will be unlikely that an organisation can justify not fulfilling a request from a data subject to erase their personal data in the training data. Where it is impossible (or disproportionate) to locate or inform an individual – as often may be the case – then an organisation should take appropriate measures to protect their rights, such as providing public information explaining where the data was obtained and how someone can object.
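To give a flavour of what “perturbation” can look like in practice, the sketch below adds random Laplace-distributed noise to each numeric value so that individual entries become less accurate while the overall average stays roughly the same. The function names and the noise scale are our own illustrative assumptions, and noise added this way does not by itself amount to a formal differential privacy guarantee.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-centred Laplace distribution.

    The difference of two independent exponential draws with the same
    rate follows a Laplace distribution with scale 1 / rate.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(values, scale, seed=None):
    """Return a new list with independent Laplace noise added to each value."""
    rng = random.Random(seed)
    return [value + laplace_noise(scale, rng) for value in values]
```

A larger `scale` gives more protection at the individual level but reduces the usefulness of the data, which is exactly the kind of trade-off an organisation would need to assess and document.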
“Meaningful” human oversight
Where an AI system is being developed to be used as “decision support”, i.e. to assist a human in making a decision, it will avoid being subject to some additional requirements that apply to automated decision-making. However, the ICO flags that there must be meaningful human oversight, which is not subject to automation bias or interpretability risk. The ICO makes a number of recommendations, including adequate training of staff on AI (and, importantly, its limitations) and building the AI system so that interpretability risk is reduced (for example, using local explanations or confidence scores to accompany an output where appropriate).
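As a simple illustration of a confidence score accompanying an output, the sketch below returns the model's preferred label together with its score and a flag prompting closer human scrutiny when the score is low. The class name, function name and the 0.8 threshold are hypothetical, and the sketch assumes the model's scores are reasonably well calibrated, which would need to be checked in practice.

```python
from dataclasses import dataclass


@dataclass
class SupportedOutput:
    label: str                 # the model's preferred outcome
    confidence: float          # the score the model attached to that outcome
    needs_human_review: bool   # True when the score is below the threshold


def present_output(class_probabilities, review_threshold=0.8):
    """Pick the highest-scoring label and flag low-confidence outputs."""
    label, confidence = max(class_probabilities.items(), key=lambda item: item[1])
    return SupportedOutput(label, confidence, confidence < review_threshold)
```

For example, `present_output({"approve": 0.55, "decline": 0.45})` would be flagged for review, signalling to the human decision-maker that the system's suggestion should carry little weight.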
Take away points
The guidance helps to shed some light on how the AI sector can achieve compliance with data protection laws, notwithstanding the enormous amount of data that AI systems process (45TB!). Organisations should ensure that their staff are suitably briefed to implement the ICO’s guidance and, most importantly, that their technical engineers and compliance officers work together to ensure that data protection is not simply an afterthought but an integral part of the development of the AI.