GDPR – one year on: data protection and artificial intelligence

25 JULY 2019

| 4 min read

It is more than a year now since the introduction of Europe’s new data privacy regime, the GDPR. There was a flurry of activity leading up to and following the launch date in May 2018, GDPR fell out of the headlines. Recent high profile penalty announcements running into the hundreds of millions have again highlighted the dangers of failure to comply. We consider some of the issues that are keeping business leaders and privacy regulators awake at night. In this article we focus on data privacy issues that arise in the context of artificial intelligence.

Why is there a privacy problem with AI?

AI might be summarised as being a collection of technologies and methods that draw on large data libraries as a “control” factor, to drive an element of responsiveness to, or enhanced analysis of, “real” data and scenarios. We have got used to real-time data. Current AI technology sees computers responding to or interacting with real-time events, in real time. It isn’t true “intelligence”, but it is a significant step beyond the “hard wired” responsiveness of predecessor technology, and it can involve machines making decisions that affect humans – indirectly (e.g. as a result of research that uses AI to make its findings) or even directly. This is one source of the privacy problem: do I want a machine to make decisions that affect me, and do I have a choice?

Effective AI requires oceans of data samples, and this is another source of the privacy problem. AI uses, and can be used to interact with or analyse, large data sets in many different situations, and many of these deploy information about individuals. We see its application in healthcare, for example, where information-based decision making can become quicker and more accurate. Some AI does not use personal data: for example it might use textual data (e.g. literary fiction), or financial or other statistical data. Where AI needs to use data libraries about humans (e.g. for research), the data is sometimes anonymised so that individuals cannot easily be identified from them, but anonymisation is technically hard to achieve, and under the GDPR individuals should be informed before their data is anonymised. On the other hand, often the data libraries cannot be anonymised. AI is used to assist with facial recognition by law enforcement agencies and service providers like airports. Image data can be analysed to identify individuals among a crowd quickly and efficiently. To identify an individual, or a type of individual, requires a huge data library of “control” photographs, each of which will unavoidably identify the human subject of the photo.

AI’s need for oceans of data brings a subtler privacy headache, which is that acquiring that much data is often beyond the reach of any one individual or organisation. So, where personal data is involved, a number of data controllers need to pool their data (in raw, or anonymised form) to feed the AI machine. Data sharing supply chains are becoming fundamental to AI. Like all supply chains, data supply chains need to be carefully managed, because a compliance error by one party in the chain (e.g. in failing to provide appropriate information to affected data subjects, failing to establish a legal basis for sharing their data for AI purposes, or failing to comply with rights exercised in relation to automated decision-making) can impact on the legal and operational viability of the entire AI operation – and the reputation and liability of its participants.

Many have concerns about the rapid growth of AI that makes use of personal data. Data may be collected in situations where individuals are unaware of it. AI may be used to make decisions about people’s lives, like whether they will be invited for interview, or offered a credit card, or stopped by the police or border control authorities. The data itself may be of a particularly sensitive nature – images from diagnostic scans for example. And the way AI algorithms analyse data and reach conclusions can be difficult to understand, particularly where machine-learning is involved to improve the system’s effectiveness “on the job”, so it can be difficult to tell individuals what they need to be told under GDPR.

The use of facial recognition by law enforcement authorities has encountered a strong negative reaction. Images taken from CCTV that can quickly be compared with existing “control” databases to track down suspects raises obvious privacy issues. Indeed, two US cities have recently banned the use of facial recognition technology by public authorities and others are considering bans. Public concerns over this kind of use are made worse by statistics indicating that under current technology, most “matches” are wrong, and false positives are more likely with ethnic minority and female faces. The UK Home Secretary recently backed trials of facial recognition technology by British police forces, but noted that legislation would be needed before it could be used more widely.

Getting to grips with the privacy problem

In January, the Council of Europe published Guidelines on Artificial Intelligence and Data Protection. The Guidelines recognise the valuable part that AI has to play across many fields, but notes the possibility of adverse consequences for individuals.

The key elements of this approach are: lawfulness, fairness, purpose specification, proportionality of data processing, privacy-by-design and by default, responsibility and demonstration of compliance (accountability), transparency, data security and risk management.

We can swiftly illustrate how this Guidance manages to capture the key risk hot-spots. Unlawfulness: a discriminatory decision made by a machine. Fairness, transparency and purpose specification: a failure by a data controller in the data supply chain to tell its data subjects that their personal data will be used as part of an AI library, or that decisions that affect them will be made by or using AI. Proportionality: using too much data, or allowing machines to lead on making decisions that have a disproportionate impact on individuals. Privacy by design, and unaccountability: a failure to build checks and overrides into AI, e.g. so that individuals can make an effective decision to opt out of having their data contributed to a data library, or to being subject to automated decisions. Data security: a failure to ensure that all points in the data supply chain, the AI technology and its outputs are kept confidential, secure and available.

The European Union’s High-Level Expert Group on AI includes data privacy in its detailed policy and guidance. In April it issued Ethics Guidelines for Trustworthy AI and more recently, a set of Policy and Investment Recommendations for Trustworthy AI. These have data privacy right at the heart of the analysis.

UK privacy regulator, the ICO, considered the role of privacy law for AI at length in 2014. The ICO’s paper “Big data, artificial intelligence, machine learning and data protection” still provides a valuable discussion of privacy issues arising from AI. However, although the analysis was updated in 2017 to include discussion of the GDPR, it is not fully up to date with the new legislation and will need further review.

So what does the GDPR mean for AI development? Note that the GDPR does not address specific technologies individually - its principles and obligations apply in a technology-blind way. But it has a lot to say to those designing AI systems. In any project involving personal data, it will be important to analyse what is actually planned in order to identify the privacy issues that might arise. We consider a few that might be prominent in an AI context.
Automated processing and profiling

Automated processing and “profiling” bring in a set of special requirements under GDPR. (Profiling is automated processing involving evaluation of someone’s personal aspects to analyse or predict their health, work performance, economic situation, etc.) In these situation, information given to the data subject at the time of data collection should include not only information that automated decision-making will be used, but also “meaningful information” about the logic being employed, the significance and the likely consequences for the data subject.

The GDPR gives an individual data subject a specific right not to be subject to a decision based solely on automated processing, if it results in a legal effect for them or “similarly significantly affects” them. There are exceptions to this requirement – explicit consent, for example – but in most situations of this type data controllers will have to build in safeguarding measures. This can include arrangements for the individual to contest the decision or have a human decision-maker review the decision.

Developers creating AI systems, and industry leaders who are establishing and managing data supply chains, must consider these requirements up front in order to meet the GDPR requirement of privacy by design and by default. Will the systems they are building permit compliance? Will human intervention in automated processes be possible, for example, if an individual calls for it?
Achieving transparency

A core element of the GDPR is transparency, so that individuals are able to understand what information about them is being processed and how.

Explaining to individuals in a meaningful way how AI algorithms work can be difficult. The ICO is working with the Alan Turing Institute on a collaborative research project (Project explAIn) to understand how artificial intelligence decisions can best be explained to the individuals. Bringing together the views of citizens’ juries and industry expert panels, the interim report defines an approach for future guidance. The views of the citizens’ juries are particularly interesting – the degree of information about processing should vary with context. In a situation like healthcare, the emphasis was on accuracy rather than transparency. But where a negative outcome for the individual might result from the decision (recruitment or criminal justice) transparency and ability to question and challenge were seen as much more important.

Finding appropriate ways to explain how the system works, without overburdening non-specialists with detailed information is a challenge. But it will be necessary not only to achieve compliance with the law, but also to build viable data supply chains, and trust among the public. Failing to effectively inform individuals about what information is gathered and how it is used is likely to do significant damage in the longer term, when it eventually comes to light.
Children

Special care is needed for processing that involves data about those under 18. In many AI contexts, children will not be included (recruitment, financial services etc). But in situations like healthcare analysis children and adults are likely to be equally affected. The GDPR requires extra measures to be taken to protect children. This is particularly important in situations involving automated decision making (explored in Guidelines issued by the EU-level grouping of data protection regulators).

Transparency is a particular area of concern. The GDPR expects processing activity involving children’s data to be explained in a way that children can understand. While explaining automated data processing to an uninformed and perhaps time-poor adult will be a challenge, developing an explanation suitable for children needs even greater thought and care.

Data controllers should generally avoid profiling children for marketing purposes. For example, analysis of a child’s online gaming behaviour to target advertising is likely to be difficult to justify. In the context of electronic interactions with children, the GDPR also sets a disruptive age threshold that applies if the child’s consent is relied on as the legal basis for using the child’s personal data. The GDPR allows for EU member states to set the age of data protection consent in this context at between 13 and 16 years of age, and the UK legislature has chosen 13 years of age. The usual challenges involved in relying on consent (i.e. the risk that it can be withheld or withdrawn), and the difficulty of satisfying the requirements for alternative legal bases (e.g. obtaining an alternative legal basis, perhaps involving a parent or guardian) sit in the background. Use of personal data relating to children in AI, and the sharing and management of child personal data in AI data supply chains, is particularly challenging.
Take-away points

This is not a comprehensive review. We have looked at a few of the privacy requirements likely to arise in an AI development context. Other areas like permission (legal basis) management, purpose limitation, anonymisation of data, data retention, contract management and compliance record-keeping will also be important.

Overall, the more that data privacy obligations are built into a project the better the prospects of avoiding difficulties later on. The GDPR requires a “privacy by design” approach where personal data is collected and used. A Data Protection Impact Assessment is likely to be appropriate in most projects of this kind, and provides a welcome opportunity for consulting with stakeholders (including but not limited to data subjects, sector regulators, investors and internal organisational stakeholders). It is worth bearing in mind that AI projects often move rapidly, with little opportunity for early stage involvement for those responsible for compliance. Awareness of privacy as an issue should be raised with developers so that they are alive to the need to involve privacy specialists before the project is at an advanced stage.

Similarly, it can be difficult to get board-level engagement with data protection compliance. GDPR achieved considerable attention when it appeared a year ago not least because of the headline-grabbing fines that can be imposed on organisations falling short. We have begun to see these powers in action, with the ICO recently proposing fines of £99m and £183.39m for data privacy breaches. As organisations have brought their day-to-day activity into compliance, the level of visibility and focus has declined. However, where data privacy is a live issue for the core development activity of a business, it should stay on the agenda.

We expect to see new guidance from the ICO in the coming months. This is likely to help developers with the practical issues, offering technology-specific advice and greater clarity about what is expected.

Our content explained

Every piece of content we create is correct on the date it’s published but please don’t rely on it as legal advice. If you’d like to speak to us about your own legal requirements, please contact one of our expert lawyers.

Contact

David Hall

Principal Associate
David Hall

Principal Associate
- +(44)(0)121 456 8328
- +(44)(0)7918 671660
- Contact David
  
  Contact David Hall
  
  * = required
  
  First name *
  
  Please enter your first name
  
  Last name *
  
  Please enter your last name
  
  Organisation
  
  Email address *
  
  Please enter your email address Please enter a valid email address
  
  Phone number
  
  Your message *
  
  Please add your message
  
  Send enquiry
  
  Mills & Reeve will use the information you provide in this form in accordance with our privacy policy. We may from time to time send you general updates by email or post that we think you will find of interest. This includes notification of upcoming event and updates or alerts containing relevant legal news. You can update your preferences at any time and will be able to easily unsubscribe from anything that you do not wish to receive.
  
  check
  
  Thank you
  
  Thank you for your enquiry. We will be in touch shortly.
  
  Close
  
  Reset form
  
  Close overlay
- Birmingham

GDPR – one year on: data protection and artificial intelligence

Why is there a privacy problem with AI?

Our content explained

Further reading

David Hall

Contact David Hall

Visitors

Existing clients

Staff