A mass of new data
Since 2008, the NIH Human Microbiome Project has been collecting and analysing microbial communities from the nasal passages, oral cavity, skin, gastrointestinal tract and urogenital tract of healthy volunteers, generating 14.23 terabytes of sequence data. The data is made publicly available for further analysis. A second phase of the project is investigating particular disease conditions through longitudinal analysis of microbiome data.
The American Gut Project (http://americangut.org/), founded in 2012, uses crowd-funding to build “large, anonymised open-access datasets”. Samples are collected mainly from the gut, but also from other sites such as the skin and oral cavity. Individuals participate by submitting their own sample along with a financial contribution. In return they receive a report on their own microbiome and how it compares with the general population. The project emphasises the anonymity of the data collected: “All the data collected by the American Gut Project is de-identified (meaning no information is stored that can trace a sample back to a participant)”.
A related UK-based project (British Gut) recruits participants from across Europe. Early results show differences between UK and US participants, with higher microbial diversity in the UK population. Differences were also seen in individuals with mental health issues and obesity.
Whose information is it?
You may think that collecting microbiome samples from patients and volunteers will not engage privacy law at all. However, the samples will initially be identifiable through name and address details, so the information linked to them is personal data. This data will normally be kept confidential: any specific information that can be derived from it about health, nutrition, ancestry and personal traits is shared only with the individual concerned. To develop a wider understanding, the collected data is often anonymised and included in a pooled data resource.
Once direct personal identifiers are removed, surely there is nothing to connect the sample with the donor? This may be true, but it is becoming clear that microbiota vary significantly from one individual to another. Although some aspects of the microbiome change rapidly, research indicates that other features of an individual’s microbiome may remain stable over time. This stability may make an individual suitable as a subject for further study or as a potential donor. It may also provide a means of identifying the individual even without their name and contact details.
A study by the Wellcome Sanger Institute, the Hudson Institute of Medical Research (Australia) and EMBL’s European Bioinformatics Institute has created the most comprehensive collection of human intestinal bacteria to date, using samples from 20 individuals. The resulting database is to be expanded to include more and more species of bacteria, with a sequence fingerprint for each to enable rapid identification. It is becoming easier to identify the species present in an individual quickly and accurately. So could a person’s microbiome become a new kind of unique “fingerprint”, falling within the category of biometric data?
Because microbiome composition is so important for health, the information your microbiome contains could reveal disease states and predispositions to disease. The microbiome is of growing interest in understanding why people differ in their ability to eat healthily and to use nutrients effectively. It may also help predict a person’s likelihood of developing certain diseases, such as diabetes, and how they may respond to treatment.
European law treats health-related, biometric and genetic data as particularly sensitive: it is categorised as “special category data” (for biometric data, where it is processed to uniquely identify an individual). This type of information qualifies for enhanced protection, and processing may only be carried out on strictly defined legal grounds. If donor consent is the basis for processing, it must be “explicit” consent for clearly specified purposes. Consent can also be withdrawn by the data subject, after which any further processing of their data on the basis of that consent is prohibited.
Removing obvious identifying details such as the donor’s name and address before inclusion in a database will help. However, it is worth remembering that anonymisation which leaves open the possibility of re-identifying an individual, even if that requires reference to other data sources, is not regarded as adequate under EU law, and the data will still be considered personal data. EU privacy guidance, issued even before the tougher requirements of the GDPR came into force, discussed genetic data profiles as an example of personal data at risk of re-identification. Even where the donor’s identity has been removed, if publicly available genetic resources combined with metadata about donors could reveal who the donor is, the risk of identification remains.
Will privacy issues stop progress?
While privacy issues are relevant to microbiome research and analysis, they need not act as a brake on progress. Analysing the structure of a project at the outset will help to identify any privacy risks. Safeguards can then be built in so that data privacy compliance risk is kept to a minimum.