Personal Data and Identifiability

November 24th, 2021

Data controllers commonly face the problem that a dataset is anonymous in principle but, where the numbers within it are sufficiently small, the individual data subject(s) to whom they relate may be identifiable, particularly when the data is taken with other publicly available information. For that reason, released datasets often withhold the specific number where it is below five. In NHS Business Services Authority v Information Commissioner & Spivack [2021] UKUT 192 (AAC), the Upper Tribunal revisited that issue.

The request made under FOIA was for a list of dispensaries which had dispensed a particular drug. The NHS Business Services Authority refused to provide some of the data for dispensaries where fewer than five items had been dispensed, because patients could be identified from that information when combined with other information. On the authority's case it was therefore personal data, and exempt under section 40(2) FOIA. The First-tier Tribunal disagreed.

On appeal to the Upper Tribunal, Judge Jacobs, in a relatively short judgment, held that the question posed by the definition of personal data in the GDPR and the DPA 2018 was whether or not a living individual could be identified, directly or indirectly. Either they could or they could not. “The test has to be applied on the basis of all the information that is reasonably likely to be used, including information that would be sought out by a motivated inquirer, as in this case” (at [13]), but “There is no mention of any test of remoteness or likelihood”: at [12].

It is that last sentence which is the point of legal interest. It might be thought to sit uncomfortably with the discussion of the risk of identifiability in cases such as R (Department of Health) v Information Commissioner [2011] EWHC 1430 (Admin), Information Commissioner v Miller [2018] UKUT 229 (AAC) (which was about precisely this sort of statistical context), R (Bridges) v Chief Constable of South Wales Police [2020] 1 All ER 864 (in the Divisional Court) and, in particular, the well-known and challenging judgment of the CJEU in Case C-582/14 Breyer v Federal Republic of Germany (EU:C:2016:779). Judge Jacobs thought otherwise. Where domestic courts and tribunals had used the language of risk or possibility of identification, they hadn’t meant it: read carefully and narrowly, they were not adopting or endorsing a legal test.

Judge Jacobs emphasised the authority’s evidence, which had in a properly measured way accepted that identification of a patient from the number of dispensed prescriptions was not certain, because there could be other explanations. That concession was held to be fatal to the authority’s case.

Perhaps the most interesting aspect is the discussion of Breyer, which at [46] specifically refers to cases where “the risk of identification appears in reality to be insignificant”. Nonetheless, Judge Jacobs considered that the judgment, at [45] especially, was consistent with a requirement that a data subject can actually be identified. The core of what is said is at [20]-[22] of the Upper Tribunal's decision and may be worth setting out in full:

20. There was an argument before me whether the Court was talking about means or outcome. What I take from the judgment is this. Means and outcome are inevitably linked. Speaking of one, inevitably involves speaking of the other. The chance of a particular outcome depends on the means that can be employed and the means available controls the potential outcome. By limiting the means that can be employed, the chances of identification are reduced.

21. That is not, though, the same thing as imposing an additional test of remoteness or significance or likelihood. Eliminating those means will exclude any possibility of identification that is insignificant. Similarly, if this is different, any possibility that is extremely remote is also excluded. But the test remains whether it is possible to identify a specific individual solely by relying on the data available.

22. Identifying a pool that contains or may contain a person covered by the data is not sufficient. Saying that it is reasonably likely that someone is covered by the data is not sufficient. Still less is it sufficient to say that it is reasonably likely that a particular individual may be one of the pool. Linking any specific individual to the data in any of these circumstances does not rely solely on the data disclosed and other data available by reasonable means; it involves speculation. This is the point that the tribunal was making when it referred to guessing. Any break in the chain between the information and the data subject can only be bridged by speculating or guessing. That is especially likely to arise when there is a pool of potential subjects.

Other interpretations are, perhaps, available. But for now it is a potentially significant contribution and one of practical importance to controllers: a higher threshold for bare statistics to be rendered personal data may be bad for reliance on FOIA exemptions, but good for the level of controls applicable under data protection law (i.e. none). Swings and roundabouts innit.

Robin Hopkins acted for the NHS Business Services Authority.

Christopher Knight

