Something is bothering me about current and future data protection.
When is a profile no longer personal?
I’m thinking of a class, or even a year group of school children.
If you strip away enough identifiers, or aggregate the data, so that individuals are no longer recognisable and could not be identified — directly or indirectly — from other data that is in your control or may come into your control, you no longer process personal data.
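To make that threshold concrete, here is a minimal sketch in Python (pandas), with entirely hypothetical column names and values, of what “stripping identifiers” typically looks like in practice. Nothing in the code answers the legal question: whether the output is anonymous depends on what other data could still be linked to it.

```python
import pandas as pd

# Hypothetical pupil-level extract; all names and values are invented.
pupils = pd.DataFrame({
    "name":      ["Mohammed Jones", "Amelia Smith"],
    "dob":       ["2005-03-14", "2005-11-02"],
    "postcode":  ["E1 6AN", "SW9 8QD"],
    "ethnicity": ["BAME", "White British"],
    "language":  ["EAL", "English"],
    "sen":       [True, False],
    "absences":  [31, 4],
})

# Step 1: drop the direct identifiers.
deidentified = pupils.drop(columns=["name", "dob"])

# Step 2: generalise quasi-identifiers, e.g. truncate the postcode to
# its outward code and bucket absence counts into bands.
deidentified["postcode_area"] = deidentified.pop("postcode").str.split().str[0]
deidentified["absence_band"] = pd.cut(deidentified.pop("absences"),
                                      bins=[0, 10, 20, 100],
                                      labels=["low", "medium", "high"])

# The Article 4(1) / Recital 26 question remains: can any row still be
# singled out, directly or indirectly, using other data the controller
# holds or may come to hold? If yes, this is still personal data.
print(deidentified)
```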
How far does Article 4(1) stretch the boundary of what is identifiable by reference to economic, cultural or social identity?
A growing number of research projects use public sector data (education data, often linked with various other datasets) in which personal data are used to profile and identify sets of characteristics, with a view to intervention.
Let’s take a case study.
Exclusions and absence, poverty, ethnicity, language, SEN and health, attainment indicators, birth year and postcode area are all profiled at individual level in a dataset drawn from 100 London schools, all to identify the characteristics of the children more likely than others to drop out.
It’s a research project, with a view to shaping an early-education intervention programme for children at risk of becoming NEET (Not in Education, Employment or Training). No consent is sought for using the education, health, probation, Police National Computer and HMRC data like this, because it’s research, and enjoys an exemption.
Among the data collected, BAME pupils and pupils whose first language is not English are more prevalent in certain home postcodes. The pupils’ names, dates of birth and school addresses have been removed.
In what is in effect a training dataset, built to teach the researchers “what does a potential NEET look like?”, pupils with characteristics like those of Mohammed Jones turn up more often than others.
The dataset does not permit the identification of the data subject himself, but it knows exactly what a pupil like MJ looks like.
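A minimal sketch of how a dataset “knows” this, assuming (my assumption, not anything stated about the project) something as simple as a logistic regression over de-identified features; the records, labels and feature names are all invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative only: each row is a de-identified pupil record
# [exclusions, absences, poverty_flag, EAL_flag, SEN_flag], and the
# label records whether that pupil later became NEET in the
# historical data.
X = np.array([
    [2, 31, 1, 1, 1],
    [0,  4, 0, 0, 0],
    [1, 22, 1, 1, 0],
    [0,  6, 0, 1, 0],
    [3, 40, 1, 0, 1],
    [0,  2, 0, 0, 0],
])
y = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

# No name, no date of birth: yet the fitted coefficients encode
# "what a potential NEET looks like" as a bundle of characteristics.
print(dict(zip(
    ["exclusions", "absences", "poverty", "EAL", "SEN"],
    model.coef_[0].round(2),
)))
```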
Armed with these profiles of what potential NEETs look like, the researchers now work with the 100 London schools, passing on the resulting knowledge to help teachers identify the children at risk of dropping out, of exclusion, or of becoming NEET.
In one London school, MJ is a perfect match for the profile. The teacher is relieved of any active judgement about who should join the programme: MJ is exactly what they were told to look for. He is asked to attend a special intervention group, to identify and work on his risk factors.
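Continuing the illustrative sketch above, scoring a named pupil against the learned profile is trivial; MJ’s feature values here are invented.

```python
# A named pupil in one of the schools; hypothetical values for
# [exclusions, absences, poverty_flag, EAL_flag, SEN_flag].
mj = np.array([[2, 28, 1, 1, 1]])

risk = model.predict_proba(mj)[0, 1]
print(f"P(NEET) for this pupil: {risk:.0%}")
# A high score puts MJ on the list. If the teacher merely "OKs" the
# list without exercising judgement, how meaningful is the human in
# the loop?
```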
The data are accurate. His profile does match. But would he have gone on to become NEET?
Is this research, or was it a targeted intervention?
Are the tests for research exemptions met?
Is this profiling and automated decision-making?
If the teacher is asked to “OK” the list, but will not in practice edit it, does that nominal human review make it exempt from the restriction on profiling children?
The GDPR also sets out the rules (at Article 6(4)) on factors a controller must take into account to assess whether a new processing purpose is compatible with the purpose for which the data were initially collected.
But if the processing happens only after removing the identifiers that could single out MJ himself, rather than just someone like him, does it apply at all?
In a world that talks of ever greater personalisation, we are in fact treated less and less as individuals. Instead we are constantly assessed by comparison and probability: how we measure up against profiles of other people, built from historical data.
Those profiles are then used to predict what someone similar would do, and therefore, by inference, what we as individuals would do.
What is the difference, in reality, between having given the researchers all the education, health, probation, Police National Computer and HMRC data as they held it, identifiers and all, and then handing them the schools’ named pupil datasets and saying “match them up”?
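The “match them up” step requires no sophistication: with shared quasi-identifiers it is an ordinary join. A sketch with invented records and column names:

```python
import pandas as pd

# Two hypothetical extracts: the "de-identified" research output and a
# named school roll. The quasi-identifiers are illustrative.
research = pd.DataFrame({
    "postcode_area": ["E1"], "birth_year": [2005],
    "ethnicity": ["BAME"], "language": ["EAL"], "neet_risk": [0.91],
})
school_roll = pd.DataFrame({
    "name": ["Mohammed Jones"], "postcode_area": ["E1"],
    "birth_year": [2005], "ethnicity": ["BAME"], "language": ["EAL"],
})

# "Match them up": a join on the quasi-identifiers re-attaches the
# name to the risk profile.
matched = school_roll.merge(
    research, on=["postcode_area", "birth_year", "ethnicity", "language"]
)
print(matched[["name", "neet_risk"]])
```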
I worry that in risk prediction we are at great risk of the word “research” no longer meaning what we think it means.
And our children are unprotected from bias and unexpected consequences as a result.