Central to future data sharing plans [1] is the principle of public interest, intended to be underpinned by transparency in all parts of the process and supported by an informed public: three principles that are also key in the plan for open policy.
The draft ethics proposals [2] start with user need (i.e. what government, researchers and other users of the data want) and public benefit.
With these principles in mind, I wonder how compatible the plans are in practice – plans that will remove the citizen from some of the decision making about information sharing; that is, you and me.
When talking about data sharing it is all too easy to forget we are talking about people – in this case, 62 million individual people’s personal information – especially when users of data focus on how data are released or published. The public thinks of personal data as information related to them. And the ICO says privacy and an individual’s rights are engaged at the point of collection.
The trusted handling, use and re-use of population-wide personal data, and ID assurance, are vital to innovation and the digital strategy. So, in order to make these data uses secure and trusted, fit for the 21st century, when will the bad bits of current government datasharing policy and practice [3] be replaced by the good parts of the ethical plans?
Current Practice and Future-Proofing Plans
How is policy being future-proofed at a time when changes to regulation in the new EU data protection framework [4] are being made in parallel – changes that clarify consent, requiring clear affirmative action by the data subject? How do public bodies and departments plan to meet the current moral and legal obligation to ensure that persons whose personal data are subject to transfer and processing between two public administrative bodies are informed in advance?
How is public perception [5] being taken into account?
And how are digital identities to be protected when they are literally our passport to the world, and their integrity is vital to maintain – especially for our children, in a world of big data [6] we cannot imagine today? How do we verify identity without having to reveal the data behind it, if those data are to be used in ever more government transactions? Done badly, that could mean the citizen loses sight of who knows what information and whom it has been re-shared with.
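As a minimal sketch of what ‘verify without revealing’ could look like in principle – a hypothetical illustration, not any proposed government scheme – an issuer attests to a derived claim such as ‘over 18’, and a service checks that attestation without ever seeing the date of birth behind it:

```python
import hashlib
import hmac

# Hypothetical sketch only: an issuer attests to a derived claim ("over_18")
# so a relying service can check the attestation without ever seeing the
# underlying date of birth. A real scheme would use public-key signatures
# or zero-knowledge proofs; an HMAC with a shared key stands in here to
# keep the example stdlib-only.

ISSUER_KEY = b"issuer-secret-key"  # placeholder; not how real keys are managed


def issue_claim(attribute: str, value: str) -> dict:
    """Issuer derives a claim from the private data and signs only the claim."""
    message = f"{attribute}={value}".encode()
    tag = hmac.new(ISSUER_KEY, message, hashlib.sha256).hexdigest()
    return {"attribute": attribute, "value": value, "tag": tag}


def verify_claim(claim: dict) -> bool:
    """Verifier checks the claim's integrity; the raw data never travels."""
    message = f"{claim['attribute']}={claim['value']}".encode()
    expected = hmac.new(ISSUER_KEY, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, claim["tag"])


# The citizen presents only the derived claim, never their date of birth.
claim = issue_claim("over_18", "true")
print(verify_claim(claim))  # True
```

In a production scheme the verifier could not forge claims, because the issuer would sign with a private key rather than a shared secret; the point of the sketch is only that the underlying data need never leave the citizen’s hands.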
As of the 6th January meeting there are lots of open questions, and no formal policy document or draft legislation to review. The proposal appears far from ready for public consultation, and needs concrete input on what the change would mean in practice.
Changing the approach to the collection of citizens’ personal data, and removing the need for consent to wide re-use and onward sharing, would mean a massive change to the data infrastructure of the country, in terms of who is involved in administrative roles in the process and when. As a country, to date we have not treated data as part of our infrastructure. Some suggest we should. Changing the construction of roads would require impact planning, mapping and a thought-out budget before the project began, to assess its impact. Such an assessment appears to be missing entirely from this data infrastructure change.
I’ve considered the plans in terms of case studies of policy and practice, transparency and trust, the issues of data quality and completeness, and digital inclusion.
But I’m starting by sharing only my thoughts on ethics.
Ethics, standards and digital rights – time for a public charter
How do you want your own, or your children’s personal data handled?
This is not theoretical. Every one of us in the UK has our own confidential data used in a number of ways of which we are not aware today. Are you OK with that? With academic researchers? With GCHQ? [7] What about charities? Or the Fleet Street press? All of these bodies hold personal data from population-wide datasets, and that means all of us, or all of our children – whether we are the subjects of research, subject to investigation, or just ordinary citizens minding our own business.
On balance, where do you draw the line between your own individual rights and public good? What is fair use without consent and where would you be surprised and want to be informed?
I would like to hear more about how others feel about, and weigh, the trade-off between risks and benefits in this area.
Some organisations working on debt have concerns about digital exclusion. Others worry about the compilation of single-view data in coercive relationships. Some organisations are campaigning for a digital bill of rights. I have shared some thoughts on this, specifically for health data, in the past.
A charter of digital standards and ethics could be enabling rather than a barrier, and should be a tool that comes to consultation before any new legislation.
Discussing datasharing that will open up every public dataset “across every public body” without first having defined a clear policy is a challenge. Without its ethical good practice defined first as a reference framework, it’s dancing in the dark. The draft ethics plan is running in parallel to, but is not part of, the datasharing discussion.
Ethical practice and principles must be the foundation of data sharing plans, not an afterthought.
Why? Because this stuff is hard. The kinds of research that use sensitive de-identified data are sometimes controversial, and will become more challenging as the capabilities of what is possible increase with machine learning, genomics, and the increased personalisation and targeting of marketing and interventions.
The ADRN had spent months on its ethical framework and privacy impact assessment before I joined the panel.
What does Ethics look like in sharing bulk datasets?
What do you think about the commercialisation of genomic data by the state – often from children whose parents are desperate for a diagnosis – to ‘kick start’ the UK genomics industry? What do you think about data used in research on domestic violence and child protection? And in predictive policing?
Or research on religious affiliations and home schooling? Or on teenage abortion and births, matching school records to health data?
Will the results of the research encourage policy change or interventions with any group of people? Could these types of research have unintended consequences, or be used in ways researchers did not foresee – supporting not social benefit but a particular political or scientific objective? If so, how is that governed?
What research is done today, what is good practice, what is cautious, and what would Joe Public expect? On domestic violence, for example, public feedback said no.
And while there is a risk of not making the best use of data, there are also risks in releasing even anonymised data [8] in today’s world, in which jigsawing together the pieces of poorly anonymised data can make it identifying. Profiling or pigeonholing individuals or areas was a concern raised in public engagement work.
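To make the jigsaw risk concrete, here is a minimal sketch, using entirely invented data, of how a ‘de-identified’ release can be re-identified by joining it to another dataset on quasi-identifiers such as postcode, birth year and gender:

```python
# Hypothetical sketch with entirely invented data: linking a "de-identified"
# release to a second dataset on shared quasi-identifiers re-attaches names.

deidentified_release = [
    {"postcode": "SW1A 1AA", "birth_year": 2004, "gender": "F", "condition": "asthma"},
    {"postcode": "LS1 4AP", "birth_year": 2003, "gender": "M", "condition": "diabetes"},
]

public_register = [  # e.g. an open register, press list, or scraped profile
    {"name": "A. Example", "postcode": "SW1A 1AA", "birth_year": 2004, "gender": "F"},
    {"name": "B. Example", "postcode": "LS1 4AP", "birth_year": 2003, "gender": "M"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "gender")


def reidentify(release, register):
    """Join the two datasets wherever all quasi-identifiers match exactly."""
    matches = []
    for record in release:
        key = tuple(record[q] for q in QUASI_IDENTIFIERS)
        for person in register:
            if tuple(person[q] for q in QUASI_IDENTIFIERS) == key:
                matches.append({"name": person["name"], **record})
    return matches


for hit in reidentify(deidentified_release, public_register):
    print(hit)  # each "anonymous" record now carries a name
```

Studies of US census data famously found that a handful of such quasi-identifiers – ZIP code, full date of birth and gender – were enough to uniquely identify the large majority of the population, which is why release-and-forget ‘anonymisation’ offers so little protection.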
The Bean Report is used to draw out some of the reasoning behind the need for increased access to data: “Remove obstacles to the greater use of public sector administrative data for statistical purposes, including through changes to the associated legal framework, while ensuring appropriate ethical safeguards are in place and privacy is protected.”
The Report doesn’t outline how appropriate ethical safeguards are to be put in place or how privacy is to be protected. Or what ‘ethical’ looks like.
‘In the public interest’ is not clear cut
The boundary between public and private interest shifts over time, as well as between cultures. While in the UK the law today says we all have the right to be treated as equals, regardless of our gender, identity or sexuality, it has not always been so.
By ranking the rights of the individual below the public interest in this change, we risk jeopardising having any data at all to use. Yet data will be central to the digital future strategy with which, we are told, the government wants to “show the rest of the world how it’s done.”
If they’re serious – if all our future citizens must have a digital identity to use with government with any integrity – then the use of not only our current adult data, but our children’s data, really matters, and current practices must change. Here’s a case study why:
Pupil data: The Poster Child of Datasharing Bad Practice
Right now, the National Pupil Database, containing the personal data of 8 million or more children in England, is unfortunately the poster child of what a change in legislation and policy around data sharing can mean in practice. Bad practice.
The “identity of a pupil will not be discovered using anonymised data in isolation”, says the User Guide. But when named data are given away – identifiable data in all but 11 requests since 2012 – it’s not anonymised. This is anything but the ‘anonymised data’ of the publicly announced plans presented in 2011, yet it is precisely what was permitted by the change in law broadening the range of users under the Prescribed Persons Act 2009, and the expansion of purposes in the amended Education (Individual Pupil Information) (Prescribed Persons) Regulations introduced in June 2013. Access was opened up to:
“(d)persons who, for the purpose of promoting the education or well-being of children in England are—
(i)conducting research or analysis,
(ii)producing statistics, or
(iii)providing information, advice or guidance,
and who require individual pupil information for that purpose(5);”.
The law was changed so that individual pupil-level data, including pupil names, are extracted, stored, and have also been released at national level: raw data sent to commercial third parties, charities and press as identifiable, individual-level, and often sensitive data items.
This is a world away from safe-setting, statistical analysis of de-identified data by accredited researchers, in the public interest.
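For contrast, one of the simplest statistical disclosure controls applied before any output leaves a safe setting is a k-anonymity style threshold check – shown here as a hedged sketch only; real safe settings layer this with researcher accreditation, access control, auditing and manual output checking:

```python
from collections import Counter

# Hedged sketch of a k-anonymity style threshold check, one of many
# statistical disclosure controls; real safe settings combine such checks
# with accreditation, access control, auditing, and manual output checking
# by trained staff.


def is_k_anonymous(records, quasi_identifiers, k=5):
    """True if every combination of quasi-identifier values occurs >= k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


records = [
    {"area": "E09000001", "age_band": "10-14", "outcome": 1},
    {"area": "E09000001", "age_band": "10-14", "outcome": 0},
]
# A group of only two pupils in one area/age band is too small to release.
print(is_k_anonymous(records, ("area", "age_band"), k=5))  # False
```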
Now our children’s confidential data sit on servers on Fleet Street – is this the model for all our personal administrative data in future?
If not, how do we ensure it is not? How will the new all-datasets datasharing legislation permit wider sharing with more people than currently have access, and not end up with all our identifiable data sent ‘into the wild’ without audit, as our pupil data are today?
Consultation, transparency, oversight and public involvement in ongoing data decision making are key, as is well-written legislation.
The public interest alone is not a strong enough description to keep data safe. This same government brought in the National Pupil Database policy thinking it, too, was ‘in the public interest’, after all.
We need a charter of ethics and digital rights that focuses on the person, not exclusively the public interest use of data.
They are not mutually exclusive, but enhance one another.
Getting ethics in the right place
These ethical principles start in the wrong place. To me, this is not an ethical framework; it’s a ‘how to do data sharing’ guideline that tries to avoid repeating care.data. Ethics is not first about the public interest, or economic good, or government interest. Instead, referencing the Nuffield Council on Bioethics’ view [9], you start with the person.
“The terms of any data initiative must take into account both private and public interests. Enabling those with relevant interests to have a say in how their data are used and telling them how they are, in fact, used is a way in which data initiatives can demonstrate respect for persons.”
Professor Michael Parker, Member of the Nuffield Council on Bioethics Working Party and Professor of Bioethics and Director of the Ethox Centre, University of Oxford:
“Compliance with the law is not enough to guarantee that a particular use of data is morally acceptable – clearly not everything that can be done should be done. Whilst there can be no one-size-fits-all solution, people should have say in how their data are used, by whom and for what purposes, so that the terms of any project respect the preferences and expectations of all involved.”
The partnership between members of the public and public administration must be consensual if it is to continue to enjoy support [10]. If personal data are used for research or other purposes in the public interest, without explicit consent, that use should be understood by those using the data as a privilege, not a right.
As such, we need to see data as being about the person, as they see it themselves, and data at the point of collection as information about individual people, not just statistics. Personal data are sensitive, some research uses are highly sensitive, and data used badly can do harm. Designing new patterns of datasharing must consider the private as well as the public interest, co-operating for the public good.
And we need a strong ethical framework to shape that in.
******
[1] Data sharing workshop I, 6 January 2016 – meeting note: http://datasharing.org.uk/2016/01/13/data-sharing-workshop-i-6-january-2016-meeting-note/
[2] Draft data science ethical framework: https://data.blog.gov.uk/wp-content/uploads/sites/164/2015/12/Data-science-ethics-short-for-blog-1.pdf
[3] defenddigitalme – campaign to get pupil data in England made safe: http://defenddigitalme.com/
[4] On the European data protection regulations: https://www.privacyandsecuritymatters.com/2015/12/the-general-data-protection-regulation-in-bullet-points/
[5] Public engagement work – ADRN/ESRC/Ipsos MORI 2014: https://adrn.ac.uk/media/1245/sri-dialogue-on-data-2014.pdf
[6] Written evidence submitted to the parliamentary committee on big data: http://data.parliament.uk/writtenevidence/committeeevidence.svc/evidencedocument/science-and-technology-committee/big-data-dilemma/written/25380.pdf
[7] Theresa May affirmed bulk dataset use at the IP Bill committee hearing and did not deny use of bulk personal datasets, including medical records: http://www.bbc.co.uk/news/uk-politics-35300671
[8] The Economist – can big databases be kept both anonymous and useful? http://www.economist.com/news/science-and-technology/21660966-can-big-databases-be-kept-both-anonymous-and-useful-well-see-you-anon
[9] Nuffield Council on Bioethics – ethical governance of data initiatives: http://nuffieldbioethics.org/report/collection-linking-use-data-biomedical-research-health-care/ethical-governance-of-data-initiatives/
[10] Royal Statistical Society – the data trust deficit: https://www.ipsos-mori.com/researchpublications/researcharchive/3422/New-research-finds-data-trust-deficit-with-lessons-for-policymakers.aspx
Background: Why datasharing matters to me:
When, only very recently, I joined the data sharing discussions that have been running for almost two years, it was wearing two hats, both in a personal capacity.
The first reflects my interest in how public policy and legislation may be changing, and how that will affect de-identified datasharing for academic research, as I am one of two lay people offering a public voice on the ADRN approvals panel.
Its aim is to make sure the process of granting access to sensitive, linked administrative data from population-wide datasets is fair, equitable and transparent: de-identified use by trusted researchers, for non-commercial purposes, under strict controls and in safe settings. Once a research project is complete, the data are securely destroyed. It does not do work that “a government department or agency would carry out as part of its normal operations.”
Wearing my second hat, I am interested to see how the new policy plans will affect current practice. I coordinate campaign efforts calling on the Department for Education to stop giving away the identifiable, confidential and sensitive personal data of our 8 million children in England from the National Pupil Database to commercial third parties and the press.