
Gotta know it all? Pokémon GO, privacy and behavioural research

I caught my first Pokémon and I liked it. Well, OK, someone else handed me a phone and insisted I have a go. Turns out my curve ball is pretty good. Pokémon GO is enabling all sorts of new discoveries.

Discoveries reportedly include a dead man, robberies, new friends, and scrapes and bruises. While players are out hunting anime in augmented reality, enjoying the novelty, and discovering fun facts about their vicinity, Pokémon GO is gathering a lot of data. It’s influencing human activity in ways that other games can only envy, taking in-game interaction to a whole new level.

And it’s popular.

But what is it learning about us as we do it?

This week, questions have been asked about the depth of interaction the app gets by accessing users’ login credentials.

What I would like to know is: what access goes in the other direction?

Google, heavily invested in AI and machine intelligence research, describes “learning systems placed at the core of interactive services in a fast changing and sometimes adversarial environment”, where “combinations of techniques including deep learning and statistical models need to be combined with ideas from control and game theory.”

The app, which is free to download, raised concerns after suggestions that it could access a user’s entire Google account, including email and passwords. Then it seemed it couldn’t. But Niantic is reported to have changed the permissions to limit access to basic profile information anyway.

If Niantic gets access to data owned by Google through its use of Google login credentials, does Niantic’s investor, Google’s parent Alphabet, get the reverse: user data from the Google login interaction with the app? And if so, what does Google learn through that interaction?

Who gets access to what data and why?

Brian Crecente writes that Apple, Google and Niantic are likely making more on Pokémon GO than Nintendo, with 30 percent of revenue from in-app purchases going to the online stores through which they are made.

The next stop is making money from marketing deals between Niantic and the offline stores used as in-game focal points, gyms and more, according to Bryan Menegus at Gizmodo, who reported that Redditors had found code in decompiled Android and iOS versions of Pokémon GO earlier this week “that indicated a potential sponsorship deal with global burger chain McDonald’s.”

The logical progression is that the offline store partners, i.e. McDonald’s and friends, will make money from players: the people who are led to their shops, restaurants and cafes, where they will hang out longer than at a Pokéstop, because human interaction with other humans, battles between collected creatures, and teamwork are at the heart of the game. Since you can’t visit gyms until you reach level 5 and have chosen a team, players are building up profiles over time and getting social in real life. That location data may build up patterns about the players.

This evening the two players that I spoke to were already real-life friends on their way home from work (that now takes at least an hour longer every evening) and they’re finding the real-life location facts quite fun, including that thing they pass on the bus every day, and umm, the Scientology centre. Well, more about that later**.

Every player I spotted looking at their phone with that finger-flick action gave themselves away with shared wry smiles. All thirty-something men. There is possibly something of a legacy in this, they said, since the original Pokémon game, released 20 years ago, is drawing in players who were tweens then.

Since the app is online and open to all, children can play too. What this might mean for them in the offline world is something the NSPCC picked up on before the UK launch. Its focus of concern is the physical safety of young players, citing the risk of misuse of in-game lures. I am not sure how much of an increased risk this is compared with existing scenarios, or whether children will be increasingly unsupervised. It’s not a totally new concept. Players of all ages must be mindful of where they are playing**. People getting together in the small hours of the night has generated some stories which, for now, are mostly fun. (Go Red Team.) Others are worried about hacking. And it raises all sorts of questions when private and public spaces have become Pokéstops.

While the NSPCC includes considerations on the approach to privacy in a recent, more general review of apps, it hasn’t yet mentioned the less obvious privacy and ethics considerations in Pokémon GO: encouraging anyone, but particularly children, out of their homes or protected environments and into commercial settings with the explicit aim of targeting their spending. This is big business.

Privacy in Pokémon GO

I think we are yet to see a really transparent discussion of the broader privacy implications of the game because the combination of multiple privacy policies involved is less than transparent. They are long, they seem complete, but are they meaningful?

We can’t see how they interact.

Google has crowdsourced the collection of real-time traffic data via mobile phones. Geolocation data from Google Maps using GPS, as well as network provider data, seem necessary to display street data to players. Apparently you can download and use the maps offline, since Pokémon GO uses the Google Maps API. Google goes to “great lengths to make sure that imagery is useful, and reflects the world our users explore.” As Google builds a virtual copy of the real world, how data about all of us who live in it are collected and will be used is a little woolly to the public.

U.S. Senator Al Franken is apparently already asking Niantic these questions. He points out that Pokémon GO has indicated it shares de-identified and aggregate data with other third parties for a multitude of purposes but does not describe the purposes for which Pokémon GO would share or sell those data [c].

It’s widely recognised that anonymisation often fails, so a promise to pass on only anonymised data may be reassuring but fail in practice. Stripping out what data protection law considers personal identifiers can still leave individuals identifiable by their unique characteristics, or leave people profiled as groups.
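To make the point concrete, here is a minimal sketch, assuming an entirely hypothetical dataset in which names and account IDs have been stripped but postcode district, birth year and sex remain. It counts how many records share each combination of those quasi-identifiers, the idea behind k-anonymity: any combination held by exactly one record singles that person out, even though no “personal identifier” is left in the data. The field names and values are invented for illustration.

```python
from collections import Counter

# Hypothetical "de-identified" records: names and account IDs removed,
# but postcode district, birth year and sex kept (quasi-identifiers).
records = [
    {"postcode": "N1", "birth_year": 1984, "sex": "M", "visits": 12},
    {"postcode": "N1", "birth_year": 1984, "sex": "M", "visits": 3},
    {"postcode": "N1", "birth_year": 1986, "sex": "F", "visits": 7},
    {"postcode": "E8", "birth_year": 2005, "sex": "F", "visits": 21},
    {"postcode": "E8", "birth_year": 1984, "sex": "M", "visits": 5},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def equivalence_classes(rows, keys):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def k_anonymity(rows, keys):
    """Size of the smallest group; k == 1 means at least one person is unique."""
    return min(equivalence_classes(rows, keys).values())


def unique_rows(rows, keys):
    """Rows whose quasi-identifier combination appears exactly once."""
    counts = equivalence_classes(rows, keys)
    return [row for row in rows if counts[tuple(row[k] for k in keys)] == 1]


if __name__ == "__main__":
    print("k =", k_anonymity(records, QUASI_IDENTIFIERS))
    for row in unique_rows(records, QUASI_IDENTIFIERS):
        print("singled out:", row)
```

Only two of the five hypothetical records hide in a group; the other three are unique on three ordinary-looking fields, which is why “we only share anonymised data” is a weaker assurance than it sounds.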

He considers opt-out inadequate as a consent model for the personal and geolocational data that the app is collecting and passing to others in the U.S.

While the app provider would, I’m sure, argue that its UK privacy policy respects the European opt-in requirement, I would be surprised if many players have read it. Privacy policies fail.

Poor practices must be challenged if we are to preserve our control over the use of our data and knowledge about ourselves. Being aware of whom we have ceded control to, whether over marketing to us or over influencing how we interact with our environment, is at least a step towards not blindly giving up free choice.

The Pokémon GO permissions “for the purpose of performing services on our behalf“, “third party service providers to work with us to administer and provide the Services” and “also use location information to improve and personalize our Services for you (or your authorized child)” are so broad that they could mean almost anything. The policy can also be changed without any notice period, which makes it pretty meaningless. But it’s the third parties’ connection, data collection in passing, that is completely hidden from players.

If we are ever to use privacy policies as meaningful tools to enable consent, then they must be transparent to show how a chain of permissions between companies connect their services.

Otherwise they are no more than get-out-of-jail-free cards for the companies that trade our data behind the scenes, should we ever claim for its misuse. Data collectors must improve transparency.

Behavioural tracking and trust

Covert data collection and interaction is not conducive to user trust, whether through a failure to communicate by design or not.

Measuring footfall by combining location and behavioural data is described as “the holy grail for retailers and landlords alike”, and it is valuable. “Pavement Opportunity” data may be sent anonymously, but if its analysis and storage provide ways to pitch to people, whether individually without knowing who they are, or as groups, it is discriminatory and potentially invisibly predatory. The pedestrian, or the player, Jo Public, is a commercial opportunity.

Pokémon GO has potential to connect the opportunity for profit makers with our pockets like never before. But they’re not alone.

Who else is getting location data we never signed up to share, “in 81 towns and cities across Great Britain”?

Whether it’s footfall outside the shops or a game that gets us inside them, public interest researchers and commercial companies alike risk losing our trust if we feel used as pieces in a game that we didn’t knowingly sign up to. It’s creepy.

For children the ethical implications are even greater.

There are obligations to meet higher legal and ethical standards when processing children’s data and presenting them marketing. Parental consent requirements fail children for a range of reasons.

So far, the UK has said it will implement the EU GDPR. Clear and affirmative consent is needed. Parental consent will be required for the processing of personal data of children under age 16. EU Member States may lower the age requiring parental consent to 13, so what that will mean for children here in the UK is unknown.

The ethics of product placement, and the rules on marketing to children of all ages, go out of the window, however, when the whole game or programme is one long animated advert. On children’s television and YouTube, content producers have turned brand product placement into programmes: My Little Pony, Barbie, Playmobil and many more.

Alice Webb, Director of BBC Children’s and BBC North, looked at some of the challenges in this, as the BBC considers how to deliver content for children while adapting to technological advances, in an LSE blog post and a new policy brief about families and ‘screen time’ by Alicia Blum-Ross and Sonia Livingstone.

So is this augmented reality any different from other platforms?

Yes, because you can’t play the game without accepting the use of the maps and, by default, some sacrifice of your privacy settings.

Yes, because the ethics and implications of putting kids not simply in front of a screen that pitches products to them, but physically into the place where they can consume products, are huge, if the McDonald’s story is correct and a taster of what will follow.

Boundaries between platforms and people

Blum-Ross says, “To young people, the boundaries and distinctions that have traditionally been established between genres, platforms and devices mean nothing; ditto the reasoning behind the watershed system with its roots in decisions about suitability of content.”

She’s right. And if those boundaries and distinctions mean nothing to providers, then we must have that honest conversation with urgency. With our contrived consent, walking and running and driving without coercion, we are being packaged up and delivered right to the door of for-profit firms, paying for the game with our privacy. Smart cities are exploiting street sensors to do the same.

Free will is at the very heart of who we are: “The ability to choose between different possible courses of action. It is closely linked to the concepts of responsibility, praise, guilt, sin, and other judgments which apply only to actions that are freely chosen.” Free choice of where we shop, what we buy and whom we interact with is open to influence. Influence that is not entirely transparent presents an opportunity for hidden manipulation. While the NSPCC might be worried about the rare risk of physical threat, the potential for influencing all children’s behaviour, both positively and negatively, reaches everyone.

Some stories of how behaviour is affected are heartbreakingly positive. And I met and chatted with complete strangers who shared the joy of something new and a mutual curiosity about the game. Pokémon GO is clearly a lot of fun. It’s also unclear about much more.

I would like to understand explicitly whether Pokémon GO is gift-wrapping behavioural research by piggybacking on the Google platforms that underpin it, and providing linked data to Google or third parties.

Fishing for frequent Pokémon encourages players to ‘check in’ and keep that behaviour tracking live. 4pm caught a Krabby in the closet at work. 6pm another Krabby. Yup, still at work. 6.32pm Pidgey on the street outside ThatGreenCoffeeShop. Monday to Friday.

Google’s privacy policies, changed in the last year, require ten clicks to opt out and, in part, the download of an add-on. Google has our contacts, calendar events, web searches and health data, has invested in our genetics, and holds all the ‘Things that make you “you”’. They have our history, and are collecting our present. Machine intelligence work on prediction is the future. For now, perhaps that will mean pinging you with a ‘buy one get one free’ voucher at 6.20, or LCD adverts shifting as you drive back home.

Pokémon GO doesn’t have to include in its privacy policy what data Google collects. That is in Google’s privacy policy. And who really read that when it came out months ago, or knows what it means in combination with the new apps and games we connect to it today? Tracking and linking data on geolocation, behavioural patterns, footfall, whose other phones are close by, whom we contact, and potentially even our spending from Google Wallet.

Have Google and friends of Niantic gotta know it all?

OkCupid and Google DeepMind: Happily ever after? Purposes and ethics in datasharing

This blog post is also available as an audio file on SoundCloud.


What constitutes the public interest must be set in a universally fair and transparent ethics framework if the benefits of research, whether in social science, health, education or elsewhere, are to be realised. That framework will provide a strategy for getting the prerequisite success factors right, ensuring research in the public interest is not only fit for the future, but thrives. There has been a climate change in consent. We need to stop talking about barriers that prevent datasharing and start talking about the boundaries within which we can share.

What is the purpose for which I provide my personal data?

‘We use math to get you dates’, says OkCupid’s tagline.

That’s the purpose of the site. It’s the reason people log in and create a profile, enter their personal data and post it online for others who are looking for dates to see. The purpose is to get a date.

When over 68K OkCupid users registered for the site to find dates, they didn’t sign up to have their identifiable data used and published in ‘a very large dataset’ and re-used onwards by anyone with unregistered access. The users’ data were extracted “without the express prior consent of the user […].”

Whether the purposes consented to at registration are compatible with the purposes to which the researcher put the data should be a simple enough question. Are the research purposes what people signed up to, or would they be surprised to find out their data were used like this?

These are questions the “OkCupid data snatcher”, now a self-confessed ‘non-academic’ researcher, thought unimportant to consider.

But it appears that in the last month he has been in good company.

Google DeepMind, and the Royal Free, big players who do know how to handle data and consent well, paid too little attention to the very same question of purposes.

The boundaries OkCupid users had set on how, and to whom, they revealed information have not been respected in this project.

Nor were these boundaries respected by the Royal Free London trust that gave out patient data for use by Google DeepMind with changing explanations, without clear purposes or permission.

The legal boundaries in these recent stories appear unclear or to have been ignored. The privacy boundaries deemed irrelevant. Regulatory oversight lacking.

The ethical boundaries of consent to purposes, which respect autonomy, have indisputably broken down, whether the breach comes from a commercial organisation, a public body, or a lone ‘researcher’.

Research purposes

The crux of data access decisions is purposes. What question is the research to address, and what is the purpose for which the data will be used? Kirkegaard’s intent was to test:

“the relationship of cognitive ability to religious beliefs and political interest/participation…”

In this case the question appears intended rather as a test of the data, not the data opened up to answer a question. While methodological studies matter, given the care and attention [or self-stated lack thereof] paid to the data’s extraction, and to any attempt to be representative and fair, it would appear that this is not the point of this study either.

The data doesn’t include profiles identified as heterosexual male, because ‘the scraper was’. It is also unknown how many users hide their profiles, “so the 99.7% figure [identifying as binary male or female] should be cautiously interpreted.”

“Furthermore, due to the way we sampled the data from the site, it is not even representative of the users on the site, because users who answered more questions are overrepresented.” [sic]

The paper goes on to say photos were not gathered because they would have taken up a lot of storage space and could be done in a future scraping, and

“other data were not collected because we forgot to include them in the scraper.”

The data are knowingly of poor quality, inaccurate and incomplete. The project cannot be repeated, as ‘the scraping tool no longer works’. The ethical or peer review process is unclear, and the research purpose is at best unclear. We can certainly give the researcher the benefit of the doubt and say the intent appears to have been entirely benevolent, even if it’s not clear exactly what that intent was. I think it was clearly misplaced and foolish, but not malevolent.

The trouble is, it’s not enough to say, “don’t be evil.” These actions have consequences.

When the researcher asserts in his paper that “the lack of data sharing probably slows down the progress of science immensely because other researchers would use the data if they could,” he is partly right.

Google and the Royal Free have tried, more eloquently, to say the same thing: it’s not research, it’s direct care; in effect, ignore that people are no longer our patients and that we’re using historical data without re-consent. We know what we’re doing; we’re the good guys.

However, the principles are the same whether it’s a lone project or a global giant, and both are wildly wrong. More people must take this on board. It’s the reason the public interest needs the Dame Fiona Caldicott review published sooner rather than later.

Just because there is a boundary to data sharing in place does not mean it is a barrier to be ignored or overcome. Like the registration step on the OkCupid site, consent and the right to opt out of medical research in England and Wales are there for a reason.

We’re desperate to build public trust in UK research right now. So to assert that the lack of data sharing probably slows down the progress of science is misplaced, when it was getting ‘sharing’ wrong that caused the lack of trust in the first place, and that harms research.

A climate change in consent

There has been a climate change in public attitudes to consent since care.data, clouded by the smoke and mirrors of state surveillance. It cannot be ignored. The EU GDPR supports it. Researchers may not like change, but there needs to be a corresponding adjustment in expectations and practice.

Without change, there will be no change. Public trust is low. As technology advances and if we continue to see commercial companies get this wrong, we will continue to see public trust falter unless broken things get fixed. Change is possible for the better. But it has to come from companies, institutions, and people within them.

Like climate change, you may deny it if you choose to. But some things are inevitable and unavoidably true.

There is strong support for public interest research but that is not to be taken for granted. Public bodies should defend research from being sunk by commercial misappropriation if they want to future-proof public interest research.

The purposes for which people gave consent are the boundaries within which you have permission to use data; they give you freedom, within their limits, to use the data. Purposes and consent are not barriers to be overcome.

If research is to win back public trust, developing a future-proofed, robust ethical framework for data science must be a priority today.

Commercial companies asking us to ‘trust us because we’re not evil’ must first overcome the low levels of public trust they have generated to date. If you can’t rule out the use of data for other purposes, it’s not helping. If you delay independent oversight, it’s not helping.

This case study, and indeed the recent Google DeepMind episode, demonstrate how urgently we must make it a priority, in the UK and beyond, to work out common expectations and oversight of applied ethics in research, who gets to decide what is ‘in the public interest’, and how to engage the public on data science.

Boundaries in the best interest of the subject and the user

Society needs research in the public interest. We need good decisions on what will be funded and what will not, on what will influence public policy, and on where change is needed.

To do this ethically, we all need to agree what is fair use of personal data, when it is closed and when it is open, what are direct and what are secondary uses, and how advances in technology are used when they present both opportunities for benefit and risks of harm to individuals, to society and to research as a whole.

The potential benefits of research are being compromised by arrogance, greed, or misjudgement, whatever the intent. Those benefits cannot come at any cost, or disregard public concern, or the price will be trust in all research itself.

In discussing this with social science and medical researchers, I realise not everyone agrees. For some, using de-identified data in trusted third party settings poses such a low privacy risk that they feel the public should have no say in whether their data are used in research, as long as it’s ‘in the public interest’.

The DeepMind researchers and the Royal Free were confident that, even using identifiable data, this was the “right” thing to do, without consent.

In the Cabinet Office datasharing consultation, in the parts that will open up national registries and share identifiable data more widely, including with commercial companies, they are convinced it is all the “right” thing to do, without consent.

How can researchers, society and government understand what is good ethics of data science, as technology permits ever more invasive or covert data mining and the current approach is desperately outdated?

Who decides where those boundaries lie?

“It’s research Jim, but not as we know it.” This is one aspect of data use that ethical reviewers will need to deal with as we advance the debate on data science in the UK, whether the users of data are independents or commercial organisations. Google said its work was not research. Is ‘OkCupid’ research?

If this research and data publication proves anything at all, and can offer lessons to learn from, it is perhaps these three things:

Who is accredited as a researcher or ‘prescribed person’ matters, whether we are considering new datasharing legislation or, for example, whom the UK government is granting access to millions of children’s personal data today. Your idea of a ‘prescribed person’ may not be the same as the rest of the public’s.

Researchers and ethics committees need to adjust to the climate change in public consent. Purposes must be respected in research, particularly when sharing sensitive, identifiable data, and no assumptions should be made that differ from the original purposes for which users gave consent.

Data ethics and laws are desperately behind data science technology. Governments, institutions, civil society and society as a whole need to reach a common vision of, and leadership on, how to manage these challenges. Who defines these boundaries that matter?

How do we move forward towards better use of data?

Our data and technology are taking on a life of their own, in space, which is another frontier, and in time, as data gathered in the past might be used for quite different purposes today.

The public are being left behind in the game-changing decisions made by those who deem they know best about the world we want to live in. We need a say in what shape society wants that to take, particularly for our children as it is their future we are deciding now.

How about an ethical framework for datasharing that supports a transparent public interest, which tries to build a little kinder, less discriminating, more just world, where hope is stronger than fear?

Working with people, with consent, with public support and transparent oversight shouldn’t be too much to ask. Perhaps it is naive, but I believe that with an independent ethical driver behind good decision-making, we could get closer to datasharing like that.

That would bring ‘Better use of data in government’.

Purposes and consent are not barriers to be overcome. Within these, shaped by a strong ethical framework, good data sharing practices can tackle some of the real challenges that hinder ‘good use of data’: training, understanding data protection law, communications, accountability and intra-organisational trust. More data sharing alone won’t fix these structural weaknesses in current UK datasharing which are our really tough barriers to good practice.

How our public data will be used in the public interest will not be a destination or have a well-defined happy ending. It is a long-term process which needs to be consensual, with a clear path to setting out together and achieving collaborative solutions.

While we are all different, I believe that society for the most part shares commonalities in what we accept as good and fair, and in what we believe is important. The family sitting next to me have just counted out their money and bought an ice cream to share, and the staff gave them two. The little girl is beaming. It seems that even when things are difficult, there is always hope things can be better. And there is always love.

Even if some might give it a bad name.

********

Image credit: flickr/sofi01/ Beauty and The Beast, under Creative Commons

Ethics, standards and digital rights – time for a citizens’ charter

Central to future data sharing plans [1] is the principle of public interest, intended to be underpinned by transparency in all parts of the process and supported by an informed public. These three principles are also key in the plan for open policy.

The draft ethics proposals [2] start with user need (i.e. what government, researchers and other users of the data want) and public benefit.

With these principles in mind, I wonder how compatible the plans are in practice: plans that will remove the citizen, that is, you and me, from some of the decision making about information sharing.

When talking about data sharing it is all too easy to forget we are talking about people, in this case 62 million individual people’s personal information, especially when users of data focus on how data are released or published. The public thinks of personal data as information related to them. And, as the ICO says, privacy and an individual’s rights are engaged at the point of collection.

The trusted handling, use and re-use of population-wide personal data sharing and ID assurance are vital to innovation and digital strategy. So in order to make these data uses secure and trusted, fit for the 21st century, when will the bad bits of current government datasharing policy and practice [3] be replaced by good parts of ethical plans?

Current practice and Future Proofing Plans

How is policy being future-proofed at a time of changes to regulation under the new EU data protection rules, which are being made in parallel? These changes clarify consent and the role of the individual, requiring clear affirmative action by the data subject. [4] How do public bodies and departments plan to meet the current moral and legal obligation to ensure that persons whose personal data are subject to transfer and processing between two public administrative bodies are informed in advance?

How is public perception [5] being taken into account?

And how are digital identities to be protected when they are literally our passport to the world, and their integrity is vital to maintain, especially for our children in a world of big data [6] we cannot imagine today? How do we verify identity without having to reveal the data behind it, if those data are to be used in ever more government transactions? Done badly, that could mean the citizen loses sight of who knows what information and with whom it has been re-shared.

As of the 6th January there are lots of open questions, and no formal policy document or draft legislation to review. The plan appears to be far from ready for public consultation, and needs concrete input on the practical aspects of what the change would mean.

Changing the approach to the collection of citizens’ personal data, and removing the need for consent to wide re-use and onward sharing, will open up a massive change to the data infrastructure of the country in terms of who is involved in administrative roles in the process and when. As a country, we have not to date included data as part of our infrastructure. Some suggest we should. Changing how roads are built would require impact planning, mapping and a thought-out budget before the project began. That kind of impact assessment appears to be missing entirely from this data infrastructure change.

I’ve considered the plans in terms of case studies of policy and practice, transparency and trust, the issues of data quality and completeness and digital inclusion.

But I’m starting by sharing only my thoughts on ethics.

Ethics, standards and digital rights – time for a public charter

How do you want your own, or your children’s personal data handled?

This is not theoretical. Every one of us in the UK has our own confidential data used in a number of ways of which we are not aware today. Are you OK with that? With academic researchers? With GCHQ? [7] What about charities? Or Fleet Street press? All of these bodies have personal data from population-wide datasets, and that means all of us or all of our children, whether or not we are the subjects of research, subject to investigation, or just ordinary citizens minding our own business.

On balance, where do you draw the line between your own individual rights and public good? What is fair use without consent and where would you be surprised and want to be informed?
I would like to hear more about how others feel about this, and how they weigh the trade-off between risks and benefits in this area.

Some organisations working on debt are concerned about digital exclusion. Others are concerned about compiling single-view data in coercive relationships. Some organisations are campaigning for a digital bill of rights. I had some thoughts on this specifically for health data in the past.

A charter of digital standards and ethics could be enabling, not a barrier, and should be a tool that comes to consultation before any new legislation.

Discussing datasharing that will open up every public data set “across every public body” without first having defined a clear policy is a challenge. Without first defining ethical good practice as a reference framework, it’s dancing in the dark. The draft ethics plan is running in parallel with, but is not part of, the datasharing discussion.
Ethical practice and principles must be the foundation of data sharing plans, not an afterthought.

Why? Because this stuff is hard. The kinds of research that use sensitive de-identified data are sometimes controversial, and they will become more challenging as what is possible increases with machine learning, genomics, and the growing personalisation and targeting of marketing and interventions.

The ADRN had spent months on its ethical framework and privacy impact assessment before I joined the panel.

What does ethics look like in sharing bulk datasets?

What do you think about the commercialisation of genomic data by the state, often from children whose parents are desperate for a diagnosis, to ‘kick start’ the UK genomics industry? What do you think about data used in research on domestic violence and child protection? And in predictive policing?

Or research on religious affiliations and home schooling? Or abortion and births in teens matching school records to health data?

Will the results of the research encourage policy change or interventions with any group of people? Could these types of research have unintended consequences, or be used in ways researchers did not foresee, supporting not social benefit but a particular political or scientific objective? If so, how is that governed?

What research is done today, what is good practice, what is cautious, and what would Joe Public expect? On domestic violence, for example, public feedback said no.

And while there is a risk in not making the best use of data, there are also risks in releasing even anonymised data [8] in today’s world, where jigsawing together the pieces of poorly anonymised data can make it identifying. Profiling or pigeonholing individuals or areas was a concern raised in public engagement work.

The Bean Report was used to draw out some of the reasoning behind the need for increased access to data: “Remove obstacles to the greater use of public sector administrative data for statistical purposes, including through changes to the associated legal framework, while ensuring appropriate ethical safeguards are in place and privacy is protected.”

The Report doesn’t outline how appropriate ethical safeguards will be put in place and privacy protected, or what ‘ethical’ looks like.

‘In the public interest’ is not clear-cut.

The boundary between public and private interest shifts over time as well as across cultures. While in the UK the law today says we all have the right to be treated as equals, regardless of our gender, identity or sexuality, it has not always been so.

By ranking the rights of the individual below the public interest in this change, we risk jeopardising having any data at all to use. Yet data will be central to the digital strategy with which, we are told, the government wants to “show the rest of the world how it’s done.”

If they’re serious, and if all our future citizens must have a digital identity to use with government with any integrity, then the use of not only current adults’ data but our children’s data really matters, and current practices must change. Here’s a case study showing why:

Pupil data: The Poster Child of Datasharing Bad Practice

Right now, the National Pupil Database, containing the personal data of 8 million or more children in England, is unfortunately the poster child of what a change in legislation and policy around data sharing can mean in practice. Bad practice.

The “identity of a pupil will not be discovered using anonymised data in isolation”, says the User Guide, but when named data are given away, and identifiable data in all but 11 requests since 2012, the data are not anonymised. This is anything but the ‘anonymised data’ of the publicly announced plans presented in 2011, yet it is precisely what was permitted by the change in law to broaden the range of users in the Prescribed Persons Act 2009, and by the expansion of purposes in the amended Education (Individual Pupil Information)(Prescribed Persons) Regulations introduced in June 2013. Access was opened up to:

“(d)persons who, for the purpose of promoting the education or well-being of children in England are—

(i)conducting research or analysis,

(ii)producing statistics, or

(iii)providing information, advice or guidance,

and who require individual pupil information for that purpose(5);”.

The law was changed so that individual pupil-level data and pupil names are extracted, stored and have also been released at national level. Raw data have been sent to commercial third parties, charities and the press as identifiable, individual-level and often sensitive data items.

This is a world away from safe-setting statistical analysis of de-identified data by accredited researchers, in the public interest.

Now our children’s confidential data sit on servers on Fleet Street – is this the model for all our personal administrative data in future?

If not, how do we ensure it is not? How will new datasharing legislation covering all datasets permit wider sharing, with more people than currently have access, without all our identifiable data ending up ‘in the wild’ without audit, as our pupil data are today?

Consultation, transparency, oversight, public involvement in ongoing data decision-making, and well-written legislation are key.

The public interest alone is not a strong enough description to keep data safe. After all, this same government brought in this National Pupil Database policy thinking it too was ‘in the public interest’.

We need a charter of ethics and digital rights that focuses on the person, not exclusively the public interest use of data.

They are not mutually exclusive; they enhance one another.

Getting ethics in the right place

These ethical principles start in the wrong place. To me, this is not an ethical framework; it’s a ‘how-to-do-data-sharing’ guideline that tries to avoid repeating care.data. Ethics is not first about the public interest, or economic good, or government interest. Instead, to borrow an ethics council’s view, you start with the person.

“The terms of any data initiative must take into account both private and public interests. Enabling those with relevant interests to have a say in how their data are used and telling them how they are, in fact, used is a way in which data initiatives can demonstrate respect for persons.”

Professor Michael Parker, Member of the Nuffield Council on Bioethics Working Party and Professor of Bioethics and Director of the Ethox Centre, University of Oxford:

“Compliance with the law is not enough to guarantee that a particular use of data is morally acceptable – clearly not everything that can be done should be done. Whilst there can be no one-size-fits-all solution, people should have say in how their data are used, by whom and for what purposes, so that the terms of any project respect the preferences and expectations of all involved.”

The partnership between members of the public and public administration must be consensual if it is to continue to enjoy support [10]. If personal data are used for research or other purposes in the public interest without explicit consent, that use should be understood by those using the data as a privilege, not a right.

As such, we need to see data as being about the person, as people see it themselves, and to see data at the point of collection as information about individual people, not just as statistics. Personal data are sensitive, some research uses highly sensitive data, and data used badly can do harm. Designing new patterns of datasharing must take account of private as well as public interests, co-operating for the public good.

And we need a strong ethical framework to shape it.

******

[1] http://datasharing.org.uk/2016/01/13/data-sharing-workshop-i-6-january-2016-meeting-note/

[2] Draft data science ethical framework: https://data.blog.gov.uk/wp-content/uploads/sites/164/2015/12/Data-science-ethics-short-for-blog-1.pdf

[3] defenddigitalme campaign to get pupil data in England made safe http://defenddigitalme.com/

[4] On the European Data Protection regulations: https://www.privacyandsecuritymatters.com/2015/12/the-general-data-protection-regulation-in-bullet-points/

[5] Public engagement work – ADRN/ESRC/Ipsos MORI 2014 https://adrn.ac.uk/media/1245/sri-dialogue-on-data-2014.pdf

[6] Written evidence submitted to the parliamentary committee on big data: http://data.parliament.uk/writtenevidence/committeeevidence.svc/evidencedocument/science-and-technology-committee/big-data-dilemma/written/25380.pdf

[7] http://www.bbc.co.uk/news/uk-politics-35300671 Theresa May affirmed bulk datasets use at the IP Bill committee hearing and did not deny use of bulk personal datasets, including medical records

[8] http://www.economist.com/news/science-and-technology/21660966-can-big-databases-be-kept-both-anonymous-and-useful-well-see-you-anon

[9] Nuffield Council on Bioethics http://nuffieldbioethics.org/report/collection-linking-use-data-biomedical-research-health-care/ethical-governance-of-data-initiatives/

[10] Royal Statistical Society – the data trust deficit https://www.ipsos-mori.com/researchpublications/researcharchive/3422/New-research-finds-data-trust-deficit-with-lessons-for-policymakers.aspx

Background: Why datasharing matters to me:

When I joined, only very recently, the data sharing discussions that have been running for almost two years, I was wearing two hats, both in a personal capacity.

The first was an interest in how public policy and legislation may be changing and how that will affect de-identified datasharing for academic research, as I am one of two lay people offering a public voice on the ADRN approvals panel.

Its aim is to make sure the process of granting access to sensitive, linked administrative data from population-wide datasets is fair, equitable and transparent, for de-identified use by trusted researchers, for non-commercial use, under strict controls and in safe settings. Once a research project is complete, the data are securely destroyed. It’s not doing work that “a government department or agency would carry out as part of its normal operations.”

Wearing my second hat, I am interested to see how new policy and practice plan to affect current practice. I coordinate campaign efforts calling on the Department for Education to stop giving away the identifiable, confidential and sensitive personal data of our 8m children in England, from the National Pupil Database, to commercial third parties and the press.