
Crouching Tiger Hidden Dragon: the making of an IoT trust mark

The Internet of Things (IoT) brings with it unique privacy and security concerns associated with smart technology and its use of data.

  • What would it mean for you to trust an Internet connected product or service and why would you not?
  • What has damaged consumer trust in products and services and why do sellers care?
  • What do we want to see different from today, and what is necessary to bring about that change?

These three pairs of questions implicitly underpinned the intense day of discussion at the London Zoo last Friday.

The following questions went unasked; they could have been voiced before we started, but were probably assumed to be self-evident:

  1. Why do you want one at all [define the problem]?
  2. What needs to change and why [define the future model]?
  3. How do you deliver that and for whom [set out the solution]?

If a group does not agree on the need and drivers for change, there will be no consensus on what that change should look like, what the gap is to achieve it, and even less on making it happen.

So who do you want the trustmark to be for, why will anyone want it, and what will need to change to deliver the aims? No one wants a trustmark per se. Perhaps you want what values or promises it embodies to demonstrate what you stand for, promote good practice, and generate consumer trust. To generate trust, you must be seen to be trustworthy. Will the principles deliver on those goals?

A rough draft of the Open IoT Certification Mark Principles was the outcome of the day, and is available online.

Here are my reflections, including what was missing on privacy, and the potential for it to be considered in future.

I’ve structured the first part, at around 1,000 words of lists and bullet points, assuming readers attended the event. The background comes after that, for anyone interested in reading a longer piece.

Many thanks upfront, to fellow participants, to the organisers Alexandra D-S and Usman Haque and the colleague who hosted at the London Zoo. And Usman’s Mum. I hope there will be more constructive work to follow, and that there is space for civil society to play a supporting role and critical friend.


The mark didn’t aim to fix the IoT in a day, but to deliver something better for product and service users, by those IoT companies and providers who want to sign up. Here is what I took away.

I learned three things

  1. A sense of privacy is not homogeneous, even among people who like and care about privacy in theoretical and applied ways. (I very much look forward to reading suggestions promised by fellow participants, even if enforced personal openness and ‘watching the watchers’ may mean ‘privacy is theft’.)
  2. Awareness of current data protection regulations needs to be improved in the field. For example, Subject Access Requests already apply to all data controllers, public and private. Few have read the GDPR or the e-Privacy directive, despite their importance for security measures in personal devices, which is relevant for the IoT.
  3. I truly love working on this stuff, with people who care.

And it reaffirmed things I already knew

  1. Change is hard, no matter in what field.
  2. People working together towards a common goal is brilliant.
  3. Group collaboration can create some brilliantly sharp ideas. Group compromise can blunt them.
  4. Some men are particularly bad at talking over each other, never mind over the women in the conversation. Women notice more. (Note to self: When discussion is passionate, it’s hard to hold back in my own enthusiasm and not do the same myself. To fix.)
  5. The IoT context, and the risks within it, are not homogeneous, but bring new risks and adversaries. The risks for manufacturers, consumers and the rest of the public are different, and cannot be easily solved with a one-size-fits-all solution. But we can try.

Concerns I came away with

  1. If the citizen / customer / individual is to benefit from the IoT trustmark, they must be put first, ahead of companies’ wants.
  2. If the IoT group controls the design, the assessment of adherence, and the definition of success, how objective will it be?
  3. The group was not sufficiently diverse and, as a result, reflects too little on the risks and impact of the lack of diversity in design and effect, and the implications of dataveillance.
  4. Critical minority thoughts, although welcomed, were stripped out of the crowdsourced first-draft principles in compromise.
  5. More future thinking should be built in, to make the principles robust over time.

[Image: IoT adversaries; via Twitter, source unknown]

What was missing

There was too little discussion of privacy in perhaps the most important context of the IoT: interconnectivity and new adversaries. It’s not only about *your* thing, but the things it speaks to and interacts with, those of friends, passersby, the cityscape, and other individual and state actors interested in offence and defence. While we started to discuss it, we did not have the opportunity to go into sufficient depth to get that thinking into applied solutions in the principles.

One of the greatest risks users face is the ubiquitous collection and storage of data about them that reveal detailed, interconnected patterns of behaviour and identity, without their seeing how companies use that data behind the scenes.

We also missed discussing not only what we see as necessary today, but what we can foresee as necessary in the short-term future: brainstorming and crowdsourcing horizon scanning for market needs and changing stakeholder wants.

Future thinking

Here are the areas of future thinking that smart thinking on the IoT mark could consider.

  1. We are moving towards ever greater requirements to declare identity to use a product or service, to register and log in to use anything at all. How will that change trust in IoT devices?
  2. Single identity sign-on is becoming ever more imposed, and any attempt to present who I am in multiple ways, by choice and dependent on context, is therefore restricted. [Not all users want to use the same social media credentials for online shopping, with their child’s school app, and for their weekend entertainment.]
  3. Is this imposition what the public wants or what companies sell us as what customers want in the name of convenience? What I believe the public would really want is the choice to do neither.
  4. There is increasingly no private space or time at places of work.
  5. Limitations on private space are encroaching, in secret, into all public city spaces. How will ‘handoffs’ affect privacy in the IoT?
  6. Public sector (connected) services are likely to need even more exacting standards than single home services.
  7. There is too little understanding of the social effects of this connectedness and knowledge created, embedded in design.
  8. What effects may there be on the perception of the IoT as a whole, if predictive data analysis and complex machine learning and AI hidden in black boxes become more commonplace, and not every company wants to be or can be open-by-design?
  9. Ubiquitous collection and storage of data about users, revealing detailed, interconnected patterns of behaviour and identity, needs greater commitments to disclosure. Where the hand-offs are to other devices, and to whatever else is in the surrounding ecosystem, who has responsibility for communicating that interaction through privacy notices, or for defining legitimate interests, when the joined-up data may be much more revealing than the stand-alone data in each silo?
  10. Define with greater clarity the privacy threat models for different groups of stakeholders and address the principles for each.

What would better look like?

The draft privacy principles are a start, but they’re not yet as aspirational as I would have hoped. Of course the principles will only be adopted where possible and practical, and by those who choose to. But where is the differentiator from what everyone is required to do, and better than the bare minimum? How will you sell this to consumers as new? How would you like your child to be treated?

The wording in these five bullet points is the first crowdsourced starting point.

  • The supplier of this product or service MUST be General Data Protection Regulation (GDPR) compliant.
  • This product SHALL NOT disclose data to third parties without my knowledge.
  • I SHOULD get full access to all the data collected about me.
  • I MAY operate this device without connecting to the internet.
  • My data SHALL NOT be used for profiling, marketing or advertising without transparent disclosure.

Yes, other points that came under security address some of the crossover between privacy and surveillance risks, but there is as yet little of substance that is aspirational enough to make the IoT mark a real differentiator in terms of privacy. An opportunity remains.

It was that, and how young people perceive privacy, that I hoped to bring to the table. Because if manufacturers are serious about future success, they cannot ignore today’s children and how they feel. How you treat them today will shape future purchasers and their purchasing, and there is evidence you are getting it wrong.

The timing is good in that it also offers the opportunity to promote consistent understanding, and to embed the language of the GDPR and ePrivacy regulations in compatible language in the policy and practice of the #IoTmark principles.

User rights I would like to see considered

These are some of the points I would think privacy by design would mean. This would better articulate GDPR Article 25 to consumers.

Data sovereignty is a good concept, and I believe it should be considered for inclusion in the explanatory blurb before any agreed privacy principles.

  1. Goods should be ‘dumb* by default’ until the smart functionality is switched on. [*As our group chair/scribe called it.] I would describe this as, “off is the default setting out-of-the-box”.
  2. Privacy by design. Deniability by default. i.e. not only after opt out, but a company should not access the personal or identifying purchase data of anyone who opts out of data collection about their product/service use during the set-up process.
  3. The right to opt out of data collection at a later date while continuing to use services.
  4. A right to object to the sale or transfer of behavioural data, including to third-party ad networks and absolute opt-in on company transfer of ownership.
  5. A requirement that advertising should be targeted to content [user bought fridge A], not through jigsaw data held on users by the company [how user uses fridges A, B, C and related behaviour]; a sketch of what ‘jigsaw’ data can reveal follows this list.
  6. An absolute rejection of using children’s personal data gathered to target advertising and marketing at children.
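To illustrate what ‘jigsaw’ data means in practice, here is a minimal, hypothetical sketch in Python. Every account ID, event and time below is invented; the point is only that records which look innocuous in each product’s silo, once joined on a shared account, outline a behavioural profile that none of them reveals alone.

```python
# Hypothetical event logs from three separate connected products.
# Each log on its own says little; joined on the shared account ID,
# they outline a daily routine (late-night snacking, a morning weigh-in).
fridge_events = [("acct42", "door opened", "02:10"), ("acct42", "door opened", "02:40")]
tv_events = [("acct42", "switched on", "02:05"), ("acct42", "switched off", "03:15")]
scale_events = [("acct42", "weight logged", "07:30")]

# Join the silos into one per-account timeline.
profile = {}
for source, events in (("fridge", fridge_events), ("tv", tv_events), ("scales", scale_events)):
    for account, event, time in events:
        profile.setdefault(account, []).append((time, source, event))

for account, timeline in profile.items():
    print(account)
    for time, source, event in sorted(timeline):
        print(f"  {time}  {source}: {event}")
```

Advertising targeted to content needs none of this join; advertising targeted through the joined-up profile depends on it.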

Background: Starting points before privacy

After a brief recap of the work of five years ago, we heard two talks.

The first was a presentation from Bosch. They used the insights from the IoT open definition of five years ago in their IoT thinking and embedded it in their brand book. The presenter suggested that in five years’ time, every fridge Bosch sells will be ‘smart’. The second was a fascinating presentation of both EU thinking and the intellectual nudge to go beyond the practical and ask what kind of society we want to see using the IoT in future, with hints of hardcore ethics and philosophy that made my brain fizz, from a speaker soon to retire from the European Commission.

The principles of open sourcing, manufacturing, and sustainable life cycle were debated in the afternoon with intense arguments and clearly knowledgeable participants, including those who were quiet. But while the group had pre-assigned security, and started work on it weeks before, no one had been pre-assigned to privacy. For me, that said something. If they are serious about those who earn the trustmark being better for customers than their competition, then there needs to be greater emphasis on thinking like their customers, and by their customers, and on what use the mark will be to customers, not companies. Plan early public engagement and testing into the design of this IoT mark, and make that testing open and diverse.

To that end, I believe it needed to be articulated more strongly that sustainable public trust is the primary goal of the principles.

  • Trust that my device will not become unusable or worthless through updates or lack of them.
  • Trust that my device is manufactured safely and ethically and with thought given to end of life and the environment.
  • Trust that my source components are of high standards.
  • Trust in what data is gathered, and how it is used, by the manufacturers.

Fundamental to ‘smart’ devices is their connection to the Internet, and the last point, for me, is therefore key to successful public perception and to the mark actually making a difference, beyond the PR value to companies. The value-add must be measured from the consumer’s point of view.

All the openness about design functions and practice improvements, without attempting to change privacy-infringing practices, may be wasted effort. Why? Because the perceived value of the mark will be proportionate to the risks it is seen to mitigate.

Why?

Because I assume that you know where your source components come from today. I was shocked to find out that not all do, and that knowing suppliers ‘one degree removed’ is supposed to be an improvement. Holy cow, I thought. What about regulatory requirements for product safety recalls? These differ of course for different product areas, but I was still surprised. Having worked in global Fast Moving Consumer Goods (FMCG) and the food industry, semiconductors and optoelectronics, and medical devices, it was self-evident to me that sourcing is rigorous. So that new requirement, to know suppliers one degree removed, was a suggested minimum. But it might shock consumers to know there is not usually more by default.

Customers also believe they have a reasonable expectation of not being screwed by a product update, left with something that does not work because of its computing-based components. The public can take vocal, reputation-damaging action when they are let down.

In the last year alone, some of the more notable press stories include a manufacturer denying service, telling customers, “Your unit will be denied server connection,” after a critical product review. Customer support at Jawbone came in for criticism after reported failings. And even Apple has had problems in rolling out major updates.

While these are visible, the full extent of the overreach of company market and product surveillance into our whole lives, not just our living rooms, is yet to become understood by the general population. What will happen when it is?

The Internet of Things is exacerbating the power imbalance between consumers and companies, between government and citizens. As Wendy Grossman wrote recently, in one sense this may make privacy advocates’ jobs easier. It was always hard to explain why “privacy” mattered. Power, people understand.

That public discussion is long overdue. If open principles on IoT devices mean that the signed-up companies differentiate themselves by becoming market leaders in transparency, it will be a great thing. Companies need to offer full disclosure of data use in any privacy notices in clear, plain language under the GDPR anyway, but to go beyond that, and offer customers a fair presentation of both risks and customer benefits, will not only be a point-of-sale benefit, but could potentially improve digital literacy in customers too.

The morning discussion touched quite often on pay-for-privacy models. While product makers may see this as offering a good thing, I strove to bring discussion back to first principles.

Privacy is a human right. There can be no ethical model of discrimination based on any non-consensual invasion of privacy. Privacy is not something I should pay to have. You should not design products that reduce my rights. GDPR requires privacy-by-design and data protection by default. Now is that chance for IoT manufacturers to lead that shift towards higher standards.

We also need new ethical thinking on acceptable fair use. It won’t change overnight, and perfect may be the enemy of better. But it’s not a battle that companies should think consumers have lost. Human rights and information security should not be on the battlefield at all in the war to win customer loyalty. Now is the time to do better, to be better, and to demand better for us and, in particular, for our children.

Privacy will be a genuine market differentiator

If manufacturers do not want to change their approach to exploiting customer data, they are unlikely to be seen to have changed.

Today, the feelings that people in the US and Europe reflect in surveys are loss of empowerment, helplessness, and feeling used. That will shift to shock, resentment and, as any change curve will predict, anger.

A 2014 survey for the Royal Statistical Society by Ipsos MORI found that trust in institutions to use data is much lower than trust in them in general.

“The poll of just over two thousand British adults carried out by Ipsos MORI found that the media, internet services such as social media and search engines and telecommunication companies were the least trusted to use personal data appropriately.” [2014, Data trust deficit with lessons for policymakers, Royal Statistical Society]

Among the British student population, one 2015 survey of university applicants in England found that, of the 37,000 who responded, the vast majority of UCAS applicants agree that sharing personal data can benefit them and support public-benefit research into university admissions, but they want to stay firmly in control. 90% of respondents said they wanted to be asked for their consent before their personal data is provided outside of the admissions service.

In 2010, multi-method research with young people aged 14-18, by the Royal Academy of Engineering, found that, “despite their openness to social networking, the Facebook generation have real concerns about the privacy of their medical records.” [2010, Privacy and Prejudice, RAE, Wellcome]

When people set their privacy settings on Facebook to maximum, they believe they get privacy, and understand little of what that means behind the scenes.

Are there tools designed by others, like the licences from Projects by IF, and ways this can be done that you’re not even considering yet?

What if you don’t do it?

“But do you feel like you have privacy today?” I was asked the question in the afternoon. How do people feel today, and does it matter? Companies exploiting consumer data, and getting caught doing things the public don’t expect with their data, have repeatedly damaged consumer trust. Data breaches and lack of information security have damaged consumer trust. Both cause reputational harm. Damage to reputation can harm customer loyalty. Damage to customer loyalty costs sales and profit, and upsets the Board.

Where overreach into our living rooms has raised awareness of invasive data collection, we are yet to see and understand the invasion of privacy into our thinking and nudged behaviour, into our perception of the world on social media, and the effects on decision-making that data analytics is enabling as data shows companies ‘how we think’, granting companies access to human minds in the abstract, even before Facebook is there in the flesh.

Governments want to see how we think too. Is thought crime really that far away, given database labels of ‘domestic extremists’ for activists and anti-fracking campaigners, or the growing weight of policy makers’ attention given to predpol, predictive analytics, the [formerly] Cabinet Office Nudge Unit, Google DeepMind et al?

Had the internet remained decentralised, the debate might be different.

I am starting to think of the IoT not as the Internet of Things, but as the Internet of Tracking. If some have their way, it will be the Internet of Thinking.

Considering our centralised Internet of Things model, our personal data from human interactions has become the network infrastructure, and the data flows are controlled by others. Our brains are the new data servers.

In the Internet of Tracking, people become the end nodes, not things.

And this is where future users will be so important. Do you understand and plan for the factors that will drive push-back and a crash of consumer confidence in your products, and do you take them seriously?

Companies have a choice: to act as Empires would – multinationals, joining up even at low levels, disempowering individuals and sucking knowledge and power to the centre. Or they can act as Nation states, ensuring citizens keep their sovereignty and control over a selected sense of self.

Look at Brexit. Look at the GE2017. Tell me, what do you see is the direction of travel? Companies can fight it, but will not defeat how people feel. No matter how much they hope ‘nudge’ and predictive analytics might give them this power, the people can take back control.

What might this desire to take-back-control mean for future consumer models? The afternoon discussion whilst intense, reached fairly simplistic concluding statements on privacy. We could have done with at least another hour.

Some in the group were frustrated “we seem to be going backwards” in current approaches to privacy and with GDPR.

But if the current legislation is reactive because companies have misbehaved, how will that be rectified for future? The challenge in the IoT both in terms of security and privacy, AND in terms of public perception and reputation management, is that you are dependent on the behaviours of the network, and those around you. Good and bad. And bad practices by one, can endanger others, in all senses.

If you believe that is going back to reclaim a growing sense of citizens’ rights, rather than accepting companies have the outsourced power to control the rights of others, that may be true.

A first-principles question was asked: whether any element on privacy was needed at all, if the text were simply to state that the supplier of this product or service must be General Data Protection Regulation (GDPR) compliant. The GDPR was years in the making, after all. Does it matter more in the IoT, and in what ways? The room tended, understandably, to talk about it from the company perspective: “we can’t”, “we won’t”, “that would stop us from XYZ”. Privacy would, however, be better addressed from the personal point of view.

What do people want?

From the company point of view, the language is different and holds clues. Openness, control, user choice and pay-for-privacy are not the same thing as the basic human right to be left alone. The afternoon discussion reminded me of the 2014 Washington Post article discussing Mark Zuckerberg’s theory of privacy and a Palo Alto meeting at Facebook:

“Not one person ever uttered the word “privacy” in their responses to us. Instead, they talked about “user control” or “user options” or promoted the “openness of the platform.” It was as if a memo had been circulated that morning instructing them never to use the word “privacy.””

In the afternoon working group on privacy, there was robust discussion of whether we had consensus on what privacy even means. Words like autonomy, control, and choice came up a lot. But it was only a beginning. There is opportunity for better. An academic voice raised the concept of sovereignty, with which I agreed, but how and where to fit it into wording that is at once both minimal and applied, under a scribe who appeared frustrated and wanted a completely different approach from what he heard across the group, meant it was left out.

This group do care about privacy. But I wasn’t convinced that the room cared in the way that the public as a whole does, rather than only as consumers and customers do. Yet IoT products will affect potentially everyone, even those who do not buy your stuff. Everyone in that room agreed on one thing. The status quo is not good enough. What we did not agree on was why, and what the minimum change needed would be to make enough of a difference to matter.

I share the deep concerns of many child rights academics who see the harm in efforts to avoid the restrictions Article 8 of the GDPR will impose. Such avoidance is likely to be damaging to children’s right to access information, discriminatory according to parents’ prejudices or socio-economic status, and ‘cheating’: requiring secrecy rather than privacy, in attempts to hide from or work around the stringent system.

In ‘The Class’ the research showed, “teachers and young people have a lot invested in keeping their spheres of interest and identity separate, under their autonomous control, and away from the scrutiny of each other.” [2016, Livingstone and Sefton-Green, p235]

Employers require staff to use devices with single sign-on, including web and activity tracking and monitoring software. Employee personal data and employment data are blended. Who owns that data, what rights will employees have to refuse what they see as excessive, and is it manageable given the power imbalance between employer and employee?

What is this doing in the classroom and boardroom for stress, anxiety, performance and system and social avoidance strategies?

A desire for convenience creates shortcuts, and these are often met using systems that require a sign-on through the platform giants: Google, Facebook, Twitter, et al. But we are kept in the dark about how using these platforms gives them, and the companies behind them, access to see how our online and offline activity is all joined up.

Any illusion of privacy we maintain, we discussed, is not choice or control if it is based on ignorance, and backlash against companies’ lack of effort to ensure disclosure and understanding is growing.

“The lack of accountability isn’t just troubling from a philosophical perspective. It’s dangerous in a political climate where people are pushing back at the very idea of globalization. There’s no industry more globalized than tech, and no industry more vulnerable to a potential backlash.”

[Maciej Ceglowski, Notes from an Emergency, talk at re.publica]

Why do users need you to know about them?

If your connected *thing* requires registration, why does it? How about a commitment to not forcing one of these registration methods, or indeed any at all? Social media research by Pew Research in 2016 found that 56% of smartphone owners aged 18 to 29 use auto-delete apps, more than four times the share among those 30-49 (13%) and six times the share among those 50 or older (9%).

Does that tell us anything about the demographics of data retention preferences?

In 2012, they suggested social media has changed the public discussion about managing “privacy” online. When asked, people say that privacy is important to them; when observed, people’s actions seem to suggest otherwise.

Does that tell us anything about how well companies communicate to consumers how their data is used and what rights they have?

There is also data strongly indicating that women act to protect their privacy more, but when it comes to basic privacy settings, users of all ages are equally likely to choose a private, semi-private or public setting for their profile. There are no significant variations across age groups in the US sample.

Now think about why that matters for the IoT. I wonder who makes the bulk of purchasing decisions about household white goods, for example, and whether Bosch has factored that into its smart-fridges-only decision.

Do you *need* to know who the user is? Can the smart user choose to stay anonymous at all?

The day’s morning challenge was to attend more than one interesting discussion happening at the same time. As invariably happens, the session notes and quotes are always out of context and can’t possibly capture everything, no matter how amazing the volunteer (with thanks!). But here are some of the discussion points from the session on the body and health devices, the home, and privacy. It also included a discussion on racial discrimination, algorithmic bias, and the reasons why care.data failed patients and failed as a programme. We had lengthy discussion on ethics and privacy: smart meters, objections to models of price discrimination, and why pay-for-privacy harms the poor by design.

Smart meter data can track the use of unique appliances inside a person’s home and intimate patterns of behaviour. Information about our consumption of power, what and when every day, reveals personal details about everyday lives, our interactions with others, and personal habits.
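To make that claim concrete, here is a minimal, hypothetical sketch in Python of how consumption readings alone can be turned into a guess at which appliance was used and when. The appliance wattages, the baseline load and the readings are all invented for illustration; real load disaggregation is far more sophisticated, but the principle is the same.

```python
# Assumed, illustrative appliance power signatures in watts.
APPLIANCE_WATTS = {
    "kettle": 2500,
    "washing machine": 1800,
    "oven": 1200,
    "television": 100,
}

def infer_appliances(readings, baseline=150, tolerance=200):
    """Guess which appliance caused each jump above the household baseline load."""
    events = []
    for hour, watts in readings:
        jump = watts - baseline
        if jump <= 0:
            continue
        # Pick the appliance whose signature is closest to the observed jump.
        name, signature = min(APPLIANCE_WATTS.items(), key=lambda item: abs(item[1] - jump))
        if abs(signature - jump) <= tolerance:
            events.append((hour, name))
    return events

# Simulated readings: (hour of day, average watts drawn in that hour).
day = [(7, 2700), (8, 300), (13, 1350), (19, 1950), (22, 260)]
for hour, appliance in infer_appliances(day):
    print(f"{hour:02d}:00  probably the {appliance}")
```

Even this toy version suggests a kettle at 7am, an oven at lunchtime and a washing machine in the evening; at scale, that is a diary of a household’s day.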

Why should company convenience come above the consumer’s? Why should government powers, trump personal rights?

Smart meter data is among the knowledge that government is exploiting, without consent, to discover a whole range of issues, including ensuring that “Troubled Families are identified”. Knowing how dodgy some of the school behaviour data that helps define who is “troubled” might be, there is a real question here: is this sound data science? How are errors identified? What about privacy? It’s not your policy, but if it is your product, what are your responsibilities?

If companies do not respect children’s rights, they had better shape up to be GDPR compliant

For children and young people, who are more vulnerable to nudge, and whose developing sense of self can involve forming and questioning their identity, these influences need oversight or should be avoided.

In terms of the GDPR, providers are going to pay particular attention to Article 8 on ‘information society services’ and parental consent, to profiling (Article 22), the right to restriction of processing (Article 18), the right to erasure (Article 17, with recital 65), and the right to data portability (Article 20). However, they may need simply to reassess their exploitation of children and young people’s personal data and behavioural data. Article 57 requires special attention to be paid by regulators to activities specifically targeted at children, as ‘vulnerable natural persons’ in the sense of recital 75.

Human rights regulations and conventions overlap in similar principles that demand respect for a child, and the right to be let alone:

(a) The development of the child’s personality, talents and mental and physical abilities to their fullest potential;

(b) The development of respect for human rights and fundamental freedoms, and for the principles enshrined in the Charter of the United Nations.

A weakness of the GDPR is that it allows derogation on age, and will create inequality and inconsistency for children as a result. By comparison, Article 1 of the Convention on the Rights of the Child (CRC) defines who is to be considered a “child” for the purposes of the CRC, and states that: “For the purposes of the present Convention, a child means every human being below the age of eighteen years unless, under the law applicable to the child, majority is attained earlier.”

Article two of the CRC says that States Parties shall respect and ensure the rights set forth in the present Convention to each child within their jurisdiction without discrimination of any kind.

CRC Article 16 says that no child shall be subjected to arbitrary or unlawful interference with his or her privacy, nor to unlawful attacks on his or her honour and reputation.

Article 8 CRC requires respect for the right of the child to preserve his or her identity […] without unlawful interference.

Article 12 CRC demands States Parties shall assure to the child who is capable of forming his or her own views the right to express those views freely in all matters affecting the child, the views of the child being given due weight in accordance with the age and maturity of the child.

That stands in potential conflict with GDPR Article 8. There is much in the GDPR on derogations, by country and for children, still to be set.

What next for our data in the wild

Hosting the event at the zoo offered added animals, and during lunch we got out on a tour, kindly hosted by a fellow participant. We learned how smart technology is embedded in some of the animal enclosures, and about work on temperature sensors with the penguins, for example. I love tigers, so it was a bonus that we got to see such beautiful and powerful animals up close, if a little sad for their circumstances and, as a general basic principle, about seeing big animals caged as opposed to in the wild.

Freedom is a common desire in all animals. Physical, mental, and freedom from control by others.

I think any manufacturer that underestimates this element of human instinct is ignoring the ‘hidden dragon’ that some think is a myth. Privacy is not dead. It is not extinct, or even, unlike the beautiful tigers, endangered. Privacy in the IoT, at its most basic, is the right to control our purchasing power. The ultimate people power waiting to be sprung. Truly a crouching tiger. People object to being used, and if companies continue to do so without full disclosure, they do so at their peril. Companies seem all-powerful in the battle for privacy, but they are not. Even insurers and data brokers must be fair and lawful, and it is for regulators to ensure that practices meet the law.

When consumers realise that our data, our purchasing power, has the potential to control, not be controlled, that balance will shift.

“Paper tigers” are superficially powerful but are prone to overextension that leads to sudden collapse. If that happens to the superficially powerful companies that choose unethical and bad practice, as a result of better data privacy and data ethics, then bring it on.

I hope that the IoT mark can champion best practices and make a difference to benefit everyone.

While the companies involved in its design may be interested in consumers, I believe it could be better for everyone, done well. The great thing about the efforts into an #IoTmark is that it is a collective effort to improve the whole ecosystem.

I hope more companies will realise their privacy and ethical responsibilities in the world to all people, including those interested in just being, those who want to be let alone, and not just those buying.

“If a cat is called a tiger it can easily be dismissed as a paper tiger; the question remains however why one was so scared of the cat in the first place.”

Further reading: Networks of Control – A Report on Corporate Surveillance, Digital Tracking, Big Data & Privacy by Wolfie Christl and Sarah Spiekermann

A vanquished ghost returns as details of distress required in NHS opt out

It seems the ugly ghosts of care.data past were alive and well at NHS Digital this Christmas.

Old style thinking, the top-down patriarchal ‘no one who uses a public service should be allowed to opt out of sharing their records. Nor can people rely on their record being anonymised,‘ that you thought was vanquished, has returned with a vengeance.

The Secretary of State for Health, Jeremy Hunt, has reportedly done a U-turn on opt out of the transfer of our medical records to third parties without consent.

That backtracks on what he said in Parliament on January 25th, 2014 on opt out of anonymous data transfers, despite the right to object in the NHS constitution [1].

So what’s the solution? If the new opt out methods aren’t working, then back to the old ones and making Section 10 requests? But it seems the Information Centre isn’t keen on making that work either.

All the data the HSCIC holds is sensitive and, as such, its release risks significant harm or distress to patients [2], so it shouldn’t be difficult to tell them to cease and desist when it comes to data about you.

But how is NHS Digital responding to people who make the effort to write directly?

Someone who “got a very unhelpful reply” is being made to jump through hoops.

If anyone asks that their hospital data should not be used in any format and passed to third parties, that’s surely for them to decide.

Let’s take the case study of a woman who spoke to me during the whole care.data debacle, who had been let down by the records system after rape. Her subsequent NHS records about her mental health care were inaccurate, and had led to her being denied the benefit of private health insurance at a new job.

Would she have to detail why selling her medical records would cause her distress? What level of detail is fair and who decides? The whole point is, you want to keep info confidential.

Should you have to state what you fear? “I have future distress about what you might do to me?” Once you lose control of data, it’s gone. Based on past planning secrecy and ideas for the future, like mashing up health data with retail loyalty cards as suggested at Strata in November 2013 [from 16:00] [2], no wonder people are sceptical.

Given the long list of commercial companies, charities, think tanks and others to whom passing out our sensitive data puts us at risk, and given the Information Centre’s past record, HSCIC might be grateful they have only opt out requests to deal with, and not millions of medical ethics court summonses. So far.

HSCIC / NHS Digital has extracted our identifiable records and has given them away, including for commercial product use, and continues to give them away, without informing us. We’ve accepted Ministers’ statements and that a solution would be found. Two years on, patience wears thin.

“Without that external trust, we risk losing our public mandate and then cannot offer the vital insights that quality healthcare requires.”

— Sir Nick Partridge on publication of the audit report of 10% of 3,059 releases by the HSCIC between 2005-13

— Andy Williams said, “We want people to be certain their choices will be followed.”

Jeremy Hunt said everyone should be able to opt out of having their anonymised data used. David Cameron did too when the plan was announced in 2012.

In 2014 the public was told there should be no more surprises. This latest response is not only a surprise but enormously disrespectful.

When you’re trying to rebuild trust, assuming that we accept that ‘is’ the aim, you can’t say one thing and do another. Perhaps the Department of Health doesn’t like the public answer to what the public wants from opt out, but that doesn’t make the DH view right.

Perhaps NHS Digital doesn’t want to deal with lots of individual opt out requests, but that doesn’t make their refusal right.

Kingsley Manning recognised in July 2014, that the Information Centre “had made big mistakes over the last 10 years.” And there was “a once-in-a-generation chance to get it right.”

I didn’t think I’d have to move into the next one before they fix it.

The recent round of 2016 public feedback was the same as care.data 1.0. Respect nuanced opt outs and you will have all the identifiable public interest research data you want. Solutions must be better for other uses, opt out requests must be respected without distressing patients further in the process, and anonymous must mean anonymous.

Pseudonymised data requests that go through the DARS process so that a Data Sharing Framework Contract and Data Sharing Agreement are in place are considered to be compliant with the ICO code of practice – fine, but they are not anonymous. If DARS is still giving my family’s data to Experian, Harvey Walsh, and co, despite opt out, I’ll be furious.

The [Caldicott 2] Review Panel found “that commissioners do not need dispensation from confidentiality, human rights & data protection law.”

Neither do our politicians, their policies or ALBs.


[1] https://www.england.nhs.uk/ourwork/tsd/ig/ig-fair-process/further-info-gps/

“A patient can object to their confidential personal information from being disclosed out of the GP Practice and/or from being shared onwards by the HSCIC for non-direct care purposes (secondary purposes).”

[2] Minimum Mandatory Measures http://www.nationalarchives.gov.uk/documents/information-management/cross-govt-actions.pdf p7

Data for Policy: Ten takeaways from the conference

The knowledge and thinking on changing technology, the understanding of the computing experts and those familiar with data, must not stay within conference rooms and paywalls.

What role do data and policy play in a world of post-truth politics and press? How will young people become better informed for their future?

The Data for Policy conference this week brought together some of the leading names in academia and a range of technologists, government representatives, people from the European Commission and other global organisations, think tanks, civil society groups, companies, and individuals interested in data and statistics. Beyond the UK, speakers came from several other countries in Europe, the US, South America and Australia.

The schedule was ambitious and wide-ranging in topics. There was brilliant thinking and applications of ideas. Theoretical and methodological discussions were outnumbered by the presentations that included practical applications or work in real-life scenarios using social science data, humanitarian data, urban planning, public population-wide administrative data from health, finance, documenting sexual violence and more. This was good.

We heard about lots of opportunities and applied projects where large datasets are being used to improve the world. But while I always come away from these events having learned something, and encouraged to learn more about what I didn’t, I do wonder if the biggest challenges in data and policy aren’t still the simplest.

No matter how much information we have, we must use it wisely. I’ve captured ten takeaways of things I would like to see follow, though this may not have been the forum for them.

Ten takeaways on Data-for-Policy

1. Getting beyond the Bubble

All this knowledge must reach beyond the bubble of academia, beyond a select few white male experts in well-off parts of the world, and get into the hands and heads of the many. Ways to do this must include reducing the cost of, or changing the pathways to, academic print access. Event and conference fees are also a barrier to many.

2. Context of accessibility and control

There was little discussion of the importance of context. The nuance of most of these subjects was too much for the length of the sessions, but I didn’t hear any session mention threats to data access, and to trust in data collection, posed by surveillance, state censorship, restriction of access to data or information systems, or the editorial control of knowledge and news by Facebook and co. There was no discussion of the influence of machine manipulators, or how bots change news or numbers and create fictitious followings.

Policy makers and the public are influenced by the media, post-truth or not. Policy makers in the UK government recently wrote, in response to a challenge over a Statutory Instrument, that if Mumsnet wasn’t kicking up a fuss then they believed the majority of the public were happy. How are policy makers being influenced by press or social media statistics without oversight or regulation for their accuracy?

Increasing data and technology literacy in policy makers is going to go far beyond improving an understanding of data science.

3. Them and Us

I feel a growing disconnect between those ‘in the know’ and those in ‘the public’. Perhaps that is a side-effect of my own growing understanding of how policy is made, but it goes wider. Those who talked about ‘the public’ did so without mentioning that attendees are all part of that public. Big data are often our data. We are the public.

Vast parts of the population feel left behind already by policy and government decision-making; divided by income, Internet access, housing, life opportunities, and the ability to realise our dreams.

How policy makers address this gulf in the short and long term both matter as a foundation for what data infrastructure we have access to, how well trusted it is, whose data are included and who is left out of access to the information or decision-making using it.

Where researchers are prevented from accessing data held by government departments, perhaps by those who fear it will be used to criticise rather than help improve the policy of the day, it may be limiting our true picture of some of this divide and its solutions.

Equally, where data is used to implement top-down policy without public involvement, it seems a shame to ignore public opinion. I would like to have asked: does GDS, in its land survey work searching for free school sites, include surveys asking people whether they want a free school in their area at all?

4. There is no neutral

Global trust in politics is in tatters. Trust in the media is as bad. Neither appear to be interested across the world in doing much to restore their integrity.

All the wisdom in the world could not convince a majority in the 23rd June referendum that the UK should remain in the European Union. This unspoken context was perhaps an aside to most of the subjects of the conference, which went beyond the UK, but we cannot ignore that the UK is deep in political crisis in the world, and at home the Opposition seems to have gone into a tailspin.

What role do data and evidence have in post-truth politics?

It was clear in discussion that if I mentioned technology and policy in a political context, eyes started to glaze over. Politics should not interfere with the public interest, but it does and cannot be ignored. In fact it is short-term political terms of office and the need for long-term vision that are perhaps most at odds in making good data policy plans.

The concept of public good is not uncomplicated. It is made more complex still if you factor in changes over time, and you cannot ignore that Trump or Turkey are not fictitious backdrops when considering who decides what the public good and policy priorities should be.

Researchers’ role in shaping the public good is not only about being ethical in their own research, but about having the vision to put safeguards in place for how the knowledge they create is used.

5. Ethics is our problem, but who has the solution?

While many speakers touched on the common themes of ethics and privacy in data collection and analytics, saying this is going to be one of our greatest challenges, few addressed how, or who is taking responsibility and accountability for making it happen in ways that are not left to big business and profit-making decision-takers.

It appears that, compared with last year, ethics played a more central role. A year on, we now have two new ethics bodies in the UK, at the UK Statistics Authority and at the Turing Institute. How they will influence the wider ethics issues in data science remains to be seen.

Legislation and policy are not keeping pace with the purchasing power or potential of the big players, the Googles and Amazons and Microsofts, and a government that sees anything resulting in economic growth as good, is unlikely to be willing to regulate it.

How technology can be used and how it should be used still seems a far-off debate that no one is willing to take on and hold policy makers to account for. Legislation and policy underpinned by ethics must be implemented to serve as a framework for giving individuals insight into how decisions about them were reached by machines, and into the imbalance of power that commercial companies and state agencies gain in our lives through insights obtained by privacy invasion.

6. Inclusion and bias

Clearly this is one event in a world of many that address similar themes, but I do hope that the unequal balance in representation across the many diverse aspects of being human is being addressed elsewhere. A wider audience must be inclusive. The talk by Jim Waldo on retaining data accuracy while preserving privacy was interesting, as it showed how de-identified data can create bias in results if it is very different from the original. Gaps in data, especially in big population data that excludes certain communities, weren’t something I heard discussed as much.

7. Commercial data sources

Government and governmental organisations appear to be starting to give significant weight to the use of commercial data and social media data sources. I guess any data seen as ‘freely available’ that can be mined seems valuable. I wonder, however, how this will shape the picture of our populations, with what measures of validity, and whether the data are comparable and offer reproducibility.

These questions will matter in shaping policy and what governments know about the public. And equally, they must consider those communities, whether in the UK or in other countries, that are not represented in these datasets, and how those gaps bias decision-making.

8. Data is not a panacea for policy making

Overall my take away is the important role that data scientists have to remind policy makers that data is only information. Nothing new. We may be able to access different sources of data in different ways, and process it faster or differently from the past, but we cannot rely on data of itself to solve the universal problems of the human condition. Data must be of good integrity to be useful and valuable. Data must be only one part of the library of resources to be used in planning policy. The limitations of data must also be understood. The uncertainties and unknowns can be just as important as evidence.

9. Trust and transparency

Regulation and oversight matter but cannot be the only solutions offered to concerns about shaping what is possible to do versus what should be done. Talking about protecting trust is not enough. Organisations must become more trustworthy if trust levels are to change; through better privacy policies, through secure data portability and rights to revoke consent and delete outdated data.

10. Young people and involvement in their future

What inspired me most were the younger attendees presenting posters, especially the PhD student using data to provide evidence of sexual violence in El Salvador and their passion for improving lives.

We are still not talking about how to protect and promote privacy in the Internet of Things, where sensors on every street corner in Smart Cities gather data about where we have been, what we buy and who we are with. Even our children’s toys send data to others.

I’m still as determined to convince policy makers that young people’s data privacy and digital self-awareness must be prioritised.

Highlighting the policy and practice failings in the niche area of the National Pupil Database serves only to gather ideas from others on how policy and practice could be better. 20 million school children’s records is not a bad place to start to make data practice better.

The questions that seem hardest to move forward are the simplest: how to involve everyone in what data and policy may bring for the future, and how not to leave out certain communities through carelessness.

If the public is not encouraged to understand how our own personal data are collected and used, how can we expect to grow great data scientists of the future? What uses of data put good uses at risk?

And we must make sure we don’t miss other things, while data takes up the time and focus of today’s policy makers and great minds alike.


care.data listening events and consultation: The same notes again?

If lots of things get said in a programme of events, and nothing is left around to read about it, did they happen?

The care.data programme 2014-15 listening exercise and action plan has become impossible to find online. That’s OK, you might think, the programme has been scrapped. Not quite.

You can give your views online until September 7th on the new consultation, “New data security standards and opt-out models for health and social care”, and/or attend the new listening events: September 26th in London, October 3rd in Southampton and October 10th in Leeds.

The Ministerial statement on July 6 announced that NHS England had taken the decision to close the care.data programme after the review of data security and consent by Dame Fiona Caldicott, the National Data Guardian for Health and Care.

But the same questions are being asked again around consent and the use of your medical data, from primary and secondary care. What a very long questionnaire asks is, in effect: do you want to keep your medical history private? You can answer only Q15 if you want.

Ambiguity again surrounds what constitutes “de-identified” patient information.

What is clear is that public voice seems to have been deleted or lost from the care.data programme along with the feedback and brand.

People spoke up in 2014, and acted. The opt out that 1 in 45 people chose between January and March 2014 was put into effect by the HSCIC in April 2016. Now it seems, that might be revoked.

We’ve been here before. There is no way that primary care data can be extracted without consent without causing further disruption and damage to public trust and public interest research. The future plans for linkage between all primary care data, secondary data and genomics for secondary uses are untenable without consent.

Upcoming events cost time and money and will almost certainly go over the same ground that hours and hours were spent on in 2014. However, if they do achieve a meaningful response rate, then I hope the results will not be lost, and will be combined with those already captured in the ‘care.data listening events’ responses. Will they have any impact on what consent model there may be in future?

So what we gonna do? I don’t know, whatcha wanna do? Let’s do something.

Let’s have accredited access and security fixed. While there may now be a higher transparency and process around release, there are still problems about who gets data and what they do with it.

Let’s have clear future scope and control. There is still no plan to give the public rights to control or delete data if we change our minds about who can have it or for what purposes. And that is very uncertain. After all, they might decide to privatise or outsource the whole thing, as was planned for the CSUs.

Let’s have answers to everything already asked but unknown. The questions in the previous Caldicott review have still to be answered.

We have the possibility of seeing health data used wisely, safely, and with public trust. But we seem stuck with the same notes again. And the public seem to be the last to be invited to participate, and their views, once gathered, seem to be disregarded. I hope to be proved wrong.

Might, perhaps, the consultation deliver the nuanced consent model discussed at public listening exercises that many asked for?

Will the care.data listening events feedback summary be found, and will its 2014 conclusions and the enacted opt out be ignored? Will the new listening event views make more difference than in 2014?

Is public engagement, engagement, if nobody hears what was said?

Mum, are we there yet? Why should AI care.

Mike Loukides drew similarities between the current status of AI and children’s learning in an article I read this week.

The children I know are always curious to know where they are going, how long will it take, and how they will know when they get there. They ask others for guidance often.

Loukides wrote that if you look carefully at how humans learn, you see surprisingly little unsupervised learning.

If unsupervised learning is a prerequisite for general intelligence, but not the substance, what should we be looking for, he asked. It made me wonder is it also true that general intelligence is a prerequisite for unsupervised learning? And if so, what level of learning must AI achieve before it is capable of recursive self-improvement? What is AI being encouraged to look for as it learns, what is it learning as it looks?

What is AI looking for and how will it know when it gets there?

Loukides says he can imagine a toddler learning some rudiments of counting and addition on his or her own, but can’t imagine a child developing any sort of higher mathematics without a teacher.

I suggest a different starting point. I think children develop on their own, given a foundation. And if the foundation is accompanied by a purpose — to understand why they should learn to count, and why they should want to — and if they have the inspiration, incentive and assets, they’ll soon go off on their own, and outstrip your level of knowledge. That may or may not be with a teacher, depending on what is available, cost, and how far they get compared with what they want to achieve.

It’s hard to learn something from scratch by yourself if you have no boundaries to set knowledge within and search for more, or to know when to stop when you have found it.

You’ve only to start an online course, get stuck, and try to find the solution through a search engine to know how hard it can be to find the answer if you don’t know what you’re looking for. You can’t type in search terms if you don’t know the right words to describe the problem.

I described this recently to a fellow codebar-goer, more experienced than me, and she pointed out a much better approach. Don’t search for the solution or describe what you’re trying to do; ask the search engine to find others with the same error message.

In effect she said, your search is wrong. Google knows the answer, but can’t tell you what you want to know if you don’t ask it in the way it expects.
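
A minimal, hypothetical illustration of her tip, assuming Python and pandas — the dataframe and the mistyped column name are invented for the example:

import pandas as pd

df = pd.DataFrame({"age": [34, 29]})
try:
    df["Age"]                   # the column is lowercase, so this fails
except KeyError as err:
    print(f"KeyError: {err}")   # paste this exact message into the search engine, plus "pandas"

It is the literal error text, not a description of what you wanted to happen, that leads you to others who hit the same wall.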

So what will AI expect from people, and will it care if we don’t know how to interrelate? How does AI best serve humankind, and defined from whose point of view? Will AI serve only those who think most closely in AI-style steps and language? How will it serve those who don’t know how to talk about it, or with it? AI won’t care if we don’t.

If as Loukides says, we humans are good at learning something and then applying that knowledge in a completely different area, it’s worth us thinking about how we are transferring our knowledge today to AI and how it learns from that. Not only what does AI learn in content and context, but what does it learn about learning?

His comparison of a toddler learning from parents — who in effect are ‘tagging’ objects through repetition of words while looking at images in a picture book — made me wonder how we will teach AI the benefit of learning. What incentive will it have to progress?

“the biggest project facing AI isn’t making the learning process faster and more efficient. It’s moving from machines that solve one problem very well (such as playing Go or generating imitation Rembrandts) to machines that are flexible and can solve many unrelated problems well, even problems they’ve never seen before.”

Is the skill to enable “transfer learning” what will matter most?

For AI to become truly useful, we need, as a global society, to better understand *where* it might best interface with our daily lives, and most importantly *why*. And we should consider *who* is teaching AI, and who is being left out in the crowdsourcing of AI’s teaching.

Who is teaching AI what it needs to know?

The natural user interfaces for people to interact with today’s more common virtual assistants (Amazon’s Alexa, Apple’s Siri, Viv, and Microsoft’s Cortana) are not just providing information to the user: through their use, those systems are learning. I wonder what percentage of today’s population is using these assistants, how representative they are, and what our AI assistants are being taught through their use? Tay was a swift lesson learned for Microsoft.

In helping shape what AI learns — what range of language it will use to develop its reference words and knowledge — society co-shapes what AI’s purpose will be, and for AI providers, what the point of selling it is. So will this technology serve everyone?

Are providers counter-balancing what AI is currently learning from crowdsourcing, if the crowd is not representative of society?

So far we can only teach machines to make decisions based on what we already know, and what we can tell them to decide quickly against pre-known references using lots of data. Will your next image captcha teach AI to separate the sloth from the pain-au-chocolat?

One of the tasks earmarked for machine processing is better search. Measurable, goal-driven tasks have boundaries, but who sets them? When does a computer know if it has found enough to make a decision? If the balance of material about the Holocaust on the web, for example, were written by Holocaust deniers, would AI know who is right? How will AI know what is trusted, and by whose measure?

What will matter most is surely not going to be how to optimise knowledge transfer from human to AI — that is the baseline knowledge of supervised learning — and it won’t even be for AI to know when to use its skill set in one place and when to apply it elsewhere in a different context, so-called transfer learning, as Mike Loukides says. Rather, will AI reach the point where it cares?

  • Will AI ever care what it should know and where to stop or when it knows enough on any given subject?
  • How will it know or care if what it learns is true?
  • If in the best interests of advancing technology or through inaction  we do not limit its boundaries, what oversight is there of its implications?

Online limits will limit what we can reach in Thinking and Learning

If you look carefully at how humans learn online, I think rather than seeing surprisingly little unsupervised learning, you see a lot of unsupervised questioning. It is often in the questioning done in private that we discover, and through discovery we learn. Valuable discoveries are often made, whether in science or in maths, and important truths are found, where there is a need to challenge the status quo. Imagine if Galileo had given up.

The freedom to think freely and to challenge authority is vital to protect, and one reason why I and others are concerned about the compulsory web monitoring starting on September 5th in all schools in England, and its potential chilling effect. Some are concerned about who might have access to these monitoring results, today or in future; if stored, could they be opened up to employers or academic institutions?

If you tell children not to use these search terms, and not to be curious about *this* subject without repercussions, it is censorship. I find the idea bad enough for children, but for us as adults it’s scary.

As Frankie Boyle wrote last November, we need to consider what our internet history is:

“The legislation seems to view it as a list of actions, but it’s not. It’s a document that shows what we’re thinking about.”

Children think and act in ways that they may not as an adult. People also think and act differently in private and in public. It’s concerning that our private online activity will become visible to the State in the IP Bill — whether photographs that captured momentary actions in social media platforms without the possibility to erase them, or trails of transitive thinking via our web history — and third-parties may make covert judgements and conclusions about us, correctly or not, behind the scenes without transparency, oversight or recourse.

Children worry about lack of recourse and repercussions. So do I. Things done in passing can take on a permanence they never had before and were never intended to have. If expert providers of the tech world such as Apple Inc, Facebook Inc, Google Inc, Microsoft Corp, Twitter Inc and Yahoo Inc are calling for change, why is the government not listening? This is more than very concerning: it will have disastrous implications for trust in the State, for data use by others, for self-censorship, and for fear that it will lead to outright censorship of adults online too.

By narrowing our parameters, what will we not discover? Not debate? Or not invent? Happy are the clockmakers, and kids who create. Any restriction on the freedom to access information, to challenge and to question, will restrict children’s learning, or even their wanting to. It will limit how we can improve our shared knowledge and improve our society as a result. The same is true of adults.

So in teaching AI how to learn, I wonder how the limitations that humans put on its scope — otherwise how would it learn what the developers want? — combined with showing it ‘our thinking’ through search terms, and the narrowing of those terms if users self-censor under surveillance, will shape what AI helps us with in future. Will it be the things that could help the most people, the poorest people, or will it serve people like those who programme the AI, and who already use the search terms and languages it understands?

Who is accountable for the scope of what we allow AI to do or not? Who is accountable for what AI learns about us, from our behaviour data if it is used without our knowledge?

How far does AI have to go?

The leap for AI will be if and when AI can determine what it doesn’t know, and sees a need to fill that gap. To do that, AI will need to discover a purpose for its own learning, indeed for its own being, and be able to do so beyond the limitations of the framework that humans shaped for it. How will AI know what it needs to know and why? How will it know that what it knows is right, and which sources to trust? Against what boundaries will AI decide what it should engage with in its learning, whom to learn from, and why? Will it care? Why will it care? Will it find meaning in its reason for being? Why am I here?

We assume AI will know better. We need to care, if AI is going to.

How far are we from a machine that is capable of recursive self-improvement, asks John Naughton in yesterday’s Guardian, referencing work by Yuval Harari suggesting artificial intelligence and genetic enhancements will usher in a world of inequality and powerful elites. As I was finishing this, I read his article and found myself nodding, as I read that discussion of the implications of new technology focuses too much on the technology and too little on society’s role in shaping it.

AI at the moment has a very broad meaning to the general public. Is it living with life-supporting humanoids? Do we consider assistive search tools AI? There is only a fairly general understanding of “What is A.I., really?” Some wonder if we are “probably one of the last generations of Homo sapiens,” as we know it.

If the purpose of AI is to improve human lives, who defines improvement and who will that improvement serve? Is there a consensus on the direction AI should and should not take, and how far it should go? What will the global language be to speak AI?

As AI learning progresses, every time AI turns to ask its creators, “Are we there yet?”,  how will we know what to say?

image: Stephen Barling flickr.com/photos/cripsyduck (CC BY-NC 2.0)

The illusion that might cheat us: ethical data science vision and practice

This blog post is also available as an audio file on soundcloud.


Anais Nin wrote in her 1946 diary of the dangers she saw in the growth of technology: it could expand our potential for connectivity through machines, but diminish our genuine connectedness as people. She could hardly have been more contemporary for today:

“This is the illusion that might cheat us of being in touch deeply with the one breathing next to us. The dangerous time when mechanical voices, radios, telephone, take the place of human intimacies, and the concept of being in touch with millions brings a greater and greater poverty in intimacy and human vision.”
[Extract from volume IV 1944-1947]

Echoes from over 70 years ago, can be heard in the more recent comments of entrepreneur Elon Musk. Both are concerned with simulation, a lack of connection between the perceived, and reality, and the jeopardy this presents for humanity. But both also have a dream. A dream based on the positive potential society has.

How will we use our potential?

Data is the connection we all have between us as humans and what machines and their masters know about us. The values with which those masters underpin their machine design will determine the effect that the machines, and the knowledge they deliver, have on society.

In seeking ever greater personalisation, a wider dragnet of data is putting together ever more detailed pieces of information about an individual person. At the same time data science is becoming ever more impersonal in how we treat people as individuals. We risk losing sight of how we respect and treat the very people whom the work should benefit.

Nin grasped the risk that a wider reach, can mean more superficial depth. Facebook might be a model today for the large circle of friends you might gather, but how few you trust with confidences, with personal knowledge about your own personal life, and the privilege it is when someone chooses to entrust that knowledge to you. Machine data mining increasingly tries to get an understanding of depth, and may also add new layers of meaning through profiling, comparing our characteristics with others in risk stratification.
Data science, research using data, is often talked about as if it is something separate from using information from individual people. Yet it is all about exploiting those confidences.

Today, as the reach of what a few people in institutions can gather about most people in the public has grown — whether in scientific research or in surveillance of different kinds — we hear experts repeatedly talk of the risk of losing the valuable part: the knowledge, the insights that benefit us as a society if we can act upon them.

We might know more, but do we know any better? To use a well known quote from her contemporary, T S Eliot, ‘Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?’

What can humans achieve? We don’t yet know our own limits. What don’t we yet know?  We have future priorities we aren’t yet aware of.

To be able to explore the best of what Nin saw as ‘human vision’ and what Musk sees in technology, the benefits we have from our connectivity — our collaboration, our shared learning — need to be driven with an element of humility, accepting values that shape the boundaries of what we should do, while constantly evolving what we could do.

The essence of this applied risk is that technology could harm you, more than it helps you. How do we avoid this and develop instead the best of what human vision makes possible? Can we also exceed our own expectations of today, to advance in moral progress?


OkCupid and Google DeepMind: Happily ever after? Purposes and ethics in datasharing

This blog post is also available as an audio file on soundcloud.


What constitutes the public interest must be set in a universally fair and transparent ethics framework if the benefits of research are to be realised – whether in social science, health, education and more. That framework will provide a strategy for getting the prerequisite success factors right, ensuring research in the public interest is not only fit for the future, but thrives. There has been a climate change in consent. We need to stop talking about the barriers that prevent datasharing and start talking about the boundaries within which we can share.

What is the purpose for which I provide my personal data?

‘We use math to get you dates’, says OkCupid’s tagline.

That’s the purpose of the site. It’s the reason people log in and create a profile, enter their personal data and post it online for others who are looking for dates to see. The purpose is to get a date.

When over 68K OkCupid users registered for the site to find dates, they didn’t sign up to have their identifiable data used and published in ‘a very large dataset’ and onwardly re-used by anyone with unregistered access. The users’ data were extracted “without the express prior consent of the user […].”

Whether the registration consent purposes are compatible with the purposes to which the researcher put the data should be a simple enough question. Are the research purposes what the person signed up to, or would they be surprised to find out their data were used like this?

Questions the “OkCupid data snatcher”, now self-confessed ‘non-academic’ researcher, thought unimportant to consider.

But it appears in the last month, he has been in good company.

Google DeepMind, and the Royal Free, big players who do know how to handle data and consent well, paid too little attention to the very same question of purposes.

The boundaries of how the users of OkCupid had chosen to reveal information and to whom, have not been respected in this project.

Nor were these boundaries respected by the Royal Free London trust that gave out patient data for use by Google DeepMind with changing explanations, without clear purposes or permission.

The legal boundaries in these recent stories appear unclear or to have been ignored. The privacy boundaries deemed irrelevant. Regulatory oversight lacking.

The ethical boundaries of consent to purposes, and respect for autonomy, have indisputably broken down, whether by commercial organisation, public body, or lone ‘researcher’.

Research purposes

The crux of data access decisions is purposes. What question is the research to address – what is the purpose for which the data will be used? The intent by Kirkegaard was to test:

“the relationship of cognitive ability to religious beliefs and political interest/participation…”

In this case the question appears intended rather as a test of the data, not the data opened up to answer the question. While methodological studies matter, given the care and attention [or self-stated lack thereof] given to its extraction, and any attempt to be representative and fair, it would appear this is not the point of the study either.

The data doesn’t include profiles identified as heterosexual male, because ‘the scraper was’. It is also unknown how many users hide their profiles, “so the 99.7% figure [identifying as binary male or female] should be cautiously interpreted.”

“Furthermore, due to the way we sampled the data from the site, it is not even representative of the users on the site, because users who answered more questions are overrepresented.” [sic]

The paper goes on to say photos were not gathered because they would have taken up a lot of storage space and could be done in a future scraping, and

“other data were not collected because we forgot to include them in the scraper.”

The data are knowingly of poor quality, inaccurate and incomplete. The project cannot be repeated, as ‘the scraping tool no longer works’. The ethical or peer review process is unclear, and the research purpose is at best unclear. We could give the benefit of the doubt and say the intent appears to have been entirely benevolent; it’s just not clear what the intent was. I think it was clearly misplaced and foolish, but not malevolent.

The trouble is, it’s not enough to say, “don’t be evil.” These actions have consequences.

When the researcher asserts in his paper that, “the lack of data sharing probably slows down the progress of science immensely because other researchers would use the data if they could,”  in part he is right.

Google and the Royal Free have tried more eloquently to say the same thing. It’s not research, it’s direct care — in effect: ignore that people are no longer our patients and that we’re using historical data without re-consent. We know what we’re doing; we’re the good guys.

However the principles are the same, whether it’s a lone project or global giant. And they’re both wildly wrong as well. More people must take this on board. It’s the reason the public interest needs the Dame Fiona Caldicott review published sooner rather than later.

Just because there is a boundary to data sharing in place, does not mean it is a barrier to be ignored or overcome. Like the registration step to the OkCupid site, consent and the right to opt out of medical research in England and Wales is there for a reason.

We’re desperate to build public trust in UK research right now. So to assert that the lack of data sharing probably slows down the progress of science is misplaced, when it is getting ‘sharing’ wrong that caused the lack of trust in the first place, and that harms research.

A climate change in consent

There has been a climate change in public attitude to consent since care.data, clouded by the smoke and mirrors of state surveillance. It cannot be ignored. The EU GDPR supports it. Researchers may not like change, but there needs to be a corresponding adjustment in expectations and practice.

Without change, there will be no change. Public trust is low. As technology advances and if we continue to see commercial companies get this wrong, we will continue to see public trust falter unless broken things get fixed. Change is possible for the better. But it has to come from companies, institutions, and people within them.

Like climate change, you may deny it if you choose to. But some things are inevitable and unavoidably true.

There is strong support for public interest research but that is not to be taken for granted. Public bodies should defend research from being sunk by commercial misappropriation if they want to future-proof public interest research.

The purposes for which people gave consent are the boundaries within which you have permission to use data; they give you freedom, within those limits, to use the data. Purposes and consent are not barriers to be overcome.

If research is to win back public trust, developing a future-proofed, robust ethical framework for data science must be a priority today.

Commercial companies must overcome the low levels of public trust they have generated to date if they are to ask us to ‘trust us because we’re not evil‘. If you can’t rule out the use of data for other purposes, it’s not helping. If you delay independent oversight, it’s not helping.

This case study, and by contrast the recent Google DeepMind episode, demonstrate the urgency with which common expectations and oversight of applied ethics in research, who gets to decide what is ‘in the public interest’, and public engagement with data science must be worked out as a priority, in the UK and beyond.

Boundaries in the best interest of the subject and the user

Society needs research in the public interest. We need good decisions made on what will be funded and what will not be, on what will influence public policy, and on where attention for change is needed.

To do this ethically, we all need to agree what is fair use of personal data, when it is closed and when it is open, what are direct and what are secondary uses, and how advances in technology are used when they present both opportunities for benefit and risks of harm to individuals, to society and to research as a whole.

The benefits of research are potentially being compromised for the sake of arrogance, greed, or misjudgement, no matter the intent. Those benefits cannot come at any cost, or disregard public concern, or the price will be trust in all research itself.

In discussing this with social science and medical researchers, I realise not everyone agrees. For some, using de-identified data in trusted third-party settings poses such a low privacy risk that they feel the public should have no say in whether their data are used in research, as long as it’s ‘in the public interest’.

The DeepMind researchers and the Royal Free were confident that, even using identifiable data, this was the “right” thing to do, without consent.

In the Cabinet Office datasharing consultation — the parts that will open up national registries and share identifiable data more widely, including with commercial companies — they are convinced it is all the “right” thing to do, without consent.

How can researchers, society and government understand what is good ethics of data science, as technology permits ever more invasive or covert data mining and the current approach is desperately outdated?

Who decides where those boundaries lie?

“It’s research Jim, but not as we know it.” This is one aspect of data use that ethical reviewers will need to deal with as we advance the debate on data science in the UK, whether by independents or commercial organisations. Google said their work was not research. Is ‘OkCupid’ research?

If this research and data publication proves anything at all, and can offer lessons to learn from, it is perhaps these three things:

Who is accredited as a researcher or ‘prescribed person’ matters, if we are considering new datasharing legislation and, for example, who the UK government is granting access to millions of children’s personal data today. Your idea of a ‘prescribed person’ may not be the same as the rest of the public’s.

Researchers and ethics committees need to adjust to the climate change of public consent. Purposes must be respected in research particularly when sharing sensitive, identifiable data, and there should be no assumptions made that differ from the original purposes when users give consent.

Data ethics and laws are desperately behind data science technology. Governments, institutions, civil society, and society as a whole need to reach a common vision and leadership on how to manage these challenges. Who defines the boundaries that matter?

How do we move forward towards better use of data?

Our data and technology are taking on a life of their own, in space which is another frontier, and in time, as data gathered in the past might be used for quite different purposes today.

The public are being left behind in the game-changing decisions made by those who deem they know best about the world we want to live in. We need a say in what shape society wants that to take, particularly for our children as it is their future we are deciding now.

How about an ethical framework for datasharing that supports a transparent public interest, which tries to build a little kinder, less discriminating, more just world, where hope is stronger than fear?

Working with people, with consent, with public support and transparent oversight shouldn’t be too much to ask. Perhaps it is naive, but I believe that with an independent ethical driver behind good decision-making, we could get closer to datasharing like that.

That would bring Better use of data in government.

Purposes and consent are not barriers to be overcome. Within these, shaped by a strong ethical framework, good data sharing practices can tackle some of the real challenges that hinder ‘good use of data’: training, understanding data protection law, communications, accountability and intra-organisational trust. More data sharing alone won’t fix these structural weaknesses in current UK datasharing which are our really tough barriers to good practice.

How our public data will be used in the public interest will not be a destination or have a well-defined happy ending. It is a long-term process which needs to be consensual, with a clear path to setting out together and achieving collaborative solutions.

While we are all different, I believe that society shares for the most part, commonalities in what we accept as good, and fair, and what we believe is important. The family sitting next to me have just counted out their money and bought an ice cream to share, and the staff gave them two. The little girl is beaming. It seems that even when things are difficult, there is always hope things can be better. And there is always love.

Even if some might give it a bad name.

********

img credit: flickr/sofi01/ Beauty and The Beast  under creative commons

Can new datasharing laws win social legitimacy, public trust and support without public engagement?

I’ve been struck by stories I’ve heard on the datasharing consultation, on data science, and on data infrastructures as part of ‘government as a platform’ (#GaaPFuture) in recent weeks. The audio recorded by the Royal Statistical Society on March 17th is excellent, and there were some good questions asked.

There were even questions from insurance-backed panels about opening up more data for commercial users, calls for journalists to be seen as accredited researchers, and calls to include health data sharing. These are three things that some stakeholders, all users of data, feel are missing from the consultation, and possibly some of those with the most widespread public concern and lowest levels of public trust. [1]

What I feel is missing in consultation discussions are:

  1. a representative range of independent public voice
  2. a compelling story of needs – why tailored public services benefit the citizens from whom data are taken, not only the data users
  3. the impacts we expect to see in local government
  4. any cost/risk/benefit assessment of those impacts, or for citizens
  5. how the changes will be independently evaluated – as some are to be reviewed

The Royal Statistical Society and ODI have good summaries here of their thoughts, more geared towards the statistical and research aspects of data,  infrastructure and the consultation.

I focus on the other strands that use identifiable data for targeted interventions. Tailored public services, Debt, Fraud, Energy Companies’ use. I think we talk too little of people, and real needs.

Why the State wants more datasharing is not yet a compelling story and public need and benefit seem weak.

So far the creation of new data intermediaries, giving copies of our personal data to other public bodies – and let’s be clear that this often means through commercial representatives like G4S, Atos, management consultancies and more – has yet to convince me of true public needs for the people, versus wants from parts of the State.

What the consultation hopes to achieve is new powers of law, to give increased data sharing greater legal authority. However this alone will not bring about the social legitimacy of datasharing that the consultation appears to seek through ‘open policy making’.

Legitimacy is badly needed if there is to be public and professional support for change and for increased use of our personal data held by the State — support which is missing today, as care.data starkly exposed. [2]

The gap between Social Legitimacy and the Law

Almost 8 months ago now, before I knew about the datasharing consultation work-in-progress, I suggested to BIS that there was an opportunity for the UK to drive excellence in public involvement in the use of public data by getting real engagement, through pro-active consent.

The carrot for this is achieving the goal that government wants: greater legal clarity, and the use of a significant number of consented people’s personal data for a complex range of secondary uses, as a secondary benefit.

It was ignored.

If some feel entitled to the right to infringe on citizens’ privacy through a new legal gateway because they believe the public benefit outweighs private rights, then they must also take on the increased balance of risk of doing so, and a responsibility to  do so safely. It is in principle a slippery slope. Any new safeguards and ethics for how this will be done are however unclear in those data strands which are for targeted individual interventions. Especially if predictive.

Upcoming discussions on codes of practice [which have still to be shared] should demonstrate how this is to happen in practice, but codes are not sufficient. Laws which enable will be pushed to the borderline of what is legal, and beyond the borderline of what is ethical.

In England, who would have thought that the 2013 changes that permitted individual children’s data to be given to third parties [3] for educational purposes would mean giving highly sensitive, identifiable data to journalists without pupils’ or parents’ consent? The wording allows it. It is legal. However it fails the Data Protection Act’s legal requirement of fair processing. Above all, it lacks social legitimacy and common sense.

In Scotland, there is current anger over the intrusive ‘named person’ laws which lack both professional and public support and intrude on privacy. Concerns raised should be lessons to learn from in England.

Common sense says laws must take into account social legitimacy.

We have been told at the open policy meetings that this change will not remove the need for informed consent. To be informed means creating the opportunity for proper communications, and also knowing how you can use the service without coercion, i.e. not having to consent to secondary data uses in order to get the service, and knowing you can withdraw consent at any later date. How will that be offered, together with ways of achieving the removal of data after sharing?

The stick for change is the legal duty to fair processing that the recent 2015 CJEU ruling [4] reiterated. Not just a nice-to-have, but State bodies’ responsibility to inform citizens when their personal data are used for purposes other than those for which those data had initially been consented and given. New legislation will not remove this legal duty.

How will it be achieved without public engagement?

Engagement is not PR

Failure to act on what you hear from listening to the public is costly.

Engagement is not done *to* people; don’t think ‘explain why we need the data and its public benefit’ will work. Policy makers must engage with fears and not seek to dismiss or diminish them, but acknowledge and mitigate them by designing technically acceptable solutions. Solutions that enable data sharing within a strong framework of privacy and ethics, not ones that see these concepts as barriers. Solutions that have social legitimacy because people support them.

Mr Hunt’s opt-out from anonymised data being used in health research, promised in February 2014, has yet to be put in place, and the delay has had immeasurable costs for public research and public trust.

How long before people consider suing the DH as data controller for misuse? From where does the arrogance stem that decides to ignore legal rights, moral rights and public opinion of more people than those who voted for the Minister responsible for its delay?


This attitude is what failed care.data, and the harm to public trust and to researchers’ confidence in continued access to data is ongoing.

The same failure was pointed out by the public members of the tiny Genomics England public engagement meeting two years ago in March 2014, called to respond to concerns over the lack of engagement and potential harm for existing research. The comms lead made a suggestion that the new model of the commercialisation of the human genome in England, to be embedded in the NHS by 2017 as standard clinical practice, was like steam trains in Victorian England opening up the country to new commercial markets. The analogy was felt by the lay attendees to be, and I quote, ‘ridiculous.’

Exploiting confidential personal data for public good must have support, and good two-way engagement if it is to get that support, and what is said and agreed must be acted on if it is to be trustworthy.

Policy makers must take into account broad public opinion, and that is unlikely to be submitted to a Parliamentary consultation. (Personally, I first knew such  processes existed only when care.data was brought before the Select Committee in 2014.) We already know what many in the public think about sharing their confidential data from the work with care.data and objections to third party access, to lack of consent. Just because some policy makers don’t like what was said, doesn’t make that public opinion any less valid.

We must bring to the table the public voice from recent public engagement work on administrative datasharing [5], the voice of the non-research community, and the voice of those who are not the stakeholders who will use the data but the ‘data subjects’: the public whose data are to be used.

Policy Making must be built on Public Trust

Open policy making is not open just because it says it is. Who has been invited, participated, and how their views actually make a difference on content and implementation is what matters.

Adding controversial ideas at the last minute is terrible engagement; it makes the process less trustworthy and diminishes its legitimacy.

This last minute change suggests some datasharing will be dictated despite critical views in the policy making and without any public engagement. If so, we should ask policy makers on what mandate?

Democracy depends on social legitimacy. Once you lose public trust, it is not easy to restore.

Can new datasharing laws win social legitimacy, public trust and support without public engagement?

In my next post I’ll look at some of the public engagement work done on datasharing to date, and think about ethics in how data are applied.

*************

References:

[1] The Royal Statistical Society data trust deficit

[2] “The social licence for research: why care.data ran into trouble,” by Carter et al.

[3] FAQs: Campaign for safe and ethical National Pupil Data

[4] CJEU Bara 2015 Ruling – fair processing between public bodies

[5] Public Dialogues using Administrative data (ESRC / ADRN)

img credit: flickr.com/photos/internetarchivebookimages/

A data sharing fairytale (3): transformation and impact

Part three: It is vital that the data sharing consultation is not seen in a silo, or even a set of silos each particular to its own stakeholder. To do it justice and ensure the questions that should be asked are answered, we must look instead at the whole story and the background setting. And we must ask each stakeholder, what does your happy ending look like?

Parts one and two, to follow, address public engagement and ethics; this post focuses on current national data practice, tailored public services, and the local impact of the change and transformation that will result.

What is your happy ending?

This data sharing consultation is gradually revealing to me how disjointed government appears in practice and strategy. Our digital future — a society that is more inclusive and more just, supported by better uses of technology and data in ‘dot everyone’ — will not happen if they cannot first join the dots across all of Cabinet thinking and good practice, and align policies that are out of step with each other.

Last Thursday night’s “Government as a Platform Future” panel discussion (#GaaPFuture) took me back to memories of my old job, working in business implementations of process and cutting edge systems. Our finest hour was showing leadership why success would depend on neither. Success was down to local change management and communications, because change is about people, not the tech.

People, in this data sharing consultation, means the public and the staff of local government public bodies, as well as the people working at the national stakeholders of the UKSA (statistics strand), ADRN (de-identified research strand), Home Office (GRO strand), DWP (Fraud and Debt strands), and DECC (energy), and staff at the national driver, the Cabinet Office.

I’ve attended two of the 2016 datasharing meetings, and am most interested from three points of view: because I am directly involved in the de-identified data strand, campaign for privacy, and believe in public engagement.

As for engagement with civil society: after almost two years of involvement on three projects, and an almost ten-month pause in between, the projects suddenly became six in 2016, so the most sensitive strands of the datasharing legislation have been the least openly discussed.

At the end of the first 2016 meeting, I asked one question.

How will local change management be handled and the consultation tailored to local organisations’ understanding and expectations of its outcome?

Why? Because a top-down data extraction programme from all public services opens up, to national level, the extraction of personal data as business intelligence from all local services’ interactions with citizens. Or at least, those parts they have collected or may collect in future.

That means a change in how the process works today. Global business intelligence/data extractions are designed to make processes more efficient through reductions in current delivery, yet concrete public benefits for citizens that would differ from today are hard to see, so why make this change in practice?

What it might mean, for example, would be to enable the collection of all citizens’ debt information in one place, which would allow the service to centralise the chasing of debt and enforce its collection, outsourced to a single national commercial provider.

So what does the future look like from the top? What is the happy ending for each strand, that will be achieved should this legislation be passed?  What will success for each set of plans look like?

What will we stop doing, what will we start doing differently and how will services concretely change from today, the current state, to the future?

Most importantly to understand its implications for citizens and staff, we should ask how will this transformation be managed well to see the benefits we are told it will deliver?

Can we avoid being left holding a pumpkin, after the glitter of ‘use more shiny tech’ and government love affair with the promises of Big Data wear off?

Look into the local future

Those on the panel at the GDS meeting this week with the vision of the future — the new local government model enabled by GaaP — also identified that there are implications for potential loss of local jobs, and that “turkeys won’t vote for Christmas”. So who is packaging this change to make it successfully deliverable?

If we can’t be told easily in consultation, then it is not a clear enough policy to deliver. If there is a clear end-state, then we should ask what the applied implications in practice are going to be?

It is vital that the data sharing consultation is not seen in a silo, or even a set of silos each particular to its own stakeholder, about copying datasets to share them more widely, but that we look instead at the whole story and the background setting.

The Tailored Reviews: public bodies guidance suggests massive reform of local government, looking for additional savings, looking to cut back office functions and commercial plans. It asks “What workforce reductions have already been agreed for the body? Is there potential to go further? Are these linked to digital savings referenced earlier?”

Options include ‘abolish, move out of central government, commercial model, bring in-house, merge with another body.’

So where is the local government public bodies engagement with change management plans in the datasharing consultation as a change process? Does it not exist?

I asked at the end of the first datasharing meeting in January and everyone looked a bit blank. A question ‘to take away’ turned into nothing.

Yet to make this work, the buy-in of local public bodies is vital. So why skirt round this issue in local government, if there are plans to address it properly?

If there are none, then with all the data in the world, public services delivery will not be improved, because the points of friction are not interference by consent or privacy issues, but working practices.

If the idea is to avoid this ‘friction’ by removing it, then where is the change management plan for public services and our public staff?

Trust depends on transparency

John Pullinger, our National Statistician, this week also said on datasharing we need a social charter on data to develop trust.

Trust can only be built between public and state if the organisations, and all the people in them, are trustworthy.

To implement process change successfully, the people involved in these affected organisations, the staff, must trust that change will mean positive improvement and that risks will be explained.

For the public, the levels of data access, privacy protection, and scope limitation that this new consultation will permit in practice are clearly going to be vital to define, if the public is to trust its purposes.

The consultation does not do this, and there is no draft code of conduct yet, and no one is willing to define ‘research’ or ‘public interest’.

Public interest models, or a ‘charter’ for the collection and use of research data in health, concluded that for ethical purposes, time also mattered. Benefits must be specific, measurable, attainable, relevant and time-bound. So let’s talk about the intended end state that is to be achieved from these changes, and identify how its benefits are to meet those objectives — change without an intended end state will almost never be successful; you have to start by knowing what it looks like.

For public trust, that means scope boundaries. Sharing now, with today’s laws and ethics, is only fully meaningful if we trust that today’s governance, ethics and safeguards will change in future only to the benefit of the citizen, not towards ever greater powers for the state at the expense of the individual. Where is scope defined?

There is very little information about where limits would be on what data could not be shared, or when it would not be possible to do so without explicit consent. Permissive powers put the onus onto the data controller to share, and given ‘a new law says you should share’ would become the mantra, it is likely to mean less individual accountability. Where are those lines to be drawn to support the staff and public, the data user and the data subject?

So to summarise, so far I have six key questions:

  • What does your happy ending look like for each data strand?
  • How will bad practices which conflict with the current consultation proposals be stopped?
  • How will the ongoing balance of use of data for government purposes, privacy and information rights be decided and by whom?
  • In what context will the ethical principles be shaped today?
  • How will the transformation from the current to that future end state be supported, paid for and delivered?
  • Who will oversee new policies and ensure good data science practices, protection and ethics are applied in practice?

This datasharing consultation is not entirely about something new, but an expansion of what is done already. And in some places it is done very badly.

How will the old stories and new be reconciled?

Wearing my privacy and public engagement hats, here’s an idea.

Perhaps before the central State starts collecting more, sharing more, and using more of our personal data for ‘tailored public services’ and more, the government should ask for a data amnesty?

It’s time to draw a line under bad practice.  Clear out the ethics drawers of bad historical practice, and start again, with a fresh chapter. Because current practices are not future-proofed and covering them up in the language of ‘better data ethics’ will fail.

The consultation assures us that: “These proposals are not about selling public or personal data, collecting new data from citizens or weakening the Data Protection Act 1998.”

However personal data are already sold, from at least BIS. How will these contradictory positions across all Departments be resolved?

The left hand gives out de-identified data in safe settings for public benefit research, while the right hand gives out over 10 million records to the Telegraph and The Times without parents’ or schools’ consent. Only in la-la land are both considered ethical.

Will somebody at the data sharing meeting please ask, “when will this stop?” It is wrong. These are our individual children’s identifiable personal data. Stop giving them away to press and charities and commercial users without informed consent. It’s ludicrous. Yet it is real.

Policy makers should provide an assurance there are plans for this to change as part of this consultation.

Without it, the consultation line about commercial use is at best disingenuous, at worst a bare-cheeked lie.

“These powers will also ensure we can improve the safe handling of citizen data by bringing consistency and improved safeguards to the way it is handled.”

Will it? Show me how and I might believe it.

Privacy, it was said at the RSS event, is the biggest concern in this consultation:

“includes proposals to expand the use of appropriate and ethical data science techniques to help tailor interventions to the public”

“also to start fixing government’s data infrastructure to better support public services.”

What these techniques mean needs to be spelled out, and practices fixed now, because many stand on shaky legal ground. These privacy issues have come about over successive governments of different parties in the last ten years, so the problems are non-partisan, but they need practical fixes.

Today, less than transparent international agreements push ‘very far-reaching chapters on the liberalisation of data trading’ while according to the European Court of Justice these practices lack a solid legal basis.

Today our government already gives our children’s personal data to commercial third parties and sells our higher education data without informed consent, while the DfE and BIS both know they fail fair processing, and know its potential consequences: the European Court reaffirmed in 2015 that “persons whose personal data are subject to transfer and processing between two public administrative bodies must be informed in advance”, in Judgment in Case C-201/14.

In a time that actively cultivates universal public fear, it is time for individuals to be brave and ask the awkward questions, because you either solve them up front, or hit the problems later. The child who stood up and said the Emperor has no clothes on was right.

What’s missing?

The consultation conversation will only be genuine once the policy makers acknowledge and address solutions regarding:

  1. those data practices that are currently unethical and must change
  2. how the tailored public services datasharing legislation will shape the delivery of government services’ infrastructure and staff, as well as the service to the individual in the public.

If we start by understanding what the happy ending looks like, we are much more likely to arrive there, and to know how to measure success.

The datasharing consultation engagement, the ethics of data science, and the impact on data infrastructures as part of ‘government as a platform’ need to be seen as one whole, joined-up story if we are each to consider what success, for us as stakeholders, looks like.

We need to call out current data failings and things that are missing, to get them fixed.

Without a strong, consistent ethical framework you risk three things:

  1. data misuse and loss of public trust
  2. data non-use because your staff don’t trust they’re doing it right
  3. data becoming a toxic asset

The upcoming meetings should address this and ask practically:

  1. How the codes of conduct, and ethics, are to be shaped, and by whom, if outwith the consultation?
  2. What is planned to manage and pay for the future changes in our data infrastructures;  ie the models of local government delivery?
  3. What is the happy ending that each data strand wants to achieve through this and how will the success criteria be measured?

Public benefit is supposed to be at the heart of this change. For UK statistics, for academic public benefit research, they are clear.

For some of the other strands, local public benefits that outweigh the privacy risks and do not jeopardise public trust seem like magical unicorns dancing in the land far, far away of centralised government; hard to imagine, and even harder to capture.

*****

Part one: A data sharing fairytale: Engagement
Part two: A data sharing fairytale: Ethics
Part three: A data sharing fairytale: Impact (this post)

Tailored public bodies review: Feb 2016

img credit: Hermann Vogel illustration ‘Cinderella’