Crouching Tiger Hidden Dragon: the making of an IoT trust mark

The Internet of Things (IoT) brings with it unique privacy and security concerns associated with smart technology and its use of data.

  • What would it mean for you to trust an Internet connected product or service and why would you not?
  • What has damaged consumer trust in products and services and why do sellers care?
  • What do we want to see different from today, and what is necessary to bring about that change?

These three pairs of questions implicitly underpinned the intense day of discussion at the London Zoo last Friday.

These questions went unasked. They could have been voiced before we started, but were probably assumed to be self-evident:

  1. Why do you want one at all [define the problem]?
  2. What needs to change and why [define the future model]?
  3. How do you deliver that and for whom [set out the solution]?

If a group does not agree on the need and drivers for change, there will be no consensus on what that should look like, what the gap is to achieve it, and even less on making it happen.

So who do you want the trustmark to be for, why will anyone want it, and what will need to change to deliver the aims? No one wants a trustmark per se. Perhaps you want the values or promises it embodies, to demonstrate what you stand for, promote good practice, and generate consumer trust. To generate trust, you must be seen to be trustworthy. Will the principles deliver on those goals?

The Open IoT Certification Mark Principles, in rough draft, were the outcome of the day and are available online.

Here are my reflections, including what was missing on privacy, and the potential for it to be considered in future.

I’ve structured this so that the first part, about 1,000 words of lists and bullet points, assumes readers attended the event. The background comes after that, for anyone interested in reading a longer piece.

Many thanks upfront to fellow participants, to the organisers Alexandra D-S and Usman Haque, and to the colleague who hosted at the London Zoo. And Usman’s Mum. I hope there will be more constructive work to follow, and that there is space for civil society to play a supporting role as critical friend.


The mark didn’t aim to fix the IoT in a day, but to deliver something better for product and service users, through those IoT companies and providers who want to sign up. Here is what I took away.

I learned three things

  1. A sense of privacy is not homogeneous, even among people who like and care about privacy in theoretical and applied ways. (I very much look forward to reading suggestions promised by fellow participants, even if enforced personal openness and ‘watching the watchers’ may mean ‘privacy is theft’.)
  2. Awareness of current data protection regulations needs to improve in the field. For example, Subject Access Requests already apply to all data controllers, public and private. Few have read the GDPR, or the e-Privacy directive, despite their importance for security measures in personal devices, relevant for IoT.
  3. I truly love working on this stuff, with people who care.

And it reaffirmed things I already knew

  1. Change is hard, no matter in what field.
  2. People working together towards a common goal is brilliant.
  3. Group collaboration can create some brilliantly sharp ideas. Group compromise can blunt them.
  4. Some men are particularly prone to talking over each other, never mind over the women in the conversation. Women notice more. (Note to self: When discussion is passionate, it’s hard to hold back in my own enthusiasm and not do the same myself. To fix.)
  5. The IoT context and the risks within it are not homogeneous, and the IoT brings new risks and adversaries. The risks for manufacturers, consumers and the rest of the public are different, and cannot be easily solved with a one-size-fits-all solution. But we can try.

Concerns I came away with

  1. If the citizen / customer / individual is to benefit from the IoT trustmark, they must be put first, ahead of companies’ wants.
  2. If the IoT group controls the design, the assessment of adherence and the definition of success, how objective will it be?
  3. The group was not sufficiently diverse and as a result reflected too little on the risks and impact of the lack of diversity in design and effect, and the implications of dataveillance.
  4. Critical minority thoughts, although welcomed, were stripped out of the crowdsourced first draft principles in compromise.
  5. More future thinking should be built in, so that the principles are robust over time.

IoT adversaries: via Twitter, unknown source

What was missing

There was too little discussion of privacy in perhaps the most important context of IoT – interconnectivity and new adversaries. It’s not only about *your* thing, but the things that it speaks to and interacts with: those of friends, passersby, the cityscape, and other individual and state actors interested in offense and defense. While we started to discuss it, we did not have the opportunity to discuss it in sufficient depth to get any thinking on solutions into the principles.

One of the greatest risks that users face is the ubiquitous collection and storage of data about them that reveals detailed, interconnected patterns of behaviour and identity, without their being able to see how it is used by companies behind the scenes.

What we also missed discussing is not what we see as necessary today, but what we can foresee as necessary for the short term future, brainstorming and crowdsourcing horizon scanning for market needs and changing stakeholder wants.

Future thinking

Here are the areas of future thinking that smart thinking on the IoT mark could consider.

  1. We are moving towards ever greater requirements to declare identity to use a product or service, to register and log in to use anything at all. How will that change trust in IoT devices?
  2. Single identity sign-on is becoming ever more imposed, and any attempt to present who I am in multiple ways, by choice and dependent on context, is therefore restricted. [Not all users want to use the same social media credentials for online shopping, with their child’s school app, and their weekend entertainment.]
  3. Is this imposition what the public wants or what companies sell us as what customers want in the name of convenience? What I believe the public would really want is the choice to do neither.
  4. There is increasingly no private space or time at places of work.
  5. Limitations on private space are encroaching in secret in all public city spaces. How will ‘handoffs’ affect privacy in the IoT?
  6. Public sector (connected) services are likely to need even more exacting standards than single home services.
  7. There is too little understanding of the social effects of this connectedness and knowledge created, embedded in design.
  8. What effects may there be on the perception of the IoT as a whole, if predictive data analysis, complex machine learning and AI hidden in black boxes become more commonplace, and not every company wants to be or can be open-by-design?
  9. Ubiquitous collection and storage of data about users that reveal detailed, inter-connected patterns of behaviour and our identity needs greater commitments to disclosure. Where the hand-offs are to other devices, and whatever else is in the surrounding ecosystem, who has responsibility for communicating interaction through privacy notices, or defining legitimate interests, where the data joined up may be much more revealing than stand-alone data in each silo?
  10. Define with greater clarity the privacy threat models for different groups of stakeholders and address the principles for each.

What would better look like?

The draft privacy principles are a start, but they’re not yet as aspirational as I would have hoped. Of course the principles will only be adopted if they are possible and practical, and only by those who choose to. But where is the differentiator from what everyone is required to do, something better than the bare minimum? How will you sell this to consumers as new? How would you like your child to be treated?

The wording in these 5 bullet points is the first crowdsourced starting point.

  • The supplier of this product or service MUST be General Data Protection Regulation (GDPR) compliant.
  • This product SHALL NOT disclose data to third parties without my knowledge.
  • I SHOULD get full access to all the data collected about me.
  • I MAY operate this device without connecting to the internet.
  • My data SHALL NOT be used for profiling, marketing or advertising without transparent disclosure.

Yes, other points that came under security address some of the crossover between privacy and surveillance risks, but there is as yet little of substance that is aspirational enough to make the IoT mark a real differentiator in terms of privacy. An opportunity remains.

It was that, and how young people perceive privacy, that I hoped to bring to the table. Because if manufacturers are serious about future success, they cannot ignore today’s children and how they feel. How you treat them today will shape future purchasers and their purchasing, and there is evidence you are getting it wrong.

The timing is good in that it now also offers the opportunity to promote consistent understanding, and embed the language of GDPR and ePrivacy regulations into consistent and compatible language in policy and practice in the #IoTmark principles.

User rights I would like to see considered

These are some of the things I think privacy by design should mean. This would better articulate GDPR Article 25 to consumers.

Data sovereignty is a good concept, and I believe it should be considered for inclusion in the explanatory blurb before any agreed privacy principles.

  1. Goods should be ‘dumb* by default’ until the smart functionality is switched on. [*As our group chair/scribe called it] I would describe this as, “off is the default setting out-of-the-box”.
  2. Privacy by design. Deniability by default. That is, not only after opt-out: a company should not access the personal or identifying purchase data of anyone who opts out of data collection about their product or service use during the set-up process.
  3. The right to opt out of data collection at a later date while continuing to use services.
  4. A right to object to the sale or transfer of behavioural data, including to third-party ad networks, and absolute opt-in on company transfer of ownership.
  5. A requirement that advertising should be targeted to content [user bought fridge A], not through jigsaw data held on users by the company [how user uses fridge A, B, C and related behaviour].
  6. An absolute rejection of using children’s personal data gathered to target advertising and marketing at children.
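
For anyone who thinks better in code, here is a rough Python sketch of how points 1 to 4 of the list above might look as out-of-the-box settings. Every name in it (`DevicePrivacySettings`, the fields, the methods) is my own invention for illustration. None of it is agreed in the #IoTmark principles or taken from a real product, and the advertising points (5 and 6) are not modelled at all.

```python
from dataclasses import dataclass


@dataclass
class DevicePrivacySettings:
    """Hypothetical out-of-the-box settings for a connected product."""

    smart_features_on: bool = False      # point 1: off until the owner switches it on
    usage_data_collection: bool = False  # point 2: nothing collected without opt-in
    third_party_transfer: bool = False   # point 4: no sale or transfer by default

    def opt_out_of_collection(self) -> None:
        """Point 3: opt out later; smart features stay as the owner left them."""
        self.usage_data_collection = False
        self.third_party_transfer = False

    def may_collect_usage_data(self) -> bool:
        # Collection needs both the smart features and an explicit opt-in.
        return self.smart_features_on and self.usage_data_collection

    def may_transfer_to_third_parties(self) -> bool:
        # Transfer is only possible on top of consented collection.
        return self.may_collect_usage_data() and self.third_party_transfer


fridge = DevicePrivacySettings()
print(fridge.may_collect_usage_data())  # False: a new device collects nothing
```

The design choice that matters is that every default is `False`, and that opting out later changes only the data switches, not whether the product keeps working.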

Background: Starting points before privacy

After a brief recap on 5 years ago, we heard two talks.

The first was a presentation from Bosch. They used the insights from the IoT open definition from 5 years ago in their IoT thinking and embedded it in their brand book. The presenter suggested that in five years’ time, every fridge Bosch sells will be ‘smart’. The second was a fascinating presentation of both EU thinking and the intellectual nudge to think beyond the practical, and to think about what kind of society we want to see using the IoT in future. It held hints of hardcore ethics and philosophy that made my brain fizz, from a speaker soon to retire from the European Commission.

The principles of open sourcing, manufacturing, and sustainable life cycle were debated in the afternoon with intense arguments and clearly knowledgeable participants, including those who were quiet. But while the group had assigned security, and started work on it weeks before, there was no one pre-assigned to privacy. For me, that said something. If they are serious about those who earn the trustmark being better for customers than their competition, then there needs to be greater emphasis on thinking like their customers, and by their customers, and what use the mark will be to customers, not companies. Plan early public engagement and testing into the design of this IoT mark, and make that testing open and diverse.

To that end, I believe it needed to be articulated more strongly, that sustainable public trust is the primary goal of the principles.

  • Trust that my device will not become unusable or worthless through updates or lack of them.
  • Trust that my device is manufactured safely and ethically and with thought given to end of life and the environment.
  • Trust that my source components are of high standards.
  • Trust in what data and how that data is gathered and used by the manufacturers.

Fundamental to ‘smart’ devices is their connection to the Internet, and so the last of these is, for me, key to successful public perception and to the mark actually making a difference, beyond the PR value to companies. The value-add must be measured from the consumers’ point of view.

All the openness about design functions and practice improvements, without attempting to change privacy infringing practices, may be wasted effort. Why? Because the perceived value of the mark will be proportionate to the risks it is seen to mitigate.

Why?

Because I assume that you know where your source components come from today. I was shocked to find out that not all do, and that knowing ‘one degree removed’ is going to be an improvement. Holy cow, I thought. What about regulatory requirements for product safety recalls? These differ of course for different product areas, but I was still surprised. Having worked in the global Fast Moving Consumer Goods (FMCG) and food industry, semiconductors and optoelectronics, and medical devices, it was self-evident to me that sourcing is rigorous. So that new requirement, to know one degree removed, was a suggested minimum. But it might shock consumers to know there is not usually more by default.

Customers also believe they have reasonable expectations of not being screwed by a product update, left with something that does not work because of its computing-based components. The public can take vocal, reputation-damaging action when they are let down.

In the last year alone, some of the more notable press stories include a manufacturer denying service, telling customers, “Your unit will be denied server connection,” after a critical product review. Customer support at Jawbone came in for criticism after reported failings. And even Apple has had problems in rolling out major updates.

While these are visible, the full extent of the overreach of company market and product surveillance into our whole lives, not just our living rooms, is yet to become understood by the general population. What will happen when it is?

The Internet of Things is exacerbating the power imbalance between consumers and companies, between government and citizens. As Wendy Grossman wrote recently, in one sense this may make privacy advocates’ jobs easier. It was always hard to explain why “privacy” mattered. Power, people understand.

That public discussion is long overdue. If open principles on IoT devices mean that the signed-up companies differentiate themselves by becoming market leaders in transparency, it will be a great thing. Companies need to offer full disclosure of data use in any privacy notices in clear, plain language under GDPR anyway, but to go beyond that, and offer customers fair presentation of both risks and customer benefits, will not only be a point-of-sales benefit, but potentially improve digital literacy in customers too.

The morning discussion touched quite often on pay-for-privacy models. While product makers may see this as offering a good thing, I strove to bring discussion back to first principles.

Privacy is a human right. There can be no ethical model of discrimination based on any non-consensual invasion of privacy. Privacy is not something I should pay to have. You should not design products that reduce my rights. GDPR requires privacy-by-design and data protection by default. Now is the chance for IoT manufacturers to lead that shift towards higher standards.

We also need new ethical thinking on acceptable fair use. It won’t change overnight, and perfect may be the enemy of better. But it’s not a battle that companies should think consumers have lost. Human rights and information security should not be on the battlefield at all in the war to win customer loyalty. Now is the time to do better, to be better, and to demand better, for us and in particular for our children.

Privacy will be a genuine market differentiator

If manufacturers do not want to change their approach to exploiting customer data, they are unlikely to be seen to have changed.

Today, the feelings that people in the US and Europe report in surveys are loss of empowerment, helplessness, and being used. That will shift to shock, resentment and, as any change curve will predict, anger.

A 2014 survey for the Royal Statistical Society by Ipsos MORI found that trust in institutions to use data is much lower than trust in them in general.

“The poll of just over two thousand British adults carried out by Ipsos MORI found that the media, internet services such as social media and search engines and telecommunication companies were the least trusted to use personal data appropriately.” [2014, Data trust deficit with lessons for policymakers, Royal Statistical Society]

In the British student population, one 2015 survey of university applicants in England found that, of the 37,000 who responded, the vast majority of UCAS applicants agreed that sharing personal data can benefit them and support public benefit research into university admissions, but they wanted to stay firmly in control. 90% of respondents said they wanted to be asked for their consent before their personal data is provided outside of the admissions service.

In 2010, multi-method research with young people aged 14-18, by the Royal Academy of Engineering, found that, “despite their openness to social networking, the Facebook generation have real concerns about the privacy of their medical records.” [2010, Privacy and Prejudice, RAE, Wellcome]

When people use privacy settings on Facebook set to maximum, they believe they get privacy, and understand little of what that means behind the scenes.

Are there tools designed by others, like Projects by If licenses, and ways this can be done, that you’re not even considering yet?

What if you don’t do it?

“But do you feel like you have privacy today?” I was asked the question in the afternoon. How do people feel today, and does it matter? Companies exploiting consumer data, and getting caught doing things the public don’t expect with their data, have repeatedly damaged consumer trust. Data breaches and lack of information security have damaged consumer trust. Both cause reputational harm. Damage to reputation can harm customer loyalty. Damage to customer loyalty costs sales and profit, and upsets the Board.

Where overreach into our living rooms has raised awareness of invasive data collection, we are yet to be able to see and understand the invasion of privacy into our thinking and nudge behaviour, into our perception of the world on social media, the effects on decision making that data analytics is enabling as data shows companies ‘how we think’, granting companies access to human minds in the abstract, even before Facebook is there in the flesh.

Governments want to see how we think too. Is thought crime really that far away, given database labels of ‘domestic extremists’ for activists and anti-fracking campaigners, or the growing weight of policy makers’ attention given to predpol, predictive analytics, the [formerly] Cabinet Office Nudge Unit, Google DeepMind et al?

Had the internet remained decentralized, the debate might be different.

I am starting to think of the IoT not as the Internet of Things, but as the Internet of Tracking. If some have their way, it will be the Internet of Thinking.

In our centralised Internet of Things model, our personal data from human interactions has become the network infrastructure, and data flows are controlled by others. Our brains are the new data servers.

In the Internet of Tracking, people become the end nodes, not things.

And this is where future users will be so important. Do you understand and plan for the factors that will drive pushback and a crash of consumer confidence in your products, and do you take them seriously?

Companies have a choice to act as Empires would – multinationals, joining up even on low levels, disempowering individuals and sucking knowledge and power to the centre. Or they can act as Nation states, ensuring citizens keep their sovereignty and control over a selected sense of self.

Look at Brexit. Look at the GE2017. Tell me, what do you see is the direction of travel? Companies can fight it, but will not defeat how people feel. No matter how much they hope ‘nudge’ and predictive analytics might give them this power, the people can take back control.

What might this desire to take-back-control mean for future consumer models? The afternoon discussion, whilst intense, reached fairly simplistic concluding statements on privacy. We could have done with at least another hour.

Some in the group were frustrated that “we seem to be going backwards” in current approaches to privacy and with GDPR.

But if the current legislation is reactive because companies have misbehaved, how will that be rectified for future? The challenge in the IoT both in terms of security and privacy, AND in terms of public perception and reputation management, is that you are dependent on the behaviours of the network, and those around you. Good and bad. And bad practices by one, can endanger others, in all senses.

If you believe that is going back to reclaim a growing sense of citizens’ rights, rather than accepting companies have the outsourced power to control the rights of others, that may be true.

A first-principles question was asked: whether any element on privacy was needed at all, if the text were simply to state that the supplier of this product or service must be General Data Protection Regulation (GDPR) compliant. The GDPR was years in the making after all. Does it matter more in the IoT and in what ways? The room tended, understandably, to talk about it from the company perspective. “We can’t”, “won’t”, “that would stop us from XYZ.” Privacy would however be better addressed from the personal point of view.

What do people want?

From the company point of view, the language is different and holds clues. Openness, control, user choice and pay-for-privacy are not the same thing as the basic human right to be left alone. The afternoon discussion reminded me of the 2014 WAPO article, discussing Mark Zuckerberg’s theory of privacy and a Palo Alto meeting at Facebook:

“Not one person ever uttered the word “privacy” in their responses to us. Instead, they talked about “user control” or “user options” or promoted the “openness of the platform.” It was as if a memo had been circulated that morning instructing them never to use the word “privacy.””

In the afternoon working group on privacy, there was robust discussion of whether we had consensus on what privacy even means. Words like autonomy, control, and choice came up a lot. But it was only a beginning. There is opportunity for better. An academic voice raised the concept of sovereignty, with which I agreed. But the question of how and where to fit it into wording that is at once both minimal and applied, under a scribe who appeared frustrated and wanted a completely different approach from what he heard across the group, meant it was left out.

This group do care about privacy. But I wasn’t convinced that the room cared in the way that the public as a whole does, rather than only as consumers and customers do. IoT products will potentially affect everyone, even those who do not buy your stuff. Everyone in that room agreed on one thing. The status quo is not good enough. What we did not agree on was why, and what the minimum change was that would make enough of a difference to matter.

I share the deep concerns of many child rights academics who see the harm that efforts to avoid the restrictions of Article 8 of the GDPR will impose. It is likely to be damaging for children’s right to access information, to be discriminatory according to parents’ prejudices or socio-economic status, and to lead to ‘cheating’ – requiring secrecy rather than privacy, in attempts to hide or work round the stringent system.

In ‘The Class’ the research showed, “teachers and young people have a lot invested in keeping their spheres of interest and identity separate, under their autonomous control, and away from the scrutiny of each other.” [2016, Livingstone and Sefton-Green, p235]

Employers require staff to use devices with single sign-on, including web and activity tracking and monitoring software. Employee personal data and employment data are blended. Who owns that data, what rights will employees have to refuse what they see as excessive, and is it manageable given the power imbalance between employer and employee?

What is this doing in the classroom and boardroom for stress, anxiety, performance and system and social avoidance strategies?

A desire for convenience creates shortcuts, and these are often met using systems that require a sign-on through the platform giants: Google, Facebook, Twitter, et al. But we are kept in the dark about how using these platforms gives them, and the companies, access to see how our online and offline activity is all joined up.

Any illusion of privacy we maintain, we discussed, is not choice or control if based on ignorance, and the backlash against companies’ lack of effort to ensure disclosure and understanding is growing.

“The lack of accountability isn’t just troubling from a philosophical perspective. It’s dangerous in a political climate where people are pushing back at the very idea of globalization. There’s no industry more globalized than tech, and no industry more vulnerable to a potential backlash.”

[Maciej Ceglowski, Notes from an Emergency, talk at re.publica]

Why do users need you to know about them?

If your connected *thing* requires registration, why does it? How about a commitment to not forcing one of these registration methods or indeed any at all? Social Media Research by Pew Research in 2016 found that 56% of smartphone owners ages 18 to 29 use auto-delete apps, more than four times the share among those 30-49 (13%) and six times the share among those 50 or older (9%).

Does that tell us anything about the demographics of data retention preferences?

In 2012, they suggested social media has changed the public discussion about managing “privacy” online. When asked, people say that privacy is important to them; when observed, people’s actions seem to suggest otherwise.

Does that tell us anything about how well companies communicate to consumers how their data is used and what rights they have?

There is also data with strong indications that women act more to protect their privacy, but when it comes to basic privacy settings, users of all ages are equally likely to choose a private, semi-private or public setting for their profile. There are no significant variations across age groups in the US sample.

Now think about why that matters for the IoT. I wonder who makes the bulk of purchasing decisions about household white goods, for example, and has Bosch factored that into their smart-fridges-only decision?

Do you *need* to know who the user is? Can the smart user choose to stay anonymous at all?

The day’s morning challenge was to attend more than one interesting discussion happening at the same time. As invariably happens, the session notes and quotes are always out of context and can’t possibly capture everything, no matter how amazing the volunteer (with thanks!). But here are some of the discussion points from the session on the body and health devices, the home, and privacy. It also included a discussion on racial discrimination, algorithmic bias, and the reasons why care.data failed patients and failed as a programme. We had lengthy discussion on ethics and privacy: smart meters, objections to models of price discrimination, and why pay-for-privacy harms the poor by design.

Smart meter data can track the use of unique appliances inside a person’s home and intimate patterns of behaviour. Information about our consumption of power, what and when every day, reveals personal details about everyday lives, our interactions with others, and personal habits.

Why should company convenience come above the consumer’s? Why should government powers trump personal rights?

Smart meter data is among the knowledge that government is exploiting, without consent, to discover a whole range of issues, including ensuring that “Troubled Families are identified”. Knowing how dodgy some of the school behaviour data that helps define who is “troubled” might be, there is a real question here: is this sound data science? How are errors identified? What about privacy? It’s not your policy, but if it is your product, what are your responsibilities?

If you do not respect children’s rights, you’d better shape up to be GDPR compliant

Children and young people are more vulnerable to nudge, and developing their sense of self can involve forming and questioning their identity, so these influences need oversight or should be avoided.

In terms of GDPR, providers are going to pay particular attention to Article 8 on ‘information society services’ and parental consent, Article 22 on profiling, the right to restriction of processing (Article 18), the right to erasure in recital 65, and the right to data portability (Article 20). However, they may need to simply reassess their exploitation of children and young people’s personal data and behavioural data. Article 57 requires special attention to be paid by regulators to activities specifically targeted at children, as ‘vulnerable natural persons’ of recital 75.

Human Rights, regulations and conventions overlap in similar principles that demand respect for a child, and right to be let alone:

(a) The development of the child ‘s personality, talents and mental and physical abilities to their fullest potential;

(b) The development of respect for human rights and fundamental freedoms, and for the principles enshrined in the Charter of the United Nations.

A weakness of the GDPR is that it allows derogation on age and will create inequality and inconsistency for children as a result. By comparison Article one of the Convention on the Rights of the Child (CRC) defines who is to be considered a “child” for the purposes of the CRC, and states that: “For the purposes of the present Convention, a child means every human being below the age of eighteen years unless, under the law applicable to the child, majority is attained earlier.”

Article two of the CRC says that States Parties shall respect and ensure the rights set forth in the present Convention to each child within their jurisdiction without discrimination of any kind.

CRC Article 16 says that no child shall be subjected to arbitrary or unlawful interference with his or her privacy, family, home or correspondence, nor to unlawful attacks on his or her honour and reputation.

Article 8 CRC requires respect for the right of the child to preserve his or her identity […] without unlawful interference.

Article 12 CRC demands States Parties shall assure to the child who is capable of forming his or her own views the right to express those views freely in all matters affecting the child, the views of the child being given due weight in accordance with the age and maturity of the child.

That stands in potential conflict with GDPR Article 8. There is much in the GDPR on derogations by country, and on children, still to be set.

What next for our data in the wild

Hosting the event at the zoo offered added animals, and during lunch we got out on a tour, kindly hosted by a fellow participant. We learned how smart technology was embedded in some of the animal enclosures, and about work on temperature sensors with penguins, for example. I love tigers, so it was a bonus that we got to see such beautiful and powerful animals up close, if a little sad for their circumstances and, as a general basic principle, at seeing big animals caged as opposed to in-the-wild.

Freedom is a common desire in all animals. Physical, mental, and freedom from control by others.

I think any manufacturer that underestimates this element of human instinct is ignoring the ‘hidden dragon’ that some think is a myth. Privacy is not dead. It is not extinct, or even, unlike the beautiful tigers, endangered. Privacy in the IoT, at its most basic, is the right to control our purchasing power. The ultimate people power waiting to be sprung. Truly a crouching tiger. People object to being used and if companies continue to do so without full disclosure, they do so at their peril. Companies seem all-powerful in the battle for privacy, but they are not. Even insurers and data brokers must be fair and lawful, and it is for regulators to ensure that practices meet the law.

When consumers realise that our data, our purchasing power, has the potential to control, not be controlled, that balance will shift.

“Paper tigers” are superficially powerful but are prone to overextension that leads to sudden collapse. If that happens to the superficially powerful companies that choose unethical and bad practice, as a result of better data privacy and data ethics, then bring it on.

I hope that the IoT mark can champion best practices and make a difference to benefit everyone.

While the companies involved in its design may be interested in consumers, I believe it could be better for everyone, done well. The great thing about the efforts into an #IoTmark is that it is a collective effort to improve the whole ecosystem.

I hope more companies will realise their privacy rights and ethical responsibility in the world to all people, including those interested in just being, those who want to be let alone, and not just those buying.

“If a cat is called a tiger it can easily be dismissed as a paper tiger; the question remains however why one was so scared of the cat in the first place.”

The Resistance to Theory (1982), Paul de Man

Further reading: Networks of Control – A Report on Corporate Surveillance, Digital Tracking, Big Data & Privacy by Wolfie Christl and Sarah Spiekermann

Failing a generation is not what post-Brexit Britain needs

Basically Britain needs Prof. Brian Cox shaping education policy:

“If it were up to me I would increase pay and conditions and levels of responsibility and respect significantly, because it is an investment that would pay itself back many times over in the decades to come.”

Don’t use children as ‘measurement probes’ to test schools

What effect does using school exam results to reform the school system have on children? And what effect does it have on society?

Last autumn Ofqual published a report and their study on consistency of exam marking and metrics.

The report concluded that half of pupils in English Literature, as an example, are not awarded the “correct” grade on a particular exam paper due to marking inconsistencies and the design of the tests.
Given the complexity and sensitivity of the data, Ofqual concluded, it is essential that the metrics stand up to scrutiny and that there is a very clear understanding behind the meaning and application of any quality of marking.  They wrote that, “there are dangers that information from metrics (particularly when related to grade boundaries) could be used out of context.”

Context and accuracy are fundamental to the value of and trust in these tests. And at the moment, trust is not high in the system behind it. There must also be trust in policy behind the system.

This summer two sets of UK school tests will come under scrutiny: GCSEs and SATs. The goal posts are moving for children and schools across the country. And it’s bad for children and bad for Britain.

Grades A-G will be swapped for numbers 1-9

GCSE sitting 15-16 year olds will see their exams shift to a numerical system, scoring from the highest Grade 9 to Grade 1, with the three top grades replacing the current A and A*. The alphabetical grading system will be fully phased out by 2019.

The plans intended that roughly the same proportion of students as have achieved a Grade C will be awarded a new Grade 4 and as Schools Week reported: “There will be two GCSE pass rates in school performance tables.”

One will measure grade 5s or above, and this will be called the ‘strong’ pass rate. And the other will measure grade 4s or above, and this will be the ‘standard’ pass rate.
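To make the difference between the two measures concrete, here is a minimal sketch of how the ‘standard’ and ‘strong’ pass rates described above could be worked out from a list of 9-1 grades. The function name and the grades are made up for illustration; this is not the DfE’s own calculation, which applies further rules about which pupils and subjects count.

```python
def pass_rates(grades):
    """Return (standard, strong) pass rates for a list of 9-1 grades.

    'standard' counts grade 4 or above, 'strong' counts grade 5 or above,
    following the two definitions quoted from Schools Week.
    """
    if not grades:
        raise ValueError("need at least one grade")
    total = len(grades)
    standard = sum(1 for g in grades if g >= 4) / total
    strong = sum(1 for g in grades if g >= 5) / total
    return standard, strong


# Hypothetical grades for ten pupils in one subject (invented data).
example_grades = [9, 7, 5, 4, 4, 3, 2, 6, 5, 1]
standard, strong = pass_rates(example_grades)
print(f"standard pass rate: {standard:.0%}, strong pass rate: {strong:.0%}")
```

The same set of children produces two different headline figures, depending only on where the line is drawn.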

Laura McInerney summed up, “in some senses, it’s not a bad idea as it will mean it is easier to see if the measures are comparable. We can check if the ‘standard’ rate is better or worse over the next few years. (This is particularly good for the DfE who have been told off by the government watchdog for fiddling about with data so much that no one can tell if anything has worked anymore).”

There’s plenty of confusion among parents about how the numerical grading system will work. The confusion you can gauge in playground conversations is also reflected nationally in a more measurable way.

Market research in a range of audiences – including businesses, head teachers, universities, colleges, parents and pupils – found that just 31 per cent of secondary school pupils and 30 per cent of parents were clear on the new numerical grading system.

So that’s a change in the GCSE grading structure. But why? If more differentiators are needed, why not add one or two more letters and shift grade boundaries? A policy need for these changes is unclear.

Machine marking is training on ten year olds

I wonder if the shift to numerical marking is due in any part to a desire to move GCSEs to machine marking in future?

This year, ten and eleven year olds, children in their last year of primary school, will have their SATs tests computer marked.

That’s everything in maths and English. Not multiple choice papers or one word answers, but full written responses. If their f, b or g doesn’t look like the correct  letter in the correct place in the sentence, then it gains no marks.

Parents are concerned about children whose handwriting is awful, but whose knowledge is not. How well can they hope to be assessed? If exams are increasingly machine marked out of sight, many sent to India, where is our oversight of the marking process and accuracy?

The concerns I’ve heard simply among local parents and staff seem reflected in national discussions and at the assessor, Ofsted. TES has reported Ofsted’s most senior officials as saying that the inspectorate is just as reluctant to use this year’s writing assessments as it was in 2016. Teachers and parents locally are united in feeling it is not accurate, not fair, and not right.

The content is also to be tougher.

How will we know what is being accurately measured and the accuracy of the metrics with content changes at the same time? How will we know if children didn’t make the mark, or if the marks were simply not awarded?

The accountability of the process is less than transparent to pupils and parents. We have little opportunity for Ofqual’s recommended scrutiny of these metrics, or the data behind the system on our kids.

Causation, correlation and why we should care

The real risk is that no one will be able to tell if there is an error, where it stems from, or whether there is a reason if pass rates are markedly different from what was expected.

After the wide range of changes across pupil attainment, exam content, school progress scores, and their interaction and dependencies, can they all fit together and be comparable with the past at all?

If the SATs are making lots of mistakes simply due to being bad at reading ten-year-olds’ handwriting, how will we know?

Or if GCSE scores are lower, will we be able to see if it is because they have genuinely differentiated the results in a wider spread, and stretched out the fail, pass and top passes more strictly than before?

What is likely is that this year’s set of children who were expecting As and A stars at GCSE, but fail to be one of the two children nationally who get the new grade 9, will be disappointed to feel they are not, after all, as great as they thought they were.

And next year, if you can’t be the one or two to get the top mark, will the best simply stop stretching themselves and rest a bit easier, because, whatever, you won’t get those straight grade As anyway?

Even if children would not change behaviours were they to know, the target range scoring sent by third party data processors to schools, discourages teachers from stretching those at the top.

Politicians look for positive progress, but policies are changing that will increase the number of schools deemed to have failed. Why?

Our children’s results are being used to reform the school system.

Coasting and failing schools can be compelled to become academies.

Government policy on this forced academisation was rejected by popular revolt. It appears that the government is determined that schools *will* become academies with the same fervour that they *will* re-introduce grammar schools. Both are unevidenced and unwanted. But there is a workaround.  Create evidence. Make the successful scores harder to achieve, and more will be seen to fail.

A total of 282 secondary schools in England were deemed to be failing by the government this January, as they “have not met a new set of national standards”.

It is expected that even more will attain ‘less’ this summer. Tim Leunig, Chief Analyst & Chief Scientific Adviser at the Department for Education, made a personal guess at only two pupils reaching the top mark.

The context of this GCSE ‘failure’ is the changes in how schools are measured. Children’s progress over 8 subjects, or “P8” is being used as an accountability measure of overall school quality.

But it’s really just: “a school’s average Attainment 8 score adjusted for pupils’ Key Stage 2 attainment.” [Dave Thomson, Education Datalab]
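A rough sketch of what that adjustment means in practice: each pupil’s Attainment 8 score is compared with the national average for pupils with similar Key Stage 2 results, and the school’s figure is the average of those differences. Everything below is a simplified illustration. The band names, scores and national averages are invented, and the official methodology has more detail (subject slots, weightings and scaling) that is left out here.

```python
from statistics import mean


def pupil_progress(pupils, national_average_by_ks2):
    """Difference between each pupil's Attainment 8 score and the national
    average Attainment 8 score for pupils in the same KS2 prior-attainment band."""
    return [
        p["attainment8"] - national_average_by_ks2[p["ks2_band"]]
        for p in pupils
    ]


def school_progress(pupils, national_average_by_ks2):
    """A school's progress figure: the mean of its pupils' differences."""
    return mean(pupil_progress(pupils, national_average_by_ks2))


# Invented national averages per prior-attainment band (assumption, not DfE data).
national_average_by_ks2 = {"low": 30, "middle": 45, "high": 60}

# Invented pupils for one hypothetical school.
pupils = [
    {"ks2_band": "low", "attainment8": 34},
    {"ks2_band": "middle", "attainment8": 45},
    {"ks2_band": "high", "attainment8": 56},
    {"ks2_band": "low", "attainment8": 32},
]

print(school_progress(pupils, national_average_by_ks2))
```

Note that the only adjustment in this sketch is prior attainment. Nothing about a school’s intake or circumstances enters the sum, and that kind of context is what the Datalab work on contextualising looks at.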

Work done by FFT Education Datalab showed that contextualising P8 scores can lead to large changes for some schools.  (Read more here and here). You cannot meaningfully compare schools with different types of intake, but it appears that the government is determined to do so. Starting ever younger if new plans go ahead.

Data is being reshaped to tell stories to fit to policy.

Shaping children’s future

What this reshaping doesn’t factor in at all, is the labelling of a generation or more, with personal failure, from age ten and up.

All this tinkering with the data, isn’t just data.

It’s tinkering badly with our kids’ sense of self, their sense of achievement, aspiration, and with that, the country’s future.

Education reform has become the aim, and it has replaced the aims of education.

Post-Brexit Britain doesn’t need policy that delivers ideology. We don’t need to use children as ‘measurement probes’ to test schools.

Just as we shouldn’t use children’s educational path to test their net worth or cost to the economy. Or predict it in future.

Children’s education and human value cannot be measured in data.

Google Family Link for Under 13s: children’s privacy friend or faux?

“With the Family Link app from Google, you can stay in the loop as your kid explores on their Android* device. Family Link lets you create a Google Account for your kid that’s like your account, while also helping you set certain digital ground rules that work for your family — like managing the apps your kid can use, keeping an eye on screen time, and setting a bedtime on your kid’s device.”


John Carr shared his blog post about the Google Family Link today which was the first I had read about the new US account in beta. In his post, with an eye on GDPR, he asks, what is the right thing to do?

What is the Family Link app?

Family Link requires a US-based Google account to sign up, so outside the US we can’t read the full details. However, from what is published online, it appears to offer the following three key features:

“Approve or block the apps your kid wants to download from the Google Play Store.

Keep an eye on screen time. See how much time your kid spends on their favorite apps with weekly or monthly activity reports, and set daily screen time limits for their device.”

and

“Set device bedtime: Remotely lock your kid’s device when it’s time to play, study, or sleep.”

From the privacy and disclosure information it reads that there is not a lot of difference between a regular (over 13s) Google account and this one for under 13s. To collect data from under 13s it must be compliant with COPPA legislation.

If you google “what is COPPA” the first result says, “The Children’s Online Privacy Protection Act (COPPA) is a law created to protect the privacy of children under 13.”

But does this Google Family Link do that? What safeguards and controls are in place for use of this app and children’s privacy?

What data does it capture?

“In order to create a Google Account for your child, you must review the Disclosure (including the Privacy Notice) and the Google Privacy Policy, and give consent by authorizing a $0.30 charge on your credit card.”

Google captures the parent’s verified real-life credit card data.

Google captures child’s name, date of birth and email.

Google captures voice.

Google captures location.

Google may associate your child’s phone number with their account.

And lots more:

Google automatically collects and stores certain information about the services a child uses and how a child uses them, including when they save a picture in Google Photos, enter a query in Google Search, create a document in Google Drive, talk to the Google Assistant, or watch a video in YouTube Kids.

What does it offer over regular “13+ Google”?

In terms of general safeguarding, it doesn’t appear that SafeSearch is on by default but must be set and enforced by a parent.

Parents should “review and adjust your child’s Google Play settings based on what you think is right for them.”

Google rightly points out however that, “filters like SafeSearch are not perfect, so explicit, graphic, or other content you may not want your child to see makes it through sometimes.”

Ron Amadeo at Ars Technica wrote a review of the Family Link app back in February, and came to similar conclusions about added safeguarding value:

“Other than not showing “personalized” ads to kids, data collection and storage seems to work just like in a regular Google account. On the “Disclosure for Parents” page, Google notes that “your child’s Google Account will be like your own” and “Most of these products and services have not been designed or tailored for children.” Google won’t do any special content blocking on a kid’s device, so they can still get into plenty of trouble even with a monitored Google account.”

Your child will be able to share information, including photos, videos, audio, and location, publicly and with others, when signed in with their Google Account. And Google wants to see those photos.

There are some things that parents cannot block at all.

Installs of app updates can’t be controlled, so leave a questionable grey area. Many apps are built on classic bait and switch – start with a free version and then the upgrade contains paid features. This is therefore something to watch for.

“Regardless of the approval settings you choose for your child’s purchases and downloads, you won’t be asked to provide approval in some instances, such as if your child: re-downloads an app or other content; installs an update to an app (even an update that adds content or asks for additional data or permissions); or downloads shared content from your Google Play Family Library.”

The child “will have the ability to change their activity controls, delete their past activity in “My Activity,” and grant app permissions (including things like device location, microphone, or contacts) to third parties”.

What’s in it for children?

You could argue that this gives children “their own accounts” and autonomy. But why do they need one at all? If I give my child a device on which they can download an app, then I approve it first.

If I am not aware of my under 13 year old child’s Internet time physically, then I’m probably not a parent who’s going to care to monitor it much by remote app either. Is there enough insecurity around ‘what children under 13 really do online’, versus what I see or they tell me as a parent, that warrants 24/7 built-in surveillance software?

I can use safe settings without this app. I can use a device time limiting app without creating a Google account for my child.

If parents want to give children an email address, yes, this allows them to have a device-linked Gmail account whose content you, as a parent, cannot access. But wait a minute, what’s this? Google can?

Google can read their mails and provide them “personalised product features”. More detail is probably needed but this seems clear:

“Our automated systems analyze your child’s content (including emails) to provide your child personally relevant product features, such as customized search results and spam and malware detection.”

And what happens when the under 13s turn 13? It’s questionable that it is right for Google et al. to then be able to draw on a pool of ready-made customers’ data in waiting. Free from COPPA ad regulation. Free from COPPA privacy regulation.

Google knows when the child reaches 13 (the set-up requires a child’s date of birth, their first and last name, and email address, to set up the account). And they will inform the child directly when they become eligible to sign up to a regular account free of parental oversight.

What a birthday gift. But is it packaged for the child or Google?

What’s in it for Google?

The parental disclosure begins,

“At Google, your trust is a priority for us.”

If it truly is, I’d suggest they revise their privacy policy entirely.

Google’s disclosure policy also makes parents read a lot before they fully understand the permissions this app gives to Google.

I do not believe Family Link gives parents adequate control of their children’s privacy at all nor does it protect children from predatory practices.

While “Google will not serve personalized ads to your child”, your child “will still see ads while using Google’s services.”

Google also tailors the Family Link apps that the child sees (and begs you to buy), based on their data:

“(including combining personal information from one service with information, including personal information, from other Google services) to offer them tailored content, such as more relevant app recommendations or search results.”

Contextual advertising using “persistent identifiers” is permitted under COPPA, and is surely a fundamental flaw. It’s certainly one I wouldn’t want to see duplicated under GDPR. Serving up ads that are relevant to the content the child is using, doesn’t protect them from predatory ads at all.

Google captures geolocators and knows where a child is and builds up their behavioural and location patterns. Google, like other online companies, captures and uses what I’ve labelled ‘your synthesised self’; the mix of online and offline identity and behavioural data about a user. In this case, the who and where and what they are doing, are the synthesised selves of under 13 year old children.

These data are made more valuable by the connection to an adult with spending power.

The Google Privacy Policy’s description of how Google services generally use information applies to your child’s Google Account.

Google gains permission via the parent’s acceptance of the privacy policy, to pass personal data around to third parties and affiliates. An affiliate is an entity that belongs to the Google group of companies. Today, that’s a lot of companies.

Google’s ad network consists of Google services, like Search, YouTube and Gmail, as well as 2+ million non-Google websites and apps that partner with Google to show ads.

I also wonder if it will undo some of the previous pro-privacy features on any linked child’s YouTube account if Google links any logged in accounts across the Family Link and YouTube platforms.

Is this pseudo-safe use a good thing?

In practical terms, I’d suggest this app is likely to lull parents into a false sense of security. Privacy safeguarding is not the default set up.

It’s questionable that Google should adopt some sort of parenting role through an app. Parental remote control via an app isn’t an appropriate way to regulate whether my under 13 year old is using their device, rather than sleeping.

It’s also got to raise questions about children’s autonomy at say, 12. Should I as a parent know exactly every website and app that my child visits? What does that do for parental-child trust and relations?

As for my own children I see no benefit compared with letting them have supervised access as I do already.  That is without compromising my debit card details, or under a false sense of safeguarding. Their online time is based on age appropriate education and trust, and yes I have to manage their viewing time.

That said, if there are people who think parents cannot do that, is the app a step forward? I’m not convinced. It’s definitely of benefit to Google. But for families it feels more like a sop to adults who feel a duty towards safeguarding children, but aren’t sure how to do it.

Is this the best that Google can do by children?

In summary it seems to me that the Family Link app is a free gift from Google. (Well, free after the thirty cents to prove you’re a card-carrying adult.)

It gives parents three key tools: App approval (accept, pay, or block), Screen-time surveillance,  and a remote Switch Off of child’s access.

In return, Google gets access to a valuable data set – a parent-child relationship with credit data attached – and can increase its potential targeted app sales. Yet Google can’t guarantee additional safeguarding, privacy, or benefits for the child while using it.

I think for families and child rights, it’s a false friend. None of these tools per se require a Google account. There are alternatives.

Children’s use of the Internet should not mean they are used and their personal data passed around or traded in hidden back room bidding by the Internet companies, with no hope of control.

There are other technical solutions to age verification and privacy too.

I’d ask, what else has Google considered and discarded?

Is this the best that a cutting edge technology giant can muster?

This isn’t designed to respect children’s rights as intended under COPPA or ready for GDPR, and it’s a shame they’re not trying.

If I were designing Family Link for children, it would collect no real identifiers. No voice. No locators. It would not permit others access to voice or images or need linked. It would keep children’s privacy intact, and enable them when older, to decide what they disclose. It would not target personalised apps/products  at children at all.

GDPR requires active, informed parental consent for children’s online services. Consent must be revocable; personal data collected must be the minimum necessary, and must be portable. Privacy policies must be clear to children. This, in terms of GDPR readiness, is nowhere near ‘it’.

Family Link needs to re-do their homework. And this isn’t a case of ‘please revise’.

Google is a multi-billion dollar company. If they want parental trust, and want to be GDPR and COPPA compliant, they should do the right thing.

When it comes to child rights, companies must do or do not. There is no try.


image source: ArsTechnica

A vanquished ghost returns as details of distress required in NHS opt out

It seems the ugly ghosts of care.data past were alive and well at NHS Digital this Christmas.

Old style thinking, the top-down patriarchal ‘no one who uses a public service should be allowed to opt out of sharing their records. Nor can people rely on their record being anonymised,’ that you thought was vanquished, has returned with a vengeance.

The Secretary of State for Health, Jeremy Hunt, has reportedly  done a U-turn on opt out of the transfer of our medical records to third parties without consent.

That backtracks on what he said in Parliament on January 25th, 2014 on opt out of anonymous data transfers, despite the right to object in the NHS constitution [1].

So what’s the solution? If the new opt out methods aren’t working, then back to the old ones and making Section 10 requests? But it seems the Information Centre isn’t keen on making that work either.

All the data the HSCIC holds is sensitive and as such, its release risks patients’ significant harm or distress [2] so it shouldn’t be difficult to tell them to cease and desist, when it comes to data about you.

But how is NHS Digital responding to people who make the effort to write directly?

Someone who “got a very unhelpful reply” is being made to jump through hoops.

If anyone asks that their hospital data should not be used in any format and passed to third parties, that’s surely for them to decide.

Let’s take the case study of a woman who spoke to me during the whole care.data debacle, who had been let down by the records system after rape. Her subsequent NHS records about her mental health care were inaccurate, and had led to her being denied the benefit of private health insurance at a new job.

Would she have to detail why selling her medical records would cause her distress? What level of detail is fair and who decides? The whole point is, you want to keep info confidential.

Should you have to state what you fear? “I have future distress, what you might do to me?” Once you lose control of data, it’s gone. Based on past planning secrecy and ideas for the future, like mashing up health data with retail loyalty cards as suggested at Strata in November 2013 [from 16:00] [2] no wonder people are sceptical. 

Given the long list of commercial companies,  charities, think tanks and others that passing out our sensitive data puts at risk and given the Information Centre’s past record, HSCIC might be grateful they have only opt out requests to deal with, and not millions of medical ethics court summonses. So far.

HSCIC / NHS Digital has extracted our identifiable records and has given them away, including for commercial product use, and continues to give them away, without informing us. We’ve accepted Ministers’ statements that a solution would be found. Two years on, patience wears thin.

“Without that external trust, we risk losing our public mandate and then cannot offer the vital insights that quality healthcare requires.”

— Sir Nick Partridge on publication of the audit report of 10% of 3,059 releases by the HSCIC between 2005-13

— Andy Williams said, “We want people to be certain their choices will be followed.”

Jeremy Hunt said everyone should be able to opt out of having their anonymised data used. David Cameron did too when the plan was  announced in 2012.

In 2014 the public was told there should be no more surprises. This latest response is not only a surprise but enormously disrespectful.

When you’re trying to rebuild trust, assuming that we accept that ‘is’ the aim, you can’t say one thing, and do another. Perhaps the Department of Health doesn’t like the public answer to what the public wants from opt out, but that doesn’t make the DH view right.

Perhaps NHS Digital doesn’t want to deal with lots of individual opt out requests, but that doesn’t make their refusal right.

Kingsley Manning recognised in July 2014, that the Information Centre “had made big mistakes over the last 10 years.” And there was “a once-in-a-generation chance to get it right.”

I didn’t think I’d have to move into the next one before they fix it.

The recent round of 2016 public feedback was the same as care.data 1.0. Respect nuanced opt outs and you will have all the identifiable public interest research data you want. Solutions must be better for other uses, opt out requests must be respected without distressing patients further in the process, and anonymous must mean  anonymous.

Pseudonymised data requests that go through the DARS process so that a Data Sharing Framework Contract and Data Sharing Agreement are in place are considered to be compliant with the ICO code of practice – fine, but they are not anonymous. If DARS is still giving my family’s data to Experian, Harvey Walsh, and co, despite opt out, I’ll be furious.

The [Caldicott 2] Review Panel found “that commissioners do not need dispensation from confidentiality, human rights & data protection law.”

Neither do our politicians, their policies or ALBs.


[1] https://www.england.nhs.uk/ourwork/tsd/ig/ig-fair-process/further-info-gps/

“A patient can object to their confidential personal information from being disclosed out of the GP Practice and/or from being shared onwards by the HSCIC for non-direct care purposes (secondary purposes).”

[2] Minimum Mandatory Measures http://www.nationalarchives.gov.uk/documents/information-management/cross-govt-actions.pdf p7

care.data listening events and consultation: The same notes again?

If lots of things get said in a programme of events, and nothing is left around to read about it, did they happen?

The care.data programme 2014-15 listening exercise and action plan has become impossible to find online. That’s OK, you might think, the programme has been scrapped. Not quite.

You can give your views online until September 7th on the new consultation, “New data security standards and opt-out models for health and social care”  and/or attend the new listening events, September 26th in London, October 3rd in Southampton and October 10th in Leeds.

The Ministerial statement on July 6, announced that NHS England had taken the decision to close the care.data programme after the review of data security and consent by Dame Fiona Caldicott, the National Data Guardian for Health and Care.

But the same questions are being asked again around consent and use of your medical data, from primary and secondary care. What a very long questionnaire asks is in effect,  do you want to keep your medical history private? You can answer only Q 15 if you want.

Ambiguity again surrounds what constitutes “de-identified” patient information.

What is clear is that public voice seems to have been deleted or lost from the care.data programme along with the feedback and brand.

People spoke up in 2014, and acted. The opt out that 1 in 45 people chose between January and March 2014 was put into effect by the HSCIC in April 2016. Now it seems, that might be revoked.

We’ve been here before.  There is no way that primary care data can be extracted without consent without it causing further disruption and damage to public trust and public interest research.  The future plans for linkage between all primary care data and secondary data and genomics for secondary uses, is untenable without consent.

Upcoming events cost time and money and will almost certainly go over the same ground that hours and hours were spent on in 2014. However if they do achieve a meaningful response rate, then I hope the results will not be lost and will be combined with those already captured under the ‘care.data listening events’ responses.  Will they have any impact on what consent model there may be in future?

So what we gonna do? I don’t know, whatcha wanna do? Let’s do something.

Let’s have accredited access and security fixed. While there may now be a higher transparency and process around release, there are still problems about who gets data and what they do with it.

Let’s have clear future scope and control. There is still no plan to give the public rights to control or delete data if we change our minds about who can have it or for what purposes. And that is very uncertain. After all, they might decide to privatise or outsource the whole thing as was planned for the CSUs.

Let’s have answers to everything already asked but unknown. The questions in the previous Caldicott review have still to be answered.

We have the possibility to see health data used wisely, safely, and with public trust. But we seem stuck with the same notes again. And the public seem to be the last to be invited to participate, and views, once gathered, seem to be disregarded. I hope to be proved wrong.

Might, perhaps, the consultation deliver the nuanced consent model discussed at public listening exercises that many asked for?

Will the care.data listening events feedback summary be found, and will its 2014 conclusions and the enacted opt out be ignored? Will the new listening event view make more difference than in 2014?

Is public engagement, engagement, if nobody hears what was said?

Datasharing, lawmaking and ethics: power, practice and public policy

“Lawmaking is the Wire, not Schoolhouse Rock. It’s about blood and war and power, not evidence and argument and policy.”

“‘We can’t trust the regulators,’ they say. ‘We need to be able to investigate the data for ourselves.’ Technology seems to provide the perfect solution. Just put it all online - people can go through the data while trusting no one. There’s just one problem. If you can’t trust the regulators, what makes you think you can trust the data?”

Extracts from The Boy Who Could Change the World: The Writings of Aaron Swartz. Chapter: ‘When is Technology Useful?’ June 2009.

The question keeps getting asked, is the concept of ethics obsolete in Big Data?

I’ve come to some conclusions why ‘Big Data’ use keeps pushing the boundaries of what many people find acceptable, and yet the people doing the research, the regulators and lawmakers often express surprise at negative reactions. Some even express disdain for public opinion, dismissing it as ignorant, not ‘understanding the benefits’, yet to be convinced. I’ve decided why I think what is considered ‘ethical’ in data science does not meet public expectation.

It’s not about people.

Researchers using large datasets often have a foundation in data science, applied computing, or maths, and don’t see data as people. It’s only data. Creating patterns, correlations, and analysis of individual level data are not seen as research involving human subjects.

This is embodied in the umpteenth research ethics review I have read in the last year in which the question is asked: does the research involve people? The answer given is invariably ‘no’.

And these data analysts using, let’s say health data, are not working in a subject that is founded on any ethical principle, contrasting with the medical world the data come from.

The public feels differently about the information that is about them, and may be known only to them or select professionals. The values that we as the public attach to our data and expectations of its handling may reflect the expectation we have of handling of us as people who are connected to it. We see our data as all about us.

The values that are therefore put on data, and on how it can and should be used, can be at odds with one another; the public perception is not reciprocated by the researchers. This may be especially true if researchers are using data which has been de-identified, although it may not be anonymous.

New legislation on the horizon, the Better Use of Data in Government, intends to fill the [loop]hole between what was legal to share in the past and what some want to exploit today, and emphasises a gap between the uses of data by public interest, academic researchers, and uses by government actors. The first by and large incorporate privacy and anonymisation techniques by design; the second are designed for applied use of identifiable data.

Government departments and public bodies want to identify and track people who are somehow misaligned with the values of the system; either through fraud, debt, Troubled Families, or owing Student Loans. All highly sensitive subjects. But their ethical data science framework will not treat them as individuals, but only as data subjects. Or as groups who share certain characteristics.

The system again intrinsically fails to see these uses of data as being about individuals, but sees them as categories of people: “fraud”, “debt”, “Troubled families”. It is designed to profile people.

Services that weren’t built for people, but for government processes, result in datasets used in research that aren’t well designed for research. So we now see attempts to shoehorn historical practices into data use by modern data science practitioners, with policy that is shortsighted.

We can’t afford for these things to be so off axis, if civil service thinking is exploring “potential game-changers such as virtual reality for citizens in the autism spectrum, biometrics to reduce fraud, and data science and machine-learning to automate decisions.”

In an organisation such as DWP this must be really well designed since “the scale at which we operate is unprecedented: with 800 locations and 85,000 colleagues, we’re larger than most retail operations.”

The power to affect individual lives through poor technology is vast and some impacts seem to be badly ignored. The ‘real time earnings’ database, which improved accuracy of benefit payments, was widely agreed to have been harmful to some individuals through the Universal Credit scheme, with delayed payments meaning families at foodbanks, and contributing to worse.

“We believe execution is the major job of every business leader” is perhaps not the best wording on DWP data uses.

What accountability will be built in by design?

I’ve been thinking recently about drawing a social ecological model of personal data empowerment or control. Thinking about visualisation of wants, gaps and consent models, to show rather than tell policy makers where these gaps exist in public perception and expectations, policy and practice. If anyone knows of one on data, please shout. I think it might be helpful.

But the data *is* all about people

Regardless of whether they are in front of you or numbers on a screen, big or small datasets using data about real lives are data about people. And that triggers a need to treat the data with an ethical approach, as you would people involved face-to-face.

Researchers need to stop treating data about people as meaningless data because that’s not how people think about their own data being used. Not only that, but if the whole point of your big data research is to have impact, your data outcomes will change lives.

Tosh, I know some say. But, I have argued, the reason is that the applications of the data science / research / policy findings / impact of immigration in education review / [insert purposes of the data user’s choosing] are designed to have impact on people, often the people about whom the research is done without their knowledge or consent. And while most people say that is OK where it’s public interest research, the possibilities are outstripping what the public has expressed as acceptable, and few seem to care.

Evidence from public engagement and ethics all says that hidden pigeon-holing, profiling, is unacceptable. Data Protection law has special requirements for it, on automated decisions. ‘Profiling’ is now clearly defined under article 4 of the GDPR as “any form of automated processing of personal data consisting of using those data to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements.”

Research using big datasets that ‘isn’t interested in individuals’ may still intend to create results that profile groups for applied policing, that discriminate, or that make knowledge available by location. The data may have been de-identified, but in application they are no longer anonymous.

Big Data research that results in profiling groups may do so with the intent of applied health policy impacts for good. Profiling may be the very point of the research, with the intent of improving a particular ethnic minority’s access to services, for example.

Then look at the voting process changes in North Carolina and see how that same data, the same research knowledge might be applied to exclude, to restrict rights, and to disempower.

Is it possible to have ethical oversight that can protect good data use and protect people’s rights if they conflict with the policy purposes?

The “clear legal basis” is not enough for public trust

Data use can be legal and can still be unethical, harmful and shortsighted in many ways, for both the impacts on research – in terms of withholding data, falsifying data and avoiding the system to avoid giving data – and the lives it will touch.

What education has to learn from health is whether it will permit the uses by ‘others’ outside education to jeopardise the collection of school data intended in the best interests of children, not the system. In England it must start to analyse what is needed vs wanted. What is necessary and proportionate, and what justifies maintaining named data indefinitely, exposed to changing scope?

In health, the most recent Caldicott review suggests scope change by design – that is a red line for many: “For that reason the Review recommends that, in due course, the opt-out should not apply to all flows of information into the HSCIC. This requires careful consideration with the primary care community.”

The community spoke out already, and strongly, in Spring and Summer 2014, that there must be an absolute right to confidentiality to protect patients’ trust in the system. Scope that ‘sounds’ like it might sneakily change in future will be a death knell to public interest research, because repeated trust erosion will be fatal.

Laws change to allow scope change without informing people whose data are being used for different purposes

Regulators must be seen to be trusted, if the data they regulate is to be trustworthy. Laws and regulators that plan scope for the future watering down of public protection, water down public trust from today. Unethical policy and practice will not be saved by pseudo-data-science ethics.

Will those decisions in private political rooms be worth the public cost to research, to policy, and to the lives it will ultimately affect?

What happens when the ethical black holes in policy, lawmaking and practice collide?

At the last UK HealthCamp towards the end of the day, when we discussed the hard things, the topic inevitably moved swiftly to consent, to building big databases, public perception, and why anyone would think there is potential for abuse, when clearly the intended use is good.

The answer came back from one of the participants, “OK now it’s the time to say. Because, Nazis.” Meaning, let’s learn from history.

Given the state of UK politics, Go Home van policies, restaurant raids, the possibility of Trump getting access to UK sensitive data of all sorts from across the Atlantic, given recent policy effects on the rights of the disabled and others, I wonder if we would hear the gentle laughter in the room in answer to the same question today.

With what is reported today as a sharp change in Whitehall’s digital leadership, the future of digital in government services and policy and lawmaking does indeed seem to be more “about blood and war and power,” than “evidence and argument and policy”.

The concept of ethics in datasharing using public data in the UK is far from becoming obsolete. It has yet to begin.

We have ethical black holes in big data research, in big data policy, and in big data practices in England. The conflicts between public interest research and government uses of population-wide datasets, the gap between how the public perceive the use of our data and how they are used, and the gaps and tensions in policy and practice are all there.

We are simply waiting for the Big Bang. Whether it will be creative or destructive, we are yet to feel.

*****

image credit: LIGO – graphical visualisation of black holes on the discovery of gravitational waves

References:

Report: Caldicott review – National Data Guardian for Health and Care Review of Data Security, Consent and Opt-Outs 2016

Report: The One-Way Mirror: Public attitudes to commercial access to health data

Royal Statistical Society Survey carried out by Ipsos MORI: The Data Trust Deficit

The illusion that might cheat us: ethical data science vision and practice

This blog post is also available as an audio file on SoundCloud.


Anais Nin wrote in her 1946 diary of the dangers she saw in the growth of technology to expand our potential for connectivity through machines, but diminish our genuine connectedness as people. She could hardly have been more contemporary for today:

“This is the illusion that might cheat us of being in touch deeply with the one breathing next to us. The dangerous time when mechanical voices, radios, telephone, take the place of human intimacies, and the concept of being in touch with millions brings a greater and greater poverty in intimacy and human vision.”
[Extract from volume IV 1944-1947]

Echoes from over 70 years ago can be heard in the more recent comments of entrepreneur Elon Musk. Both are concerned with simulation, a lack of connection between the perceived and reality, and the jeopardy this presents for humanity. But both also have a dream. A dream based on the positive potential society has.

How will we use our potential?

Data is the connection we all have between us as humans and what machines and their masters know about us. The values that masters underpin their machine design with will determine the effect that the machines, and the knowledge they deliver, have on society.

In seeking ever greater personalisation, a wider dragnet of data is putting together ever more detailed pieces of information about an individual person. At the same time data science is becoming ever more impersonal in how we treat people as individuals. We risk losing sight of how we respect and treat the very people whom the work should benefit.

Nin grasped the risk that a wider reach can mean more superficial depth. Facebook might be a model today for the large circle of friends you might gather, but how few you trust with confidences, with personal knowledge about your own personal life, and the privilege it is when someone chooses to entrust that knowledge to you. Machine data mining increasingly tries to get an understanding of depth, and may also add new layers of meaning through profiling, comparing our characteristics with others in risk stratification.

Data science, research using data, is often talked about as if it is something separate from using information from individual people. Yet it is all about exploiting those confidences.

Today as the reach has grown in what is possible for a few people in institutions to gather about most people in the public, whether in scientific research, or in surveillance of different kinds, we hear experts repeatedly talk of the risk of losing the valuable part, the knowledge, the insights that benefit us as society if we can act upon them.

We might know more, but do we know any better? To use a well known quote from her contemporary, T S Eliot, ‘Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?’

What can humans achieve? We don’t yet know our own limits. What don’t we yet know? We have future priorities we aren’t yet aware of.

To be able to explore the best of what Nin saw as ‘human vision’ and Musk sees in technology, the benefits we have from our connectivity, our collaboration and shared learning, need to be driven with an element of humility, accepting values that shape the boundaries of what we should do, while constantly evolving with what we could do.

The essence of this applied risk is that technology could harm you, more than it helps you. How do we avoid this and develop instead the best of what human vision makes possible? Can we also exceed our own expectations of today, to advance in moral progress?
