Data for Policy: Ten takeaways from the conference

The knowledge and thinking on changing technology, the understanding of the computing experts and those familiar with data, must not stay within conference rooms and paywalls.

What role do data and policy play in a world of post-truth politics and press? How will young people become better informed for their future?

The data for policy conference this week, brought together some of the leading names in academia and a range of technologists, government representatives, people from the European Commission, and other global organisations, Think Tanks, civil society groups, companies, and individuals interested in data and statistics. Beyond the UK, speakers came from several other countries in Europe, from the US, South America and Australia.

The schedule was ambitious and wide-ranging in topics. There was brilliant thinking and applications of ideas. Theoretical and methodological discussions were outnumbered by the presentations that included practical applications or work in real-life scenarios using social science data, humanitarian data, urban planning, public population-wide administrative data from health, finance, documenting sexual violence and more. This was good.

We heard about lots of opportunities and applied projects where large datasets are being used to improve the world. But while I always come away from these events having learned something and encouraged to learn more about those I didn’t, I do wonder if the biggest challenges in data and policy aren’t still the simplest.

No matter how much information we have, we must use it wisely. I’ve captured ten takeaways of things I would like to see follow. This may not have been the forum for it.

Ten takeaways on Data-for-Policy

1. Getting beyond the Bubble

All this knowledge must reach beyond the bubble of academia, beyond a select few white-male-experts-in well off parts of the world, and get into the hands and heads of the many. Ways to do this must include reducing the cost or changing  pathways of academic print access. Event and conference fees are also a  barrier to many.

2. Context of accessibility and control

There is little discussion of the importance of context. The nuance of most of these subjects was too much for the length of the sessions but I didn’t hear any single session mention threats to data access and trust in data collection posed by surveillance or state censorship or restriction of access to data or information systems, or the editorial control of knowledge and news by Facebook and co. There was no discussion of the influence of machine manipulators, how bots change news or numbers and create fictitious followings.

Policy makers and public are influenced by the media, post-truth or not. Policy makers in the UK government recently wrote in response to challenge over a Statutory Instrument that if Mums-net wasn’t kicking up  a fuss then they believed the majority of the public were happy. How are policy makers being influenced by press or social media statistics without oversight or regulating for their accuracy?

Increasing data and technology literacy in policy makers, is going to go far beyond improving an understanding of data science.

3. Them and Us

I feel a growing disconnect between those ‘in the know’ and those in ‘the public’. Perhaps that is a side-effect of my own understanding growing about how policy is made, but it goes wider. Those who talked about ‘the public’ did so without mention that attendees are all part of that public. Big data, are often our data. We are the public.

Vast parts of the population feel left behind already by policy and government decision-making; divided by income, Internet access, housing, life opportunites, and the ability to realise our dreams.

How policy makers address this gulf in the short and long term both matter as a foundation for what data infrastructure we have access to, how well trusted it is, whose data are included and who is left out of access to the information or decision-making using it.

Researchers prevented from accessing data held by government departments, perhaps who fear it will be used to criticise rather than help improve policy of the day, may be limiting our true picture of some of this divide and its solutions.

Equally data that is used to implement top-down policy without public involvement, seems a shame to ignore public opinion. I would like to have asked, does GDS in its land survey work searching for free school sites include people surveys asking, do you want a free school in your area at all?

4. There is no neutral

Global trust in politics is in tatters. Trust in the media is as bad. Neither appear to be interested across the world in doing much to restore their integrity.

All the wisdom in the world could not convince a majority in the 23rd June referendum, that the UK should remain in the European Union. This unspoken context was perhaps an aside to most of the subjects of the conference which went beyond the UK,  but we cannot ignore that the UK is deep in political crisis in the world, and at home the Opposition seems to have gone into a tailspin.

What role do data and evidence have in post-truth politics?

It was clear in discussion, that if I mentioned technology and policy in a political context, eyes started to glaze over. Politics should not interfere with the public interest, but it does and cannot be ignored. In fact it is short term political terms and needs for long term vision that are perhaps most at-odds in making good data policy plans.

The concept of public good, is not uncomplicated. It is made more complex still if you factor in changes over time, and cannot ignore that Trump or Turkey are not fictitious backdrops considering who decides what the public good and policy priorities should be.

Researchers’ role in shaping public good is not only about being ethical in their own research, but having the vision to have safeguards in place for how the knowledge they create are used.

5. Ethics is our problem, but who has the solution?

While many speakers touched on the common themes of ethics and privacy in data collection and analytics, saying this is going to be one of our greatest challenges, few address how, and who is taking responsibility and accountability for making it happen in ways that are not left to big business and profit making decision-takers.

It appears from last year, that ethics played a more central role. A year later we now have two new ethical bodies in the UK, at the UK Statistics Authority and at the Turing Institute. How they will influence the wider ethics issues in data science remains to be seen.

Legislation and policy are not keeping pace with the purchasing power or potential of the big players, the Googles and Amazons and Microsofts, and a government that sees anything resulting in economic growth as good, is unlikely to be willing to regulate it.

How technology can be used and how it should be used still seems a far off debate that no one is willing to take on and hold policy makers to account for. Implementing legislation and policy underpinned with ethics must serve as a framework for giving individuals insight into how decisions about them were reached by machines, or the imbalance of power that commercial companies and state agencies have in our lives that comes from insights through privacy invasion.

6. Inclusion and bias

Clearly this is one event in a world of many events that address similar themes, but I do hope that the unequal balance in representation across the many diverse aspects of being human are being addressed elsewhere.  A wider audience must be inclusive. The talk by Jim Waldo on retaining data accuracy while preserving privacy was interesting as it showed how deidentified data can create bias in results if data is very different from the original. Gaps in data, especially using big population data which excludes certain communities, wasn’t something I heard discussed as much.

7.Commercial data sources

Government and governmental organisations appear to be starting to give significant weight to the use of commercial data and social media data sources. I guess any data seen as ‘freely available’ that can be mined seems valuable. I wonder however how this will shape the picture of our populations, with what measures of validity and  whether data are comparable and offer reproducability.

These questions will matter in shaping policy and what governments know about the public. And equally, they must consider those communities whether in the UK or in other countries, that are not represented in these datasets and how these bias decision-making.

8. Data is not a panacea for policy making

Overall my take away is the important role that data scientists have to remind policy makers that data is only information. Nothing new. We may be able to access different sources of data in different ways, and process it faster or differently from the past, but we cannot rely on data of itself to solve the universal problems of the human condition. Data must be of good integrity to be useful and valuable. Data must be only one part of the library of resources to be used in planning policy. The limitations of data must also be understood. The uncertainties and unknowns can be just as important as evidence.

9. Trust and transparency

Regulation and oversight matter but cannot be the only solutions offered to concerns about shaping what is possible to do versus what should be done. Talking about protecting trust is not enough. Organisations must become more trustworthy if trust levels are to change; through better privacy policies, through secure data portability and rights to revoke consent and delete outdated data.

10. Young people and involvement in their future

What inspired me most were the younger attendees presenting posters, especially the PhD student using data to provide evidence of sexual violence in El Salvador and their passion for improving lives.

We are still not talking about how to protect and promote privacy in the Internet of Things, where sensors on every street corner in Smart Cities gather data about where we have been, what we buy and who we are with. Even our children’s toys send data to others.

I’m still as determined to convince policy makers that young people’s data privacy and digital self-awareness must be prioritised.

Highlighting the policy and practice failings in the niche area of the National Pupil Database serves only to get ideas from others how  policy and practice could be better. 20 million school children’s records is not a bad place to start to make data practice better.

The questions that seem hardest to move forward are the simplest: how to involve everyone in what data and policy may bring for future and not leave out certain communities through carelessness.

If the public is not encouraged to understand how our own personal data are collected and used, how can we expect to grow great data scientists of the future? What uses of data put good uses at risk?

And we must make sure we don’t miss other things, while data takes up the time and focus of today’s policy makers and great minds alike.

cb-poster-for-web

care.data listening events and consultation: The same notes again?

If lots of things get said in a programme of events, and nothing is left around to read about it, did they happen?

The care.data programme 2014-15 listening exercise and action plan has become impossible to find online. That’s OK, you might think, the programme has been scrapped. Not quite.

You can give your views online until September 7th on the new consultation, “New data security standards and opt-out models for health and social care”  and/or attend the new listening events, September 26th in London, October 3rd in Southampton and October 10th in Leeds.

The Ministerial statement on July 6, announced that NHS England had taken the decision to close the care.data programme after the review of data security and consent by Dame Fiona Caldicott, the National Data Guardian for Health and Care.

But the same questions are being asked again around consent and use of your medical data, from primary and secondary care. What a very long questionnaire asks is in effect,  do you want to keep your medical history private? You can answer only Q 15 if you want.

Ambiguity again surrounds what constitutes “de-identified” patient information.

What is clear is that public voice seems to have been deleted or lost from the care.data programme along with the feedback and brand.

People spoke up in 2014, and acted. The opt out that 1 in 45 people chose between January and March 2014 was put into effect by the HSCIC in April 2016. Now it seems, that might be revoked.

We’ve been here before.  There is no way that primary care data can be extracted without consent without it causing further disruption and damage to public trust and public interest research.  The future plans for linkage between all primary care data and secondary data and genomics for secondary uses, is untenable without consent.

Upcoming events cost time and money and will almost certainly go over the same ground that hours and hours were spent on in 2014. However if they do achieve a meaningful response rate, then I hope the results will not be lost and will be combined with those already captured under the ‘care.data listening events’ responses.  Will they have any impact on what consent model there may be in future?

So what we gonna do? I don’t know, whatcha wanna do? Let’s do something.

Let’s have accredited access and security fixed. While there may now be a higher transparency and process around release, there are still problems about who gets data and what they do with it.

Let’s have clear future scope and control. There is still no plan to give the public rights to control or delete data if we change our minds who can have it or for what purposes. And that is very uncertain. After all, they might decide to privatise or outsource the whole thing as was planned for the CSUs. 

Let’s have answers to everything already asked but unknown. The questions in the previous Caldicott review have still to be answered.

We have the possibility to  see health data used wisely, safely, and with public trust. But we seem stuck with the same notes again. And the public seem to be the last to be invited to participate and views once gathered, seem to be disregarded. I hope to be proved wrong.

Might, perhaps, the consultation deliver the nuanced consent model discussed at public listening exercises that many asked for?

Will the care.data listening events feedback summary be found, and will its 2014 conclusions and the enacted opt out be ignored? Will the new listening event view make more difference than in 2014?

Is public engagement, engagement, if nobody hears what was said?