The knowledge and thinking on changing technology, the understanding of the computing experts and those familiar with data, must not stay within conference rooms and behind paywalls.
What role do data and policy play in a world of post-truth politics and press? How will young people become better informed for their future?
The Data for Policy conference this week brought together some of the leading names in academia and a range of technologists, government representatives, people from the European Commission and other global organisations, think tanks, civil society groups, companies, and individuals interested in data and statistics. Beyond the UK, speakers came from several other countries in Europe, from the US, South America and Australia.
The schedule was ambitious and wide-ranging in topics. There was brilliant thinking and applications of ideas. Theoretical and methodological discussions were outnumbered by the presentations that included practical applications or work in real-life scenarios using social science data, humanitarian data, urban planning, public population-wide administrative data from health, finance, documenting sexual violence and more. This was good.
We heard about lots of opportunities and applied projects where large datasets are being used to improve the world. But while I always come away from these events having learned something, and encouraged to learn more about what I didn’t know, I do wonder if the biggest challenges in data and policy aren’t still the simplest.
No matter how much information we have, we must use it wisely. I’ve captured ten takeaways of things I would like to see follow. This may not have been the forum for it.
Ten takeaways on Data-for-Policy
1. Getting beyond the Bubble
All this knowledge must reach beyond the bubble of academia, beyond a select few white male experts in well-off parts of the world, and get into the hands and heads of the many. Ways to do this must include reducing the cost or changing pathways of academic print access. Event and conference fees are also a barrier to many.
2. Context of accessibility and control
There is little discussion of the importance of context. The nuance of most of these subjects was too much for the length of the sessions but I didn’t hear any single session mention threats to data access and trust in data collection posed by surveillance or state censorship or restriction of access to data or information systems, or the editorial control of knowledge and news by Facebook and co. There was no discussion of the influence of machine manipulators, how bots change news or numbers and create fictitious followings.
Policy makers and the public are influenced by the media, post-truth or not. Policy makers in the UK government recently wrote, in response to a challenge over a Statutory Instrument, that if Mumsnet wasn’t kicking up a fuss then they believed the majority of the public were happy. How are policy makers being influenced by press or social media statistics without oversight or regulation of their accuracy?
Increasing data and technology literacy in policy makers is going to go far beyond improving an understanding of data science.
3. Them and Us
I feel a growing disconnect between those ‘in the know’ and those in ‘the public’. Perhaps that is a side-effect of my own understanding growing about how policy is made, but it goes wider. Those who talked about ‘the public’ did so without mentioning that attendees are all part of that public. Big data are often our data. We are the public.
Vast parts of the population feel left behind already by policy and government decision-making; divided by income, Internet access, housing, life opportunities, and the ability to realise our dreams.
How policy makers address this gulf, in both the short and the long term, matters as a foundation for what data infrastructure we have access to, how well trusted it is, whose data are included and who is left out of access to the information or decision-making using it.
Preventing researchers from accessing data held by government departments, perhaps because those departments fear it will be used to criticise rather than help improve the policy of the day, may be limiting our true picture of some of this divide and its solutions.
Equally, it seems a shame when data is used to implement top-down policy without public involvement, ignoring public opinion. I would like to have asked, does GDS in its land survey work searching for free school sites include people surveys asking, do you want a free school in your area at all?
4. There is no neutral
All the wisdom in the world could not convince a majority in the 23rd June referendum that the UK should remain in the European Union. This unspoken context was perhaps an aside to most of the subjects of the conference, which went beyond the UK, but we cannot ignore that the UK is deep in political crisis on the world stage, and at home the Opposition seems to have gone into a tailspin.
What role do data and evidence have in post-truth politics?
It was clear in discussion that if I mentioned technology and policy in a political context, eyes started to glaze over. Politics should not interfere with the public interest, but it does and cannot be ignored. In fact it is short-term political terms and the need for long-term vision that are perhaps most at odds in making good data policy plans.
The concept of public good is not uncomplicated. It is made more complex still if you factor in changes over time, and we cannot ignore that Trump or Turkey are not fictitious backdrops when considering who decides what the public good and policy priorities should be.
Researchers’ role in shaping public good is not only about being ethical in their own research, but having the vision to have safeguards in place for how the knowledge they create is used.
5. Ethics is our problem, but who has the solution?
While many speakers touched on the common themes of ethics and privacy in data collection and analytics, saying this is going to be one of our greatest challenges, few addressed how, and who is taking responsibility and accountability for making it happen in ways that are not left to big business and profit-making decision-takers.
It appears that last year ethics played a more central role. A year later we now have two new ethical bodies in the UK, at the UK Statistics Authority and at the Turing Institute. How they will influence the wider ethics issues in data science remains to be seen.
Legislation and policy are not keeping pace with the purchasing power or potential of the big players, the Googles and Amazons and Microsofts, and a government that sees anything resulting in economic growth as good is unlikely to be willing to regulate it.
How technology can be used and how it should be used still seems a far-off debate that no one is willing to take on and hold policy makers to account for. Implementing legislation and policy underpinned with ethics must serve as a framework for giving individuals insight into how decisions about them were reached by machines, or into the imbalance of power that commercial companies and state agencies have in our lives that comes from insights through privacy invasion.
6. Inclusion and bias
Clearly this is one event in a world of many events that address similar themes, but I do hope that the unequal balance in representation across the many diverse aspects of being human is being addressed elsewhere. A wider audience must be inclusive. The talk by Jim Waldo on retaining data accuracy while preserving privacy was interesting as it showed how deidentified data can create bias in results if the data is very different from the original. Gaps in data, especially when using big population data which excludes certain communities, weren’t something I heard discussed as much.
7. Commercial data sources
Government and governmental organisations appear to be starting to give significant weight to the use of commercial data and social media data sources. I guess any data seen as ‘freely available’ that can be mined seems valuable. I wonder however how this will shape the picture of our populations, with what measures of validity, and whether data are comparable and offer reproducibility.
These questions will matter in shaping policy and what governments know about the public. And equally, they must consider those communities, whether in the UK or in other countries, that are not represented in these datasets, and how these gaps bias decision-making.
8. Data is not a panacea for policy making
Overall my takeaway is the important role that data scientists have in reminding policy makers that data is only information. Nothing new. We may be able to access different sources of data in different ways, and process it faster or differently from the past, but we cannot rely on data of itself to solve the universal problems of the human condition. Data must be of good integrity to be useful and valuable. Data must be only one part of the library of resources to be used in planning policy. The limitations of data must also be understood. The uncertainties and unknowns can be just as important as evidence.
9. Trust and transparency
Regulation and oversight matter but cannot be the only solutions offered to concerns about shaping what is possible to do versus what should be done. Talking about protecting trust is not enough. Organisations must become more trustworthy if trust levels are to change: through better privacy policies, through secure data portability, and through rights to revoke consent and delete outdated data.
10. Young people and involvement in their future
What inspired me most were the younger attendees presenting posters, especially the PhD student using data to provide evidence of sexual violence in El Salvador and their passion for improving lives.
We are still not talking about how to protect and promote privacy in the Internet of Things, where sensors on every street corner in Smart Cities gather data about where we have been, what we buy and who we are with. Even our children’s toys send data to others.
I’m still as determined to convince policy makers that young people’s data privacy and digital self-awareness must be prioritised.
Highlighting the policy and practice failings in the niche area of the National Pupil Database serves only to get ideas from others about how policy and practice could be better. 20 million school children’s records is not a bad place to start to make data practice better.
The questions that seem hardest to move forward are the simplest: how to involve everyone in what data and policy may bring for the future and not leave out certain communities through carelessness.
If the public is not encouraged to understand how our own personal data are collected and used, how can we expect to grow great data scientists of the future? What uses of data put good uses at risk?
And we must make sure we don’t miss other things, while data takes up the time and focus of today’s policy makers and great minds alike.