
Mutant algorithms, roadmaps and reports: getting real with public sector data

The CDEI has published ‘new analysis on the use of data in local government during the COVID-19 crisis’ (the Report), and its discussion of data shares some similarities with what the Office for AI roadmap (the Roadmap) said in January on machine learning.

A notable feature is that the CDEI work includes a public poll. Nearly a quarter of the 2,000 adults polled said that the most important thing for them, in order to trust their council’s use of data, would be “a guarantee that information is anonymised before being shared, so your data can’t be linked back to you.”

Both the Report and the Roadmap shy away in their conclusions from that problematic gap between public expectations and the reality of data used at scale in public service provision, especially in identifying vulnerability and in risk prediction.

Both seek to provide vision and aims around the future development of data governance in the UK.

The fact is that everyone must take off their rose-tinted spectacles on data governance to accept this gap, and get the basics fixed in existing practice to address it. In fact, as academic Michael Veale wrote, often the public sector is looking for the wrong solution entirely: “The focus should be on taking off the ‘tech goggles’ to identify problems, challenges and needs, and to not be afraid to discover that other policy options are superior to a technology investment.”

But used as it is today, public sector procurement and use of big data at scale, whether in AI and Machine Learning or other systems, require significant changes in approach.

The CDEI poll asked, “If an organisation is using an algorithmic tool to make decisions, what do you think are the most important safeguards that they should put in place?” 68% rated “that humans have a key role in overseeing the decision-making process, for example reviewing automated decisions and making the final decision” in their top three safeguards.

So what is this post about? Why our arm’s length bodies and various organisations’ work on data strategy are hindering the attainment of the goals they claim to promote, and what needs fixed to get back on track. Accountability.

Framing the future governance of data

On Data Infrastructure and Public Trust, the AI Council Roadmap stated an ambition to, “Lead the development of data governance options and its uses. The UK should lead in developing appropriate standards to frame the future governance of data.”

To suggest that we not only should be a world leader, but to imagine that we have the capability to be one, suggests a disconnect from current reality; none of this was mentioned in the Roadmap, though it is drawn out a little more in the CDEI Report from local authority workshops.

When it comes to data policy and Artificial Intelligence (AI) or Machine Learning (ML) based on data processing, and therefore dependent on its infrastructure, suggesting we should lead on data governance as if it were separate from the existing standards and frameworks set out in law would be disastrous for the UK and the businesses in it. Exports need to meet standards in the receiving countries. You cannot just ‘choose your own adventure’ here.

The CDEI Report says both that participants in their workshops found a lack of legal clarity “in the collection and use of data” and that “Participants finished the Forum by discussing ways of overcoming the barriers to effective and ethical data use.”

Lack of understanding of the law is a lack of competence and capability that I have seen and heard time and again over the last five years, in participants at workshops, events and webinars, some of whom are in charge of deciding what tools are procured and how public policy is implemented using administrative data. The law on data processing is accessible and generally straightforward.

If your work involves “overcoming barriers”, then either there is not the competence to understand what is lawful and to proceed with confidence using data protections appropriately, or you are trying to avoid doing so. Neither is a good place for public authorities to be in, and it bodes badly for the safe, fair, transparent and lawful use of our personal data by them.

But it is also the lack of data infrastructure that widens the skills gap and increases the need to know what is lawful or not, because if your data is held in an “excessive use of excel spreadsheets” then ‘sharing’ means making decisions about distributing copies of the data. Data access can be controlled far more easily through role-based access models, which make it clear when someone is working outside their assigned security role, and which create an audit trail of access. You reduce risk by distributing access, not distributing data.
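As a minimal sketch of that distinction (assuming a hypothetical role-to-permission mapping and an in-memory audit log, neither drawn from any real council system), a role-based access check might look something like this. Even a refused request leaves a trace, which an emailed spreadsheet never does:

from datetime import datetime, timezone

# Hypothetical roles and permissions; a real scheme would be richer and held centrally.
ROLE_PERMISSIONS = {
    "social_worker": {"read_case_record"},
    "team_manager": {"read_case_record", "approve_referral"},
    "analyst": {"read_aggregate_stats"},
}

AUDIT_LOG = []  # in practice an append-only audit store, not an in-memory list


def request_access(user: str, role: str, action: str, record_id: str) -> bool:
    """Grant or refuse an action based on the user's role, and log every attempt."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    AUDIT_LOG.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "record": record_id,
        "allowed": allowed,
    })
    return allowed


# An analyst asking for an identifiable case record is refused, and the refusal is logged.
request_access("jsmith", "analyst", "read_case_record", "pupil-0042")
print(AUDIT_LOG[-1])

The point is not the particular code, but that access is checked and recorded at the point of use, whereas a distributed spreadsheet carries no such control once it has left the building.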

The CDEI Report quotes as a ‘concern’ that data access granted under emergency powers in the pandemic will be taken away. This is a mistaken view that should be challenged. That access was *always* conditional and time limited. It is not something that will be ‘taken away’ but an exceptional use only granted because it was temporary, for exceptional purposes in exceptional times. Had it not been time limited, you wouldn’t have had access. Emergency powers in law are not ‘taken away’, but can only be granted at all in an emergency. So let’s not get caught up in artificial imaginings of what could change and what ifs, but change what we know is necessary.

We would do well to get away from the hyperbole of being world-leading, and aim for a minimum high standard of competence and capability in all staff who have any data decision-making roles and invest in the basic data infrastructure they need to do a good job.

Appropriate standards to frame the future governance of data

The AI Council Roadmap suggested that, “The UK should lead in developing appropriate standards to frame the future governance of data.”  Let’s stop and really think for a minute, what did the Roadmap writers think they meant by that?

Because we have law that frames ‘appropriate standards’. The UK government just seems unable or unwilling to meet it. And not only in these examples; in fact, I’d challenge all the business owners on the AI Council to prove their own products meet it.

You could start with the Guidelines on Automated individual decision-making and Profiling for the purposes of Regulation 2016/679 (wp251rev.01). Or consider any of the policy recommendations, declarations, guidelines and other legal instruments issued by Council of Europe bodies or committees on artificial intelligence. Or, valuable for export standards, ensure respect for the Convention 108 standards, to which we are a State party among more than 50 signatory countries, and growing. That’s all before the simplicity of the UK Data Protection Act 2018 and the GDPR.

You could start with auditing current practice for lawfulness. The CDEI Report says, “The CDEI is now working in partnership with local authorities, including Bristol City Council, to help them maximise the benefits of data and data-driven technologies.” I might suggest that includes a good legal team, as I think the Council needs one.

The UK is already involved in supporting the development of guidelines (as I was, alongside UK government representatives and the data regulator, the ICO, among hundreds of participants drawing up the Convention 108 Guidelines on data processing in education), but to suggest as a nation state that we have the authority to speak on the future governance of data, without acknowledging what we should already be doing and where we get it wrong, is an odd place to start.

The current state of reality in various sectors

Take for example the ICO audit of the Department for Education.

Failures to meet basic principles of data protection law include not knowing what data they hold, lacking appropriate controls on distribution, and failing fair processing (telling people that you process their data). This is no small stuff. And these are only highlights from the eight-page summary.

The DfE don’t adequately understand what data they hold and not having a record of processing leads to a direct breach of #GDPR. Did you know the Department is not able to tell you to which third parties your own or your child’s sensitive, identifying personal data (from over 21m records) was sent, among 1000s of releases?

The approach on data releases has been to find a way to fit the law to suit data requests, rather than to assess whether data distribution should be approved at all. The ICO assessment covered only 400 applications; closer to 2,000 have been approved since 2012. One refusal was to the US; another, to the MOD.


For too long, the DfE’s ‘internal cultural barriers and attitudes’ have meant it hasn’t cared about your rights and freedoms, or about meeting its lawful obligations. This is a national government Department in charge of over fifty such mega-databases, of which the NPD is only one. This is a systemic and structural set of problems, the direct result of Ministerial decisions that changed the law in 2012 to give away our personal data from state education. It was a choice made not to tell the people whom the data were about. This continues to be in breach of the law. And the same is true across many government departments.

Why does it even matter, some still ask? Because there is harm to people today. There is harm in history that must not be possible to repeat. And some of the data held could be used in dangerous ways.

You only need to glance at other applications in government departments and public services to see bad policy, bad data and bad AI or machine learning outcomes. All of those lead to breakdowns in trust and relations between people and the systems meant to support them, which in turn lead to bad data, and bad policy.

Unless government changes its approach, the direction of travel is towards less trust. In public health, for example, we see the consequences in disastrous responses, from people not attending for vaccination because of mistrust over proven data sharing, to COVID conspiracy theories.

Commercial reuse of public admin data is a huge mistake and the direction of travel is damaging.

“Survey responses collected from more than 3,000 people across the UK and US show that in late 2018, some 95% of people were not willing to share their medical data with commercial industries. This contrasts with a Wellcome study conducted in 2016 which found that half of UK respondents were willing to do so.” (July 2020, Imperial College)

Mutant algorithms

Summer 2020 first saw no human accountability for grades “derailed by a mutant algorithm”, and then the resignation of two Ofqual executives. What aspects of those data governance failures will be addressed this year? Where’s the *fairness*? There is a legal duty to tell people how and what data is used, especially in its automated aspects.

Misplaced data and misplaced policy aims

In June 2020 the DWP argued in a court case that “to change the way the benefit’s online computer calculation system worked in line with the original court ruling would undermine the principle of universal credit”. Not only does it fail its public interest purpose and do harm, it is also lax on its own data governance controls. World-leading is far, far, far away.

Entrenched racism

In August 2020, “The Home Office [has] agreed to stop using a computer algorithm to help decide visa applications after allegations that it contained ‘entrenched racism’”. How did it ever get approved for use?

That entrenched racism is found in policing too. The Gangs Matrix’s use of data required an Enforcement Notice from the ICO, and how it continues to operate at all, given its recognised discrimination and harm to young lives, is shocking.

Policy makers seem fixated on quick fixes that for the most part exist only in the marketing speak of the sellers of the products, while ignoring real problems in ethics and law, and denying harm.

“Now is a good time to stop.”

The most obvious case for me, where the Office for AI should step in, and where the CDEI Report from workshops with Local Authorities was most glaringly remiss, is where there is evidence of failure of efficacy and proven risk of danger to life through the procurement of technology in public policy. Don’t forget to ask what doesn’t work.

In January 2020, researchers at The Turing Institute, the Rees Centre and the What Works Centre published a report on ethics in Machine Learning in Children’s Social Care (CSC), raising the “dangerous blind spots” and “lurking biases” in the application of machine learning in UK children’s social care, which is totally unsuitable for life and death situations. Its later evidence showed models that do not work, or would not reach the threshold they had set for defining ‘success’.

Of the thirty-four councils that said, in the December 2020 Local Government survey, that they had acute difficulties in recruiting children’s social workers, 50 per cent said they had both difficulty recruiting generally and difficulty recruiting the required expertise, experience or qualifications. Can staff in such challenging circumstances really have the capacity to understand the limitations of developing technology, on top of their everyday expertise?

And when it comes to focusing on the data, there are problems too. By focusing on the data held, and using only that rather than on-the-ground expertise to make policy decisions, we end up in situations where only “those who get measured, get helped”.

As Michael Sanders wrote on CSC, “Now is a good time to stop. With the global coronavirus pandemic, everything has been changed, all our data scrambled to the point of uselessness in any case.”

There is no short cut

If the Office for AI Roadmap is to be taken seriously outside its own bubble, the board needs to be, and be seen to be, independent of government. It must engage with the reality of applied AI in practice in public services, and get the basics fixed first. Otherwise all its talk of “doubling down” and suggesting that the UK government can build public trust and position the UK as a ‘global leader’ on Data Governance is misleading and a waste of everyone’s time and capacity.

I appreciate that it says, “This Roadmap and its recommendations reflects the views of the Council as well as 100+ additional experts.” All of whom I imagine are more expert than me. If so, which of them is working on fixing the basic underlying problems with data governance within public sector data, how and by when? If they are not, why are they not, and who is?

The CDEI report published today identified in local authorities that, “public consultation can be a ‘nice to have’, as it often involves significant costs where budgets are already limited.” If the CDEI does not say that position is flawed, it may as well pack up and go home. On page 27 it reports, “When asked about their understanding of how their local council is currently using personal data and presented with a list of possible uses, 39% of respondents reported that they do not know how their personal data is being used.” The CDEI should be flagging this with a great big red pen as an indicator of unlawful practice.

The CDEI Report also draws on the GDS Ethical Framework, but that will be forever flawed as long as its own users, not the used, are the focus of its leading principle and underpinning aims. It starts with “Define and understand public benefit and user need.” There’s very little about ethics; it’s much more about “justifying our project”.

The Report did not appear to have asked the attendees what impact they think their processes have on everyday lives, and social justice.

Without fixes in these approaches, we will never be world-leading; we will lag behind, because we haven’t built the safe infrastructure necessitated by our vast public administrative data troves. We must end bad data practice, which includes getting right the basic principles of retention, data minimisation and security (all of which would be helped if we started by reducing those ‘vast public administrative data troves’, much of which ranges from poor to abysmal data quality anyway). Start proper governance and oversight procedures. And put in place all the communication channels, tools, policy and training needed to tell people how data are used and to make fair processing happen. It is not a ‘nice to have’; it is required by data processing laws around the world.

Any genuine “barriers” to data use in data protection law are designed as protections for people: the people whom the public sector, its staff and these arm’s length bodies are supposed to serve.

Blaming algorithms, blaming lack of clarity in the law, blaming “barriers” is avoidance of one thing: accountability. Accountability for bad policy, bad data and bad applications of tools is a human responsibility. The systems you apply to human lives affect people, sometimes forever and in the most harmful ways.

What would I love to see led from any of these arms length bodies?

  1. An audit of existing public admin data held by national and local government, and consistent published registers of the databases and algorithms / AI / ML currently in use (a minimal sketch of what one register entry might contain follows this list).
  2. Expose where your data system is nothing more than excel spreadsheets and demand better infrastructure.
  3. Identify the lawful basis for each set of data processes, their earliest records dates and content.
  4. Publish that resulting ROPA and the retention schedule.
  5. Assign accountable owners to databases, tools and the registers.
  6. Sort out how you will communicate with people whose data you currently process unlawfully, so as to meet the law, or stop processing it.
  7. And above all, publish a timeline for data quality processes and show that you understand how the degradation of data accuracy, quality, and storage limitations all affect the rights and responsibilities in law that change over time, as a result.
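To make items 1, 3, 4 and 5 concrete, here is a purely illustrative sketch of what a single published register entry might contain. The field names are my own assumptions, not an official ROPA template, and the values are invented:

from dataclasses import dataclass, asdict
from datetime import date
import json

# Illustrative only: the fields are assumptions, not drawn from any official ROPA format.
@dataclass
class RegisterEntry:
    database_name: str             # item 1: which database is held
    algorithms_in_use: list[str]   # item 1: any AI / ML applied to it
    lawful_basis: str              # item 3: lawful basis for the processing
    earliest_record: date          # item 3: how far back the records go
    retention_period: str          # item 4: the published retention schedule
    accountable_owner: str         # item 5: who answers for it

entry = RegisterEntry(
    database_name="Example pupil attendance dataset",
    algorithms_in_use=["none"],
    lawful_basis="Public task (UK GDPR Article 6(1)(e))",
    earliest_record=date(1996, 9, 1),
    retention_period="Reviewed and minimised after six years",
    accountable_owner="Named senior information risk owner",
)

# Publishing the register could be as simple as a machine-readable file on the website.
print(json.dumps(asdict(entry), default=str, indent=2))

Whether it is a spreadsheet, a dataclass or a database table matters far less than that such a register exists, is published, and has a named owner.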

There is no short cut to doing a good job; there is only a short cut to a bad one.

If organisations and bodies are serious about “good data” use in the UK, they must stop passing the buck and spreading the hype. Let’s get on with what needs fixed.

In the words of Gavin Freeguard, then let’s see how it goes.

The National Pupil Database end of year report: D for transparency, C minus in security.

Transparency and oversight of how things are administered are simple ways that the public can both understand and trust that things run as we expect.

For the National Pupil Database, parents might be surprised, as I was, about some of the current practices.

The scope of use of, and who could access, the National Pupil Database was changed in 2012, and although I had three children at school at that time, I heard nothing about it, nor did I read about it in the papers. (Hah, time to read the papers?) So I absolutely agree with Owen Boswara’s post when he wrote:

“There appears to have been no concerted effort to bring the consultation or the NPD initiative to the attention of parents or pupils (i.e. the data subjects themselves). This is a quote from one of the parents who did respond:

“I am shocked and appalled that I wasn’t notified about this consultation through my child’s school – I read about it on Twitter of all things. A letter should have gone to every single parent explaining the proposals and how to respond to this consultation.”

(Now imagine that sentiment amplified via Mumsnet …)”
[July 2013, blog by O. Boswara]

As Owen wrote,  imagine that sentiment amplified via Mumsnet indeed.

Here’s where third parties can apply and here’s a list of who has been given data from the National Pupil Database . (It’s only been updated twice in 18 months. The most recent of which has been since I’ve asked about it, in .) The tier groups 1-4 are explained here on p.18, where 1 is the most sensitive identifiable classification.

The consultation suggested in 2012 that the changes could be an “effective engine of economic growth, social wellbeing, political accountability and public service improvement”.

Has it been measured at all whether the justification given has begun to be achieved? Research can often take a long time, and implementing any changes as a result, more time. But perhaps some measure of public benefit has already begun to accrue?

The release panel would, one hopes, have begun to track this. [Update: the DfE confirmed on August 20th that they do not track benefits, nor have they ever done any audit of recipients.]

And in parallel, what oversight provides the checks and balances to make sure that the drive for the ‘engine of economic growth’ remembers to treat these data as knowledge about our children?

Is there that level of oversight from application to benefits measurement?

Is there adequate assessment of privacy impact and ethics in applications?

What troubles me about the National Pupil Database is not the data it contains per se, but the lack of child and guardian involvement, the lack of accountable oversight of how it is managed, and the lack of full transparency around who uses it and its processes.

Some practical steps forward

Steps taken now could resolve some of these issues and avoid the risk of them becoming future issues of concern.

The first is thorough fair processing, as I covered in my previous post.

The submission of the school census returns, including a set of named pupil records, has been a statutory requirement on schools since the Education Act 1996. That’s almost twenty years ago in the pre-mainstream internet age.

The Department must now shape up its current governance practices in its capacity as the data processor and controller of the National Pupil Database, to be fit for the 21st century.

Ignoring current weaknesses actively accepts an ever-increasing reputational risk for the Department, for schools, for other data-sharing bodies and those who link to the data, and for its bona fide research users. If people lose trust in data uses, they won’t share at all and the quality of data will suffer: bad for the functional administration of the state and for the individual, but also for the public good.

That concerns me also wearing my hat as a lay member on the ADRN panel, because it’s important that the public trusts that our data is looked after wisely, so that research can continue to use it for advances in health, social science and all sorts of areas of knowledge, to improve our understanding of society and make it better.

Who decides who gets my kids’ data, even if I can’t?

A Data Management Advisory Panel (DMAP) considers only some of the applications: tier 1 data requests. Those are the most sensitive, but not the only, applications for access to sensitive data.

“When you make a request for NPD data it will be considered for approval by the Education Data Division (EDD) with the exception of tier 1 data requests, which will be assessed by the department’s Data Management Advisory Panel. The EDD will inform you of the outcome of the decision.”

Where is governance transparency?

What is the make-up of both the Data Management Advisory Panel and the Education Data Division (EDD)? Who sits on them and how are they selected? Do they document their conflicts of interest for each application? For how long are they appointed and under what selection criteria?

Where is decision outcome transparency?

The outcome of each decision should be documented and published. However, the list has been updated only twice since its inception in 2012. Once was in December 2013, and the most recent was, ahem, on May 18 2015, after considerable prodding. There should be a regular timetable, with a responsible owner and a depth of insight into its decision-making.

Where is transparency over decision making to approve or reject requests?

Do privacy impact assessments and ethics reviews play any role in the application process, and if so, how are they assessed and by whom?

How are those sensitive and confidential data stored and governed?

The weakest link in any system is often said to be human error. Users of the NPD data vary from other government departments to “Mom and Pop” small home businesses, selling schools’ business intelligence and benchmarking.

So how secure are our children’s data really, and once the data have left the Department’s database, how are they treated? Does lots of form-filling, and data emailed with a personal password, ensure good practice, or simply provide barriers that slow down the legitimate application process?

What happens to data that are no longer required for the given project? Are they properly deleted and what audits have ever been carried out to ensure that?

The National Pupil Database end of year report: a C- in security

The volume of data that can be processed now at speed is incomparable with 1996, and even 2012 when the current processes were set up. The opportunities and risks in cyber security have also moved on.

Surely the Department for Education should take its responsibility seriously, and treat our children’s personal data and sensitive records as well as the HSCIC now intends to manage health data?

Processing administrative or linked data in an environment with layered physical security (e.g. a secure perimeter, CCTV, security guarding or a locked room without remote connection such as internet access) is good practice. It reduces the risk of silly human error, or simple theft.

Is giving out chunks of raw data by email, with reams of paperwork as its approval ‘safeguards’ really fit for the 21st century and beyond?


Twenty years on from the conception of the National Pupil Database, it is time to treat the personal data of our future adult citizens with the respect it deserves and we expect of best-in-class data management.

It should be as safe and secure as we treat other sensitive government data, and lessons could be learned from the FARR, ADRN and HSCIC safe settings.

Back to school – more securely, with public understanding and transparency

Understanding how that all works, how technology and people, data sharing and privacy, data security and trust all tie together, is fundamental to understanding the internet. When administrations take our data, they take on responsibilities for some of our participation in the dot.everyone that the state is so keen for us all to take part in. Many of our kids will live in the world which is the internet of things. Not getting that is to not understand the internet.

And to reiterate some of why that matters, I go back to my previous post, in which I quoted Martha Lane Fox recently and the late Aaron Swartz, who said: “It’s not OK not to understand the internet, anymore”.

While the Department for Education has turned down my subject access request to find out what the National Pupil Database stores on my own children, it matters too much to brush the issues aside as only important for me. About 700,000 children are born each year and will be added to this database every academic year. None ever get deleted.

Parents can, and must, ask that it is delivered to the highest standards of fair processing, transparency, oversight and security. I’m certainly going to.

It’s going to be Back to School in September, and those annual privacy notices, all too soon.

*****

1. The National Pupil Database end of year report card

2. The National Pupil Database end of year report: an F in fair processing

3. The National Pupil Database end of year report: a D in transparency

References:

[1] The National Pupil Database user guide: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/261189/NPD_User_Guide.pdf

[2] Data tables to see the individual level data items stored and shared (by tabs on the bottom of the file) https://www.gov.uk/government/publications/national-pupil-database-user-guide-andsupporting-information

[3] The table to show who has bought or received data and for what purpose https://www.gov.uk/government/publications/national-pupil-database-requests-received

[4] Data Trust Deficit – from the RSS: http://www.statslife.org.uk/news/1672-new-rss-research-finds-data-trust-deficit-with-lessons-for-policymakers

[5] Talk by Phil Booth and Terri Dowty: http://www.infiniteideasmachine.com/2013/04/terris-and-my-talk-on-the-national-pupil-database-at-the-open-data-institute/

[6] Presentation given by Paul Sinclair of the Department for Education at the Workshop on Evaluating the Impact of Youth Programmes, 3rd June 2013

What is in the database?

The Schools Census dataset adds approximately eight million records incrementally every year (starting in 1996) and includes variables on the pupil’s home postcode, gender, age, ethnicity, special educational needs (SEN), free school meals eligibility, and schooling history. It covers pupils in state-funded primary, secondary, nursery and special schools and pupil referral units. Schools that are entirely privately funded are not included.

Pupils can be tracked across schools and can now be followed throughout their school careers. The dataset also provides a very rich set of data on school characteristics. There is further use through linking the data to other related datasets, such as those on higher education, neighbourhoods and teachers in schools.

Data stored include the full range of personal and sensitive data, from name, date of birth and address, through to SEN and disability needs. (Detail of content is here.) To see what is in it, download the excel sheet: NPD Requests.

 

The Department for Education has specific legal powers to collect pupil, child and workforce data held by schools, local authorities and awarding bodies under section 114 of the Education Act 2005, section 537A of the Education Act 1996, and section 83 of the Children Act 1989. The submission of the school census returns, including a set of named pupil records, is a statutory requirement on schools under Section 537A of the Education Act 1996.