The words we use to define data
In the 2021 Defend Digital Me report, The Words We Use in Data Policy: Putting People Back in the Picture, we examined why public conversations about personal data often fail. We highlighted the need for systemic changes in how we talk about data to better account for children’s data within the UK’s national data strategy. A central issue is how we think about data, which is often framed through misleading metaphors. Metaphors like ‘flows,’ ‘footprints,’ or ‘traces’ influence public opinion and policy but oversimplify governance challenges. These framings profoundly affect views on what should be done with data. This matters now because the Data Use and Access Bill in Parliament seeks to rewrite UK data protection law, threatening to undermine public trust in administrative data just as AI companies and others lobby for increased access.
Data as language, not a commodity
But imagine instead that data is not a fixed entity or commodity; it is more akin to language telling the story of your life. Data, turned into information, conveys meaning, which varies by source, user, context, and time. Misinterpreting or ignoring these dimensions leads to poor governance and flawed decisions. Data’s characteristics and value are ephemeral and interpersonal. Like Dr Louise Banks in Arrival, policymakers must recognise that UK data governance requires a multidimensional approach to understanding what data is—not just substance, but traceability, context, and meaning across the data life cycle. We need to talk more about the dimension of time in data governance laws.
The time dimension in data governance
Time reshapes data governance because it affects the accuracy of data, its personal nature, and its relationships with users. Data may shift between personal and non-personal depending on context, use, and linkage over time.
- Personal Data Over Time
Data can simultaneously be personal and non-personal depending on who holds it and what it is combined with. What is in Dataset A may not identify an individual without access to Dataset B. If I hold only A while you hold both A and B, then it is personal data only to you. Over time, that may change: the data may become personal in my hands too, depending on its use, linkage, breadth of access, leaks, and more.
- Accuracy and Completeness
Data degrades over time. For instance, a “current address” loses accuracy when someone moves house. Changing systems can also undermine the comparability and completeness of past data, for example when a postcode update gives the same property a new postcode, or when new categories are introduced (e.g., adding “White Northern Irish” for a population that may previously have selected “White British” in a census). More importantly, how would you know, and how will AI systems know, if we have no context, no life-cycle record of processing activities (ROPA), and we give up on enforcing this?
- Children’s Data and Vulnerabilities
Special protections for what is labelled “children’s data” in law raise questions: Do these protections apply only at the time of collection, because the person the data is about is aged under 18, or do they persist as a characteristic of the data even after that person becomes an adult? The concept of a “clean slate,” as proposed by the High-Level Expert Group on AI (HLEG), goes some way to solving this issue. However, current practices fail to provide the safeguards that the original GDPR deemed necessary. The National Pupil Database is the prime case study of these failures over time.
- Evolving Definitions and Legal Changes
Policy shifts, such as the UK’s Data Use and Access Bill, can change how data is categorised and handled by recategorising it as of the law’s commencement date. Such changes affect the data’s characteristics and governance.
Why lifecycle governance matters
Data governance constrains the imbalance of power between the data subject and the data user, and it must reach beyond the lifetime of the data itself. European data protection laws, rooted in human rights principles, emphasise lifecycle governance. Concepts like data minimisation, retention limitation, and respect for data subject rights ensure that the relationship between individuals and data users remains dynamic and accountable.
The point of data collection is not to produce the KPI, or the report, or benchmark, or even to follow the money in delivery of a public service. The point is the delivery of a public service. Public administrative data collected on the side is a by-product of the process. Statistical data may follow standards and a review process. Much of the rest of public admin data may not. A return might suggest 100% completion, but that is no measure of accuracy. When public policy deifies “the product” of data as AI, we focus on the wrong end of the process. Data about public administrative services is a set of contextualised inputs, a dynamic and interpretive representation of public service delivery and the person’s life it involves, not fixed outputs with fixed characteristics or quality. The person must be kept in the picture in a continuous governance process. Engagement in public service delivery does not end when someone walks out the door if their data continues to be processed.
We must ensure that any public policy or AI that infers meaning is built only on data that are correct, and that the data are used only within the context in which the meaning intended at source remains valid over time.
This is a critical period in which AI companies and others are lobbying hard for more access. Ignoring the role of time in data governance avoids accountability for the problems of data quality and contextual collapse, but it will mean that datasets that are not fit for purpose become the foundations for public policy, or for building AI to use or to export. Carnegie UK’s research offers a sobering reminder: poorly designed systems can waste taxpayer money, erode public trust, and fail to deliver promised benefits.