The Department for Education (DfE) and the UK Data Archive have strengthened their long-standing collaboration through a new framework agreement. Building on a history of successfully curating and providing access to DfE and DfE-linked data, this agreement streamlines the process for depositing and sharing new datasets via the UK Data Service.
The framework supports the timely release of DfE and DfE-linked research-ready data collections, offering a consistent, transparent, and auditable pathway for access. It also reinforces robust governance and trusted handling of sensitive data, including provision through controlled access routes where necessary.
We are pleased to highlight a selection of recent DfE datasets now available through the UK Data Service:
- Study 9428 – Longitudinal Study of Young Persons in England 2, 2013-2021
- Study 9464 – Children of the 2020s: Wave 1, 2021-2022
- Study 9493 – Further Education Workforce Census in England, 2021-2024
- Study 9505 – Longitudinal Education Outcomes (LEO) Synthetic Data
Work is underway to make the following datasets available:
- Childcare and early years survey of parents (CEYSP) 2024
- PIAAC 2023 – Survey of Adult Skills
Release of LEO synthetic data to aid researchers
A key addition to the UK Data Service data catalogue is a new synthetic data collection based on the Longitudinal Education Outcomes (LEO) study, created to help researchers explore the structure and potential of this rich data resource. LEO is a linked dataset that brings together information on education, employment, benefits, and earnings for around 39 million people in England. LEO data is available to accredited researchers via the Office for National Statistics’ Secure Research Service (ONS SRS).
LEO's scale and complexity make it an exceptionally powerful resource, but they can also make it challenging for researchers to understand its structure and prepare robust analyses. The synthetic version of LEO has been developed to give researchers a practical way to become familiar with the data before applying for access to the original. It mirrors the structure, variables, and format of the original data, while using privacy-preserving methods to ensure that no identifiable information is included.
The synthetic data has been created by UCL CEPEO researchers in partnership with the LEO programme team at the Department for Education, with funding from Administrative Data Research UK (ADR UK). It contains largely the same data tables, variables, format and structure as the real LEO data, but it is smaller and has been generated using column-by-column random sampling, with privacy-preserving methods applied.
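To illustrate what column-by-column random sampling means in general, here is a minimal Python sketch. It is not the CEPEO method or code. The function name `synthesise_columns`, the `min_count` threshold and the rare-value suppression step are hypothetical stand-ins for the privacy-preserving methods, which are not specified here. Each column is sampled independently from the values observed in the real table. This keeps each variable's format and roughly its distribution, while ensuring that no synthetic row is a copy of a real record.

```python
import random
from collections import Counter


def synthesise_columns(rows, n_out, min_count=5, seed=None):
    """Hypothetical sketch of column-by-column random sampling.

    rows: list of dicts sharing the same keys (the "real" table).
    n_out: number of synthetic rows to generate.
    min_count: values seen fewer than this many times in a column are
        dropped before sampling (an assumed rare-value suppression step,
        standing in for unspecified privacy-preserving methods).
    """
    if not rows:
        return []
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    pools = {}
    for col in columns:
        counts = Counter(row[col] for row in rows)
        # Keep only values common enough not to single anyone out;
        # duplicates are kept so retained values keep their frequencies.
        pool = [row[col] for row in rows if counts[row[col]] >= min_count]
        if not pool:
            raise ValueError(f"column {col!r} has no value with count >= {min_count}")
        pools[col] = pool
    # Each column is drawn independently of the others, so the links
    # between variables in any real record are broken.
    return [{col: rng.choice(pools[col]) for col in columns} for _ in range(n_out)]


# Usage: a tiny made-up table with two categorical columns.
regions = ["North"] * 6 + ["South"] * 3 + ["East"]
bands = ["low"] * 5 + ["high"] * 5
real = [{"region": r, "band": b} for r, b in zip(regions, bands)]
synthetic = synthesise_columns(real, n_out=100, min_count=3, seed=1)
# "East" appears only once, so it is suppressed and never sampled.
```

Because every column is drawn independently in a sketch like this, relationships between variables (for example, between qualifications and later earnings) are not preserved. Data of this kind therefore suits learning the structure of a dataset and preparing code, rather than producing substantive findings.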
To ensure its responsible and ethical use, the synthetic data has been released as safeguarded data and can be easily accessed via our data catalogue. Researchers can read more about the potential uses of LEO synthetic data in this accompanying blog.
This collaboration reflects a shared commitment to expand access to high-quality DfE data and linked resources while building a more accessible and responsive data infrastructure for the research community.