The Centre for Longitudinal Studies (CLS)
About the Centre for Longitudinal Studies and its cohort data
The Centre for Longitudinal Studies (CLS), based at University College London (UCL), is home to four national longitudinal cohort studies, which follow the lives of tens of thousands of people –
• 1958 National Child Development Study
• 1970 British Cohort Study
• Next Steps
• Millennium Cohort Study
Each study follows a large, nationally representative group of people born in a given year. The oldest study charts the lives of a group of Baby Boomers born in the late 1950s, while the youngest keeps up with a group born at the turn of the new century. These studies map cohort members' individual trajectories in areas such as education, employment and health, creating a unique resource for researchers and providing rich insights that have helped shape government policies.
Enabling secure onward sharing of linked NHS Digital data
In 2015, CLS and NHS Digital worked on a pioneering new agreement which enables secure onward sharing of linked NHS Digital and CLS study data via the UK Data Service. So far, health administrative records from Hospital Episode Statistics (HES) have been linked to all four of the CLS longitudinal studies, with three deposited with the UKDS and the fourth currently being prepared for deposit. These new data linkages are a major enhancement to the long-running birth cohort studies. Researchers are now able to analyse the administrative health data alongside the study data to answer many more questions about the factors that influence health and the way health interacts with other facets of people’s lives.
About the Hospital Episode Statistics
Hospital Episode Statistics (HES) is a database that contains information about hospital admissions, outpatient appointments and Accident and Emergency attendances in England. It comprises four datasets:
• Accident and Emergency episodes dataset (A&E): attendances at Accident and Emergency care facilities, 2007 to most recent.
• Admitted Patient Care episodes dataset (APC): admitted patient care episodes, 1997 to most recent.
• Adult Critical Care episodes dataset (CC): critical care episodes, 2009 to most recent.
• Outpatients episodes dataset (OP): outpatient attendances, 2003 to most recent.
The start years shown are the earliest years for which data were available from NHS Digital.
The data cover diverse topics, including diagnosis, maternity, mortality, mental health, types of therapy, length of treatment, Indices of Multiple Deprivation (IMD), service providers, organisations and regional geographical location.
More about the linked data
Data from the National Child Development Study (NCDS), 1970 British Cohort Study (BCS70) and Next Steps linked with Hospital Episode Statistics data are available via the UK Data Service SecureLab. The Millennium Cohort Study linked data are currently being prepared for deposit. Access to the data is governed by the Five Safes Framework. Accredited Researchers must complete the Safe Researcher Training and request the variables needed for their analyses. All projects must be approved by the data owner, and the research is carried out within the SecureLab environment, with outputs subject to disclosure control checks.
For further details about the available data, see –
CLS is in the process of refreshing these data to include post-2017 records.
Archiving challenges and solutions
Linking Hospital Episode Statistics data to these longitudinal data presents some challenges. Because HES data are not collected for research purposes, information can be missing or poorly completed, and it is not always clear what some codes mean or how they have changed over time. Health data therefore require extra quality checks and data cleaning, and may require additional information from the data provider. Before depositing any data with the UKDS, CLS routinely performs a number of disclosure control checks and treats the data accordingly. Health records pose specific challenges: for example, very specific health conditions, in combination with other variables, can increase the risk of re-identification. Some variables, such as those relating to the classification of diseases, were truncated to avoid the possibility of disclosure. Scarce data, for example relating to mental health or cancer, may both pose disclosure risks and yield too few cases for meaningful analysis, so they may be removed from the deposited data.
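The two treatments described above, truncating disease classification codes and removing scarce categories, can be illustrated with a minimal Python sketch. It is not the procedure CLS used: the three-character truncation length, the minimum count of 10, the record layout and the function names are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical threshold: the source does not state the minimum count applied.
MIN_CELL_COUNT = 10


def truncate_code(code: str, keep: int = 3) -> str:
    """Keep only the leading characters of a diagnosis code (e.g. 'F32.1' -> 'F32').

    The truncation length of three characters is an assumption.
    """
    return code.replace(".", "").strip().upper()[:keep]


def treat_for_disclosure(records, min_count=MIN_CELL_COUNT):
    """Truncate diagnosis codes, then drop records whose truncated code is rare."""
    truncated = [{**r, "diag": truncate_code(r["diag"])} for r in records]
    counts = Counter(r["diag"] for r in truncated)
    return [r for r in truncated if counts[r["diag"]] >= min_count]


# Illustrative records only: twelve episodes share one code, one episode is unique.
records = [{"id": i, "diag": "J45.0"} for i in range(12)]
records.append({"id": 99, "diag": "F32.1"})
treated = treat_for_disclosure(records)
```

In this sketch the twelve common episodes are kept with their code shortened, while the single rare episode is dropped, which mirrors the trade-off noted above between reducing disclosure risk and losing detail.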
This is the first time that data from NHS Digital linked to a survey have been made available to the research community through the UK Data Service, so onward sharing raised new challenges that called for new solutions. In addition to UKDS's standard access arrangements, NHS Digital had some specific additional requirements. CLS worked closely with NHS Digital to create a new data sharing agreement that researchers must sign with UCL (where CLS is based) when they apply to access the data via the UKDS. This data sharing agreement requires researchers to provide evidence that the research project benefits the health and social care system, which is an NHS Digital legal requirement for access to their data.
For data minimisation purposes, NHS Digital also requires researchers to list the health variables needed for their research project, and CLS created a spreadsheet of variables for this purpose.
Applications and approvals
Thanks to UKDS's flexible ordering system, the new CLS data sharing agreement and the spreadsheet of variables are both made available to researchers when they apply online to access the data. The application is first screened by a staff member at the UKDS and then forwarded to CLS for approval. Under the agreement with NHS Digital, CLS is responsible for approving project applications through its pre-existing Data Access Committee (DAC). The DAC assesses, among other things, whether the proposal meets all the NHS Digital requirements. CLS publishes the details of approved applications on the CLS website.
This linked dataset has been well received by the research community. For example, Mario Martinez Jimenez, a postgraduate researcher at Lancaster University and Imperial College Business School, has been using Next Steps: Linked Health Administrative Datasets (Hospital Episode Statistics), England, 1997-2017: Secure Access. His project focuses on the impact of economic shocks on health and health inequalities. Mario viewed the linked data as extremely valuable: he not only found all the information required for his research project, but the dataset also gave him opportunities to explore his area of research further. For prospective users, he commented:
“It is a great dataset, it combines information that any data survey can give us. Also, one of the cohorts focus on adolescence including the generation of the millennium which is very interesting generation to study on and extremely significant to researchers. The linkage is amazing as it not only gives information related to household including the parents and the children and the circumstances they are in, but also on the healthcare utilisation.”
The substantial additional work involved in data processing and project approvals for these data means that researchers benefit from enriched data for research purposes. A key consideration for data owners and data controllers for the onward sharing of their linked data is that they use a trusted secure research environment similar to the UKDS SecureLab and have robust data sharing agreements and access arrangements in place.
Follow this link to find the UK Data Service’s guidance on preparing data for deposit.