Using government microdata to explore health

Type of case study: Research

Impact

The UK is fortunate in its wealth of available major cross-sectional surveys, with most government surveys available for secondary use since their inception. These surveys share some common features:

  • they are large microdata files containing many detailed variables, and need to be analysed within an appropriate statistical package
  • they form series of repeated cross-sections which enable comparisons over time for groups
  • they are nationally representative, although ‘nation’ may mean the United Kingdom as a whole, Great Britain, or constituent countries: England, Wales, Scotland or Northern Ireland
  • they are sample survey data, which may involve a degree of complexity, both in terms of their structure (many are household files, whereby data are collected for all household members) and sampling strategy
  • the data holdings and documentation are extensive; users are more likely to be overwhelmed than starved of detail

Archived datasets therefore offer a wealth of data for studying topics of contemporary social interest and concern.

What can a user do with the data?

Government survey datasets such as the Health Survey for England (HSE) and the General Household Survey (GHS) are well suited to particular research uses, including multivariate analysis, analyses that look within households, and analyses that look at change over time. Because they are microdata, they allow relationships between multiple individual characteristics to be examined. The depth of many questionnaires allows users to explore the validity of existing means of operationalising concepts, or to use new ones.

Primatesta et al. (2001), for example, use the HSE to explore the relationship between smoking and blood pressure. Adda and Cornaglia (2005) use saliva test data from the HSE to demonstrate that while cigarette consumption declines for some groups when tax is increased, the intensity with which the cigarette is smoked increases to compensate.

Household datasets like the GHS enable household members to be associated with each other by means of a household ID and inter-person relationship data. Jarvis (1996) has used this aspect of the GHS to look at the association between parenthood and smoking behaviour. Having controlled for a range of socio-economic factors, he finds that parents with dependent children are more likely than their childless peers to give up smoking. Sample size was increased by ‘pooling’ several consistent datasets together.
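The household linkage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`hh_id`, `person_id`, `age`, `smoker`) and the age limit of 16 are hypothetical, not actual GHS variables, and "lives with a child" is a cruder proxy than the inter-person relationship data a real analysis of parenthood would use.

```python
from collections import defaultdict

# Hypothetical person-level records; field names are illustrative, not GHS variables.
people = [
    {"hh_id": 1, "person_id": 1, "age": 34, "smoker": True},
    {"hh_id": 1, "person_id": 2, "age": 6, "smoker": False},
    {"hh_id": 2, "person_id": 1, "age": 41, "smoker": True},
    {"hh_id": 3, "person_id": 1, "age": 29, "smoker": False},
    {"hh_id": 3, "person_id": 2, "age": 2, "smoker": False},
]


def group_by_household(records):
    """Collect person records under their shared household ID."""
    households = defaultdict(list)
    for person in records:
        households[person["hh_id"]].append(person)
    return households


def flag_adults_living_with_children(records, child_age_limit=16):
    """Return adult records with a flag showing whether a child shares the household.

    The age limit is an assumed cut-off used only for this sketch.
    """
    households = group_by_household(records)
    flagged = []
    for person in records:
        if person["age"] < child_age_limit:
            continue  # keep adults only
        members = households[person["hh_id"]]
        has_child = any(m["age"] < child_age_limit for m in members)
        flagged.append({**person, "lives_with_child": has_child})
    return flagged


adults = flag_adults_living_with_children(people)
```

Each adult record now carries a household-level characteristic, so smoking behaviour can be compared between adults who do and do not live with children.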

A relatively high degree of consistency over time within survey series enables trends to be monitored. Researchers can produce their own summary statistics across time to generate their own time series (for example smoking by social class for men and women (Evandrou and Falkingham 2002)), or may pool data over time, to allow the data to be analysed not only by period but also by pseudo-cohort (e.g. Kemm (2001) combined data for the period 1974 to look at smoking by age and by birth cohort to find that smoking falls with age for all cohorts).
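A pseudo-cohort analysis of this kind rests on a simple derivation: year of birth is approximated as survey year minus age, and respondents from different survey years are then grouped by birth cohort. The sketch below assumes hypothetical field names and made-up records, uses ten-year bands purely for illustration, and ignores the survey weights that a real analysis would normally need.

```python
from collections import defaultdict

# Hypothetical pooled records from several survey years (invented for illustration).
pooled = [
    {"survey_year": 1974, "age": 24, "smoker": True},
    {"survey_year": 1974, "age": 44, "smoker": True},
    {"survey_year": 1984, "age": 34, "smoker": True},
    {"survey_year": 1984, "age": 34, "smoker": False},
    {"survey_year": 1994, "age": 44, "smoker": False},
    {"survey_year": 1994, "age": 64, "smoker": False},
]


def birth_cohort(record, width=10):
    """Approximate birth year as survey year minus age, then band it."""
    birth_year = record["survey_year"] - record["age"]
    return (birth_year // width) * width


def prevalence_by_cohort_and_age(records, age_width=10):
    """Unweighted share of smokers for each (birth cohort, age band) cell."""
    counts = defaultdict(lambda: [0, 0])  # key -> [smokers, respondents]
    for r in records:
        age_band = (r["age"] // age_width) * age_width
        key = (birth_cohort(r), age_band)
        counts[key][0] += int(r["smoker"])
        counts[key][1] += 1
    return {key: smokers / total for key, (smokers, total) in counts.items()}


prevalence = prevalence_by_cohort_and_age(pooled)
```

Reading along one cohort (for example all cells keyed `1950`) follows the same birth group as it ages across survey years, even though different individuals were interviewed each time.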

But how does a potential user locate, understand and use data such as these for a topic like smoking?

Finding data

Naïve users may simply start their search with a web search: a Google search on ‘health survey’ or ‘smoking survey’ will result in links to appropriate UK Data Service web pages within the first page or two of hits. From the UK Data Service home page users can readily access a range of tools to find data and/or information. These include:

  • a catalogue search tool (Discover), which enables users to perform keyword searches on the metadata about a survey, such as its abstract (with the ability to restrict results using the various facets and to order results by relevance or by date)
  • a web site search which allows users to locate information about surveys
  • a list of key data which allows users to quickly locate the best known datasets

Data search results link to the appropriate catalogue records, which provide summary information about the surveys and links to the full documentation, the data, and specialist support and/or registration facilities as appropriate. The data links cover downloads in standard formats such as SPSS and Stata for offline analysis, as well as the online data analysis tool, Nesstar.

Using data

If researchers wished to use the GHS to undertake an analysis of people who would like to give up smoking, they would need to know whether the dataset contains a sufficiently large number of people who smoke but would like to give up. The screenshot below shows the Nesstar distribution of the GHS ‘giveup’ variable in 2004/5; it gives the wording and applicability of the question as well as the distribution of the variable. Users can see that this dataset contains 2,438 individuals who smoke but would like to give up.
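The same check can be made offline once the data have been downloaded. The following is a minimal sketch, assuming a hypothetical coding in which the variable takes the values "yes" and "no" and `None` marks respondents to whom the question did not apply; the actual codes and labels should be taken from the survey documentation.

```python
from collections import Counter

# Hypothetical responses; None stands for "not applicable" (e.g. non-smokers).
giveup = ["yes", "no", "yes", None, None, "yes", "no", None]


def frequency_table(values):
    """Count and proportion of each answer among applicable respondents."""
    applicable = [v for v in values if v is not None]
    total = len(applicable)
    if total == 0:
        return {}
    counts = Counter(applicable)
    return {value: (n, n / total) for value, n in counts.items()}


def enough_cases(values, category, minimum):
    """Check whether a category has at least `minimum` respondents."""
    return sum(1 for v in values if v == category) >= minimum


table = frequency_table(giveup)
```

The threshold passed to `enough_cases` is a judgement for the researcher; it depends on how finely the subgroup will be broken down in later analysis.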

See Analysing health data for more information and movie tutorials on how to analyse health data using Nesstar.

For further information or to access the datasets referred to in this case study see: