Big Data Network Support
About the project
The public and commercial sectors are continuously generating large amounts of data that can provide a powerful discovery tool for researchers, enabling them to gain valuable insight.
However, the arrival of ‘big data’ has changed social scientists’ expectations and brought technological and infrastructure challenges for data services and repositories, in terms of the volume and complexity of the data and the rights attached to them. The UK Data Service now needs to ingest sizeable streams of real-time data and to enable exploration and linkage of a variety of data assets. We have been operating since 1967, when ‘data storage technology’ meant punch cards. Since then we have adapted to magnetic tape, floppy disks and modern online databases, and our users have moved from examining printed statistical tables to downloading survey data files, and then to exploring and visualising data over the web.
These new challenges mean that researchers need support, through access to data and capacity building, to make the most of these data for knowledge exchange and impact. Repositories, meanwhile, need to review their existing architecture and infrastructure so that they are ready for changes in the data landscape.
A dedicated Big Data Network Support team
As part of Phase 2 of the Economic and Social Research Council’s (ESRC) Big Data Network (BDN2), the UK Data Service established a dedicated Big Data Network Support (BDNS) team to advise the other Big Data Network Research Centres (the Urban Big Data Centre, the ESRC Business and Local Government Data Research Centre and the Consumer Data Research Centre) on data licensing, governance and trusted access. The team ran many training events and webinars, and developed user guides and capacity-building case studies.
Importantly, the BDNS award allowed the UK Data Service to research and develop an open-source solution for hosting big data. We focussed first on household energy research, developing blueprints and a test instance of an Apache Hadoop system for managing smart meter data, known as Data Service as a Platform (DSaaP). This platform is being used as the underlying technology for the Smart Energy Research Lab (SERL), a ground-breaking EPSRC-funded project led by UCL.
Objectives of Big Data Network Support
- Unify data discovery across the BDN2 data collections
- Coordinate user training and capacity building in big data analytics for researchers using the data
- Harmonise data acquisition, governance and access across the Data Research Centres
- Encourage the sharing of information and expertise across the Data Research Centres
- Provide tailored support for each of the Data Research Centres
- Support access to, and the use of, new and novel forms of data beyond the current BDN2 funding period
Further funding
- EPSRC for a Smart Energy Research Lab (SERL) with UCL Energy Institute and others
UK Data Service contribution
- Principal Investigator: Nathan Cunningham, UK Data Service
- Team: Darren Bell, Deirdre Lungley, Chris Park, Sarah King-Hele, Peter Smythe, Margerita Ceraolo, Hervé L’Hours, Louise Corti
- Funder: Economic and Social Research Council (ESRC)
- Dates: January 2014 – September 2017
Case studies and guides
Alongside our training and capacity building programme, our big data work produced some useful guides and case studies. Below are links to some of this work:
Technology
- Delving into data: building data infrastructure
- Scaling up: digital data services for the social sciences
- Data Service as a Platform
- Amazon Web Services: UK Data Service Case Study
Capacity Building
- Introductory big data training for social scientists
- Upskilling social scientists in big data: ‘Encounters’ summer schools
Guides
- Big data and data sharing: Ethical issues
- Social media research: A Guide to Ethics (Authors: Dr. Leanne Townsend and Prof. Claire Wallace, The University of Aberdeen)
- Preserving social media
- Preserving transactional data
- Using R to analyse key UK
- HiveQL example queries
- Loading data into Hadoop Distributed File System (HDFS)
- Obtaining and downloading the Hortonworks Data Platform (HDP) Sandbox
Research examples
- Research with household energy data at scale
- Using smart meter data to enable energy demand research
- Researching the thermal character of UK dwellings
- Matching satellite data to surveillance site data to investigate service delivery
- A model to estimate the diversity of domestic energy demand at high-resolution
- Measuring the impact of online shopping on high streets across England