Computational social science
The changing data research landscape
New technologies, resources and methods are constantly changing how researchers interact with and use data. Many innovations, including those in modelling, simulation, big data, web-scraping, social media and more, have already made a huge impact on how researchers can and should access, manage and explore data.
The UK Data Service supports researchers and data analysts by providing learning resources relating to several innovative aspects of data-intensive social science research.
What is agent-based modelling?
Agent-based modelling refers to a class of computational models for simulating the actions and interactions of autonomous agents that can represent individuals or collective entities such as households, organisations or political entities.
Social science seeks to understand and predict patterns involving human behaviour, many of which are large-scale and complex. But social science explanations or predictions can be difficult to test and refine because of the serious ethical and practical barriers to controlling, manipulating and replicating conditions within experiments. For example, there are many theories behind some of the complex patterns of urban mobility, but when traffic calming measures fail to produce the desired results it can be difficult to identify why or how the situation can be improved.
One possible solution is to run social science experiments using computer-simulated actors whose features, behaviours and actions are informed by real world data. This allows social scientists to explore the factors needed to replicate observations, and to test and refine their understanding of how robust those factors might be. Computational social science experiments also allow researchers to explore how emergent patterns might change under experimental, or even counter-factual, conditions.
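As an illustration of the idea, here is a minimal sketch of an agent-based model in Python. It is not taken from the webinars, which use NetLogo. The rumour-spreading rule, the function name `run_rumour_model` and all parameter values are assumptions chosen only to show how a population-level pattern can emerge from simple interactions between individual agents.

```python
import random


def run_rumour_model(n_agents=100, p_share=0.5, n_steps=50, seed=42):
    """Return the number of informed agents after each step.

    Hypothetical toy model: agents meet at random and pass on a rumour.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    informed = [False] * n_agents
    informed[0] = True  # one agent starts out knowing the rumour
    history = []
    for _ in range(n_steps):
        # Each agent meets one randomly chosen agent per step.
        for agent in range(n_agents):
            other = rng.randrange(n_agents)
            if other == agent:
                continue  # an agent cannot meet itself
            # If exactly one of the pair is informed, the rumour is
            # passed on with probability p_share.
            if informed[agent] != informed[other] and rng.random() < p_share:
                informed[agent] = informed[other] = True
        history.append(sum(informed))
    return history


history = run_rumour_model()
print(history)
```

Re-running the function with a different `p_share` or `n_agents` is a simple way to explore how the emergent pattern changes under different, including counterfactual, conditions.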
Webinars on agent-based modelling
In January 2020, Dr Julia Kasmire at the UK Data Service hosted a series of three webinars that introduce users to the concept of agent-based modelling and go on to discuss some of the key techniques involved.
The webinars are published on the UK Data Service YouTube channel. Links to each webinar are listed below with a brief explanation.
Webinar: Introduction to agent-based modelling for social scientists
This webinar introduces the important concepts of emergent patterns, bottom-up processes, and other theoretical ideas underpinning agent-based modelling; presents several examples of agent-based models; discusses the pros and cons of agent-based models; and presents several software options for agent-based modelling, along with where to get more information.
- Materials associated with the Agent-based Modelling training series (GitHub)
- Webinar recording: Introduction to agent-based modelling for social scientists
- Presentation slides: Introduction to agent-based modelling for social scientists, Julia Kasmire (PDF)
- Download NetLogo
- Telephone Game model.
Webinar: Adding real world GIS and census data to agent-based modelling for social scientists
This webinar introduces the important concepts of downloading, cleaning and preparing shapefiles and other data files for importing into an existing agent-based model; presents an extensive exploration of how a commuting model differs when based on random data or imported real-world data; discusses some problems and limitations of using real-world data in agent-based models; and presents links so that users can access and use the model data presented in the webinar.
- Webinar recording: Adding real world GIS and census data to agent-based modelling for social scientists
- Presentation slides: Adding real-world GIS and census data to agent-based modelling for social scientists, Julia Kasmire (PDF)
- Download NetLogo
- Tram Commute model.
Webinar: Conducting experiments, recording output and analysing results of agent-based modelling for social scientists
This webinar covers how to conduct parameter sweeps for model testing in NetLogo; how to automate the process of computational experiments (also in NetLogo); and two different methods of exporting experimental data to saved files for further analysis. It also briefly shows what exported experimental data looks like and how it might be analysed to support experimental conclusions.
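The webinar works in NetLogo. The sketch below shows the same general workflow in Python rather than the webinar's own method: run a model once for every combination of parameter values, repeat each combination several times, and export one row per run as CSV. The `toy_model` function, the parameter names and the values swept are hypothetical stand-ins.

```python
import csv
import io
import itertools
import random

FIELDS = ["n_agents", "p_share", "repetition", "informed"]


def toy_model(n_agents, p_share, seed, n_steps=20):
    """Hypothetical stand-in model: count informed agents after n_steps."""
    rng = random.Random(seed)
    informed = 1
    for _ in range(n_steps):
        # The chance of hearing the rumour grows with the informed share.
        p_hear = p_share * informed / n_agents
        informed += sum(
            1 for _ in range(n_agents - informed) if rng.random() < p_hear
        )
    return informed


def run_sweep(agent_counts, share_probs, repetitions):
    """Run the model for every parameter combination and repetition."""
    rows = []
    for n_agents, p_share in itertools.product(agent_counts, share_probs):
        for rep in range(repetitions):
            rows.append({
                "n_agents": n_agents,
                "p_share": p_share,
                "repetition": rep,
                "informed": toy_model(n_agents, p_share, seed=rep),
            })
    return rows


def write_results(rows, handle):
    """Write one CSV row per model run to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = run_sweep([50, 100], [0.2, 0.8], repetitions=3)
buffer = io.StringIO()  # swap for open("results.csv", "w", newline="") to save a file
write_results(rows, buffer)
print(buffer.getvalue())
```

Keeping one row per run, with the parameter values recorded alongside the outcome, makes the exported file straightforward to load into a statistics package for further analysis.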
Managing large-scale data
Getting data, storing data, manipulating data
All projects need data – you can generate it yourself via surveys or you can get some from the UK Data Service.
If you didn’t generate it yourself, the chances are it is not quite what you wanted, but you can adapt it to your needs. You can adapt the data by a variety of means: cleaning the data, extracting parts of the data or joining data from different sources.
The UK Data Service organised a series of three webinars hosted by Peter Smyth from the Cathie Marsh Institute, which covered ways of dealing with these data issues. Below is a brief description of each webinar together with a link to the webinar recording and slides.
Webinar: Introduction to SQL and databases
For large datasets, e.g. millions of records, a dedicated database environment is the storage medium of choice. A database environment is designed to store data efficiently while at the same time making the retrieval of the data you want straightforward using simple-to-learn SQL queries.
- Webinar recording: Introduction to SQL and Databases
- Presentation slides: Introduction to SQL and Databases, Peter Smyth (PDF)
- Introduction to SQL Queries
- Introduction to SQL Queries (Python).
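To give a flavour of what a simple SQL query looks like, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. It is not taken from the webinar materials. The `respondents` table, its columns and the six rows of data are invented for illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE respondents (id INTEGER PRIMARY KEY, region TEXT, age INTEGER)"
)
# Hypothetical example records.
conn.executemany(
    "INSERT INTO respondents (region, age) VALUES (?, ?)",
    [
        ("North", 34),
        ("North", 51),
        ("South", 17),
        ("South", 45),
        ("South", 29),
        ("East", 62),
    ],
)

# Count adult respondents and their mean age in each region.
query = """
    SELECT region, COUNT(*) AS n, AVG(age) AS mean_age
    FROM respondents
    WHERE age >= ?
    GROUP BY region
    ORDER BY region
"""
results = conn.execute(query, (18,)).fetchall()
conn.close()

for region, n, mean_age in results:
    print(region, n, mean_age)
```

The same `SELECT ... WHERE ... GROUP BY` pattern applies whether the table holds six rows or several million. The database engine, not the analyst's code, takes care of retrieving the matching records efficiently.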
Webinar: Getting data from the Internet
This webinar illustrates techniques for automating the download of files from the Internet and for using APIs to download data, and demonstrates how you might store the data.
It also shows how you can systematically identify data on web pages, scrape it and store the data in a dataset.
- Webinar recording: Getting Data from the Internet
- Presentation slides: Getting Data from the Internet, Peter Smyth (PDF)
- Webinar questions and answers (PDF)
- Jupyter notebooks (Zip file).
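The scraping step can be sketched with Python's standard library alone. The example below is an illustration under stated assumptions rather than the code from the webinar notebooks. The HTML is a hard-coded, invented sample, whereas in practice the page text would first be fetched, for example with `urllib.request.urlopen`. The `TableScraper` class name is hypothetical.

```python
from html.parser import HTMLParser

# Invented sample page content; a real script would download this text.
SAMPLE_HTML = """
<table>
  <tr><th>Region</th><th>Population</th></tr>
  <tr><td>North</td><td>1200</td></tr>
  <tr><td>South</td><td>3400</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag in ("td", "th") and self._current_row is not None:
            self._in_cell = True
            self._current_row.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._current_row is not None:
            self.rows.append(self._current_row)
            self._current_row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self._current_row[-1] += data.strip()


scraper = TableScraper()
scraper.feed(SAMPLE_HTML)

# Treat the first row as column names and the rest as records.
header, *records = scraper.rows
dataset = [dict(zip(header, record)) for record in records]
print(dataset)
```

Scraped values arrive as text, so a numeric column such as `Population` would still need converting before analysis.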
Webinar: Power Pivot and dynamic arrays in Excel
Versions of Excel now include Power Pivot – an extension that allows users to load datasets with millions of rows and join datasets together in a more intuitive way. Although integrated into Excel, it is a distinct processing system with its own set of commands for processing data.
Online software for big data
Resources for researchers and data analysts
The UK Data Service has produced a number of guides for researchers and analysts who want to run large-scale data analytics across clustered computers, i.e. many computers working together.
Below, we have listed links to these guides, which are in PDF format, together with a brief description of each document.
Installing Spark on a Windows PC
Apache Spark is an open source parallel processing framework that enables users to run large-scale data analytics applications across clustered computers. Installing Spark on a Windows PC guide (PDF).
Obtaining and downloading the HDP Sandbox
Hortonworks is a commercial company which specialises in data platforms based on open source software for big data, in particular Hadoop. HDP is an acronym for the Hortonworks Data Platform, which is an implementation of a Hadoop cluster and a range of associated big data products which run in the Hadoop environment. Obtaining and downloading the HDP Sandbox guide (PDF).
Loading data into HDFS
This short guide provides detailed instructions on how to load a dataset from a PC into a Hadoop system. Loading data into HDFS guide (PDF).
HiveQL example queries
This workbook contains some practical exercises for researchers and/or data analysts who want to run simple queries using Apache Hive. HiveQL example queries guide (PDF).