Software and tools

The importance of software for social science

Software is critical for handling the large or complex datasets available to researchers, and provides vital tools for manipulating and analysing data. It is also important for producing graphics, mapping and data mining.

The suite of training resources below relates to software tools used by social science data researchers.

Key software tools


R is a free, user-developed programming language and environment for statistical computing. It is increasingly used in the academic world for teaching purposes.



SPSS is a software package for Windows that can be used to produce graphics of data as well as to carry out other data analysis.


Stata 12.1

Stata 12.1 is a statistical software package for data analysis. You can use Stata by pointing and clicking, or by using the command syntax.

The software can support complex analysis, and, as it is so programmable, developers and users continue to add new features.



QGIS is an open source mapping package that can be downloaded for free. It has a good range of functionality and is straightforward to use.



Python is a general-purpose programming language that data scientists use to collect, clean and analyse data. It is often chosen because it is flexible and suitable for handling large datasets.


  • These training materials include webinar recordings, slides, and sample Python code for core social science research tasks.
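To give a flavour of the collect, clean and analyse workflow described above, here is a minimal sketch using only the Python standard library. It is not taken from the training materials: the survey extract, the column names and the helper functions (load_rows, clean_rows, mean_income_by_region) are all hypothetical and chosen purely for illustration.

```python
import csv
import io
import statistics

# Hypothetical survey extract (illustration only); a real project would
# usually read a downloaded data file instead of an inline string.
RAW_CSV = """respondent_id,age,region,income
1,34,North,28000
2,,South,31000
3,51,North,
4,29,South,24500
5,42,North,39000
"""


def load_rows(text):
    """Collect: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Clean: drop rows with a missing age or income and convert types."""
    cleaned = []
    for row in rows:
        if not row["age"] or not row["income"]:
            continue  # skip incomplete responses
        cleaned.append(
            {
                "region": row["region"],
                "age": int(row["age"]),
                "income": float(row["income"]),
            }
        )
    return cleaned


def mean_income_by_region(rows):
    """Analyse: compute the mean income for each region."""
    groups = {}
    for row in rows:
        groups.setdefault(row["region"], []).append(row["income"])
    return {region: statistics.mean(values) for region, values in groups.items()}


rows = clean_rows(load_rows(RAW_CSV))
summary = mean_income_by_region(rows)
print(summary)
```

In practice, researchers often use third-party libraries such as pandas for these steps; the standard library is used here only to keep the sketch self-contained.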


Nesstar enables you to search, browse, visualise, analyse and download a selected range of different kinds of social and economic data, from survey data to multidimensional tables.



UKDS.Stat enables you to extract the information you want from large socio-economic international datasets available at the UK Data Service.



Infuse is an open standards structure developed by the UK Data Service to provide easy access to aggregate data from the UK 2011 and 2001 censuses.



Casweb is an interface that provides access to census data from 1971 to 2001.



GeoConvert is a tool that makes it easy to match UK postcodes and census geographies, and to convert data between them.


Boundary data tools

Through its Census Support service, the UK Data Service provides a selection of tools to enable you to easily analyse census boundary data.



QualiBank is the UK Data Service’s search and browse interface for qualitative data objects. It allows you to search the content of text files, such as interviews, essays, open-ended questions and reports.


Other computational social science software

Installing Spark on a Windows PC

Apache Spark is an open source parallel processing framework that enables users to run large-scale data analytics applications across clustered computers.

View our guide Installing Spark on a Windows PC (PDF).

Obtaining and downloading the HDP Sandbox

Hortonworks is a commercial company which specialises in data platforms based on open source software for big data, in particular Hadoop. HDP stands for the Hortonworks Data Platform, which is an implementation of a Hadoop cluster and a range of associated big data products which run in the Hadoop environment.

View our guide Obtaining and downloading the HDP Sandbox (PDF).

Loading data into the HDFS

This short guide provides detailed instructions on how to load a dataset from a PC into a Hadoop system.

View our guide Loading data into HDFS (PDF).

HiveQL example queries

This workbook contains some practical exercises for researchers and data analysts who want to run simple queries using Apache Hive.

View our guide HiveQL example queries (PDF).
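As a rough indication of what a simple query of this kind looks like, the sketch below runs a basic aggregate query from Python. It is not taken from the workbook. To keep the example runnable without a Hadoop cluster, it uses Python's built-in sqlite3 module as a stand-in for Hive; the survey table and its columns are hypothetical. The SELECT statement uses only features that HiveQL also supports (COUNT, AVG, GROUP BY, ORDER BY), but running it on Hive would need a Hive connection and an existing Hive table.

```python
import sqlite3

# A simple aggregate query. The table and column names are hypothetical.
QUERY = """
SELECT region, COUNT(*) AS respondents, AVG(age) AS mean_age
FROM survey
GROUP BY region
ORDER BY region
"""

# SQLite in-memory database used as a stand-in for Hive (assumption:
# illustration only; this is not how a Hive table is created or loaded).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey (respondent_id INTEGER, region TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?)",
    [(1, "North", 34), (2, "South", 29), (3, "North", 50), (4, "South", 41)],
)

# Run the query and collect one row per region.
results = conn.execute(QUERY).fetchall()
conn.close()

for region, respondents, mean_age in results:
    print(region, respondents, mean_age)
```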