Other computational social science software

Installing Spark on a Windows PC

Apache Spark is an open source parallel processing framework that enables users to run large-scale data analytics applications across clustered computers.

View our guide Installing Spark on a Windows PC (PDF).

Obtaining and downloading the HDP Sandbox

Hortonworks is a company that specialises in big data platforms built on open source software, in particular Hadoop. HDP stands for Hortonworks Data Platform, an implementation of a Hadoop cluster together with a range of associated big data products that run in the Hadoop environment.

View our guide Obtaining and downloading the HDP Sandbox (PDF).

Loading data into HDFS

This short guide provides step-by-step instructions on how to load a dataset from a PC into the Hadoop Distributed File System (HDFS).

View our guide Loading data into HDFS (PDF).
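
As a rough illustration of what loading a file into HDFS can involve, the sketch below builds the two standard `hdfs dfs` shell commands (`-mkdir -p` and `-put`) from Python. It is not taken from the guide, which may use a different route such as a web interface. The file name `survey.csv`, the target directory `/user/example/data` and the helper names are hypothetical, and the script assumes the `hdfs` command is available on the machine where it runs.

```python
import shlex
import subprocess
from pathlib import PurePosixPath


def build_hdfs_put_commands(local_file, hdfs_dir):
    """Return the `hdfs dfs` commands that create a directory and upload a file.

    Both arguments are illustrative; replace them with your own paths.
    """
    target = str(PurePosixPath(hdfs_dir))
    return [
        # Create the target directory (and any missing parents) in HDFS.
        ["hdfs", "dfs", "-mkdir", "-p", target],
        # Copy the local file into that directory.
        ["hdfs", "dfs", "-put", local_file, target],
    ]


def upload(local_file, hdfs_dir, dry_run=True):
    """Print the commands (dry run) or execute them against a real cluster."""
    for cmd in build_hdfs_put_commands(local_file, hdfs_dir):
        if dry_run:
            print(shlex.join(cmd))
        else:
            # Assumes the `hdfs` client is installed and configured.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file and directory names, shown as a dry run only.
    upload("survey.csv", "/user/example/data")
```

With `dry_run=True` the script only prints the commands, so they can be checked before anything is sent to the cluster.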

HiveQL example queries

This workbook contains practical exercises for researchers and data analysts who want to run simple queries using Apache Hive.

View our guide HiveQL example queries (PDF).