Other computational social science software
Installing Spark on a Windows PC
Apache Spark is an open source parallel processing framework that enables users to run large-scale data analytics applications across clustered computers.
View our guide Installing Spark on a Windows PC (PDF).
Obtaining and downloading the HDP Sandbox
Hortonworks is a commercial company that specialises in big data platforms built on open source software, in particular Hadoop. HDP stands for the Hortonworks Data Platform, an implementation of a Hadoop cluster together with a range of associated big data products that run in the Hadoop environment.
View our guide Obtaining and downloading the HDP Sandbox (PDF).
Loading data into HDFS
This short guide provides step-by-step instructions on how to load a dataset from a PC into the Hadoop Distributed File System (HDFS).
View our guide Loading data into HDFS (PDF).
HiveQL example queries
This workbook contains practical exercises for researchers and data analysts who want to run simple queries using Apache Hive.
View our guide HiveQL example queries (PDF).