Other computational social science software

Installing Spark on a Windows PC

Apache Spark is an open-source parallel processing framework that enables users to run large-scale data analytics applications across clusters of computers.

View our guide Installing Spark on a Windows PC (PDF).


Loading data into HDFS

This short guide provides step-by-step instructions on how to load a dataset from a PC into the Hadoop Distributed File System (HDFS).

View our guide Loading data into HDFS (PDF).


HiveQL example queries

This workbook contains practical exercises for researchers and data analysts who want to run simple queries using Apache Hive.

View our guide HiveQL example queries (PDF).