Other computational social science software
Installing Spark on a Windows PC
Apache Spark is an open-source parallel processing framework that lets users run large-scale data analytics applications across clusters of computers.
View our guide Installing Spark on a Windows PC (PDF).
Loading data into HDFS
This short guide gives step-by-step instructions on how to load a dataset from a PC into the Hadoop Distributed File System (HDFS).
View our guide Loading data into HDFS (PDF).
HiveQL example queries
This workbook contains practical exercises for researchers and data analysts who want to run simple queries using Apache Hive.
View our guide HiveQL example queries (PDF).