The goal of this tutorial was to demonstrate how to use Apache SparkR to analyze large-scale datasets in R. For demonstration purposes, data extracted from the Bitcoin blockchain were used to produce a time series plot similar to this one.

The talk was structured as follows:

  • Intro + Spark Architecture [Roland]
  • Intro Bitcoin Use Case [Bernhard]
  • Demo Standard R (+ some extra packages) [Bernhard]
  • Demo SparkR [Bernhard]
  • Demo SparkR - Cluster [Roland]

Materials are available at https://github.com/behas/sparkR-tutorial.

Best,

-ViennaR