Talk: Applied Spark: from concepts to Bitcoin analytics


Apache Spark has gained recognition in the developer community as a highly performant, versatile workhorse for processing large volumes of data. Many have heard of the performance gains that Spark achieves over traditional Hadoop MapReduce by keeping working data in memory wherever possible. But there is more to the story. At its core, Spark provides an alternative way of thinking about processing Big Data. In particular:

  • Spark employs a computational model that is equally at home with batch and streaming data processing. This helps projects maintain a single architecture while processing both historical (batch) and near-real-time (streaming) data.
  • Spark exposes a straightforward, easy-to-learn paradigm for expressing transformations and actions on data. Support for Python, in addition to Scala and Java, makes integrating Spark into a full data capture, processing, and visualization pipeline intuitive for a wider audience.
  • It works equally well "in the small" via local threads and flat-file input, and "at scale" via a Mesos or Spark cluster and HDFS. This makes it possible to build systems that can transition seamlessly from an experimental scope to one that easily scales to meet increased demand over time.
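
To make the "transformations and actions" idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the LazyDataset class, the to_notional function, and the sample trades are all hypothetical and exist only for illustration. It shows the core idea that transformations merely describe work, while actions trigger it.

```python
from typing import Callable, Iterable, List


class LazyDataset:
    """Toy stand-in for a distributed dataset (hypothetical, not a Spark class)."""

    def __init__(self, source: Callable[[], Iterable]):
        # Keep a factory instead of the data, so nothing is computed up front.
        self._source = source

    @classmethod
    def from_list(cls, items: List) -> "LazyDataset":
        return cls(lambda: iter(items))

    # Transformations: return a new dataset describing the work, without doing it.
    def map(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(lambda: (x for x in self._source() if pred(x)))

    # Actions: force evaluation of the whole chain and return a plain result.
    def collect(self) -> List:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


calls = []  # records every record the map function actually processes


def to_notional(trade):
    """Turn (exchange, price, amount) into (exchange, price * amount)."""
    calls.append(trade)
    exchange, price, amount = trade
    return exchange, price * amount


# Made-up sample trades: (exchange name, price, amount), integers for simplicity.
trades = LazyDataset.from_list([
    ("exchange_a", 250, 2),
    ("exchange_b", 249, 1),
    ("exchange_a", 251, 3),
])

# Two transformations chained together; no record has been touched yet.
pipeline = trades.map(to_notional).filter(lambda t: t[0] == "exchange_a")
calls_before_action = len(calls)

# The action runs the chain over all records.
result = pipeline.collect()
```

In this sketch, calls_before_action stays at zero because building the pipeline does no work; only collect() or count() walks the data. Each action also re-runs the full chain, which mirrors how Spark recomputes a dataset's lineage unless it is explicitly cached.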

In this talk, we will discuss these features and, using practical examples, show how an architecture based on modern Big Data technologies, including not only Spark but also Elasticsearch and InfluxDB, can be scaled to capture and analyze the continuous global stream of Bitcoin transaction data, both from the open blockchain and from real-time trades made on the 10 largest public Bitcoin exchanges.

(Bitcoin is an open-source, peer-to-peer technology that for the first time makes possible a truly decentralized global financial system, with profound implications for economics, politics, and sociology. Though still in its infancy, it has already proven to be a disruptive technology. The open and decentralized nature of Bitcoin, combined with exchange markets that never close, presents a unique opportunity to mine data to witness and analyze the growth of an emergent alternative financial system. Unlike in the more mature equity markets of the world, where access to real-time data is restricted to those with means and influence, most information about the current state of the Bitcoin network is freely available, waiting to be collected, processed, analyzed, and visualized by anyone with the right tools.)