How Impala has Pushed HDFS in New Ways
[presentation] Location: Ballroom 1, 12:20 - 13:10
Abstract:
Impala is the first system written on top of HDFS that is capable of providing responses to interactive queries over large data sets in real time. Consequently, the performance characteristics and aspirations of Impala are quite different from those of the traditional MapReduce workloads that have been running on top of HDFS for years. This has prompted several new developments in HDFS to allow Impala to take full advantage of the hardware resources of a cluster. This talk will provide an introduction to Impala and describe some of the HDFS advancements we have implemented that were directly motivated by the project.