An abstraction layer over MapReduce (and now Spark) that provides a high-level Java API for creating data transformation pipelines. Based on Google's FlumeJava paper, it was originally designed to make working with MapReduce easier. Also includes connectors for HBase, Hive and Kafka, Java 8 lambda support, an experimental Scala wrapper for the API (Scrunch), support for in-memory pipelines, and helper classes to support testing. Open sourced by Cloudera in October 2011 and donated to the Apache Software Foundation in May 2012, before graduating in February 2013. Support for Spark was added as part of v0.10 in June 2014. Still being maintained, and appears to have been adopted at a number of large companies, but with limited new development.
Other Names: Crunch
Vendors: The Apache Software Foundation
Type: Commercial Open Source
Last Updated: April 2017 - 0.15
Is packaged by: Apache Bigtop
Is packaged by (but deprecated): Cloudera CDH
Version    Release date    Release links          Release comment
0.15       2017-02-26      GitHub release page