A set of libraries for working with data in Hadoop. Consists of two sub-projects: DataFu Pig (a set of Pig User Defined Functions) and DataFu Hourglass (a framework for incremental processing using MapReduce). Originally created at LinkedIn; the Pig UDFs were open sourced in January 2012 as DataFu, followed by a v1.0 release in September 2013. Split into sub-projects in October 2013, when LinkedIn open sourced DataFu Hourglass and added it to the project. Donated to the Apache Software Foundation in January 2014, graduating in February 2018. Last major release was v1.3 in November 2015, with a handful of bug fix releases but little development activity since then.
Other Names: DataFu
Vendors: The Apache Software Foundation
Type: Commercial Open Source
Last Updated: January 2019 - v1.5
Apache DataFu > DataFu Hourglass
A framework over MapReduce that supports the efficient generation of statistics over dated data by incrementally updating the previous day's output. Supports both fixed length and fixed start point windows, and the generation of statistics either per input partition or as a total over all input data.

Apache DataFu > DataFu Pig
A set of user defined functions for Apache Pig, including support for statistical calculations, bag and set operations, sessionisation of streams of data, cardinality estimation, sampling, hashing, PageRank and others.
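
The incremental idea behind DataFu Hourglass can be illustrated with a minimal Python sketch. This is not the Hourglass API (Hourglass is a Java framework over MapReduce); the function names and the daily counts below are hypothetical and chosen only to show how a previous day's output can be updated rather than recomputed, for both a fixed start point window and a fixed length window.

```python
# Illustrative only: plain Python, not the DataFu Hourglass API.
# All names and numbers here are assumptions made up for this sketch.

def update_fixed_start(prev_total, new_day_count):
    """Fixed start point window: add the new day's count to the running total."""
    return prev_total + new_day_count


def update_fixed_length(prev_total, new_day_count, expired_day_count):
    """Fixed length window: add the new day and subtract the day that fell out."""
    return prev_total + new_day_count - expired_day_count


daily_counts = [5, 3, 8, 2, 7]  # hypothetical per-day event counts
window = 3                      # hypothetical fixed window length, in days

# Fixed start point: each day's output builds on the previous day's output.
fixed_start_totals = []
total = 0
for count in daily_counts:
    total = update_fixed_start(total, count)
    fixed_start_totals.append(total)

# Fixed length: reuse yesterday's output, dropping the day that left the window.
fixed_length_totals = []
total = 0
for i, count in enumerate(daily_counts):
    expired = daily_counts[i - window] if i >= window else 0
    total = update_fixed_length(total, count, expired)
    fixed_length_totals.append(total)

print(fixed_start_totals)   # [5, 8, 16, 18, 25]
print(fixed_length_totals)  # [5, 8, 16, 13, 17]
```

In both cases each daily update touches only the new day's data (plus, for the fixed length window, the day that expired), rather than re-reading every day in the window.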
Is packaged by: Apache Bigtop, Hortonworks Data Platform
Version  Release Date  Release Links  Release Comment
1.0      2013-09-04    summary
1.3      2015-11-18    summary        First Apache (Incubating) release
1.4      2018-03-25    summary        Release to mark Apache graduation; includes 1.3.x patches
1.5      2019-01-07    summary        Java 8 compatibility; two new macros