Apache DataFu

A set of libraries for working with data in Hadoop. It consists of two sub-projects: DataFu Pig (a set of Pig user-defined functions) and DataFu Hourglass (a framework for incremental processing using MapReduce). DataFu was originally created at LinkedIn; the Pig UDFs were open sourced in January 2012 as DataFu, with a v1.0 release in September 2013. The project was split into sub-projects in October 2013, when LinkedIn open sourced DataFu Hourglass and added it to the project. It was donated to the Apache Software Foundation in January 2014 and graduated from the Apache Incubator in February 2018. The last major release was v1.3 in November 2015; there have been a handful of bug fix releases but little development activity since then.

Technology Information

Other Names: DataFu
Vendors: The Apache Software Foundation
Type: Commercial Open Source
Last Updated: January 2019 (v1.5)

Sub-projects

DataFu Hourglass: A framework over MapReduce that supports the efficient generation of statistics over dated data by incrementally updating the previous day's output. Supports both fixed-length and fixed-start-point windows, and the generation of statistics either per input partition or as a total over all input data.
DataFu Pig: A set of user-defined functions for Apache Pig, including support for statistical calculations, bag and set operations, sessionisation of streams of data, cardinality estimation, sampling, hashing, PageRank and others.
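
The incremental idea behind Hourglass can be illustrated with a small sketch. This is not the Hourglass API (Hourglass is a Java framework running on MapReduce); it is plain Python, and the function names, the sample data and the use of per-key counts as the statistic are all assumptions made for illustration. It also assumes the statistic is reversible (a count can be subtracted), which is what lets a fixed-length window be updated without re-reading every day in the window.

```python
from collections import Counter
from datetime import date

# Hypothetical daily input: per-key event counts for each dated partition.
daily = {
    date(2024, 1, 1): Counter(a=2, b=1),
    date(2024, 1, 2): Counter(a=1),
    date(2024, 1, 3): Counter(b=3, c=1),
}

def fixed_start_update(previous_total, new_day):
    """Fixed start point: the window only grows, so add the new day."""
    return previous_total + new_day

def fixed_length_update(previous_window, new_day, expired_day):
    """Fixed length: add the newest day, subtract the day that fell out."""
    updated = Counter(previous_window)
    updated.update(new_day)
    updated.subtract(expired_day)
    # Drop keys whose count has fallen to zero.
    return Counter({k: v for k, v in updated.items() if v > 0})

# Previous output covering 1 and 2 January.
previous = daily[date(2024, 1, 1)] + daily[date(2024, 1, 2)]

# Incremental updates when 3 January arrives.
total_since_start = fixed_start_update(previous, daily[date(2024, 1, 3)])
last_two_days = fixed_length_update(
    previous, daily[date(2024, 1, 3)], daily[date(2024, 1, 1)]
)

# Recomputing the two-day window from scratch gives the same answer.
recomputed = daily[date(2024, 1, 2)] + daily[date(2024, 1, 3)]
```

In both cases only the new day (and, for the fixed-length window, the expired day) is read, rather than the full history, which is the saving that incremental processing aims for.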

Related Technologies

Is packaged by: Apache Bigtop, Hortonworks Data Platform

Release History

version  release date  release links  release comment
1.0      2013-09-04    summary
1.3      2015-11-18    summary        First Apache (Incubating) release
1.4      2018-03-25    summary        Release to mark Apache graduation; includes 1.3.x patches
1.5      2019-01-07    summary        Java 8 compatibility; two new macros
