Apache NiFi edit   discuss  

General purpose technology for the movement of data between systems, including the ingestion of data into an analytical platform. Based on directed acyclic graph of Processors and Connections, with the unit of work being a FlowFile (a blob of data plus a set of key/value pair attributes). Supports guaranteed delivery of FlowFiles, with NiFi resiliently storing state (by default to a local write ahead log) and data blobs (by default a set of local partitions on disk), with all FlowFile transformations executed via a thread pool within the NiFi instance (with the option to deploy multiple NiFi instances as a cluster). All flows are configured in a graphical user interface, which is also used for management and operations (starting/stopping individual Processors and viewing real time statuses, statistics and other information). Also supports some record level operations (via RecordReaders and RecordSetWriters), data provenance (reporting on the processing events and lineage of individual FlowFiles), scheduling of Processor execution (based on periodic execution timers or cron specifications), multi-threaded Processor execution, configuration of Processor batch sizes (to enable low latency or high throughput), prioritised queues within Connections (allowing FlowFiles to be processed based on their age or a priority attribute as an alternative to FIFO), back pressure (based on counts or data volume against individual Connections) and pressure release (automatic discarding of FlowFiles based on their age), the ability to stream data to and from other NiFi instances and other streaming technologies, the ability to import and export flows as XML (flow templates), an expression language for setting Processor configuration and populating FlowFile attributes, Controller Services to provide shared services to processors (e.g. access to credentials, shared state), Reporting Tasks to output status and statistics information and a user security model. Extensible through the addition of custom Processors, Controller Services, Reporting Tasks and Prioritizers, and integrates with Apache Ranger and Apache Ambari. Originally developed at the NSA as Niagara Files, before being donated to the Apache Foundation in November 2014, graduating in July 2015. Java based, with development lead by Hortonworks after their aquisition of Onyara (which was set up by original NiFi developers to provide commercial support and services).

Technology Information

Other NamesNiFi
VendorsThe Apache Software Foundation
TypeCommercial Open Source
Last UpdatedFebruary 2019 - v1.9

Sub-projects

Apache NiFi >  MiNiFiLightweight headless version of NiFi used to collect and process data at it's source, before forwarding it on for centralised processing. Supports all key NiFi functionality including all NiFi processors, guaranteed delivery, data buffering (including back pressure and pressure release) and prioritised queuing, however flows are specified in configuration files, status information and statistics are only available via Reporting Tasks or via a CLI, and provenance can only be viewed by exporting events via Reporting Tasks to log files or a full NiFi instance. Supports warm re-deployments, automatically restarting to load a new configuration written to disk or pushed or pulled over HTTP. Available as a Java or Native C++ executable. Started in March 2016, with a 0.1 release in December 2016.
Apache NiFi >  NiFi RegistryA solution for the configuration management of NiFi flows. Integrates with NiFi to allow users to store, retrieve and upgrade flows, keeping a full history of all changes to a flow committed to the registry, with flows stored and organised by buckets. Supports local users and groups, or authentication via certificates, LDAP or Kerberos, with access control policies allowing read, write and delete permissions to be specified for buckets, users and groups. Has a Web based UI and a REST interface for managing buckets, local users and groups, viewing flow history and for managing access control. First released in January 2018.

Related Technologies

Is packaged byHortonworks DataFlow

Release History

versionrelease daterelease linksrelease comment
1.22017-05-08summary; record level processing; running SQL on event streams 
1.32017-06-08summary 
1.42017-10-02summary 
1.52018-01-12summaryregistry / version control blog post from DZone
1.62018-04-08summary 
1.72018-06-25summary 
1.82018-10-26summary 
1.92019-02-19summary 

News

Blog Posts