Technology for buffering and storing real-time streams of data between producers and consumers, with a focus on high throughput at low latency. Based on a distributed, horizontally scalable architecture, with messages organised into topics that are partitioned and replicated across nodes (called brokers in Kafka) to provide resilience, and written to disk to provide persistence. Topics may have multiple producers and consumers, with the ability to perform fault tolerant reads and to load balance across consumers (via consumer groups). Records consist of a key, value and timestamp, with the ability to compact topics, removing superseded updates and deleted records by key.

Supports rolling upgrades, a full security model (including secure and authenticated connections and ACLs for controlling access to topics), the ability to set quotas (for data produced or consumed), Yammer metrics for both servers and clients, and tools to mirror data to a second cluster (MirrorMaker) and to re-distribute partitions across nodes (for example when adding new nodes). Comes with a Java client, but clients for a wide range of languages are also available. Has two sub-projects (Kafka Connect and Kafka Streams) that are bundled with the main product.

Originally developed at LinkedIn, open sourced in January 2011, donated to the Apache Foundation in July 2011, and graduated as a top-level project in October 2012. Development is primarily led by Confluent (which was founded by the team that built Kafka at LinkedIn), which has a number of open source and commercial offerings based around Kafka. Commercial support is also available from most Hadoop vendors.
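The partitioning and compaction behaviour described above can be sketched in a few lines. This is an illustrative model only, not Kafka's implementation: Kafka's default partitioner hashes keys with murmur2, whereas a plain `hash()` is used here, and real compaction runs asynchronously on log segments.

```python
# Illustrative sketch (not Kafka's actual implementation) of two ideas:
# 1) records with the same key land on the same partition, and
# 2) log compaction retains only the latest value per key, with a
#    null value acting as a tombstone that deletes the key.

def partition_for(key: str, num_partitions: int) -> int:
    """Assign a record to a partition by hashing its key.
    Kafka uses murmur2; Python's hash() stands in here and is
    not stable across interpreter runs."""
    return hash(key) % num_partitions

def compact(log):
    """Simulate compaction of a list of (key, value) records."""
    latest = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)   # tombstone: delete the key
        else:
            latest[key] = value     # later write supersedes earlier ones
    return latest

log = [("user-1", "alice"), ("user-2", "bob"),
       ("user-1", "alice-v2"), ("user-2", None)]
print(compact(log))  # {'user-1': 'alice-v2'}
```

Because records for a given key always hash to the same partition, per-key ordering is preserved within that partition, which is what makes compaction by key well defined.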
Other Names: Kafka
Vendors: The Apache Software Foundation
Type: Commercial Open Source
Last Updated: June 2019 - v2.3
Apache Kafka > Kafka Connect
Framework for building scalable and reliable integrations between Kafka and other technologies, either for importing or exporting data. Part of the core Apache Kafka open source project, with connectors available for a wide range of systems, including Hadoop, relational, NoSQL and analytical databases, search technologies and message queues, amongst others, plus an API for developing custom connectors. Supports lightweight transformations, and runs separately from Kafka, in either a stand-alone or distributed cluster mode, with a REST API for managing connectors. Introduced in Kafka 0.9; previously known as Copycat.

Apache Kafka > Kafka Streams
A stream processing technology that's tightly integrated with Apache Kafka, consuming and publishing events from and to Kafka topics (and potentially writing output to external systems). Based on an event-at-a-time model (i.e. not micro-batch), with support for stateful processing, windowing, aggregations, joins and re-processing of data. Supports a low-level Processor API, as well as a high-level DSL that provides both stream and table abstractions (where tables present the latest record for each key). Executes as a stand-alone process, with support for parallel processing across threads within a single instance and across multiple instances, with the ability to dynamically scale the number of instances. Introduced in Kafka 0.10.
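Two of the Kafka Streams ideas above can be sketched as plain functions. This is a conceptual model only, not the real API (which is Java): a "table" is the latest record per key, and a tumbling-window aggregation groups events into fixed-size time buckets.

```python
# Illustrative sketch (not the Kafka Streams API) of the stream/table
# duality and of tumbling-window aggregation.
from collections import defaultdict

def to_table(stream):
    """Stream -> table: the table holds the latest value per key."""
    table = {}
    for key, value in stream:
        table[key] = value
    return table

def windowed_count(events, window_ms):
    """Count (timestamp, key) events per key within fixed
    (tumbling) windows of window_ms milliseconds."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_ms) * window_ms
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(100, "clicks"), (150, "clicks"), (1200, "clicks")]
print(windowed_count(events, 1000))
# {(0, 'clicks'): 2, (1000, 'clicks'): 1}
```

In the real DSL the same shapes appear as KTable (latest record per key) and as windowed aggregations over a KStream, with state held in local, fault-tolerant state stores backed by Kafka topics.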
Manageable via: Streams Messaging Manager, Burrow, Confluent Control Centre, LinkedIn Cruise Control, LinkedIn Kafka Monitor, Nastel AutoPilot
Is packaged by: Apache Bigtop, Hortonworks Data Platform, Hortonworks DataFlow, Cloudera CDH, Confluent Open Source, Confluent Enterprise
version  release date  release links    release comment
0.11     2017-06-28    announcement     Includes support for exactly once semantics and easier client upgrades
1.0      2017-11-01    news; blog post
1.1      2018-03-29    news
2.0      2018-07-30    news
2.1      2018-11-20    news
2.2      2019-03-22    news
2.3      2019-06-25    news