Streaming Data Stores

A list of commercial, open source and cloud-based streaming data stores, with information on each, including Kafka, Confluent, MapR-ES and their alternatives.

Category Definition

Technologies for the persistent storage of continuous streams of data, with data access based on a publish/subscribe model. A streaming data store should support multiple independent publishers and subscribers, the ability to add new subscribers and replay the history of a stream, horizontal scalability and load balancing, durable writes, ordered streams (data is always read in the order it was written), high throughput at low latency, handling of updates and deletes to source records, and the ability to secure the data.
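As a rough illustration of the access model in the definition above, the following Python sketch keeps an append-only log in memory and tracks a read offset per subscriber. It only shows ordered reads, independent subscribers and replay of history for a newly added subscriber. It is not how any of the products listed on this page are implemented, and it leaves out durability, scaling, updates and deletes, and security. The class and method names (StreamStore, publish, subscribe, poll) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StreamStore:
    """Toy in-memory append-only log; all names here are hypothetical."""
    log: list = field(default_factory=list)      # records, kept in write order
    offsets: dict = field(default_factory=dict)  # subscriber name -> next offset to read

    def publish(self, record):
        """Append a record and return the offset it was written at."""
        self.log.append(record)
        return len(self.log) - 1

    def subscribe(self, name, from_start=True):
        """Register a subscriber, either replaying history or starting at the tail."""
        self.offsets[name] = 0 if from_start else len(self.log)

    def poll(self, name, max_records=10):
        """Return the next records for this subscriber, in the order they were written."""
        start = self.offsets[name]
        batch = self.log[start:start + max_records]
        self.offsets[name] = start + len(batch)
        return batch


store = StreamStore()
store.publish("a")
store.publish("b")

store.subscribe("early")
first = store.poll("early")       # reads the two records written so far

store.publish("c")
store.subscribe("late")           # a subscriber added later replays the full history
late = store.poll("late")
early_next = store.poll("early")  # the earlier subscriber only sees the new record
```

Because each subscriber has its own offset, reading from the stream never removes data for anyone else, which is what makes adding subscribers and replaying history possible.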

Open Source Technologies

The following are open source Streaming Data Store technologies:

Apache Kafka: Technology for buffering and storing real-time streams of data between publishers and subscribers, with a focus on high throughput at low latency.
Confluent Open Source: A package of open source projects built around Apache Kafka with the addition of the Confluent Schema Registry, Kafka REST Proxy, a number of connectors for Kafka Connect and a number of Kafka clients (language SDKs).
Pravega: Technology for the buffering and long term storage of streaming data, designed for low latency and high throughput, with support for exactly once semantics, durable writes, strict ordering, dynamic scaling and transactions, with long term storage backed by HDFS.
Apache BookKeeper: Distributed log storage service from Yahoo - http://bookkeeper.apache.org/
Apache DistributedLog: Distributed log service from Twitter supporting durability, replication and strong consistency, built over Apache BookKeeper - http://bookkeeper.apache.org/distributedlog/
Apache Pulsar: Distributed pub-sub messaging from Yahoo, with persistent message storage based on Apache BookKeeper - http://pulsar.incubator.apache.org/
LogDevice: Open source distributed data store for sequential data from Facebook - https://logdevice.io/

Note that Apache Kafka is bundled with a number of Hadoop distributions.

Commercial Technologies

The following are commercial Streaming Data Store technologies:

Confluent Enterprise: A commercial version of the Confluent Open Source product, with the addition of a number of commercial closed source products including a JMS client, Control Centre (for managing Kafka clusters), Multi DC Replication (active-active replication between Kafka clusters) and Auto Data Balancing.
MapR-ES: Part of the MapR Converged Data Platform, providing streaming data storage capabilities and a Kafka compatible API.
AMQ Streams: Kafka distribution from Red Hat that runs on OpenShift - https://access.redhat.com/products/red-hat-amq-streams

Technologies Available as a Service

The following are Streaming Data Store technologies available as a managed service in the cloud:

Confluent Cloud: Confluent Enterprise as a service - https://www.confluent.io/confluent-cloud/
Amazon Kinesis Streams: Streaming data storage and publish service - https://aws.amazon.com/kinesis/streams/
Amazon Managed Streaming for Kafka (MSK) (public preview): Fully managed, highly available, and secure Apache Kafka service - https://aws.amazon.com/msk/
Azure Event Hubs: Elastic service for the buffering and publishing of streaming event data with a Kafka compatible end point - https://azure.microsoft.com/en-us/services/event-hubs/
Google Cloud Pub/Sub: Real time message and streaming data service with “at least once” delivery - https://cloud.google.com/pubsub/

Blog Posts