Spark Streaming

Spark library for continuous stream processing, built around the DStream (discretized stream) API. Uses a micro-batch execution model: a DStream is a sequence of Spark RDDs, and core Spark executes the specified logic against each micro-batch, with the ability to also use other Spark batch operations (including Spark SQL and MLlib) against each micro-batch. This model also provides fault tolerance through exactly-once processing semantics. Supports a number of data sources (including HDFS, sockets, Flume, Kafka, Kinesis and messaging buses), as well as functions to maintain state and to execute windowed operations. First introduced in Spark 0.7, with a production release as part of Spark 0.9; however, development appears to have largely stopped following the introduction of Structured Streaming in Spark 2.0.
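The micro-batch model and windowed operations described above can be illustrated with a small conceptual sketch in plain Python. It does not use the Spark API: the names discretize, word_count and windowed, the sample events, and the one-second batch interval are all hypothetical, chosen only to show how a stream is cut into batches and how ordinary batch logic is then run per batch and per sliding window.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# Hypothetical event shape: (timestamp in seconds, word). Not a Spark type.
Event = Tuple[float, str]


def discretize(events: Iterable[Event], batch_interval: float) -> List[List[str]]:
    """Group timestamped events into consecutive micro-batches.

    Assumes non-negative timestamps. Intervals with no events become
    empty batches, so batch i always covers
    [i * batch_interval, (i + 1) * batch_interval).
    """
    batches: List[List[str]] = []
    for ts, value in sorted(events):
        index = int(ts // batch_interval)
        while len(batches) <= index:
            batches.append([])
        batches[index].append(value)
    return batches


def word_count(batch: List[str]) -> Counter:
    """Ordinary batch logic, applied unchanged to each micro-batch."""
    return Counter(batch)


def windowed(
    batches: List[List[str]],
    window_length: int,
    fn: Callable[[List[str]], Counter],
) -> List[Counter]:
    """Apply fn over the last window_length batches, sliding one batch at a time."""
    results = []
    for i in range(len(batches)):
        start = max(0, i - window_length + 1)
        merged = [v for b in batches[start : i + 1] for v in b]
        results.append(fn(merged))
    return results


# Illustrative input only.
events = [(0.2, "a"), (0.7, "b"), (1.1, "a"), (2.5, "c"), (2.9, "a")]
batches = discretize(events, batch_interval=1.0)
per_batch = [word_count(b) for b in batches]
per_window = windowed(batches, window_length=2, fn=word_count)
```

In Spark Streaming itself, each micro-batch would be an RDD processed by the cluster rather than a Python list, and the batch interval and window length would be configured on the streaming context and the DStream operations.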

Technology Information

Parent Project: Apache Spark
Last Updated: August 2017

Blog Posts