Extension to the Spark SQL DataFrame/Dataset API that allows Spark SQL queries to be executed over streams of data, with the engine incrementally updating the result as new data arrives. Uses the full Spark SQL engine (including the Catalyst optimiser), and supports end-to-end exactly-once semantics via checkpointing and write-ahead logs when sources are replayable (have sequential offsets) and sinks are idempotent. Supports aggregations over sliding event-time windows, with watermarking to bound how long late data is accepted. Introduced as an experimental (alpha) feature in Spark 2.0 and declared production-ready in Spark 2.2.
Type: Sub-Project
Parent Project: Apache Spark
Last Updated: August 2017
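
The sliding event-time windows and watermarking mentioned in the description can be illustrated with a toy model in plain Python. This is not Spark code and does not use the Structured Streaming API. The names `sliding_windows` and `WindowedCounter`, the integer timestamps, and the rule that the watermark advances only at the end of each batch are all assumptions made for illustration.

```python
from collections import defaultdict


def sliding_windows(event_time, window, slide):
    """Return the start times of every window [start, start + window)
    that contains event_time, with starts aligned to multiples of slide."""
    start = (event_time // slide) * slide  # latest window start <= event_time
    starts = []
    while start > event_time - window:
        starts.append(start)
        start -= slide
    return sorted(starts)


class WindowedCounter:
    """Toy model (hypothetical, not a Spark class): counts events per
    sliding event-time window and ignores updates to windows that the
    watermark has already passed."""

    def __init__(self, window, slide, delay):
        self.window = window
        self.slide = slide
        self.delay = delay              # how late an event may arrive
        self.counts = defaultdict(int)  # window start -> event count
        self.max_event_time = None
        self.watermark = None           # unset until the first batch completes
        self.dropped_updates = 0        # (event, window) pairs ignored as too late

    def process_batch(self, event_times):
        for t in event_times:
            for start in sliding_windows(t, self.window, self.slide):
                window_end = start + self.window
                if self.watermark is not None and window_end <= self.watermark:
                    # The window is considered final, so the late update is ignored.
                    self.dropped_updates += 1
                    continue
                self.counts[start] += 1
            if self.max_event_time is None or t > self.max_event_time:
                self.max_event_time = t
        # Assumption: the watermark moves only once per batch, trailing the
        # largest event time seen so far by the allowed delay.
        if self.max_event_time is not None:
            self.watermark = self.max_event_time - self.delay
        return dict(self.counts)


# 10-unit windows sliding every 5 units, tolerating events up to 10 units late.
counter = WindowedCounter(window=10, slide=5, delay=10)
counter.process_batch([12, 14])  # on-time events
counter.process_batch([31])      # pushes the watermark forward to 21
counter.process_batch([13, 17])  # late events: only windows ending after 21 are updated
```

In this sketch the event at time 17 still updates the window starting at 15, because that window ends at 25 and the watermark is only 21, while the windows ending at 15 and 20 are treated as final and left unchanged.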