Spark library for processing structured data, using either SQL statements or a DataFrame API. Supports querying and writing to local datasets (including JSON, Parquet, Avro, Orc and CSV) as well as external data sources (including Hive and JDBC), including the ability to query across data sources. Includes Catalyst, a cost based optimiser that turns high level operations into low level Spark DAGs for execution. Also includes a Hive compatible Thrift JDBC/ODBC server that's compatible with Beeline and the Hive JDBC and ODBC drivers, and a REPL CLI for interactive queries. Introduced in Spark 1.0 with a production release in Spark 1.3, with substantially improved SQL functionalities in Spark 2.0.
Type Sub-Project Parent Project Apache Spark Last Updated August 2017