Unified storage solution for Hadoop based on an indexed columnar data format, designed to process and query data efficiently across disparate access patterns. Data is loaded in batch, encoded, indexed using multiple strategies, compressed and written to HDFS in a columnar file format. Provides several highly configurable indexes (multi-dimensional key, min/max index and inverted index), global dictionary encoding and column grouping to support interactive OLAP-style queries, high-throughput scan queries, low-latency point queries and individual record lookups. Also supports batch updates and deletes using delta bitmap files and compaction. Written in Java, using Apache Thrift; supports all common primitive data types and complex nested data types, including arrays and structs. Consists of several modules: the format specification and core implementation (columnar storage, indexing, compression, encoding), the Hadoop input/output format interface, deep integration with Spark (interfacing to Spark SQL and the DataFrame API), and connectors for Hive and Presto. Started in 2013 at Huawei's India R&D center, donated to the Apache Software Foundation in 2016, graduated to a top-level project in April 2017, followed by a stable release (1.1.0) in May 2017; under active development.
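The interplay of dictionary encoding, a min/max index and delete bitmaps described above can be illustrated with a small conceptual sketch. This is not CarbonData's file format or API: the `Blocklet` class, the function names and the sample data are all hypothetical, and a plain Python set stands in for a delete delta bitmap. It only shows why a min/max index lets a point query skip most of the stored data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Blocklet:
    """Hypothetical storage unit holding one dictionary-encoded column."""
    codes: List[int]   # dictionary codes, one per row
    min_code: int      # min/max index entry for this blocklet
    max_code: int
    deleted: Set[int] = field(default_factory=set)  # stand-in for a delete bitmap


def build_dictionary(values: List[str]) -> Dict[str, int]:
    """Give each distinct value a code; sorting keeps code order = value order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


def write_blocklets(values: List[str], dictionary: Dict[str, int],
                    size: int) -> List[Blocklet]:
    """Encode the column and split it into fixed-size blocklets with min/max."""
    blocklets = []
    for start in range(0, len(values), size):
        codes = [dictionary[v] for v in values[start:start + size]]
        blocklets.append(Blocklet(codes, min(codes), max(codes)))
    return blocklets


def point_query(blocklets: List[Blocklet], dictionary: Dict[str, int],
                value: str) -> Tuple[List[Tuple[int, int]], int]:
    """Return matching (blocklet, row) pairs and how many blocklets were scanned."""
    code = dictionary.get(value)
    if code is None:
        return [], 0  # value never occurs, nothing to scan
    hits, scanned = [], 0
    for b_idx, blocklet in enumerate(blocklets):
        if code < blocklet.min_code or code > blocklet.max_code:
            continue  # pruned by the min/max index without reading the rows
        scanned += 1
        for row, stored in enumerate(blocklet.codes):
            if stored == code and row not in blocklet.deleted:
                hits.append((b_idx, row))
    return hits, scanned


# Sample column, already sorted as a sort key would leave it (assumed data).
cities = ["Austin", "Boston", "Boston", "Chicago",
          "Denver", "Denver", "Seattle", "Tampa"]
dictionary = build_dictionary(cities)
blocklets = write_blocklets(cities, dictionary, size=3)

hits, scanned = point_query(blocklets, dictionary, "Denver")

# A delete only marks the row; the stored codes are left untouched.
blocklets[1].deleted.add(1)
hits_after_delete, _ = point_query(blocklets, dictionary, "Denver")
```

In this sketch only one of the three blocklets is scanned for the point query, and the delete is recorded as a side structure rather than a rewrite of the data, which is the role compaction would later clean up in a real system.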
Other Names: CarbonData
Vendors: The Apache Software Foundation
Type: Commercial Open Source
Last Updated: September 2019 - 1.6
version  release date  release links  release comment
1.3      2018-02-03    release notes
1.4      2018-06-04    release notes
1.5      2018-10-23    release notes
1.6      2019-08-19    release notes