Apache CarbonData

Unified storage solution for Hadoop based on an indexed columnar data format, designed to give efficient processing and querying across disparate data access patterns. Data is loaded in batches, encoded, indexed using multiple strategies, compressed, and written to HDFS in a columnar file format. Provides several highly configurable indexes (multi-dimensional key, min/max index, and inverted index), global dictionary encoding, and column grouping to support interactive OLAP-style queries, high-throughput scan queries, low-latency point queries, and individual record lookups. Also supports batch updates and deletes using delta bitmap files and compaction. Written in Java using Apache Thrift; supports all common primitive data types and complex nested data types, including arrays and structs. Consists of several modules: the format specification and core implementation (columnar storage, indexing, compression, encoding), a Hadoop input/output format interface, deep integration with Spark (interfacing to Spark SQL and the DataFrame API), and connectors for Hive and Presto. Started in 2013 at Huawei's India R&D center, donated to the Apache Software Foundation in 2016, graduated to a top-level project in April 2017, followed by a stable release (1.1.0) in May 2017, and remains under active development.
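To illustrate how a min/max index lets a point query skip data, here is a minimal sketch in Python. It is not CarbonData's implementation or file format: the `Block` class, `build_blocks`, and `scan_equals` are hypothetical names invented for this example, and the block size of 4 is arbitrary. The idea shown is only the general technique of storing per-block minimum and maximum values and pruning blocks whose range cannot contain the value being searched for.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """Hypothetical chunk of one column, with min/max statistics kept beside it."""
    values: List[int]
    min_value: int
    max_value: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_equals(blocks: List[Block], target: int) -> Tuple[List[int], int]:
    """Return the matching values and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # Prune the block if the target cannot fall inside its [min, max] range.
        if target < block.min_value or target > block.max_value:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if v == target)
    return matches, blocks_read


# Illustrative data only; sorted input makes the pruning effect easy to see.
column = [3, 5, 7, 12, 15, 18, 21, 25, 29, 40, 41, 47]
blocks = build_blocks(column, block_size=4)
matches, blocks_read = scan_equals(blocks, target=15)
print(matches, blocks_read, len(blocks))  # prints: [15] 1 3
```

In this toy run only one of the three blocks is read. Pruning of this kind works best when data is sorted or clustered on the filtered column, since overlapping block ranges reduce how many blocks can be skipped.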

Technology Information

Other Names: CarbonData
Vendors: The Apache Software Foundation
Type: Commercial Open Source
Last Updated: September 2019 - 1.6

Release History

version  release date  release links  release comment
1.3      2018-02-03    release notes
1.4      2018-06-04    release notes
1.5      2018-10-23    release notes
1.6      2019-08-19    release notes

News