Apache Arrow

In-memory data structure specification for building columnar data systems. Provides a standard interchange format that allows processes on the same node to share data without the overhead of moving or transforming it, permits O(1) random access, and can represent both flat relational structures and complex hierarchical nested data. Data is organised in a columnar memory layout that stores all the values relevant to a column operation together. This makes it cache efficient for analytical workloads and allows execution engines to take advantage of modern CPU SIMD (Single Instruction Multiple Data) instructions, which operate on multiple data values with a single instruction. Supports Java, C, C++, JavaScript, Python, Go, Ruby and Rust. Seeded from the Apache Drill project and promoted directly to a top-level Apache project in February 2016, followed by an initial 0.1 release in October 2016. Used by a range of other projects, including Drill, Spark, Impala, Kudu and Pandas. Has not yet reached a v1.0 milestone, and is under active development with contributors from a number of other Apache and non-Apache data projects.
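The layouts behind the O(1) random access and nested-data claims can be illustrated with a simplified Python sketch. This is a teaching model, not the pyarrow API, and the class names are hypothetical. Each column carries a validity bitmap (one bit per slot, least-significant bit first, 1 meaning non-null). Fixed-width values sit in one contiguous buffer. Variable-width and nested columns add an offsets buffer of length n+1 that points into a flat child buffer, so locating element i needs only two offset reads. Real Arrow buffers are also padded and aligned in memory, which this sketch omits.

```python
from array import array
from typing import List, Optional, Sequence


def build_validity_bitmap(values: Sequence[Optional[object]]) -> bytearray:
    """Pack one bit per slot, least-significant bit first; 1 = valid, 0 = null."""
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bitmap


def is_valid(bitmap: bytearray, i: int) -> bool:
    return (bitmap[i // 8] >> (i % 8)) & 1 == 1


class Int32Column:
    """Hypothetical fixed-width column: validity bitmap + contiguous int32 buffer."""

    def __init__(self, values: Sequence[Optional[int]]):
        self.validity = build_validity_bitmap(values)
        # Typecode "i" is 4 bytes on common platforms. Null slots still occupy
        # space; their content is unspecified (this sketch stores 0).
        self.values = array("i", (0 if v is None else v for v in values))

    def get(self, i: int) -> Optional[int]:
        # O(1): one bit test plus one indexed read.
        return self.values[i] if is_valid(self.validity, i) else None

    def sum(self) -> int:
        # One pass over a contiguous buffer: the access pattern caches and SIMD
        # units favour (CPython itself does not vectorise this loop).
        return sum(v for i, v in enumerate(self.values) if is_valid(self.validity, i))


class StringColumn:
    """Hypothetical variable-width column: validity + int32 offsets (n+1) + UTF-8 bytes."""

    def __init__(self, values: Sequence[Optional[str]]):
        self.validity = build_validity_bitmap(values)
        self.offsets = array("i", [0])
        data = bytearray()
        for v in values:
            if v is not None:
                data += v.encode("utf-8")
            self.offsets.append(len(data))  # here a null slot gets a zero-length range
        self.data = bytes(data)

    def get(self, i: int) -> Optional[str]:
        if not is_valid(self.validity, i):
            return None
        start, end = self.offsets[i], self.offsets[i + 1]  # O(1) lookup
        return self.data[start:end].decode("utf-8")


class ListInt32Column:
    """Hypothetical nested list<int32> column: validity + offsets into a flat child column."""

    def __init__(self, values: Sequence[Optional[List[int]]]):
        self.validity = build_validity_bitmap(values)
        self.offsets = array("i", [0])
        flat: List[int] = []
        for v in values:
            if v is not None:
                flat.extend(v)
            self.offsets.append(len(flat))
        self.child = Int32Column(flat)

    def get(self, i: int) -> Optional[List[Optional[int]]]:
        if not is_valid(self.validity, i):
            return None
        # Locating the sub-list is O(1); materialising it is O(length).
        return [self.child.get(j) for j in range(self.offsets[i], self.offsets[i + 1])]


ages = Int32Column([31, None, 45, 27])
names = StringColumn(["ann", "bob", None, "dee"])
tags = ListInt32Column([[1, 2], [], None, [3]])
print(ages.get(2), names.get(3), tags.get(0))  # 45 dee [1, 2]
print(ages.sum())  # 103
print(list(names.offsets))  # [0, 3, 6, 6, 9]
```

Note how the nested column is just an offsets buffer over an ordinary flat column. This is what lets hierarchical data reuse the same columnar machinery as flat relational data.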

Technology Information

Other Names: Arrow
Vendors: The Apache Software Foundation
Type: Commercial Open Source
Last Updated: July 2019 - v0.14

Release History

Version  Release date  Release links             Release comment
0.8      2017-12-18    blog post; release notes
0.9      2018-03-21    blog post; release notes
0.10     2018-08-07    blog post; release notes
0.11     2018-10-09    blog post; release notes
0.12     2019-01-21    blog post; release notes
0.13     2019-04-02    blog post; release notes
0.14     2019-07-02    blog post