Data serialisation framework that supports a columnar storage format to enable efficient querying of data. Built using Apache Thrift, and supports complex nested data structures, using techniques from the Google Dremel paper. Consists of three sub-projects, the specification and Thrift definitions (Parquet Format), the Java and Hadoop libraries (Parquet MR) and the C++ implementation (Parquet CPP). Created as a joint effort between Twitter and Cloudera based on work started as part of Avro Trevni, with an initial v1.0 release in July 2013. Donated to the Apache Foundation in May 2014, graduating in April 2015. Has seen significant adoption in the Hadoop ecosystem.
Other Names Parquet Vendors The Apache Software Foundation Type Commercial Open Source Last Updated December 2018 - v1.11 (Parquet MR), v2.3 (Parquet Format), v1.5 (Parquet C++)
Is packaged by Cloudera CDH
version release date release links release comment 1.0 (Parquet C++) 2017-03-17 announcement 1.1 (Parquet C++) 2017-05-22 1.2 (Parquet C++) 2017-08-04 announcement 1.3 (Parquet C++) 2017-09-25 announcement 1.4 (Parquet C++) 2018-03-07 announcement 1.10 (Parquet MR) 2018-03-31 changelog 1.5 (Parquet C++) 2018-09-22 announcement 1.11 (Parquet MR) 2018-11-22 changelog