Library that provides a logical dataset and record abstraction over HDFS, S3, local filesystems and HBase, including support for partitioning and views (which allow datasets to be filtered and support automatic partition pruning). Provides a command line interface and a Maven plugin for managing and viewing datasets. Supports Crunch, Flume, Spark and MapReduce, and can integrate with a Hive Metastore to make datasets available through Hive and Impala. Stores data using Avro (utilising Avro schema evolution / resolution) or Parquet.
Type: Sub-Project
Parent Project: Kite
Last Updated: January 2017
Packaged by Cloudera CDH (deprecated)
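
The partition pruning mentioned in the description can be illustrated with a small, library-independent sketch. It is not the Kite API: `Partition`, `prune` and `read_view` are hypothetical names, and the year/month layout is an assumed partition strategy. The idea shown is only that a view's constraints on partition keys let a reader skip whole partitions before any record-level filter runs.

```python
from dataclasses import dataclass

# Hypothetical illustration only: none of these names come from Kite.


@dataclass(frozen=True)
class Partition:
    year: int
    month: int
    records: tuple  # records stored under this partition


def prune(partitions, year=None, month=None):
    """Keep only partitions whose keys can satisfy the view's constraints."""
    return [
        p for p in partitions
        if (year is None or p.year == year)
        and (month is None or p.month == month)
    ]


def read_view(partitions, year=None, month=None, predicate=lambda r: True):
    """Scan the surviving partitions, then apply the record-level filter."""
    scanned = prune(partitions, year=year, month=month)
    rows = [r for p in scanned for r in p.records if predicate(r)]
    return rows, len(scanned)


# Assumed sample data: three partitions keyed by (year, month).
partitions = [
    Partition(2016, 12, ({"user": "a", "clicks": 3},)),
    Partition(2017, 1, ({"user": "b", "clicks": 7}, {"user": "c", "clicks": 1})),
    Partition(2017, 2, ({"user": "d", "clicks": 9},)),
]

# A "view" restricted to 2017 with more than 5 clicks: the 2016 partition
# is never scanned, because its key cannot match the constraint.
rows, scanned = read_view(
    partitions, year=2017, predicate=lambda r: r["clicks"] > 5
)
```

In this sketch the constraint on `year` is resolved against partition keys alone, so the cost of the read depends on the number of matching partitions rather than on the size of the whole dataset.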