Kite Data

Library that provides a logical dataset and record abstraction over HDFS, S3, local filesystems and HBase, including support for partitioning and views (which allow datasets to be filtered and support automatic partition pruning). Provides a command line interface and a Maven plugin for managing and viewing datasets. Supports Crunch, Flume, Spark and MapReduce, and can integrate with a Hive Metastore to make datasets available through Hive and Impala. Stores data using Avro (utilising Avro schema evolution / resolution) or Parquet.
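To make the idea of views with partition pruning concrete, here is a minimal conceptual sketch in plain Python. It does not use Kite's API: the `PartitionedDataset` and `View` classes, the `partitions_read` counter and the sample records are all hypothetical, and an in-memory dictionary stands in for partition directories on a filesystem. It only illustrates how a filter on a partition field lets a reader skip whole partitions.

```python
from collections import defaultdict


class PartitionedDataset:
    """Toy in-memory dataset partitioned by the values of chosen fields.

    Hypothetical illustration only; this is not Kite's dataset API.
    """

    def __init__(self, partition_fields):
        self.partition_fields = tuple(partition_fields)
        self.partitions = defaultdict(list)  # partition key -> list of records
        self.partitions_read = 0  # how many partitions a read actually opened

    def write(self, record):
        # Route the record to the partition named by its partition field values.
        key = tuple(record[f] for f in self.partition_fields)
        self.partitions[key].append(record)

    def view(self, **constraints):
        # A view is the dataset plus a set of field == value constraints.
        return View(self, constraints)


class View:
    def __init__(self, dataset, constraints):
        self.dataset = dataset
        self.constraints = constraints

    def read(self):
        ds = self.dataset
        results = []
        for key, records in ds.partitions.items():
            key_values = dict(zip(ds.partition_fields, key))
            # Pruning: skip the partition when its key contradicts a constraint.
            if any(
                field in self.constraints and self.constraints[field] != value
                for field, value in key_values.items()
            ):
                continue
            ds.partitions_read += 1
            # Constraints on non-partition fields still need a per-record check.
            for record in records:
                if all(record[f] == v for f, v in self.constraints.items()):
                    results.append(record)
        return results


events = PartitionedDataset(["year", "month"])
events.write({"year": 2016, "month": 12, "user": "a"})
events.write({"year": 2017, "month": 1, "user": "b"})
events.write({"year": 2017, "month": 1, "user": "a"})

rows = events.view(year=2017, user="a").read()
print(rows)
print(events.partitions_read)
```

In this sketch the constraint on `year` removes the 2016 partition before any of its records are examined, while the constraint on `user` is applied record by record inside the partitions that remain.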

Technology Information

Type: Sub-Project
Parent Project: Kite
Last Updated: January 2017

Related Technologies

Is packaged by (but deprecated): Cloudera CDH

Blog Posts