A set of libraries, tools, examples and documentation that aims to make it easier to build systems on top of the Hadoop ecosystem. Consists of three sub-projects: Kite Data (a logical dataset abstraction over Hadoop), Morphlines (embeddable, configuration-driven transformation pipelines) and Kite Maven Plugin (a Maven plugin for deploying Hadoop applications). Java based, open source under the Apache 2.0 licence and hosted on GitHub. First released in May 2013 by Cloudera as the Cloudera Development Kit (CDK), renamed to Kite in December 2013, and reached a v1.0 release in February 2015 with contributions from a number of external contributors. The last release was v1.1 in June 2015, with very little development activity since then.
Other Names: Cloudera Development Kit, CDK
Vendors: Cloudera
Type: Commercial Open Source
Last Updated: January 2017 - v1.1
Kite > Kite Data
Library that provides a logical dataset and record abstraction over HDFS, S3, local filesystems and HBase, including support for partitioning and views (which allow datasets to be filtered and support automatic partition pruning). Provides a command line interface and Maven plugin for managing and viewing datasets. Supports Crunch, Flume, Spark and MapReduce, and can integrate with a Hive Metastore to make datasets available through Hive and Impala. Stores data using Avro (utilising Avro schema evolution / resolution) or Parquet.

Kite > Kite Maven Plugin
A Maven plugin that supports the packaging, deployment and execution of applications onto Hadoop.

Kite > Morphlines
A configuration-driven, in-memory transformation pipeline that can be embedded into any Java code base, with specific support for Flume, MapReduce, HBase, Spark and Solr. Supports a range of file types including CSV, Avro, JSON, Parquet, RCFile, SequenceFile, ProtoBuf and XML, plus gzip, bzip2, tar, zip and jar files. Also supports a number of transformation steps out of the box, including integration with Apache Tika for reading common file formats.
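To illustrate what "configuration driven" means for Morphlines, the following is a minimal sketch of a morphline configuration file. It assumes a CSV input with three columns. The pipeline id (csvExample) and the column names are hypothetical and chosen only for illustration. The readCSV and logInfo commands are part of the Kite Morphlines command set.

```
# Hypothetical morphline: parse CSV lines into records and log each one.
morphlines : [
  {
    # Identifier the embedding application uses to select this pipeline (assumed name)
    id : csvExample

    # Make the Kite-provided commands available to this morphline
    importCommands : ["org.kitesdk.**"]

    commands : [
      # Split each input line into named fields (column names are illustrative)
      {
        readCSV {
          separator : ","
          columns : [id, name, email]
          ignoreFirstLine : true
          trim : true
          charset : UTF-8
        }
      }

      # Log every record that reaches this point in the pipeline
      { logInfo { format : "record: {}", args : ["@{}"] } }
    ]
  }
]
```

Each record flows through the commands in order, so further transformation steps would be added as extra entries in the commands list. The embedding application, or one of the supported integrations such as Flume, loads the file and selects the pipeline by its id.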
Is packaged by Apache Bigtop