StreamSets Data Collector

General purpose technology for moving data between systems, including the ingestion of batch and streaming data into an analytical platform. Pipelines are configured in a graphical user interface and consist of a single origin, one or more processor stages and then one or more destinations, with support for a wide range of source/destination technologies and processor transformations. Supports a wide range of data formats, executors (tasks that can be triggered by pipeline events, e.g. to send e-mails or run a shell script), handling of erroneous records, CDC CRUD records, previewing of data within the editor UI, real-time reporting and alerting on a range of execution and data quality metrics, the ability to dynamically handle changes to schemas and the semantic meaning of data, and a full Python SDK. Can run in standalone mode (as a single process, single or multi-threaded), as a Spark Streaming or MapReduce job on a cluster, or in an ultralight agent (StreamSets Data Collector Edge). Java based, Open Source under the Apache 2.0 licence, hosted on GitHub, with development led by StreamSets, who also provide commercial support and a number of commercial add-ons, including Control Hub (cloud service for developing and managing pipelines), Dataflow Performance Manager (for managing data metrics) and Data Protector (for managing sensitive data). Started in October 2014, with a v1.0 release in September 2015.
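The pipeline shape described above (a single origin, an ordered chain of processors, one or more destinations, with erroneous records set aside rather than halting the flow) can be sketched in a few lines of plain Python. This is a conceptual illustration only: it does not use the StreamSets Python SDK or any Data Collector API, and the names `Pipeline`, `sample_origin`, `parse_amount` and `add_flag` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

Record = dict  # a record is modelled as a plain dictionary of fields


@dataclass
class Pipeline:
    """Toy model (not the StreamSets API): one origin, ordered processors, 1+ destinations."""
    origin: Callable[[], Iterable[Record]]
    processors: List[Callable[[Record], Record]]
    destinations: List[Callable[[Record], None]]
    errors: List[tuple] = field(default_factory=list)

    def run(self) -> int:
        """Push every origin record through the stages; return the number delivered."""
        delivered = 0
        for record in self.origin():
            try:
                for process in self.processors:
                    record = process(record)
            except Exception as exc:
                # Divert the failing record to the error list and carry on.
                self.errors.append((record, str(exc)))
                continue
            for write in self.destinations:
                write(record)  # every destination receives every good record
            delivered += 1
        return delivered


def sample_origin():
    """Hypothetical origin yielding three records, one of them malformed."""
    yield {"id": "1", "amount": "10.5"}
    yield {"id": "2", "amount": "not-a-number"}
    yield {"id": "3", "amount": "7"}


def parse_amount(record):
    """Processor: convert the amount field from text to a float."""
    return {**record, "amount": float(record["amount"])}


def add_flag(record):
    """Processor: mark records whose amount exceeds 8."""
    return {**record, "large": record["amount"] > 8}


sink_a, sink_b = [], []  # two in-memory destinations
pipeline = Pipeline(
    origin=sample_origin,
    processors=[parse_amount, add_flag],
    destinations=[sink_a.append, sink_b.append],
)
delivered = pipeline.run()
print(delivered, len(pipeline.errors))
```

In this sketch the malformed record is captured with its error message while the remaining records reach both destinations, which mirrors the idea of error record handling without claiming to reproduce how Data Collector implements it.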

Technology Information

Vendors: StreamSets
Type: Commercial Open Source
Last Updated: August 2019 - v3.10

Release History

version | release date | release links | release comment
3.0 | 2017-12-15 | See 3.0 notes on documentation and release page; blog post |
3.1 | 2018-03-30 | See 3.1 notes on documentation and release page |
3.2 | 2018-05-11 | See 3.2 notes on documentation and release page |
3.3 | 2018-05-24 | See 3.3 notes on documentation and release page |
3.4 | 2018-08-10 | See 3.4 notes on documentation and release page; blog post |
3.5 | 2018-10-01 | See 3.5 notes on documentation and release page; blog post |
3.6 | 2018-11-26 | See 3.6 notes on documentation and release page |
3.7 | 2019-01-08 | See 3.7 notes on documentation and release page |
3.8 | 2019-03-14 | See 3.8 notes on documentation and release page |
3.9 | 2019-06-06 | See 3.9 notes on documentation and release page; blog post |
3.10 | 2019-08-01 | See 3.10 notes on documentation and release page; blog post |

News