Google Cloud DataProc

Service for dynamically provisioning Hadoop clusters on Google Compute Engine, based on a single standard set of Hadoop services. Supports selection of virtual machines (including custom machine types and machines with GPUs), usage of custom VM images, a claimed cluster startup time of less than 90 seconds, and local storage with an HDFS filesystem. Jobs can be executed programmatically, including via workflows (parameterisable operations that create a cluster, run jobs and then delete the cluster). Clusters support manual and automatic scaling, initialisation actions (to install extra services or run scripts, with a set of open source actions available), optional components (automatic addition of extra services), and automatic deletion (based on time, usage or idleness). Also integrates with Stackdriver Logging and Monitoring, and supports encryption of data in HDFS and Cloud Storage. Manageable via the Google Cloud Console Web UI and SDK, plus RPC and REST APIs. Priced at an hourly rate (charged per second) based on the specification of the VMs being used, in addition to any Compute Engine or Persistent Disk charges.
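As a sketch of the programmatic usage described above, the gcloud CLI can create a cluster, submit a job to it, and rely on idle-based automatic deletion. The cluster name, region, machine types and bucket paths below are illustrative assumptions, not defaults:

```shell
# Create a cluster with custom machine types, an initialisation action,
# and automatic deletion after 30 minutes of idleness.
# (example-cluster, us-central1 and gs://example-bucket are placeholders)
gcloud dataproc clusters create example-cluster \
  --region=us-central1 \
  --master-machine-type=n1-standard-4 \
  --worker-machine-type=n1-standard-4 \
  --num-workers=2 \
  --initialization-actions=gs://example-bucket/scripts/setup.sh \
  --max-idle=30m

# Submit a Spark job to the running cluster.
gcloud dataproc jobs submit spark \
  --region=us-central1 \
  --cluster=example-cluster \
  --class=org.apache.spark.examples.SparkPi \
  --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
  -- 1000
```

Workflow templates offer the alternative described above (create cluster, run jobs, delete cluster as one parameterisable operation) without managing the cluster lifecycle by hand.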

Technology Information

Other Names: Google DataProc, DataProc
Last Updated: October 2018 - v1.3

Related Technologies

Packages: Apache Hadoop, Apache Hive, Apache Pig, Apache Spark, Apache Tez


See Google Cloud Platform updates