Firstly, let’s review what I said in The Mid Week News - 27/09/2017:
The big news this week is the simultaneous big product announcements from Hortonworks and Cloudera that look like they might be similar capabilities, but I think are probably trying to solve subtly different problems - we’ll revisit these in a few weeks once there’s more information available and do some technology summaries.
Cloudera SDX (Shared Data Experience, coming in CDH 5.13) appears to be trying to bring the “one” data platform experience that you get with an on-premises CDH cluster to the cloud, specifically a persistent shared storage layer with shared metadata, security and governance and a range of workloads on top. That looks different in the cloud - you probably don’t want a persistent Cloudera cluster that you’re paying for by the hour even if you’re not using it - so SDX gives you a shared storage layer using cloud object storage, a shared metadata and management layer, and then the ability to run compute workloads in isolated transient workload clusters managed through Cloudera Altus. In other words, it’s the original sales pitch of a single shared Hadoop data platform, re-imagined for the cloud. More details via a Cloudera VISION blog post and a Cloudera Engineering blog post.
Hortonworks Data Plane is again all about shared metadata, security and data management, but this time across a range of different data platforms - Hadoop, relational databases and your EDW, either on-premises or in the cloud, and for data in motion or at rest. It’s open source and extensible for adding new services, with data lifecycle management being first up, allowing you to replicate, back up & restore and tier your data across your data platforms. It’s another cloud service (because obviously), and they talk about it as a Global Data Management Platform. More details via a Hortonworks blog post.
On the Cloudera SDX side, CDH 5.13 has come and gone, and there’s almost no new information about SDX. The 5.13 announcement name-checks SDX as the “SDX Cloud Reference Architecture”, which I think probably sums up what it is as much as anything, especially given there’s absolutely no reference to SDX in the Cloudera documentation, and there’s nothing on their site beyond the product page and the two blog posts linked above. It feels like this is Cloudera pushing the traditional Hadoop “one platform, lots of different workloads” message, but now applying it to the cloud as well.
Hortonworks, on the other hand, seem to be heading in a slightly different direction with the new Hortonworks DataPlane Service. The premise for this is that it becomes a single place to understand, manage and govern all the data your enterprise holds, wherever it may be - it’s a big ask, but it feels like there’s value there. Saying that it’s early days for this product is an understatement - it’s now had its first generally available release (see this post) and there’s a big pile of documentation on the Hortonworks site, but the functionality at the moment is pretty limited, and there’s no visibility yet of plugin services coming from any of the Hortonworks partners. This is also an interesting change for Hortonworks, in that it’s a commercial managed service offering and not open source software (although there’s no public sign up process yet and the documentation talks about how to install it). It also only works with Ambari managed clusters, and you have to have a SmartSense ID. Which makes you wonder whether this is a response to the challenges of generating revenue from support and consultancy on fully open source software. It will also be interesting to see how this will impact Atlas and Ranger - you could easily see a world where a lot of the end user functionality in these products migrates into the DataPlane service. One to watch, I think.