So I’ve been looking at self-service data preparation tools this week, and it’s fair to say that once again the topic at hand has turned out to be far bigger than I expected…
Last week we looked at data ingest - getting data onto your analytical platform, from which point you can then conform, standardise, integrate and otherwise prepare it for analytics. This is difficult - the complexity of this preparation, the variety and volume of input data, the potentially widely varying levels of data quality, and the range of analytics you might want to do all make it extremely challenging - just look at every failed or massively overrun BI or analytics project.
This is going to be a really important area for us to look at on this site - perhaps the most important one. And this week I’ve stumbled into it by accident, before I was ready.
Self-service data preparation tools - I thought I’d seen some of these before. They’re basically tools that allow you to do basic ingestion and transformation of ad-hoc data sources, often targeting power users or analysts rather than data engineers.
And that may have been true a few years ago, but it’s clear this is an area that’s seen massive change recently, to the point where there is now a huge range of tools covering capabilities including data cataloging (crawling your data sources and constructing models of how it all fits together, often supported by machine learning), data profiling, test data management and data preparation (targeting analysts and power users with user-friendly and powerful graphical user interfaces, and data engineers via extensions to existing, established data integration and transformation tools), as well as all the follow-up stuff including workflow management, data quality management, metadata management and data governance.
And these tools all cover different capabilities - although there are some standalone tools, there’s a huge range that cover multiple capabilities: traditional data integration tools that have added new functionality, data lake management tools, analytics tools that include data ingest/preparation functionality, semantic web technologies, data warehouse automation tools, and all-in-one, end-to-end analytical tools.
So it’s going to take me a while to get to the bottom of these, and I feel like I have a lot of reading to do.
For now, we’re going to take a three-week break for Christmas, and we’ll be back on the 8th of January. When we come back we might take a quick look at streaming analytics, and maybe do a bit of a review of commercial analyst reporting, and then we’ll dive headlong into this.
Have a good holiday everyone, and we’ll see you soon…