Welcome edit   discuss  

Site

For me, one of the biggest challenges in exploiting data (be that through reporting, big data analytics, machine learning or any one of a dozen similar capabilities) is making sure you have the right data in the right place at the right time to allow you to do this efficiently.

For example, this article from the New York Times and this more recent one from Forbes talks about how the analysis of big data promises unique business insights, but for big-data scientists there is significant manual ‘janitor work’ (up to 80% of their time) required to prepare data, and although the research is sponsored by Data Wrangling tool vendors, the conclusions will resonate with many data scientists. Combine this with the historical cost, delivery speed and agility issues typically associated with delivery data warehouse or reporting solutions, and for me it’s never been clearer that we need to get smarter at how we prepare and manage data.

Part of the solution to this is better Data Engineering, ensuring the processes, tools, technologies, data platforms, regular data feeds and their data preparation jobs are in place to allow the data to be exploited in an efficient, reliable and repeatable way. The aim of this site is therefore to try to offer independent, critical and technical thinking on the technologies, architectural patterns and delivery capabilities that can help address this.

My hope is that this becomes a community owned and authored site of trusted reference material on these topics. To that end, all the content on this site is licensed under the Creative Commons Attribution 4.0 International License and hosted in a public GitHub repository, and there are a set of Discourse forums for discussions. Details of how to contribute and get involved can be found on every page.