A catalogue of data transformation, data platform and other technologies used within the Data Engineering space, organised by vendor
Amazon Web Services
A subsidiary of Amazon.com that provides infrastructure and platform cloud services, including virtual machines and storage infrastructure services plus database, analytics, real-time data processing and data pipeline platform services, with services available in 16 geographical regions. Launched to support internal Amazon.com services in July 2002, with the first public service launched in November 2004, and now comfortably the largest cloud services provider.

Cloudera
Cloudera is a commercial company whose offerings are based around an Apache Hadoop distribution supplemented with a number of commercial components. The distribution is offered as a free Express version (with cut-down versions of some of the commercial components) and as an Enterprise version with an annual subscription fee. They are extremely active in the Apache open source space, with committers on all the technologies they distribute, and have a history of donating to the Apache Software Foundation projects that they have either initiated or acquired. Formed in 2008 by ex-employees of Google, Yahoo, Facebook and Oracle, with Doug Cutting, the original author of Hadoop, joining in 2009 as Chief Architect.

Google Cloud Platform
A cloud computing service operated by Google, with support for infrastructure, storage, databases and analytics services. The first services were available in preview in April 2008.

Hortonworks
Hortonworks is a commercial company focusing on products that support the exploitation of data both at rest and in motion. Their business model is to provide support and professional services for a range of Apache open source technologies, which they package and distribute for free. They are therefore extremely active in the Apache open source space, with committers on all the technologies they distribute, and have a history of donating to the Apache Software Foundation projects that they have either initiated or acquired. Hortonworks was formed in June 2011 by ex-Yahoo employees.
MapR
MapR is a commercial company focusing on products built around its Converged Data Platform, which provides Hadoop compatibility plus NoSQL and streaming data storage capabilities and is bundled with a number of Hadoop open source products. They are active in a number of open source projects, including Apache Drill and Apache Myriad, both of which they founded. MapR was founded in 2009.

Mesosphere
Mesosphere is a commercial company developing the Mesosphere Datacenter Operating System (DC/OS). DC/OS is built around Apache Mesos and is itself an open source project, so they are extremely active in the open source space. Their business model is to sell subscription licenses for an Enterprise version of DC/OS and to provide training and support for DC/OS and partner-supported technologies. Mesosphere was founded in May 2013 by ex-engineers from Twitter and Airbnb.

Microsoft Azure
A cloud computing service operated by Microsoft, with support for infrastructure, storage, databases and analytics services, available in 34 geographical regions. Announced in October 2008, with the first services available in February 2010. Previously known as Windows Azure.

ODPi
ODPi is a non-profit organisation and member of the Linux Foundation that distributes reference specifications for key Hadoop components and APIs to help drive compatibility between Hadoop distributions, sponsoring Apache Bigtop as a reference implementation. Compliance with the specifications is achieved through self-certification against a test suite bundled with Apache Bigtop, both for platform vendors (to ensure any certified application will run on their platform) and for software vendors (to ensure their application will run on any certified platform). The specifications currently cover HDFS, YARN, MapReduce, HCFS and Hive. Currently certified distributions include Altiscale, ArenaData, Hortonworks, IBM and Infosys, but notably not Cloudera or MapR, both of which have publicly stated their objections to the organisation. Certified applications are currently limited to DataTorrent, Apache HAWQ, SAS, Syncsort, WANDisco and a range of IBM technologies. Originally founded in February 2015 as the Open Data Platform, with language suggesting it aimed to build a standard Hadoop core (the ODP core) based on HDFS, Ambari, YARN and MapReduce. Moved under the Linux Foundation in September 2015.

The Apache Software Foundation
The Apache Software Foundation is a non-profit organisation that supports a wide range of open source projects. This support includes providing and mandating a standard governance model (which requires use of the Apache License), holding all trademarks for project names and logos, and providing legal protection to developers. It was founded in 1999 and now oversees nearly 200 projects.