The Mid Week News 11/09/2019

Apologies - we’ve been off on holiday again, hence the radio silence. But we’re back, and with a big old news bump.

Remember, you can get daily news updates from our Twitter feed (@OnDataEng)…

Technology updates (details are on the relevant technology pages):

  • Amazon EMR release 5.26 is out, with even better Spark performance
  • And Amazon have also announced Amazon EMR 6.0, with support for Hadoop 3.1 and running Spark jobs in Docker containers
  • Apache ORC 1.6 is out if you’re looking for columnar data storage on HDFS
  • Greenplum 6.0 is finally out if you’re looking for a mature shared-nothing MPP database
  • Apache CarbonData 1.6 is out if you’re looking for indexed storage of data on HDFS with support for batch inserts and updates
  • Version 0.5 of the NiFi Registry is out if you’re looking to put your NiFi flows under configuration management
  • Version 0.4 of Apache Myriad is out
  • Zenko CloudServer has just released version 8.2

Other technology news:

  • Are you running an Apache Solr version prior to 5.0? If so, it’s vulnerable to an XML bomb attack - link
  • Apache IoTDB, the Apache open source time series database focused on IoT use cases, has its first official release, version 0.8 - link
  • From the ever excellent The Morning Paper, a review of a paper that used “the TPC-H benchmark to assess Redshift, Redshift Spectrum, Athena, Presto, Hive, and Vertica to find out what works best and the trade-offs involved” - link
  • Elastic Cloud is now available on Azure - link
  • Confluent Schema Registry is now available as a cloud service in Confluent Cloud - link
  • Looking for an open source object store? Datanami have the latest on MinIO - link
  • From Datanami, Cloudera’s Q2 results are better than expected - link
  • StreamSets have announced StreamSets Transformer, a graphical tool for creating Apache Spark pipelines that’s part of their DataOps Platform - link
  • Using Google Cloud Storage with Hadoop - Google have a new version of their Cloud Storage Connector for Hadoop out with a bunch of performance improvements and locking for directory modifications - link
  • Apache DolphinScheduler has just been accepted into the Apache Incubator. Originally called Easy Scheduler and donated by Analysys, it’s a tool for distributed ETL scheduling - link