The Big Data Warehouse Landscape - Q4 2016
The idea of the data warehouse originated in the late 1980s. The databases of that time were designed for transaction processing and struggled to perform well with analytic workloads. Consequently, a separate analytic-only copy of corporate transaction data, called a data warehouse, was frequently implemented. This separate store also made it possible to unravel the multiple, frequently inconsistent, copies of data stored across the various transaction systems such as ERP, CRM and supply chain. In particular, data such as customer, product and material typically appeared in multiple systems and was prone to duplication. Although database technologies have evolved considerably, the data warehouse concept has persisted, with specialist databases springing up that are specifically optimised for analytical workloads.
Data warehouses have to deal with ever-increasing data volumes. In 2003 the largest data warehouse in the world was 30 TB in size, yet there are now many examples of petabyte-sized operational data warehouses, an increase of more than 30-fold in just over a decade. With typical data growth rates of 20-50% annually, traditional databases have begun to creak under the strain. Much of the increase can be attributed to newer data sources, such as web traffic, sensor data and telephone call data, which can be huge in volume. Data warehouses have had to adapt to the rise of this “big data”, which is frequently stored in non-traditional file systems such as Hadoop and may include machine-generated data, text and images. Data warehouses have coped with this either by providing adaptors to big data file systems, or by developing technology that is able to handle such data within existing platforms. One approach has been to store such data in a separate physical store from traditional structured data, but to design an optimiser capable of running queries across these different data stores.
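The growth figures above can be sanity-checked with a little compound-growth arithmetic. The sketch below is illustrative only: it assumes 1 PB is treated as 1,000 TB and that "just over a decade" means the 13 years from 2003 to 2016, neither of which is stated explicitly in the text, and the function names are hypothetical.

```python
import math

def implied_annual_rate(start_tb, end_tb, years):
    """Constant annual growth rate that turns start_tb into end_tb over `years`."""
    return (end_tb / start_tb) ** (1.0 / years) - 1.0

def years_to_reach(start_tb, end_tb, annual_rate):
    """Years of compound growth at `annual_rate` needed to go from start_tb to end_tb."""
    return math.log(end_tb / start_tb) / math.log(1.0 + annual_rate)

START_TB = 30        # largest warehouse in 2003, from the text
TARGET_TB = 1000     # assumption: 1 PB taken as 1,000 TB
SPAN_YEARS = 13      # assumption: 2003 to 2016

rate = implied_annual_rate(START_TB, TARGET_TB, SPAN_YEARS)
print(f"Implied annual growth: {rate:.1%}")

# How long the same jump takes at each end of the quoted 20-50% range
for r in (0.20, 0.50):
    print(f"At {r:.0%} a year: {years_to_reach(START_TB, TARGET_TB, r):.1f} years")
```

Under these assumptions the implied rate falls inside the quoted 20-50% band, so the two figures in the paragraph are consistent with each other.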
Another major change has been the rise of cloud storage as an alternative to on-premise technology within the enterprise. This approach, which promises more scalable platforms that are simpler for the end-user to maintain, is gradually eroding the traditional assumption that data is stored within a company's own physical data centres. Data warehouses are now expected to be deployable in either a private or a public cloud, the latter being exemplified by the arrival of Amazon Redshift and its rivals in the market.
Within the data warehouse world, the largest vendors remain Oracle, IBM, Microsoft and Teradata, with Greenplum (now ultimately owned by Dell) and SAS Institute being other large-scale providers. Assorted niche providers fill out the market. Increasingly, but not exclusively, columnar approaches are used for large-scale data warehouses. In general, columnar databases allow greater compression than row-based databases and offer faster query performance, at the expense of slower load times. Some traditional database vendors now offer columnar options “under the covers” for suitable database workloads.
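One reason columnar storage tends to compress well is that values within a single column are often repetitive, so simple schemes such as run-length encoding can collapse them. The sketch below is a toy illustration, not how any particular vendor's engine works; the sample rows, the column names and the helper functions are all hypothetical.

```python
from itertools import groupby

def to_columns(rows):
    """Pivot a list of equal-length row tuples into a list of column lists."""
    return [list(col) for col in zip(*rows)]

def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

# Hypothetical fact rows: (country, product, quantity)
rows = [
    ("UK", "widget", 3),
    ("UK", "widget", 5),
    ("UK", "gadget", 2),
    ("US", "gadget", 7),
    ("US", "gadget", 1),
]

columns = to_columns(rows)
encoded = [run_length_encode(col) for col in columns]

# Low-cardinality columns shrink to a few runs; the numeric measure does not.
for name, col, enc in zip(("country", "product", "quantity"), columns, encoded):
    print(f"{name}: {len(col)} values -> {len(enc)} runs")
```

The same pivot also hints at the load-time cost mentioned above: each incoming row has to be split apart and appended to several separate column structures rather than written once.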
The data warehouse world shows signs of both consolidation and innovation, as the large established vendors acquire innovative technologies in the race to stay ahead of the challenges of the market. Data warehouses are being pulled in several directions: they must cope not just with greater data volumes but also with non-traditional datatypes and a mix of deployment options, both on-premise and cloud. The significant challenges that result are encouraging the emergence of innovative start-ups that in time may reshape the data warehouse landscape considerably.
The main vendors in the market are summarised in the diagram below.
A significant part of the “technology” dimension scoring is assigned to customer satisfaction, as determined by a survey of vendor customers. In this research cycle the vendor with the happiest customers was Teradata. Our congratulations to them.
(*) In the absence of sufficient completed references, a neutral score was assigned to this factor.
Below is a list of the significant data warehouse vendors.
|1010data||Provides a column-oriented database and web-based data analysis platform.||www.1010data.com|
|Actian||Actian's product is an analytic database on commodity hardware.||www.actian.com|
|Amazon Redshift||Cloud-based data warehouse solution.||www.aws.amazon.com/redshift/|
|Exasol||German data warehouse appliance vendor.||www.exasol.com|
|Greenplum||Appliance vendor aiming at high-end warehouses, now part of Pivotal, a subsidiary of EMC, itself acquired by Dell in a deal announced in 2015 and completed in 2016.||pivotal.io/big-data/pivotal-greenplum|
|HPCC||An open-source, massively parallel platform for big data processing, developed by LexisNexis Risk Solutions.||www.hpccsystems.com|
|IBM||InfoSphere Balanced Warehouse (formerly DB2) is the data warehouse software offering from the industry giant, which also offers two appliances: PureData for Operational Analytics (based on DB2) and PureData for Analytics (powered by Netezza technology).||www.ibm.com|
|Infobright||Provides a columnar-database analytics platform.||www.infobright.com|
|jSonar||Boston-based NoSQL data warehouse vendor.||www.jsonar.com|
|Kalido||Now part of Magnitude Software, Kalido is an application to automate building and maintaining data warehouses that adapt to change, running on various database platforms.||www.kalido.com|
|Kognitio||Mature data warehouse appliance, offering its data warehouse as a service.||www.kognitio.com|
|MarkLogic||Enterprise NoSQL database vendor.||www.marklogic.com|
|Microsoft||As well as offering its SQL Server relational database, Microsoft acquired DATAllegro and at the end of 2010 launched its Parallel Data Warehouse based on this technology.||www.microsoft.com|
|MonetDB||MonetDB is an open-source columnar database system for high-performance applications.||www.monetdb.cwi.nl|
|Neo4j||Open source graph database.||www.neo4j.org|
|Oracle||Database and applications giant with its own data warehouse appliance.||www.oracle.com|
|ParStream||Columnar, in-memory, MPP database vendor aimed at analytic processing.||www.parstream.com|
|Qubole||Markets the Qubole Data Service, which accelerates analytics workloads working on data stored in cloud databases.||www.qubole.com|
|Sand||Focuses on allowing customers to effectively retain massive amounts of compressed data in a near-line repository for extended periods.||www.sand.com|
|SAP/Sybase||Sybase was a pioneer in column-oriented analytic database technology, acquired in mid-2010 by giant SAP. SAP also offers the in-memory database technology HANA.||www.sap.com|
|SAS Institute||Comprehensive data warehouse technology from the largest privately owned software company in the world.||www.sas.com|
|Teradata||Database giant with its own data warehouse solutions.||www.teradata.com|
|Vertica||Appliance vendor Vertica was purchased by HP in 2011.||www.vertica.com|
|WhereScape||Not an appliance, but a framework for the development and support of data warehouses.||www.wherescape.com|
|XtremeData||US vendor that provides a highly scalable cloud database platform.||www.xtremedata.com|