DW Landscape - The Information Difference Company Limited


The Data Warehouse Landscape - Q4 2011

The Information Difference Landscape is a high-level assessment of the main and most innovative vendors in a market at a point in time.  The diagram shows three dimensions.  The size of the bubble indicates the vendor's customer base, i.e. the number of corporations it has sold to, adjusted for deal size.  The larger the bubble, the broader the customer base, though the bubbles are by no means to scale.  The technology dimension position is derived from a weighted set of scores based on four factors: customer satisfaction as measured by a survey of reference customers, analyst impression of the technology, maturity of the technology, and breadth of the technology in terms of its coverage against our functionality model.  The market strength position is derived from a weighted set of scores based on five factors: data warehouse revenues, growth, financial strength, breadth of partner network and geographic coverage.
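As a rough illustration of how a position on either axis could be computed from weighted factor scores, here is a minimal sketch in Python. The factor names follow the description above, but the weights, the score scale and the example vendor are hypothetical; the actual weights used for the Landscape are not published here.

```python
from typing import Mapping


def weighted_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Return the weighted average of factor scores (weights need not sum to 1)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same factors")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[f] * weights[f] for f in weights) / total_weight


# Hypothetical weights: equal weighting is an assumption, not the real model.
TECHNOLOGY_WEIGHTS = {
    "customer_satisfaction": 1.0,
    "analyst_impression": 1.0,
    "maturity": 1.0,
    "breadth": 1.0,
}
MARKET_WEIGHTS = {
    "revenues": 1.0,
    "growth": 1.0,
    "financial_strength": 1.0,
    "partner_network": 1.0,
    "geographic_coverage": 1.0,
}

# Hypothetical vendor scored on an assumed 0-5 scale.
example_technology_scores = {
    "customer_satisfaction": 4.0,
    "analyst_impression": 3.0,
    "maturity": 5.0,
    "breadth": 4.0,
}

technology_position = weighted_score(example_technology_scores, TECHNOLOGY_WEIGHTS)
```

Dividing by the total weight keeps the result on the same scale as the input scores, so changing the relative weights moves a vendor along the axis without stretching the axis itself.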

 
 

The data warehouse market, following an explosion in the number of vendors over the last few years, went through some consolidation in 2011.  IBM purchased Netezza, Teradata bought Aster Data, while Greenplum had already been swallowed up by EMC.  HP gave up its ill-fated Neoview offering and bought Vertica instead.  There is now a clear divide in scale between the major platform vendors, i.e. Oracle, IBM, Microsoft and Teradata, and a clutch of smaller, specialist vendors.  The major platform vendors have thousands of deployed data warehouse customers, while the next group measure their deployments in the tens or low hundreds.  Oracle remains the largest data warehouse vendor, with its Exadata appliance widely deployed in addition to its main database being used for data warehousing, while Teradata, IBM and Microsoft continue to invest heavily in their platforms.  The sheer size of data warehouses continues to expand rapidly, posing serious performance challenges to conventional approaches.  Some industries, such as telecoms, social media, banking and on-line advertising, now have to deal with extremely large data volumes, and petabyte-scale data warehouse deployments are no longer unheard of.
          
One interesting development is how the row versus columnar argument is changing.  At one time Sybase was a lone prophet for the columnar cause, but many of the recent appliance vendors have combined columnar storage with massively parallel processing (MPP) architectures, demonstrating that this combination can offer appealing performance gains for many analytic use cases.  The traditional vendors, who initially denied such claims, have now responded by incorporating columnar design options to various degrees within their own products, essentially offering columnar as well as row orientation as a customer design decision.  Columnar specialists continue to develop their own technologies, offering customers a richer choice of warehouse design approaches than was previously the case.
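To make the row versus columnar distinction concrete, the following is a minimal Python sketch of the same small table held in both orientations. The table, its column names and the helper functions are hypothetical and do not represent any vendor's storage engine; real products add compression, disk layout and parallelism on top of this basic idea.

```python
from typing import Any, Dict, List

# Row orientation: each record is stored together.
rows: List[Dict[str, Any]] = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column.

    Assumes every record has the same set of keys.
    """
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column orientation: each column's values are stored together.
columns = to_columns(rows)


def total_amount_row_store(records: List[Dict[str, Any]]) -> float:
    """Aggregate by visiting every record, even though only one field is needed."""
    return sum(record["amount"] for record in records)


def total_amount_column_store(cols: Dict[str, List[Any]]) -> float:
    """Aggregate by reading only the single column the query needs."""
    return sum(cols["amount"])
```

An analytic query that aggregates one column over many rows only has to read that column in the columnar layout, which is the main source of the performance gains mentioned above; fetching or updating a whole record is where row orientation tends to be more natural.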

 
Appliances continue to gain ground, whether from new or traditional vendors, offering as they do the advantage of pre-built, pre-tuned hardware and database software, which reduces the need for database tuning.  Given the variety of choices now available, customers with large existing warehouses are well advised to consider whether fresh approaches could bring performance or cost advantages, though the effort of migrating existing data warehouses (especially ones with a lot of proprietary code in stored procedures etc.) will continue to be a barrier in many cases.

              
A major development in 2011 has been the level of attention placed on non-traditional database processing.  The Apache Hadoop framework (with its own Hadoop Distributed File System) has gained considerable interest as a means of tackling analysis of high volumes of certain types of data, such as web content.  Appliance vendors that took an early interest in this include Greenplum (now bought by EMC) and Aster Data (now bought by Teradata).  However, most other vendors are now offering some form of support for Hadoop processing, though often in a limited form (such as a basic connector) at this stage.  This area is still emerging, and is likely to undergo particularly rapid development in the coming years, especially given the ever-increasing data volumes that customers have to cope with.

    
Specialist vendors continue to add value in certain niches.  Kalido and Wherescape offer products that speed up the design and implementation of data warehouses, driven by business models and reducing the need for ETL scripting and manual schema design.  Offering a warehouse via the cloud is another area that is gradually attracting interest, with Kognitio a specialist in this area.  Many of the specialist appliance vendors have built their technology on top of a MySQL interface, reducing the learning curve for customers: Infobright is an example of this approach.

      
In 2011 SAP unveiled a new strategy for analytics.  It had already acquired Sybase (an innovator in columnar databases) but has now added an in-memory database, HANA, to its portfolio, complementing its business warehouse application, and is putting its considerable marketing clout behind it.  Although it is early days for the product, this will clearly have an impact on the market.


It is important to understand that the different offerings target different sub-segments of the data warehouse market.
You may want a data warehouse that can scale to hundreds of terabytes or even petabytes of data, but you may equally have a data warehouse with just a few terabytes of data that still requires rapid analysis, perhaps with very complex data.  With such large data volumes, query performance is also not the only criterion that customers need to consider.  Data warehouses these days can be mission-critical, so a high degree of fault tolerance is expected and offered by most vendors, e.g. if a single server node within an appliance fails then this should not result in the whole data warehouse crashing.  The degree of robustness varies: some vendors do not support an entirely “shared nothing” architecture, while many appliances can require a restart when a new or replacement server node is added, for example.  Hardware continues to improve, and cheaper memory and solid-state disks mean that customers with existing warehouses should carefully consider whether they are taking full advantage of these developments.
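One common way to tolerate the loss of a single node is to keep more than one copy of each data partition on different nodes. The Python sketch below illustrates that idea only; the node names, the round-robin placement scheme and the two-copy default are assumptions for illustration and do not describe how any particular vendor's appliance works.

```python
from typing import Dict, List, Set


def place_partitions(num_partitions: int, nodes: List[str], copies: int = 2) -> Dict[int, List[str]]:
    """Assign each partition to `copies` distinct nodes, round-robin."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    return {
        p: [nodes[(p + i) % len(nodes)] for i in range(copies)]
        for p in range(num_partitions)
    }


def readable_partitions(placement: Dict[int, List[str]], failed: Set[str]) -> Set[int]:
    """Return the partitions that still have at least one copy on a live node."""
    return {
        p for p, owners in placement.items()
        if any(node not in failed for node in owners)
    }


# Hypothetical three-node appliance holding six partitions, two copies each.
placement = place_partitions(6, ["node-a", "node-b", "node-c"])

# With one node down, every partition is still readable from its other copy.
after_one_failure = readable_partitions(placement, {"node-b"})

# With two nodes down, partitions whose copies were both on failed nodes are lost.
after_two_failures = readable_partitions(placement, {"node-a", "node-b"})
```

The trade-off is storage and write cost: each extra copy raises the number of simultaneous node failures the warehouse can survive, at the price of more disk and more work on every load.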

With such different sub-markets it is important that end-users carefully consider the alternatives appropriate to them to match their particular need; high-level overviews of the market, such as this Landscape, cannot capture specific customer requirements, and any technology selection process should be discussed in detail with an analyst.


As part of the research process vendors were asked to provide customer references, who were sent a survey on their satisfaction with the vendor’s products (if the vendor failed to provide sufficient references, a neutral score was assigned).
Based on this survey, the data warehouse vendor with the happiest customers in 2011 was Teradata, followed by Calpont, then IBM, followed by Kognitio and Kalido.

Below is a set of vendors who provide data warehouse technology; some are in addition to those covered in our main diagram.

 

| Vendor | Brief Description | Website |
| --- | --- | --- |
| Algebraix Data | Analytic database running on SMP boxes. | www.algebraixdata.com |
| Calpont | Provides a column-oriented database called InfiniDB. | www.calpont.com |
| Cloudera | Provides a distribution of the Hadoop data management platform. | www.cloudera.com |
| Exasol | German data warehouse appliance vendor. | www.exasol.com |
| Greenplum | Appliance vendor aiming at high-end warehouses, now part of EMC. | www.greenplum.com |
| IBM | IBM's appliance offerings are the IBM Smart Analytics System (based on InfoSphere Warehouse software and DB2) and Netezza. IBM's big data offering is BigInsights. | www.ibm.com |
| Infobright | Provides a column-oriented database. | www.infobright.com |
| Kognitio | Mature data warehouse appliance; also offers its data warehouse as a service. | www.kognitio.com |
| Kalido | Not an appliance, but rather an application to generate data warehouses that adapt to change, running on various database platforms. | www.kalido.com |
| Microsoft | In addition to offering its SQL Server relational database, Microsoft acquired DATAllegro and at the end of 2010 launched its Parallel Data Warehouse based on this technology. | www.microsoft.com |
| MonetDB | MonetDB is an open-source database system for high-performance applications. | monetdb.cwi.nl |
| Oracle | As well as its well-established database, Oracle offers the Exadata warehouse appliance. | www.oracle.com |
| ParAccel | Provides a column-oriented database appliance. | www.paraccel.com |
| Sand | Focuses on allowing customers to effectively retain massive amounts of compressed data in a near-line repository for extended periods. | www.sand.com |
| SAP/Sybase | Sybase was a pioneer in column-oriented analytic database technology and was acquired in mid-2010 by giant SAP. SAP is now offering the in-memory database technology HANA. | www.sap.com |
| 1010 Data | Provides a column-oriented database and web-based data analysis platform. | www.1010data.com |
| Teradata | Arguably the original pioneer of the data warehouse appliance. | www.teradata.com |
| Vertica | Appliance vendor Vertica was purchased by HP in 2011. | www.vertica.com |
| Wherescape | Not an appliance, but a framework for the development and support of data warehouses. | www.wherescape.com |

 
 