The Data Quality Landscape – Q1 2020
Data quality is an issue that affects everyone. A KPMG survey in 2017 found that three-quarters of CEOs did not trust the data on which they were basing their decisions, while IBM estimates that poor data quality costs the US economy alone a staggering 3.1 trillion dollars every year. The computer systems that we rely on have data entered into them by human beings, and human beings are unreliable. Although most systems attempt, to a degree, to validate data when it is entered, some errors are hard to pick up and slip through. Not only do typos occur and fields get left incomplete, but data is also stored in multiple places. An Information Difference survey a few years ago found that a typical organization had a median of 15 systems generating competing versions of data about customers, products, suppliers, assets and more.
Different departments look at data from different perspectives. The marketing department care about a product’s brand, its price, packaging and whether it is on special offer, but the logistics group care about the product’s dimensions, weight and how many can fit on a pallet. These different requirements inevitably mean that data is scrutinised with different degrees of rigour by different business units, and data errors creep in, undetected. The result is data that is duplicated and has varying levels of accuracy.
The data quality industry has developed to try to improve this picture, focusing initially on the ubiquitous customer name and address record that just about every company has, but extending to other data too. Clever software can detect common misspellings, and is aware that “Andy Hayler” and “Andrew Hayler” living at the same address are probably one and the same person. Data quality software can “profile” files and databases to quickly assess aspects of data quality, and point out some likely issues. Matching algorithms can scan records in different systems and detect likely duplicates based on business rules. The software can be tuned so that almost certain matches and errors are fixed automatically, with borderline cases referred for human assessment. Recently, artificial intelligence has been applied to this too, with software being trained by watching human experts assess which records are duplicates and which are not, and developing the ability to mimic that human behaviour, further automating the process of data cleansing. Existing data can also be enriched via data quality software using third-party data, so for example a customer name and address record may gain value if certain extra attributes are added to it. It might be useful to know the credit rating of a customer, which social demographic they are in (if you are a marketer), whether their house is on a flood plain (if you are an insurer) or which voting district they live in (if you are a pollster).
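The matching behaviour described above (near-certain matches fixed automatically, borderline cases referred to a person) can be illustrated with a minimal Python sketch. It is not how any particular vendor's product works. The nickname table, the two thresholds, the same-address business rule and the function names are all assumptions made for illustration, and the similarity measure is simply the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; commercial tools ship far larger reference data.
NICKNAMES = {"andy": "andrew", "bob": "robert", "liz": "elizabeth"}

# Assumed tuning thresholds, chosen only for this illustration.
AUTO_MERGE = 0.95  # at or above this, treat the pair as a certain match
REVIEW = 0.80      # at or above this (but below AUTO_MERGE), ask a human


def normalise(name: str) -> str:
    """Lower-case a name and expand known nicknames to a canonical form."""
    return " ".join(NICKNAMES.get(word, word) for word in name.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a, b).ratio()


def classify(rec_a: dict, rec_b: dict) -> str:
    """Classify a pair of records as 'auto-merge', 'review' or 'distinct'."""
    # Assumed business rule: only records at the same address can match.
    if rec_a["address"].strip().lower() != rec_b["address"].strip().lower():
        return "distinct"
    score = similarity(normalise(rec_a["name"]), normalise(rec_b["name"]))
    if score >= AUTO_MERGE:
        return "auto-merge"
    if score >= REVIEW:
        return "review"
    return "distinct"


if __name__ == "__main__":
    andy = {"name": "Andy Hayler", "address": "1 High Street"}
    andrew = {"name": "Andrew Hayler", "address": "1 High Street"}
    print(classify(andy, andrew))
```

Raising `AUTO_MERGE` sends more pairs to human review, while lowering it automates more decisions at the risk of false merges, which is the tuning trade-off the paragraph above describes.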
The scope of data quality is having to extend beyond traditional file and database formats, as more and more data is being stored in the cloud, and as increasing volumes of data are being captured in new ways such as from sensors, mobile devices, smart meters and more. We see most data quality vendors now providing cloud as well as on-premise offerings. Government regulation, as well as cost saving, is increasingly a driver for improved data quality. Many industries are heavily regulated, and companies may be asked by regulators to demonstrate that the data they are providing is accurate. Even without regulation, the potential for cost savings in most organizations is considerable, and usually much greater than the cost of buying and implementing data quality software.
Data quality improvement can be a lot more important than reducing the number of mailing errors in an advertising campaign. A 2020 study of an NHS hospital in northern England found that linking medical device data directly to the hospital’s patient record system improved the quality of the data significantly, since previously “members of our ICU team were double-documenting and manually transcribing patient data, meaning there was an increased risk of transcription error.” Quite apart from the obvious benefits of having improved patient data, this enabled doctors to spend more time with their patients, according to the hospital’s ICU manager. Every little improvement in productivity matters at a time when ICUs are stretched by the coronavirus pandemic.
The diagram that follows shows the major data quality vendors, displayed in three dimensions. See later for definitions of these.
It is important to understand that this is a high-level representation of the market, with vendors represented on the chart specialising in different areas and at very different price-points. If you are considering data quality software, it is important to tailor your selection process to the particular needs that you have rather than relying on high-level diagrams such as this. The Information Difference has various detailed models that can assist you in vendor selection and evaluation.
As part of the landscape process, each vendor was asked to provide at least ten reference customers (some vendors provided many times that number), which were surveyed to determine their satisfaction with the data quality software of the vendor. The happiest customers based on this survey were those of Datactics, followed by those of Syncsort and ActivePrime, closely followed by those of Innovative Systems and Melissa Data, then Experian. Congratulations to those vendors.
Below is a list of the main data quality vendors.
|ActivePrime||Canada-based vendor of data quality for CRM systems.||www.activeprime.com|
|Address Doctor||Vendor that specialises in providing wide coverage of name and address information; now owned by Informatica.||www.informatica.com/addressdoctor.html|
|Ataccama||Prague-based company with a modern data quality suite.||www.ataccama.com|
|Capscan||London-based provider of address management and data integrity services, now owned by GB Group.||www.gbgplc.com/uk/|
|Data Mentors||Long-established US data quality vendor.||www.datamentors.com|
|Datactics||UK-based vendor of data quality and matching software to banking, finance, government, healthcare and industry.||www.datactics.com|
|Datiris||Colorado vendor of data profiling technology.||www.datiris.com|
|Datras||Munich-based vendor with wide ranging data quality functionality.||www.datras.de|
|DQ Global||UK vendor of data quality and address verification software.||www.dqglobal.com|
|Experian||UK-based vendor specialising in customer name and address validation, data profiling and data enrichment.||www.edq.com|
|Google||The search engine giant does data quality.||github.com/OpenRefine|
|helpIT/360 Science||US/UK vendor of integrated contact data quality solutions including matching and address validation.||www.helpit.com|
|Human Inference||Dutch data quality vendor.||www.humaninference.com|
|IBM||Data quality software from the industry giant.||www.ibm.com|
|Infogix||Illinois-based vendor specialising in controls and compliance.||www.infogix.com|
|Infoglide||US vendor specialising in identity resolution.||www.infoglide.com|
|Informatica||California-based vendor, a major player in data quality.||www.informatica.com|
|Infoshare||UK data quality vendor specialising in the public sector market.||www.infoshare-is.com|
|Innovative Systems||Long-established data management vendor with extensive offerings including data profiling, data quality, address validation/geocoding, 360° view, and risk management solutions.|
|Inquera||Israeli company with an approach to product data quality using machine-learning technology based on subject domain experts' knowledge.||www.inquera.com|
|Intelligent Search||Identity management company now with a more general data quality capability.||www.intelligentsearch.com|
|Irion||Italian data quality vendor specialising in financial services.||www.irion.it/index.php/en/|
|Melissa||US/German global data quality vendor offering address verification, geocoding and matching solutions.||www.melissa.com|
|Microsoft||DQS is the data quality offering of the Redmond software behemoth.||www.microsoft.com|
|Netrics||New Jersey vendor of matching software. Now owned by Tibco.||www.tibco.com/products/automation/application-integration/pattern-matching|
|Oracle||The software giant's data quality offerings are based on the acquisitions of Datanomic and SilverCreek.||www.oracle.com|
|Pitney Bowes||Pitney Bowes, a global technology company, provides data quality solutions through its Customer Information Management (CIM) unit, which is part of its Digital Commerce Solutions division.||www.pitneybowes.com/us/customer-information-management/data-quality.html|
|Postcode Anywhere||UK vendor of web-based addressing software.||www.postcodeanywhere.co.uk|
|SAP||The software giant is a major data quality player.||www.sap.com|
|SAS||One of the leading players in data quality.||www.sas.com/en_us/software/data-management/data-quality.html|
|Satori Software||Seattle-based provider of address management solutions.||www.satorisoftware.com|
|Syncsort||One of the leading data quality vendors, having absorbed the data quality businesses of Trillium and Pitney Bowes.||www.syncsort.com|
|Talend||Open source vendor with wide range of quality functions that are tied to data integration and MDM.||www.talend.com|
|tamr||Vendor that applies machine learning to the data quality problem.||www.tamr.com|
|Trillium Software||One of the leading data quality vendors, now acquired by Syncsort.||www.trilliumsoftware.com|
|Uniserv||Large German data quality vendor.||www.uniserv.com|
Other vendors of data quality software include:
The Information Difference Landscape diagram shows three dimensions of a vendor:
- Market strength
- Technology
- Customer base
“Market strength” is made up of a weighted set of five factors: revenues, growth, financial strength, geographic scope and partner network. Each of these individual elements is scored, the total producing the “market strength” figure. Similarly “technology” is made up of four factors: “technology breadth” (the coverage of the vendors in various data quality areas as illustrated below), the longevity of the software in the market, analyst perception of the product via briefings, and customer feedback from reference customers (this has a high weighting), which we surveyed. In each case the scoring is on a scale of 0 (worst) to 6 (best).
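As an illustration of how a weighted factor score of this kind can be computed, here is a minimal Python sketch. The actual weights used by The Information Difference are not published in this report, so the weights and individual scores below are hypothetical. The sketch also assumes that the combined figure is a weighted average, which keeps the result on the same 0 to 6 scale as the individual factors.

```python
from typing import Mapping

SCALE_MIN, SCALE_MAX = 0.0, 6.0  # scoring scale stated in the text


def weighted_score(scores: Mapping[str, float],
                   weights: Mapping[str, float]) -> float:
    """Combine per-factor scores into one figure via a weighted average."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same factors")
    for factor, value in scores.items():
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"{factor} score {value} is outside the 0-6 scale")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[f] * weights[f] for f in scores) / total_weight


if __name__ == "__main__":
    # Hypothetical scores and weights; customer feedback is weighted most
    # heavily, as the text says it is, but the exact values are invented.
    technology_scores = {
        "technology breadth": 3.0,
        "longevity": 3.0,
        "analyst perception": 3.0,
        "customer feedback": 6.0,
    }
    technology_weights = {
        "technology breadth": 1.0,
        "longevity": 1.0,
        "analyst perception": 1.0,
        "customer feedback": 3.0,
    }
    print(weighted_score(technology_scores, technology_weights))
```

The same function applies to the five market strength factors by passing a different pair of dictionaries.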
Vendors were asked to submit answers to various questions via a questionnaire. Vendors were interviewed directly by an analyst and their software demonstrated and assessed. Reference customers were surveyed to give their experience of the software of each vendor. The technology functions which the vendors were asked about are as shown below. These are drawn from the Information Difference vendor functionality model; if you are interested in more detail on this then please contact The Information Difference.