
The Data Quality Landscape – Q1 2021

Data quality, or rather the lack of it, is a perennial problem that affects us all. We are all familiar with duplicated bills and marketing flyers arriving on our doorsteps, but things can get much more serious than a dodgy name and address on an invoice. The loss of NASA's Mars Climate Orbiter spacecraft due to a unit of measure error is a famous example, but there are many others. One oil company nearly drilled into a pipeline at sea due to a data quality error giving the wrong location, a major spill being averted only because this particular pipeline was down for maintenance and had no oil flowing through it. In the USA, patient misidentification is the third leading cause of preventable death according to the US government, while the Network for Excellence in Health Innovation reckons that prescription errors cost $21 billion a year, along with 7,000 deaths annually.

There are many causes of data quality issues, many of which come down to human nature. People make mistakes when entering data into computer systems, especially if the data being typed in does not directly affect them. Staff care a lot about whether their expenses data is accurate and whether they have been paid this month, but may be less diligent about entering a code in a system that is of use to someone in another department.

Large organizations have dozens if not hundreds of different applications, and many of these hold competing versions of data about products, customers, locations and inventory. It is easy for a salesperson to create a new account, but not so easy to check and be certain whether a record for that customer already exists elsewhere. This is especially true in companies that sell to businesses rather than consumers. Company names may bear little relationship to ownership, so it may not be at all clear that Aera Energy, Equilon Enterprises and Arrow Energy are all part of Shell, but they are.

The software industry has been producing products to help improve data quality for decades, though actual implementation of data quality software is patchy. Many organizations use some form of data quality tool, but few have implemented such tools at source throughout the company and monitor their data quality on a regular basis. Just a third of CEOs actually trust their corporate data, according to one recent KPMG survey. The problem is worsened by the proliferation of data, much of it stored outside direct corporate control, for example in partner databases, bought-in third-party data, or applications residing in the public cloud. In many industries regulators are increasingly scrutinising the quality of data, with potentially serious consequences for major errors.

Much of the data quality industry is focused on name and address validation, which is a relatively simple problem, at least in principle. Software has become very good at picking up common typing errors and likely duplicate entries, and can add a lot of value to a customer name and address. For example, some software can enhance a business address with information about the credit rating of the company at that address, or point out whether that building lies in a flood plain, which is useful to know if you are an insurer.

There are many opportunities for data quality vendors to spread their wings and better tackle problems in other data domains, such as product, asset and inventory data. Data quality software may include profiling, matching, cleansing and enrichment functionality, as well as monitoring changes in data quality once things are set up. Data matching has long used mathematical algorithms, but recently has also adopted artificial intelligence to help. An example is where an expert system is trained by seeing how a human expert deals with potential duplicate data, and then uses that experience to refine the software’s own efforts. Data quality is a problem that is not going away, but there are increasingly sophisticated solutions to help improve it.
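
To make the idea of algorithmic data matching concrete, the following is a minimal sketch of duplicate detection on name and address strings. It is an illustration only and does not represent any vendor's product: it uses the similarity ratio from Python's standard `difflib` module, and the function names, the sample records and the 0.85 threshold are all assumptions chosen for the example. Commercial tools typically combine several matching techniques with reference data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name: str) -> str:
    """Lower-case the text, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two records."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_duplicates(records, threshold=0.85):
    """Compare every pair of records and return those scoring at or above the threshold.

    The threshold is a hypothetical value; in practice it would be tuned on real data.
    """
    pairs = []
    for a, b in combinations(records, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Invented sample records, for illustration only.
records = [
    "Acme Trading Ltd, 12 High Street, Leeds",
    "ACME Trading Ltd., 12 High St, Leeds",
    "Northern Pumps plc, 4 Canal Road, Hull",
]

for first, second, score in likely_duplicates(records):
    print(f"Possible duplicate ({score}): {first!r} / {second!r}")
```

Comparing every pair of records is only practical for small data sets. The pairs flagged here are candidates for review rather than confirmed duplicates, which is where the human expert described above comes in.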

The diagram that follows shows the major data quality vendors, displayed in three dimensions. See later for definitions of these.

It is important to understand that this is a high-level representation of the market, with vendors represented on the chart specialising in different areas and at very different price-points. If you are considering data quality software, it is important to tailor your selection process to the particular needs that you have rather than relying on high-level diagrams such as this. The Information Difference has various detailed models that can assist you in vendor selection and evaluation.

As part of the landscape process, each vendor was asked to provide at least ten reference customers (some vendors provided many times that number), which were surveyed to determine their satisfaction with the data quality software of the vendor. The happiest customers based on this survey were those of Innovative Systems, followed closely by ActivePrime, Datactics, Precisely, Ataccama and Experian. Congratulations to those vendors. A software vendor can show you an impressive slide deck and a dazzling demonstration, but it is actual paying customers who are in the best position to determine the true success of the implementation of software in practice.

Main Vendors

Below is a list of the main data quality vendors.

| Vendor | Brief Description | Website |
|---|---|---|
| ActivePrime | Canada-based vendor of data quality solutions for CRM systems. | www.activeprime.com |
| Address Doctor | Vendor that specialises in providing wide coverage of name and address information; now owned by Informatica. | www.informatica.com/addressdoctor.html |
| Ataccama | Canada-based company with a modern data quality suite. | www.ataccama.com |
| Capscan | London-based provider of address management and data integrity services, now owned by GB Group. | www.gbgplc.com/uk/ |
| Data Mentors | Long-established US data quality vendor. | www.datamentors.com |
| Datactics | UK-based vendor of data quality and matching software to banking, finance, government, healthcare and industry. | www.datactics.com |
| Datiris | Colorado vendor of data profiling technology. | www.datiris.com |
| Datras | Munich-based vendor with wide-ranging data quality functionality. | www.datras.de |
| DQ Global | UK data quality and address verification software. | www.dqglobal.com |
| Experian | UK-based vendor specialising in data quality, including name and address validation, data profiling and data enrichment. | www.edq.com |
| Google | The search engine giant does data quality. | github.com/OpenRefine |
| helpIT/360 Science | US/UK vendor of integrated contact data quality solutions including matching and address validation. | www.helpit.com |
| Human Inference | Dutch data quality vendor. | www.humaninference.com |
| IBM | Data quality software from the industry giant. | www.ibm.com |
| Infogix | Illinois-based vendor specialising in controls and compliance. | www.infogix.com |
| Infoglide | US vendor specialising in identity resolution. | www.infoglide.com |
| Informatica | California-based vendor, a major player in data quality. | www.informatica.com |
| Infoshare | UK data quality vendor specialising in the public sector market. | www.infoshare-is.com |
| Innovative Systems | Long-established data management vendor with extensive offerings based on crowdsourced AI, including data profiling, data quality, address validation/geocoding, 360° view, and risk management solutions. | www.innovativesystems.com |
| Inquera | Israeli company with an approach to product data quality using machine-learning technology based on subject domain experts' knowledge. | www.inquera.com |
| Intelligent Search | Identity management company now with a more general data quality capability. | www.intelligentsearch.com |
| Irion | Italian data quality vendor specialising in financial services. | www.irion.it/index.php/en/ |
| Melissa Data | International vendor offering data quality solutions including verification, matching and enrichment for address, geocoding, identity, corporate, health and other data domains. | www.melissadata.com |
| Microsoft | DQS is the data quality offering of the Redmond software behemoth. | www.microsoft.com |
| Netrics | New Jersey vendor of matching software. Now owned by Tibco. | www.tibco.com/products/automation/application-integration/pattern-matching |
| Oracle | The software giant's data quality offerings are based on the acquisitions of Datanomic and SilverCreek. | www.oracle.com |
| Postcode Anywhere | UK vendor of web-based addressing software. | www.postcodeanywhere.co.uk |
| Precisely | This major player in the market owns what were the data quality offerings of both Pitney Bowes and Syncsort (Trillium). | www.precisely.com |
| SAP | The software giant is a major data quality player. | www.sap.com |
| SAS | One of the leading players in data quality. | www.sas.com/en_us/software/data-management/data-quality.html |
| Satori Software | Seattle-based provider of address management solutions. | www.satorisoftware.com |
| Talend | Open source vendor with wide range of quality functions that are tied to data integration and MDM. | www.talend.com |
| tamr | Vendor that applies machine learning to the data quality problem. | www.tamr.com |
| Uniserv | Large German data quality vendor. | www.uniserv.com |

Other vendors of data quality software include:

| Vendor | Website |
|---|---|
| Ciant | www.ciant.com |
| Data Lever | www.redpoint.net |
| Data Mentors | www.datamentors.com |
| Infosolve | www.infosolvetech.com |
| Intervera | www.intervera.com |
| Ixsight | www.ixsight.com |
| MSI | www.msi.com.au |
| Rever | www.rever.eu |
| TIQ Solutions | www.tiq-solutions.com |
| Winpure | www.winpure.com |
| Wizsoft | www.wizsoft.com |

Research Methodology

The Information Difference Landscape diagram shows three dimensions of a vendor:

  • Market strength
  • Technology
  • Customer base.

“Market strength” is made up of a weighted set of five factors: revenues, growth, financial strength, geographic scope and partner network. Each of these individual elements is scored, the total producing the “market strength” figure. Similarly “technology” is made up of four factors: “technology breadth” (the coverage of the vendors in various data quality areas as illustrated below), the longevity of the software in the market, analyst perception of the product via briefings, and customer feedback from reference customers (this has a high weighting), which we surveyed. In each case the scoring is on a scale of 0 (worst) to 6 (best).
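
As a rough illustration of how a weighted set of factor scores can be combined into a single figure, consider the sketch below. The factor names come from the description above, but the weights are not published here, so every weight shown is a hypothetical placeholder, as are the function name and the sample vendor scores. Dividing by the total weight, so that the result stays on the 0 to 6 scale, is also an assumption rather than a statement of the actual method.

```python
# Hypothetical weights: the real weightings are not given in this document.
MARKET_STRENGTH_WEIGHTS = {
    "revenues": 0.30,
    "growth": 0.20,
    "financial_strength": 0.20,
    "geographic_scope": 0.15,
    "partner_network": 0.15,
}

# Customer feedback is described as having a high weighting, so it is given
# the largest (still hypothetical) weight here.
TECHNOLOGY_WEIGHTS = {
    "technology_breadth": 0.25,
    "longevity": 0.15,
    "analyst_perception": 0.20,
    "customer_feedback": 0.40,
}


def weighted_score(scores: dict, weights: dict) -> float:
    """Combine factor scores (each 0 to 6) into one weighted figure on the same scale."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same factors")
    for factor, value in scores.items():
        if not 0 <= value <= 6:
            raise ValueError(f"{factor} must be between 0 and 6, got {value}")
    total_weight = sum(weights.values())
    return sum(scores[f] * weights[f] for f in weights) / total_weight


# Invented scores for an imaginary vendor, for illustration only.
example_vendor = {
    "revenues": 4,
    "growth": 5,
    "financial_strength": 3,
    "geographic_scope": 4,
    "partner_network": 2,
}

print(round(weighted_score(example_vendor, MARKET_STRENGTH_WEIGHTS), 2))
```

The same function can be applied to the technology factors by passing `TECHNOLOGY_WEIGHTS` and a matching set of scores.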

Vendors were asked to submit answers to various questions via a questionnaire. Vendors were interviewed directly by an analyst and their software demonstrated and assessed. Reference customers were surveyed to give their experience of the software of each vendor. The technology functions which the vendors were asked about are as shown below. These are drawn from the Information Difference vendor functionality model; if you are interested in more detail on this then please contact The Information Difference.

Functional Areas