DQ Landscape

The Data Quality Landscape – Q1 2017

Data quality is not a new subject: we are all familiar with letters and forms being sent to outdated addresses, sometimes several times over. Some data quality issues have far more serious consequences than misrouted mail or delays in processing utility or bank forms. The Mars Climate Orbiter spacecraft was lost because of a mismatch between imperial and metric units in its thruster data. Between these extremes lie embarrassing and costly data quality problems. One well-known phone company ordered a huge consignment of display boxes for a new brand of phone, and only when lorries started to roll into warehouses was it discovered that the dimensions of the boxes had been specified in cm rather than mm, so the boxes had been manufactured exactly ten times too large in each dimension.

There are many aspects of data quality. Accuracy is one, but so are completeness (is all the data that you need present?), currency (is it sufficiently up to date?) and relevance.

As long as human beings are involved in capturing data, there will be issues with data quality, and it was natural that technology be brought to bear on this problem. Data quality technology typically provides a range of capabilities. These run from “profiling” of data, where an initial assessment is made to understand the scale of the problem, through defining business rules for correcting common errors, to matching potential duplicates using algorithms. Beyond this, some data may be enriched: a precise geocode can be added to an address, and supplementary information can be attached to the original data. For example, a tool might be given a business address and recognise that this address is owned by a certain company where a number of staff are employed, and even link back to business databases to tell you about the size of that company or its credit rating. A residential address might be recognised as being in a certain voting district, or even within a flood plain, which is useful if you are an insurance company.

Data quality is not just a one-off exercise. Once you have made an effort to improve data quality, you will want to monitor it on a regular basis and see whether trends appear over time. Modern data quality tools can do all these things and more.
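
As a rough illustration of the profiling, business-rule and duplicate-matching capabilities described above, the following Python sketch works on three invented customer records. The field names, the postcode rule, the "Ltd" expansion and the 0.85 similarity threshold are all assumptions made for this example; commercial tools use far more sophisticated techniques.

```python
from difflib import SequenceMatcher

# Hypothetical customer records; names, fields and values are invented.
records = [
    {"id": 1, "name": "Acme Ltd", "postcode": "SW1A 1AA"},
    {"id": 2, "name": "ACME Limited", "postcode": "sw1a1aa"},
    {"id": 3, "name": "Globex plc", "postcode": None},
]


def profile(rows, field):
    """Profiling: fraction of rows in which the field is populated."""
    filled = sum(1 for r in rows if r.get(field))
    return filled / len(rows)


def standardise_postcode(value):
    """Business rule (assumed): upper-case, one space before the last three characters."""
    if not value:
        return value
    compact = value.replace(" ", "").upper()
    return compact[:-3] + " " + compact[-3:]


def normalise_name(name):
    """Lower-case the name and expand one common abbreviation before comparing."""
    words = name.lower().split()
    return " ".join("limited" if w in ("ltd", "ltd.") else w for w in words)


def likely_duplicates(rows, threshold=0.85):
    """Matching: flag pairs with similar normalised names and identical postcodes."""
    pairs = []
    for i, a in enumerate(rows):
        for b in rows[i + 1:]:
            score = SequenceMatcher(
                None, normalise_name(a["name"]), normalise_name(b["name"])
            ).ratio()
            if score >= threshold and a["postcode"] == b["postcode"]:
                pairs.append((a["id"], b["id"]))
    return pairs


completeness = profile(records, "postcode")  # two of three postcodes present
cleaned = [{**r, "postcode": standardise_postcode(r["postcode"])} for r in records]
duplicates = likely_duplicates(cleaned)      # records 1 and 2 are flagged
```

Running the same profiling function on a schedule and storing the results is one simple way to watch quality trends over time.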

The technologies on the market have a wide range of capabilities. Some focus on name and address validation for mass mailing, using algorithms to detect common misspellings of names and addresses, or having glossaries that understand common synonyms for names e.g. “Bob” and “Robert”. Others have much broader functionality and, naturally enough, the more functional products may have a much higher price tag than the more basic ones. Although name and address is the most common area addressed in data quality, product data is another broad domain requiring different approaches, and indeed business rules can be defined and applied to help with data quality across a wide range of data domains. Some data quality products are stand-alone, while others link to separate master data or data governance tools with varying degrees of smoothness.
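
The glossary and misspelling approach to name matching mentioned above can be sketched in a few lines of Python. The nickname table here is a tiny hypothetical sample, and the 0.8 similarity cutoff is an assumed value; real products ship with much larger, locale-aware reference data.

```python
from difflib import get_close_matches

# Hypothetical glossary mapping nicknames to a canonical given name.
NICKNAMES = {
    "bob": "robert",
    "rob": "robert",
    "bill": "william",
    "liz": "elizabeth",
}
KNOWN_NAMES = sorted(set(NICKNAMES.values()))


def canonical_given_name(raw):
    """Resolve a given name via the glossary, then via fuzzy spelling match."""
    name = raw.strip().lower()
    if name in NICKNAMES:          # synonym lookup, e.g. "Bob" -> "robert"
        return NICKNAMES[name]
    if name in KNOWN_NAMES:        # already canonical
        return name
    # Misspelling detection: closest known name above the assumed cutoff.
    close = get_close_matches(name, KNOWN_NAMES, n=1, cutoff=0.8)
    return close[0] if close else name
```

With this table, "Bob", "Robert" and the misspelling "Robrt" all resolve to the same canonical value, so records carrying any of them can be compared as potential matches.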

The diagram that follows shows the major data quality vendors, displayed in three dimensions. These dimensions are defined in the Research Methodology section below.

It is important to understand that this is a high-level representation of the market, with the vendors on the chart specialising in different areas and at very different price points. If you are considering data quality software, it is important to tailor your selection process to your particular needs rather than relying on high-level diagrams such as this. The Information Difference has various detailed models that can assist you in vendor selection and evaluation.

As part of the landscape process, each vendor was asked to provide at least ten reference customers (some vendors provided over 50 references), which were surveyed to determine their satisfaction with the data quality software of the vendor.  The happiest customers based on this survey were those of Innovative Systems, followed by Datactics and HelpIT, then those of Experian and Trillium. Congratulations to those vendors.

Main Vendors

Below is a list of the main data quality vendors.

Vendor | Brief Description | Website
ActivePrime | US-based vendor of data quality for CRM systems. | www.activeprime.com
Address Doctor | Vendor that specialises in providing wide coverage of name and address information; now owned by Informatica. | www.informatica.com/addressdoctor.html
Ataccama | Prague-based company with a modern data quality suite. | www.ataccama.com
Capscan | London-based provider of address management and data integrity services, now owned by GB Group. | www.gbgplc.com/uk/
Data Mentors | Long-established US data quality vendor. | www.datamentors.com
Datactics | UK-based vendor of data quality and matching software to banking, finance, government, healthcare and industry. | www.datactics.com
Datiris | Colorado vendor of data profiling technology. | www.datiris.com
Datras | Munich-based vendor with wide-ranging data quality functionality. | www.datras.de
DQ Global | UK data quality and address verification software. | www.dqglobal.com
Experian | UK-based vendor specialising in customer name and address validation, data profiling and data enrichment. | www.edq.com
Google | The search engine giant does data quality. | github.com/OpenRefine
360 Science/helpIT | US/UK vendor of integrated contact data quality solutions including matching and address validation. | www.helpit.com
Human Inference | Dutch data quality vendor. | www.humaninference.com
IBM | Data quality software from the industry giant. | www.ibm.com
Infogix | Illinois-based vendor specialising in controls and compliance. | www.infogix.com
Infoglide | US vendor specialising in identity resolution. | www.infoglide.com
Informatica | California-based vendor, a major player in data quality. | www.informatica.com
Infoshare | UK data quality vendor specialising in the public sector market. | www.infoshare-is.com
Innovative Systems | Long-established data management vendor with extensive offerings including data profiling, data quality, address validation/geocoding, 360° view, and risk management solutions. |
Inquera | Israeli company with an approach to product data quality using machine-learning technology based on subject domain experts' knowledge. |
Intelligent Search | Identity management company now with a more general data quality capability. | www.intelligentsearch.com
Irion | Italian data quality vendor specialising in financial services. | www.irion.it/index.php/en/
Melissa Data | US/German global data quality vendor offering address verification, geocoding and matching solutions. | www.melissadata.com
Microsoft | DQS is the data quality offering of the Redmond software behemoth. | www.microsoft.com
Netrics | New Jersey vendor of matching software. Now owned by Tibco. | www.tibco.com/products/automation/application-integration/pattern-matching
Oracle | The software giant's data quality offerings are based on the acquisitions of Datanomic and SilverCreek. | www.oracle.com
Pitney Bowes | Pitney Bowes, a global technology company, provides data quality solutions through its Customer Information Management (CIM) unit, which is part of its Digital Commerce Solutions division. | www.pitneybowes.com/us/customer-information-management/data-quality.html
Postcode Anywhere | UK vendor of web-based addressing software. | www.postcodeanywhere.co.uk
SAP | The software giant is a major data quality player. | www.sap.com
SAS | One of the leading players in data quality. | www.sas.com/en_us/software/data-management/data-quality.html
Satori Software | Seattle-based provider of address management solutions. | www.satorisoftware.com
Talend | Open source vendor with wide range of quality functions that are tied to data integration and MDM. | www.talend.com
TAMR | Vendor that applies machine learning to the data quality problem. |
Trillium Software | One of the leading data quality vendors, now acquired by Syncsort. | www.trilliumsoftware.com
Uniserv | Large German data quality vendor. | www.uniserv.com

Other vendors of data quality software include:

Data Lever | www.redpoint.net
Data Mentors | www.datamentors.com
TIQ Solutions | www.tiq-solutions.com


Research Methodology

The Information Difference Landscape diagram shows three dimensions of a vendor:

▪ Market strength
▪ Technology
▪ Customer base.

“Market strength” is made up of a weighted set of five factors: revenues, growth, financial strength, geographic scope and partner network. Each of these individual elements is scored, the total producing the “market strength” figure. Similarly “technology” is made up of four factors: “technology breadth” (the coverage of the vendors in various data quality areas as illustrated below), the longevity of the software in the market, analyst perception of the product via briefings, and customer feedback from reference customers (this has a high weighting), which we surveyed. In each case the scoring is on a scale of 0 (worst) to 6 (best).
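
To show how a weighted set of factor scores can be combined into a single figure, here is a minimal Python sketch for “market strength”. The weights and the example vendor scores are hypothetical, since the actual weightings are not stated here; dividing by the total weight so that the result stays on the 0 to 6 scale is also an assumption of this sketch.

```python
# Hypothetical weights for the five market strength factors (not the real ones).
MARKET_STRENGTH_WEIGHTS = {
    "revenues": 0.25,
    "growth": 0.20,
    "financial_strength": 0.20,
    "geographic_scope": 0.15,
    "partner_network": 0.20,
}


def weighted_score(factor_scores, weights):
    """Weighted average of factor scores, each on the 0 (worst) to 6 (best) scale."""
    if set(factor_scores) != set(weights):
        raise ValueError("factor scores and weights must cover the same factors")
    for name, score in factor_scores.items():
        if not 0 <= score <= 6:
            raise ValueError(f"{name} score {score} is outside the 0-6 scale")
    total_weight = sum(weights.values())
    # Normalise by the total weight so the result stays between 0 and 6.
    return sum(factor_scores[n] * weights[n] for n in weights) / total_weight


# Invented scores for an imaginary vendor.
example_vendor = {
    "revenues": 4,
    "growth": 3,
    "financial_strength": 5,
    "geographic_scope": 2,
    "partner_network": 3,
}
market_strength = weighted_score(example_vendor, MARKET_STRENGTH_WEIGHTS)
```

The same function could be reused for the four “technology” factors by passing a different weight table, with a larger weight on customer feedback.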

Vendors were asked to submit answers to various questions via a questionnaire. Vendors were interviewed directly by an analyst and their software demonstrated and assessed. Reference customers were surveyed to give their experience of the software of each vendor. The technology functions which the vendors were asked about are as shown below. These are drawn from the Information Difference vendor functionality model; if you are interested in more detail on this then please contact The Information Difference.

Functional Areas