DQ Landscape

The Data Quality Landscape – Q1 2018

Data quality has been an issue since the first explorers brought back inaccurate maps of distant lands, with such minor issues as California being shown as an island on one antique map. Human nature means that mistakes are made when data is entered into computer systems, and all the validation rules in the world will not prevent them entirely. Consequently, we live in a world where our details are entered into supplier databases multiple times, with inconsistencies such as old addresses and phone numbers, misspellings and missing data. A 2002 PwC study found that almost a quarter of mail is incorrectly addressed, and experienced consultants reckon that a typical materials master file will have errors in 20-30% of entries.

The job of data quality software is to address these data imperfections as far as possible, using algorithms and business rules to identify likely duplicate records, correct obvious misspellings, and complete and consolidate records where possible. The industry has historically focused on customer name and address data, partly because every business has customers, so the problem is common to all businesses, and partly because the problem is relatively tractable. There are plenty of published algorithms (such as Soundex and Metaphone) that can spot likely matches between similar-sounding names, e.g. Smith and Smythe, and more elaborate statistical processes that can be applied to records to predict likely matches or duplicates. These days data quality software goes much further, in some cases providing glossaries of common terms in various languages that enable the software to recognise that “Richard”, “Dick” and “Ricky”, or “Kate”, “Kathie” and “Katherine”, are likely to be the same name. Vendor software can similarly be applied to address data, using postal codes to check addresses and potentially enriching that data with latitude and longitude, or with information such as whether an address is in a certain voting district or even within a flood plain.
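As an illustration of the phonetic matching and nickname recognition described above, here is a minimal Python sketch that combines the classic American Soundex algorithm with a nickname lookup. The NICKNAMES table and the helpers canonical_first_name and likely_same_person are hypothetical names for illustration only; commercial products use far larger, multi-language glossaries and statistical matching rather than a single rule like this.

```python
# Hypothetical nickname glossary: a tiny English-only sample for illustration.
NICKNAMES = {
    "dick": "richard",
    "ricky": "richard",
    "kate": "katherine",
    "kathie": "katherine",
}

# American Soundex digit codes; vowels, y, h and w have no code.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        # Skip a code that repeats the previous one.
        if code and code != prev:
            result.append(code)
        # Vowels reset the previous code; h and w do not.
        if ch not in "hw":
            prev = code
    # Pad with zeros and truncate to one letter plus three digits.
    return ("".join(result) + "000")[:4]


def canonical_first_name(name: str) -> str:
    """Map a known nickname to its formal form (hypothetical helper)."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def likely_same_person(first_a: str, last_a: str,
                       first_b: str, last_b: str) -> bool:
    """Flag a likely match: same canonical first name, same surname Soundex."""
    return (canonical_first_name(first_a) == canonical_first_name(first_b)
            and soundex(last_a) == soundex(last_b))


print(soundex("Smith"), soundex("Smythe"))                        # S530 S530
print(likely_same_person("Dick", "Smith", "Richard", "Smythe"))   # True
```

A rule this simple will produce both false positives and false negatives, which is why real matching engines score candidate pairs and route borderline cases to people for review.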

Over time, data quality software has developed an elaborate ability to “profile” data to spot likely errors, e.g. data that does not conform to an expected pattern or that falls outside an expected range. Some products provide functionality for entering business-specific data quality rules, and for managing the workflow around alerting people to likely data errors and correcting them. Some products can parse textual data, an important feature when handling product rather than customer data, since many product source files are in free text rather than a more structured format. Matching algorithms in particular have grown more sophisticated, though there is usually still a need for human intervention: the consequences of a false positive (or false negative) match in a drug test or a terrorist watch-list alert are very different from those of a mis-addressed piece of direct mail.
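The pattern and range checks mentioned above can be sketched as simple rules run over each record. This minimal Python example assumes records arrive as dictionaries; the field names, the US ZIP code pattern and the 0 to 120 age range are hypothetical rules chosen for illustration, not rules taken from any vendor product.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical pattern: a US ZIP code or ZIP+4.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


@dataclass
class Rule:
    """A hypothetical profiling rule: a field plus a check it must pass."""
    field_name: str
    problem: str
    check: Callable[[object], bool]


RULES = [
    # Pattern check: value must be a string that fully matches the ZIP format.
    Rule("zip", "does not match expected ZIP pattern",
         lambda v: isinstance(v, str) and ZIP_PATTERN.fullmatch(v) is not None),
    # Range check: value must be an integer between 0 and 120 inclusive.
    Rule("age", "outside expected range 0 to 120",
         lambda v: isinstance(v, int) and 0 <= v <= 120),
]


def profile(records: List[Dict[str, object]]) -> List[Tuple[int, str, str]]:
    """Return (row index, field, problem) for every rule a record fails.

    A missing field is read as None, so it fails its rule and is reported too.
    """
    issues = []
    for i, record in enumerate(records):
        for rule in RULES:
            if not rule.check(record.get(rule.field_name)):
                issues.append((i, rule.field_name, rule.problem))
    return issues


records = [
    {"zip": "02139", "age": 34},
    {"zip": "2139", "age": 34},         # ZIP too short
    {"zip": "10001-1234", "age": 212},  # implausible age
]
for issue in profile(records):
    print(issue)
```

In a full product the flagged rows would feed the alerting and correction workflow rather than simply being printed.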

Over the last few years many data quality suites have moved beyond simple profiling or name and address validation and developed broader functionality of the type described above. The last year has seen greater interest in applying machine learning techniques to data quality problems, though this partly reflects the generally increased interest in the field, and some vendors are now applying “artificial intelligence” or “machine learning” labels to their software rather creatively. Nonetheless, genuine developments in machine learning do open up new possibilities for data quality software. Another area of recent interest is applying data quality techniques to “big data”, such as files stored in Hadoop, rather than just to traditional databases. Although much of this data is machine-generated, searching it for meaningful content such as customer and product identifiers is an area in which some vendors have started to develop functionality.
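One simple way to look for product identifiers in machine-generated text is to scan for candidate codes and keep only those that pass a checksum. The sketch below assumes the identifiers are EAN-13 barcodes, whose final digit is a standard check digit, embedded in hypothetical log lines; production big data tooling would run this kind of logic at scale, and customer identifiers would need their own organisation-specific patterns.

```python
import re
from typing import Iterable, Iterator, Tuple

# Exactly 13 digits, not part of a longer run of digits.
EAN13_CANDIDATE = re.compile(r"(?<!\d)\d{13}(?!\d)")


def ean13_is_valid(code: str) -> bool:
    """Verify an EAN-13 check digit (weights 1, 3, 1, 3, ... on the first 12 digits)."""
    digits = [int(c) for c in code]
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def find_product_codes(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, code) for each checksum-valid EAN-13 in free text."""
    for line_no, line in enumerate(lines, start=1):
        for match in EAN13_CANDIDATE.finditer(line):
            if ean13_is_valid(match.group()):
                yield line_no, match.group()


# Hypothetical machine-generated log lines.
log = [
    "2018-01-05 12:00:01 scan ok item=4006381333931 qty=2",
    "2018-01-05 12:00:07 scan ok item=4006381333932 qty=1",  # bad check digit
    "2018-01-05 12:01:44 session=98213377410021894 closed",  # 17 digits
]
print(list(find_product_codes(log)))  # [(1, '4006381333931')]
```

The checksum filter is what separates likely identifiers from the many other long digit strings (timestamps, session IDs) that machine-generated data tends to contain.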

The diagram that follows shows the major data quality vendors, displayed in three dimensions; these are defined in the Research Methodology section.

It is important to understand that this is a high-level representation of the market: the vendors on the chart specialise in different areas and sell at very different price points. If you are considering data quality software, it is important to tailor your selection process to your particular needs rather than relying on high-level diagrams such as this. The Information Difference has various detailed models that can assist you in vendor selection and evaluation.

As part of the landscape process, each vendor was asked to provide at least ten reference customers (some vendors provided many times that number), who were surveyed to determine their satisfaction with the vendor's data quality software. Based on this survey, the happiest customers were those of Datactics, followed by those of ActivePrime, then those of Innovative Systems, Experian and Syncsort (formerly Trillium). Congratulations to those vendors.

Main Vendors

Below is a list of the main data quality vendors.

Vendor | Brief Description | Website
ActivePrime | US-based vendor of data quality software for CRM systems. | www.activeprime.com
Address Doctor | Vendor that specialises in providing wide coverage of name and address information; now owned by Informatica. | www.informatica.com/addressdoctor.html
Ataccama | Prague-based company with a modern data quality suite. | www.ataccama.com
Capscan | London-based provider of address management and data integrity services, now owned by GB Group. | www.gbgplc.com/uk/
Data Mentors | Long-established US data quality vendor. | www.datamentors.com
Datactics | UK-based vendor of data quality and matching software to banking, finance, government, healthcare and industry. | www.datactics.com
Datiris | Colorado vendor of data profiling technology. | www.datiris.com
Datras | Munich-based vendor with wide-ranging data quality functionality. | www.datras.de
DQ Global | UK vendor of data quality and address verification software. | www.dqglobal.com
Experian | UK-based vendor specialising in customer name and address validation, data profiling and data enrichment. | www.edq.com
Google | The search engine giant does data quality. | github.com/OpenRefine
helpIT/360 Science | US/UK vendor of integrated contact data quality solutions, including matching and address validation. | www.helpit.com
Human Inference | Dutch data quality vendor. | www.humaninference.com
IBM | Data quality software from the industry giant. | www.ibm.com
Infogix | Illinois-based vendor specialising in controls and compliance. | www.infogix.com
Infoglide | US vendor specialising in identity resolution. | www.infoglide.com
Informatica | California-based vendor; a major player in data quality. | www.informatica.com
Infoshare | UK data quality vendor specialising in the public sector market. | www.infoshare-is.com
Innovative Systems | Long-established data management vendor with extensive offerings, including data profiling, data quality, address validation/geocoding, 360° view and risk management solutions. |
Inquera | Israeli company with an approach to product data quality using machine-learning technology based on the knowledge of subject domain experts. |
Intelligent Search | Identity management company, now with a more general data quality capability. | www.intelligentsearch.com
Irion | Italian data quality vendor specialising in financial services. | www.irion.it/index.php/en/
Melissa | US/German global data quality vendor offering address verification, geocoding and matching solutions. | www.melissa.com
Microsoft | DQS is the data quality offering of the Redmond software behemoth. | www.microsoft.com
Netrics | New Jersey vendor of matching software, now owned by Tibco. | www.tibco.com/products/automation/application-integration/pattern-matching
Oracle | The software giant's data quality offerings are based on its acquisitions of Datanomic and SilverCreek. | www.oracle.com
Pitney Bowes | Global technology company providing data quality solutions through its Customer Information Management (CIM) unit, part of its Digital Commerce Solutions division. | www.pitneybowes.com/us/customer-information-management/data-quality.html
Postcode Anywhere | UK vendor of web-based addressing software. | www.postcodeanywhere.co.uk
SAP | The software giant is a major data quality player. | www.sap.com
SAS | One of the leading players in data quality. | www.sas.com/en_us/software/data-management/data-quality.html
Satori Software | Seattle-based provider of address management solutions. | www.satorisoftware.com
Syncsort | Owner of Trillium Software, one of the leading data quality vendors, which Syncsort acquired. | www.syncsort.com
Talend | Open source vendor with a wide range of quality functions that are tied to data integration and MDM. | www.talend.com
TAMR | Vendor that applies machine learning to the data quality problem. |
Uniserv | Large German data quality vendor. | www.uniserv.com

Other vendors of data quality software include:

Data Lever | www.redpoint.net
TIQ Solutions | www.tiq-solutions.com


Research Methodology

The Information Difference Landscape diagram shows three dimensions of a vendor:

▪ Market strength
▪ Technology
▪ Customer base.

“Market strength” is made up of a weighted set of five factors: revenues, growth, financial strength, geographic scope and partner network. Each of these individual elements is scored, and the weighted total produces the “market strength” figure. Similarly, “technology” is made up of four factors: “technology breadth” (the vendor's coverage of the various data quality areas listed below), the longevity of the software in the market, analyst perception of the product via briefings, and feedback from the reference customers we surveyed (which has a high weighting). In each case the scoring is on a scale of 0 (worst) to 6 (best).
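The report does not publish its weights, so the following Python sketch uses hypothetical ones purely to show how a weighted set of 0 to 6 factor scores might be combined. It also assumes the combined figure is a weighted average, so that it stays on the same 0 to 6 scale; the factor names and weights are illustrative only, with customer feedback given the largest weight to reflect the text.

```python
from typing import Dict

# Hypothetical weights: the report says customer feedback is weighted highly
# but does not publish the actual weights.
TECHNOLOGY_WEIGHTS: Dict[str, float] = {
    "technology_breadth": 0.25,
    "longevity": 0.15,
    "analyst_perception": 0.20,
    "customer_feedback": 0.40,
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine 0-6 factor scores into a weighted average on the same 0-6 scale."""
    for name in weights:
        if not 0 <= scores[name] <= 6:
            raise ValueError(f"{name} score {scores[name]} is outside 0-6")
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


example = {
    "technology_breadth": 5,
    "longevity": 4,
    "analyst_perception": 4,
    "customer_feedback": 6,
}
print(round(weighted_score(example, TECHNOLOGY_WEIGHTS), 2))  # 5.05
```

Dividing by the sum of the weights means the weights need not add up to exactly 1, and a vendor scoring 6 on every factor still ends up at 6.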

Vendors were asked to answer a questionnaire. Each vendor was also interviewed directly by an analyst, and its software was demonstrated and assessed. Reference customers were surveyed about their experience of each vendor's software. The technology functions that vendors were asked about are listed below. These are drawn from the Information Difference vendor functionality model; for more detail, please contact The Information Difference.

Functional Areas