The Data Quality Landscape – Q1 2025
Data quality has been an issue since the dawn of computing. As soon as human beings are involved in entering data into a computer system, there is the possibility of that data being incomplete, out of date, mistyped, or just plain wrong. This creates problems for companies that monitor their business performance and want, for example, to understand their customers or how their products are performing. Despite data quality software having existed for many decades, maintaining high data quality has long been an intractable problem: in survey after survey in recent years, only around a third of executives say they completely trust their corporate data. This is partly due to data entry problems and human nature, but also to data being duplicated in different systems across an enterprise. Poor data quality has substantial costs. The US Network for Excellence in Health Innovation estimates that errors in prescriptions alone cost $21 billion and cause 7,000 deaths annually. Data quality issues affect every industry, with consequences that include unnecessarily high operational costs, wasted resources, compliance problems and fines, reputational damage, and poor business decisions based on incorrect or incomplete data.
Data quality solutions offer a range of functionality to improve this state of affairs. Software can profile data to reveal statistical anomalies, detect likely duplicate records, find incomplete records and suggest ways to merge possible duplicates. Data can also be enriched, for example by automatically adding a postal code to a customer record, or in more elaborate ways through the use of third-party datasets: adding the voting district of an address, demographic data for that area, or more esoteric information such as whether an address lies on a flood plain, which is useful for an insurance company. Data quality software has traditionally used rules to determine and improve data quality, and for many years it has employed machine learning to partially automate data quality processes. More recently we have seen greater use of generative AI in some contexts, and the use of various flavours of AI to automate the detection of anomalies and issues in data records, in some cases fixing problems automatically and in others flagging suspect records for human intervention. While the industry has traditionally focused on structured data, there is now greater emphasis on the quality of unstructured data such as documents, emails and spreadsheets. Modern tools frequently use AI to automatically read and classify data from documents and to help monitor its quality.
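To make these categories concrete, the sketch below illustrates, in greatly simplified form, what profiling, rule-based validation and duplicate detection can look like in practice. The field names, validation rules and similarity threshold are illustrative assumptions rather than the behaviour of any particular product.

```python
# Minimal sketch of profiling, rule-based checks and naive duplicate detection.
# Field names (name, email, postal_code), rules and thresholds are assumptions
# chosen for illustration only.
import re
from difflib import SequenceMatcher

customers = [
    {"id": 1, "name": "Jane Smith", "email": "jane@example.com", "postal_code": "SW1A 1AA"},
    {"id": 2, "name": "J. Smith",   "email": "jane@example.com", "postal_code": ""},
    {"id": 3, "name": "Bob Jones",  "email": "bob@example",      "postal_code": "EC1A 1BB"},
]

def profile(records):
    """Report completeness per field - a very basic form of data profiling."""
    fields = records[0].keys()
    return {f: sum(1 for r in records if r[f]) / len(records) for f in fields}

def rule_violations(record):
    """Apply simple validation rules; real tools let users define many such rules."""
    issues = []
    if not record["postal_code"]:
        issues.append("missing postal_code")
    if not re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", record["email"]):
        issues.append("malformed email")
    return issues

def likely_duplicates(records, threshold=0.8):
    """Flag record pairs with identical emails or sufficiently similar names."""
    pairs = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if a["email"] == b["email"] or name_sim >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs

print(profile(customers))            # completeness per field
for rec in customers:
    print(rec["id"], rule_violations(rec))
print(likely_duplicates(customers))  # candidate duplicates for review or merging
```

Commercial tools go far beyond this, of course, with probabilistic matching, survivorship rules for merging records, and machine learning to tune thresholds, but the underlying ideas are the same.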
The data quality market used to be quite fragmented, with different software products for data profiling, merge/matching, data cleansing and data enrichment. Over time these merged into broader data quality suites that could carry out most or all of the necessary data quality functions. In turn, these features have become embedded in broader data management solutions, such as master data management tools or wider platforms that include data integration and data governance capabilities. This consolidation has seen many vendors acquired and their software embedded into broader suites. Nonetheless, new market entrants have also appeared, often touting extensive use of AI. A data observability market has started to emerge alongside the traditional data quality market, with an emphasis on monitoring the health, lineage and performance of data pipelines in an enterprise, including anomaly detection and resolution. This market overlaps significantly with the general data quality market, and it is likely that its vendors will expand into other aspects of data quality, while traditional data quality vendors add more data observability features. This follows the historical pattern of data quality initially being subdivided into separate markets, such as profiling and merge/matching, before consolidating into broader offerings.
Both data quality and data observability are multi-billion-dollar markets; data quality is the more mature of the two, while data observability has shown more rapid growth recently. Estimates vary by analyst and by what exactly is included, but data quality is roughly a $4 billion market growing at around 9% a year, compared with around 12% for the enterprise software market as a whole.
The data quality market has received a fresh injection of interest from the widespread enthusiasm for generative AI. For most enterprise applications of generative AI, it is necessary to supplement the raw large language models (LLMs) with company-specific data, such as customer history, product manuals or specifications, or contract information. LLMs are highly dependent on their training data and on the data they have access to, so if an enterprise supplements an LLM with additional datasets, it is crucial that those datasets are of good quality. In a sense, the success of most generative AI implementations depends heavily on the data on which their answers are based.
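As a simplified illustration of this point, the sketch below shows a basic quality gate applied to documents before they are added to a retrieval index used to supplement an LLM. The Document structure, freshness check and thresholds are hypothetical assumptions; production pipelines apply far richer validation, but the principle of filtering poor-quality data before it reaches the model is the same.

```python
# Illustrative sketch: filter out empty, very short or stale documents before
# they are added to a retrieval index that supplements an LLM. All names and
# thresholds here are assumptions for illustration, not a specific product's API.
from dataclasses import dataclass
from datetime import date

@dataclass
class Document:
    doc_id: str
    text: str
    last_updated: date

def passes_quality_gate(doc: Document, as_of: date,
                        max_age_days: int = 365, min_length: int = 50) -> bool:
    """Reject documents that are empty, too short, or older than max_age_days."""
    age_days = (as_of - doc.last_updated).days
    return bool(doc.text.strip()) and len(doc.text) >= min_length and age_days <= max_age_days

docs = [
    Document("manual-001", "Product manual text " * 10, date(2024, 11, 1)),
    Document("faq-stale",  "Outdated pricing details " * 10, date(2019, 3, 5)),
    Document("empty-doc",  "", date(2025, 1, 10)),
]

clean_docs = [d for d in docs if passes_quality_gate(d, as_of=date(2025, 3, 1))]
print([d.doc_id for d in clean_docs])  # only documents passing the gate are indexed
```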
The diagram that follows shows the major data quality vendors, positioned along three dimensions; these dimensions are defined later in this report.