The Data Quality Landscape – Q1 2025

Data quality has been an issue since the dawn of computing. As soon as human beings enter data into a computer system, there is the possibility of that data being incomplete, out of date, mistyped, or just plain wrong. This creates problems for companies that monitor their business performance, for example when trying to understand their customers or the performance of their products. Despite data quality software solutions having existed for many decades, maintaining high data quality has long been an intractable problem. In survey after survey in recent years, only around a third of executives say they completely trust their corporate data. This is partly due to data entry problems and human nature, but also due to data being duplicated in different systems across an enterprise. Poor data quality has substantial costs. The US Network for Excellence in Health Innovation estimates that errors in prescriptions alone cost $21 billion and cause 7,000 deaths annually. Data quality issues apply to all industries, bringing higher than necessary operational costs, wasted resources, compliance problems and fines, reputational damage, and poor business decisions based on incorrect or incomplete data.

Data quality solutions offer a range of functionality to improve this state of affairs. Software can profile data to reveal statistical anomalies, detect likely duplicate records, find incomplete records, and suggest ways to merge possible duplicates. Data can be enriched, for example by automatically adding a postal code to a customer record, but also in more elaborate ways through the use of third-party datasets. Examples include adding the voting district of an address, demographic data for that area, or more esoteric information such as whether an address sits on a flood plain, which is useful for an insurance company. Data quality software has traditionally used rules to determine and improve data quality, and to speed things up it has for many years employed machine learning to partially automate data quality processes. Recently we have seen greater use of generative AI in some contexts, and the use of various flavours of AI to automate the detection of anomalies and issues in data records, in some cases fixing the problems automatically and in other cases flagging suspect records for human intervention. While the industry has traditionally focused on structured data, there is now greater emphasis on data quality in unstructured forms such as documents, emails and spreadsheets. Modern tools frequently use AI to automatically read and classify data from documents and help monitor its quality.
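To make a couple of these functions concrete, the fragment below is a minimal sketch of profiling (counting missing values) and duplicate detection (fuzzy name matching) using only the Python standard library. The records, field names and 0.7 similarity threshold are invented for the illustration and do not reflect any particular product.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Toy customer records; in practice these would come from a CRM extract.
records = [
    {"id": 1, "name": "Acme Ltd",     "postcode": "SW1A 1AA"},
    {"id": 2, "name": "ACME Limited", "postcode": "SW1A 1AA"},
    {"id": 3, "name": "Bravo GmbH",   "postcode": None},
]

# Profiling: count missing values per field to reveal incomplete records.
for field in ["name", "postcode"]:
    missing = sum(1 for r in records if not r[field])
    print(f"{field}: {missing}/{len(records)} missing")

# Duplicate detection: flag pairs whose names are suspiciously similar.
# The 0.7 threshold is an arbitrary choice for this illustration.
for a, b in combinations(records, 2):
    score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    if score > 0.7:
        print(f"Possible duplicate: records {a['id']} and {b['id']} "
              f"(similarity {score:.2f})")
```

Commercial tools apply far more sophisticated techniques, such as phonetic matching and trained models, but the underlying ideas of profiling and probabilistic matching are the same.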

The data quality market used to be quite fragmented, with different software products for data profiling, merge/matching, data cleansing and data enrichment. Over time these merged into broader data quality suites that could carry out virtually all necessary data quality functionality. In turn, these features have become embedded in broader data management solutions, such as master data management tools or wider platforms that include data integration and data governance capabilities. This consolidation has seen many vendors acquired and their software embedded into broader suites. Nonetheless, new market entrants have also appeared, often touting extensive use of AI. A data observability market has started to emerge alongside the traditional data quality market, with an emphasis on monitoring the health, lineage and performance of data pipelines in an enterprise, including anomaly detection and resolution. This market overlaps significantly with the general data quality market, and it is likely that its vendors will expand their functionality into other data quality areas, while traditional data quality vendors add more data observability features. This follows the historical parallel of data quality initially being subdivided into separate markets such as profiling and merge/matching before consolidating into broader offerings.
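As a simple illustration of the kind of automated check that data observability tools run against pipelines, the sketch below flags an anomalous daily row count using a z-score against recent history. The metric, the data and the three-sigma threshold are assumptions chosen for the example, not any vendor's actual method.

```python
import statistics

# Daily row counts from a pipeline's recent runs; today's load is much lower.
history = [10_120, 9_980, 10_340, 10_050, 10_210, 9_890, 10_160]
today = 6_450

mean = statistics.mean(history)
stdev = statistics.stdev(history)

# Flag today's volume if it sits more than three standard deviations from
# the recent mean (a common, if simplistic, anomaly heuristic).
z = (today - mean) / stdev
if abs(z) > 3:
    print(f"Volume anomaly: {today} rows (z-score {z:.1f}); alerting data owner.")
else:
    print(f"Row count {today} is within the expected range.")
```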

Both data quality and data observability are multi-billion-dollar markets; data quality is the more mature of the two, while data observability has shown more rapid growth recently. Estimates vary by analyst and by what exactly is included, but data quality is roughly a $4 billion market growing at around 9% annually, compared with around 12% for the enterprise software market as a whole.

The data quality market has received renewed emphasis from the widespread interest in generative AI. For most enterprise applications of generative AI, the raw large language models (LLMs) must be supplemented with company-specific data, such as customer history, product manuals or specifications, or contract information. LLMs are highly dependent on their training data and on the data they are given access to, so if an enterprise supplements an LLM with additional datasets, it is crucial that those datasets contain good quality data. In a sense, the success of most generative AI implementations depends heavily on the quality of the data on which their answers are based.
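To illustrate the point, the sketch below shows a hypothetical quality gate that screens records before they are added to a retrieval corpus used to supplement an LLM. The documents, checks and freshness cutoff are all invented for the example rather than drawn from any product.

```python
from datetime import date

# Hypothetical documents destined for a retrieval-augmented generation corpus.
documents = [
    {"text": "Model X-200 supports 48V input.", "source": "product_manual",
     "updated": date(2024, 11, 2)},
    {"text": "", "source": "crm_export", "updated": date(2023, 1, 15)},
    {"text": "Contract renews annually.", "source": None,
     "updated": date(2019, 6, 30)},
]

MAX_AGE_DAYS = 730  # illustrative freshness cutoff: two years

def passes_quality_gate(doc: dict) -> bool:
    """Reject empty, unattributed or stale records before indexing."""
    if not doc["text"].strip():
        return False  # incomplete record
    if doc["source"] is None:
        return False  # no provenance, so it cannot be trusted or audited
    if (date.today() - doc["updated"]).days > MAX_AGE_DAYS:
        return False  # likely out of date
    return True

corpus = [d for d in documents if passes_quality_gate(d)]
print(f"Indexed {len(corpus)} of {len(documents)} documents")
```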

The diagram that follows shows the major data quality vendors, displayed in three dimensions. See later for definitions of these.

Data Quality Landscape Chart for 2025

It is important to understand that this is a high-level representation of the market, with vendors represented on the chart specialising in different areas and at very different price points. If you are considering data quality software, it is important to tailor your selection process to the particular needs that you have rather than relying on high-level diagrams such as this. The Information Difference has various detailed models that can assist you in vendor selection and evaluation.

As part of the landscape process, each vendor was asked to provide at least ten reference customers (some vendors provided many times that number), who were surveyed to determine their satisfaction with the data quality software of the vendor. The happiest customers based on this survey were those of Experian, followed by Datactics. Congratulations to those vendors.


Main Vendors

Below is a list of the main data quality vendors.

Vendor | Brief Description | Website
Address Doctor | Vendor specialising in wide coverage of name and address information; now owned by Informatica. | www.informatica.com/addressdoctor.html
Acceldata | Data pipeline observability vendor. | https://www.acceldata.io/
ActivePrime | US-based vendor of data quality solutions for CRM systems. | www.activeprime.com
Anomalo | AI-driven data quality vendor. | https://www.anomalo.com/
Ataccama | Vendor with a modern data quality suite. | www.ataccama.com
Capscan | London-based provider of address management and data integrity services, now owned by GB Group. | www.gbgplc.com/uk
Datactics | UK-based vendor of data quality and matching software for banking, finance, government, healthcare and industry. | www.datactics.com
Datras | Munich-based vendor with wide-ranging data quality functionality. | www.datras.de
DQ Global | UK data quality and address verification software. | www.dqglobal.com
Experian | UK-based vendor specialising in data quality, including name and address validation, data profiling and data enrichment. | https://www.experian.co.uk/business/platforms/aperture-data-studio
Google | The search engine giant's data quality tool, Google Refine, is now the community-maintained OpenRefine. | github.com/OpenRefine
360 Science/helpIT | US/UK vendor of integrated contact data quality solutions including matching and address validation; now owned by Syniti. | www.helpit.com
Human Inference | Dutch data quality vendor. | www.humaninference.com
IBM | Data quality software from the industry giant; Instana is the data observability component. | www.ibm.com
Informatica | California-based data management vendor, a major player in data quality. | www.informatica.com
Infogix | Illinois-based vendor specialising in controls and compliance; now owned by Precisely. | www.infogix.com
Infoglide | US vendor specialising in identity resolution. | www.infoglide.com
Infoshare | UK data quality vendor specialising in the public sector market. | infoshare-is.com
Innovative Systems | Long-established data management vendor with extensive offerings including data profiling, data quality, address validation/geocoding, 360° view, and risk management solutions. | www.innovativesystems.com
Intelligent Search | Identity management company now with a more general data quality capability; now owned by Experian. | www.intelligentsearch.com
Irion | Italian data quality vendor specialising in financial services. | www.irion.it/index.php/en
Melissa Data | US/German global data quality vendor offering address verification, geocoding and matching solutions. | www.melissadata.com
Microsoft | DQS is the data quality offering of the Redmond software behemoth. | www.microsoft.com
MIOsoft | US data quality vendor. | https://miosoft.com/
Monte Carlo | US data quality and observability vendor. | https://www.montecarlodata.com/
Oracle | The software giant's data quality offerings are based on the acquisitions of Datanomic and SilverCreek. | www.oracle.com
Precisely | Precisely is a rebranding of Syncsort, which bought Trillium and later acquired Pitney Bowes' data quality software. | https://www.precisely.com/product/data-integrity/precisely-data-integrity-suite/data-quality
Redpoint | Data integration software with a data quality component. | https://www.redpointglobal.com/
SAP | The software giant is a major data quality player. | www.sap.com
SAS | One of the leading players in data quality, now integrated within their broader data management suite. | www.sas.com/en_us/software/data-management/data-quality.html
Satori Software | Seattle-based provider of address management solutions. | www.satorisoftware.com
Talend | Open-source vendor with a wide range of data quality functions tied to data integration and MDM. | www.talend.com
TAMR | Vendor that applies machine learning to the data quality problem. | www.tamr.com
Uniserv | Large German data quality vendor. | www.uniserv.com

Other vendors of data quality software include:

Ciant | www.ciant.com
Data Lever | www.redpoint.net
Data Mentors | www.datamentors.com
Infosolve | www.infosolvetech.com
Intervera | www.intervera.com
Ixsight | www.ixsight.com
MSI | www.msi.com.au
Rever | www.rever.eu
TIQ Solutions | www.tiq-solutions.com
Winpure | www.winpure.com
Wizsoft | www.wizsoft.com

Research Methodology

The Information Difference Landscape diagram shows three dimensions of a vendor:

  • Market strength
  • Technology
  • Customer base

“Market strength” is made up of a weighted set of five factors: revenues, growth, financial strength, geographic scope and partner network. Each of these elements is scored, and the weighted total produces the “market strength” figure. Similarly, “technology” is made up of four factors: “technology breadth” (the vendor's coverage of the various data quality areas illustrated below), the longevity of the software in the market, analyst perception of the product via briefings, and feedback from the reference customers whom we surveyed (this last factor carries a high weighting). In each case the scoring is on a scale of 0 (worst) to 6 (best).
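For illustration only, the sketch below shows how a weighted score of this kind can be computed. The factor weights are hypothetical, since the actual weightings used in the landscape are not published here; the factor scores use the report's 0 to 6 scale.

```python
# Illustrative "market strength" calculation. Factor scores are on the
# report's 0 (worst) to 6 (best) scale; the weights below are hypothetical,
# as the actual weightings are not published.
factor_scores = {
    "revenues":           4.5,
    "growth":             3.0,
    "financial_strength": 5.0,
    "geographic_scope":   4.0,
    "partner_network":    3.5,
}
weights = {
    "revenues":           0.30,
    "growth":             0.20,
    "financial_strength": 0.20,
    "geographic_scope":   0.15,
    "partner_network":    0.15,
}

# Weighted sum of the factor scores (weights sum to 1.0).
market_strength = sum(factor_scores[f] * weights[f] for f in factor_scores)
print(f"Market strength: {market_strength:.2f} / 6")
```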

Vendors were asked to submit answers to various questions via a questionnaire. Vendors were interviewed directly by an analyst, and their software was demonstrated and assessed. Reference customers were surveyed about their experience of each vendor's software. The technology functions the vendors were asked about are shown below. These are drawn from the Information Difference vendor functionality model; if you are interested in more detail on this, please contact The Information Difference.

Functional Areas