In December 2025 Snowflake Ventures made a strategic (though undisclosed in monetary value) investment in data management vendor Ataccama. This was more interesting than a routine financing round for a software vendor, as it has implications for the broader data quality industry.

There have been issues with data quality ever since the first data was entered into a computer. People make mistakes, and ambiguity creeps in. For something as simple as a customer name and address, there are lots of ways for data to get confused. My name may be entered as “Andy Hayler” or “Andrew Hayler” or “A. Hayler”, even ignoring possible misspellings. Names like “Bob” and “Robert” are interchangeable, while addresses can have missing or incomplete postal codes. Many different systems in an enterprise store customer data, so a particular customer can be referenced in several different systems, each holding potentially different information. Dubious addresses can result in failed deliveries, but data quality errors can have more serious consequences. The Mars Climate Orbiter spacecraft famously failed due to a mismatch between imperial and metric units, a $327 million disaster, while a decimal point error in a calculation left a Spanish submarine almost 100 tons overweight, requiring a rebuild costing €2 billion.
There has been an industry of software vendors addressing data quality issues for a long time: Innovative Systems came out with a data quality tool as far back as 1968. The industry today is a multi-billion-dollar one (perhaps $2-3 billion, depending on the definition of what is included), but it is also quite mature. It grew between 2019 and 2024 at around 5-10% per year, compared to an enterprise software industry average of around 3-7%. These estimates all depend on exactly what you include and exclude, which is why you see different figures from different analyst firms. Still, data quality has not been the raciest of sectors; by comparison, customer relationship management software grew at around 13% from 2019 to 2024. Data quality vendors usually provide a range of functionality, from data profiling and data cleansing to matching of potentially duplicate records, through to data enrichment.
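To make a couple of those categories more concrete, here is a minimal, illustrative sketch in Python of profiling a customer table for missing values and flagging potentially duplicate name records with fuzzy string matching. The records, field names and similarity threshold are invented for illustration; this is not how Ataccama or any other vendor actually implements these steps.

```python
# Illustrative sketch only: toy profiling and duplicate matching.
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical customer records showing the kinds of issues described above.
customers = [
    {"id": 1, "name": "Andy Hayler",   "postcode": "SW1A 1AA"},
    {"id": 2, "name": "Andrew Hayler", "postcode": ""},          # missing postcode
    {"id": 3, "name": "A. Hayler",     "postcode": "SW1A 1AA"},
    {"id": 4, "name": "Robert Smith",  "postcode": "M1 2AB"},
]

# Profiling: how complete is each field?
for field in ("name", "postcode"):
    filled = sum(1 for c in customers if c[field].strip())
    print(f"{field}: {filled}/{len(customers)} populated")

# Matching: flag pairs of records whose names look suspiciously similar.
THRESHOLD = 0.6  # arbitrary cut-off for this example
for a, b in combinations(customers, 2):
    score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    if score >= THRESHOLD:
        print(f"Possible duplicate: {a['name']!r} vs {b['name']!r} (score {score:.2f})")
```

Real products do this at far greater scale and sophistication, typically adding things like phonetic matching, reference data such as postal address files, and rules for merging the duplicates they find.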
The dramatic rise in demand for artificial intelligence (AI) solutions since late 2022 has given a fresh impetus to the data quality market. Large language models (LLMs) are heavily dependent on the data on which they are trained, and for LLMs to be used in corporate environments they need to access corporate systems containing policy manuals, product specifications, customer orders and the like. This has put the spotlight on data quality once again. The wave of shiny new AI projects in corporations can founder on the rocks of poor data quality, amongst other issues. MIT reported in 2025 that corporate AI projects were failing at a rate of 95%. Other estimates are not much better, with various surveys and analyst firms (RAND, Gartner, etc.) putting AI project failure rates at around 85%. There are probably many reasons for this, but data quality is certainly one of them. One survey blamed 68% of AI project failures on data quality, and another 2025 survey found that 81% of AI projects encountered significant data quality issues.
This new spotlight on data quality as a barrier to AI has rekindled interest in the formerly rather low-key data quality software sector. It was highlighted this week by the Snowflake Ventures investment in Ataccama, which grew up as a data quality vendor, though it later added master data management and data governance capabilities to its platform. The data warehouse industry, where Snowflake is the metaphorical 800-pound gorilla, has long regarded data quality as someone else’s problem. Warehouses typically offer only basic validation, such as “not null” constraints in SQL or uniqueness checks through data metric functions, but that is about as far as they go. The Ataccama investment signals a renewed interest in data quality from Snowflake, as it increasingly encounters AI-related projects within its customer base, and the announcement of the deal specifically referenced data quality for AI as a driver.

Harsha Kapre, head of Snowflake Ventures, said: “Ataccama empowers enterprises to automate quality, add context, and resolve issues before they reach downstream workloads. Their agentic platform amplifies the reliability and performance of the AI Data Cloud, helping customers accelerate their AI initiatives with confidence.” The relationship extends beyond investment to integration between Snowflake and Ataccama: a centralised library of data quality rules can be reused across systems and data pipelines, and Ataccama is used for checking the quality of data within Snowflake’s Cortex AI product, which allows AI models to run directly within Snowflake for tasks like text generation and classification.
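For comparison, the basic validation referred to above amounts to little more than the following sketch, written here in Python rather than in SQL or Snowflake’s data metric function syntax; the table, column names and sample rows are hypothetical.

```python
# Illustrative only: null-count and duplicate-count style checks over rows
# fetched from a warehouse table, expressed as a tiny Python routine.
from collections import Counter

def basic_checks(rows, not_null_cols, unique_cols):
    """Report simple completeness and uniqueness metrics for a list of dict rows."""
    report = {}
    for col in not_null_cols:
        report[f"{col}_nulls"] = sum(1 for r in rows if r.get(col) in (None, ""))
    for col in unique_cols:
        counts = Counter(r.get(col) for r in rows)
        report[f"{col}_duplicates"] = sum(n - 1 for n in counts.values() if n > 1)
    return report

# Hypothetical sample: one missing email, one duplicated customer_id.
rows = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 2, "email": "b@example.com"},
]
print(basic_checks(rows, not_null_cols=["email"], unique_cols=["customer_id"]))
# {'email_nulls': 1, 'customer_id_duplicates': 1}
```

Checks like these catch obvious gaps, but they stop well short of the profiling, matching and enrichment that dedicated data quality tools provide.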
This deal may signal a new level of interest in data quality companies as acquisition targets for other enterprise software companies. Qlik acquired Talend in mid-2023 and Collibra bought OwlDQ back in 2021, so the Snowflake investment in Ataccama is not unique, but the sector has otherwise seen limited merger and acquisition activity in recent years. Now that AI has shone a spotlight on data quality, we can expect more activity of this kind. There are still plenty of data quality vendors out there as possible targets, so I anticipate further acquisitions in the sector in the coming year.







