Data Quality & AI · 10 April 2026 · 10 min read

Why data quality is the first step to AI readiness for asset managers

Antony Bergeot-Lair
Co-founder & CEO, Datafabric

Every week, another asset manager announces an AI initiative. Board decks are full of it. Vendor pitches are relentless. The pressure to "do something with AI" has never been higher.

But the uncomfortable truth is this: the vast majority of AI projects in financial services fail. Not because the models are wrong. Because the data underneath them is broken.

I’ve seen this pattern repeat across every engagement we’ve done at Datafabric. The firms that succeed with AI are never the ones that started with a model. They started with their data.

The AI Hype vs Reality Gap in Asset Management

According to McKinsey, AI-enabled workflow reimagination could allow asset managers to capture 25–40% of their total cost base in efficiencies. That figure has been cited in countless boardrooms. What gets cited less often is Gartner’s prediction that through 2026, organisations will abandon 60% of AI projects that lack AI-ready data.

The gap between promise and delivery is not a technology problem. It is a data problem.

60% of AI projects will be abandoned due to lack of AI-ready data.
25–40% cost base efficiency potential from AI — but only with trusted data.

Asset managers operate in one of the most data-intensive corners of financial services. On any given day, a mid-sized fund manager with $3 billion in funds under administration (FUA) might pull data from a custodian, a fund administrator, three platform providers, a CRM, two market data vendors, a registry, and internal spreadsheets. That is ten or more sources before anyone has opened a dashboard.

When an AI model is built on top of fragmented, inconsistent, or stale data, it does not produce insights. It produces confident-sounding nonsense — and in a regulated industry, that is worse than having no AI at all.

Why Most AI Projects Fail: They Skip the Data Layer

There is a predictable sequence to how AI initiatives go wrong in asset management:

  1. The executive mandate. The board or CEO says "we need an AI strategy."
  2. The vendor selection. A tool is purchased — often a general-purpose LLM or analytics platform.
  3. The integration attempt. The team tries to connect the tool to internal data. They discover that FUA figures in the custodian do not match the CRM. Platform flow data arrives in four different formats. Client names are spelled three different ways.
  4. The workaround. Someone builds a manual data pipeline in Excel or Python. It works for the demo. It breaks the following week.
  5. The stall. Six months and $200,000 later, the project is quietly shelved.

The mistake is always the same: treating AI as the starting point rather than the outcome of a data capability.

The 6 Dimensions of Data Quality

Data quality isn’t binary. At Datafabric we assess data across six dimensions, each of which needs to meet a minimum threshold before AI can be applied reliably.

Completeness: are all required fields populated?
Accuracy: does the data reflect reality?
Consistency: same entity, same representation
Timeliness: how quickly does data arrive?
Uniqueness: no duplicate records
Freshness: when was it last updated?

Completeness

Are all required fields populated? A fund record missing its APIR code is incomplete. A client record without an adviser relationship is incomplete. Across the asset managers we work with, completeness gaps average 12–18% on initial assessment.
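To make that concrete, a completeness gap can be measured as the share of required fields that are actually populated across a set of records. The sketch below is purely illustrative; the field names and records are hypothetical, not how The Foundry scores completeness:

```python
# Minimal sketch: completeness = share of required fields actually populated.
# Field names ("apir_code", "adviser_id", ...) are hypothetical examples.

REQUIRED_FIELDS = ["fund_name", "apir_code", "adviser_id", "fua"]

records = [
    {"fund_name": "Alpha Fund", "apir_code": "ABC1234AU", "adviser_id": "A-001", "fua": 52_000_000},
    {"fund_name": "Beta Fund",  "apir_code": None,        "adviser_id": "A-002", "fua": 31_000_000},
    {"fund_name": "Gamma Fund", "apir_code": "GHI9012AU", "adviser_id": None,    "fua": None},
]

populated = sum(
    1 for r in records for f in REQUIRED_FIELDS if r.get(f) not in (None, "")
)
total = len(records) * len(REQUIRED_FIELDS)

print(f"Completeness: {populated / total:.0%}, gap: {1 - populated / total:.0%}")
# -> Completeness: 75%, gap: 25%
```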

Accuracy

Does the data reflect reality? If your CRM says a client has $50 million FUA but the custodian says $47 million, which is correct? Accuracy errors compound when data flows downstream into reports, analytics, and AI outputs.

Consistency

Is the same entity represented the same way across systems? “ANZ,” “Australia and New Zealand Banking Group,” and “ANZ Banking Grp” might all refer to the same organisation — but a machine will treat them as three separate entities.
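One way to make those three spellings resolve to one entity is to normalise the name and map known aliases to a canonical record. The sketch below is illustrative only, using a static alias table rather than The Foundry's actual matching logic:

```python
# Minimal sketch: map different spellings of an organisation back to one
# canonical entity before counting or joining on it. Aliases are illustrative.

CANONICAL = {
    "anz": "Australia and New Zealand Banking Group",
    "australia and new zealand banking group": "Australia and New Zealand Banking Group",
    "anz banking grp": "Australia and New Zealand Banking Group",
}

def resolve(name: str) -> str:
    """Return the canonical entity name, or the cleaned input if unknown."""
    key = " ".join(name.lower().replace(",", " ").split())
    return CANONICAL.get(key, name.strip())

raw = ["ANZ", "Australia and New Zealand Banking Group", "ANZ Banking Grp"]
print({resolve(n) for n in raw})
# -> {'Australia and New Zealand Banking Group'}  (one entity, not three)
```

In production this is usually backed by fuzzy matching and external identifiers rather than a hand-maintained table, but the principle is the same: one entity, one representation.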

Timeliness

How quickly does data arrive after the event it represents? If platform flow data arrives with a two-week lag, any analytics built on it are already stale. For distribution teams making decisions about where to travel next week, stale data is useless data. This is why we process platform files as soon as they’re shared with us, rather than waiting until the end of the month.

Uniqueness

Are there duplicate records? Duplicate client records are one of the most common — and most damaging — data quality issues we encounter. One firm we onboarded had 340 duplicate adviser records in their CRM, inflating their prospect count by 22%.
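At its simplest, surfacing duplicates means normalising the fields that identify a person and grouping records that collapse to the same key. The sketch below uses hypothetical adviser records and is not how The Foundry deduplicates:

```python
# Minimal sketch: flag adviser records that collapse to the same
# normalised name + email, which usually indicates duplicates.
from collections import defaultdict

advisers = [
    {"id": 1, "name": "Jane Citizen",  "email": "jane.citizen@example.com"},
    {"id": 2, "name": "CITIZEN, Jane", "email": "Jane.Citizen@example.com"},
    {"id": 3, "name": "John Smith",    "email": "john.smith@example.com"},
]

def key(record: dict) -> tuple:
    # Normalise: lowercase, drop punctuation, sort name tokens so
    # "CITIZEN, Jane" and "Jane Citizen" collapse to the same key.
    tokens = sorted(record["name"].lower().replace(",", " ").split())
    return (" ".join(tokens), record["email"].lower())

groups = defaultdict(list)
for record in advisers:
    groups[key(record)].append(record["id"])

duplicates = {k: ids for k, ids in groups.items() if len(ids) > 1}
print(duplicates)
# -> {('citizen jane', 'jane.citizen@example.com'): [1, 2]}
```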

Freshness

When was the data last updated? A record that was accurate six months ago may not be accurate today. Freshness is particularly critical for market data, flow data, and compliance records.

What “Data Centralisation” Actually Means for an Asset Manager

When we talk about data centralisation, we are not talking about ripping out your existing systems. We are not suggesting you replace your CRM, change your custodian, or migrate off your fund administrator.

Data centralisation means creating a single, trusted layer that sits on top of your existing systems and harmonises the data they produce. Think of it as a translation layer: each source speaks its own language, and the centralised layer ensures they all say the same thing.

For a growing asset manager with $500 million to $10 billion in FUA, this typically means connecting 15 to 20 data sources — custodians, fund administrators, platform providers, CRMs, market data feeds, registries, and internal spreadsheets — into a single environment where data is cleaned, matched, deduplicated, and quality-scored every day.
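As a rough illustration of what that translation layer does, the sketch below harmonises a custodian record and a platform record into one canonical shape. Every field name, unit, and mapping here is hypothetical, not The Foundry's schema:

```python
# Minimal sketch: two sources report the same facts with different field
# names and units; a translation layer maps both into one canonical record.

def from_custodian(row: dict) -> dict:
    return {
        "fund_code": row["APIR"],
        "fua_aud": float(row["MarketValue"]),
        "as_of": row["ValuationDate"],
        "source": "custodian",
    }

def from_platform(row: dict) -> dict:
    return {
        "fund_code": row["fund_apir"],
        "fua_aud": row["balance_cents"] / 100,  # assume this platform reports cents
        "as_of": row["snapshot_date"],
        "source": "platform",
    }

custodian_row = {"APIR": "ABC1234AU", "MarketValue": "47000000", "ValuationDate": "2026-04-09"}
platform_row = {"fund_apir": "ABC1234AU", "balance_cents": 4_700_000_000, "snapshot_date": "2026-04-09"}

for record in (from_custodian(custodian_row), from_platform(platform_row)):
    print(record)
# Both records now share one schema and can be compared or reconciled directly.
```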

How The Foundry Addresses This

The Foundry is Datafabric’s data centralisation and quality engine. It connects to 18+ source types through 47 pre-built integrations, ingests data on a daily cycle, and applies trust metrics across all six quality dimensions.

In practice, each source is connected once, ingested on a daily cycle, then matched, deduplicated, and scored against the six quality dimensions before anyone builds reports or models on top of it.

The result is not just clean data. It is data that the organisation can trust, and that AI models can rely on.

The Compounding Effect: Trusted Data Unlocks Everything Else

This is where the investment in data quality pays off many times over. Once The Foundry has established a trusted data layer, every other capability on the Datafabric platform benefits.

Without the data layer, each of these capabilities would require its own data pipeline, its own reconciliation process, and its own workarounds. With it, they share a single source of truth.

What This Looks Like in Practice

One of our clients, a specialist asset manager, came to us with data spread across multiple systems. Their team was manually reconciling platform flow data, pulling reports from different sources and cross-checking them in Excel.

Before:
Multiple disconnected sources
Manual reconciliation in Excel
No single source of truth

After:
Adviser count accuracy: 74% → 95%
FUM accuracy: 69% → 94%
27 integrations, each processed within an hour

We deployed The Foundry and connected 27 integrations, with each data source processed within an hour of ingestion. The first quality assessment uncovered significant gaps: adviser counts were only 74% accurate against source systems, and FUM figures only 69%. With matching, deduplication, and daily quality scoring in place, those figures rose to 95% and 94% respectively.

That is the real return on data quality: not a dashboard, but trusted data you can actually act on.

Getting Started: It Is Faster Than You Think

The most common objection I hear is “we know our data is a mess, but fixing it will take forever.” It doesn’t have to. The Foundry delivers a baseline data layer in weeks, not months. Our implementation model, which we call Service as Software, means Datafabric handles the integration, configuration, and ongoing operation. Your team doesn’t need to hire a data engineer or manage a pipeline.

If you’re evaluating AI for your asset management business, start with the data. Everything else follows.

Ready to see where your data stands?

Book a 30-minute assessment and we'll walk through your current data landscape, identify the biggest quality gaps, and show you what a trusted data layer looks like in practice.
