Data governance - laying the foundations for effective AI applications

By Yiannis Antoniou, Head of Data, AI, and Analytics at Lab49.

While AI in one way or another has been integral to financial services for some time, it’s now hard to think of any area of the industry the technology hasn’t touched. AI algorithms have evolved from simple rule-based fraud alerts to influencing everything from billion-dollar investment strategies to customer service interactions.

Although AI holds significant promise for boosting efficiency and scaling operations, its increasing use comes with a rising demand for high-quality, well-governed data that is reliable and easy to discover. Such data is integral to developing accurate, reliable, and fair AI models. Without it, organisations risk deploying AI applications that have serious negative implications for financial institutions and their customers.

Building on solid foundations

AI systems are only as good as the data they have been trained on: their pattern identification and prediction capabilities rely on curated, high-quality data, ideally in very large volumes. Unfortunately, data can be flawed in many ways. Inaccuracy, incompleteness, bias, discrimination, and many other ethical factors are all concerns that need to be systematically addressed. AI models will produce outputs that reflect and amplify these data flaws.

This is because AI models don’t inherently understand truth. They simplify, spot, and magnify patterns in the data without assessing whether those patterns are fair or accurately represent reality. If an AI model is trained on a dataset that disproportionately represents one demographic, it could unfairly favour that group in its decisions. These models naturally generalise the patterns they see, unaware of biases hidden in the data.
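As a rough illustration of how skew of this kind might be surfaced before training, the sketch below measures how much of a dataset each group makes up and how often each group received a favourable outcome. The field names (`group`, `approved`), the toy records, and the helper functions are all hypothetical, and the ratio is only one simple indicator rather than a complete fairness assessment.

```python
from collections import Counter


def group_rates(records, group_key="group", outcome_key="approved"):
    """Return {group: (share_of_dataset, approval_rate)} for a list of dicts."""
    totals = Counter(r[group_key] for r in records)
    approved = Counter(r[group_key] for r in records if r[outcome_key])
    n = len(records)
    return {g: (totals[g] / n, approved[g] / totals[g]) for g in totals}


def approval_ratio(rates):
    """Lowest group approval rate divided by the highest; 1.0 means parity."""
    approval = [rate for _share, rate in rates.values()]
    highest = max(approval)
    if highest == 0:
        return 1.0  # nobody was approved, so there is no gap between groups
    return min(approval) / highest


# Toy historical data (invented for illustration): group A dominates the
# dataset and was approved far more often than group B.
records = (
    [{"group": "A", "approved": True}] * 60
    + [{"group": "A", "approved": False}] * 20
    + [{"group": "B", "approved": True}] * 8
    + [{"group": "B", "approved": False}] * 12
)

rates = group_rates(records)
ratio = approval_ratio(rates)

for group, (share, rate) in sorted(rates.items()):
    print(f"group {group}: {share:.0%} of records, {rate:.0%} approved")
print(f"approval ratio (lowest / highest): {ratio:.2f}")
```

A model trained on records like these would see group A four times as often as group B and would learn the lower approval rate for group B as if it were a legitimate pattern, which is why a check of this kind belongs before training rather than after deployment.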

But poor data quality isn’t just a technical concern; it can have real-world consequences for fairness. For example, an AI system used for credit scoring might unjustly deny loans to qualified applicants if it’s trained on biased historical data. The controversy surrounding the Apple Card issued by Goldman Sachs is one notable example where an algorithm was accused of gender bias, allegedly offering lower credit limits to women with higher credit scores. Although Apple and Goldman were cleared by the New York Department of Financial Services, the companies still faced criticism for lack of transparency, and the regulators called for laws to be tightened. Clearly, biased algorithms may expose a financial institution to regulatory scrutiny and reputational harm.

Challenges in achieving high-quality data

Data fragmentation is a big challenge for large organisations, especially financial institutions. Even though they hold vast volumes of proprietary data, this information is spread across different systems and departments, locked behind silos that prevent the creation of the unified, high-quality datasets that AI models need to perform well. This data also usually lacks clear definitions, ownership, and overall governance, which makes it difficult for financial institutions to quickly find, assess, and use the right data to feed AI models.

And generative AI comes with its own set of hurdles. The large language models in use today are typically trained on vast amounts of unstructured data sourced from web scraping and public sources, which can be low-quality or skewed, or contain prejudiced content. This can lead to "hallucinations" where the AI produces outputs that sound convincing but are factually incorrect. Add in the positive feedback loops that can arise in generative AI systems, where outputs influence future inputs, and small biases can quickly spiral out of control, potentially leading to poor outcomes in financial services use cases such as investment research or wealth management.

The advent of generative AI has heightened existing concerns regarding algorithmic risks in financial systems. A salient historical precedent is the 2012 Knight Capital incident, where a solitary algorithmic error resulted in a $440 million loss within a mere 45 minutes. Contemporary financial markets exhibit significantly greater complexity, and the integration of generative AI, characterised by its inherent unpredictability and non-deterministic nature, further amplifies potential risks.

In light of these evolving challenges, financial institutions must prioritise robust data governance frameworks and stringent data quality processes. These measures are critical to ensure the reliability, accuracy, and intended functionality of AI systems deployed in high-stakes financial environments.

Establishing robust data governance

To address these concerns, financial institutions must make data governance an organisational priority. In practical terms, this means defining clear roles, responsibilities, and processes for data management, ownership, usage, and sharing across the organisation. Establishing clarity in these areas can encourage the necessary cross-functional collaboration to break down internal silos impeding the use of data for AI systems.

What’s more, accountability and clear ethical guidelines can help mitigate the risk of generative AI systems becoming polluted by irrelevant or incorrect data. On a structural level, financial institutions that deploy secure and private AI models can be more confident that their AI systems are based on relevant and high-quality client data. By building in a way that aligns with responsible and ethical AI principles, they can also bolster the trustworthiness and credibility of their offerings to clients.

On an ongoing basis, investing in regular cleansing, curating, and reviewing of data, through regular assessments and audits with human oversight, goes a long way toward combating hallucinations. As the volume and variety of data increase, identifying and correcting data quality issues can become more challenging. Human experts, who are closer to their clients than algorithms are, can naturally sense-check for relevance and make judgment calls that automated systems might miss. Complete reliance on automated processes without humans in the loop may allow errors and biases to go unnoticed.
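One way to picture this combination of automated checks and human oversight is a routine audit that never silently fixes or drops data, but instead routes anything suspicious to a reviewer along with the reasons. The sketch below is a minimal example under stated assumptions: the schema (`client_id`, `country`, `annual_income`), the rules, and the sample records are hypothetical, and a real institution would define its own checks.

```python
from dataclasses import dataclass, field

# Hypothetical schema: fields every client record is assumed to need.
REQUIRED_FIELDS = ("client_id", "country", "annual_income")


@dataclass
class QualityReport:
    passed: list = field(default_factory=list)
    needs_review: list = field(default_factory=list)  # (record, reasons) pairs


def check_record(record):
    """Return a list of human-readable reasons this record looks suspect."""
    reasons = []
    for name in REQUIRED_FIELDS:
        if record.get(name) in (None, ""):
            reasons.append(f"missing {name}")
    income = record.get("annual_income")
    if isinstance(income, (int, float)) and income < 0:
        reasons.append("negative annual_income")
    return reasons


def audit(records):
    """Split records into clean ones and ones queued for human review."""
    report = QualityReport()
    seen_ids = set()
    for record in records:
        reasons = check_record(record)
        client_id = record.get("client_id")
        if client_id is not None and client_id in seen_ids:
            reasons.append("duplicate client_id")
        seen_ids.add(client_id)
        if reasons:
            report.needs_review.append((record, reasons))
        else:
            report.passed.append(record)
    return report


# Invented sample data for illustration only.
records = [
    {"client_id": "C1", "country": "GB", "annual_income": 52000},
    {"client_id": "C2", "country": "", "annual_income": 41000},
    {"client_id": "C1", "country": "GB", "annual_income": 52000},
    {"client_id": "C3", "country": "DE", "annual_income": -10},
]

report = audit(records)
print(f"{len(report.passed)} passed, {len(report.needs_review)} need review")
for record, reasons in report.needs_review:
    print(record["client_id"], "->", ", ".join(reasons))
```

The design choice worth noting is that the automated step only flags: whether a duplicate is a genuine error or a negative income is a data-entry slip remains a judgment for someone who knows the client.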

The integration of AI in finance therefore necessitates a fundamental shift in how financial organisations manage their data estate. Robust data governance is now critical for operational success, not just a technical requirement, and the performance of AI systems is directly tied to data quality. By implementing strong data governance, financial institutions can improve the accuracy, reliability, and fairness of their AI systems. This approach aligns with ethical standards and regulations while building client trust.

Ultimately, effective data governance forms the foundation for AI systems that enhance strategic decision-making and mitigate risks. It's not just a safeguard, but a strategic imperative for financial institutions navigating the AI landscape.
