Data Governance: The Foundation for Trustworthy and Effective GenAI

By Marc Potter, CEO, Actian.

The rise of Generative AI (GenAI) is undeniable, with its potential to transform industries and the way we live and work. In fact, 58% of UK businesses are already piloting GenAI use cases, eager to harness its power. 

Successful GenAI models rely on vast amounts of accurate, relevant, and trustworthy data for training and effective operation, so data quality is critical. Data governance is equally important as it provides the overarching framework for deploying and orchestrating the internal control of data in all its dimensions – including defining roles, responsibilities, company policies, procedures and controls.

Unfortunately, there is a huge disconnect in perceptions of data readiness for GenAI. In an Actian survey, 79% of respondents rated their data quality and cleanliness as “outstanding” or “above average”. However, when Gartner asked the people in charge of AI data readiness, only 4% said their data was ready.

Data challenges for GenAI

The importance of data quality in any GenAI initiative cannot be overstated, and the consequences of neglecting it are significant. According to Gartner, at least 30% of GenAI projects will be abandoned after proof of concept by the end of 2025. This alarming statistic highlights the pitfalls of poor data quality, inadequate risk controls, escalating costs, and unclear business value.

The obstacles to realising the benefits of GenAI go far beyond issues with data quality and cleanliness. Legacy data environments, characterised by rigid silos and fragmented knowledge, undermine data quality, as does a lack of understanding of the data’s lineage. Outdated and erroneous data cripple AI models, turning them into unreliable tools and leaving organisations struggling to realise their full potential. The absence of robust data governance practices further erodes data value and trust.

Additional challenges include the rise of data privacy regulations, such as GDPR and CCPA, which require strong data governance practices to ensure compliance and mitigate legal risks.

Benefits of a data governance framework

By prioritising data governance, organisations can mitigate these risks and increase the likelihood of successful GenAI implementation. 

A comprehensive data governance framework should manage data as a valuable asset throughout its lifecycle. This approach empowers organisations to elevate data from a technical byproduct to a strategic asset. It also fosters a data-driven culture, leading to informed decision-making and increased business value. 

A well-designed governance framework enables data utilisation for GenAI initiatives while maintaining regulatory compliance and adherence to industry-specific regulations and data privacy laws. Innovating while operating responsibly and ethically builds trust with customers and regulators and supports long-term success.

Data governance also bridges the gap between business and IT. Aligning technical data management with business objectives enhances collaboration and accelerates value creation from GenAI projects. Clear communication channels and shared goals ensure that GenAI initiatives align with overall business strategy, maximising the potential for positive impact and return on investment.

The importance of data catalogs and data lineage

Two key concepts within data governance are the data catalog and data lineage, both of which enable responsible and effective AI utilisation.

A data catalog inventories and classifies all usable datasets in an organisation, describes their content and characteristics, and provides the information needed to use the data in accordance with internal procedures. The data catalog enables operational teams working on the data to discover, identify, understand and select the datasets they need to create value. It also enriches the metadata for users to better select the datasets and use them in data projects. 
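
To make the idea concrete, here is a minimal sketch of a catalog as an in-memory inventory. It is an illustration only, not a description of any particular product: the class names, fields, classification labels, and the rule excluding personal data from training are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One catalog record; all field names here are illustrative."""
    name: str
    description: str
    owner: str
    classification: str  # assumed labels: "public", "internal", "personal"
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """In-memory inventory of datasets, searchable by keyword."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        """Add or overwrite the record for a dataset."""
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[DatasetEntry]:
        """Return entries whose name, description, or tags mention the keyword."""
        needle = keyword.lower()
        return [
            e for e in self._entries.values()
            if needle in e.name.lower()
            or needle in e.description.lower()
            or any(needle in t.lower() for t in e.tags)
        ]

    def allowed_for_training(self, name: str) -> bool:
        """Hypothetical policy: personal data is excluded from GenAI training."""
        return self._entries[name].classification != "personal"


catalog = DataCatalog()
catalog.register(DatasetEntry(
    "orders_2024", "Cleaned order history", "sales-ops", "internal", {"sales", "orders"}))
catalog.register(DatasetEntry(
    "customer_profiles", "Names and contact details", "crm-team", "personal", {"customers"}))

matches = [e.name for e in catalog.search("orders")]
```

In this sketch, searching for "orders" surfaces only the order history dataset, and the policy check flags the customer profiles as unsuitable for training. A real catalog would add persistence, access control, and far richer metadata.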

Data lineage provides a comprehensive understanding of the data's lifecycle, including its origin, transformations, and manipulations, which enhances transparency, auditability, and data readiness. This historical record is essential for pinpointing potential sources of errors or inconsistencies, allowing for corrective action to ensure the data used for training and powering GenAI models is accurate and reliable.
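
One simple way to picture lineage is as a list of recorded steps, each naming an output dataset, its inputs, and the transformation applied. The sketch below walks that record backwards from a training set to find every step that contributed to it. The dataset names and the `LineageStep` structure are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageStep:
    output: str              # dataset produced by this step
    inputs: tuple[str, ...]  # datasets it was derived from
    transformation: str      # human-readable description of what was done


def trace_upstream(target: str, steps: list[LineageStep]) -> list[LineageStep]:
    """Return every recorded step that contributed to `target`, nearest first."""
    by_output = {s.output: s for s in steps}
    visited: set[str] = set()
    trail: list[LineageStep] = []
    queue = [target]
    while queue:
        current = queue.pop(0)
        if current in visited or current not in by_output:
            continue  # already handled, or a raw source with no recorded step
        visited.add(current)
        step = by_output[current]
        trail.append(step)
        queue.extend(step.inputs)
    return trail


steps = [
    LineageStep("clean_orders", ("raw_orders",), "drop duplicates, fix currencies"),
    LineageStep("training_set", ("clean_orders", "product_master"), "join on product_id"),
]
trail = trace_upstream("training_set", steps)
```

If a model trained on `training_set` starts producing suspect output, the trail points first to the join and then to the cleaning step, which is where an investigation of errors or inconsistencies would begin.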

The future of data governance

The future of data governance presents exciting possibilities as technology and organisational approaches continue to evolve to encompass data intelligence. AI-powered automation holds immense potential for generating metadata and data descriptions, which in turn support data usage metrics and scorecards. This automation also enhances the accuracy and speed of data governance activities, ensuring that data remains trustworthy and readily available for business use.

As organisations increasingly rely on real-time data for insights and decision-making, data governance frameworks must adapt to support this shift. This involves implementing real-time data quality monitoring, risk mitigation, and compliance checks. By ensuring that data is accurate, secure, and compliant in real time, organisations can make better-informed decisions and respond rapidly to changing market dynamics or emerging opportunities.
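
As a rough illustration of what a real-time quality check might look like, the sketch below validates each incoming record against three simple rules: completeness, a valid value range, and freshness. The field names, the five-minute freshness threshold, and the rules themselves are assumptions for the example; an organisation's own policies would define the real ones.

```python
from datetime import datetime, timedelta, timezone


def check_record(
    record: dict,
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),  # assumed freshness threshold
) -> list[str]:
    """Return the rule violations for one incoming record (empty list means it passes)."""
    issues = []
    # Completeness: every required field must be present and not None.
    for field_name in ("customer_id", "amount", "updated_at"):
        if record.get(field_name) is None:
            issues.append(f"missing {field_name}")
    # Validity: amounts are assumed to be non-negative.
    amount = record.get("amount")
    if amount is not None and amount < 0:
        issues.append("negative amount")
    # Freshness: records older than max_age are flagged as stale.
    updated_at = record.get("updated_at")
    if updated_at is not None and now - updated_at > max_age:
        issues.append("stale record")
    return issues


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
good = {"customer_id": "C-1", "amount": 20.0, "updated_at": now - timedelta(minutes=1)}
bad = {"customer_id": None, "amount": -5, "updated_at": now - timedelta(hours=1)}

good_issues = check_record(good, now)
bad_issues = check_record(bad, now)
```

Running checks like these as data arrives, rather than in a periodic batch, lets problems be flagged before the data reaches a model or a dashboard.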

Importantly, data quality and data governance should always be viewed in terms of how they contribute to business outcomes. Improving them is not a one-time project, but an ongoing journey of continuous improvement.

No longer a purely IT-centric function, data governance is rapidly evolving into a company-wide responsibility with ownership across business units. By prioritising data readiness and adopting a tailored governance framework, organisations can confidently leverage their data to fuel GenAI success.
