Data Excellence as the Foundation for AI Success Amidst the ‘Good Enough’ Data Perspective

“79% of corporate strategists say that analytics, artificial intelligence (AI) and automation will be critical to their success over the next two years.” - Gartner 

While the attention surrounding AI is justified, many enterprises at the forefront of AI adoption have learned that the success of AI initiatives heavily depends on data excellence: the quality, trustworthiness, security, and compliance of the data they feed their models.  

An emerging perspective suggests enterprise data quality may be sufficient for initial AI adoption in some domains. This article explores the potential benefits and risks of that view and argues that data excellence remains critical to successful AI initiatives. It also examines the risks of prioritizing AI without first addressing data challenges and provides best practices for building a strong data foundation.

Data’s Evolution 

Over the past three decades, enterprise data use has evolved through four distinct stages. Tracing how organizations have progressively leveraged data highlights the importance of data excellence in the era of AI.

1. The Data-Aware Stage 

In the 1990s, during the Data-Aware stage, enterprises recognized the existence of data, which was primarily stored in relational database management systems (RDBMS). The focus was on operational reporting, with minimal strategic value placed on the data itself.

2. The Data-Informed Stage 

From 2000 to 2010, the Data-Informed stage saw the emergence of faster RDBMS like Oracle and data warehouses like Teradata, enabling organizations to make informed decisions based on historical data.  

3. The Data-Enabled Stage 

The Data-Enabled stage, from 2010 to 2018, witnessed an explosion of data sources, including social media, third-party data, and unstructured data, empowering organizations to leverage data for personalized services and predictive analytics.  

4. The Data-Driven/Event-Driven Stage  

Finally, from 2018 onwards, the Data-Driven/Event-Driven stage has been characterized by the widespread adoption of cloud platforms and the productization of data, with business users able to access and utilize data based on their specific domain needs. 

 

The Emerging Counter Perspective: Is Enterprise Data Quality Already Sufficient for AI Adoption? 

In recent years, an emerging perspective has gained traction, suggesting that data quality in most enterprises is already sufficient to build the initial set of AI, machine learning, and analytics use cases for specific domains. Entire businesses have emerged based on this perspective. For example, Algomarketing leverages AI in algorithmic marketing with imperfect data.  

This viewpoint challenges the more established position that extensive data quality initiatives are a prerequisite for successful AI adoption. 

Potential Benefits and Risks of the Emerging Perspective 

Proponents of this perspective argue that the focus should be on leveraging existing data assets rather than striving for perfect data quality from the outset. They contend that by utilizing readily available data, organizations can accelerate their AI adoption and realize value more quickly.   

Other benefits include:

  • Faster time-to-market for AI solutions: Organizations can bypass lengthy data quality improvement projects and instead focus on developing and deploying AI models.
  • Reduced costs and resources: By using existing data assets, organizations can avoid significant investments in data cleansing, transformation, and enrichment processes, freeing up valuable resources for AI development and implementation efforts.
  • Iterative improvements in data quality: As organizations deploy AI models and gain insights from their data, they can identify specific areas where data quality improvements are needed, allowing for a targeted approach to data quality efforts. 

However, it is crucial to recognize this emerging perspective’s potential risks and limitations.  

These risks include: 

  • Suboptimal AI model performance and decision-making: AI models trained on incomplete, inaccurate, or biased data will likely produce unreliable or misleading results, leading to flawed decision-making and potentially harming an organization's reputation.
  • Challenges in scaling AI initiatives: Without a strong foundation of high-quality data, organizations may struggle to maintain the accuracy and reliability of their AI models as they expand their AI capabilities and deploy models across various business functions.
  • Biased or untrustworthy AI outcomes: If the data used to train AI models contains inherent biases or lacks representativeness, the resulting AI systems may perpetuate or amplify those biases, leading to discriminatory or unfair decisions and exposing organizations to legal and reputational risks.  

Despite the appeal of this viewpoint, a closer examination reveals the critical importance of upholding the four pillars of data excellence to truly enable successful AI initiatives. 

 

The Four Pillars of Data Excellence 

Defining data excellence through its four core components is a necessary precursor to understanding how it enables AI success. These four pillars form the foundation upon which organizations must build to ensure their AI models are trained on accurate, reliable, and responsible data. 

1. High Quality 

Forrester’s Data Quality Analyst, Jayesh Chaurasia, describes high-quality data as accurate, complete, consistent, reliable, and timely. Key practices to ensure high-quality data include establishing data governance policies, monitoring quality metrics, and continuous assessment. 
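To make "monitoring quality metrics" concrete, here is a minimal sketch of how three of those dimensions might be scored on a small batch of records. The record layout, the field names, the 90-day freshness window, and the use of duplicate IDs as a rough proxy for consistency are all assumptions made for illustration, not part of any cited framework.

```python
from datetime import date

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 4, 1)},
    {"id": 2, "email": None, "updated": date(2024, 4, 20)},
    {"id": 3, "email": "c@example.com", "updated": date(2023, 1, 15)},
    {"id": 3, "email": "c@example.com", "updated": date(2023, 1, 15)},
]

def completeness(rows, field):
    """Share of rows where `field` is present and non-empty."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def uniqueness(rows, key):
    """Share of rows whose `key` value is distinct (a rough consistency proxy)."""
    return len({r[key] for r in rows}) / len(rows)

def timeliness(rows, field, as_of, max_age_days):
    """Share of rows updated within `max_age_days` of `as_of`."""
    fresh = sum(1 for r in rows if (as_of - r[field]).days <= max_age_days)
    return fresh / len(rows)

report = {
    "email_completeness": completeness(records, "email"),
    "id_uniqueness": uniqueness(records, "id"),
    "timeliness_90d": timeliness(records, "updated", date(2024, 4, 29), 90),
}
print(report)
```

Scores like these could be tracked over time and compared against thresholds agreed in a data governance policy; accuracy and reliability usually need a trusted reference source and are not covered by this sketch.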

2. Trustworthy 

Trustworthy data adheres to the five Ts of trust: it is transparent (clean and compliant), thorough (complete), timely (available when needed), trending (highly used and rated by experts), and telling (verifiable). Methods to meet these criteria include data lineage tracking, data quality monitoring, and third-party data validation.

3. Secure

Secure data involves processes and tools that protect sensitive information assets in transit and at rest. Principal methods to achieve data security include encryption (protecting data from unauthorized access), masking (substituting sensitive data with representative tokens), erasure (reliably deleting unused data), and resilience (creating backup copies for recovery).
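As one illustration of masking, the sketch below replaces a sensitive field with a deterministic token derived from a keyed hash, so the same input always maps to the same token without exposing the original value. The key, the `tok_` prefix, the token length, and the record layout are hypothetical; a production system would typically load the key from a secrets manager and might prefer a dedicated tokenization service.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"

def mask_value(value: str, prefix: str = "tok_") -> str:
    """Replace a sensitive value with a deterministic, non-reversible token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return prefix + digest[:16]

def mask_record(record: dict, sensitive_fields: set) -> dict:
    """Return a copy of `record` with the listed sensitive fields tokenized."""
    return {
        k: mask_value(str(v)) if k in sensitive_fields and v is not None else v
        for k, v in record.items()
    }

customer = {"id": 42, "email": "jane@example.com", "plan": "pro"}
masked = mask_record(customer, {"email"})
print(masked)
```

Because the token is deterministic, masked datasets can still be joined on the tokenized field, which is often useful when preparing training data for AI models.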

4. Compliant 

Compliant data meets relevant standards, regulations, frameworks, and legislation that define how data should be securely managed throughout its lifecycle. Fundamental approaches to ensure compliance include identifying applicable data governance requirements (e.g., NIST, ISO, GDPR, HIPAA), implementing controls and policies to meet those criteria, conducting audits, and securing management support for compliance initiatives.

The Importance of Investing in Data Excellence for AI Success

The maturity of an organization's overall data capabilities significantly impacts its readiness for AI. While data maturity encompasses a broad range of factors, from governance to architecture, the core components of data excellence are essential prerequisites for successful AI initiatives.  

Although many organizations have made progress toward data maturity and data excellence, there is still room for improvement in one or more of these key areas. 

 

Figure 1. Sastry, Nagaraj. Elevating from Unstructured to an Intelligent Self-sufficient Data Platform. Encora. April 29, 2024.

This lack of mature, high-quality data can ultimately hinder the success of AI projects. 

Prioritizing AI, machine learning, and generative AI without first addressing data challenges can lead to several risks, such as: 

  • Flawed AI/ML models and biased outcomes due to poor data quality
  • Loss of trust among stakeholders because of unreliable or biased AI results
  • Wasted investments in AI initiatives that fail to deliver value due to underlying data issues 

Real-world examples of AI initiatives that failed because of poor data continue to emerge:

  • In its Q1 2022 earnings call, an unnamed U.S.-based video game software company disclosed a $110 million reduction in its 2022 revenue due to a fault in its platform that reduced the accuracy of the machine learning model it used for advertising. The issue was compounded by the ingestion of poor data from a large customer, which corrupted the model's training data.
  • Amazon developed an AI system to streamline its recruitment process. However, the system produced discriminatory results against women because it was trained on resumes submitted to Amazon over a ten-year period, which were predominantly from male job seekers. As a result, Amazon discontinued the AI-driven hiring program. 
  • Canada's flagship airline, Air Canada, suffered financial losses stemming from an adverse court ruling triggered by a chatbot's confabulations: the chatbot gave erroneous responses to questions about the company’s bereavement rate policy.

Investing in data excellence before prioritizing AI initiatives is crucial for several reasons, including: 

  • High-quality, trustworthy data ensures that AI models are trained on accurate and reliable information, leading to better outcomes and decisions. 
  • Secure data protects sensitive information and prevents breaches that could harm the organization's reputation and erode trust in AI systems. 
  • Compliant data helps organizations meet regulatory requirements and avoid legal and financial repercussions. 

Everest Group’s research found that most organizations recognize the importance of data excellence but struggle with implementation. Regarding customer experience, for example, Everest Group states that while 94% of surveyed enterprises identified data and analytics as crucial for CX objectives, many still face challenges in realizing the full benefits of AI due to issues such as skills gaps, outdated technology, data security concerns, poor data quality, and resistance to change. 

To successfully implement AI, how can organizations embrace data excellence and prioritize foundational data improvements? 

Building a Strong Data Foundation 

A strong data foundation is the bedrock upon which successful AI initiatives are built. ‘Data Foundation’ refers to the fundamental infrastructure, processes, and strategies that ensure effective data collection, management, storage, organization, and utilization. Key features of a robust data foundation include the four pillars of data excellence (high quality, trustworthiness, security, and compliance) along with capabilities such as data cleansing, governance, and metadata management.

Figure 2. Sastry, Nagaraj. Capabilities of Data Management & Analytical Platforms. Encora. April 29, 2024.

 

Best Practices and Recommendations for Implementation 

While the specific recommendations may vary depending on a company's unique needs and circumstances, there are general best practices that can be applied across all organizations to ensure a solid data foundation that upholds the four pillars of data excellence: 

1. Prioritize data quality, governance, and security initiatives: Implement robust data quality controls, establish clear data governance policies, and ensure the security of sensitive data to guarantee accurate, consistent, and protected data for AI models. 

2. Foster a data-driven culture and promote data literacy: Strive to reach the Data-Driven stage, where data is fully productized, and empower employees with the skills and tools to effectively leverage data in their decision-making processes. 

3. Collaborate with trusted partners: Work with experienced providers of data management solutions and AI technologies to tap into their expertise and best practices, streamlining data initiatives and receiving guidance on data architecture design, governance frameworks, and AI implementation strategies.

4. Explore innovative approaches to data governance: Leverage AI to monitor and improve data quality by using machine learning algorithms to detect and correct data anomalies. Consider decentralized data management techniques like blockchain to ensure data security and privacy in AI initiatives. 

5. Adopt a DataOps approach for data management: Implement DataOps practices, which apply DevOps principles to data management. These principles emphasize collaboration, automation, and continuous improvement to break down data silos, accelerate data pipeline development, and ensure data quality and reliability. 

6. Leverage data as a product: Move away from viewing data as a mere byproduct. Treat it as a strategic asset by creating well-defined data products. These data products cater to specific business needs, with ownership and quality control residing with the domain experts who understand the data best. 

7. Utilize a data mesh architecture: Empower business functions through a decentralized data management approach. Data Mesh distributes data ownership and management to domain-specific teams. These teams are accountable for the entire data product lifecycle, from acquiring the data to ensuring its quality and accessibility. This fosters faster data delivery, improved data quality through domain expertise, and a collaborative data ecosystem. 
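The fourth practice above mentions using algorithms to detect data anomalies. As a minimal sketch of the idea, the example below flags outliers with a modified z-score based on the median absolute deviation. This is a simple statistical stand-in, not a trained machine learning model, and the sample values, function name, and threshold of 3.5 are assumptions chosen for illustration.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`."""
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        return []  # no spread to measure against; flag nothing
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]

# Hypothetical daily order totals with one suspicious entry.
daily_totals = [102.0, 98.5, 101.2, 99.8, 100.4, 9999.0, 97.9]
flagged = mad_outliers(daily_totals)
print(flagged)
```

A check like this could run as an automated step in a DataOps pipeline, routing flagged rows to the owning domain team for review rather than correcting them silently.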

Conclusion: Embracing Data Excellence as the Bedrock of AI Success

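The success of AI initiatives relies heavily on the four pillars of data excellence. While an emerging perspective suggests that enterprise data quality may already be sufficient for initial AI adoption, that approach carries risks: unreliable model performance, difficulty scaling AI initiatives, and biased or untrustworthy outcomes.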

The way forward lies in embracing data excellence as the bedrock of AI. Organizations must prioritize foundational data improvements, foster a data-driven culture, collaborate with trusted partners, and explore innovative approaches to data management, such as leveraging AI itself to monitor and improve data quality. 

 
