Written By Matt McGuire, Senior Director of Product Management, Lakeside Software
AI is here to stay: 92% of organisations are either already adopting a generative AI (GenAI) solution or planning to do so this year, according to a recent Enterprise Strategy Group white paper, “The essential role of data and data quality in IT-related AI model training.” Alongside this explosive growth in adoption, however, employees and business leaders are concerned about whether they can trust AI.
These concerns are not unwarranted. AI hallucinations, instances in which an AI offers false or misleading responses, are common. In March, for example, OpenAI estimated that ChatGPT hallucinated 15-20% of the time, and Google’s AI recently made headlines with another embarrassing run of odd answers.
With data as the cornerstone of artificial intelligence, these hallucinations are driven by bad data, and by the ongoing input of new (and often equally bad) data. Data provides the essential raw material used to refine AI models, training them to recognise patterns and determine the next best response. The success of any AI implementation hinges on the quality of the underlying data.
Think of an AI model, or algorithm, as a recipe for a cake. Would you be more apt to eat a cake made from fresh eggs, high-quality flour, and pure vanilla, or one made from spoiled milk and rotten eggs? Obviously, you wouldn’t trust a cake made of poor-quality ingredients, and similarly, businesses shouldn’t trust AI trained on bad data.
When training datasets are incomplete, outdated, mislabelled, or unstructured, AI models learn incorrect patterns, leading to faulty recommendations and decisions. This degrades the user experience and, by extension, erodes trust, increasing user dissatisfaction and disengagement. According to the Enterprise Strategy Group white paper, 38% of business leaders are worried about validating results, and another 38% are worried about trusting the recommendations generated by their AI solution.
Using subpar data not only wastes valuable resources like time and money but also complicates data cleaning, preprocessing, and AI model refinement. While the consequences of eating a bad cake may be no worse than a stomach ache, the consequences of poorly trained AI are considerable for businesses: inefficiency, missed opportunities, revenue loss, reputational damage, project delays, and increased development costs, all of which ultimately affect overall business performance.
Ensuring data trustworthiness
Depth, breadth, and structure are crucial for effective AI models. Consider a chatbot: a bot trained with specific, domain-relevant data outperforms one trained on generic data. Broadly sourced, contextual data drawn from across various systems and in-depth data from specific systems together give AI a complete picture to train on. Historical data further enriches this by offering insights into trends and areas for optimisation, which are essential for contextual awareness. The labelling and structure of data are also paramount: without high-quality, noise-free data, overlapping, non-integrated, or conflicting data from various tools can create trust issues and increase training costs without adding value.
Organising data into relevant categories further enhances its use for GenAI models. Financial data, for instance, represents different entities, such as customers, products, transactions, wages, earnings, and profits, so it should be saved under each of those categories so that multiple stakeholders can access it easily.
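To make this concrete, here is a minimal sketch of category-based organisation. The records, entity names, and grouping logic are illustrative assumptions, not a real schema from any particular product:

```python
from collections import defaultdict

# Illustrative records, each tagged with the financial entity it describes.
records = [
    {"entity": "customers", "id": "C-100", "name": "Acme Ltd"},
    {"entity": "transactions", "id": "T-501", "amount": 1250.00},
    {"entity": "products", "id": "P-9", "sku": "WIDGET-A"},
    {"entity": "transactions", "id": "T-502", "amount": 87.40},
]

# Group records by entity so each stakeholder can query one category
# (customers, products, transactions, ...) without sifting through the rest.
by_category = defaultdict(list)
for record in records:
    by_category[record["entity"]].append(record)

print(sorted(by_category))               # ['customers', 'products', 'transactions']
print(len(by_category["transactions"]))  # 2
```

The same idea scales up to tables, topics, or vector-store namespaces: the point is that a model (or a person) can retrieve one well-labelled slice of data instead of an undifferentiated pile.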
The right data for AI use, especially for AI trained for an enterprise IT team, is error-free and complete, including all necessary information without missing values or gaps. Good data that is organised and well-labelled is also up-to-date, reflecting the most recent information available, so that recommendations and results are likewise current and relevant to your business.
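As a simple illustration of what completeness and freshness checks might look like before data reaches a model, here is a minimal sketch. The field names, schema, and 30-day freshness threshold are hypothetical assumptions for the example, not requirements from the article:

```python
from datetime import datetime, timedelta

# Hypothetical schema for a batch of training records; the field names
# ("device_id", "metric", "value", "timestamp") are illustrative only.
REQUIRED_FIELDS = {"device_id", "metric", "value", "timestamp"}
MAX_AGE = timedelta(days=30)  # assumed freshness threshold

def is_trustworthy(record: dict, now: datetime) -> bool:
    # Complete: every required field present and non-null.
    if not REQUIRED_FIELDS <= record.keys():
        return False
    if any(record[f] is None for f in REQUIRED_FIELDS):
        return False
    # Up-to-date: reject records older than the freshness threshold.
    return now - record["timestamp"] <= MAX_AGE

now = datetime(2024, 6, 1)
records = [
    {"device_id": "ep-01", "metric": "cpu", "value": 0.42,
     "timestamp": datetime(2024, 5, 30)},   # fresh and complete
    {"device_id": "ep-02", "metric": "cpu", "value": None,
     "timestamp": datetime(2024, 5, 30)},   # missing value
    {"device_id": "ep-03", "metric": "cpu", "value": 0.91,
     "timestamp": datetime(2024, 1, 1)},    # stale
]
clean = [r for r in records if is_trustworthy(r, now)]
print(len(clean))  # 1: only the first record passes both checks
```

Real pipelines add far more (schema validation, deduplication, label audits), but even gating on completeness and recency like this keeps the worst "rotten ingredients" out of training data.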
When the data used for AI in businesses is well-labelled, organised, relevant, and up-to-date, IT teams can trust the AI model and move to proactive, and even predictive, IT. At Lakeside, we have developed AI-driven endpoint monitoring software that gathers data from far and wide: 10,000 metrics from every endpoint at 15-second intervals, providing a continuous stream of insights into device performance and its impact on employee productivity and potential downtime. This level of data quality ensures that our purpose-built AI models provide accurate recommendations and increase trust.
For enterprise IT teams, leveraging vast amounts of well-structured data from your own IT environment ensures trustworthiness and eliminates the need to rely on potentially inaccurate or irrelevant external data. To keep the AI improving and enhancing operational efficiency, new information and insights must be introduced consistently. This requires thorough analysis and management of data, with a human in the loop, across an organisation’s entire digital estate, particularly as employees increasingly use a variety of devices.
Well-organised, contextually rich, and comprehensive data forms the backbone of robust AI models, enhancing decision-making and operational efficiency. The quality of this data is crucial in fostering trust in AI systems and is key to unlocking their full potential, enabling more proactive and predictive IT strategies. Such strategies empower organisations to identify and resolve issues before they escalate, improving employees’ digital experiences with essential workplace tools and technology. So is it time to review and upgrade your data management practices to prepare for a wider use of AI?