Three reasons why businesses still distrust their data

10 May 2023

Internal appetite for trusted operational data has never been greater, so why are there still trust and quality issues in many organisations, asks Rajith Haththotuwegama, Data Analytics & Automation Manager at Tecala.

Six years is a long time in technology. Looking back over that period, most would characterise it as one of incredible change.

And yet one constant issue has nagged at business teams and leaders throughout: data quality. Indeed, as far back as seven years ago, data quality was identified in KPMG’s Global CEO Outlook as a key concern among C-level management.

This is similarly reflected in recent MIT research, where data quality is second only to a related data domain – governance – on the priority list of chief data officers.


Australian businesses have renewed their focus on data quality in the first part of 2023, driven by a sharper need to understand what’s happening in different parts of their operations and to use that understanding to drive better business decision-making.

During challenging operating circumstances, accurate data is naturally in high demand to help businesses and leaders make more data-informed decisions. Trusted data is a crucial input into restructures, relaunches and refocuses, into growth ventures or pivots, into achieving efficiency and navigating the challenges of staff shortages. In short, good data can be used to drive great outcomes.

However, it’s clear that not every business has good, clean data. Many have fallen, and will continue to fall, short of achieving a level of data quality that everyone in the business trusts.

Businesses typically fall short in three key areas. They do not have a way of getting information in a timely manner; they do not understand the provenance of the data – where it came from, who touched it and how it was influenced by different business rules; and they don’t fully comprehend the risks to the business of using or interpreting the data in certain ways.

Data quality may be a difficult challenge to solve, but it’s not impossible. It’s worth digging into each of these three key areas in more detail to raise awareness of these pitfalls and to drive actions that can either build or enhance quality – and trust – in internal datasets.

1) Timeliness matters

A common problem in businesses is that data jobs, such as collation across silos for reporting purposes, remain largely manual tasks that can take weeks to complete. By the time a formal report is pulled together and ready for consumption, the data and insights within it are dated.

Given business pressures to use data to drive decision-making, data access needs to be real-time or near real-time in order to be a trusted input into decisions.

2) Data provenance (data lineage)

Provenance combines the (related) problems of data governance and quality. To ensure a dataset can be trusted, the lineage of the data needs to be both established and documented in a way that anyone can review and use it with the same high level of assurance.

How do you know where the data came from, who touched it, and what business rules or transformations were applied to it before it came to be in your possession? Is the data complete – for example, if certain fields were incomplete or blank at the time of collection, how is this handled? Are these line items included or excluded in the full dataset, and does this have a material bearing on the aggregate result?
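As a minimal illustration of the last question, the choice of how blank line items are treated can materially change an aggregate. The figures and field values below are hypothetical, not drawn from any real dataset:

```python
# Hypothetical order values; None marks fields left blank at collection time
orders = [250.0, None, 310.0, None, 180.0]

# Treatment A: exclude blank line items from the aggregate
known = [v for v in orders if v is not None]
avg_excluding_blanks = sum(known) / len(known)

# Treatment B: count blanks as zero-value line items
avg_blanks_as_zero = sum(v if v is not None else 0.0 for v in orders) / len(orders)

print(avg_excluding_blanks)  # ~246.67
print(avg_blanks_as_zero)    # 148.0
```

Neither treatment is inherently wrong, but the dataset’s documentation should record which one was applied, so downstream consumers interpret the aggregate the same way.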

Without transparency into the history, lineage and treatment of the data, it becomes difficult to trust that the dataset is accurate, and therefore that any decision made with reference to the data is accurate as well.

3) Accounting for other risk factors

One of the challenges of using time series data today is that data collected before 2020 may now be outdated or irrelevant, because the pandemic has changed operating conditions and ways of working to such a significant extent. Businesses that thought they had a deep multi-year dataset that could be used to identify past patterns and predict future performance, may now only have a couple of years of useful, stable data on which to base their decisions.

Organisations should also consider how the tools they use collect and store data. A common error experienced by Australian users of US-hosted SaaS applications, for example, is that the data may be created with a US timestamp, leading to day-of-the-week trends being misinterpreted. Any pattern and trend analysis on the data needs to either take into account timezone conversion, or better yet, the data should be standardised before it’s ingested by a reporting suite.
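A quick sketch of the timestamp pitfall, using Python’s standard zoneinfo module; the specific timestamp, platform and zones are illustrative assumptions:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# A Friday-evening transaction as stamped by a hypothetical US-hosted platform
us_ts = datetime(2023, 5, 5, 18, 0, tzinfo=ZoneInfo("America/Los_Angeles"))

# The same instant, standardised to local time before ingestion
au_ts = us_ts.astimezone(ZoneInfo("Australia/Sydney"))

print(us_ts.strftime("%A"))  # Friday
print(au_ts.strftime("%A"))  # Saturday - the "day" of the event shifts
```

An Australian analyst reading the raw US timestamps would attribute this Saturday-morning activity to Friday, skewing any day-of-the-week trend analysis.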

Building trust in data

Improving data quality and governance requires appropriate resourcing and the support of systems that implement and maintain appropriate guardrails and standards. According to the Data Management Body of Knowledge framework, the following goals should be established:

1) Develop a governed approach to make data fit for purpose based on data consumers' requirements.
2) Define standards, requirements, and specifications for data quality controls as part of the data lifecycle.
3) Define and implement processes to measure, monitor, and report on data quality levels.
4) Identify and advocate for opportunities to improve the quality of data, through process and system improvements.

While data quality dimensions such as accuracy, timeliness, lineage and consistency are crucial, the Data Management Body of Knowledge framework also advises other dimensions, including:
• Completeness – degree to which all required data is present
• Integrity – referential integrity or internal consistency
• Reasonability – data pattern meets expectations
• Uniqueness/Deduplication – no entity instance occurs more than once within the dataset
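Each of these dimensions can be reduced to a simple, monitorable score over a dataset. The sketch below is a minimal, hypothetical example of measuring completeness and uniqueness, not a reference to any particular tool:

```python
# Hypothetical records; one has a blank amount, two share an entity id
records = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": None},  # incomplete field
    {"id": 2, "amount": 95.0},  # duplicate entity instance
]

# Completeness: share of records with all required data present
completeness = sum(r["amount"] is not None for r in records) / len(records)

# Uniqueness: share of distinct entity ids within the dataset
ids = [r["id"] for r in records]
uniqueness = len(set(ids)) / len(ids)

print(f"completeness: {completeness:.0%}, uniqueness: {uniqueness:.0%}")
```

Tracked over time, scores like these turn abstract quality dimensions into the measurable, reportable levels the framework calls for.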

The list can vary based on which data quality framework is referenced. Mid-market organisations often lack the budget, or the constant flow of interesting work, needed to sustain their own data teams. Additionally, the systems containing source data, especially SaaS platforms, may not offer APIs or other simple ways to export data for centralised analysis and reporting.

It can make sense to put this in the hands of an experienced partner that has access to sophisticated tooling and the depth of resources required to manage data infrastructure end-to-end and streamline data extraction and ingestion processes. Business users can then simply focus on finding meaning and making decisions from a starting point of having trusted data.

A previous insight from Rajith Haththotuwegama: Escaping the inflexible parts of work with robot companions.