#64 Can I trust in data?

Can you trust your organization's data? As an example, according to Talend's 2021 data health survey, 44% of finance executives said that they make the majority of their decisions without relying on data. Why can't decision-makers trust data?


Today, data is available everywhere. Various devices, web applications, sensors and e-services collect data without any active user intervention. People are also producing more and more different kinds of data. Even if the machines themselves don't make errors, erroneous data is still generated. So today the question is less about the quantity of data and more about whether the data can be trusted.


What does it mean to trust in data?


Trust in data means confidence that the data is healthy and can be used as a basis for decisions. With that trust, you can successfully utilize data, for example to measure and develop a business. Just as in human relationships, it is trust that ultimately matters with data. Trust should not be based on beliefs and assumptions.


How can you measure and estimate trust in data?


One key element of data trust, and probably the most important one, is the quality of the data. Data quality was covered in more detail in a previous post: #47 What does the quality of data matter? But data quality isn't all that is needed. For example, if the data isn't available when needed, or if its access, ownership or usage rights are unclear, you cannot trust it.


Other significant dimensions are:


Thoroughness - Is the data clean, complete, and consistent?

Transparency - Is the data accessible and understandable?

Interoperability and compatibility - Can you achieve data interoperability? Is the data compatible or harmonized with other data?

Security - Is there a clear process for protecting the data from unauthorized access and data corruption?

Timeliness - Is the data up to date and readily available to the people who need it?

Traceability - Does the data tell you its origin, where it came from and how it has been used? Are the parties and their roles identifiable?

Testing - Has the data been rated and certified by other users? Or does the provider have descriptions of the tests performed?


By measuring the different dimensions of data trust and combining them into a single measurement value, or trust score, you can show data users how trustworthy the data is.
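One simple way to form such a trust score is a weighted average of per-dimension scores. The sketch below is only an illustration: the function name trust_score, the 0 to 1 scale, the dimension names and the weights are all assumptions, not a standard method.

```python
from typing import Mapping, Optional


def trust_score(
    scores: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Weighted average of per-dimension scores (hypothetical helper).

    Each score is assumed to be on a 0..1 scale, where 1 means full trust.
    Dimensions without an explicit weight get weight 1.0.
    """
    if not scores:
        raise ValueError("at least one dimension score is required")
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {name!r} must be between 0 and 1")

    weights = weights or {}
    total_weight = sum(weights.get(name, 1.0) for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    weighted_sum = sum(
        value * weights.get(name, 1.0) for name, value in scores.items()
    )
    return weighted_sum / total_weight


# Example values are made up for illustration.
example_scores = {
    "quality": 0.9,
    "timeliness": 0.6,
    "security": 1.0,
    "traceability": 0.5,
}

print(round(trust_score(example_scores), 2))                  # 0.75
print(round(trust_score(example_scores, {"quality": 2}), 2))  # 0.78
```

The weights are where the business requirements come in: a dimension that matters more for a given use case can simply be given a larger weight.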


When calculating the trust score, you can also ask the data provider and the data users to rate the data and its different dimensions (e.g. from 1 to 5). Some of the trust score factors can be calculated automatically (machine-based) according to defined conditions, using for example AI and ML. Examples include how completely a particular field is populated, whether the values contain errors, and whether there are anomalies in the data. The data score can significantly affect the price of and demand for the data, just as with other assets and products.
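The machine-based factors and the 1 to 5 ratings mentioned above could be sketched roughly as follows. This is a minimal illustration using plain statistics rather than AI or ML. All function names, the validity rule and the anomaly threshold are assumptions chosen for the example.

```python
from statistics import median
from typing import Callable, Optional, Sequence

Values = Sequence[Optional[float]]


def completeness(values: Values) -> float:
    """Share of entries that are present (not None)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def validity(values: Values, is_valid: Callable[[float], bool]) -> float:
    """Share of present entries that pass a caller-defined rule."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return sum(1 for v in present if is_valid(v)) / len(present)


def anomaly_free_share(values: Values, threshold: float = 3.5) -> float:
    """Share of present entries not flagged as outliers.

    Uses the modified z-score based on the median absolute deviation (MAD).
    The threshold of 3.5 is a common rule of thumb, assumed here.
    """
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    med = median(present)
    mad = median([abs(v - med) for v in present])
    if mad == 0:
        return 1.0  # no spread, so nothing can be flagged by this method
    flagged = sum(1 for v in present if 0.6745 * abs(v - med) / mad > threshold)
    return 1 - flagged / len(present)


def normalize_rating(rating: float) -> float:
    """Map a 1..5 rating onto the 0..1 scale."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    return (rating - 1) / 4


# Made-up sensor readings: one missing value and one obvious outlier.
readings = [20, 21, None, 22, 21, 500, 20, 21, 22, 20]

print(completeness(readings))                                    # 0.9
print(round(validity(readings, lambda v: 0 <= v <= 100), 3))     # 0.889
print(round(anomaly_free_share(readings), 3))                    # 0.889
print(normalize_rating(4))                                       # 0.75
```

Each function returns a value between 0 and 1, so the automated factors and the normalized human ratings can be combined on the same scale.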


The rules for calculating the trust score should always be communicated clearly and transparently to users, along with a description of where the data should and should not be used.

In many cases, 100% trust is not needed or even achievable. It all depends on the business requirements. Often the most important thing is to get the data flowing in the first place, so that, for example, quality issues can be examined and trust in the data can begin to develop.


Trust is built gradually, so try to make things concrete and set milestones.