The Pitfalls of Data Quality Assessment – GOV.UK
Data quality assessments assess whether the data is fit for purpose. This is the basis for improving data quality. In this article, we’ll cover what not to do if you want to better understand the quality of your data.
A data quality assessment is a process of evaluating data and measuring it against selected quality criteria such as completeness and validity. It also includes analyzing the cause and impact of quality issues and sharing the results. To resolve data quality issues properly, you need a robust assessment process.
If your assessment isn’t good enough, you may think the data is good quality when it isn’t, or vice versa. A flawed assessment can lead to the wrong actions being taken, which will affect your organization’s bottom line.
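As an illustration of measuring data against quality criteria, the sketch below computes completeness and validity for one field. The records, the field names, and the simplified postcode pattern are all hypothetical and are not taken from any DQHub tool; a real assessment would use business rules agreed with process specialists.

```python
import re

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "postcode": "SW1A 2AA"},
    {"id": 2, "postcode": ""},
    {"id": 3, "postcode": "not a postcode"},
    {"id": 4, "postcode": "M1 1AE"},
]

# Simplified pattern for illustration; not a full UK postcode validator.
POSTCODE_PATTERN = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}$")


def is_present(value):
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def completeness(rows, field):
    """Share of rows in which the field is populated."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if is_present(r.get(field))) / len(rows)


def validity(rows, field, rule):
    """Share of populated values that satisfy the rule."""
    present = [r[field] for r in rows if is_present(r.get(field))]
    if not present:
        return 0.0
    return sum(1 for v in present if rule(v)) / len(present)


postcode_completeness = completeness(records, "postcode")
postcode_validity = validity(
    records, "postcode", lambda v: bool(POSTCODE_PATTERN.match(v))
)
print(f"completeness: {postcode_completeness:.2f}")
print(f"validity: {postcode_validity:.2f}")
```

Validity is measured only over populated values here, so a missing value counts against completeness but not against validity. That is one possible design choice; whichever you pick, record it alongside the results.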
Lack of planning
Your data quality assessment will be more reliable if it is carefully planned. It is important to involve people with the skills and knowledge to carry out the assessment. Technical specialists and process specialists should work closely together to identify, process, test, and refine the business rules used to confirm and measure data quality.
Before the actual assessment takes place, make sure you have a good understanding of the data. It is useful to review any existing documentation on the dataset to be assessed, as this will speed up the assessment process.
What you learn from the assessment will be very valuable, so it is important to document and record it for future reference. Plan for your assessment to be reproducible. Keeping track of the assessment process and its findings will allow you to repeat the process, and the record will become an invaluable source of information for future data quality improvements. This will also maintain your team’s knowledge, ensure continuity and minimize risk.
Disregarding the purpose of the data
Understanding the purpose of the data helps you identify what to measure, and it is especially important when communicating quality. If people don’t understand the purpose of the data, they won’t understand the impact of quality issues. Incorrect data in one part of the dataset may have little impact on one use of the data but cause quality issues for another.
A good data quality assessment is tied to a purpose, so it should be repeated when a dataset is used for a purpose other than the one originally intended.
Measuring the wrong things
Measuring data that doesn’t matter or has low risk not only wastes valuable time and resources, but also diverts attention from fitness for purpose. It is not always necessary to cover the complete dataset. Instead, look at the parts of the data that are important to achieving your goal(s).
Focusing on the fields that are easy to measure, rather than the fields that should be measured, will lead to conclusions that do not address the data’s fitness for purpose. An assessment should focus on the data elements you need to be right. To make that judgment well, involve people with the right skills and experience.
Aiming to be perfect
Getting perfect data is not a realistic goal. Achieving 100% all the time cannot be a requirement. A dataset that contains an error may still be fit for use. It can be more efficient to process data that has a few errors than to spend time trying to fix every error. What you should be looking for is fitness for purpose, not perfection. Use realistic thresholds for your chosen quality criteria.
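One way to apply realistic thresholds is to compare each measured score with a minimum agreed for the intended use. The sketch below is a minimal illustration; the criteria, the scores, and the threshold values are hypothetical and would need to be set for your own purpose.

```python
# Hypothetical minimum acceptable scores for one intended use of the data.
thresholds = {"completeness": 0.95, "validity": 0.98}

# Hypothetical scores produced by an earlier measurement step.
measured = {"completeness": 0.97, "validity": 0.96}


def assess(scores, minimums):
    """Return, for each criterion, whether the score meets its minimum."""
    return {
        criterion: scores[criterion] >= minimum
        for criterion, minimum in minimums.items()
    }


results = assess(measured, thresholds)
for criterion, passed in results.items():
    status = "meets threshold" if passed else "below threshold"
    print(f"{criterion}: {measured[criterion]:.2f} ({status})")
```

Neither threshold is set to 1.0, which reflects the point that a dataset with some errors can still be fit for purpose.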
Examining data in isolation
Understanding the data lifecycle is important for meaningful assessment. Data may contain errors when it arrives, and it may be corrupted during processing. Sometimes datasets are put together in the wrong way, or incorrect assumptions are made when data is combined. Before you start assessing your data, it’s important that you understand the processes it has gone through, to avoid misleading measurements.
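As an example of testing an assumption made when data is combined, the sketch below checks two things before a join: that every key in one dataset has a match in the other, and that the lookup dataset has no duplicated keys. The datasets, field names, and the `join_checks` helper are hypothetical.

```python
from collections import Counter

# Hypothetical datasets to be combined on "office_code".
cases = [
    {"case_id": "A1", "office_code": "LDN"},
    {"case_id": "A2", "office_code": "MAN"},
    {"case_id": "A3", "office_code": "XXX"},
]
offices = [
    {"office_code": "LDN", "name": "London"},
    {"office_code": "MAN", "name": "Manchester"},
    {"office_code": "MAN", "name": "Manchester (old)"},
]


def join_checks(left, right, key):
    """Find left keys with no match and right keys that appear more than once."""
    right_counts = Counter(row[key] for row in right)
    unmatched = sorted({row[key] for row in left if row[key] not in right_counts})
    duplicated = sorted(k for k, n in right_counts.items() if n > 1)
    return unmatched, duplicated


unmatched, duplicated = join_checks(cases, offices, "office_code")
print("keys with no match:", unmatched)
print("duplicated lookup keys:", duplicated)
```

An unmatched key would silently drop or blank out records in the combined data, and a duplicated lookup key would multiply them, so either result is worth investigating before measuring quality on the joined dataset.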
Not being proactive
If you passively wait for a quality issue to be reported before investigating, it will have already had a negative impact on your organization. It is much safer to regularly check the quality of your data. This will prevent poor quality data from moving through the data lifecycle and allow you to identify issues before they cause greater damage.
The data quality assessment exercise should be a recurring process. Lessons learned from an evaluation will help to revise the objectives for the next exercise.
You can learn more about data quality assessment in our upcoming training on “Data Quality Action Plans”. The Government Data Quality Hub (DQHub) develops tools, guidance, and training to help you with your data quality initiatives. Please visit the Data Quality Hub website for articles, tools, and case studies.
We also offer personalized government-wide advice and support. Contact us by emailing [email protected]