Dirty data most often refers to erroneous, incomplete or poorly recorded data, and its effects can be costly, especially in the healthcare industry. In this blog, SpendVu defines dirty data, explains how it comes about and outlines what to do about it.

When we talk about the quality of data, we’re most often referring to the number of errors contained within a particular data set. The more errors the set contains, the dirtier the data is considered to be. Those errors can originate from different sources, the most common being sloppy data entry habits, a lack of consistency and holding onto data longer than necessary. Dirty data is something that must be accounted for if you hope to reap the benefits of data-driven decision making.

What does dirty data look like?

In the healthcare supply chain, dirty data often refers to missing or incomplete records, duplicated orders, or multiple representations of the same data across various locations and repositories. For example, one system might record the manufacturer as ‘Johnson & Johnson’, while another records the same manufacturer as ‘Johnson and Johnson’. Other types of dirty data include misleading entries, such as a purchase order that omits the Unit of Measure (what does that ‘3 ordered’ on the PO really mean: three units, three boxes or three cases?), inaccurate information, values such as prices or dates that fall outside reasonable ranges, and generally any other data that is incorrectly or inaccurately recorded.
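To make the ‘Johnson & Johnson’ versus ‘Johnson and Johnson’ problem concrete, here is a minimal Python sketch of one common clean-up technique: reducing each name to a normalized comparison key and grouping the spellings that collapse to the same key. The rules, function names and sample list are hypothetical assumptions for illustration, not a SpendVu feature, and real vendor matching usually needs more rules (legal suffixes, typos) than shown here.

```python
import re
from collections import defaultdict

# Hypothetical normalization rules; extend them with the variants found in your own data.
_RULES = [
    (re.compile(r"\s*&\s*"), " and "),  # "johnson & johnson" -> "johnson and johnson"
    (re.compile(r"[.,]"), ""),          # drop periods and commas ("inc." -> "inc")
    (re.compile(r"\s+"), " "),          # collapse runs of whitespace
]


def normalize_name(name: str) -> str:
    """Return a canonical comparison key for a manufacturer or vendor name."""
    key = name.strip().lower()
    for pattern, replacement in _RULES:
        key = pattern.sub(replacement, key)
    return key.strip()


def find_duplicates(names):
    """Group raw spellings by key; keep only keys that more than one record maps to."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


if __name__ == "__main__":
    records = ["Johnson & Johnson", "Johnson and Johnson", "Medline"]
    print(find_duplicates(records))
    # {'johnson and johnson': ['Johnson & Johnson', 'Johnson and Johnson']}
```

The grouped spellings are candidates for review rather than automatic merges; a person still decides which spelling becomes the standard.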

Dirty data doesn’t have to be incorrect to be misleading. A physician might request new Medical Services and abbreviate it as ‘MS’; the Materials Management Director reviewing that form may interpret ‘MS’ as ‘Medical Surgical’ instead of ‘Medical Services’. In that case, no red flags would be raised until it was too late, because ‘MS’ isn’t technically incorrect.
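One way to keep an ambiguous abbreviation like ‘MS’ from slipping through is to check it against a lookup table at data entry and refuse anything with more than one known meaning. The sketch below assumes a hypothetical `ABBREVIATIONS` table and `expand_abbreviation` function; it only illustrates the idea, and the table itself would have to be built from your own records.

```python
# Hypothetical lookup table: each abbreviation lists every expansion it has
# been used for in your organization's records.
ABBREVIATIONS = {
    "MS": ["Medical Services", "Medical Surgical"],
    "PO": ["Purchase Order"],
    "UOM": ["Unit of Measure"],
}


def expand_abbreviation(abbr: str) -> str:
    """Return the only known expansion of abbr; refuse unknown or ambiguous ones."""
    expansions = ABBREVIATIONS.get(abbr.strip().upper(), [])
    if not expansions:
        raise ValueError(f"unknown abbreviation {abbr!r}: ask the requester to spell it out")
    if len(expansions) > 1:
        options = " or ".join(expansions)
        raise ValueError(f"ambiguous abbreviation {abbr!r}: could mean {options}")
    return expansions[0]


if __name__ == "__main__":
    print(expand_abbreviation("PO"))  # Purchase Order
    try:
        expand_abbreviation("MS")
    except ValueError as err:
        print(err)  # ambiguous abbreviation 'MS': could mean Medical Services or Medical Surgical
```

Raising an error forces the question back to the requester at the moment of entry, which is when the ambiguity is cheapest to resolve.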

Why is it important to keep your data clean?

While mistakes may ruffle a few feathers, it’s nothing a little extra work can’t put straight, right? Wrong! In the healthcare supply chain, there’s more at stake than decreased productivity. Mistakes caused by dirty data can be extremely costly when supplies sit idle and unused, or are sent to the wrong clinic only to be reshipped to another immediately afterwards. Dirty data can also lead to under- or over-hiring staff, and cause major delays while you remedy the mistake. Most importantly, lives that depend on those supplies can be put at risk if the supplies arrive late, or not at all.

What are the problems with keeping old data?

As data is gathered over time, various employees are bound to work with and manipulate it (some more meticulously than others). As a result, inconsistencies arise as data ranges and values are adopted or abandoned, incomplete records and errors accumulate, and duplicates are created. Also, as an organization grows and matures, it adopts and retires products and management systems, so your data could end up spread across multiple locations and in different formats (printed, electronic, etc.). With data this scattered, it’s practically impossible to track every contract, supplier and vendor, let alone begin to weed out unnecessary data.

How do you keep it clean?

Curing dirty data doesn’t happen overnight. As the old adage goes, ‘An ounce of prevention is worth a pound of cure’. And while that sounds great, depending on where you are in your data clean-up, prevention might not be feasible at that point. Before you can prevent dirty data from occurring, you need to understand how far your existing dirty data reaches. Here are a few questions to ask:

  • Do you use a standardized nomenclature for new projects and contracts?
  • Do you have a system in place to track expiring contracts, e.g. 30, 60, 90 days out?
  • Do you have a vendor onboarding process?
  • Do you regularly audit your vendor and manufacturer records to ensure there are no duplicates?
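The 30, 60 and 90-day check from the list above is straightforward to automate. This is a minimal Python sketch, assuming a hypothetical `Contract` record with an ID, a vendor and an end date; in practice these fields would come from whatever contract system you use.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Sequence

# The 30/60/90-day windows come from the checklist; the Contract fields are
# hypothetical placeholders for whatever your contract system stores.
WINDOWS = (30, 60, 90)


@dataclass
class Contract:
    contract_id: str
    vendor: str
    end_date: date


def expiry_bucket(contract: Contract, today: date, windows: Sequence[int] = WINDOWS) -> Optional[str]:
    """Return the tightest window the contract expires within, 'expired', or None."""
    days_left = (contract.end_date - today).days
    if days_left < 0:
        return "expired"
    for window in sorted(windows):
        if days_left <= window:
            return f"{window} days"
    return None  # expires beyond the largest window


def contracts_due(contracts: List[Contract], today: date) -> Dict[str, List[str]]:
    """Group contract IDs by expiry bucket, skipping contracts outside every window."""
    report: Dict[str, List[str]] = {}
    for contract in contracts:
        bucket = expiry_bucket(contract, today)
        if bucket is not None:
            report.setdefault(bucket, []).append(contract.contract_id)
    return report


if __name__ == "__main__":
    contracts = [
        Contract("C-001", "Vendor A", date(2024, 1, 20)),   # 19 days out
        Contract("C-002", "Vendor B", date(2024, 2, 15)),   # 45 days out
        Contract("C-003", "Vendor C", date(2023, 12, 31)),  # already expired
        Contract("C-004", "Vendor D", date(2024, 12, 31)),  # outside all windows
    ]
    print(contracts_due(contracts, today=date(2024, 1, 1)))
    # {'30 days': ['C-001'], '60 days': ['C-002'], 'expired': ['C-003']}
```

Running a report like this on a schedule turns "we should watch expiring contracts" into a list someone can act on.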

Data cleaning means going through your data with a fine-tooth comb and noting where incorrect or absent values could be hurting accuracy. Once you’ve targeted and tackled the areas that need a clean-up, start adopting tools and business processes that keep your data from lapsing back into a dirty state, such as:

  • A Contract Administration tool, where you can consolidate all of your various repositories
  • A Vendor Management System that has a strict vendor onboarding process
  • Standardized naming conventions
  • Due Diligence Committees
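Whichever tools you adopt, part of the fine-tooth-comb review, spotting incorrect or absent values, can be expressed as simple validation rules. The sketch below checks a single purchase-order line for a missing Unit of Measure and for quantities, prices and dates outside reasonable ranges. The field names, unit codes and limits are assumptions chosen for illustration, not recommended thresholds.

```python
from datetime import date
from typing import List

# Hypothetical PO-line fields and limits; replace them with your own codes and ranges.
VALID_UOMS = {"EA", "BX", "CS"}  # each, box, case
MAX_QUANTITY = 10_000


def validate_po_line(line: dict, today: date) -> List[str]:
    """Return human-readable problems found in one purchase-order line."""
    problems: List[str] = []

    uom = line.get("uom")
    if not uom:
        problems.append("missing unit of measure")
    elif uom not in VALID_UOMS:
        problems.append(f"unrecognized unit of measure {uom!r}")

    qty = line.get("quantity")
    if qty is None:
        problems.append("missing quantity")
    elif not 1 <= qty <= MAX_QUANTITY:
        problems.append(f"quantity {qty} is not between 1 and {MAX_QUANTITY}")

    price = line.get("unit_price")
    if price is None:
        problems.append("missing unit price")
    elif price <= 0:
        problems.append(f"unit price {price} is not positive")

    ordered = line.get("order_date")
    if ordered is None:
        problems.append("missing order date")
    elif ordered > today:
        problems.append(f"order date {ordered} is in the future")

    return problems


if __name__ == "__main__":
    line = {"quantity": 3, "unit_price": 12.50, "order_date": date(2024, 1, 15)}
    print(validate_po_line(line, today=date(2024, 6, 1)))
    # ['missing unit of measure']
```

Returning a list of problems instead of stopping at the first one makes it easier to see how dirty a record really is during an audit.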

Going forward, enforce a set of best practices and a shared set of conventions so that records are collected consistently across the organization. The longer you wait, the dirtier your data will become and the less data-informed your decisions will be. In an age when data is abundant and critically important, failing to take full advantage of the insights, ideas and cost savings that data-driven decisions bring will only run your supply chain budget dry.