Data Quality and the Spill Chucker

One of my favorite software tools is the spell checker, because of its entertainment value. It is colloquially known as the spill chucker: if you mistype “spell checker” as “spill chucker”, the spell checker sees that both “spill” and “chucker” are valid words and raises no objection, because it has no concept of context. I was reminded of this the other day, when I received a resume from someone who had two stints as an “Account Manger” and was then promoted to “Senior Account Manger” 🙂 It would be very useful if the spell checker dictionary were more easily customizable, because then most business users (and probably all job applicants) would no doubt remove “Manger” from the dictionary: either they have no need for the word, or they use it so infrequently that they’re happy for the spell checker to question it.
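To make the point concrete, here is a minimal sketch of a dictionary-only spell checker in Python. The word list and the `misspelled` function are invented for illustration and do not describe how any real spell checker works; the sketch only shows that a lookup with no notion of context accepts “Manger” until the word is removed from the dictionary.

```python
import re

# Hypothetical, tiny dictionary for illustration only.
DICTIONARY = {"senior", "account", "manager", "manger", "spill", "chucker"}

def misspelled(text: str, dictionary: set[str]) -> list[str]:
    """Return the words in text that are not in the dictionary.

    There is no context here: any word in the dictionary passes,
    however unlikely it is in a job title.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in dictionary]

# With the default dictionary, the typo goes unnoticed.
default_result = misspelled("Senior Account Manger", DICTIONARY)

# A customized dictionary without "manger" questions the word.
custom_result = misspelled("Senior Account Manger", DICTIONARY - {"manger"})
```

With the default dictionary nothing is flagged; once “manger” is removed, it is the only word questioned.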

We have the same challenges with Data Quality – most data items are only correct if they are in the right context. For example, if you have a column in a table that contains last names, and then find a record that contains a company name in the last name column, it is out of context and is poor quality data. Another example I encountered nearly 20 years ago was reported in a computer magazine – a major computer company addressed a letter to:

Mr David A Wilson
Unemployed At Moment
15 Lower Rd
Farnborough
Hants
GU14 7BQ

Someone had faithfully entered what Mr. Wilson had written in the job title field rather than entering it in a Notes field – maybe the database designer hadn’t allowed for notes.
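Both examples, the company name in the last name column and the note in the job title field, come down to checking a value against what its field is supposed to hold. The Python sketch below is only an illustration: the field names, the marker word lists and the `out_of_context` function are all assumptions, and a real tool would need far richer reference data.

```python
# Hypothetical marker words, for illustration only.
COMPANY_MARKERS = {"ltd", "inc", "llc", "plc", "corp"}
NOTE_MARKERS = {"unemployed", "retired", "none", "unknown"}

def out_of_context(field: str, value: str) -> bool:
    """Flag a value that looks wrong for the field it was entered in."""
    tokens = {t.strip(".,").lower() for t in value.split()}
    if field == "last_name":
        # A company suffix suggests an organization, not a surname.
        return bool(tokens & COMPANY_MARKERS)
    if field == "job_title":
        # Words like "unemployed" read as a note, not a job title.
        return bool(tokens & NOTE_MARKERS)
    return False
```

A flagged value is a candidate for review or relocation rather than proof of an error; "Acme Ltd" in a last name field would be flagged, while "Wilson" would not.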

Effective Data Quality tools must allow for poorly structured data – they must be able to recognize data that is in the wrong place and relocate it to the right place. You can’t match records or correct addresses effectively unless you can first improve the structure of poorly structured data. Of course, the context can depend on the language – even British English and American English are different in this respect. I remember when we at helpIT first Americanized our software over 10 years ago, coming across a test case where Earl Jones was given a salutation of “My Lord” rather than simply “Mr. Jones”! “Earl” is almost certainly a first name in the US but more likely to be a title in the UK. Often, it isn’t easy to program what we humans know instinctively. Salutations for letters derived from unstructured data can be a major source of discomfort and merriment. For example, MS Society is an organization, not to be addressed as “Dear Ms Society”. The landlord at The Duke of Wellington pub shouldn’t receive a letter starting “My Lord”. “Victoria and Albert Museum” is an organization, not “Mr & Mrs Museum”, even if it hasn’t been entered in the Organization column.
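The salutation examples can be sketched in a few lines of Python. This is not how helpIT's software works; the word lists, the locale codes, the rule that a leading “The” signals an organization, and the function names are all assumptions made for illustration. The sketch only shows why the same name needs a locale-aware check before a salutation is chosen.

```python
# Hypothetical reference data, for illustration only.
ORG_WORDS = {"society", "museum", "ltd", "inc", "plc"}
TITLES_BY_LOCALE = {"UK": {"earl"}, "US": set()}

def classify_name(name: str, locale: str) -> str:
    """Classify a name as 'organization', 'titled_person' or 'person'."""
    tokens = [t.strip(".,").lower() for t in name.split()]
    if not tokens:
        return "unknown"
    # Check for organizations first, so "MS Society" is never read as "Ms".
    # Treating a leading "The" as an organization hint is an assumption.
    if tokens[0] == "the" or any(t in ORG_WORDS for t in tokens):
        return "organization"
    # "Earl" is treated as a title only where the locale says so.
    if tokens[0] in TITLES_BY_LOCALE.get(locale, set()):
        return "titled_person"
    return "person"

def salutation(name: str, locale: str) -> str:
    """Pick a salutation from the classification."""
    kind = classify_name(name, locale)
    if kind == "titled_person":
        return "My Lord"
    if kind == "person":
        return f"Dear {name}"
    # Organizations and unclassifiable names get a neutral greeting.
    return "Dear Sir or Madam"
```

Under these assumptions “Earl Jones” is a titled person in the UK and an ordinary person in the US, while “MS Society”, “The Duke of Wellington” and “Victoria and Albert Museum” all fall through to the neutral greeting. The order of the checks matters: testing for organization words before titles is what stops “Victoria and Albert Museum” from being treated as a couple.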

But going back to spell checkers, maybe they’re sometimes more intelligent than we give them credit for? Just the other day, mine changed what I was attempting to type: “project milestones” to “project millstones”. I did wonder whether it knew more than I did, or maybe it was just feeling pretty negative that day…

Assessing Your Data Quality Needs

So you have data quality issues. Who doesn’t? Should you embark on a data quality project? Maybe, but what are your objectives? Are there service issues related to poor data quality? Marketing issues? Other major integration or warehousing projects going on? And once you clean up your data – then what? What will you do with the data? What benefit will a clean database bring to your organization? And without clear objectives, how can you even justify another major technology initiative?

Before any data quality project, it is critical to go beyond the immediate issues of duplicate records or bad addresses and understand the fundamental business needs of the organization and how cleaner data will enable you to make better business decisions. This will help you to establish accurate project parameters, keep your project on track and justify the investment to C-level executives. So where do you begin? At the beginning.

Look beyond the pain.
In most cases, a specific concern will be driving the urgency of the initiative, but it is well worth the effort to explore beyond the immediate pain points to other areas where data is essential. Plan to involve a cross-section of departments, including IT, marketing, finance, customer service and operations, to understand the global impact that poor data quality could be having on your organization.

Look back, down and forward.
Consider the data quality challenges you’ve had in the past, the ones you face today and the ones that have yet to come. Is a merger on the horizon? Is the company migrating to a new platform? Do you anticipate significant staffing changes? Looking ahead in this way will ensure that the investment you make will have a reasonable shelf-life.

Look at the data you don’t have.
As you review the quality of the data you have, also consider what’s missing and what information would be valuable to customer service reps or the marketing department. It may exist in another data silo that just needs to be made accessible, or it could require that new data be collected or appended.

Be the customer.
Call the Customer Service Department and put them through their paces. Sign up for marketing materials online. Place an order on the website. Use different addresses, emails and nicknames. Replicate perfectly reasonable scenarios that happen every day in your industry and see how your infrastructure responds. Take good notes on the places where poor data impacts your experience and then look at the data workflow through fresh eyes.

Draw out the workflow.
Even in small organizations, there is tremendous value in mapping out the path your data takes through your business: where it is entered, used, changed, stored and lost. Doing this will uncover business rules (or the lack of them) that are likely impacting the data, departments with complementary needs, and places in the workflow where improvements can be made (and problems avoided).

Think big and small.
Management and C-level executives tend to think big. Data analysts and technical staff tend to think granularly, and departmental users usually fall somewhere in the middle. Ultimately, the best solution can only be identified if you consider the global, technical and strategic business needs.

The challenges of identifying, evaluating and implementing an effective data quality solution are fairly predictable, but problems almost always begin with incorrect assumptions about, or an incomplete understanding of, the overall needs of the organization. In some cases, the right data quality vendor can help you move through this process, but ultimately, failure to broaden the scope in this way can result in the purchase of a solution that does not meet all the requirements of the business.
