Data Quality and the Spill Chucker

One of my favorite software tools is the spell checker, largely for its entertainment value. It is colloquially known as the spill chucker: if you mistype “spell checker” as “spill chucker”, it accepts both “spill” and “chucker” as valid words, because it has no concept of context. I was reminded of this the other day when I received a resume from someone who had two stints as an “Account Manger” and was then promoted to “Senior Account Manger” 🙂 It would be very useful if the spell checker dictionary were more easily customizable, because then most business users (and probably all job applicants) would no doubt remove “Manger” from the dictionary, either because they never need the word or because they use it so rarely that they’re happy for the spell checker to question it.

We have the same challenges with Data Quality – most data items are only correct if they are in the right context. For example, if you have a column in a table that contains last names, and then find a record that contains a company name in the last name column, it is out of context and is poor quality data. Another example I encountered nearly 20 years ago was reported in a computer magazine – a major computer company addressed a letter to:

Mr David A Wilson
Unemployed At Moment
15 Lower Rd
Farnborough
Hants
GU14 7BQ

Someone had faithfully typed what Mr. Wilson had written into the job title field rather than into a Notes field – maybe the database designer hadn’t allowed for notes.
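The last-name example above (a company name sitting in the last name column) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names `last_name` and `organization`, the keyword list and both function names are hypothetical, and a real Data Quality tool would rely on much larger reference data than a handful of keywords.

```python
import re

# Hypothetical keyword list: words that suggest an organization, not a surname.
ORG_KEYWORDS = {"ltd", "limited", "inc", "llc", "plc", "corp",
                "corporation", "company", "society", "museum"}


def looks_like_organization(value: str) -> bool:
    """Return True if any whole word in the value is an organization keyword."""
    words = re.findall(r"[a-z]+", value.lower())
    return any(word in ORG_KEYWORDS for word in words)


def relocate_misplaced_organization(record: dict) -> dict:
    """Return a copy of the record with an out-of-context last name moved
    into the (assumed) organization field, if that field is empty."""
    fixed = dict(record)
    last_name = fixed.get("last_name", "")
    if looks_like_organization(last_name) and not fixed.get("organization"):
        fixed["organization"] = last_name
        fixed["last_name"] = ""
    return fixed


if __name__ == "__main__":
    print(relocate_misplaced_organization(
        {"first_name": "", "last_name": "Acme Widgets Ltd", "organization": ""}))
    print(relocate_misplaced_organization(
        {"first_name": "David", "last_name": "Wilson", "organization": ""}))
```

A keyword test like this will misfire on genuine surnames that happen to match a keyword, which is exactly the kind of context problem the spill chucker has.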

Effective Data Quality tools must allow for poorly structured data – they must be able to recognize data that is in the wrong place and relocate it to the right place. You can’t match records or correct addresses effectively unless you first improve the structure of poorly structured data. The context can also depend on the language – even British English and American English differ in this respect. I remember when we at helpIT first Americanized our software over 10 years ago, coming across a test case where Earl Jones was given a salutation of “My Lord” rather than simply “Mr. Jones”! “Earl” is almost certainly a first name in the US but more likely to be a title in the UK. Often, it isn’t easy to program what we humans know instinctively. Salutations for letters derived from unstructured data can be a major source of discomfort and merriment. For example, MS Society is an organization, not to be addressed as “Dear Ms Society”. The landlord at The Duke of Wellington pub shouldn’t receive a letter starting “My Lord”. “Victoria and Albert Museum” is an organization, not “Mr & Mrs Museum”, even if it hasn’t been entered in the Organization column.
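To show how the salutation depends on context, here is a minimal Python sketch. It is not how helpIT's software works; the lookup tables, the locale codes `en_GB` and `en_US`, the `salutation` function and the neutral “Dear Sir or Madam” fallback for organizations are all assumptions made for illustration. Where no title is recognized, it falls back to the full name rather than guessing at “Mr.”.

```python
# Hypothetical lookup tables; a real tool would use far larger, curated lists.
ORG_WORDS = {"society", "museum", "ltd", "inc", "plc"}

# The same leading word can be a title in one locale and a first name in another.
TITLES = {
    "en_GB": {"mr": "Dear Mr", "ms": "Dear Ms", "earl": "My Lord"},
    "en_US": {"mr": "Dear Mr.", "ms": "Dear Ms."},
}


def salutation(name: str, locale: str) -> str:
    """Derive a letter salutation from an unstructured name string."""
    words = name.replace(".", "").split()
    if not words:
        return "Dear Sir or Madam"
    lowered = [word.lower() for word in words]

    # Check for an organization first, so "MS Society" never becomes "Dear Ms Society".
    if any(word in ORG_WORDS for word in lowered):
        return "Dear Sir or Madam"

    greeting = TITLES.get(locale, {}).get(lowered[0])
    if greeting is None or len(words) < 2:
        # No recognized title in this locale: use the full name as given.
        return "Dear " + " ".join(words)
    if greeting.startswith("Dear"):
        # Ordinary title: follow it with the last word, taken as the surname.
        return f"{greeting} {words[-1]}"
    # Peerage-style greeting stands on its own.
    return greeting


if __name__ == "__main__":
    for name, locale in [("Earl Jones", "en_GB"), ("Earl Jones", "en_US"),
                         ("MS Society", "en_GB"),
                         ("Victoria and Albert Museum", "en_US")]:
        print(f"{name!r} ({locale}): {salutation(name, locale)}")
```

The order of the checks matters: testing for an organization before looking for a title is what stops “MS” being read as “Ms”.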

But going back to spell checkers, maybe they’re sometimes more intelligent than we give them credit for? Just the other day, mine changed what I was attempting to type: “project milestones” to “project millstones”. I did wonder whether it knew more than I did, or maybe it was just feeling pretty negative that day…
