helpIT Systems is Driving Data Quality

For most of us around the US, the Department of Motor Vehicles is a dreaded place, with a reputation for long lines, mountains of paperwork and drawn-out processes. As customers, we loathe the trip to the DMV, and while standing in line we may not give it much thought – but the reality is that poor data quality is a common culprit behind many of these DMV woes. While it may seem unlikely that an organization as large and bureaucratic as the DMV can right the ship, today DMVs around the country are fighting back with calculated investments in data quality.

While improving the quality of registered driver data is not a new concept, technology systems implemented 15-20 years ago have long been a barrier for DMVs to actually take corrective action. However, as more DMVs begin to modernize their IT infrastructure, data quality projects are becoming more of a reality. Over the past year, helpIT has begun work with several DMVs to implement solutions designed to cleanse driver data, eliminate duplicate records, update addresses and even improve the quality of incoming data.

From a batch perspective, setting up a solution to cleanse the existing database paves the way for DMVs to implement other types of operational efficiencies like putting the license renewal process online, offering email notification of specific deadlines and reducing the waste associated with having (and trying to work with) bad data.

In addition to cleaning up existing state databases, some DMVs are taking the initiative a step further and working with helpIT on more proactive measures, incorporating real-time address validation into their systems. This ‘real-time data quality’ creates a firewall of sorts, helping DMV representatives capture accurate data while you provide it (via phone or at a window). With typedown technology embedded directly within DMV data entry forms, if there is a problem with your address, or you accidentally omit information that affects its accuracy, like your apartment number or a street directional (North vs. South), the representatives are empowered to prompt you and request clarification.

Getting your contact data right from the start means your new license is issued immediately, without you having to make another visit, or call and wait on hold for 30 minutes just to resolve a problem that may have been nothing more than a simple typo.

Having met several DMV employees over the past year, it’s obvious that they want you to have an excellent experience. Better data quality is a great place to start. Even while DMV budgets are slashed year after year, modest investments in data quality software are yielding big results in customer experience.


If you want to learn more about improving the quality of your data, contact us at 866.332.7132 for a free demo of our comprehensive suite of data quality products.

Any Advance on Soundex?

A lot has been written about “phonetic algorithms” since Soundex was created for the US Census in (I think) 1880, but the world seemed to stand fairly still until computer software started to implement name matching in the 70’s. The strange thing is that Soundex seems to have remained the de facto standard until well into the 90’s, even up to the present day – strange because Soundex is manifestly prone to false matches as well as missing some typical true name matches. To explain why this happens, let’s look at how Soundex works:

Soundex constructs a crude non-phonetic key by keeping the initial letter of the name, then removing all vowels, plus the letters H, W and Y, and translating the remaining letters to numbers e.g. Tootill and Toothill both become T340. It gives the same number to letters that can be confused e.g. ‘m’ and ‘n’ both become 5. It also drops repeated consonants and consecutive letters that give the same number e.g. S and C. To illustrate some of the issues: Soundex translates Brady, Beard and Broad all to B630, and Wilkins and Wilson both to W425, and yet doesn’t match Knot and Nott – let alone more challenging examples like Dayton and Deighton.
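To make the encoding concrete, here is a minimal sketch of the classic American Soundex rules in Python (the function name and structure are my own; the letter-to-digit table is the standard published one). It reproduces both the collisions and the misses described above:

```python
def soundex(name):
    """Crude American Soundex: keep the first letter, encode the rest as digits."""
    codes = {**dict.fromkeys("bfpv", "1"),
             **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"),
             "l": "4", "m": "5", "n": "5", "r": "6"}
    name = name.lower()
    key = name[0].upper()
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        if ch in "hw":                # H and W are transparent: they don't
            continue                  # separate letters with the same code
        code = codes.get(ch, "")      # vowels (and Y) get no code...
        if code and code != prev:
            key += code
        prev = code                   # ...but they do reset the repeat check
    return (key + "000")[:4]          # pad or truncate to one letter + 3 digits

# The collisions and misses from the text:
assert soundex("Tootill") == soundex("Toothill") == "T340"
assert soundex("Brady") == soundex("Beard") == soundex("Broad") == "B630"
assert soundex("Wilkins") == soundex("Wilson") == "W425"
assert soundex("Knot") != soundex("Nott")           # K530 vs N300 – a missed true match
assert soundex("Dayton") != soundex("Deighton")     # D350 vs D235 – likewise
```

Because the initial letter survives verbatim and everything after it is reduced to a handful of digits, names that merely share a consonant skeleton collide, while sound-alike names with different first letters can never match.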

A lot of the work done in the 70’s and 80’s focused on shoring up Soundex’s more obvious symptoms, rather than addressing the root problem – it doesn’t attempt to understand how people pronounce names. In the last 20 years, various “phonetic” algorithms have entered the public domain, but Metaphone 3 is the only one I know of that doesn’t invariably disregard the vowels after the first letter of the name.

Much of the material I read back in 1995 (when searching for a phonetic algorithm that worked better than Soundex) started off on the wrong tack by adopting a similar approach to Soundex. Often, the authors supported their solutions by citing sounds that are rarely present in people’s names, e.g. the different “ua” sounds in “persuade” and “casual”, or the “pt” in “Ptolemy”. Even when I’ve revisited the subject at intervals since then, there has been little advance. Back in 1995, I decided that we at helpIT would write our own genuinely phonetic algorithm, and I laid down these requirements:

  • It must produce phonetic codes that represent typical pronunciations
  • It should focus on “proper names” and not consider other words
  • It should be loose enough to allow for British and American regional differences in pronunciation (e.g. “Shaw” and “Shore”) but not so loose as to equate names that sound completely different
  • It should not try to address other forms of fuzzy matching that arise from keying or reading errors and inconsistencies – phonetic matching should, by definition, simply address words that may sound the same.

The last point is important – the most effective way to match names is to use a combination of techniques to allow for different kinds of error, rather than try to create one algorithm that is a jack of all trades. I will describe our phonetic algorithm in a later post, but software development, like data quality, is a journey and not a destination – so we’re always looking to improve. If any of our readers know of any other algorithms that attempt to genuinely represent the phonetic attributes of names, I would be delighted to hear from you.

Steve Tootill (stEv tWtyl)

Real-Time ERP Data? Yes.

Over the past several months our findIT S2 application has been gaining some significant traction as a real-time duplicate prevention and record linking solution on websites, CRM and ERP applications.

Just last month, helpIT added our first Epicor client seeking these same capabilities directly within their ERP. Their acquisition of findIT S2 comes on the heels of their Epicor ERP purchase, validating our belief that a real-time data quality firewall offers strategic value even to organizations already running some of the world’s leading CRM and ERP applications.

In each interaction with our prospects and customers, we are finding new ways that the power of real-time matching is helping companies. Some of the more novel uses include:

• Providing a single customer view in both CRM and ERP applications, despite technical incompatibilities between systems
• Linking captured web leads to credit bureau information to identify applicable credit offers
• Preventing the overextension of credit to customers by finding terms previously offered

The most common scenario, of course, is organizations that understand that preventing duplication up front can deliver far greater improvements in efficiency and decisioning than retroactive data quality cleanup.

helpIT systems is proactively working with a range of enterprise partners to extend the value of exceptional data quality through our best-in-class front-end and back-end matching technologies.

To explore partnership or direct opportunities, please contact Josh Buckler at 866.628.2448 or connect with him via Twitter @bucklerjosh.