Data Quality and Gender Bending

We have all heard the story about the man who was sent a mailing for an expectant mother. Obviously this exposed the organization sending it to a good deal of ridicule, but there are plenty of more subtle examples of incorrect targeting based on getting the gender wrong. Today I was amused to receive another in a series of emails addressed to [email protected] The subject was “Eileen, will the ECJ gender ruling affect your insurance premiums?” 🙂 The email went on to explain that from December, insurers in the EU will no longer be able to use a person’s gender to calculate a car insurance quote, “which may be good news for men, but what about women…” They evidently think that my first name is Eileen and therefore that I must be female.
Now, I know that my mother had plans to call me Stephanie, but I think that was only because she already had two sons and figured it was going to be third time lucky. Since I actually emerged noisily into the world, I have gotten completely used to Stephen or Steve and never had anyone get it wrong – unlike my last name, Tootill, which has (amongst other variations) been miskeyed as:

• Toothill
• Tootil
• Tootle
• Tootal
• Tutil
• Tooil
• Foothill
• Toohill
• Toosti
• Stoolchill

“Stephen” and “Steve” are obviously equivalent, but to suddenly become Eileen is a novel and entertaining experience. In fact, it’s happened more than once, so it’s clear that the data here has never been scrubbed to remedy the situation.
Wouldn’t it be useful, then, if there were software to scan email addresses and pick out the first and/or last names, or initial letters, so it would be clear that the salutation for [email protected] is not Eileen?
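For illustration, here is a minimal Python sketch of the idea. The separator rules and the initial-versus-first-name heuristic are invented for the example, not a description of any actual product:

```python
import re

# Hypothetical sketch only: infer likely name parts from an email
# local-part so a mail merge can avoid an obviously wrong salutation.
def guess_name_parts(email):
    local = email.split("@", 1)[0]
    # Split on common separators: dots, underscores, hyphens
    tokens = [t for t in re.split(r"[._-]", local) if t.isalpha()]
    if len(tokens) == 2:
        first, last = tokens
        if len(first) == 1:
            # e.g. "s.tootill" -> initial "S", surname "Tootill", so the
            # salutation clearly shouldn't be "Eileen"
            return {"initial": first.upper(), "last": last.title()}
        return {"first": first.title(), "last": last.title()}
    if len(tokens) == 1:
        return {"name": tokens[0].title()}
    return {}

print(guess_name_parts("s.tootill"))    # {'initial': 'S', 'last': 'Tootill'}
print(guess_name_parts("john_smith"))   # {'first': 'John', 'last': 'Smith'}
```

Even a crude parse like this would flag that a local-part beginning with a lone "s" is unlikely to belong to an Eileen.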

Yes, helpIT systems does offer email validation software, but the real reason for highlighting this is that we just hate it when innovative marketing is compromised by bad data. That’s why we’re starting a campaign to highlight data quality blunders, with the Twitter hashtag #DATAQUALITYBLUNDER. Let’s raise the profile of Data Quality and raise a smile at the same time! If you have any examples that you’d like us to share, please comment on this post or send them to [email protected].

Note: As I explained in a previous blog (Phonetic Matching Matters!), the first four variations above are phonetic matches for the correct spelling, whereas the next four are fuzzy phonetic matches. “Toosti” and “Stoolchill” were one-offs and so off-the-wall that it would be a mistake to design a fuzzy matching algorithm to pick them up.
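To illustrate the distinction, here is a simplified implementation of classic Soundex, a well-known phonetic algorithm. Note that this is not the algorithm helpIT products actually use, so its groupings will not match the classification above exactly; it simply shows how a phonetic code can group spelling variations:

```python
# Simplified classic Soundex, for illustration only (it ignores some
# edge rules around H and W). Letters map to digit classes; vowels and
# unmapped letters separate repeated digits.
def soundex(name):
    codes = {}
    for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in letters:
            codes[ch] = digit
    name = name.lower()
    digits = [codes.get(c, "") for c in name]
    out, prev = [], digits[0]
    for d in digits[1:]:
        if d and d != prev:
            out.append(d)
        prev = d
    # Keep the first letter, then up to three digits, padded with zeros
    return (name[0].upper() + "".join(out) + "000")[:4]

# The first four variations above share Tootill's code...
print(soundex("Tootill"), soundex("Toothill"), soundex("Tootle"))  # T340 T340 T340
# ...while "Foothill" differs on the first letter, so catching it
# needs fuzzier, non-phonetic techniques
print(soundex("Foothill"))  # F340
```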

Assessing Your Data Quality Needs

So you have data quality issues. Who doesn’t? Should you embark on a data quality project? Maybe, but what are your objectives? Are there service issues related to poor data quality? Marketing issues? Other major integration or warehousing projects going on? And once you clean up your data – then what? What will you do with it? What benefit will a clean database bring to your organization? And without clear objectives, how can you even justify another major technology initiative?

Before any data quality project, it is critical to go beyond the immediate issues of duplicate records or bad addresses and understand the fundamental business needs of the organization and how cleaner data will enable you to make better business decisions. This will help you to establish accurate project parameters, keep your project on track and justify the investment to C-level executives. So where do you begin? At the beginning.

Look beyond the pain.
In most cases, a specific concern will be driving the urgency of the initiative but it will be well worth the effort to explore beyond the immediate pain points to other areas where data is essential. Plan to involve a cross-section of the departments including IT, marketing, finance, customer service and operations to understand the global impact that poor data quality could be having on your organization.

Look back, down and forward.
Consider the data quality challenges you’ve had in the past, the ones you face today and the ones that have yet to come. Is a merger on the horizon? Is the company migrating to a new platform? Do you anticipate significant staffing changes? Looking ahead in this way will ensure that the investment you make will have a reasonable shelf-life.

Look at the data you don’t have.
As you review the quality of the data you have, also consider what’s missing and what information would be valuable to customer service reps or the marketing department. It may exist in another data silo somewhere that just needs to be made accessible or it could require new data be collected or appended.

Be the customer.
Call the Customer Service Department and put them through the paces. Sign up for marketing materials online. Place an order on the website. Use different addresses, emails and nicknames. Replicate perfectly reasonable scenarios that happen every day in your industry and see how your infrastructure responds. Take good notes on the places where poor data impacts your experience and then look at the data workflow through fresh eyes.

Draw out the workflow.
Even in small organizations, there is tremendous value in mapping out the path your data takes through your business: where it is entered, used, changed, stored and lost. Doing this will uncover business rules (or the lack thereof) that are likely impacting the data, departments with complementary needs, and places in the workflow where improvements can be made (and problems avoided).

Think big and small.
Management and C-Level executives tend to think big. Data analysts and technical staff tend to think granularly and departmental users usually fall somewhere in the middle. Ultimately, the best solution can only be identified if you consider the global, technical and strategic business needs.

The challenges with identifying, evaluating and implementing an effective data quality solution are fairly predictable but problems almost always begin with incorrect assumptions and understanding of the overall needs of the organization. In some cases, the right data quality vendor can help you move through this process but ultimately, failure to broaden the scope in this way can result in the purchase of a solution that does not meet all the requirements of the business.

Click here to download a comprehensive Business Checklist that will help you to identify the data quality business needs within your organization. Then stay tuned for our next post in our Data Quality Project series.

helpIT Systems is Driving Data Quality

For most of us around the US, the Department of Motor Vehicles is a dreaded place, bringing with it a reputation for long lines, mountains of paperwork and drawn-out processes. As customers, we loathe the trip to the DMV, and while standing in line we may not give it much thought – but the reality is that poor data quality is a common culprit behind some of these DMV woes. While it may seem unlikely that an organization as large and bureaucratic as the DMV can right the ship, today DMVs around the country are fighting back with calculated investments in data quality.

While improving the quality of registered driver data is not a new concept, technology systems implemented 15-20 years ago have long been a barrier preventing DMVs from actually taking corrective action. However, as more DMVs begin to modernize their IT infrastructure, data quality projects are becoming more of a reality. Over the past year, helpIT has begun work with several DMVs to implement solutions designed to cleanse driver data, eliminate duplicate records, update addresses and even improve the quality of incoming data.

From a batch perspective, setting up a solution to cleanse the existing database paves the way for DMVs to implement other types of operational efficiencies like putting the license renewal process online, offering email notification of specific deadlines and reducing the waste associated with having (and trying to work with) bad data.

In addition to cleaning up existing state databases, some DMVs are taking the initiative a step further and working with helpIT to take more proactive measures by incorporating real-time address validation into their systems. This ‘real-time data quality’ creates a firewall of sorts, facilitating the capture of accurate data by DMV representatives while you provide it (via phone or at a window). With typedown technology embedded directly within DMV data entry forms, if there is a problem with your address, or you accidentally forget to provide information that affects its accuracy, like your apartment number or a street directional (North vs. South), the representatives are empowered to prompt and request clarification.

Getting your contact data to be accurate from the start means your new license is provided immediately without you having to make another visit, or call and wait on hold for 30 minutes just to resolve the problem that could have been no more than a simple typo.

Having met several DMV employees over the past year, it’s obvious that they want you to have an excellent experience. Better data quality is a great place to start. Even while DMV budgets are slashed year after year, modest investments in data quality software are yielding big results in customer experience.


If you want to learn more about improving the quality of your data, contact us at 866.332.7132 for a free demo of our comprehensive suite of data quality products.

Microsoft Convergence 2012 Review

helpIT systems recently exhibited its findIT S2 and matchIT SQL products at Microsoft Convergence 2012, the 16th Microsoft Convergence gathering, which this year was held in the spacious George R. Brown Convention Center in Houston. Having exhibited the previous year in Atlanta, we had an idea of what to expect; even so, Convergence 2012 exceeded our expectations!

As always, Microsoft took care of everything, so as exhibitors we had everything we needed right on site. The huge conference center housed over 270 ISVs, yet the layout was such that the hall never felt overwhelming, and interest was so keen that we were certainly glad we had four staff members on hand to answer questions, demonstrate software and represent helpIT systems technology.

The expo hours were far friendlier this year, giving us access to many potential customers and partners while they were fresh, engaged and interested in learning more about our products. The catering was also top notch, with a special mention for the chicken and waffles and some of the best cupcakes Texas has to offer!

While our experience was largely from an exhibitor’s point of view, there were also sessions to attend, and these were what you would expect from Microsoft: a fantastically structured view of the products from the inside. Without tipping into information overload, they demonstrated how Microsoft have taken their disparate products and refined them into a cohesive, lean platform with the scalability to suit pretty much any organization, which is no mean feat.

COO Kevin Turner and Kirill Tatarinov, President of the Dynamics team, began proceedings with the first keynote session, sharing their clear vision for the Dynamics platform with an eager and engaged audience. After speaking to attendees, it’s clear that Microsoft are committed to growth, and the key message of supporting business was well received.

Another session of particular interest was hosted by the CRMUG user group: Best Practices for Clean Data in Microsoft Dynamics CRM. With over 200 attendees, it’s obvious that data quality is now a pervasive concern wherever there is contact data, and of tremendous interest to administrators and marketers alike. There were some very good recommendations on designing data structures that lend themselves to keeping data clean. Duke Mocanu of FM Global also extolled the use of MS CRM duplicate detection as “60 or 70 percent of the battle, you’ll get a lot of false positives but still indispensable”. This is the problem with the exact matchcode matching approach that MS CRM uses: because there is no grading of matches, users must either set very loose matching and accept a lot of false matches, or very tight matching and miss many good ones. As a reflection of this, a key topic of conversation in the booth was how the intelligent grading of matches provided by findIT S2 and matchIT SQL is essential to allow automatic flagging of the vast majority of matches whilst leaving only a very small proportion for human review.
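As a rough illustration of the difference, the sketch below contrasts an all-or-nothing matchcode comparison with a graded score. The matchcode recipe, field weights and thresholds are invented for the example; they are not MS CRM’s or matchIT’s actual logic:

```python
# Invented example: exact matchcode matching vs. graded scoring.
def matchcode(rec):
    # All-or-nothing key built from fragments of key fields
    return (rec["last"][:4] + rec["zip"][:3] + rec["first"][:1]).upper()

def graded_score(a, b):
    # Weighted field-by-field comparison; weights are illustrative
    score = 0
    if a["last"].lower() == b["last"].lower():
        score += 40
    if (a["first"].lower() == b["first"].lower()
            or a["first"][:1].lower() == b["first"][:1].lower()):
        score += 25  # full first-name match or initial match
    if a["zip"] == b["zip"]:
        score += 35
    return score

a = {"first": "John", "last": "Smith", "zip": "30303"}
b = {"first": "J",    "last": "Smith", "zip": "30303"}

print(matchcode(a) == matchcode(b))   # exact matchcodes: match or nothing
s = graded_score(a, b)
# Graded scores allow three-way classification instead of a yes/no
print("auto-match" if s >= 90 else "review" if s >= 60 else "no match")
```

With grading, only pairs falling in the middle band need human review; an exact matchcode forces one loose-or-tight threshold on every pair.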

On the expo floor we found that the Microsoft strategy of supporting its ISVs increased the synergy between ourselves and other partners. Interest in both the findIT S2 Real-Time Data Quality Firewall for Dynamics CRM and the matchIT SQL Batch Data Cleansing for SQL Server increased dramatically in our second year at Convergence, from both the Dynamics customer and partner communities. Hundreds of customers stopped by the booth and expressed genuine excitement that we could help them not only prevent duplication but also solve business concerns including better lead assignment for key accounts, cross-selling opportunities for financial product lines, and preparing organizations for their migration to the Dynamics platform.

The Partner Community also showed great interest in the potential of shifting the ownership of the data preparation and cleanup projects from a customer onus to an added-value they could provide to their clients. Many Partners commented on the welcome opportunity to enhance the “Duplicate Detection” functionality native within CRM with a tool that could empower the users to find the right account, contact, and/or lead with ease. Some even commented that implementing matchIT SQL and findIT S2 together assures their new clients the ROI on Dynamics, as there is limited value to a CRM with poor quality data.

Microsoft do an amazing job putting together these events, which have wide appeal to ISVs, Partners, current users and the increasing number of people who are looking to implement systems later down the line and are keen to gather information. While exhibiting, we have seen a sharp rise in the number of existing users looking to fill out their CRM and ETL needs by expanding into companion products, which would indicate a great level of satisfaction with their existing Dynamics systems and a desire to branch out into other areas of the business.

This was a very successful show for helpIT systems, and as we work with the many people who visited our stand, there has already been a great level of interest in what data quality means to these organizations. We hope to explore this in the coming weeks with a series of blog entries aimed at making integrated data quality one of the most useful tools available to organizations in using their contact data to the full.

Thank you to everyone who took some time to visit us this week, and if you were unable to, please just let us know so we can send you our email show pack with details on our products. We look forward to Convergence in New Orleans in 2013!

Creating Your Ideal Test Data

Every day we work with customers to begin the process of evaluating helpIT data quality software (along with other vendors they are looking at). That process can be daunting for a variety of reasons from identifying the right vendors to settling on an implementation strategy, but one of the big hurdles that occurs early on in the process is running an initial set of data through the application.

Once you’ve gotten a trial of a few applications (hopefully including helpIT’s) and you are poised to start your evaluation to determine which one is going to generate the best result – you’ll need to develop a sample data set to run on the software. This is an important step not to be overlooked because you want to be sure that the software you invest in can deliver the highest quality matches so you can effectively dedupe your database and most importantly, TRUST that the resulting data is as clean as it possibly can be with the least possible wiggle room. So how do you create the ideal test data?

The first word of advice – use real data.

Many software trials come preinstalled with sample or demo data designed primarily to showcase the features of the software. While this sample data can give you examples of generic match results, it will not be a clear reflection of your match results. This is why it is best to run an evaluation of the software on your own data whenever possible. Using the guidelines below, we suggest identifying a real dataset that is representative of the challenges you will typically see within your actual database. That dataset will tell you whether the software can find your more challenging matches, and how well it can do so.

For fuzzy matching features, you may like to consider whether the data that you test with includes these situations:

  • phonetic matches (e.g. Naughton and Norton)
  • reading errors (e.g. Horton and Norton)
  • typing errors (e.g. Notron, Noron, Nortopn and Norton)
  • one record has title and initial and the other has first name with no title
    (e.g. Mr J Smith and John Smith)
  • one record has missing name elements (e.g. John Smith and Mr J R Smith)
  • names are reversed (e.g. John Smith and Smith, John)
  • one record has missing address elements (e.g. one record has the village or house
    name and the other address just has the street number or town)
  • one record has the full postal code and the other a partial postal code or no postal code
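As a sketch, scenarios like these can be encoded as labelled pairs so that an evaluation run can be scored automatically. The pairs below reuse the examples from the checklist, and the trivial baseline matcher is illustrative only:

```python
# Seed pairs covering some of the checklist scenarios above; each tuple
# is (record_a, record_b, scenario). Extend with rows from real data.
test_pairs = [
    ("Naughton",   "Norton",       "phonetic match"),
    ("Horton",     "Norton",       "reading error"),
    ("Nortopn",    "Norton",       "typing error"),
    ("Mr J Smith", "John Smith",   "title and initial vs first name"),
    ("John Smith", "Mr J R Smith", "missing name elements"),
    ("John Smith", "Smith, John",  "reversed name"),
]

def recall(match_fn):
    # Fraction of known-match pairs that a candidate matcher finds
    found = sum(1 for a, b, _ in test_pairs if match_fn(a, b))
    return found / len(test_pairs)

# A naive exact comparison finds none of them, which is the point:
# every one of these pairs falls to fuzzy matching to detect
print(recall(lambda a, b: a.lower() == b.lower()))  # 0.0
```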

When matching company names data, consider including the following challenges:

  • acronyms e.g. IBM, I B M, I.B.M., International Business Machines
  • one record has missing name elements e.g.
  1. The Crescent Hotel, Crescent Hotel
  2. Breeze Ltd, Breeze
  3. Deloitte & Touche, Deloitte, Deloittes.
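A minimal sketch of the kind of normalization that handles punctuation-based acronym variants like those above. The noise-word list and single-letter rule are illustrative assumptions; mapping “International Business Machines” to “IBM” would additionally require a lookup table:

```python
import re

# Sketch of pre-match company-name normalization (illustrative only).
NOISE_WORDS = {"the", "ltd", "inc", "llc", "co", "hotel"}

def normalize_company(name):
    # Lowercase, strip punctuation, drop common noise words
    tokens = [t for t in re.findall(r"[a-z0-9]+", name.lower())
              if t not in NOISE_WORDS]
    if tokens and all(len(t) == 1 for t in tokens):
        return "".join(tokens)   # "I.B.M." / "I B M" -> "ibm"
    return " ".join(tokens)

print(normalize_company("I.B.M."))              # ibm
print(normalize_company("I B M"))               # ibm
print(normalize_company("Breeze Ltd"))          # breeze
print(normalize_company("The Crescent Hotel"))  # crescent
```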

You should also ensure that you have groups of records where the data that matches exactly varies for pairs within the group. For example, in a group of three records, one pair might agree exactly on postal code but not last name, while another pair agrees on last name but not postal code.

If you don’t have these scenarios all represented, you can doctor your real data to create them, as long as you start with real records that are as close as possible to the test cases and make one or at the most two changes to each record. In the real world, matching records will have something in common – not every field will be slightly different.

With regard to size, it’s better to work with a reasonable sample of your data than a whole database or file; otherwise the mass of information runs the risk of obscuring important details, and test runs take longer than they need to. We recommend that you take two selections from your data: one for a specific postal code or geographic area, and (if possible) one for an alphabetical range by last name. Join these selections together and then eliminate all the exact matches; if you can’t do this easily, one of the solutions that you’re evaluating can probably do it for you.
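A quick sketch of that sampling recipe in plain Python; the records and field layout (first, last, postal code) are invented for illustration:

```python
# Build the evaluation sample: one geographic slice, one last-name
# slice, joined with exact duplicates removed.
records = [
    ("John",  "Smith",    "30303"),
    ("John",  "Smith",    "30303"),   # exact duplicate, will be dropped
    ("Anne",  "Norton",   "10001"),
    ("Anne",  "Naughton", "10001"),   # fuzzy pair, survives for testing
    ("Steve", "Tootill",  "30305"),
]

geo_slice  = [r for r in records if r[2].startswith("303")]
name_slice = [r for r in records if r[1].startswith("N")]

# dict.fromkeys removes exact duplicates while preserving order
sample = list(dict.fromkeys(geo_slice + name_slice))
print(len(sample))  # 4: exact duplicate gone, fuzzy Norton/Naughton pair kept
```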

Ultimately, you should have a reasonably sized sample without too many obvious matches, one that contains a good number of fuzzier matches (e.g. matches where the first character of the postal code or last name differs between two records that otherwise match, matches with phonetic variations of the last name, etc.).


For more information on data quality vendor evaluations, please download our Practical Guide to Data Quality Vendor Selection.

Real-Time ERP Data? Yes.

Over the past several months our findIT S2 application has been gaining some significant traction as a real-time duplicate prevention and record linking solution on websites, CRM and ERP applications.

Just last month, helpIT added our first Epicor client seeking these same capabilities directly within their ERP. The acquisition of findIT S2 comes on the heels of their Epicor ERP purchase, validating our belief that there is a strategic opportunity for organizations to take advantage of a real-time data quality firewall, even for those already utilizing some of the world’s leading strategic CRM and ERP applications.

In each interaction with our prospects and customers we are finding new ways that the power of real-time matching is helping companies. Some of the more novel ways include:

• Providing a single customer view in both CRM and ERP applications, despite technical incompatibilities between systems
• Linking captured web leads to credit bureau information to identify applicable credit offers
• Preventing the overextension of credit to customers by finding terms previously offered

The most common scenario, of course, is organizations who understand that the prevention of duplication can lead to very significant improvements in efficiency and decisioning when compared to retroactive data quality resolution.

helpIT systems is proactively working with a range of enterprise partners to extend the value of exceptional data quality through our best-in-class front-end and back-end matching technologies.

To explore partnership or direct opportunities, please contact Josh Buckler at 866.628.2448 or connect with him via Twitter @bucklerjosh.

Cleaner Data. Better Decisions.

We thought the best place to start the new Clean Data Blog would be with a quick look at our new tagline – Cleaner Data. Better Decisions.

It’s catchy – we know – but it’s also a very important principle to us. When a company is going down the path of planning a data quality solution there are many facets that are being considered. With data integration, warehousing, budget and implementation questions to answer – it’s easy to forget that the ultimate goal behind the initiative is to make better decisions. Period.

When we first speak to a company, this is the primary focus: understanding the business challenges they face so that we can build a solution that allows them to TRUST their data. The more you trust your data, the more successful you and your organization can become. When you are confident that your customer data is accurate, you can market to your customers without ticking anyone off and without throwing money out the window. When you know that your transaction histories are linked to your customer contact data, you can rest easy that your customer service team can provide the best possible support and service. When you know that your metrics are accurate, you can feel confident planning merchandising and expansion strategies.

Makes sense right? We thought so.

Here’s the most important thing: when you know your data is good, you can move quickly, strategically and efficiently to outsmart your competition and drive the business forward. Hesitating because you’re not sure whether what you are looking at is a real reflection of the current state of the business can mean certain doom. It’s not a practical way to run a business in today’s market. And with strong data quality tools (like ours), it’s also unnecessary.

How clean is your data?