Weighing up the Cost of Bad Data

In a recent survey conducted by helpIT systems, almost 25 percent of respondents cited finances as the biggest hindrance to maintaining high-quality contact databases.  We get it.  Data quality solutions can carry what may seem a hefty price tag, and they won’t show up two days later in a nicely wrapped package like an Amazon Prime purchase.  So, like any other expensive and complicated decision, data quality may well get pushed to the bottom of the pile.

Then again, just like going to the gym or eating salad instead of steak, the toughest behaviors to adopt are usually the most beneficial.  Because even though database management may be something we’d rather forget about, 40 percent of those same respondents stated that their companies were losing tens of thousands of dollars each year due to poor contact data quality.  So while the solution may not be cheap and easy, living without it does not appear to be either.  The Data Warehousing Institute found that bad data costs US businesses more than $600 billion each year.  Is that a number your company can afford to ignore?

Many businesses do notice these dollars disappearing and choose to do something about it.  Unfortunately, however, this is often just a “quick fix”.  They look at their messy databases, pay someone to “clean them up”, and then everyone gets a pat on the back for a job well done.  And it is.  Until someone enters a new record in the CRM, a customer moves, or perhaps even dares to get a new phone number.  And I will shock everyone by reporting that this happens all the time.  Studies indicate up to 2 percent degradation each month… even in a perfect database.
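
A 2 percent monthly loss compounds. As a rough illustration, assuming (as a simplification) that a constant 2 percent of the still-valid records goes out of date each month, the share of a database that has gone stale after n months is 1 - 0.98^n. The helper name below is ours, purely for illustration:

```python
def fraction_stale(monthly_decay: float, months: int) -> float:
    """Share of records expected to be out of date after `months`,
    assuming a constant, independent monthly decay rate (a simplification)."""
    still_valid = (1.0 - monthly_decay) ** months
    return 1.0 - still_valid


if __name__ == "__main__":
    for months in (1, 6, 12, 24):
        print(f"{months:>2} months: {fraction_stale(0.02, months):.1%} stale")
```

Under that assumption roughly 21.5 percent of records, about one in five, would be out of date after a year, and nearly 40 percent after two, which is why a one-off clean-up does not stay clean.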

Right now you’re probably picking up on the fact that maintaining good data is going to cost money.  You’re right.  But the fact is, avoiding that cost is only going to cost more in the long run.  Just like having a well-trained sales team, a finely-targeted marketing plan, or a boss with years of experience…great results are an investment of time and resources rather than a happy accident.

Companies that choose to invest in good data quality, and to view it as an ongoing process rather than a simple one-time fix, are finding that the benefits far outweigh the initial costs.  Advertising dollars are reaching their intended audiences and sales calls are reaching the right recipients, with customer satisfaction going through the roof.  Today’s consumer expects the personal touches that can only come from having an accurate and up-to-date Single Customer View, and good data quality solutions are what make that possible.

How Ashley Madison Can Inspire Your Business

As each new name and every illicit detail is revealed, the 37 million members of Ashley Madison, a website promoting extramarital affairs, are scrambling to save their marriages, careers, and reputations.  This list, which is now available to anyone aware of the existence of Google, reportedly includes the names and sexual fantasies of members of the armed services, the United Nations, and even the Vatican.  Looks like someone’s prayers weren’t heard this week.

As the contact information becomes more easily accessible, a new breed of data analyst is emerging.  Creative thinkers are using the information to win custody battles, deduce which cities have the most cheaters, and even get a leg up over another candidate for a job promotion.

If everyone from neglected housewives to tawdry tabloid writers is capable of using data to form opinions and make well-informed decisions, the question is… why aren’t you?

Now I’m not talking about crawling through Ashley Madison’s troves of cheaters, I’m talking about your company.  Your data.  Demographics, geographic locations, purchasing behavior… your contact records say a million things about your customers.  A million patterns are lying in wait, holding the key to better marketing, better operations, and better business decisions.  Whereas for Ashley Madison data spelled disaster, for you it should spell potential.

Customer data, when compromised, can be a company’s worst nightmare.  When used intelligently, customer data can increase profits and reduce the guessing game so many businesses play on a day-to-day basis.

In order to use your data intelligently, you must be confident that it is accurate and up-to-date.  If your records indicate you have 14 Jeremiah Whittinglys living in Chicago, you can either double your production of Jeremiah Whittingly personalized baseball caps, or perhaps take a closer look at how clean your data is.  I’m personally leaning towards the second option.

However, beefing up marketing efforts in Juneau, where your database says 10 percent of your client base is located, is a smart idea.  Unless your data entry employee didn’t realize that ‘AK’ is the postal abbreviation for Alaska, not Arkansas (which is ‘AR’).  In which case, polar bears stand a better chance of appreciating your new billboard than your target market.
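
A cross-check of city against state, run at entry time or as a batch job, would catch records such as an Arkansas city stored with ‘AK’. A minimal sketch; the tiny `KNOWN_CITIES` table is hypothetical and stands in for a proper postal reference file:

```python
from typing import Optional

# Hypothetical reference data: city -> set of valid USPS state codes.
# A real check would use a full postal reference file, not a hand-made dict.
KNOWN_CITIES = {
    "juneau": {"AK"},
    "anchorage": {"AK"},
    "little rock": {"AR"},
    "chicago": {"IL"},
}


def check_city_state(city: str, state: str) -> Optional[str]:
    """Return a warning if `state` is not a known state for `city`.

    Returns None when the pair looks consistent, or when the city is not in
    the reference table (so it cannot be judged either way)."""
    valid_states = KNOWN_CITIES.get(city.strip().lower())
    if valid_states is None:
        return None
    code = state.strip().upper()
    if code not in valid_states:
        return f"{city}, {code}: expected one of {sorted(valid_states)}"
    return None
```

Here `check_city_state("Little Rock", "AK")` returns a warning, while `check_city_state("Juneau", "AK")` returns `None`.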

Ridding your database of duplicate, incorrect, or incomplete records is the first step in harnessing the power of customer data.  The next step is figuring out what this data means for you and your company, and if every talk show host and dark web hacker can do it with the right tools, so can you.

Why Customers Must Be More Than Numbers

I read with some amazement a story in the London Daily Telegraph this week about a customer of NatWest Bank who sent £11,200 last month via online banking to an unknown company instead of to his wife. Although Paul Sampson had correctly entered his wife’s name, sort code and account number when he first made an online payment to her HSBC account, he wasn’t aware that she had subsequently closed the account.

Mr Sampson thought he was transferring £11,200 to his wife: he clicked Margaret’s name among a list of payees saved in his NatWest banking profile and confirmed the transaction, but the payment went to a business in Leeds. Mr Sampson believes that HSBC had reissued his wife’s old account number to someone else, a company whose name they refused to tell him. NatWest told Mr Sampson it was powerless to claw the money back.

HSBC said it had contacted its customer but had no obligation regarding the money. HSBC insisted that the account number in question was not “recycled”, saying Mr Sampson must have made a typing error when he first saved the details, which he disputes. Although the money was eventually returned after the newspaper contacted HSBC, a much larger issue remains unresolved.

Although news to most of us, it is apparently common practice among UK banks to recycle account numbers, presumably because banking systems are so entrenched around 8- or 9-digit account numbers that the banks are concerned about running out of numbers. A recent code of practice apparently suggests that banks should warn the customer making the payment if they haven’t sent money to this payee for 13 months, but according to the Daily Telegraph “No major high street bank could confirm that it followed this part of the code”.

The Daily Telegraph goes on to state that the recipients of electronic payments are identified by account numbers only. The names are not checked in the process, so even if they do not match, the transaction can proceed. “This is now a major issue when you can use something as basic as a mobile phone number to transfer money,” said Mike Pemberton, of solicitors Stephensons. “If you get one digit wrong there’s no other backup check, like a person’s name – once it’s gone it’s gone.” If you misdirect an online payment, your bank should contact the other bank within two working days of your having informed them of the error, but they have no legal obligation to help.

Mr Sampson obviously expected that the bank’s software would check that the account number belonged to the account name he had stored in his online payee list, but apparently UK banking software doesn’t do this. Why on earth not? Surely it’s not unreasonable for banks, with all the money they spend on computer systems, to perform this safety check? It’s not good enough to point to the problems that can arise when a name is entered in different ways, such as Sheila Jones, Mrs S Jones, Sheila M Jones, SM Jones, Mrs S M Jones or Mrs Sheila Mary Jones.

These are all elementary examples for intelligent name matching software.  More challenging are typos, nicknames and other inconsistencies such as those caused by poor handwriting, which would all occur regularly should banks check the name belonging to the account number. But software such as matchIT Hub is easily available to cope with these challenges too, as well as the even more challenging job of matching joint names and business names.
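
To make the “elementary” level concrete, here is a minimal sketch: strip titles, split run-together initials, and treat two names as compatible when the surnames agree and each given name or initial is consistent with its counterpart. It is an illustrative simplification written for this post, not how matchIT Hub works; the title list, the small `NICKNAMES` table and the initials heuristic are all hypothetical choices:

```python
import re
from typing import List, Tuple

TITLES = {"mr", "mrs", "ms", "miss", "dr"}
# Hypothetical, deliberately tiny nickname table, for illustration only.
NICKNAMES = {"peggy": "margaret", "maggie": "margaret", "jim": "james", "bill": "william"}


def parse_name(raw: str) -> Tuple[List[str], str]:
    """Split a name into (given-name tokens, surname), dropping titles.

    Heuristic: an all-capitals token of one or two letters (e.g. 'SM') is
    treated as a run of initials and split into separate letters."""
    tokens = [t for t in re.split(r"[\s.]+", raw.strip()) if t]
    tokens = [t for t in tokens if t.lower() not in TITLES]
    if not tokens:
        return [], ""
    given: List[str] = []
    for tok in tokens[:-1]:
        if tok.isupper() and len(tok) <= 2:
            given.extend(ch.lower() for ch in tok)  # 'SM' -> ['s', 'm']
        else:
            low = tok.lower()
            given.append(NICKNAMES.get(low, low))   # map known nicknames
    return given, tokens[-1].lower()


def tokens_compatible(a: str, b: str) -> bool:
    """Equal tokens match; a single initial matches any name starting with it."""
    if a == b:
        return True
    if len(a) == 1 and b.startswith(a):
        return True
    return len(b) == 1 and a.startswith(b)


def names_compatible(a: str, b: str) -> bool:
    """True if two names could plausibly refer to the same person."""
    given_a, surname_a = parse_name(a)
    given_b, surname_b = parse_name(b)
    if not surname_a or surname_a != surname_b:
        return False
    # Compare given names position by position; extra names on one side are allowed.
    return all(tokens_compatible(x, y) for x, y in zip(given_a, given_b))
```

With these rules, all six Sheila Jones variants above are compatible with one another, and “Peggy Sampson” matches “Margaret Sampson” through the nickname table. The rules are deliberately lenient (a bare “Mrs Jones” matches any Jones), so a production system would likely need match scores or review thresholds rather than a simple yes or no.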

There are also issues in the USA with banking software matching names. I remember that when I first wanted to transfer money from my Chase account to my Citibank account, I could only do so if the two accounts had exactly the same name. These were joint accounts and the names had to match letter for letter, so I had to either change the name on one of the accounts or open a new one! Having been an enthusiastic user of the US system for sending money to someone electronically using just their email address, I’m now starting to worry about the wisdom of this…

We banking customers should perhaps question our banks more closely about the checks that they employ when we make online payments!

When Data Quality Goes Wrong…

Whether you are a data steward or not, we’ve all experienced the unfortunate consequences of data quality gone terribly awry. Multiple catalogues sent to the same name and address. Purchasing a product through an online retailer only to find you have three different accounts with three different user names. Long, frustrating phone calls with customer service agents who can’t help you because they don’t have access to all the relevant info.

As the Director of Marketing for a data quality company, it brings me exceptional pain to see bad data quality in action. Such inefficiency is what gives marketing a bad reputation. It can ruin brands, destroy customer loyalty, waste opportunities and… let’s face it, it also kills trees.

Indeed, throughout our entire company, the water cooler occasionally buzzes with stories of bad data quality. So what’s a data quality company to do with all these DQ “blunders”? Call them out!

So this summer we’re going to dig through our box of examples and showcase a few #dataqualityblunders. We’ll try to be nice about it, of course, but the important part is that we’ll also highlight the ways a good data quality strategy could have addressed these indiscretions. Because where there is bad data, there is also a clean data solution.

Have a #dataqualityblunder you’re just dying to spill?

We know that you’ve seen your fair share of data quality blunders. Send them in and win a $10 Starbucks gift card! Just email [email protected]!

Where Is Your Bad Data Coming From?

As Kimball documents in The Data Warehouse Lifecycle Toolkit (available in all good bookstores), there are five concepts that, together, can be considered to define data quality:

Accuracy – The correctness of values contained in each field of each database record.

Completeness – Users must be aware of what data is the minimum required for a record to be considered complete and to contain enough information to be useful to the business.

Consistency – High-level or summarized information agrees with the lower-level detail.

Timeliness – Data must be up-to-date, and users should be made aware of any problems by use of a standard update schedule.

Uniqueness – One business or consumer must correspond to only one entity in your data. For example, Jim Smyth and James Smith at the same address should somehow be merged as these records represent the same consumer in reality.
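
Two of these dimensions are easy to measure directly. A minimal sketch, assuming records are plain dictionaries and that the business has decided which fields make a record “complete”; the `REQUIRED` field set and the exact-match key are hypothetical choices:

```python
from typing import Dict, List

Record = Dict[str, str]
REQUIRED = ("name", "address", "postcode")  # hypothetical minimum for a "complete" record


def completeness(records: List[Record]) -> float:
    """Share of records in which every required field is present and non-blank."""
    if not records:
        return 1.0
    complete = sum(all(r.get(f, "").strip() for f in REQUIRED) for r in records)
    return complete / len(records)


def naive_uniqueness(records: List[Record]) -> float:
    """Share of distinct records under an exact, case-insensitive key.

    This overstates uniqueness: fuzzy duplicates such as 'Jim Smyth' and
    'James Smith' at one address are NOT detected by an exact key."""
    if not records:
        return 1.0
    keys = {tuple(r.get(f, "").strip().lower() for f in REQUIRED) for r in records}
    return len(keys) / len(records)
```

An exact key catches “James Smith” typed twice with different capitalization, but still counts “Jim Smyth” at the same address as a separate customer, which is exactly the case Kimball’s uniqueness criterion is concerned with.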

So, using Kimball’s list, we may know what kind of data we want in the database, but unfortunately, despite our best intentions, there are forces conspiring against good data quality. While it doesn’t take a forensics degree to find them, there are so many sources of poor data that you may not even know where to look. For that, we’ve come up with our own list. Let’s take a look…

1. Data Entry Mistakes.

The most obvious source of bad data, these are the simple mistakes employees make when entering data into the system, e.g., typos, entering data into the wrong fields, or using variations of certain data elements.  Even under ideal circumstances these mistakes are easy to make and therefore extremely common, and unfortunately they can be the source of large numbers of duplicate records.  But why is it so hard to get the data right? Consider these circumstances that can exacerbate problems in your data entry process:

  • Poorly trained staff with no expectations for data entry
  • High employee turnover
  • Under-resourced call centres that lead to rushed customer exchanges
  • Forms that do not allow room for all the relevant info
  • Unenforced business rules because bad data is not tracked down to its source

2. Lazy Customers.

Let’s face it. Customers are a key source of bad data. Whether they are providing information over the phone to a representative or completing a transaction online, customers can deliberately or inadvertently provide inaccurate or incomplete data. But you know this already. Here are a few specific circumstances to look out for, especially in retail settings:

  • In-store business rules that permit staff to enter store addresses or phone numbers in place of the real customer info
  • Multiple ‘rewards cards’ per household or family that are not linked together
  • Use of store rewards cards that link purchases to different accounts
  • Customers who use multiple emails, nicknames or addresses without realizing it
  • Web forms that allow incorrectly formatted data elements such as phone numbers or zip codes
  • Customers pushed for time who then skip or cheat on certain data elements
  • Concerns about the security of web transactions that lead customers to leave out certain data or simply lie to protect their personal information

3. Bad Form

Web forms. CRMs. ERP systems. The way they are designed can affect data quality. How? Some CRM systems are inflexible and may not allow data rules to be implemented easily, so required fields end up blank or incomplete. Indeed, many web forms allow any kind of gibberish to be entered into any field, which can immediately contaminate the database. Forms without enough space for the relevant info, and systems and forms that have not been updated to match the business process, also pose a challenge. Many systems also simply do not perform an address check at entry, allowing invalid addresses into the system. When it comes to data quality, good form is everything.
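
Many of these problems can be stopped at the point of entry. A minimal sketch of server-side checks for a contact form, assuming US-style fields; the field names and rules are hypothetical, and format checks like these cannot prove that an address or email actually exists (that needs a proper verification service):

```python
import re
from typing import Dict, List

ZIP_RE = re.compile(r"\d{5}(-\d{4})?")              # 12345 or ZIP+4 (12345-6789)
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # rough shape check only
REQUIRED = ("first_name", "last_name", "email", "zip")  # hypothetical field names


def validate_form(form: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the submission
    passed these (deliberately simple) checks."""
    errors: List[str] = []
    for field in REQUIRED:
        if not form.get(field, "").strip():
            errors.append(f"{field} is required")
    email = form.get("email", "").strip()
    if email and not EMAIL_RE.fullmatch(email):
        errors.append("email is not in a valid format")
    zip_code = form.get("zip", "").strip()
    if zip_code and not ZIP_RE.fullmatch(zip_code):
        errors.append("zip must be 5 digits or ZIP+4")
    # Phone: keep digits only, then accept 10 digits, or 11 with a leading 1.
    digits = re.sub(r"\D", "", form.get("phone", ""))
    if digits and not (len(digits) == 10 or (len(digits) == 11 and digits.startswith("1"))):
        errors.append("phone must have 10 digits (11 with a leading 1)")
    return errors
```

Note that a first name of “asdf” still passes: format rules block malformed data, but genuine gibberish needs reference data or human review to catch.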

4. Customization Simply Reroutes Bad Data

All businesses have processes and data items unique to that business or industry sector. Unfortunately, when systems do not provide genuine flexibility and extensibility, IT will customize the system as necessary. For example, a CRM system may be adjusted to allow a full range of user-defined data (e.g., to allow a software company to store multiple licence details for each customer). Where this happens, the hacks and workarounds can undermine data integrity in the system: for example, you may end up storing data in fields designed for other data types, such as dates in character fields.
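
When a workaround stores dates in a free-text field, a periodic audit can at least find the values that no longer parse. A minimal sketch, assuming the field was meant to hold ISO 8601 dates (YYYY-MM-DD); the `licence_expiry` field name is hypothetical:

```python
from datetime import date
from typing import Dict, List


def bad_dates(records: List[Dict[str, str]], field: str = "licence_expiry") -> List[str]:
    """Return the raw values in `field` that are not valid ISO dates.

    `licence_expiry` is a hypothetical default for a date held in a text field."""
    problems: List[str] = []
    for rec in records:
        raw = rec.get(field, "").strip()
        if not raw:
            continue  # blanks are a completeness issue, reported separately
        try:
            date.fromisoformat(raw)
        except ValueError:
            problems.append(raw)
    return problems
```

Values such as “30/06/2025” (a different convention), “2024-02-30” (an impossible date) or “TBC” (free text) are all flagged, because a character field accepts any of them.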

5. Data Erosion is Beyond Your Control

Businesses and consumers move. People get married and change their names. Business names change too, and contacts get promoted or replaced. Email addresses and phone numbers are constantly changing. People die. No matter how sophisticated your systems are, some measure of data erosion is simply unavoidable. While good business rules will help you update data at relevant checkpoints, to maintain the best-quality data it’s important to update it from reliable data sources on a regular basis.
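
One way to act on this is to record when each record was last verified and queue older ones for re-checking against a reliable source. A minimal sketch; the `last_verified` field and the 180-day threshold are hypothetical choices:

```python
from datetime import date, timedelta
from typing import Dict, List


def due_for_reverification(records: List[Dict], today: date, max_age_days: int = 180) -> List[Dict]:
    """Return records never verified, or last verified before the cutoff.

    `last_verified` (a datetime.date) and the 180-day default are hypothetical."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        r for r in records
        if r.get("last_verified") is None or r["last_verified"] < cutoff
    ]
```

Records with no verification date are treated as due, on the basis that an unknown age is at least as risky as an old one.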

6. New Data. Bad Data. Duplicate Data.

Many businesses regularly source new prospect lists that are subsequently loaded into the CRM. These can come from a variety of places, including list vendors, trade shows, publications, outbound marketing campaigns and even internal customer communications and surveys. Although it’s exciting to consider procuring a new, large database of prospects, there are two ways this addition of data can go horribly wrong. First, the data itself is always suspect, falling prey to all the potential issues of data entry, data erosion and customer error. But even if you can corroborate or cleanse the data before loading it, there is still a chance you will be adding duplicate records that won’t always be quickly identified.
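
Before a new list is loaded, each incoming record can be checked against the existing CRM and against the rest of the batch. A minimal sketch using a normalized exact key (lower-cased email if present, otherwise name plus postcode); the key choice is a hypothetical simplification, and fuzzy variants of a name would still slip through:

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def match_key(rec: Record) -> str:
    """Hypothetical normalized key: email if present, otherwise name + postcode."""
    email = rec.get("email", "").strip().lower()
    if email:
        return "email:" + email
    name = " ".join(rec.get("name", "").lower().split())  # collapse extra spaces
    postcode = rec.get("postcode", "").replace(" ", "").upper()
    return f"name:{name}|{postcode}"


def split_new_list(existing: List[Record], incoming: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (records safe to load, records that look like duplicates)."""
    seen = {match_key(r) for r in existing}
    to_load: List[Record] = []
    duplicates: List[Record] = []
    for rec in incoming:
        key = match_key(rec)
        if key in seen:
            duplicates.append(rec)
        else:
            seen.add(key)  # also catches duplicates within the new list itself
            to_load.append(rec)
    return to_load, duplicates
```

Adding each accepted key to `seen` means a prospect who appears twice on the same trade-show list is caught as well as one already in the CRM.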

7. Overconfidence

OK. So this may not be a true ‘source’ of bad data, but it is the most important precipitating factor. You may think that by implementing business rules or by using a CRM’s built-in duplicate detection tools, you are covered. In practice, business rules are important and valuable but are never foolproof, and they require constant enforcement, evaluation and updates. Moreover, built-in data quality features are typically limited in scope, often able to detect only exact matches. They are simply not powerful enough to do the heavy lifting of a more sophisticated fuzzy and phonetic matching engine that will catch the subtle errors that can lead to major data quality issues. This false sense of confidence means you can easily overlook sources of poor data and neglect to perform critical data quality checks.
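
To illustrate what fuzzy and phonetic matching add, here is a minimal sketch that accepts two surnames only when they share an American Soundex code and are also close in Levenshtein (edit) distance. It is a toy written for this post, not a description of any product’s matching engine, and the 0.75 similarity threshold is an arbitrary illustrative choice:

```python
# American Soundex digit for each consonant; vowels, Y, H and W have no digit.
SOUNDEX_CODES = {ch: digit
                 for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                        ("L", "4"), ("MN", "5"), ("R", "6"))
                 for ch in letters}


def soundex(name: str) -> str:
    """Four-character American Soundex code, e.g. 'Smith' -> 'S530'."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants with the same code
        code = SOUNDEX_CODES.get(ch, "")  # vowels and Y reset `prev`
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def surnames_match(a: str, b: str, min_similarity: float = 0.75) -> bool:
    """Same Soundex code AND normalized edit similarity above the threshold."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return True
    similarity = 1 - levenshtein(a, b) / max(len(a), len(b))
    return soundex(a) == soundex(b) and similarity >= min_similarity
```

An exact comparison says “Smith” and “Smyth” differ, whereas `surnames_match("Smith", "Smyth")` returns `True`. Soundex on its own would be too loose (“Snead” also codes to S530), which is why the sketch additionally requires the spellings to be close.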

So if you keep these seven bad data sources in mind, are you home free? Unfortunately not. These are simply the building blocks of bad data. When even a few of these conditions occur simultaneously, the risk of bad data multiplies. The only true way to achieve the five-pronged data quality ideal outlined by Kimball (accuracy, completeness, consistency, timeliness and uniqueness) is through a comprehensive data quality firewall that addresses each of these components individually.

Stay tuned for more information on Best Practices in data quality that pinpoint specific business rules and software solutions to achieve true real-time data quality.