New Year’s Resolutions

2016 New Year’s Resolutions – The Stats

The New Year has arrived, and with January 1st comes the obligatory New Year’s Resolutions.  Almost 50 percent of adults consistently make these resolutions, according to the Statistics Brain Research Institute.  That is surprising, considering that statistically we are more likely to fail than to succeed.  Yet human perseverance prevails each year as people vow to change their lives in every way, from getting into shape to falling in love.

Because we love data, we were especially interested in these numbers, as well as those found in the new study released by the University of Scranton stating that only 1 in 8 people achieve their New Year’s Resolutions.  It sounds much worse than it is.  Consider some other odds:

  • Odds of being audited by the IRS:  1 in 175
  • Odds of finding a pearl in an oyster:  1 in 12,000
  • Odds of dating a supermodel:  1 in 88,000
  • Odds of becoming a billionaire:  1 in 7,000,000
  • Odds of winning $1000 in the McDonald’s Monopoly game:  1 in 36,950,005 (that’s a lot of Big Macs!)

So all odds considered, keeping your New Year’s Resolutions seems very doable.  At least we think so.  And the study goes on to say that people who make resolutions are 10 times more likely to change their lives than those who don’t.  So whether you succeed or fail, trying is half the battle.

To help you kick off the New Year, helpIT systems made a few resolutions of our own to inspire our colleagues and ourselves.  We want to prioritize database quality management in 2016 for a more profitable and productive New Year.  Statistically, 1 in 8 of you will take our challenge.  Will you be that one?

New Year’s Resolution #1:  Getting Organized

Getting Organized is the second most popular New Year’s Resolution, right after shaping back up to pre-holiday jeans size.  The average office employee spends 1.5 hours per day (6 hours a week) looking for things!  According to the authors of Book of Odds, From Lightning Strikes to Love at First Sight, men at home are constantly looking for clean socks, the remote control, the wedding album, car keys (guilty!), and their driver’s license.  Women are always on the hunt for their favorite shoes, a child’s toy, a wallet, lipstick, and the remote control.

Let contact data records be the one thing you are not looking for this year.  If we spend that many hours looking for the remote control, imagine how many hours of productivity are lost each year by employees sifting through CRMs for contact data.  “Dirty data” not only sucks hours of productivity out of your day, it will also affect the success of marketing efforts, the sales process, and the bottom line.

Getting organized in your contact database is the first step for a #CleanData2016.  We know database management can be a daunting task, as so many New Year’s Resolutions are, but we at helpIT systems have 25 years’ experience and are here to help.

This month helpIT systems is offering a FREE analysis of your company’s contact database by one of our data quality experts.  The analysis will review the effectiveness of current data quality initiatives, pinpoint weaknesses, and run a free data deduplication and matching test on your own data.

Don’t miss out on your chance to kick the New Year off right: click here to claim your free analysis.

New Year’s Resolution #2:  Saving Money

Who wouldn’t like to have a few extra dollars or pounds in their pocket this year?  Saving money is another of the most popular New Year’s Resolutions.  People go about this in all sorts of ways, from cutting back on designer handbag purchases to taking public transit instead of driving.  But here are a few ways you might be wasting money without even realizing it:

  1. Small fees that add up:  Credit card interest, paying for speedy shipping, and ATM fees all add up over time.  So while it might seem worth it to have that Amazon purchase overnighted or to hit the ATM at a concert rather than skip out on purchasing a Dave Matthews Band t-shirt, remember that a few years ago ATM fees totaled $7 billion.  To put that in perspective, the average ATM fee is $3.  Let’s say you use an ATM twice a month…over a year that’s roughly $72, or THREE Dave Matthews t-shirts from Amazon…maybe four if you don’t pay for the expedited shipping.
  2. Bad habits:  Nearly half of Americans consume soft drinks daily, and their fast food consumption totaled $117 billion last year.  That’s almost $400 per person!  Throw in the cost of alcohol, cigarettes, and that daily $5 mocha cappuccino grande at your local Starbucks, and these habits add up quickly.
  3. Good habits that you’re not actually doing:  We all have the best of intentions when joining that gym, signing up for Spanish lessons, or purchasing the Daily Deal for unlimited monthly meditation sessions, but how often have you actually used them?  Studies show that gyms sell memberships expecting only 18 percent of members to use their facilities regularly.  Take a look at what you are paying for and ask, “Am I really using this?”
  4. Gambling:  With the American Powerball topping out at a record $1.3 billion this week, spending some of your hard-earned cash on a lottery ticket might seem like a worthwhile investment.  The sight of all those zeros makes normally sane people forget they have a better chance of being struck by lightning, becoming President of the United States, or being attacked by a shark.  You probably have a better chance of all three happening at the same time than of hitting those lucky numbers.
  5. Waste:  A mind-boggling 33 percent of the world’s food is thrown away each year.  The math works out to about $529 per person.  That’s a nice start towards a down payment on a car or a beach vacation.  Most households could also cut energy costs by a third if they followed recommended guidelines.

So as you are looking to save money this year, consider all the places your money is going, rather than just the obvious few.  Here in the data quality world, we see this happen all the time.  Companies know exactly how much money they are losing due to employee turnover or loss of market share.  However, when it comes to how much they are losing due to poor data quality, most are in the dark.  Studies suggest that companies are losing billions of dollars each year to poor data quality.  Don’t hide behind 2015’s denials, whether it’s how much that cup of coffee is really costing you or the effect of dirty data on your organization.  It’s time for a #CleanData2016.

New Year’s Resolution #3:  Be Healthy

This is a big one.  Whether it is to get fit, join a gym, meditate, or eat better, many people focus their New Year’s resolutions on improving their health.  One healthy habit can unintentionally permeate into other behaviors, often changing many aspects of a person’s life for the better.

The tricky part is making these goals stick.  Studies have shown that on average, a person needs to maintain a behavior for 66 days before it becomes a habit.  Some behaviors are harder to change than others, meaning that the 66 day rule is just a guideline rather than an absolute.

Change is hard.  Anyone who has been on a diet or quit smoking knows this.  But the great part, the biggest relief, is that it gets easier.  In his book The One Thing, Gary Keller states that, “Success is actually a short race—a sprint fueled by discipline just long enough for habit to kick in and take over.”  In other words, we don’t have to be this disciplined forever; we just have to keep it up long enough for it to become a habit.  Maybe that is 66 days.  Maybe it is 246 days.  But once it is a habit, the effort needed to keep eating veggies or meditating daily will decrease substantially.  After all, how much thought do you put into brushing your teeth in the morning or buckling your seatbelt?  Habits are sometimes carried out without us even realizing it!

Here are a few tips to help keep you building habits in 2016:

Track yourself.  Imagine a bowl of M&M’s was on your desk right now.  It is mid-afternoon, the sun is streaming in your window, and the to-do list does not seem to be getting any shorter as the minutes slowly tick towards 5:00.  Would you reward your hard efforts so far with one M&M?  Two?  Perhaps a handful.  After all, they’re small.

Now imagine that for every M&M you ate, you had to pull out a little journal and write “1 M&M – 25 Calories”.   Would you still eat a handful?  Probably not.  For whatever behavior you are trying to eliminate or add to your life, write it down.  Every minute, every calorie, every dollar spent.  Darren Hardy advocates for this method in his book The Compound Effect.  As the name implies, these little actions add up big over time.

Mix it up.  Everyone gets into a rut.  Dr. Frank Farley, a professor of psychological studies in education at Philadelphia’s Temple University, told the Wall Street Journal that making the same resolutions year after year can lead to boredom and, ultimately, failure. Want to lose 20 pounds?  Try pledging to walk 3 miles every day instead.  Focusing on adding a healthy behavior rather than on the end result can help you feel a sense of daily accomplishment.  Each day of completing your walking resolution will bring you closer to your underlying goal.

Let others help.  No one accomplishes anything alone.  The world’s most successful people had advisors, mentors, and colleagues in their corner that made their achievements possible.  God had Moses.  Barnum had Bailey.  Let others in on what you are trying to accomplish.  Even better, find someone who has the same goals as you so you can encourage each other.

New Year’s Resolution #4:  Stop Procrastinating

In the madness that ensues during the holidays, the calm of January often leaves many people confused.  Where did all this time come from?  And more importantly, what in the world do we do with it?  Several of you are already dreaming of a Star Wars movie marathon or the chance to conquer the next level of Angry Birds.

Yet many people are shrugging off those comfortable time-killers and resolving to make 2016 a productive year both personally and professionally.  This could mean finally training for that 10K run, spending more time with family and friends, or even chasing a passion like watercolors or writing a great novel.

Companies are stopping the procrastination of 2015 and seeking a more effective data quality plan for the New Year.  While cleaning up millions of contact data records and stopping the influx of bad data can seem like a daunting task, it is one situation that will not improve by delaying the process.  For every year companies procrastinate, bad records pile up in CRMs, and the effects are staggering.  Departments from marketing to customer service are seeing money and time wasted due to poor data quality.

How do companies accomplish a task of this magnitude?  The same way you eat an elephant…one bite at a time.  Let us help by knocking out a few of those last year excuses:

  • I don’t have the time for a project like that.  Do you have 20 minutes?  Yes?  Twenty minutes will get you started on a free data quality analysis with one of our database experts.  Do you have 20 minutes tomorrow?  If you spend 20 minutes of each working day on data quality, by the end of 2016 you will have put in about 86 hours.  That’s almost FOUR full 24-hour days!  A lot can be done in 86 hours.
  • I don’t know where to start.  Start with those 20 minutes on the phone with one of our data quality experts.  They will talk about your data, your company’s goals, and solutions tailored for you.  While many companies sell a one-size-fits-all solution, there is no “one-size” company.  Let our knowledgeable staff build a solution that is best for your company individually.
  • We tried that last year and the problem just came back.  Data quality is a habit to be maintained, not a one-time accomplishment.  Just like eating jelly doughnuts will undo last year’s workout goals, dirty data will creep up on you if the correct systems are not in place.  helpIT systems offers our clients complete data solutions with long-term results rather than a few quick fixes.
  • I don’t have the money.  Sure you do.  Except you are throwing it away in wasted marketing spend and lost productivity each year.  We work with hundreds of companies that originally thought “we don’t have the money” who have since discovered that not only do they have the money for a data quality solution, they have much more.  The profits realized from clean contact databases enabled them to accomplish many other projects that had been on the back burner as well.

Don’t delay.  This is our last week of offering FREE Data Quality Analysis.  Request yours here.

 

 

The 12 Days of Shopping

According to IBM’s real-time reporting unit, Black Friday online sales were up close to 20% this year over the same period in 2012.  As for Cyber Monday, sales increased 30.3% in 2012 compared to the previous year and are expected to grow another 15% in 2013. Mobile transactions are at an all-time high, and combined with in-store sales, the National Retail Federation expects retail sales to pass the $600 billion mark during the last two months of the year alone. While that might sound like music to a retailer’s ears, as the holiday shopping season goes into full swing on this Cyber Monday, the pressure to handle the astronomical influx of data collected at dozens of possible transaction points is mounting. From websites and storefronts to kiosks and catalogues, every scarf or video game purchased this season brings with it a variety of data points that must be appropriately stored, linked, referenced and, hopefully, leveraged. Add to that a blinding amount of big data now being collected (such as social media activity or mobile tracking), and it all amounts to a holiday nightmare for the IT and data analysis teams. So how much data are we talking about, and how does it actually manifest itself? In the spirit of keeping things light, we offer you The 12 Days of Shopping…

On the first day of shopping my data gave to me,
1 million duplicate names.

On the second day of shopping my data gave to me,
2 million transactions, and
1 million duplicate names.

On the third day of shopping my data gave to me,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the fourth day of shopping my data gave to me,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the fifth day of shopping my data gave to me,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the sixth day of shopping my data gave to me,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the seventh day of shopping my data gave to me,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the eighth day of shopping my data gave to me,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the ninth day of shopping my data gave to me,
90,000 emails,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the tenth day of shopping my data gave to me,
10,000 tweets,
90,000 emails,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the eleventh day of shopping my data gave to me,
11 new campaigns,
10,000 tweets,
90,000 emails,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the twelfth day of shopping my data gave to me,
12 fraud alerts,
11 new campaigns,
10,000 tweets,
90,000 emails,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

While we joke about the enormity of it all, if you are a retailer stumbling under the weight of all this data, there is hope. Over the next few weeks we’ll dive a bit deeper into these figures to showcase how you can get control of the incoming data and, most importantly, leverage it in a meaningful way.

Sources:
http://techcrunch.com/2013/11/29/black-friday-online-sales-up-7-percent-mobile-is-37-percent-of-all-traffic-and-21-5-percent-of-all-purchases/

http://www.pfsweb.com/blog/cyber-monday-2012-the-results/

http://www.foxnews.com/us/2013/11/29/retailers-usher-in-holiday-shopping-season-as-black-friday-morphs-into/

6 Reasons Companies Ignore Data Quality Issues

When lean businesses encounter data quality issues, managers may be tempted to leverage existing CRM platforms or similar tools to try to meet the perceived data cleansing needs. They might also default to reinforcing some existing business processes and educating users in support of good data. While these approaches might be a piece of the data quality puzzle, it would be naive to think that they will resolve the problem. In fact, ignoring the problem while trying some half-hearted approaches can actually amplify the issue you’ll eventually have to deal with later. So why do businesses do it? Here are some of the reasons we have heard for why they have stuck their heads in the proverbial data quality sand:

1. “We don’t need it. We just need to reinforce the business rules.”

Even in companies that run the tightest of ships, reinforcing business rules and standards won’t prevent all your problems. First, not all data quality errors are attributable to lazy or untrained employees: consider nicknames, multiple legitimate addresses and variations on foreign spellings, to mention just a few. Plus, while getting your process and team in line is always a good habit, it still leaves the challenge of cleaning up what you’ve got.

2. “We already have it. We just need to use it.”

Stakeholders often mistakenly think that data quality tools are inherent in existing applications or are a modular function that can be added on. Managers with sophisticated CRM or ERP tools in place may find it particularly hard to believe that their expensive investment doesn’t account for data quality. While customizing or extending existing ERP applications may take you part of the way, we are constantly talking to companies that have used up valuable time, funds and resources trying to squeeze a sufficient data quality solution out of one of their other software tools and it rarely goes well.

3. “We have no resources.”

When human, IT and financial resources are maxed out, the thought of adding a major initiative such as data quality can seem foolhardy. Even defining business requirements is challenging unless a knowledgeable data steward is on board. With no clear approach, some businesses tread water instead of mounting a formal assault. It’s important to keep in mind, though, that procrastinating on a data quality issue can cost more resources in the long run, because the time it takes staff to navigate data with inherent problems can take a serious toll on efficiency.

4. “Nobody cares about data quality.”

Unfortunately, when it comes to data quality, there is often only one lone voice on the team, advocating for something that no one else really seems to care about. The key is to find the people who get it. They are there; the problem is they are rarely asked. They are usually in the trenches, trying to work with the data or struggling to keep up with the maintenance. They are not empowered to change any systems to resolve the data quality issues and may not even realize the extent of the problem, but they definitely care, because it impacts their ability to do their job.

5. “It’s in the queue.”

Businesses may recognize the importance of data quality but just can’t think about it until after some other major implementation, such as a data migration, integration or warehousing project. It’s hard to know where data quality fits into the equation and when and how that tool should be implemented, but it’s a safe bet that the time for data quality is before records move to a new environment. Put another way: garbage in = garbage out. Unfortunately for these companies, the unfamiliarity of a new system or process compounds the challenge of cleansing data errors that have migrated from the old system.

6. “I can’t justify the cost.”

One of the biggest challenges we hear about in our industry is the struggle to justify a data quality initiative with an ROI that is difficult to quantify. However, just because you can’t capture the cost of bad data in a single number doesn’t mean that it’s not affecting your bottom line. If you are faced with the dilemma of ‘justifying’ a major purchase but can’t find the figures to back it up, try to justify doing nothing. It may be easier to argue against sticking your head in the sand than to fight ‘for’ the solution you know you need.

Is your company currently sticking its head in the sand when it comes to data quality? What other reasons have you heard?

Remember, bad data triumphs when good managers do nothing.

Click & Collect – How To Do It Successfully?

In the UK this Christmas, the most successful retailers have been those that sell online but allow collection by the shopper – in fact, these companies have represented a large proportion of the retailers that had a good festive season. One innovation has been the rise of online retailers paying convenience stores to take delivery and provide a convenient collection point for the shopper, but two of the country’s biggest retailers, John Lewis and Next, reckon that click and collect has been the key to their Christmas sales figures – and of course they both have high volume e-commerce sites as well as many bricks and mortar stores.

The article here by the Daily Telegraph explains why “click and collect” is proving so popular, especially in a holiday period. The opportunities for major retailers are obvious, especially as they search for ways to respond to the Amazon threat – but how do they encourage their customers to shop online and also promote in-store shopping? The key is successful data-driven marketing: know your customer, incentivize them to use loyalty programs and target them with relevant offers. However, this also presents a big challenge – the disparity and inconsistency in the data that the customer provides when they shop in these different places.

In store, they may not provide any information, or they may provide name and phone number, or they may use a credit card and/or their loyalty card. Online they’ll provide name, email address and (if the item is being delivered), credit card details and their address. If they are collecting in store, they may just provide name and email address and pay on collection – and hopefully they’ll enter their loyalty card number, if they have one. To complicate matters further, people typically have multiple phone numbers (home, office, mobile), multiple addresses (home and office, especially if they have items delivered to their office) and even multiple email addresses. This can be a nightmare for the marketing and IT departments in successfully matching this disparate customer data in order to establish a Single Customer View. To do this, they need software that can fulfill multiple sophisticated requirements, including:

  • Effective matching of customer records without being thrown off by data that is different or missing.
  • Sophisticated fuzzy matching to allow for keying mistakes and inconsistencies between data input by sales representatives in store and in call centers, and customers online.
  • The ability to recognize data that should be ignored – for example, the in-store purchase records where the rep keyed in the address of the store because the system demanded an address and they didn’t have time to ask for the customer’s address, or the customer didn’t want to provide it.
  • Address verification using postal address files to ensure that when the customer does request delivery, the delivery address is valid – and even when they don’t request delivery, to assist the matching process by standardizing the address.
  • The ability to match records (i) in real-time, in store or on the website, (ii) off-line, record by record as orders are fed through for fulfillment, and (iii) as a batch process, typically overnight as data from branches is fed through. The important point to note here is that the retailer needs to be able to use the same matching engine in all three matching modes, to ensure that inconsistencies in matching results don’t compromise the effectiveness of the processing.
  • Effective grading of matches so that batch and off-line matching can be fully automated without missing lots of good matches or mismatching records. With effective grading of matching records, the business can choose to flag matches that aren’t good enough for automatic processing so they can be reviewed by users later.
  • Recognition of garbage data, particularly data collected from the web site, to avoid it entering the marketing database and compromising its effectiveness.
  • Often, multiple systems are used to handle the different types of purchase and fulfillment. The software must be able to connect to multiple databases storing customer data in different formats for the different systems.

With a wide range of data quality solutions on the market, it’s often difficult to find a company that can check all of these boxes. That’s where helpIT systems comes in. If you are a multi-channel retailer currently facing these challenges, contact helpIT systems for a Free Data Analysis and an in depth look at how you can achieve a Single Customer View.

When Charitable Donations Fall – Who’s to Blame?

I was listening to a program on BBC Radio 4 yesterday morning (You and Yours) about the difficulties that charities are facing in these straitened times:

“Christmas is the season for giving and is often the big year-end push for many charities. But according to a report compiled by the Charities Aid Foundation and the National Council for Voluntary Organisations charitable donations have fallen by 20% in real terms in the past year, with £1.7bn less being given.”

There was a lot of interesting feedback from the expert contributors:

  • Sarah Miller head of public affairs at the Charities Commission commented that “The top complaint made to the FRSB (the Fundraising Standards Board) … is to do with the use of data, where people are perhaps being sent mailings that they don’t wish to receive or perhaps incorrect information is being used on mailings or they want to know where the data has come from or perhaps a mailing is going to a deceased family member and they’ve asked for it to stop and perhaps the charity still hasn’t made that change – so that’s the top complaint by far”.
  • John Low, Chief Executive of the Charities Aid Foundation, stressed that “You must be as efficient in the way you run a charity as any business, and maybe more efficient, because it’s precious public money that you have and you have very serious responsibilities to your beneficiaries.”

It is certainly true that as a charity or any form of non-profit organization, you have far less margin for error when mailing your donors than a commercial organization does. If I get duplicate mail from a retailer that I shop at, or incorrectly addressed mail that obviously hasn’t been able to obtain postal discounts even if it was actually delivered, it might make me wonder whether their prices have to be inflated to allow for such inefficiencies – but I’ll still do the price comparison when next shopping. When I get duplicate or incorrectly addressed mail from a charity that I give to, I get upset that they’re wasting my donation. Even more so given that I know there are money-saving solutions (ranging from desktop software to services and hybrid solutions) for ensuring that mail is not duplicated and is correctly addressed. Moreover, many mailers upset next of kin by mailing to the deceased, or simply waste large amounts of money by mailing to people who have moved.

Based on the feedback received by the FRSB, some charities have a pressing need to implement effective solutions for eliminating wastage in their direct mail:

  • Gone Away suppression will more than pay for itself by reducing print and post costs.
  • NCOA (National Change of Address) and other services will allow charities to mail donors at their new address.
  • Deceased and duplicate suppression will avoid the damage to the donor relationship that otherwise will inevitably occur.

Sarah Miller also told listeners:

“If there are ways that charities are interacting with you that you don’t like, do tell them. Tell them how you want to interact with them.”

I remember about 15 years ago, one of our customers working for Botton Village (a centre for adults with learning disabilities and other special needs in North Yorkshire in the UK) won a direct marketing award simply because they asked their donors how often and when they would like to be contacted and at what time(s) of year. This led to a significant increase in donations. These days of course, it is far less expensive to contact people by email, but some donors may prefer at least some communication by mail, or not want email contact. Consolidating and matching donor information when they may donate via the web or by post is obviously important – for example, so you can make sure that you claim Gift Aid for relevant donors, or avoid sending a scheduled communication if they’ve just donated.

Chris Mould, Executive Chairman of the Trussell Trust, the charity behind the UK Foodbank Network talked about how a front line food bank in the Network can get a web site at minimal cost with online data collection: “It doesn’t have to reinvent the wheel”. This chimed with John Low’s recommendation that charities can become more efficient by cooperating on their resource requirements.

One last and very important point: all the experts on the program agreed that fundraising campaigns really work – regular communication with your donors is important to show where the money is going, but efficiency is even more important.

 

If you are a charity, struggling to get hold of your data quality challenges OR if you’ve noticed a major drop in donations and want to know if data quality is the cause, email us for a Free Data Quality Audit and we’ll highlight the issues that could be putting your initiatives at risk.

Data Quality Makes the Best Neighbor

So this week’s #dataqualityblunder is brought to you by the insurance industry and demonstrates that data quality issues can manifest themselves in a variety of ways and have unexpected impacts on the business entity.

Case in point – State Farm. Big company. Tons of agents. Working hard at a new, bold advertising campaign. It’s kind of common knowledge that they have regional agents (you see the billboards throughout the NY Tri-State area) and it’s common to get repeated promotional materials from your regional agent.

But, what happens when agents start competing for the same territory? That appears to be the situation for a recent set of mailings I received. On the same day, I got the same letter from two different agents in neighboring regions.

Same offer. Same address. So, who do I call? And how long will it take for me to get annoyed by getting two sets of the same marketing material? Although it may be obvious, there are a few impacts from this kind of blunder:

  • First of all – wasted dollars. Not sure who foots the bill here – State Farm or the agents themselves – but either way, someone is spending more money than they need to.
  • Brand equity suffers. When one local agent promotes themselves to me, I get a warm fuzzy feeling that he is somehow reaching out to his ‘neighbor’. He lives in this community and will understand my concerns and needs. This is his livelihood and it matters to him. But when I get the same exact mailing from two agents in different offices, I realize there is a machine behind this initiative. The warm feelings are gone, and the brand State Farm has worked so hard to develop loses its luster.
  • Painful inefficiency.  I am just one person who got stuck on two mailing lists. How many more are there? And how much more successful would each agent be if they focused their time, money and energy on a unique territory instead of overlapping ones?

There are lots of lessons in this one, and there are a variety of possible reasons for this kind of blunder.  A quick call to one of the agents revealed that most of the lists come from the parent organization, though some agents do supplement with additional lists; either way, they assured me this kind of overlap was not expected or planned. That means there is a step (or tool) missing from the process. It could require a change in business rules for agent marketing. It’s possible they have the rules in place but they require greater enforcement. It could just be a matter of implementing the right deduplication tools across their multiple data sources. There are plenty of ways to insure against this kind of #dataqualityblunder once the issue is highlighted and data quality becomes a priority.

Keep your SQL Server data clean – efficiently!

Working with very large datasets (for example, when identifying duplicate records using matching software) can frequently throw up performance problems if you are running queries that return large volumes of data. However, there are some tips and tricks that you can use to ensure your SQL code works as efficiently as possible.

In this blog post, I’m going to focus on just a few of these – there are many other useful methods, so feel free to comment on this blog and suggest additional techniques that you have seen deliver benefit.

Physical Ordering of Data and Indices

Indices and the actual physical order of your database can be very important. Suppose, for example, that you are using matching software to run a one-off internal dedupe, looking to compare all records with several different match keys.  Let’s assume that one of those keys is zip or postal code and that it’s significantly slower than the other keys.

If you put your data into the physical postal code/zip order, then your matching process may run significantly faster since the underlying disk I/O will be much more efficient as the disk head won’t be jumping around (assuming that you’re not using solid state drives).  If you are also verifying the address data using post office address files, then again having it pre-ordered by postal code/zip will be a big benefit.

So how would you put your data into postcode order ready for processing?

There are a couple of options:

  • Create a clustered index on the postcode/zip field – this will cause the data to be stored in postcode/zip order.
  • If the table is in use and already has a clustered index, then the first option won’t be possible. However, you may still see improved overall performance if you run a "select into" query pulling out the fields required for matching and ordering the results by postal code/zip. Select this data into a working table and then use that for the matching process, having added any other additional non-clustered indices needed (a sketch of both options follows below).
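As a rough sketch of both options (the table and column names here are purely illustrative, not from any particular schema):

    -- Option 1: no clustered index exists yet, so cluster the table on the
    -- postcode/zip field and the rows will be stored in postcode order.
    CREATE CLUSTERED INDEX IX_Contacts_Postcode
        ON dbo.Contacts (Postcode);

    -- Option 2: the table already has a clustered index, so copy just the
    -- fields needed for matching into a working table...
    SELECT ContactID, LastName, FirstName, Address1, Postcode
    INTO   dbo.Contacts_MatchWork
    FROM   dbo.Contacts;

    -- ...and cluster the working table on postcode/zip to guarantee the
    -- physical ordering, adding any other non-clustered indices needed.
    CREATE CLUSTERED INDEX IX_MatchWork_Postcode
        ON dbo.Contacts_MatchWork (Postcode);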

Avoid SELECT *

Only select the fields you need.  SELECT * is potentially very inefficient when working with large databases (due to the large amount of memory needed). If you only need to select a couple of fields (where those fields are in a certain range), and those fields are indexed, then selecting only those fields allows the index to be scanned and the data returned directly.  If you use SELECT *, then the DBMS has to join the index with the main data table – which is going to be a lot slower with a large dataset.
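For example (the table and column names are again illustrative), compare:

    -- Wasteful: every column of every matching row is read and returned.
    SELECT *
    FROM   dbo.Contacts
    WHERE  Postcode LIKE 'SW1%';

    -- Better: only the columns the matching process needs. If a non-clustered
    -- index on Postcode includes LastName, this can be answered from the index
    -- alone without touching the base table.
    SELECT ContactID, LastName, Postcode
    FROM   dbo.Contacts
    WHERE  Postcode LIKE 'SW1%';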

Clustered Index and Non-clustered Indices

Generally when working with large tables, you should ensure that your table has a clustered index on the primary key (a clustered index ensures that the data is ordered by the index – in this case the primary key).

For the best performance, clustered indices ought to be rebuilt at regular intervals to minimise disk fragmentation – especially if there are a lot of transactions occurring.  Note that non-clustered indices will also be rebuilt at the same time, so if you have numerous indices it can be time-consuming.
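For instance (again using an illustrative table):

    -- Create the primary key as a clustered index so the data is stored in
    -- primary key order.
    ALTER TABLE dbo.Contacts
        ADD CONSTRAINT PK_Contacts PRIMARY KEY CLUSTERED (ContactID);

    -- Periodically rebuild the indices to remove fragmentation; ALL rebuilds
    -- the clustered index and every non-clustered index on the table.
    ALTER INDEX ALL ON dbo.Contacts REBUILD;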

Use Appropriate Non-clustered Indices to Boost Query Efficiency

Non-clustered indices can assist with the performance of your queries – by way of example, non-clustered indices may benefit the following types of query:

  • Columns that contain a large number of distinct values, such as a combination of last name and first name. If there are very few unique values, most queries will not use the index because a table scan is typically more efficient.
  • Queries that do not return large result sets.
  • Columns frequently involved in the search criteria of a query (the WHERE clause) that return exact matches.
  • Decision-support-system applications for which joins and grouping are frequently required. Create multiple non-clustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.
  • Covering all columns from one table in a given query. This eliminates accessing the table or clustered index altogether.

In terms of the best priority for creating indices, I would recommend the following:

1. fields used in the WHERE clause
2. fields used in table JOINs
3. fields used in the ORDER BY clause
4. fields used in the SELECT list of the query (see the example below).
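As an illustration of several of these points at once, a non-clustered index keyed on the search columns and covering the selected columns (the names here are hypothetical) lets the query below be answered from the index alone:

    CREATE NONCLUSTERED INDEX IX_Contacts_Name
        ON dbo.Contacts (LastName, FirstName)      -- WHERE / JOIN columns
        INCLUDE (Postcode, EmailAddress);          -- columns in the SELECT list

    SELECT LastName, FirstName, Postcode, EmailAddress
    FROM   dbo.Contacts
    WHERE  LastName = 'Smith'
      AND  FirstName = 'John';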

Also make sure that you use the tools within SQL Server to view the query plan for expensive queries, and use that information to refine your indices and boost the efficiency of the plan.
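For example, alongside the graphical execution plan in SQL Server Management Studio, you can gauge the cost of a query with the session settings below (the query itself is just a placeholder):

    -- Report logical/physical reads and CPU/elapsed time for each statement.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT LastName, Postcode
    FROM   dbo.Contacts
    WHERE  Postcode LIKE 'SW1%';

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;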

Avoid Using Views

Views on active databases will generally perform more slowly, so try to avoid them. Also bear in mind that if you create indices on a view and the data in the base tables changes in some way, then the indices on both the base table and the view will need updating – which creates an obvious performance hit.  In general, views are useful in data warehouse type scenarios where the main usage of the data is simply reporting and querying, rather than a lot of database updates.

Make use of Stored Procedures in SQL Server

When queries are wrapped in stored procedures, the code is compiled and cached, which should lead to performance benefits. That said, you need to be aware of parameter sniffing and design your stored procedures in such a way that SQL Server doesn’t cache an inefficient query execution plan.  There are various techniques that can be used (sketched after the list below):

  • Optimising for specific parameter values
  • Recompiling on every execution
  • Copying parameters into local variables
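Here is a hypothetical procedure showing where each technique plugs in (the table, columns and parameter are illustrative):

    CREATE PROCEDURE dbo.GetContactsByPostcodeArea
        @Area varchar(10)
    AS
    BEGIN
        -- 1. Optimise for a specific, representative parameter value:
        SELECT ContactID, LastName, Postcode
        FROM   dbo.Contacts
        WHERE  Postcode LIKE @Area + '%'
        OPTION (OPTIMIZE FOR (@Area = 'SW1'));
        -- 2. Alternatively, append OPTION (RECOMPILE) to the statement above
        --    to force a fresh plan on every execution.

        -- 3. Or copy the parameter into a local variable so the optimiser uses
        --    general statistics rather than the first value it was called with:
        DECLARE @LocalArea varchar(10) = @Area;
        SELECT ContactID, LastName, Postcode
        FROM   dbo.Contacts
        WHERE  Postcode LIKE @LocalArea + '%';
    END;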

For those interested, there’s a more in-depth but easy-to-follow description of these techniques on the following page of SQLServerCentral.com:

http://www.sqlservercentral.com/blogs/practicalsqldba/2012/06/25/sql-server-parameter-sniffing/

Queries to compare two tables or data sources

When using matching software to identify matches between two different data sources, you may encounter scenarios where one of the tables is small relative to another, very large, table, or where both tables are of similar sizes. We have found that some techniques for comparing across the two tables run fine where both tables are not too large (say under ten million records), but do not scale if one or both of the tables are much larger than that.  Our eventual solution gets a little too detailed to describe effectively here, but feel free to contact us for information about how we solved it in our matchIT SQL application.

And Finally

Finally, I’d recommend keeping an eye on the disks housing your SQL Server database files: ensure that there’s at least 30% of the storage space free and that the disks are not highly fragmented; doing this regularly produces better performance.

In summary, by making the effort to optimise the performance of your data cleaning operations, you will reduce the load on your database server, allow regular use of the applications needed to keep your data clean – and, as a result, keep your users happy.

Assessing Your Data Quality Needs

So you have data quality issues. Who doesn’t? Should you embark on a data quality project? Maybe, but what are your objectives? Are there service issues related to poor data quality? Marketing issues? Other major integrations or warehousing projects going on? And once you clean up your data – then what? What will you do with it? What benefit will a clean database bring to your organization? And without clear objectives, how can you even justify another major technology initiative?

Before any data quality project, it is critical to go beyond the immediate issues of duplicate records or bad addresses and understand the fundamental business needs of the organization and how cleaner data will enable you to make better business decisions. This will help you to establish accurate project parameters, keep your project on track and justify the investment to C-level executives. So where do you begin? At the beginning.

Look beyond the pain.
In most cases, a specific concern will be driving the urgency of the initiative but it will be well worth the effort to explore beyond the immediate pain points to other areas where data is essential. Plan to involve a cross-section of the departments including IT, marketing, finance, customer service and operations to understand the global impact that poor data quality could be having on your organization.

Look back, down and forward.
Consider the data quality challenges you’ve had in the past, the ones you face today and the ones that are yet to come. Is a merger on the horizon? Is the company migrating to a new platform? Do you anticipate significant staffing changes? Looking ahead in this way will ensure that the investment you make will have a reasonable shelf-life.

Look at the data you don’t have.
As you review the quality of the data you have, also consider what’s missing and what information would be valuable to customer service reps or the marketing department. It may exist in another data silo somewhere that just needs to be made accessible or it could require new data be collected or appended.

Be the customer.
Call the Customer Service Department and put them through the paces. Sign up for marketing materials online. Place an order on the website. Use different addresses, emails and nicknames. Replicate perfectly reasonable scenarios that happen every day in your industry and see how your infrastructure responds. Take good notes on the places where poor data impacts your experience and then look at the data workflow through fresh eyes.

Draw out the workflow.
Even in small organizations, there is tremendous value in mapping out the path your data takes through your business: where it is entered, used, changed, stored and lost. Doing this will uncover business rules (or the lack of them) that are likely impacting the data, departments with complementary needs, and places in the workflow where improvements can be made (and problems avoided).

Think big and small.
Management and C-Level executives tend to think big. Data analysts and technical staff tend to think granularly and departmental users usually fall somewhere in the middle. Ultimately, the best solution can only be identified if you consider the global, technical and strategic business needs.

The challenges of identifying, evaluating and implementing an effective data quality solution are fairly predictable, but problems almost always begin with incorrect assumptions about, and an incomplete understanding of, the overall needs of the organization. In some cases, the right data quality vendor can help you move through this process, but ultimately, failure to broaden the scope in this way can result in the purchase of a solution that does not meet all the requirements of the business.

Click here to download a comprehensive Business Checklist that will help you to identify the data quality business needs within your organization. Then stay tuned for our next post in our Data Quality Project series.

helpIT Systems is Driving Data Quality

For most of us around the US, the Department of Motor Vehicles is a dreaded place, bringing with it a reputation for long lines, mountains of paperwork and drawn-out processes. As customers, we loathe the trip to the DMV, and though we may not give it much thought while standing in line, the reality is that poor data quality is a common culprit behind some of these DMV woes. While it may seem unlikely that an organization as large and bureaucratic as the DMV can right the ship, today DMVs around the country are fighting back with calculated investments in data quality.

While improving the quality of registered driver data is not a new concept, technology systems implemented 15-20 years ago have long been a barrier for DMVs to actually take corrective action. However, as more DMVs begin to modernize their IT infrastructure, data quality projects are becoming more of a reality. Over the past year, helpIT has begun work with several DMVs to implement solutions designed to cleanse driver data, eliminate duplicate records, update addresses and even improve the quality of incoming data.

From a batch perspective, setting up a solution to cleanse the existing database paves the way for DMVs to implement other types of operational efficiencies like putting the license renewal process online, offering email notification of specific deadlines and reducing the waste associated with having (and trying to work with) bad data.

In addition to cleaning up existing state databases, some DMVs are taking the initiative a step further and working with helpIT to take more proactive measures by incorporating real-time address validation into their systems.  This ‘real-time data quality’ creates a firewall of sorts, facilitating the capture of accurate data by DMV representatives – while you provide it (via phone or at a window). With typedown technology embedded directly within DMV data entry forms, if there is a problem with your address, or you accidentally left out information that affects its accuracy, like your apartment number or a street directional (North vs. South), the representatives are empowered to prompt and request clarification.

Getting your contact data to be accurate from the start means your new license is provided immediately without you having to make another visit, or call and wait on hold for 30 minutes just to resolve the problem that could have been no more than a simple typo.

Having met several DMV employees over the past year, it’s obvious that they want you to have an excellent experience. Better data quality is a great place to start. Even while DMV budgets are slashed year after year, modest investments in data quality software are yielding big results in customer experience.

 

If you want to learn more about improving the quality of your data, contact us at 866.332.7132 for a free demo of our comprehensive suite of data quality products.

Golden Records Need Golden Data: 7 Questions to Ask

If you’ve found yourself reading this blog then you’re no doubt already aware of the importance of maintaining data quality through processes such as data verification, suppression screening, and duplicate detection. In this post I’d like to look a bit closer at how you draw value from, and make the best use of, the results of the hard work you invest into tracking down duplicates within your data.

The great thing about fuzzy matching is that it enables us to identify groups of two or more records that pertain to the same entity but that don’t necessarily contain exactly the same information. Records in a group of fuzzy matches will normally contain similar information with slight variations from one record to the next. For example, one record may contain a full forename whilst another contains just an abbreviated version or even none at all. You will also frequently encounter fuzzy matches where incorrectly spelt or poorly input data is matched against its accurate counterpart.

Once you’ve identified these groups of fuzzy matches, what do you do with them? Ultimately you want to end up with only unique records within your data, but there are a couple of ways that you can go about reaching that goal. One approach is to try and determine the best record in a group of matches and discard all of the records that matched against it. Other times, you may find that you are able to draw more value from your data by taking the most accurate, complete, and relevant information from a group of matched records and merging it together so that you’re left with a single hybrid record containing a superior set of data than was available in any of the individual records from which it was created.

Regardless of the approach you take, you’ll need to establish some rules to use when determining the best record, or the best pieces of information from multiple records. Removing the wrong record or information could actually end up making your data worse, so this decision warrants a bit of thought. The criteria you use for this purpose will vary from one job to the next, but the following is a list of 7 questions that target the desirable attributes you’ll want to consider when deciding what data should be retained:

  1. How current is the data?
    You’ll most likely want to keep data that was most recently acquired.
  2. How complete is the data?
    How many fields are populated, and how well are those fields populated?
  3. Is the data valid?
    Have dates been entered in the required format? Does an email address contain an at sign?
  4. Is the data accurate?
    Has it been verified (e.g. address verified against PAF)?
  5. How reliable is the data?
    Has it come from a trusted source?
  6. Is the data relevant?
    Is the data appropriate for its intended use (e.g. keep female contacts over male if compiling a list of recipients for a women’s clothing catalogue)?
  7. Is there a predetermined hierarchy?
    Do you have a business rule in place that requires one set of data is always used over another?

When you have such a large range of competing criteria to consider, how do you apply all of these rules simultaneously? The approach we at helpIT use in our software is to allow the user to weight each item or collection of data, so they can choose what aspects are the most important in their business context. This isn’t necessarily whether an item is present or not, or how long it is, but could be whether it was an input value or derived from supplied information, or whether it has been verified by reference to an external dataset such as a Postal Address File. Once the master record has been selected, the user may also want to transfer data from records being deleted to the master record e.g. to copy a job title from a duplicate to a master record which contains fuller/better name and address information, but no job title. By creating a composite record, you ensure that no data is lost.
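As a very simplified sketch of this kind of weighted scoring (the table, columns and weights below are hypothetical, not helpIT’s actual implementation), you could score each record in a match group and rank the records to choose the master:

    -- MatchGroupID is assumed to have been assigned by the duplicate detection step.
    WITH Scored AS (
        SELECT  ContactID,
                MatchGroupID,
                LastUpdated,
                (CASE WHEN FirstName IS NOT NULL AND FirstName <> '' THEN 10 ELSE 0 END) +
                (CASE WHEN Email LIKE '%@%'                          THEN 15 ELSE 0 END) +
                (CASE WHEN AddressVerified = 1                       THEN 25 ELSE 0 END) +
                (CASE WHEN LastUpdated > DATEADD(year, -1, GETDATE()) THEN 20 ELSE 0 END)
                    AS QualityScore
        FROM dbo.Contacts
    )
    SELECT ContactID, MatchGroupID, QualityScore,
           ROW_NUMBER() OVER (PARTITION BY MatchGroupID
                              ORDER BY QualityScore DESC, LastUpdated DESC) AS RankInGroup
    FROM Scored;
    -- RankInGroup = 1 identifies the master record in each group; selected data
    -- from the remaining records can then be merged into it before they are
    -- flagged for deletion.

In practice, the weights, and whether a field counts at all, would come from the business rules and criteria discussed above.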

Hopefully this post will have given you something to think about when deciding how to deal with the duplicates you’ve identified in your data. I’d welcome any comments or questions.