Data, Hoverboards and Ashley Madison. 2015 Review

Imagine if Marty McFly had gone back to 1985 and told the world what he saw in the real 2015; they would have locked him up in a loony bin. People today walk around with electronic devices in their pockets smarter than anything the world had back then. And these devices can do almost everything, from hailing a cab or buying 3D printed ornaments from Venezuela to checking Chinese stock markets or watching television.

The irony is that people today call these devices “phones”, even though making calls is one of their less popular functions. Yet the possibilities of phones grow every day, largely due to all the data they collect. Data is being generated and recorded about every person, making their lives easier, less congested and more connected. While we churn out this data by the petabyte every day, we’re realizing that making sense of it is the new challenge – and securely maintaining and storing accurate data is essential.

At helpIT we will have been working with data for 25 years as of 2016, which makes us one of the few young adults in a world of newborn data companies. But before we turn 25, we thought it might be good to look back at 2015, our 24th year, and reflect on how data has become rather newsworthy.

Headlines that worried us…

The most shocking of 2015’s data scandals was the breach of security at AshleyMadison.com, an online marketplace for married folk with a wandering eye. The world watched in fascinated horror as the site’s 37 million members scrambled to save their marriages, careers, and reputations. No one was exempt: the compromised membership list even included members of the United Nations and the Vatican. More worrisome to companies watching this event unfold was the suspicion that the leak came from a disgruntled employee – a good reminder to all of us that while protecting your database from outside threats is a priority, you should never overestimate the loyalty of those on the inside either.

While the original cyber-attack on Sony Pictures Entertainment occurred in 2014, the aftershocks were still arriving in the new year, especially when Wikileaks made the decision to create a searchable data dump of more than 30,000 private documents from the breach. Citing public interest, Wikileaks claimed that Sony’s influence in Washington made the inner workings of the company relevant to the general public. Companies everywhere began beefing up their security.

Some good news…

As the data security attacks kept coming, businesses were forced to reanalyze the priority placed on data security as well as data’s importance to their organizations.  Quality database management not only keeps customers safe, but businesses now recognize that accurate contact data can be used to keep customers happy.  According to an Accenture poll, 89 percent of business leaders believe big data will revolutionize business operations in the same way the Internet did.

These same leaders are planning on pursuing big data projects in 2016 in order to seize a competitive edge.  To this effect, many businesses are recognizing that the first step is improving the quality of their contact databases in order to better serve their customers.  For consumers, this should bring a more personalized and effective e-commerce experience in the New Year.

What we were up to…

With database management taking center stage on the corporate agenda, we at helpIT systems have been hard at work to meet this rising demand.  While we have a wide range of unrivaled data quality software solutions, the year of 2015 began a shift away from individual products towards a more complete data quality solution providing customers with a one-stop shop for data quality, data matching, and data enrichment.  Our staff has been working diligently to ensure that we are ready to provide both our current and our new customers with stellar customer service and technical support that exceeds expectations.

To that end, helpIT systems rolled out a new website designed with the customer experience in mind.  Our goal was to provide more value through instructional tools and industry resources.  While data quality software and solutions are our primary focus, we strive to be a resource to those who visit www.helpit.com looking for answers about all aspects of data quality.

Expectations for 2016…

Perhaps the most pervasive change in the data industry during 2015 was the rapid growth of the “Internet of Things”. While the technology has been around for years, we saw a large jump in the number of devices created to monitor data in our personal lives. Sales of everything from Fitbits, which monitor your health, to Nests, home thermostats that adapt to your behaviors and the seasons, have boomed. We expect to see more of this in 2016.

While this data will save electricity, ideally make us rethink our fitness goals, and improve home security, both companies and consumers are looking ahead to ensure this data is better protected than the troves that came before it.  Because while the concept of data has been around for a while, 90 percent of the world’s data was collected within the last two years.  Companies are learning how important it is to keep this information safe, and that is good news for everyone in 2016.
As all of this data is being collected, the feat of managing it looms before us like perfecting a true hover-board (which can actually just about hover over water).  A company can only hope to gain the competitive edge from data analysis if it possesses secure, accurate, and searchable contact data.  helpIT systems has been a partner to companies in this goal for the past 24 years and we will continue to grow and be there for 24 more.

Happy Holidays and a Prosperous New Year to all!

12 Days of Data Quality

The holidays are finally here.  They always seem so far away and then, as the days grow short and temperatures fall, they tend to jump out at us in a surprise attack like a kid in a spooky costume on Halloween.  And once they are here, if you blink, they are over.  The anticipated smells of gingerbread baking in the oven, the joy of seeing a loved one open a carefully selected present, the glow of thousands of twinkling Christmas lights… all over before we were able to slow down and truly appreciate the holiday season.

So before December disappears under a pile of wrapping paper, we are inviting you to take the time to be merry, revel in the holidays, and perhaps still get a bit of work done.

Welcome to helpIT systems’ 12 Days of Data Quality:

On the first Day of Data Quality, helpIT gave to me:

A Single Customer View (In a Pear Tree)

The first gift in this classic holiday carol is a Partridge in a Pear Tree.  The partridge sits alone high above the rest of the world.  Regally.  Eating pears (I imagine it eating pears) while looking down on all the lesser beings that have to see the world from ground level.

Your organization can be that partridge, sitting high above the rest.  Except in the world of database management, we are seeking a truly accurate Single Customer View, rather than a belly full of pears.  We all want the ability to look down on one contact record and obtain accurate, up-to-date information, each and every time.  Having one complete record for each customer ensures that they will receive the correct marketing materials at the correct address.  Salespeople will know a customer’s complete purchase history to analyze likely future purchases.  Customer service reps will be aware of address changes, name changes, as well as any other personal details in order to make the customer feel like they matter.  Which they do.  A lot.

Each customer in your contact database makes up a limb of your “pear tree”.  In the song, no matter how many gifts of drummers or pipers or ladies milking cows are given, it always comes back to the pear tree.  The tree is the center of everything, holding up even the partridge.  Just as your customers hold up your organization.  Make your customers feel this importance by respecting them as individuals, and as the base of your success, rather than lumping them in with the rest in your database.  The first step in doing this is by having a strong data quality solution and system in place.

On the second Day of Data Quality, helpIT gave to me:

2 Matched Records

On the second day of Christmas, my true love gave to me two turtle doves.  Which was great, in medieval times, when the doves symbolized true love’s endurance, mainly because they mated for life.  Everyone from the Bible to Shakespeare has made mention of them.

This December, give yourself another sort of true match.  Matching contact records in your database is the first step to cleaning up dirty data and obtaining a Single Customer View.  And there is no one-size-fits-all solution.

The important thing to consider when matching and deduping your database is the methodology used in the process.  Some software only matches exact records, so ‘John Smith’ and ‘John Smith’ would show up as a duplicate.  However, ‘John Smith’ and ‘Jon Smith’ would not.  So if you want a truly accurate database, you have to employ a more sophisticated method.

helpIT systems’ matching software compares all the data fields in one contact record against the rest of your database. John Smith’s address, birthday, phone number, email, or whatever other datapoints you use are all taken into consideration when pinpointing matches. This process often picks up 20-80 percent more matches than other software. When you multiply that by 200 million records, that’s a lot of matches.
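To make the idea concrete, here is a minimal sketch of fuzzy, multi-field matching in Python. This is an illustration only, not helpIT’s actual algorithm; the field weights and the similarity measure are assumptions chosen for the example:

```python
from difflib import SequenceMatcher

def field_sim(a: str, b: str) -> float:
    """Similarity between two field values, 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Weighted average similarity across all compared fields."""
    total = sum(weights.values())
    return sum(w * field_sim(rec_a[f], rec_b[f])
               for f, w in weights.items()) / total

a = {"name": "John Smith", "email": "jsmith@example.com", "phone": "555-0100"}
b = {"name": "Jon Smith",  "email": "jsmith@example.com", "phone": "555-0100"}

# An exact comparison of the name field alone misses this pair entirely,
# but the weighted score across all fields flags it as a likely duplicate.
score = match_score(a, b, {"name": 3, "email": 2, "phone": 1})
```

Because the email and phone agree exactly, the overall score lands near 1.0 even though the names differ by a letter.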

The biggest mistake organizations make when matching records is to view it as a “one and done” solution.  Data matching, like any long-term relationship, is something that must be constantly tweaked, adapted, and carried out on a regular basis.  Although as the turtle doves can attest, this type of devotion does come with big rewards.

On the third Day of Data Quality, helpIT gave to me:

3 Frenchmen

Rather than the French Hens in the traditional song, let us meet a Frenchman whose name is Dr. Mathieu Arment. He loves to purchase designer scarves from your company, Parisian Scarves. During his first online purchase, he entered his information as follows:

Matheiu Arment
27 Rue Pasteur
14390 Cabourg
FRANCE

The Parisian Scarves marketing department then sends him a catalogue for the Spring Collection. He flips through it while sitting at a local café sipping a latte and finds a handsome purple plaid scarf that he absolutely must have, but he has forgotten his laptop. So he calls in the order. The Parisian Scarves customer service rep does not see an account under the name she types in, Mattheiu Armond, so she creates a new account record and places his order.

Later, a second customer service rep is handling an issue with his order and decides to send a coupon as a gift for all of his trouble. The coupon is sent to Mathis Amiot, and the rep slightly misspells his address on Rue Pasteur. Upon receiving the misguided coupon, as well as two copies of the same catalogue addressed to slight variations of his name, Dr. Arment realizes that he is not just one Frenchman, but rather 3 separate Frenchmen in the eyes of Parisian Scarves. Feeling annoyed and undervalued because his favorite scarf company cannot even spell his name correctly (not to mention they also forgot his birthday), Dr. Arment takes his scarf shopping to another business that appreciates him as an individual.

Not all data matching software is created equal. While some software compares only exact matches, helpIT systems’ phonetic matching will pull out similar-sounding pieces of data as well as similar spellings. This creates a higher match rate, allowing fewer duplicates to slip through into your database.

In this instance, the Parisian Scarves customer service rep would have typed in Mattheiu Armond, only to have the record Matheiu Arment show up as a possible match. She would have noted the similar addresses and concurred, correctly, that these were the same customer.

Accurate data matching creates a data quality firewall, preventing bad data from entering the system at point-of-entry, as well as filtering it out on a regularly scheduled check-up. So Dr. Arment can stay one Frenchman, and more importantly, he will stay a customer of Parisian Scarves.
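As a sketch of how phonetic matching catches these variants, here is a simplified Soundex-style encoder. helpIT’s actual phonetic algorithm is more sophisticated; this classic scheme merely illustrates the principle:

```python
def soundex(name: str) -> str:
    """Simplified Soundex: letters that sound alike share a digit,
    so similar-sounding names collapse to the same 4-character code."""
    table = {c: str(d) for d, letters in enumerate(
        ["aeiouyhw", "bfpv", "cgjkqsxz", "dt", "l", "mn", "r"])
        for c in letters}
    digits = [table[c] for c in name.lower() if c in table]
    # collapse adjacent duplicate codes, then drop the vowel code '0'
    collapsed = [d for i, d in enumerate(digits) if i == 0 or d != digits[i - 1]]
    code = "".join(d for d in collapsed[1:] if d != "0")
    return (name[0].upper() + code + "000")[:4]

# The "three Frenchmen" collapse back into one:
soundex("Arment"), soundex("Armond")     # both encode as A655
soundex("Matheiu"), soundex("Mattheiu")  # both encode as M300
```

A phonetic code like this is typically used as a first pass to group candidate matches, which are then compared in detail field by field.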

On the fourth Day of Data Quality, helpIT gave to me:

4 Calling Salespeople

Sales is a unique industry in which every minute can translate into profits, if that minute is spent efficiently and effectively. Salespeople are constantly seeking better ways of doing things in order to increase your company’s profits, as well as their commissions. Which means every minute wasted clicking through the CRM, either in search of leads or trying to obtain accurate client data, is valuable time lost. Every phone call they make is either costing you money or making you money. What decides whether a sales team is a drain or an asset? The quality of the leads they are contacting.

A CRM that has effective data quality measures in place is filled with accurate contact records. These records can be analyzed to obtain valuable information by all arms of your organization, especially the sales department.

A good salesperson can use CRM data to offer the right products to the right potential buyers, as well as dedicate more time to leads that are statistically more likely to turn into sales. They will be able to quickly obtain the correct point-of-contact and contact information without fishing through multiple records for the same lead. A salesperson will also look more knowledgeable as they are able to talk easily with a client about their business needs.

Give your salespeople the resources they need to be a profitable addition to your company by having an accurate, up-to-date CRM.

On the fifth Day of Data Quality, helpIT gave to me:

5 Golden Reasons to Trial matchIT SQL

helpIT’s ‘12 Days of Data Quality’ continues with 5 Golden Reasons to Trial matchIT SQL. Perhaps not the golden rings the lady received in the song, but really, who needs five golden rings? Sounds like a pickpocket’s dream come true. So instead, we here at helpIT are presenting you with five reasons to try our matching system.

We hope by now that you are starting to understand how important a strong data quality management system is to the success of your organization. It can increase profits and productivity in all arms of your organization. Yet sometimes it is hard to get the ball rolling, especially if you have a lot of chiefs who are part of this decision. So consider these five reasons why a helpIT systems trial is a good place to start:

1. Quick Installation. Be processing data in less than an hour!
2. Run data cleansing processes on your own data in your own environment (even address validation).
3. Customize the matching process and fine-tune your results with dedicated Trial Customer Support.
4. Run large volumes of data to see real performance results.
5. Get the real-world examples you need to justify your business case for SQL data quality.

This holiday season, try matchIT SQL for 30 days for absolutely nothing! We know you will love it, but if you don’t, we will give you 5 golden rings. Or maybe just one. Or a thank-you email. Yes, if you don’t love it, we will send you an email thanking you for your time. Happy trialing!

On the sixth Day of Data Quality, helpIT gave to me:

6 Companies a-Laying

Our countdown to Christmas and better data quality measures continues! In the song, his sweetheart received 6 geese laying eggs. Which might get some odd looks around the office. Instead, consider the importance of laying a strong foundation when beginning your quest for clean data.

All geese lay eggs. But the goose that laid the golden egg got a lot more attention than the rest. Like that golden-egg laying goose, the company that lays the strongest foundation in regards to data quality will garner the most attention and achieve the best results.

Most organizations think of clearing out dirty data as something to be dealt with only when absolutely necessary. In reality, database maintenance is a process that should be consistently tweaked, monitored, and exercised. Contact data is constantly entering your system, and contacts are frequently relocating, changing names, or passing away. A good database administrator is diligent in tracking these changes.

Laying the foundation for strong data quality measures is often labeled too time consuming to be dealt with. But the time invested originally will pay off in piles of golden eggs in the future.

On the seventh Day of Data Quality, helpIT gave to me:

7 Sales a-Swimming

Or rather, floundering. Whether you want to admit it or not, the odds are you are floundering in bad data, working hard just to keep up with the changes that occur in your contact database on a daily basis. Each sale relies on every member of your team being able to swim seamlessly through the CRM to obtain the information they need to make a client feel valued and understood.

Companies today report data analysis as one of the most effective tools for developing marketing campaigns and targeting sales leads. Many organizations use data analysis on a daily basis. However, if they are analyzing inaccurate or out-of-date data, the analysis is all but pointless. A database that does not have systems in place for catching bad data at point-of-entry, as well as a regular cleansing schedule, is a hindrance rather than a help in regards to data analysis.

This holiday season, give your data analysis the gift of a life raft. Make sure your team is swimming, rather than floundering, in the sea of contact data. Accurate data analysis will increase marketing effectiveness, reduce marketing spend, and increase productivity in all aspects of your business that work in the CRM.

On the eighth Day of Data Quality, helpIT gave to me:

8 Maids E-Mailing

While your business is probably not made up of maids, it does most likely contain many people that rely on email communications on a day-to-day basis. Email is an important means to reach prospects, current customers, and vendors. How these messages are delivered, as well as the content in them, is a strong reflection on the quality of your business model.

Do the emails look polished and professional? Or lazy and sloppy? Most organizations unintentionally accomplish the latter. A lack of data quality management systems has caused incorrect contact information to reside in their database. So Joe Smith gets an email addressed to Jo Smith. Or Jo Smith becomes a Mrs. instead of a Mr. And that’s all assuming that the email is even delivered.

Email deliverability is a key concern to many businesses, especially in regards to marketing. A great marketing campaign is irrelevant if the message is not received by the intended recipient. New email addresses are often mistyped. Another possibility is that a wrong address was given intentionally. Either way, the organization has lost a sales lead because the incorrect address is not reachable.

Email validation is a valuable and effective piece of the data quality puzzle. It will greatly increase the number of leads passed onwards to your sales team as well as ensure that marketing communications arrive to the person they were intended for. It is easy to implement, and the rewards far outweigh the costs.
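A first line of defense can be sketched in a few lines. A real validation service goes much further, checking the domain’s MX records and even probing the mail server; the pattern below is a deliberately simple illustration, not a full RFC 5322 parser:

```python
import re

# Deliberately simple pattern: a local part, "@", and a dotted domain
# ending in a letters-only top-level domain of at least two characters.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")

def plausible_email(address: str) -> bool:
    """Syntax-only check, the cheapest layer of email validation."""
    return EMAIL_RE.fullmatch(address.strip()) is not None
```

A check like this at point-of-entry filters out obvious typos before they ever reach the database; deliverability checks against the live mail domain catch the rest.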

On the ninth Day of Data Quality, helpIT gave to me:

9 Ladies Dating

One of the biggest challenges in your database can come from name changes. Sometimes it is from 9 ladies dating and then deciding to tie the knot. And while marriage is normally considered a wonderfully celebrated occasion, to the database administrator it means the possibility of error. Because it is almost a certainty that Ms. Smith is not calling her 17 magazine subscriptions from her honeymoon to let them know she married Mr. Clark and moved into his duplex in the Heights.

The new Mrs. Clark is a valued customer. So treat her as such by recognizing these changes as quickly as possible. Name changes and new addresses are easily dealt with when you have a proper data quality system in place. Stay tuned for tomorrow’s blog for some tips on keeping up with Mrs. Clark.

On the tenth Day of Data Quality, helpIT gave to me:

10 Lords a-Moving

The original Lords from the song might be a-leaping, but most of your customers are getting around via U-Haul trucks and airplanes. They are leaping across town, across the state, and sometimes, across the world. In an average year, over 40 million people move. Keeping up with them can seem even harder than remembering the words to the 12 Days of Christmas.

Keep your contact database accurate and up-to-date with National Change of Address (NCOA). In one easy process your current contact address data is compared to USPS CASS and DPV certified data, correcting any typing errors and appending additional information.

On the eleventh Day of Data Quality, helpIT gave to me:

11 Pipers Piping

The Pied Piper was a character in German folklore who tried to sway a town to pay him to rid their village of rats. His pipe music would lure the rats out of hiding and they would follow him out of town. When the villagers refused to pay for this service, he piped away their children instead. Not the noblest use of his talents, but the ability to lead others is a powerful trait nonetheless.

Be the pied piper at your organization, but use your powers for good instead of evil. Make 2016 the year your organization makes data quality solutions a priority, and others will be glad they followed you. Often the only thing holding a company back from reducing the costs of bad data is the knowledge and the leadership to move forward. helpIT systems offers a full range of customer support solutions so that you and your company can feel confident about your next move.

On the twelfth Day of Data Quality, helpIT gave to me:

12 DBAs Drumming

More often than not, the squeaky wheel gets the grease. The loudest drummers in your office this season should be those making noise about the importance of data deduplication. An improperly deduped database can leave upwards of 60 percent of your contact data dirty.

Those incorrect contacts are receiving marketing materials (which cost money), taking up manpower to organize and sift through in the CRM (which costs time), and getting calls from your sales people (which cost money and time).

This month alone I have received mail for 4 different past residents of my current apartment. You know what I do with it? I throw it away. So Horace will never get that credit card offer. Zachary will not be donating to the Salesian Missions. And Monique will not be showing up in court for her fifth and final notice to appear. (Feeling a little guilty about that last one.)

We hope you enjoyed our unique spin on the traditional “12 Days of Christmas”. While the holidays are nearing an end, helpIT systems is here to answer your data quality questions 365 days a year. We hope to make 2016 your best data quality year ever. Give us a call or visit our website at www.helpit.com to find out more information.

Weighing up the Cost of Bad Data

In a recent survey conducted by helpIT systems, almost 25 percent of respondents cited finances as the biggest hindrance to maintaining superior contact databases. We get it. Data quality solutions can carry what may seem to be a hefty price tag, and they won’t show up two days later in a nicely wrapped package like an Amazon Prime purchase. As such, like any other expensive and complicated decision, data quality may well get pushed to the bottom of the pile.

Then again, just like going to the gym or eating salad instead of steak, the toughest behaviors to adopt are usually the most beneficial. Because even though database management may be something we’d rather forget about, 40 percent of those same respondents stated that their companies were losing tens of thousands of dollars each year due to poor contact data quality. So while the solution may not be cheap and easy, the cost of living without it does not appear to be either. The Data Warehousing Institute found that bad data costs US businesses more than $600 billion each year. Is that a number your company can afford to ignore?

Many businesses do notice these dollars disappearing and choose to do something about it.  Unfortunately however, this is often simply a “quick fix”.  They look at their messy databases, pay someone to “clean them up”, and then everyone gets a pat on the back for a job well done.  And it is.  Until someone enters a new record in the CRM, a customer moves, or perhaps even dares to get a new phone number.  And I will shock everyone by reporting that this happens all the time.  Studies indicate up to a 2 percent degradation each month…even in a perfect database.
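That 2 percent monthly figure compounds quickly. A back-of-the-envelope calculation (assuming the decay applies uniformly each month) shows how much of a freshly cleaned database goes stale within a year:

```python
monthly_decay = 0.02  # ~2% of records go bad each month

# Fraction of records still accurate after n months: (1 - decay) ** n
still_good_after_year = (1 - monthly_decay) ** 12
stale_after_year = 1 - still_good_after_year  # roughly 21.5% after 12 months
```

In other words, even a one-time perfect cleanse leaves roughly a fifth of the database outdated a year later, which is why ongoing maintenance beats the "quick fix".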

Right now you’re probably picking up on the fact that maintaining good data is going to cost money.  You’re right.  But the fact is, avoiding that cost is only going to cost more in the long run.  Just like having a well-trained sales team, a finely-targeted marketing plan, or a boss with years of experience…great results are an investment of time and resources rather than a happy accident.

Companies that choose to invest in good data quality, as well as to view it as an ongoing process rather than a simple one-time fix, are finding that the benefits by far outweigh the initial costs.  Advertising dollars are reaching their intended audiences and sales calls are reaching the right recipient, with customer satisfaction going through the roof.  Today’s consumer expects the personal touches that can only come from having an accurate and up-to-date Single Customer View, and it is good data quality solutions that will achieve them.

Why Customers Must Be More Than Numbers

I read with some amazement a story in the London Daily Telegraph this week about a customer of NatWest Bank who sent £11,200 last month via online banking to an unknown company instead of his wife. Although Paul Sampson had correctly entered his wife’s name, sort code and account number when he first made an online payment to her HSBC account, he wasn’t aware that she had subsequently closed the account.

Mr Sampson thought he was transferring £11,200 to his wife: he clicked Margaret’s name among a list of payees saved in his NatWest banking profile and confirmed the transaction, but the payment went to a business in Leeds. Mr Sampson believes that HSBC had reissued his wife’s old account number to someone else, a company whose name they refused to tell him. NatWest told Mr Sampson it was powerless to claw the money back.

HSBC said it had contacted its customer, but it had no obligation regarding the money. HSBC insisted that the account number in question was not “recycled”, saying Mr Sampson must have made a typing error when he first saved the details, which he disputes. Although the money was in fact returned after the newspaper contacted HSBC, a very large issue has not been resolved.

Although news to most of us, it is apparently a common practice among banks in the UK to recycle account numbers, presumably because banking systems are so entrenched around 8 or 9 digit account numbers that they are concerned about running out of numbers. Apparently a recent code of practice suggests that banks should warn the customer making the payment if they haven’t sent money to this payee for 13 months, but according to the Daily Telegraph “No major high street bank could confirm that it followed this part of the code”.

The Daily Telegraph goes on to state that the recipients of electronic payments are identified by account numbers only. The names are not checked in the process, so even if they do not match, the transaction can proceed. “This is now a major issue when you can use something as basic as a mobile phone number to transfer money,” said Mike Pemberton, of solicitors Stephensons. “If you get one digit wrong there’s no other backup check, like a person’s name – once it’s gone it’s gone.” If you misdirect an online payment, your bank should contact the other bank within two working days of your having informed them of the error, but they have no legal obligation to help.

Mr Sampson obviously expected that the bank’s software would check that the account number belonged to the account name he had stored in his online payee list, but apparently UK banking software doesn’t do this. Why on earth not? Surely it’s not unreasonable for banks with all the money they spend on computer systems to perform this safety check? It’s not good enough to point to the problems that can arise when a name is entered in different ways such as Sheila Jones, Mrs S Jones, Sheila M Jones, SM Jones, Mrs S M Jones, Mrs Sheila Mary Jones etc.

These are all elementary examples for intelligent name matching software.  More challenging are typos, nicknames and other inconsistencies such as those caused by poor handwriting, which would all occur regularly should banks check the name belonging to the account number. But software such as matchIT Hub is easily available to cope with these challenges too, as well as the even more challenging job of matching joint names and business names.
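Those elementary variants really can be handled with a few lines of normalization. This toy sketch (nothing like a production matching engine, and the title list is an assumption for the example) reduces each written form to a comparable key of surname plus first initial:

```python
TITLES = {"mr", "mrs", "ms", "miss", "dr", "prof"}

def name_key(full_name: str) -> tuple[str, str]:
    """Reduce a written name to (surname, first initial) for comparison."""
    parts = [p.strip(".,").lower() for p in full_name.split()]
    parts = [p for p in parts if p not in TITLES]
    if not parts:
        return ("", "")
    # "SM Jones" -> the first token is fused initials; its first letter suffices
    return (parts[-1], parts[0][0])

variants = ["Sheila Jones", "Mrs S Jones", "Sheila M Jones",
            "SM Jones", "Mrs S M Jones", "Mrs Sheila Mary Jones"]
# all six variants reduce to the same key: ("jones", "s")
```

A key like this would let the bank flag a payee whose stored name does not plausibly match the name on the receiving account, which is exactly the check Mr Sampson expected.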

There are also issues in the USA with banking software matching names – I remember when I first wanted to transfer money from my Chase account to my Citibank account, I could only do so if the two accounts had exactly the same name – these were joint accounts and the names had to match exactly letter for letter, so I had to either change the name on one of the accounts or open a new one! Having been an enthusiastic user of the system in the USA for sending money to someone electronically using just their email address, I’m now starting to worry about the wisdom of this…

We banking customers should perhaps question our banks more closely about the checks that they employ when we make online payments!

Where Big Data, Contact Data and Data Quality come together

We’ve been working in an area of untapped potential for Big Data for the last couple of years, which can best be summed up by the phrase “Contact Big Data Quality”. It doesn’t exactly roll off the tongue, so we’ll probably have to create yet another acronym, CBDQ… What do we mean by this? Well, our thought process started when we wondered exactly what people mean when they use the phrase “Big Data” and what, if anything, companies are doing in that arena. The more we looked into it, the more we concluded that although there are many different interpretations of “Big Data”, the one thing that underpins all of them is the need for new techniques to enable enhanced knowledge and decision making. I think the challenges are best summed up by the Forrester definition:

“Big Data is the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers. To remember the pragmatic definition of Big Data, think SPA — the three questions of Big Data:

  • Store. Can you capture and store the data?
  • Process. Can you cleanse, enrich, and analyze the data?
  • Access. Can you retrieve, search, integrate, and visualize the data?”

http://blogs.forrester.com/mike_gualtieri/12-12-05-the_pragmatic_definition_of_big_data

As part of our research, we sponsored a study by The Information Difference (available here) which answered such questions as:

  • how many companies have actually implemented Big Data technologies, and in what areas
  • how much money and effort are organisations investing in it
  • what areas of the business are driving investment
  • what benefits are they seeing
  • what data volumes are being handled

We concluded that plenty of technology is available to Store and Access Big Data, and many of the tools that provide Access also analyze the data – but there is a dearth of solutions to Cleanse and Enrich Big Data, at least in terms of contact data, which is where we focus. There are two key hurdles to overcome:

  1. Understanding the contact attributes in the data i.e. being able to parse, match and link contact information. If you can do this, you can cleanse contact data (remove duplication, correct and standardize information) and enrich it by adding attributes from reference data files (e.g. voter rolls, profiling sources, business information).
  2. Being able to do this for very high volumes of data spread across multiple database platforms.
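To make the first hurdle concrete, here is a minimal sketch in Python of parsing, standardizing and scoring two contact names. The nickname table is an invented toy example, and this bears no relation to our actual matching engine – real parsers use far richer reference data:

```python
import re
from difflib import SequenceMatcher

# Toy nickname table for illustration only.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}

def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and expand common nicknames."""
    tokens = re.sub(r"[^a-z ]", "", name.lower()).split()
    return " ".join(NICKNAMES.get(t, t) for t in tokens)

def match_score(name_a: str, name_b: str) -> float:
    """Return a 0..1 similarity score between two normalized names."""
    return SequenceMatcher(None, normalize_name(name_a),
                           normalize_name(name_b)).ratio()

print(match_score("Bob Smith", "Robert Smith"))  # high: nickname expanded
print(match_score("Bob Smith", "Jane Doe"))      # low: different person
```

Once names (and addresses, emails, etc.) can be scored like this, cleansing and enrichment become a matter of acting on the scores.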

The first of these should be addressed by standard data cleansing tools, but most of these only work well on structured data, maybe even requiring data of a uniform standard – and Big Data, by definition, will contain plenty of unstructured data of widely varying standards and degrees of completeness. At helpIT systems, we’ve always developed software that doesn’t expect data to be well structured and doesn’t rely on data being complete before we can work with it, so we’re already in pretty good shape for clearing this hurdle – although semantic annotation of Big Data is more akin to a journey than a destination!

The second hurdle is the one that we have been focused on for the last couple of years, and we believe that we’ve now got the answer – using in-memory processing for our proven parsing/matching engine to achieve super-fast and scalable performance on data from any source. Our new product, matchIT Hub, will be launching later this month, and we’re all very excited by the potential it has not just for Big Data exploitation, but also for:
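matchIT Hub’s engine is proprietary, but the general idea behind scaling in-memory matching can be sketched with a common technique called blocking: group records by a cheap key held in memory, so that expensive pairwise comparisons only happen within small groups rather than across the whole file. The field names and key choice below are illustrative assumptions only:

```python
from collections import defaultdict

def block_key(record: dict) -> str:
    """Crude blocking key: first 3 letters of surname + ZIP prefix (assumed fields)."""
    return record["surname"][:3].lower() + record["zip"][:3]

def find_candidate_pairs(records: list) -> list:
    """Return pairs of records that share a block and so deserve full comparison."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    pairs = []
    for group in blocks.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                pairs.append((group[i], group[j]))
    return pairs

people = [
    {"surname": "Smith", "zip": "10001"},
    {"surname": "Smyth", "zip": "10001"},  # different block ("smy" vs "smi")
    {"surname": "Smith", "zip": "10002"},
]
print(len(find_candidate_pairs(people)))  # only 1 candidate pair to score fully
```

Note the trade-off this example exposes: a naive key misses Smith/Smyth, which is why production engines use phonetic and multi-key blocking rather than raw prefixes.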

  • increasing the number of matches that can safely be automated in enterprise Data Quality applications, and
  • providing matching results across the enterprise that are always available and up-to-date.

In the next post, I’ll write about the potential of in-memory matching coupled with readily available ETL tools.

The 12 Days of Shopping

According to IBM’s real-time reporting unit, Black Friday online sales were up close to 20% this year over the same period in 2012. As for Cyber Monday, sales increased 30.3% in 2012 compared to the previous year and are expected to grow another 15% in 2013. Mobile transactions are at an all-time high, and with in-store sales included, the National Retail Federation expects retail sales to pass the $600 billion mark during the last two months of the year alone. While that might sound like music to a retailer’s ears, as the holiday shopping season goes into full swing on this Cyber Monday, the pressure to handle the astronomical influx of data collected at dozens of possible transaction points is mounting. From websites and storefronts to kiosks and catalogues, every scarf or video game purchased this season brings with it a variety of data points that must be appropriately stored, linked, referenced and, hopefully, leveraged. Add to that a blinding amount of big data now being collected (such as social media activity or mobile tracking), and it all amounts to a holiday nightmare for IT and data analysis teams. So how much data are we talking about, and how does it actually manifest itself? In the spirit of keeping things light, we offer you The 12 Days of Shopping…

On the first day of shopping my data gave to me,
1 million duplicate names.

On the second day of shopping my data gave to me,
2 million transactions, and
1 million duplicate names.

On the third day of shopping my data gave to me,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the fourth day of shopping my data gave to me,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the fifth day of shopping my data gave to me,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the sixth day of shopping my data gave to me,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the seventh day of shopping my data gave to me,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the eighth day of shopping my data gave to me,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the ninth day of shopping my data gave to me,
90,000 emails,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the tenth day of shopping my data gave to me,
10,000 tweets,
90,000 emails,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the eleventh day of shopping my data gave to me,
11 new campaigns,
10,000 tweets,
90,000 emails,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the twelfth day of shopping my data gave to me,
12 fraud alerts,
11 new campaigns,
10,000 tweets,
90,000 emails,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

While we joke about the sheer scale of it all, if you are a retailer stumbling under the weight of all this data, there is hope. Over the next few weeks we’ll dive a bit deeper into these figures to showcase how you can get control of the incoming data and, most importantly, leverage it in a meaningful way.

Sources:
http://techcrunch.com/2013/11/29/black-friday-online-sales-up-7-percent-mobile-is-37-percent-of-all-traffic-and-21-5-percent-of-all-purchases/

http://www.pfsweb.com/blog/cyber-monday-2012-the-results/

http://www.foxnews.com/us/2013/11/29/retailers-usher-in-holiday-shopping-season-as-black-friday-morphs-into/

6 Reasons Companies Ignore Data Quality Issues

When lean businesses encounter data quality issues, managers may be tempted to leverage existing CRM platforms or similar tools to try to meet the perceived data cleansing needs. They might also default to reinforcing existing business processes and educating users in support of good data. While these approaches might be a piece of the data quality puzzle, it would be naive to think that they will resolve the problem. In fact, ignoring the problem while trying half-hearted approaches can actually amplify the problem you’ll eventually have to deal with later. So why do they do it? Here are some reasons we have heard for why businesses have stuck their heads in the proverbial data quality sand:

1. “We don’t need it. We just need to reinforce the business rules.”

Even in companies that run the tightest of ships, reinforcing business rules and standards won’t prevent all your problems. First, not all data quality errors are attributable to lazy or untrained employees: consider nicknames, multiple legitimate addresses and variations on foreign spellings, to name just a few. Plus, while getting your process and team in line is always a good habit, it still leaves the challenge of cleaning up what you’ve got.

2. “We already have it. We just need to use it.”

Stakeholders often mistakenly think that data quality tools are inherent in existing applications or are a modular function that can be added on. Managers with sophisticated CRM or ERP tools in place may find it particularly hard to believe that their expensive investment doesn’t account for data quality. While customizing or extending existing ERP applications may take you part of the way, we are constantly talking to companies that have used up valuable time, funds and resources trying to squeeze a sufficient data quality solution out of one of their other software tools and it rarely goes well.

3. “We have no resources.”

When human, IT and financial resources are maxed out, the thought of adding a major initiative such as data quality can seem foolhardy. Even defining business requirements is challenging unless a knowledgeable data steward is on board. With no clear approach, some businesses tread water instead of mounting a formal assault. It’s important to keep in mind, though, that procrastinating on a data quality issue can cost more resources in the long run, because the time it takes staff to navigate data with inherent problems takes a serious toll on efficiency.

4. “Nobody cares about data quality.”

Unfortunately, when it comes to advocating for data quality, there is often only one lone voice on the team, championing something that no one else really seems to care about. The key is to find the people who get it. They are there; the problem is they are rarely asked. They are usually in the trenches, trying to work with the data or struggling to keep up with its maintenance. They are not empowered to change any systems to resolve the data quality issues and may not even realize the extent of those issues, but they definitely care, because it impacts their ability to do their job.

5. “It’s in the queue.”

Businesses may recognize the importance of data quality but just can’t think about it until after some other major implementation, such as a data migration, integration or warehousing project. It’s hard to know where data quality fits into the equation and when and how that tool should be implemented, but it’s a safe bet that the time for data quality is before records move to a new environment. Put another way: garbage in = garbage out. Unfortunately for these companies, the unfamiliarity of a new system or process compounds the challenge of cleansing data errors that have migrated from the old system.

6. “I can’t justify the cost.”

One of the biggest challenges we hear about in our industry is the struggle to justify a data quality initiative with an ROI that is difficult to quantify. However, just because you can’t capture the cost of bad data in a single number doesn’t mean that it’s not affecting your bottom line. If you are faced with the dilemma of ‘justifying’ a major purchase but can’t find the figures to back it up, try to justify doing nothing. It may be easier to argue against sticking your head in the sand than to fight ‘for’ the solution you know you need.

Is your company currently sticking its head in the sand when it comes to data quality? What other reasons have you heard?

Remember, bad data triumphs when good managers do nothing.

8 Ways to Save Your Data Quality Project

Let’s face it, if data quality were easy, everyone would have good data and it wouldn’t be such a hot topic. On the contrary, despite all the tools and advice out there, selecting and implementing a comprehensive data quality solution still presents some hefty challenges. So how does a newly appointed Data Steward NOT mess up the data quality project? Here are a few pointers on how to avoid failure.

1. DON’T FORGET THE LITTLE PEOPLE

As with other IT projects, the top challenge for data quality projects is securing business stakeholder engagement throughout the process. But this doesn’t just mean C-level executives. Stakeholders for a data quality initiative should also include department managers and even end-users within the company who must deal with the consequences of bad data as well as the impact of system changes. Marketing, for example, relies on data accuracy to reach the correct audience and maintain a positive image. Customer Service depends on completeness and accuracy of a record to meet their specific KPIs. Finance, logistics and even manufacturing may need to leverage the data for effective operations or even to feed future decisions. When it comes to obtaining business buy-in, it is critical for Data Stewards to think outside the box regarding how the organization uses (or could use) the data and then seek input from the relevant team members. While the instinct might be to avoid decision by committee, in the end, it’s not worth the risk of developing a solution that does not meet business expectations.

2. BEWARE OF THE “KITCHEN SINK” SOLUTION

The appeal of an ‘umbrella’ data management solution can lure both managers and IT experts, offering the ease and convenience of one-stop shopping. In fact, contact data quality can often be an add-on toolset offered by a major MDM or BI vendor – simply to check the box. However, when your main concern is contact data, be sure to measure all your options against a best-of-breed standard before deciding on a vendor. That means understanding the difference between match quality vs. match quantity, determining the intrinsic value (for your organization) of integrated data quality processes, and not overlooking features (or quality) that might seem like nice-to-haves now but which, down the line, can make or break the success of your overall solution. Once you know the standard you are looking for with regards to contact deduplication, address validation, and single customer view, you can effectively evaluate whether those larger-scale solutions will have the granularity needed to achieve the best possible contact data cleansing for your company. While building that broader data strategy is a worthy goal, now is the time to be conscious of not throwing the data quality out with the proverbial bathwater.

3. JUST BECAUSE YOU CAN, DOESN’T MEAN YOU SHOULD

When it comes to identifying the right contact data quality solution, most companies not only compare vendors to one another but also consider developing a solution in-house. In fact, if you have a reasonably well-equipped IT department (or consultant team), it is entirely possible that an in-house solution will appear cheaper to develop, and several factors may cause organizations to ‘lean’ in that direction, including the desire for ‘more control’ over the data or to eliminate security and privacy concerns.

There is a flip side, however, to these perceived advantages that begs to be considered before jumping in. First, ask yourself: does your team really have the knowledge AND bandwidth necessary to pull this off? Contact data cleansing is both art and science. Best-of-breed applications have been developed over years of trial and error and come with very deep knowledge bases and sophisticated match algorithms that can take a data quality project from 80% accuracy to 95% or greater accuracy. When you are dealing with millions or even billions of records, that extra percentage matters. Keep in mind that even the best-intentioned developers may be all too eager to prove they can build a data quality solution, without much thought as to whether or not they should. Even if the initial investment is less expensive than a purchased solution, how much revenue is lost (or not gained) by diverting resources to this initiative rather than to something more profitable? In-house solutions can be viable, as long as they are chosen for the right reasons and nothing is sacrificed in the long run.

4. NEVER USE SOMEONE ELSE’S YARDSTICK

Every vendor you evaluate will steer you toward the benchmarks at which they perform best. So the only way to truly make an unbiased decision is to know ALL the benchmarks, decide for yourself which are most important to your company, and not be fooled by the fine print. For example:

  • The number of duplicates found is often touted as a key measure of an application’s efficacy, but that figure is only valuable if they are all TRUE duplicates. Check this in an actual trial on your own data and go for the tool that delivers the greater number of TRUE duplicates while minimizing false matches.
  • Speed matters too, but make sure you know the run speeds on your data and on your equipment.
  • More ‘versatile’ solutions are great, as long as your users will really be able to take advantage of all the bells and whistles.
  • Likewise, the volume of records processed should cover you for today and for what you expect to be processing in the next two to five years, as this solution is not something you want to implement and then change within a short time frame. Hence, scalability matters as well.
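One way to put the ‘TRUE duplicates vs. false matches’ benchmark on an objective footing is to compute precision (how many flagged pairs are real duplicates) and recall (how many real duplicates were found) against a hand-labeled sample of your own data. A minimal sketch, with made-up record IDs:

```python
def precision_recall(flagged: set, truth: set):
    """Compare tool-reported match pairs against human-verified pairs."""
    true_matches = flagged & truth
    precision = len(true_matches) / len(flagged) if flagged else 0.0
    recall = len(true_matches) / len(truth) if truth else 0.0
    return precision, recall

flagged = {("r1", "r2"), ("r3", "r4"), ("r5", "r6")}  # pairs the tool reported
truth   = {("r1", "r2"), ("r3", "r4"), ("r7", "r8")}  # pairs a human verified
p, r = precision_recall(flagged, truth)
print(round(p, 2), round(r, 2))  # prints 0.67 0.67
```

A tool that boasts the most duplicates may simply have high recall and poor precision; measuring both on your own labeled sample is the only fair yardstick.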

So, use your own data file, test several software options and compare the results in your own environment, with your own users. And remember the intangibles, like how long it will take to get the software up and running, users trained, the quality of reports, etc. These very targeted parameters should be the measure of success for your chosen solution – not what anyone else dictates.

5. MIND YOUR OWN BUSINESS (TEST CASES, THAT IS)

Not all matching software is created equal, and the only way to effectively determine which software will address your specific needs is to develop test cases that serve as relevant and appropriate examples of the kinds of data quality issues your organization is experiencing. These should be used as the litmus test to determine which applications can best resolve those examples. Be detailed in developing these test cases so you can get down to the granular features in the software which address them. Here are a few examples to consider:

  • Do you have contact records with phonetic variations in their names?
  • Are certain fields prone to missing or incorrect data?
  • Do your datasets consistently have data in the wrong fields (e.g. names in address lines, postal codes in city fields, etc.)?
  • Is business name matching a major priority?
  • Do customers often have multiple addresses?

Once you have identified a specific list of recurring challenges within your data, pull several real-world examples from your actual database and use them in any data sample you send to vendors for trial cleansing. When reviewing the results, make sure the solutions you are considering can find these matches in the trial. Each test case will require specific features and strengths that not all data quality software offers. Without this granular level of information about the names, addresses, emails, zip codes and phone numbers that are in your system, you will not be able to fully evaluate whether a solution can resolve them or not.
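For the phonetic-variation test case in particular, it helps to know roughly how phonetic matching works. Here is a rough sketch of the classic Soundex algorithm, which maps similar-sounding surnames to the same short code; real matching engines use far more sophisticated phonetic keys, so treat this as an illustration only:

```python
def soundex(name: str) -> str:
    """Classic Soundex: initial letter plus three digits; similar-sounding
    surnames (e.g. Smith / Smyth) collapse to the same code."""
    codes = {}
    for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in letters:
            codes[ch] = digit
    name = name.lower()
    out = name[0].upper()
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "hw":  # h and w don't break a run of identical codes
            prev = code
    return (out + "000")[:4]

print(soundex("Smith"), soundex("Smyth"))  # prints S530 S530
```

If a vendor’s trial can’t pair Smith with Smyth, its phonetic handling is weaker than even this forty-line sketch.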

6. REMEMBER IT’S NOT ALL BLACK AND WHITE

Contact data quality solutions are often presented as binary – they either find the match or they don’t. In fact, as we mentioned earlier, some vendors will tout the number of matches found as the key benchmark for efficacy. The problem with this perception is that matching is not black and white: there is always a gray area of matches that might be the same, but you can’t really be sure without inspecting each match pair. So it is important to anticipate how large your gray area will be and have a plan for addressing it. This is where the false match/true match discussion comes into play.

True matches are just what they sound like, while false matches are contact records that look and sound alike to the matching engine but are, in fact, different. While it’s great when a software package can find lots of matches, the scary part is deciding what to do with them. Do you merge and purge them all? What if they are false matches? Which one do you treat as the master record? What info will you lose? What other consequences flow from an incorrect decision?

The bottom line is: know how your chosen data quality vendor or solution will address the gray area. Ideally, you’ll want a solution that allows the user to set the threshold of match strictness. A mass marketing mailing may err on the side of removing records in the gray area to minimize the risk of mailing dupes, whereas customer data integration may require manual review of gray records to ensure they are all correct. If a solution doesn’t mention the gray area or have a way of addressing it, that’s a red flag indicating they do not understand data quality.
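The threshold idea can be sketched as a simple three-band triage policy. The cut-off values below are arbitrary examples; in practice you would tune them per application (stricter for merge/purge, looser for a mailing suppression):

```python
# Scores at or above AUTO_MATCH merge automatically; scores below NON_MATCH
# are discarded; everything in between is the gray area, queued for review.
AUTO_MATCH, NON_MATCH = 0.90, 0.60

def triage(score: float) -> str:
    if score >= AUTO_MATCH:
        return "auto-merge"
    if score < NON_MATCH:
        return "no-match"
    return "manual-review"

for s in (0.95, 0.75, 0.40):
    print(s, triage(s))
```

The size of the middle band is the practical measure of your gray area: widen it and review workload grows; narrow it and false merges or missed duplicates grow.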

7. DON’T FORGET ABOUT FORMAT

Most companies do not have the luxury of one nice, cleanly formatted database where everyone follows the rules of entry. In fact, most companies have data stored in a variety of places with incoming files muddying the waters on a daily basis. Users and customers are creative in entering information. Legacy systems often have inflexible data structures. Ultimately, every company has a variety of formatting anomalies that need to be considered when exploring data cleansing tools. To avoid finding out too late, make sure to pull together data samples from all your sources and run them during your trial. The data quality solution needs to handle data amalgamation from systems with different structures and standards. Otherwise, inconsistencies will migrate and continue to cause systemic quality problems.
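As a tiny illustration of why formatting anomalies matter, here is a sketch of normalizing one field – a phone number – that arrives in three different formats from three assumed sources. Without a step like this, the same customer can look like three different people. The digits-only, US-centric rule here is an assumption for the example:

```python
import re

def normalize_phone(raw: str) -> str:
    """Strip everything but digits; keep the last 10 (assumes US numbers)."""
    digits = re.sub(r"\D", "", raw)
    return digits[-10:]

# The same number as entered in three hypothetical source systems:
sources = ["(212) 555-0147", "212.555.0147", "+1 212 555 0147"]
print({normalize_phone(p) for p in sources})  # collapses to a single value
```

Every field type (addresses, emails, company names) needs its own version of this normalization before records from different systems can be matched fairly.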

8. DON’T BE SHORT-SIGHTED

Wouldn’t it be nice if, once data is cleansed, the record set remained clean and static? Well, it would be nice, but it wouldn’t be realistic. On the contrary, information constantly evolves, even in the most closed-loop system. Contact records represent real people with changing lives and, as a result, decay by at least 4 percent per year through deaths, moves, name changes, postal address changes or even contact preference updates. Business-side changes such as acquisitions and mergers, system changes, upgrades and staff turnover also drive data decay. The post-acquisition company often faces the task of either hybridizing systems or migrating data into the chosen solution. Project teams must not only consider record integrity, but also update business rules and filters that can affect data format and cleansing standards.
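The arithmetic behind that decay figure is worth spelling out: at 4 percent per year, the accurate share of an unmaintained database compounds downward as 0.96 raised to the number of years:

```python
def still_accurate(years: int, decay_rate: float = 0.04) -> float:
    """Fraction of records still accurate after n years of compounding decay."""
    return (1 - decay_rate) ** years

# After five years untouched, roughly 81.5% of records remain accurate,
# i.e. nearly one in five has gone stale.
print(round(still_accurate(5) * 100, 1))
```

Run the same calculation with your own estimated decay rate to size the ongoing maintenance problem rather than treating cleansing as a one-off project.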

Valid data being entered into the system during the normal course of business (either by customer service reps or by customers themselves) also contributes to ongoing changes within the data. New forms and data elements may be added by marketing and will need to be accounted for in the database. Incoming lists or big data sources will muddy the waters. Expansion of sales will result in new audiences and languages providing data in formats you haven’t anticipated. Remember, the only constant in data quality is change. If you begin with this assumption, you skyrocket your project’s likelihood of success. Identify the ways that your data changes over time so you can plan ahead and establish a solution or set of business processes that will scale with your business.

Data quality is hard. Unfortunately, there is no one-size-fits-all approach, and there isn’t even a single vendor that can solve all your data quality problems. However, by being aware of some of the common pitfalls and doing a thorough and comprehensive evaluation of any vendors involved, you can get your initiative off to the right start and give yourself the best possible chance of success.

What I Learned About Data Quality From Vacation

Over the 12 hours it took us to get from NY to the beaches of North Carolina, I had plenty of time to contemplate how our vacation was going to go. I mentally planned our week out and tried to anticipate the best ways for us to ‘relax’ as a family. What relaxes me is not having to clean up. So to facilitate this, I set about implementing a few ‘business rules’ so that we could manage our mess in real time, which I knew, deep down, would be better for everyone. The irony of this, given my role as Director of Marketing for a data quality company, did not escape me, but I didn’t spot the fodder for a blog post until it dawned on me that business rules actually can work. Really and truly. This is how.

1. We Never Got Too Comfortable.

We were staying in someone else’s house and it wasn’t our stuff. It dawned on me that we take much more liberty with our own things than we apparently do with someone else’s, and I believe this applies to data as well. Some departments feel like they are the ‘owners’ of specific data. I know from direct experience that marketing, in many cases, takes responsibility for customer contact data, and as a result, we often take liberties, figuring ‘we’ll remember what we changed’ or ‘we can always deal with it later’. The reality is, there are lots of other people who use and interact with that data, and each business user would benefit from following a “Treat It Like It’s Someone Else’s” approach.

2. Remember the Buck Stops With You.

In our rental, there was no daily cleaning service, and we didn’t have the freedom of leaving the house messy when we left (in just 7 days). So essentially, the buck stopped with us. Imagine how much cleaner your organization’s data would be if each person who touched it took responsibility for leaving it in good condition. Business rules that tell each user they will be held accountable for the integrity of each data element, along with clarity on what level of maintenance is expected, can help develop this sense of responsibility.

3. Maintain a Healthy Sense of Urgency.

On vacation, we had limited time before we’d have to atone for any messy indiscretions. None of us wanted to face a huge mess at the end of the week so it made us more diligent about dealing with it on the fly. To ‘assist’ the kids with this, we literally did room checks and constantly reminded each other that we had only a few days left – if they didn’t do it now, they’d have to do it later. Likewise, if users are aware that regular data audits will be performed and that they will be the ones responsible for cleaning up the mess, the instinct to proactively manage data may be just a tad stronger.

So when it comes to vacation (and data quality), there is good reason not to put off important cleansing activities that can be made more manageable by simply doing them regularly in small batches.

When Data Quality Goes Wrong…

Whether you are a data steward or not, we’ve all experienced the unfortunate consequences of data quality gone terribly awry. Multiple catalogues to the same name and address. Purchasing a product through an online retailer only to find you have three different accounts with three different user names. Long, frustrating phone calls with customer service who can’t help you because they don’t have access to all the relevant info.

As the Director of Marketing for a data quality company, I find it exceptionally painful to see bad data quality in action. Such inefficiency is what gives marketing a bad reputation. It can ruin brands, destroy customer loyalty, waste opportunities and… let’s face it, it also kills trees.

Indeed, throughout our entire company, the water cooler occasionally buzzes with stories of bad data quality. So what’s a data quality company to do with all these DQ “blunders”? Call them out!

So this summer we’re going to dig through our box of examples and showcase a few #dataqualityblunders. We’ll try to be nice about it of course but the important part is that we’ll also highlight the ways that a good data quality strategy could have addressed these indiscretions. Because where there is bad data, there is also a clean data solution.

Have a #dataqualityblunder you’re just dying to spill?

We know that you’ve seen your fair share of data quality blunders. Send them in and win a $10 Starbucks gift card! Just email [email protected]!