Posts

Where Big Data, Contact Data and Data Quality come together

For the last couple of years we’ve been working in an area of untapped potential for Big Data, which can best be summed up by the phrase “Contact Big Data Quality”. It doesn’t exactly roll off the tongue, so we’ll probably have to create yet another acronym, CBDQ… What do we mean by this? Well, our thought process started when we wondered exactly what people mean by “Big Data” and what, if anything, companies are doing in that arena. The more we looked into it, the more we concluded that although there are many interpretations of “Big Data”, the one thing that underpins all of them is the need for new techniques to enable better knowledge and decision making. I think the challenges are best summed up by the Forrester definition:

“Big Data is the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers. To remember the pragmatic definition of Big Data, think SPA — the three questions of Big Data:

  • Store. Can you capture and store the data?
  • Process. Can you cleanse, enrich, and analyze the data?
  • Access. Can you retrieve, search, integrate, and visualize the data?”

http://blogs.forrester.com/mike_gualtieri/12-12-05-the_pragmatic_definition_of_big_data

As part of our research, we sponsored a study by The Information Difference (available here) which answered such questions as:

  • how many companies have actually implemented Big Data technologies, and in what areas
  • how much money and effort are organisations investing in it
  • what areas of the business are driving investment
  • what benefits are they seeing
  • what data volumes are being handled

We concluded that plenty of technology is available to Store and Access Big Data, and many of the tools that provide Access also Analyze the data – but there is a dearth of solutions to Cleanse and Enrich Big Data, at least for contact data, which is where we focus. There are two key hurdles to overcome:

  1. Understanding the contact attributes in the data, i.e. being able to parse, match and link contact information. If you can do this, you can cleanse contact data (remove duplicates, correct and standardize information) and enrich it by adding attributes from reference data files (e.g. voter rolls, profiling sources, business information).
  2. Being able to do this for very high volumes of data spread across multiple database platforms.
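
To make the first hurdle a little more concrete, here is a minimal Python sketch of what "parse, match and remove duplicates" can look like on a handful of contact records. It is an illustration only, not how our matching engine works: the field names (name, postcode), the 0.85 similarity threshold and the use of Python's standard difflib module are all assumptions chosen to keep the example short.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(cleaned.split())


def is_match(a, b, threshold=0.85):
    """Assumed rule: same normalized postcode and sufficiently similar names."""
    if normalize(a["postcode"]) != normalize(b["postcode"]):
        return False
    ratio = SequenceMatcher(
        None, normalize(a["name"]), normalize(b["name"])
    ).ratio()
    return ratio >= threshold


def dedupe(records):
    """Greedy single pass: keep the first record of each matching group."""
    kept = []
    for record in records:
        if not any(is_match(record, existing) for existing in kept):
            kept.append(record)
    return kept


# Hypothetical sample data, invented for this example.
records = [
    {"name": "Jon Smith", "postcode": "KT22 7AH"},
    {"name": "John Smith", "postcode": "kt22 7ah"},
    {"name": "Jane Doe", "postcode": "10001"},
]
unique = dedupe(records)
```

Here the first two records collapse into one because their postcodes agree once normalized and the names are close enough. Note that this naive approach compares every incoming record against every record already kept, which is exactly what stops working at Big Data volumes, and it assumes the fields have already been separated out, which unstructured data rarely offers.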

The first of these should be addressed by standard data cleansing tools, but most of these only work well on structured data, maybe even requiring data of a uniform standard – and Big Data, by definition, will contain plenty of unstructured data which is of widely varying standards and degrees of completeness. At helpIT systems, we’ve always developed software that doesn’t expect data to be well structured and doesn’t rely on data being complete before we can work with it, so we’re already in pretty good shape for clearing this hurdle – although semantic annotation of Big Data is more akin to a journey than a destination!

The second hurdle is the one that we have been focused on for the last couple of years, and we believe that we’ve now got the answer – using in-memory processing for our proven parsing/matching engine, to achieve super-fast and scalable performance on data from any source. Our new product, matchIT Hub, will be launching later this month, and we’re all very excited by the potential it has not just for Big Data exploitation, but also for:

  • increasing the number of matches that can safely be automated in enterprise Data Quality applications, and
  • providing matching results across the enterprise that are always available and up-to-date.
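
As a rough illustration of why keeping match data in memory helps, the Python sketch below groups records under a "blocking key" so that a lookup only has to consider a small set of candidates rather than scanning everything. This is a generic technique, not a description of matchIT Hub's internals; the class name, the choice of key (postcode plus surname initial) and the record fields are all hypothetical.

```python
from collections import defaultdict


class InMemoryMatchIndex:
    """Hypothetical in-memory index: records are grouped by a blocking key
    so a lookup only returns a small candidate set for detailed matching."""

    def __init__(self):
        self._blocks = defaultdict(list)

    @staticmethod
    def blocking_key(record):
        # Assumed key: postcode without spaces plus the surname initial.
        postcode = record["postcode"].replace(" ", "").upper()
        surname = record["name"].split()[-1].upper()
        return postcode, surname[:1]

    def add(self, record):
        """Store a record from any source under its blocking key."""
        self._blocks[self.blocking_key(record)].append(record)

    def candidates(self, record):
        """Return stored records that share the incoming record's key."""
        return list(self._blocks.get(self.blocking_key(record), []))


# Hypothetical records from two different source systems.
index = InMemoryMatchIndex()
index.add({"name": "John Smith", "postcode": "KT22 7AH", "source": "crm"})
index.add({"name": "Jane Doe", "postcode": "10001", "source": "web"})

hits = index.candidates({"name": "Jon Smith", "postcode": "kt22 7ah"})
```

Because the index is just a dictionary held in memory, a new record can be added and become findable immediately, which is the sense in which results stay up to date. A real engine would still need fuzzy comparison of the candidates, several alternative keys to catch errors in the key fields themselves, and handling for records with missing names or postcodes.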

In the next post, I’ll write about the potential of in-memory matching coupled with readily available ETL tools.

The 12 Days of Shopping

According to IBM’s real-time reporting unit, Black Friday online sales were up close to 20% this year over the same period in 2012. As for Cyber Monday, sales increased 30.3% in 2012 compared to the previous year and are expected to grow another 15% in 2013. Mobile transactions are at an all-time high and, with in-store sales added in, the National Retail Federation expects retail sales to pass the $600 billion mark during the last two months of the year alone. While that might sound like music to a retailer’s ears, as the holiday shopping season goes into full swing on this Cyber Monday, the pressure to handle the astronomical influx of data collected at dozens of possible transaction points is mounting. From websites and storefronts to kiosks and catalogues, every scarf or video game purchased this season brings with it a variety of data points that must be appropriately stored, linked, referenced and hopefully leveraged. Add to that a blinding amount of big data now being collected (such as social media activity or mobile tracking), and it all amounts to a holiday nightmare for the IT and data analysis teams. So how much data are we talking about, and how does it actually manifest itself? In the spirit of keeping things light, we offer you The 12 Days of Shopping…

On the first day of shopping my data gave to me,
1 million duplicate names.

On the second day of shopping my data gave to me,
2 million transactions, and
1 million duplicate names.

On the third day of shopping my data gave to me,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the fourth day of shopping my data gave to me,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the fifth day of shopping my data gave to me,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the sixth day of shopping my data gave to me,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the seventh day of shopping my data gave to me,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the eighth day of shopping my data gave to me,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the ninth day of shopping my data gave to me,
90,000 emails,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the tenth day of shopping my data gave to me,
10,000 tweets,
90,000 emails,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the eleventh day of shopping my data gave to me,
11 new campaigns,
10,000 tweets,
90,000 emails,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

On the twelfth day of shopping my data gave to me,
12 fraud alerts,
11 new campaigns,
10,000 tweets,
90,000 emails,
8,000 new logins,
7,000 refunds,
6,000 bad addresses,
5 new marketing lists,
40 returned shipments,
30,000 credit apps,
2 million transactions, and
1 million duplicate names.

While we joke about the enormity of it all, if you are a retailer stumbling under the weight of all this data, there is hope. Over the next few weeks, we’ll dive a bit deeper into these figures to show how you can get control of the incoming data and, most importantly, leverage it in a meaningful way.

Sources:
http://techcrunch.com/2013/11/29/black-friday-online-sales-up-7-percent-mobile-is-37-percent-of-all-traffic-and-21-5-percent-of-all-purchases/

http://www.pfsweb.com/blog/cyber-monday-2012-the-results/

http://www.foxnews.com/us/2013/11/29/retailers-usher-in-holiday-shopping-season-as-black-friday-morphs-into/