Marriott’s 500m Record Breach – what they should have done

The massive 500m record breach of Marriott’s Starwood customer database is just the latest in a very long line of high profile, reputation-threatening data breaches.

“Marriott has not finished identifying duplicate information in the database, but believes it contains information on up to approximately 500 million guests who made a reservation at a Starwood property.”

In 2017 alone, over 40 organizations including Equifax, Verizon, eBay and Uber were in the news after suffering costly and/or embarrassing data breaches. That seemed bad enough, but according to personal information security specialist IdentityForce there have already been three times as many in 2018, including Facebook, British Airways and the US Postal Service!

The even worse news for Marriott is that (unlike the companies hacked last year) they now face a potential billion-dollar fine under GDPR (up to 4% of worldwide annual revenue) if they can’t demonstrate prompt, effective action to notify the relevant data protection authorities and affected customers – and that’s on top of the probable loss of business and the immediate 6% fall in their share price.

Let’s remind ourselves of the main requirements of GDPR compliance in respect of customer data:

• Keeping customer data accurate, up-to-date and secure
• Proving consent for all use of customer data
• Responding to Subject Access Requests quickly
• Processing the “Right To Be Forgotten”
• Maintaining a complete audit trail of access and updates to customer data
• Notifying affected customers promptly in case of a breach

Obviously, Marriott are in breach of the first duty, and their performance on the other responsibilities is about to be put under intense scrutiny by authorities, courts and the media. Note that Marriott “has not finished identifying duplicate information in the database” – they are clearly finding it difficult to assess (and notify) the actual scale of the problem. That will also make it a huge challenge to respond quickly to what are likely to be large numbers of Subject Access Requests, to prove consent for customers who wonder why Marriott hung on to their data, or to reliably erase records for the (probably) hordes who will demand the “Right To Be Forgotten”. With the volumes of data involved, this will require highly accurate, automatable matching – for example, if Marriott remove one or two instances of a customer but other occurrences remain undetected, they will not be fulfilling the deletion request properly. The situation might then be aggravated by marketing to the undetected duplicates, leading to further scrutiny and potentially more fines.
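To make the deletion point concrete, here is a minimal Python sketch of why erasure depends on matching: the routine must find every duplicate of a customer before anything is removed, not just the single record the request arrived against. (The field names and the naive match key are purely illustrative – real matching engines use far more sophisticated fuzzy keys.)

```python
def normalize(record):
    """Build a simple match key from name and email.
    A real engine would use phonetic and fuzzy keys here."""
    name = "".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)

def find_all_instances(records, target):
    """Return every record matching the target's key, so an
    erasure request removes all duplicates, not just one."""
    key = normalize(target)
    return [r for r in records if normalize(r) == key]

records = [
    {"id": 1, "name": "John Smith", "email": "JSmith@example.com"},
    {"id": 2, "name": "john smith", "email": "jsmith@example.com "},
    {"id": 3, "name": "Jane Doe",   "email": "jane@example.com"},
]
# The request arrives against record 1, but records 1 AND 2 must go.
matches = find_all_instances(records, records[0])
```

If the matching only caught record 1, record 2 would survive the erasure – exactly the failure mode described above.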

Let’s think about what might have been – earlier this year, another very large international hotel group acquired a worldwide licence for our contact data matching engine. Their motivation was primarily twofold: they wanted to improve the quality of matching behind their Single Customer View using best-of-breed matching and to bring it under the control of their corporate database system. From the available cross-platform, on-premise or cloud deployments, our client chose to integrate the matching engine into their Amazon Web Services Linux platform. They recognised that using a discrete system for customer data matching which involves exporting data from one system to another, perhaps via a flat file, makes it difficult to ensure absolute security while the data is in flight – and any security system is only as good as its weakest link. Other significant benefits of integrating matching within the corporate database are the access control and the auditability that this provides.

But let’s imagine that the worst happens and despite their customer data residing only in the most secure place it can be, inside their main database, our client is also hacked. The first difference is that they will be alerted quickly by the monitoring tools within their database, so they can react fast. Next, they can use their accurate, up-to-date Single Customer View (enabled by our uniquely effective customer data matching) to check how many and which customers were affected – this means that they can notify the authorities immediately with concrete information about the hack, as well as the affected customers. Then, our client would be well placed to handle the expected surge in customer demands for information, erasure etc.

The bottom line is that any CTO, CEO and board that is not doing their utmost to keep access to high volumes of customer data secure, or not making sure that the organisation can react effectively in the event of a breach, is betting the farm on thinking that “it wouldn’t happen to us”!

The doctor won’t see you now – NHS data needs a health check!

On BBC Radio 4 the other day, I heard that people who have not been to see their local GP in the last 5 years could face being ‘struck-off’ from the register and denied access until they re-register – the story is also covered in most of the national press, including The Guardian. It’s an effort to save money on NHS England’s £9bn annual expenditure on GP practices, but is it the most cost-effective and patient-friendly approach for updating NHS records?

Under the contract, an NHS supplier (Capita) will write every year to all patients who have not been in to see their local doctor or practice nurse in the last five years. This is aimed at removing those who have moved away or died – every name on the register costs the NHS on average around £136 (as at 2013/14) in payments to the GP. After Capita receives the list of names from the GP practice, they’ll send out two letters, the first within ten working days and the next within six months. If they get no reply, the person will be removed from the list. Of course, as well as those who have moved away or died, this will end up removing healthy people who have not seen the GP and don’t respond to either letter. An investigation in 2013 by Pulse, the magazine for GPs, revealed that “over half of patients removed from practice lists in trials in some areas have been forced to re-register with their practice, with GP’s often blamed for the administrative error. PCTs (Primary Care Trusts) are scrambling to hit the Government’s target of removing 2.5 million patients from practice lists, often targeting the most vulnerable patients, including those with learning disabilities, the very elderly and children.” According to Pulse, the average proportion that were forced to re-register was 9.8%.

This problem of so-called ‘ghost patients’ falsely inflating GP patient lists, and therefore practice incomes, has been an issue for NHS primary care management since at least the 1990s, and probably long before that. What has almost certainly increased over the last twenty years is the number of temporary residents (e.g. from the rest of the EU) who are very difficult to track.

A spokesperson for the BMA on the radio was quite eloquent on why the NHS scheme was badly flawed, but had no effective answer when the interviewer asked what alternatives there were – that’s what I want to examine here, an analytical approach to a typical Data Quality challenge.

First, what do we know about the current systems? There is a single UK NHS number database, against which all GP practice database registers are automatically reconciled on a regular basis, so that transfers when people move and register with a new GP are well handled. Registered deaths, people imprisoned and those enlisting in the armed forces are also regularly reconciled. Extensive efforts are made to manage common issues such as naming conventions in different cultures, misspelling, etc. but it’s not clear how effective these are.

But if the GP databases are reconciled against the national NHS number database regularly, how is it that according to the Daily Mail “latest figures from the Health and Social Care Information Centre show there are 57.6 million patients registered with a GP in England compared to a population of 55.1 million”? There will be a small proportion of this excess due to inadequacies in matching algorithms or incorrect data being provided, but given that registering a death and registering at a new GP both require provision of the NHS number, any inadequacies here aren’t likely to cause many of the excess registrations. It seems likely that the two major causes are:

  • People who have moved out of the area and not yet registered with a new practice.
  • As mentioned above, temporary residents with NHS numbers that have left the country.

To Data Quality professionals, the obvious solution for the first cause is to use specialist list cleansing software and services to identify people who are known to have moved, using readily available data from Royal Mail, Equifax and other companies. This is how many commercial organisations keep their databases up to date, and it is far more targeted than writing to every “ghost patient” at their registered address and relying on them to reply. New addresses can be provided for a large proportion of movers so their letters can be addressed accordingly – if they have moved within the local area, their address should be updated rather than the patient being removed. Using the same methods, Capita can also screen for deaths against third party deceased lists, which will probably pick up more deceased names than the NHS system – simple trials will establish what proportion of patients are tracked to a new address, have moved without the new address being known, or have died.
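The suppression logic described above can be sketched in a few lines of Python. (The field names and lookup structures here are hypothetical illustrations, not the format of Royal Mail, Equifax or NHS data.)

```python
def screen_patient(patient, movers, deceased):
    """Classify a registered patient against third-party
    change-of-address and deceased suppression files.
    Returns an action and, where relevant, an address."""
    key = (patient["name"].lower(), patient["address"].lower())
    if key in deceased:
        return ("remove", None)          # confirmed deceased
    if key in movers:
        return ("update", movers[key])   # moved, new address known
    return ("write", patient["address"]) # fall back to mailing them

# Toy suppression files: movers maps old details to a new address.
movers = {("ann jones", "1 high st"): "22 park rd"}
deceased = {("tom brown", "3 mill ln")}
```

Only the final “write” category – patients unknown to any suppression file – needs the expensive letter-and-wait treatment.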

Next, Capita could target the other category, the potential temporary residents from abroad, by writing to adults whose NHS number was issued in the last (say) 10 years.

The remainder of the list can be further segmented, using the targeted approach that the NHS already uses for screening or immunisation requests: for example, elderly people may have gone to live with other family members or moved into a care home, and young people may be registered at university or be sharing accommodation with friends – letters and other communications can be tailored accordingly to solicit the best response.

What remains after sending targeted letters in each category above probably represents people in a demographic that should still be registered with the practice. Further trials would establish the best approach (in terms of cost and accuracy) for this group: maybe it is cost-effective to write to them and remove non-responders, but if this resulted in only removing a small number, some of these wrongly, maybe it is not worth mailing them.

The bottom line is that well-established Data Quality practices of automatic suppression and change of address, allied with smart targeting, can reduce the costs of the exercise and will make sure that the NHS doesn’t penalise healthy people simply for… being healthy!

Weighing up the Cost of Bad Data

In a recent survey conducted by helpIT systems, almost 25 percent of respondents cited finances as the biggest hindrance to maintaining superior contact databases.  We get it.  Data quality solutions can carry what may seem to be a hefty price tag, and they won’t show up two days later in a nicely wrapped package like an Amazon Prime purchase.  As such, like any other expensive and complicated decision, data quality may well get pushed to the bottom of the pile.

Then again, just like going to the gym or eating salad instead of steak, the toughest behaviors to adopt are usually the most beneficial.  Because even though database management may be something we’d rather forget about, 40 percent of those same respondents stated that their companies were losing tens of thousands of dollars each year due to poor contact data quality.  So while the solution may not be cheap and easy, the cost of living without it does not appear to be either.  The Data Warehousing Institute found that the cost of bad data to US businesses is more than $600 billion each year.  Is that a number your company can afford to ignore?

Many businesses do notice these dollars disappearing and choose to do something about it.  Unfortunately however, this is often simply a “quick fix”.  They look at their messy databases, pay someone to “clean them up”, and then everyone gets a pat on the back for a job well done.  And it is.  Until someone enters a new record in the CRM, a customer moves, or perhaps even dares to get a new phone number.  And I will shock everyone by reporting that this happens all the time.  Studies indicate up to a 2 percent degradation each month…even in a perfect database.

Right now you’re probably picking up on the fact that maintaining good data is going to cost money.  You’re right.  But the fact is, avoiding that cost is only going to cost more in the long run.  Just like having a well-trained sales team, a finely-targeted marketing plan, or a boss with years of experience…great results are an investment of time and resources rather than a happy accident.

Companies that choose to invest in good data quality, as well as to view it as an ongoing process rather than a simple one-time fix, are finding that the benefits by far outweigh the initial costs.  Advertising dollars are reaching their intended audiences and sales calls are reaching the right recipient, with customer satisfaction going through the roof.  Today’s consumer expects the personal touches that can only come from having an accurate and up-to-date Single Customer View, and it is good data quality solutions that will achieve them.

Why Customers Must Be More Than Numbers

I read with some amazement a story in the London Daily Telegraph this week about a customer of NatWest Bank who sent £11,200 last month via online banking to an unknown company instead of his wife. Although Paul Sampson had correctly entered his wife’s name, sort code and account number when he first made an online payment to her HSBC account, he wasn’t aware that she had subsequently closed the account.

Mr Sampson thought he was transferring £11,200 to his wife: he clicked Margaret’s name among a list of payees saved in his NatWest banking profile and confirmed the transaction, but the payment went to a business in Leeds. Mr Sampson believes that HSBC had reissued his wife’s old account number to someone else, a company whose name they refused to tell him. NatWest told Mr Sampson it was powerless to claw the money back.

HSBC said it had contacted its customer, but it had no obligation regarding the money. HSBC insisted that the account number in question was not “recycled”, saying Mr Sampson must have made a typing error when he first saved the details, which he disputes. Although the money was in fact returned after the newspaper contacted HSBC, a very large issue has not been resolved.

Although news to most of us, it is apparently a common practice among banks in the UK to recycle account numbers, presumably because banking systems are so entrenched around 8 or 9 digit account numbers that they are concerned about running out of numbers. Apparently a recent code of practice suggests that banks should warn the customer making the payment if they haven’t sent money to this payee for 13 months, but according to the Daily Telegraph “No major high street bank could confirm that it followed this part of the code”.

The Daily Telegraph goes on to state that the recipients of electronic payments are identified by account numbers only. The names are not checked in the process, so even if they do not match, the transaction can proceed. “This is now a major issue when you can use something as basic as a mobile phone number to transfer money,” said Mike Pemberton, of solicitors Stephensons. “If you get one digit wrong there’s no other backup check, like a person’s name – once it’s gone it’s gone.” If you misdirect an online payment, your bank should contact the other bank within two working days of your having informed them of the error, but they have no legal obligation to help.

Mr Sampson obviously expected that the bank’s software would check that the account number belonged to the account name he had stored in his online payee list, but apparently UK banking software doesn’t do this. Why on earth not? Surely it’s not unreasonable for banks with all the money they spend on computer systems to perform this safety check? It’s not good enough to point to the problems that can arise when a name is entered in different ways such as Sheila Jones, Mrs S Jones, Sheila M Jones, SM Jones, Mrs S M Jones, Mrs Sheila Mary Jones etc.

These are all elementary examples for intelligent name matching software.  More challenging are typos, nicknames and other inconsistencies such as those caused by poor handwriting, which would all occur regularly should banks check the name belonging to the account number. But software such as matchIT Hub is easily available to cope with these challenges too, as well as the even more challenging job of matching joint names and business names.
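To show just how elementary those six variants are, here is a toy Python sketch that reduces all of them to a single match key. (This is a deliberately trivial illustration, not how matchIT Hub or any production matcher actually works – nicknames, typos and phonetic variation need far more.)

```python
import re

TITLES = {"mr", "mrs", "ms", "miss", "dr"}

def name_key(raw):
    """Reduce a name variant to (surname, first initial) -
    enough to group the simple variants listed above."""
    tokens = [t for t in re.split(r"[\s.]+", raw.lower()) if t]
    tokens = [t for t in tokens if t not in TITLES]  # drop titles
    surname = tokens[-1]          # assume last token is the surname
    initial = tokens[0][0]        # first letter of the first name/initials
    return (surname, initial)

variants = ["Sheila Jones", "Mrs S Jones", "Sheila M Jones",
            "SM Jones", "Mrs S M Jones", "Mrs Sheila Mary Jones"]
keys = {name_key(v) for v in variants}  # all six collapse to one key
```

If thirty lines of toy code can group these variants, a bank’s payment system has no excuse for not checking the payee name at all.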

There are also issues in the USA with banking software matching names – I remember when I first wanted to transfer money from my Chase account to my Citibank account, I could only do so if the two accounts had exactly the same name – these were joint accounts and the names had to match exactly letter for letter, so I had to either change the name on one of the accounts or open a new one! Having been an enthusiastic user of the system in the USA for sending money to someone electronically using just their email address, I’m now starting to worry about the wisdom of this…

We banking customers should perhaps question our banks more closely about the checks that they employ when we make online payments!

Where Big Data, Contact Data and Data Quality come together

We’ve been working in an area of untapped potential for Big Data for the last couple of years, which can best be summed up by the phrase “Contact Big Data Quality”. It doesn’t exactly roll off the tongue, so we’ll probably have to create yet another acronym, CBDQ… What do we mean by this? Well, our thought process started when we wondered exactly what people mean when they use the phrase “Big Data” and what, if anything, companies are doing in that arena. The more we looked into it, the more we concluded that although there are many different interpretations of “Big Data”, the one thing that underpins all of them is the need for new techniques to enable enhanced knowledge and decision making. I think the challenges are best summed up by the Forrester definition:

“Big Data is the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers. To remember the pragmatic definition of Big Data, think SPA — the three questions of Big Data:

  • Store. Can you capture and store the data?
  • Process. Can you cleanse, enrich, and analyze the data?
  • Access. Can you retrieve, search, integrate, and visualize the data?”

As part of our research, we sponsored a study by The Information Difference (available here) which answered such questions as:

  • how many companies have actually implemented Big Data technologies, and in what areas
  • how much money and effort are organisations investing in it
  • what areas of the business are driving investment
  • what benefits are they seeing
  • what data volumes are being handled

We concluded that plenty of technology is available to Store and Access Big Data, and many of the tools that provide Access also Analyze the data – but there is a dearth of solutions to Cleanse and Enrich Big Data, at least in terms of contact data, which is where we focus. There are two key hurdles to overcome:

  1. Understanding the contact attributes in the data i.e. being able to parse, match and link contact information. If you can do this, you can cleanse contact data (remove duplication, correct and standardize information) and enrich it by adding attributes from reference data files (e.g. voter rolls, profiling sources, business information).
  2. Being able to do this for very high volumes of data spread across multiple database platforms.
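As a toy illustration of the first hurdle – parsing contact attributes out of unstructured text – consider this sketch. (The regex and heuristics are deliberately simplistic; a production contact parser handles vastly messier input.)

```python
import re

def parse_contact(text):
    """Pull an email address and a rough name guess out of an
    unstructured contact string - the 'parse' step in miniature."""
    email_m = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
    email = email_m.group(0) if email_m else None
    rest = text.replace(email, "") if email else text
    # crude name heuristic: keep capitalised words, drop punctuation
    words = [w.strip(",;") for w in rest.split()]
    name = " ".join(w for w in words if w.istitle()) or None
    return {"name": name, "email": email}

rec = parse_contact("Jane Doe, jane.doe@example.com")
```

Once attributes like these are extracted and standardized, the record can be matched, linked and enriched against reference files.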

The first of these should be addressed by standard data cleansing tools, but most of these only work well on structured data, maybe even requiring data of a uniform standard – and Big Data, by definition, will contain plenty of unstructured data which is of widely varying standards and degrees of completeness. At helpIT systems, we’ve always developed software that doesn’t expect data to be well structured and doesn’t rely on data being complete before we can work with it, so we’re already in pretty good shape for clearing this hurdle – although semantic annotation of Big Data is more akin to a journey than a destination!

The second hurdle is the one that we have been focused on for the last couple of years and we believe that we’ve now got the answer – using in-memory processing for our proven parsing/matching engine, to achieve super-fast and scalable performance on data from any source. Our new product, matchIT Hub will be launching later this month, and we’re all very excited by the potential it has not just for Big Data exploitation, but also for:

  • increasing the number of matches that can safely be automated in enterprise Data Quality applications, and
  • providing matching results across the enterprise that are always available and up-to-date.
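The in-memory idea can be illustrated with a toy sketch (this is an assumption-laden simplification for the blog, not matchIT Hub’s actual engine or match keys): records from any source are reduced to match keys held in a memory-resident index, so candidate lookups never need a database round-trip.

```python
from collections import defaultdict

class InMemoryMatcher:
    """Toy in-memory match index: records from any platform are
    keyed once, then candidate lookups are pure memory reads."""
    def __init__(self):
        self.index = defaultdict(list)

    def key(self, name, postcode):
        # crude key: first three consonants of the surname plus
        # the normalised postcode (a real engine is far smarter)
        surname = name.split()[-1].lower()
        consonants = [c for c in surname if c not in "aeiou"]
        return "".join(consonants[:3]) + postcode.replace(" ", "").upper()

    def add(self, rec_id, name, postcode):
        self.index[self.key(name, postcode)].append(rec_id)

    def candidates(self, name, postcode):
        return self.index.get(self.key(name, postcode), [])

m = InMemoryMatcher()
m.add("crm-1", "John Smith", "SW1A 1AA")   # from the CRM database
m.add("web-9", "Jon Smith", "sw1a 1aa")    # from the web platform
```

Because the index lives in memory and is source-agnostic, the same matching results are available, always up to date, to every system that feeds it.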

In the next post, I’ll write about the potential of in-memory matching coupled with readily available ETL tools.