Process-Centric Data Quality

When I read that, even today, contact data quality issues are still costing US businesses billions of dollars, and that the industry-average duplicate rate for a typical database is around 1 in every 20 records, we need to accept that there must be a better way to resolve the issue.

In my role, I’ve had the opportunity to speak with many CIOs, DBAs and business stakeholders about the challenges they face in dealing with data quality issues. One question consistently comes up: “How do I enforce, throughout my company, policies and practices which ensure that the data users enter into our database is clean and duplicate-free?”

The answer starts with establishing a simple, universal truth: for any company, data quality starts at the point of capture. This is the moment a record is entered into your database. It doesn’t matter whether it is entered by someone in a call center, a salesperson, an account manager, billing or support, or arrives as a web-generated lead or sale. This is the opportunity to get it right.

Between the CRM and ERP systems in any given company, nearly every employee looks up, adds or modifies records. Even the website connects to the database, handling new web leads as well as new customers’ e-commerce purchases and billing and shipping details. Data providers such as JigSaw Data, InfoUSA and SalesGenie have made a lot of data readily available at little to no cost, and it is being sucked into company databases. While all of this data has enormous benefits for business and profits, it creates a lot of work for IT departments trying to keep it clean and linked to existing accounts or records. For its part, the data quality industry has been diligent in coming up with new processes and methodologies, such as master data management (MDM), change data capture (CDC), customer data integration (CDI) and data stewardship, which have certainly helped many companies understand the data quality dilemma and make improvements.

Yet if you look at the data quality industry as a whole, little has changed over the years. Backend “batch” data quality processing is still the predominant way IT departments correct poor inputs and link duplicate records. Yes, processing has moved from the mainframe to the workstation, and costs have come down to a point where it is reasonable for every company to seriously consider acquiring these tools. But we are still using tools and thought processes based on 1990s technology to deal with 21st-century data realities.

Admittedly, there will always be a place for backend maintenance, correction and analysis. It’s fundamental. But in most cases, batch processing is performed offline days, weeks, months, and for some companies years, after the record was created. Speaking for helpIT systems, we have done a lot of design work on simplifying complex data cleansing functions and building them into robust, fully functional batch data cleansing tools. These tools are critical in helping the IT department maintain a single customer view across the enterprise.

But poor data quality is, in the first instance, a process problem, not a technology problem. Properly applied, however, technology can help each user, and the organization as a whole, to eliminate or at least mitigate the human impact on data capture.

To quote Rob Karel, Principal Analyst with Forrester Research: “Investments in downstream batch data quality solutions help to mitigate data quality issues, but there is no replacement for capturing the right data in the first place.”

This is where our new data quality framework fits into a company’s data quality initiatives.

findIT S2 is a real-time data quality framework designed to be integrated into frontend applications, where it sits between the data entry tasks and the database. findIT S2 reports suspect duplicates to the UI and calls a postal reference database to ensure that addresses are complete and accurate, and are entered in fewer keystrokes. The rest of the data quality engine derives further reference data, standardizes the data and posts the information back to the underlying database.
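As a rough illustration of what a point-of-capture duplicate check involves, here is a minimal sketch that compares a new contact against existing records before it is saved. It is not findIT S2’s implementation or API: the `Contact` record, the `find_suspect_duplicates` function, the postcode-based blocking and the 0.85 threshold are all hypothetical, and Python’s standard `difflib.SequenceMatcher` stands in for a production-grade matching engine.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass
class Contact:
    # Hypothetical, minimal contact record used only for this sketch.
    name: str
    postcode: str


def normalize(text: str) -> str:
    """Lower-case, strip punctuation and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def postcode_key(postcode: str) -> str:
    """Normalized postcode with all spaces removed, used as a blocking key."""
    return normalize(postcode).replace(" ", "")


def find_suspect_duplicates(
    new: Contact, existing: List[Contact], threshold: float = 0.85
) -> List[Tuple[Contact, float]]:
    """Return existing records that look like duplicates of `new`, best first.

    The threshold is an illustrative assumption, not a recommended value.
    """
    key = postcode_key(new.postcode)
    suspects = []
    for rec in existing:
        # Blocking: only compare records sharing a postcode, so the check
        # stays cheap enough to run while the user is still typing.
        if postcode_key(rec.postcode) != key:
            continue
        # Fuzzy name similarity in [0, 1] on the normalized names.
        score = SequenceMatcher(None, normalize(new.name), normalize(rec.name)).ratio()
        if score >= threshold:
            suspects.append((rec, round(score, 2)))
    return sorted(suspects, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    existing = [
        Contact("John Smith", "SW1A 1AA"),
        Contact("Jane Doe", "SW1A 1AA"),
        Contact("John Smith", "EC1A 1BB"),
    ]
    # "Jon Smith" typed with a lower-case, unspaced postcode is flagged as a
    # suspect duplicate of the first record before it reaches the database.
    print(find_suspect_duplicates(Contact("Jon Smith", "sw1a1aa"), existing))
```

In a real frontend integration the suspects would be shown in the UI so the user can pick the existing record instead of creating a new one; the blocking step is one common way to keep such a check fast enough to run during data entry.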

Essentially, with findIT S2 we’ve empowered every user to be a deputized data steward and a functional part of the data quality process. Instead of being the problem, the human element becomes an integral part of the solution. Clean data enters the system in real time, allowing business decisions, actions and reporting to be based on more accurate data.

Additionally, the findIT S2 UI can be modified or customized, and can be linked to multiple internal or external data sources or web services. This allows findIT S2 to extend its core functionality to provide customized data enhancement/append, immediate fraud or red-flag warnings, or synchronous matching and updating between internal systems.

By applying a process-centric data quality approach, you directly reduce the amount of work necessary downstream. It’s the old adage at work: an ounce of prevention is worth a pound of cure.