A 30,000-Foot View of the Data-Related Problems Businesses Face
1. Abundant Data (many large businesses hold petabytes of information)
2. Multiple Sources (7-10 different sources on average)
3. Poor Data Quality (according to The Data Warehousing Institute, poor data quality costs US businesses $600 billion a year)
4. No Standard Format (every transactional system follows its own standards)
5. Missing Data (most systems, especially online systems, don’t capture all the information needed for analysis)
6. No Single Version of the Truth (every transactional system stores its own version of the data)
Let’s look at each of these issues in greater detail:
A decade ago, gigabytes (GB) of data seemed like a mountain; today a single USB drive can hold that much. Data volumes have grown from gigabytes to terabytes (1 TB = 1,000 GB) and now to petabytes (1 PB = 1,000 TB). Data volumes are often said to double roughly every 18 months, a pace frequently compared to Moore’s Law (which strictly describes the number of transistors on a chip, not data volume). This makes having a Data Management Strategy in place more important by the day. Much of the rich information needed for business decision-making is hidden in this vast amount of data. The key requirement is to consolidate the data in one place and then mine it with appropriate Data Mining techniques to unearth hidden insights.
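To make the growth claim concrete, the short calculation below projects a data volume forward. It is a sketch under stated assumptions: a constant 18-month doubling period (the figure quoted above), decimal units, and a hypothetical 500 TB starting point; `projected_volume` and `tb_to_pb` are illustrative names, not part of any tool.

```python
# Decimal units, as in the text: 1 PB = 1,000 TB.
TB_PER_PB = 1000


def projected_volume(current: float, months: float, doubling_months: float = 18.0) -> float:
    """Project a data volume forward, assuming it doubles every `doubling_months` months."""
    return current * 2 ** (months / doubling_months)


def tb_to_pb(tb: float) -> float:
    """Convert terabytes to petabytes using decimal units."""
    return tb / TB_PER_PB


if __name__ == "__main__":
    start_tb = 500.0  # hypothetical starting volume in TB
    for years in (1.5, 3, 5, 10):
        tb = projected_volume(start_tb, years * 12)
        print(f"after {years} years: {tb:,.0f} TB ({tb_to_pb(tb):.2f} PB)")
```

Under this assumption the volume grows by a factor of 2^(120/18), roughly 102, over ten years, which is why a storage and management plan made today can be outgrown quickly.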
Data sources can be as varied as transaction data, online apps, call centre data, survey data, campaign data, complaints, software downloads, reply cards, online store purchases, registration data and response data. Each source captures key information that forms one piece of the puzzle; only when the pieces from all the sources are integrated does the bigger picture emerge. The problem is that each source has its own database and stores data in its own format.
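As an illustration of fitting the pieces together, the sketch below combines records from three sources into one profile per customer. It assumes, optimistically, that every source already shares a `customer_id` key; in practice that key often has to be established first by matching records. The source names, fields and sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical extracts from three source systems, all keyed by a shared customer_id.
transactions = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "C1", "amount": 80.0},
    {"customer_id": "C2", "amount": 45.5},
]
call_centre = [{"customer_id": "C1", "complaint": "late delivery"}]
registrations = [
    {"customer_id": "C1", "email": "asha@example.com"},
    {"customer_id": "C2", "email": "ravi@example.com"},
]


def integrate(transactions, call_centre, registrations):
    """Combine each source's piece of the puzzle into one profile per customer."""
    # Every new customer_id gets a fresh, empty profile.
    profiles = defaultdict(lambda: {"total_spend": 0.0, "complaints": [], "email": None})
    for row in transactions:
        profiles[row["customer_id"]]["total_spend"] += row["amount"]
    for row in call_centre:
        profiles[row["customer_id"]]["complaints"].append(row["complaint"])
    for row in registrations:
        profiles[row["customer_id"]]["email"] = row["email"]
    return dict(profiles)


if __name__ == "__main__":
    for cid, profile in sorted(integrate(transactions, call_centre, registrations).items()):
        print(cid, profile)
```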
“Garbage in, garbage out,” goes the saying, and it holds for the data-driven world too. If poor data is fed into the system, the analysis based on it will also be poor, no matter how advanced the analytics tools and techniques we use. At the point where data is captured in the source system, appropriate checks and procedures must be in place to reject or correct bad data.
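A capture-time check can be sketched as a function that first corrects what it safely can (trimming spaces, normalising case, stripping separators) and then rejects whatever still fails. The field names, the loose email pattern and the 10-digit mobile rule are all hypothetical rules chosen for illustration, not a standard.

```python
import re
from typing import List, Optional, Tuple

# Deliberately loose email pattern: something@something.something, no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
MOBILE_DIGITS = 10  # hypothetical rule: a mobile number has exactly 10 digits


def clean_record(record: dict) -> Tuple[Optional[dict], List[str]]:
    """Correct what can be corrected, then reject the record if any check still fails.

    Returns (cleaned_record, errors); cleaned_record is None when the record is rejected.
    """
    errors: List[str] = []
    cleaned = dict(record)

    # Correct: trim surrounding spaces and lower-case the email, then validate it.
    email = (record.get("email") or "").strip().lower()
    if not EMAIL_RE.match(email):
        errors.append("invalid email")
    cleaned["email"] = email

    # Correct: strip spaces, dashes and other non-digits from the mobile number.
    mobile = re.sub(r"\D", "", record.get("mobile") or "")
    if len(mobile) != MOBILE_DIGITS:
        errors.append(f"mobile must have {MOBILE_DIGITS} digits")
    cleaned["mobile"] = mobile

    return (None if errors else cleaned), errors


if __name__ == "__main__":
    print(clean_record({"email": " Asha@Example.com ", "mobile": "98765-43210"}))
    print(clean_record({"email": "not-an-email", "mobile": "123"}))
```

Returning the error list alongside the record lets the capturing screen tell the user what to fix instead of silently storing bad values.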
Multiple transaction systems, such as CRM and ERP, capture data in different formats, and there is no common standard across them. Contact and address data stand out in particular: the address elements are structured completely differently from one system to the next. In the United States, address-correction software certified by the USPS is widely used to standardise addresses before mailing, and mail prepared this way can qualify for postal discounts. The result is cleaner address data in those markets.
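The idea behind address standardisation can be shown with a toy normaliser that brings two differently formatted versions of the same address to one canonical form. This is only a sketch: the abbreviation table is hypothetical and tiny, and certified address-correction software relies on official postal reference data rather than rules like these.

```python
import re

# Hypothetical abbreviation table. Certified address-correction software works
# from official postal reference data, not a short hand-written mapping like this.
ABBREVIATIONS = {"STREET": "ST", "ROAD": "RD", "AVENUE": "AVE", "APARTMENT": "APT"}


def standardise_address(raw: str) -> str:
    """Upper-case, drop full stops, turn other punctuation into spaces,
    collapse whitespace and abbreviate known words."""
    text = raw.upper().replace(".", "")
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(ABBREVIATIONS.get(word, word) for word in text.split())


if __name__ == "__main__":
    # The same address as captured by two hypothetical systems.
    print(standardise_address("12, M.G. Road , Apartment 4"))  # CRM
    print(standardise_address("12 MG Rd Apt 4"))               # ERP
```

Both calls return `12 MG RD APT 4`, so the two records can now be compared directly.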
How does data go missing? Take an example. A new user visits the store or online store and signs up through the registration process. He fills in the fields marked mandatory and skips the rest. If only the email ID and mobile number are mandatory, fields such as address, date of birth and gender will be left blank. These are exactly the fields analysts need to derive customer insight, and they are now missing. It is therefore important to design data-capture forms and processes with an eye on future requirements.
Imputation techniques can partly compensate for the gaps, but imputed values are never as reliable as data supplied directly by the customer.
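Two of the simplest imputation techniques are filling numeric gaps with the median and categorical gaps with the most frequent value. The sketch below does both and records which fields were filled, so analysts can still tell customer-supplied values from estimates. The sample data and field names are made up, and real projects often use model-based imputation instead.

```python
from statistics import median, mode

# Hypothetical registration data with gaps left by optional fields.
customers = [
    {"id": 1, "age": 34, "gender": "F"},
    {"id": 2, "age": None, "gender": "M"},
    {"id": 3, "age": 41, "gender": None},
    {"id": 4, "age": 29, "gender": "F"},
]


def impute(rows, numeric, categorical):
    """Fill gaps: median for numeric fields, most frequent value for categorical ones.

    Each row gets an `imputed` list naming the fields that were estimated.
    The input rows are left unchanged.
    """
    fills = {}
    for field in numeric:
        fills[field] = median(r[field] for r in rows if r[field] is not None)
    for field in categorical:
        fills[field] = mode(r[field] for r in rows if r[field] is not None)

    out = []
    for r in rows:
        new = dict(r, imputed=[])
        for field, value in fills.items():
            if new[field] is None:
                new[field] = value
                new["imputed"].append(field)
        out.append(new)
    return out


if __name__ == "__main__":
    for row in impute(customers, numeric=["age"], categorical=["gender"]):
        print(row)
```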
This is a real and pressing business problem. If a business wants to know details about a customer (say, name and address), different systems will return different answers, each giving its own version of the truth.
This happens because the information resides in silos and is not integrated across the enterprise. How will you segment and target your customers if you don’t even know who they are? The problem impedes all analytical work and must be addressed before any targeted marketing campaign is run.
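The symptom is easy to detect once the records are side by side. The sketch below compares one customer’s record as held by several systems and reports every attribute on which they disagree, ignoring empty values. The system names and values are hypothetical; resolving the conflicts is a separate step.

```python
# Hypothetical views of the same customer held by three systems.
views = {
    "CRM": {"name": "Asha Rao", "city": "Pune", "phone": "9876543210"},
    "ERP": {"name": "A. Rao", "city": "Pune", "phone": "9876543210"},
    "Web": {"name": "Asha Rao", "city": "Mumbai", "phone": None},
}


def find_conflicts(views: dict) -> dict:
    """For each attribute on which the systems disagree, return the
    non-empty value held by each system."""
    attributes = {attr for record in views.values() for attr in record}
    conflicts = {}
    for attr in sorted(attributes):
        values = {system: rec.get(attr) for system, rec in views.items() if rec.get(attr)}
        if len(set(values.values())) > 1:
            conflicts[attr] = values
    return conflicts


if __name__ == "__main__":
    for attr, values in find_conflicts(views).items():
        print(attr, values)
```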
We know the issues. What next?
The solution lies in building an Enterprise Data Warehouse (EDW) with the Data Integration and Data Quality tools available in the market. The alternative is to write custom scripts and code to implement the Data Integration solution, if the company has the required expertise in-house.
This “make or buy” decision should be taken by management in consultation with technical staff, as the licensing costs for some of these tools are substantial.
Freeware carries its own risks: there may be no vendor support when problems arise, and no guarantee of getting the right results.
Once the integrated data warehouse is built, the source systems also need to be updated with the standardized data.
The EDW then acts as the hub for all querying and reporting activities.
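The hub role of the EDW can be sketched in miniature: load cleaned, integrated rows into one table, then answer reporting questions with SQL against that single store. Here Python’s built-in `sqlite3` with an in-memory database merely stands in for a real warehouse platform, and the `fact_orders` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical, already-standardised rows coming out of the integration step.
orders = [
    ("C1", "North", 120.0),
    ("C1", "North", 80.0),
    ("C2", "South", 45.5),
]


def load_and_report(rows):
    """Load rows into an in-memory SQLite table standing in for the EDW,
    then run one reporting query against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_orders (customer_id TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()
    # Reporting query: order count and revenue per region.
    report = conn.execute(
        "SELECT region, COUNT(*), SUM(amount) FROM fact_orders "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return report


if __name__ == "__main__":
    for region, n_orders, revenue in load_and_report(orders):
        print(region, n_orders, revenue)
```

The design point is that every report reads from the same integrated table rather than querying each source system separately.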
But who owns the entire process and data?
This is an industry-wide problem: the client feels that the vendor should own the solution, and the vendor feels the client should. This is where Data Governance mechanisms and an evaluation of a Master Data Management (MDM) framework come in.
This framework carefully outlines the entire strategy, processes, tools and techniques for managing the Master data. Here, Master data means the data that provides the single version of the truth: the final, integrated and cleaned-up data.
When defining the Master data, several aspects are considered: its complexity, its volatility, its source, how it will be used across the enterprise and its value to the business.
Because this Master data is critical to meeting business objectives, its management cannot be taken for granted. MDM has become more of a business-driven initiative than a technical one.
Added to this, the complexity of managing today’s businesses, with mergers and acquisitions, regulatory compliance, technological advances such as Service-Oriented Architecture (SOA) and new business models such as Software as a Service (SaaS), puts greater emphasis on managing Master data, since it forms the bedrock for success in each of these scenarios.
Two of the most common techniques considered here are Customer Data Integration (CDI), which creates a single, unified and consistent view of customer data, and Product Information Management (PIM), which handles product information.
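A core step in CDI is “match and merge”: decide which records from different systems describe the same customer, then build one golden record from them. The sketch below assumes two simple rules for illustration only: records match when their normalised email addresses are equal, and for each field the most recently updated non-empty value survives. Commercial CDI tools typically use richer, often fuzzy, matching and configurable survivorship rules; the records here are hypothetical.

```python
from datetime import date

# Hypothetical customer records from different systems, with no shared key.
records = [
    {"system": "CRM", "email": "Asha@Example.com ", "name": "Asha Rao",
     "city": "Pune", "updated": date(2023, 1, 10)},
    {"system": "Web", "email": "asha@example.com", "name": "Asha Rao",
     "city": "Mumbai", "updated": date(2024, 3, 2)},
    {"system": "POS", "email": "ravi@example.com", "name": "Ravi K",
     "city": None, "updated": date(2022, 7, 5)},
]


def match_key(record):
    """Assumed matching rule: the same normalised email means the same customer."""
    return record["email"].strip().lower()


def golden_records(records, fields=("name", "city")):
    """Group matching records, then keep the newest non-empty value per field."""
    groups = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)

    result = {}
    for key, group in groups.items():
        newest_first = sorted(group, key=lambda r: r["updated"], reverse=True)
        golden = {
            field: next((r[field] for r in newest_first if r[field]), None)
            for field in fields
        }
        golden["sources"] = sorted(r["system"] for r in group)  # lineage
        result[key] = golden
    return result


if __name__ == "__main__":
    for key, golden in golden_records(records).items():
        print(key, golden)
```

Keeping the list of contributing systems on each golden record makes it easier for data owners to trace where a value came from.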
Data owners should monitor updates to this data frequently and run checks to ensure that no data from the source systems is missed or left unused. The system also needs to be kept up to date through Change Data Capture (CDC) mechanisms.
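One simple way to capture changes is to compare two snapshots of a source table and classify the differences as inserts, updates and deletes. This snapshot-comparison approach is only one CDC technique, chosen here because it is easy to show; log-based CDC, which reads the database’s transaction log, is common in production systems. The snapshots are hypothetical dictionaries keyed by primary key.

```python
def capture_changes(previous: dict, current: dict) -> dict:
    """Compare two snapshots keyed by primary key and classify the differences."""
    inserts = {k: current[k] for k in current.keys() - previous.keys()}
    deletes = sorted(previous.keys() - current.keys())
    updates = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return {"insert": inserts, "update": updates, "delete": deletes}


if __name__ == "__main__":
    yesterday = {"C1": {"city": "Pune"}, "C2": {"city": "Delhi"}}
    today = {"C1": {"city": "Mumbai"}, "C3": {"city": "Chennai"}}
    print(capture_changes(yesterday, today))
```

Only the classified changes then need to be applied to the master data, rather than reloading everything.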
Figure 1: Enterprise Data Warehouse – Master Data Management Architecture
POS – Point of Sale
CDI – Customer Data Integration
PIM – Product Information Management
EDW – Enterprise Data Warehouse
MDM – Master Data Management
Note: The data sources shown in this diagram are for illustrative purposes and don’t cover the entire range of available sources.
How does this impact business?
1. We can expect better campaign response rates, as segmentation and targeting of customers will be based on an accurate 360-degree view of the customer.
2. We can reduce mailing costs for Direct Mailers (DM) by ensuring delivery to the right addresses. We also avoid legal hurdles by not sending unwanted information to customers who have already unsubscribed or are registered as Do Not Disturb (DND).
3. Business decisions across the organisation will be based on reliable data and rigorous analysis of it.
4. Analysis will be more complete, as all the required data is available, and confidence in the outcome will be higher, be it classification, regression or multivariate analysis.
5. And yes, implementing loyalty programmes will be a smooth ride, without worrying too much about the data.
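For the second benefit, suppressing unsubscribed and DND contacts before a mailing comes down to a set-membership filter over the integrated customer list. The sketch below assumes hypothetical field names (`email`, `mobile`, `address`) and plain lists for the suppression data; real DND registries and unsubscribe lists have their own formats and access rules.

```python
def build_mailing_list(customers, unsubscribed, dnd_numbers):
    """Keep only customers who have not unsubscribed, are not on the DND list
    and have a usable address."""
    blocked_emails = {e.strip().lower() for e in unsubscribed}
    blocked_numbers = set(dnd_numbers)
    mailable = []
    for c in customers:
        if c["email"].strip().lower() in blocked_emails:
            continue  # customer has unsubscribed
        if c.get("mobile") in blocked_numbers:
            continue  # customer is registered as Do Not Disturb
        if not c.get("address"):
            continue  # nothing to mail to
        mailable.append(c)
    return mailable


if __name__ == "__main__":
    customers = [
        {"email": "a@example.com", "mobile": "111", "address": "12 MG RD"},
        {"email": "B@example.com", "mobile": "222", "address": "5 PARK ST"},
        {"email": "c@example.com", "mobile": "333", "address": ""},
        {"email": "d@example.com", "mobile": "444", "address": "9 LAKE RD"},
    ]
    print(build_mailing_list(customers, ["b@example.com"], ["444"]))
```

Normalising emails before comparison matters here: without it, `B@example.com` would slip past an unsubscribe entry recorded as `b@example.com`.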
Data problems are prevalent in every business and industry today. They need to be addressed urgently if firms want to compete in a market with cut-throat competition, where many offers chase very few prospects. A company’s competitive advantage lies in its own data, and it should treat that data as a goldmine of information. Doing so requires infrastructure such as an EDW and a framework such as MDM, as complexity keeps growing across many dimensions.