
White Paper

Cross-System Data Analysis for MDM Implementations

“Cross-System Data Analysis. One speaker after another highlighted the time and costs involved in understanding and modeling source data that spans multiple, heterogeneous systems. Microsoft had 10 people working for 100 days to analyze source systems targeted for MDM integration, while the European Patent Office has 60 people analyzing and managing patent data originating around the world...”

Ten Lessons from MDM Early Adopters
A TDWI Special Report by Wayne Eckerson and Jill Dyche
April 2008

Cross-System Data Analysis:
A Major Risk Factor for MDM Deployments

Cross-system source data discovery and data mapping are two of the most important, and most overlooked, steps in a master data management (MDM) implementation. Before you can populate a new master with “trusted” data, you need to analyze each potential source system individually, and you must perform cross-system analysis to define the survivorship rules that govern which data, under which circumstances, will be used to populate the master. In addition, once the master is populated, you must map it to the downstream “consuming” applications, which is yet another level of cross-system data analysis. The bottom line is that cross-system data analysis and mapping are often the critical path for MDM projects, consuming up to 70% of the time and effort in an MDM deployment.
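To make the idea of a survivorship rule concrete, the following is a minimal Python (3.9+) sketch, not how Exeros Discovery or any MDM product implements it. It assumes a hypothetical rule: for each attribute, ignore empty values, prefer the most trusted source system, and break ties with the most recently updated record. The source names (`ERP`, `CRM`, `WEB`), their priority order and the `SourceRecord` fields are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical trust ranking of source systems: lower number = more trusted.
SOURCE_PRIORITY = {"ERP": 0, "CRM": 1, "WEB": 2}


@dataclass
class SourceRecord:
    source: str            # name of the source system (illustrative)
    updated: date          # last-modified date of the record in that source
    email: Optional[str]
    phone: Optional[str]


def survive(records: list[SourceRecord], attr: str) -> Optional[str]:
    """Choose the surviving value of one attribute across matched records.

    Assumed rule: skip empty values, prefer the most trusted source,
    and break ties by the most recently updated record.
    """
    candidates = [r for r in records if getattr(r, attr)]
    if not candidates:
        return None
    best = min(
        candidates,
        key=lambda r: (
            SOURCE_PRIORITY.get(r.source, len(SOURCE_PRIORITY)),  # unknown sources rank last
            -r.updated.toordinal(),                                # newer dates sort first
        ),
    )
    return getattr(best, attr)


def build_golden_record(records: list[SourceRecord]) -> dict[str, Optional[str]]:
    """Apply the survivorship rule attribute by attribute."""
    return {attr: survive(records, attr) for attr in ("email", "phone")}
```

For three matched records of the same customer, from CRM (email only), ERP (phone only) and WEB (both), this rule keeps the CRM email and the ERP phone number, because each comes from the more trusted source that actually holds a value. Real survivorship rules are usually defined per attribute and per circumstance, which is exactly why the cross-system analysis behind them matters.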

However, MDM software vendors expect their customers or a system integrator to perform this analysis, since the end customer and its representatives have the greatest knowledge of the source systems. Unfortunately, most organizations underestimate how detailed an understanding of source systems and downstream applications is required to implement MDM successfully, and this can lead to significant time and cost overruns in the subsequent deployment.

While software products exist today to profile, cleanse and move data, these products don’t address the discovery and ongoing audit of cross-system business rules, data maps, transformation logic and inconsistencies. All of these solutions assume that the user already knows both where the data is located and which cross-system business rules relate the data held in various databases across the organization.

Fortunately, products like Exeros Discovery™ are opening up a new market for cross-system data analysis tools. Exeros Discovery is a cross-system data analysis workbench that automates the formerly manual process of cross-system data analysis and mapping, delivering a 10x reduction in time and risk for these projects. It analyzes data values using heuristics and sophisticated algorithms to automate the discovery of business rules and transformations within and between any structured data sources. Exeros Discovery offers two types of analysis, source data discovery and data mapping:

Source Data Discovery: Source Data Discovery enables the analysis of up to twenty (20) data sources simultaneously. It is specifically designed to identify critical data elements, column overlaps and global identifiers across multiple sources for MDM and data consolidation projects.
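One simple way to shortlist candidate global identifiers is to look for columns that behave like keys within their own source (non-null and unique) and whose values also appear in a key-like column of another source. The Python 3.9+ sketch below illustrates that heuristic only; the `{source: {column: values}}` layout, the function names and the 0.5 overlap threshold are assumptions, not part of Exeros Discovery.

```python
from itertools import combinations


def is_unique_key(values: list) -> bool:
    """A column can identify records only if it has no nulls and no duplicates."""
    non_null = [v for v in values if v is not None]
    return len(non_null) == len(values) and len(set(values)) == len(values)


def candidate_global_ids(sources: dict[str, dict[str, list]],
                         min_overlap: float = 0.5) -> list[tuple[str, str, float]]:
    """Return (source_a.col, source_b.col, overlap) for key-like columns in
    different sources whose value sets overlap by at least min_overlap.

    Overlap is |A & B| / min(|A|, |B|), so a small source fully contained in a
    larger one still scores 1.0. The threshold is an illustrative assumption.
    """
    keys = [
        (src, col, set(vals))
        for src, cols in sources.items()
        for col, vals in cols.items()
        if vals and is_unique_key(vals)
    ]
    results = []
    for (sa, ca, va), (sb, cb, vb) in combinations(keys, 2):
        if sa == sb:
            continue  # only cross-system pairs are of interest here
        overlap = len(va & vb) / min(len(va), len(vb))
        if overlap >= min_overlap:
            results.append((f"{sa}.{ca}", f"{sb}.{cb}", round(overlap, 2)))
    return sorted(results, key=lambda r: -r[2])
```

For example, if a hypothetical CRM `tax_no` column and an ERP `vat_id` column share two of their three values, the pair scores 0.67 and is flagged as a possible cross-system identifier, while a non-unique column such as `city` is never considered. Any flagged pair is a lead to confirm, not a proven identifier.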

Data Mapping: The data mapping analysis within Exeros Discovery enables very detailed mapping, down to the column level, between two structured data sources, shortening mapping projects and reducing risk by 10x. When simple overlap analysis is not sufficient and the relationships between data sources are complex, the data mapping capability is used to automatically discover complex business rules and transformations between structured data sources containing any number of tables or files.
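The idea behind discovering a transformation between two sources can be illustrated by testing candidate expressions against rows that have already been aligned on a shared key, and keeping those that reproduce the target value for nearly every row. This Python sketch is a deliberately tiny stand-in for that search: the `CANDIDATES` catalogue, the `first`/`last` column names and the 90% support threshold are hypothetical, and a real mapping tool explores a much larger space of transformations.

```python
from typing import Callable

# Hypothetical catalogue of candidate transformations from source row to target value.
CANDIDATES: dict[str, Callable[[dict], str]] = {
    "upper(first)": lambda r: r["first"].upper(),
    "upper(last)": lambda r: r["last"].upper(),
    "first + ' ' + last": lambda r: f"{r['first']} {r['last']}",
    "last + ', ' + first": lambda r: f"{r['last']}, {r['first']}",
}


def discover_mappings(pairs: list[tuple[dict, str]],
                      min_support: float = 0.9) -> list[tuple[str, float]]:
    """pairs: (source_row, target_value) tuples already aligned by a shared key.

    Returns the candidate rules whose output equals the target value in at
    least min_support of the rows, best-supported first.
    """
    if not pairs:
        return []
    found = []
    for name, fn in CANDIDATES.items():
        hits = 0
        for row, target in pairs:
            try:
                hits += fn(row) == target
            except (KeyError, AttributeError, TypeError):
                pass  # the rule cannot be applied to this row (missing or null input)
        support = hits / len(pairs)
        if support >= min_support:
            found.append((name, support))
    return sorted(found, key=lambda x: -x[1])
```

If the target column holds values such as "Lee, Ann" for a source row with first name "Ann" and last name "Lee", only the `last + ', ' + first` rule survives. Measuring support rather than demanding a perfect match lets a rule tolerate a few dirty rows, which is the kind of inconsistency this analysis is meant to surface.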

Cross-System Analysis: Source Data Discovery

Source data discovery is the foundation of any MDM implementation. If source data discovery is not thorough and accurate, mistakes and oversights will propagate into the MDM model and structure. They degrade the quality and accuracy of the master data and ultimately reduce the usability of the MDM application. The overriding reason to implement an MDM solution is to create a trusted, consistent source of master data. If the resulting MDM implementation does not meet these standards, downstream business users will not use it.

Exeros Discovery software provides a framework for source data discovery and guides the data or business analyst through a process involving several categories of data analysis and discovery. The process starts with each of the individual sources and moves, step by step, to discovering the relationships within and among the sources. Practitioners acknowledge that discovering implicit relationships within and across data sources is a very complex exercise. Without a structured approach, the implementation team exposes the MDM project to the risk of failure.

Source data discovery is the process of understanding several data sets and how they relate to each other. In Exeros Discovery it is highly scalable: it can analyze up to 20 data sources with hundreds of tables and millions of rows in a very short time (less than a day). The process involves three categories of discovery:

  1. Baseline Analysis
  2. Composite Global Key Analysis
  3. Survivorship Hotspots

Each step is a building block for the next, so poor analysis in early steps ripples through to later stages and ultimately produces inaccurate results that undermine the MDM design and implementation. Because MDM deployment teams are anxious to start deploying the MDM software, they often begin defining survivorship rules before source data discovery is complete. Following a tight analysis process up front keeps the team focused and on track, ensuring a smoother deployment in later stages.

Step 1: Baseline Analysis

The first step is to understand the “baseline” of the cross-system overlaps for the data assets that will be used as sources for the MDM implementation. It begins with basic profiling: understanding the domains, formats, semantics and quality of the data in each of the sources. This step involves simple value matching across all the columns and attributes of all the data sources under consideration. The goal of this step is to deliver statistical evidence of which data sources are central to the MDM system and which columns overlap.
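A minimal baseline can be sketched in two parts: a per-column profile (null rate, distinct count and the dominant format pattern) and simple value matching between every pair of columns drawn from different sources. The Python 3.9+ code below assumes each source fits in memory as `{column: values}`; the helper names and the digit/letter pattern encoding are illustrative choices, not Exeros Discovery's method.

```python
import re
from collections import Counter
from itertools import combinations


def format_pattern(value) -> str:
    """Reduce a value to its shape: digits become 9, ASCII letters become A."""
    s = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", s)


def profile(values: list) -> dict:
    """Basic single-column profile: null rate, distinct count, dominant format."""
    non_null = [v for v in values if v not in (None, "")]
    patterns = Counter(format_pattern(v) for v in non_null)
    return {
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "top_pattern": patterns.most_common(1)[0][0] if patterns else None,
    }


def overlap_matrix(sources: dict[str, dict[str, list]],
                   min_shared: int = 1) -> dict[tuple[str, str], int]:
    """Simple value matching across every pair of columns from different sources.

    Values are compared as strings; the result maps ("src.col", "src.col")
    to the number of distinct values the two columns share.
    """
    cols = {
        (src, col): {str(v) for v in vals if v not in (None, "")}
        for src, table in sources.items()
        for col, vals in table.items()
    }
    result = {}
    for a, b in combinations(sorted(cols), 2):
        if a[0] == b[0]:
            continue  # same source: not a cross-system overlap
        shared = len(cols[a] & cols[b])
        if shared >= min_shared:
            result[(f"{a[0]}.{a[1]}", f"{b[0]}.{b[1]}")] = shared
    return result
```

The shared-value count is only statistical evidence: in a small sample, an ID column and a quantity column can share a value by coincidence, which is why overlap results are combined with profiles and reviewed before any column is treated as a true match.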

It is tempting to skip this step, because organizations tend to be overconfident in their understanding of their systems and rely heavily on the institutional memory of their subject matter experts (SMEs). Unfortunately, most system documentation is either out of date or non-existent, and SME knowledge of source systems is usually incomplete and limited to a single system. This situation highlights the need for a rigorous cross-system data analysis and discovery process for each source before it can be considered a candidate to populate the master.

Deep analysis of each source precedes the Data Source Baseline reports and goes beyond simple profiling. Exeros Discovery automates the discovery of implicit primary-foreign key relationships, as well as Data Objects, which are logical clusters of tables that correspond to a business entity. This deep analysis ensures that the scope of the data moving into the master is complete. Exeros Discovery automates this process for the user and can analyze up to 20 data sources simultaneously.
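Implicit primary-foreign key relationships and Data Objects can be approximated with a well-known heuristic, inclusion-dependency detection, followed by grouping linked tables into connected components. The Python 3.9+ sketch below assumes tables are small in-memory `{column: values}` dicts; it is not the Exeros algorithm, and the function names are hypothetical. A real tool would sample data, tolerate a few orphan values, and rank candidates rather than accept every exact containment.

```python
from itertools import permutations

Col = tuple[str, str]  # (table, column)


def implicit_foreign_keys(tables: dict[str, dict[str, list]]) -> list[tuple[Col, Col]]:
    """Find (child, parent) column pairs where the parent column is non-null and
    unique, and every non-null child value appears in it (an inclusion dependency)."""
    cols = [(t, c, vals) for t, tc in tables.items() for c, vals in tc.items()]
    fks = []
    for (pt, pc, pvals), (ct, cc, cvals) in permutations(cols, 2):
        if pt == ct:
            continue  # only look for links between different tables
        if None in pvals or len(set(pvals)) != len(pvals):
            continue  # the parent side must look like a key
        child = {v for v in cvals if v is not None}
        if child and child <= set(pvals):
            fks.append(((ct, cc), (pt, pc)))
    return fks


def data_objects(tables: dict[str, dict[str, list]],
                 fks: list[tuple[Col, Col]]) -> list[set[str]]:
    """Cluster tables linked by discovered keys (connected components), a simple
    stand-in for a business-entity Data Object."""
    adj = {t: set() for t in tables}
    for (child_t, _), (parent_t, _) in fks:
        adj[child_t].add(parent_t)
        adj[parent_t].add(child_t)
    seen, clusters = set(), []
    for start in tables:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first traversal
            t = stack.pop()
            if t in comp:
                continue
            comp.add(t)
            stack.extend(adj[t] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters
```

With hypothetical `customer`, `orders` and `products` tables, an `orders.cust_id` column whose values all occur in the unique `customer.id` column yields one inferred key, so `customer` and `orders` form one cluster while `products` stands alone. Exact containment is a strict test; on real data a coverage ratio is the more practical criterion.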