
White Paper

Cross-System Data Analysis for MDM Implementations

Step 2: Composite Global Key Analysis

In order to create a consolidated data master, the MDM team needs to identify global keys that exist across data sources. The global keys will be the basis for merging and conflict resolution (survivorship). Global keys are often a composite of several unique attributes and enable “contextual matching”. The following steps help establish a global key:

  1. Test a (composite) key in each data source to determine if it is a unique identifier. Rarely are these global keys 100% unique because of dirty data. Exeros Discovery performs deep key analysis that goes beyond simple selectivity by using data previews and expressions.

  2. Discovery then performs cross-source matching on composite keys to understand the quality and strength of a global key. This determines if a composite key is fit to be used for matching across sources.

For example, to create a customer master, the MDM team would first determine a set of attributes that uniquely identify a customer. Often each system generates its own identifiers which do not apply to other sources. However, a combination of attributes, such as first name, last name, and phone number could uniquely identify a customer across sources. Once Exeros Discovery confirms this global key within and across each system, it can then determine if the same customer has other attributes, such as address or account number, that also match across those systems.
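The two checks described above (uniqueness within a source, then overlap across sources) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Exeros Discovery is implemented: the field names, the sample records, the lowercase/trim normalization, and the helper functions `key_of`, `key_uniqueness`, and `key_overlap` are all hypothetical.

```python
from collections import Counter

def key_of(record, key_fields):
    """Build a normalized composite key tuple from the chosen fields."""
    return tuple(str(record[f]).strip().lower() for f in key_fields)

def key_uniqueness(records, key_fields):
    """Fraction of records whose composite key occurs exactly once in the source."""
    counts = Counter(key_of(r, key_fields) for r in records)
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / len(records) if records else 0.0

def key_overlap(source_a, source_b, key_fields):
    """Fraction of distinct keys in source_a that also appear in source_b."""
    keys_a = {key_of(r, key_fields) for r in source_a}
    keys_b = {key_of(r, key_fields) for r in source_b}
    return len(keys_a & keys_b) / len(keys_a) if keys_a else 0.0

# Hypothetical composite key and sample data (assumptions for illustration).
KEY = ("first_name", "last_name", "phone")

crm = [
    {"first_name": "Ann", "last_name": "Lee", "phone": "555-0100"},
    {"first_name": "Bob", "last_name": "Ray", "phone": "555-0101"},
    {"first_name": "Bob", "last_name": "Ray", "phone": "555-0101"},  # duplicate: dirty data
    {"first_name": "Cy", "last_name": "Ng", "phone": "555-0102"},
]
billing = [
    {"first_name": "ann", "last_name": "LEE", "phone": "555-0100"},  # same person, different casing
    {"first_name": "Cy", "last_name": "Ng", "phone": "555-0102"},
    {"first_name": "Dee", "last_name": "Orr", "phone": "555-0103"},
]

crm_uniqueness = key_uniqueness(crm, KEY)          # step 1, per source
billing_uniqueness = key_uniqueness(billing, KEY)  # step 1, per source
crm_in_billing = key_overlap(crm, billing, KEY)    # step 2, across sources
```

A key that scores well on uniqueness in each source and also shows high overlap across sources is a plausible candidate for a global key; where to set the acceptance thresholds is a project decision that this sketch does not address.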

Step 3: Survivorship Hotspots

Once the team establishes the composite global key, they can match records from one source to records in another source. The next question to answer is “Do the values for the critical attributes match across sources?” When the values of a critical attribute differ across sources, the conflict must be resolved through a “survivorship” rule. Survivorship rules determine which source is trusted for each attribute in the master. The goal of survivorship hotspot analysis is to identify every situation where such conflicts and mismatches occur and therefore require survivorship rules.
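As a rough illustration of a survivorship rule, the sketch below picks, for each attribute, the value from the source designated as trusted for that attribute. The rule table, source names, sample values, and the `merge_record` helper are assumptions made for this example; real survivorship rules are often richer (for instance, based on recency or data quality scores).

```python
# Hypothetical rule table: attribute -> name of the trusted source.
SURVIVORSHIP_RULES = {"address": "billing", "account": "crm"}

def merge_record(versions, rules, default_source):
    """Build one master record from several source versions of the same entity.

    versions: dict mapping source name -> record (dict) for one matched entity.
    rules: dict mapping attribute -> trusted source name.
    default_source: source used when no rule exists or the trusted value is missing.
    """
    attrs = {attr for record in versions.values() for attr in record}
    master = {}
    for attr in sorted(attrs):
        trusted = rules.get(attr, default_source)
        value = versions.get(trusted, {}).get(attr)
        if value is None:
            # Fall back to the default source when the trusted source has no value.
            value = versions[default_source].get(attr)
        master[attr] = value
    return master

versions = {
    "crm": {"name": "Ann Lee", "address": "1 Elm St", "account": "A-17"},
    "billing": {"name": "Ann Lee", "address": "1 Elm Street", "account": "A-71"},
}
master = merge_record(versions, SURVIVORSHIP_RULES, default_source="crm")
```

Here the address survives from the billing system and the account number from the CRM system, while attributes without a rule come from the default source.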

These hotspots are identified through a “contextual matching” or overlap analysis of the composite global key identified in the previous analysis. The outcome of this analysis identifies places where global keys match but critical attribute values do not. Exeros Discovery generates reports on the actual records that have conflicts.
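The overlap analysis described above can be sketched as follows: join two sources on the composite global key, then list every critical attribute whose values disagree. This is a minimal sketch that assumes the key is already unique within each source; the field names, sample records, and the `find_hotspots` helper are hypothetical and do not represent the product's actual report format.

```python
from collections import Counter

def index_by_key(records, key_fields):
    """Map composite key -> record (assumes the key is unique within the source)."""
    return {tuple(r[f] for f in key_fields): r for r in records}

def find_hotspots(source_a, source_b, key_fields, critical_attrs):
    """Return (key, attribute, value_a, value_b) for each matched record pair
    whose critical attribute values disagree."""
    a_by_key = index_by_key(source_a, key_fields)
    b_by_key = index_by_key(source_b, key_fields)
    conflicts = []
    # Only keys present in both sources can produce a conflict.
    for key in sorted(a_by_key.keys() & b_by_key.keys()):
        for attr in critical_attrs:
            value_a = a_by_key[key].get(attr)
            value_b = b_by_key[key].get(attr)
            if value_a != value_b:
                conflicts.append((key, attr, value_a, value_b))
    return conflicts

KEY = ("first_name", "last_name", "phone")
CRITICAL = ("address", "account")

crm = [
    {"first_name": "Ann", "last_name": "Lee", "phone": "555-0100",
     "address": "1 Elm St", "account": "A-17"},
    {"first_name": "Cy", "last_name": "Ng", "phone": "555-0102",
     "address": "9 Oak Ave", "account": "A-42"},
]
billing = [
    {"first_name": "Ann", "last_name": "Lee", "phone": "555-0100",
     "address": "1 Elm Street", "account": "A-17"},
    {"first_name": "Cy", "last_name": "Ng", "phone": "555-0102",
     "address": "9 Oak Ave", "account": "A-24"},
    {"first_name": "Dee", "last_name": "Orr", "phone": "555-0103",
     "address": "3 Fir Rd", "account": "A-55"},
]

conflicts = find_hotspots(crm, billing, KEY, CRITICAL)
# Count conflicts per attribute to see where survivorship rules are needed most.
hotspot_counts = Counter(attr for _, attr, _, _ in conflicts)
```

The per-attribute counts indicate which attributes need survivorship rules, and the conflict list gives the actual records to review.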

Cross-System Analysis: Data Mapping

Relationships across data sources are sometimes more complex than can be found by the source data discovery analysis discussed above. When you need to find business rules at the column level with more complex logic, a more detailed analysis called data mapping is required. Because this kind of analysis looks for patterns in the data to effectively reverse engineer business rules from the data itself, it is compute intensive and can only be used between two data sources at a time. However, note that both data sources can contain multiple tables.
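To give a feel for what reverse engineering a column-level rule from data values involves, the sketch below scores a few candidate transformations against already matched row pairs and reports the one that reproduces the target column most often. It is a deliberately simple brute-force illustration, not the algorithm Exeros Discovery uses; the candidate rules, column names, sample data, and helper functions are all assumptions.

```python
def match_rate(pairs, target_col, transform):
    """Fraction of (source_row, target_row) pairs where transform(source_row)
    reproduces the value in target_row[target_col]."""
    hits = sum(1 for src, tgt in pairs if transform(src) == tgt[target_col])
    return hits / len(pairs) if pairs else 0.0

def best_rule(pairs, target_col, candidates):
    """Score every candidate rule and return (rule name, hit rate) of the best one."""
    scores = {name: match_rate(pairs, target_col, fn) for name, fn in candidates.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

# Hypothetical candidate transformations from source columns to a target column.
CANDIDATES = {
    "copy last_name": lambda r: r["last_name"],
    "upper(last_name)": lambda r: r["last_name"].upper(),
    "first_name + ' ' + last_name": lambda r: f"{r['first_name']} {r['last_name']}",
}

# Row pairs already matched on a key; the third target value is dirty (double space).
pairs = [
    ({"first_name": "Ann", "last_name": "Lee"}, {"full_name": "Ann Lee"}),
    ({"first_name": "Bob", "last_name": "Ray"}, {"full_name": "Bob Ray"}),
    ({"first_name": "Cy", "last_name": "Ng"}, {"full_name": "Cy  Ng"}),
]

rule, score = best_rule(pairs, "full_name", CANDIDATES)
```

Even this toy version has to test every candidate rule against every row for each column pair, which suggests why the analysis becomes compute-intensive as the number of columns and candidate rules grows.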

Exeros Discovery has typically mapped over 1000 columns within one data source to 1000 columns in a second data source, both with a million rows of data in the mapping sample, in less than 24 hours of calculation in customer environments. So while not as scalable as source data discovery, this analysis is still extremely efficient. Exeros Discovery includes data mapping capabilities that analyze data values between two structured data sources to automatically discover column-level business rules and transformations between those data sources.

There are several use cases for data mapping within an MDM deployment. First, data mapping is used to find very detailed column level transformations across two upstream data sources. In this way it augments upstream source data discovery, which is a many-to-many data source analysis that does not provide the same level of detail as data mapping. Second, data mapping is used to validate relationships between an upstream data source and the master data hub.

Finally, data mapping is used to determine the transformation logic by which data in the master will be mapped to downstream applications. This is particularly useful because before the master is deployed, downstream applications consume data from upstream data sources directly. After this data is placed into a master hub, the form and structure of the data will change drastically compared to the original sources. This means that for downstream applications to use the higher quality data in the master, they must be remapped. Very often companies find that downstream application owners lack the skills to remap their application to the master. At the same time, the team running the master hub is not knowledgeable about the data structure of the downstream application. This can result in a stalemate of who will step up to remap the downstream application to the master.

The automated mapping capabilities of Exeros Discovery resolve this stalemate: the MDM Hub team can map the master to any downstream application without prior knowledge of that system, and can therefore take full accountability for the success of the MDM deployment.

Summary

Exeros removes significant project risk from your MDM deployment by replacing manual, error-prone cross-system data analysis with software that automates the analysis and accelerates it by a factor of 10, making the data analysis portion of your MDM project both fast and accurate.

Contact Exeros

For more information on how your company can accelerate time to deployment more than 10x for data management projects, please contact us at:

Telephone: US 1 866 939 3767 (866-9EXEROS)
                UK 44 (0) 203-002-0174

www.exeros.com | sales@exeros.com