Data Quality Tips & Tricks: Using delimiters to your advantage

January 25, 2010
By William Sharp

Introduction: While I research my next matching algorithm post, on the Jaro-Winkler algorithm, I've decided to put together some of my favorite “lessons learned” from my practice with Informatica Data Quality (IDQ) Workbench. This eclectic bunch of tricks has helped me carry out tasks such as more comprehensive data profiling and more accurate matching. Some of the tips are generic and apply to any data...

Hamming Distance Matching Algorithm

January 4, 2010
By William Sharp

Among Richard Hamming's many accomplishments is the development of an algorithm that compares two strings of equal length and counts the number of positions at which they differ. Because of the equal-length requirement, the algorithm is primarily used to detect differences in numeric strings, but it can be used with textual data as well. Informatica has incorporated the Hamming algorithm into the Data Quality Workbench tool in order to...
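As a rough illustration of the idea, here is a minimal Python sketch of Hamming distance for two equal-length strings. The `hamming_similarity` helper, which turns the distance into a 0-to-1 score by dividing by the string length, is a hypothetical convenience added here; it is one common normalization, not necessarily how IDQ Workbench computes its match score.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Hamming distance is only defined for strings of the same length.
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def hamming_similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 means identical, 0.0 means every position differs."""
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - hamming_distance(a, b) / len(a)


if __name__ == "__main__":
    # Two five-digit codes that differ in their last two positions.
    print(hamming_distance("30301", "30310"))
    print(hamming_similarity("30301", "30310"))
```

For "30301" and "30310" the last two digits are transposed, so the distance is 2 and the normalized similarity is 0.6. Note that a transposition counts as two substitutions under Hamming distance, which is one reason other algorithms are often preferred for free-text fields.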

Informatica Data Quality Workbench Matching Algorithms

December 10, 2009
By William Sharp

I’d like to begin a multi-part series of posts where I detail the various algorithms available in Informatica Data Quality (IDQ) Workbench. In this post I’ll start by giving a quick overview of the algorithms available and some typical uses for each. In subsequent posts I’ll go into more detail and outline the math behind each algorithm. Finally, I’d like to finish up with some baseline comparisons using a single set...

Microsoft Dynamics CRM Duplicate Consolidation Management

December 1, 2009
By William Sharp

After receiving a comment on last month’s post, I decided to write a follow-up detailing a little further how Microsoft Dynamics CRM manages the merging of duplicate records. For the purposes of this post I’ll stick to using Contacts as the example; however, the same is true for Accounts and many other tables. For our sample records, let’s say we have just two contacts that are duplicates. Contact A has four...

Removing duplicates in Microsoft Dynamics CRM

October 17, 2009
By William Sharp

In last month’s edition of the DQC I reviewed some data quality features built into Microsoft’s CRM package, namely duplicate detection on create or update, duplicate detection rules, and duplicate detection jobs. I left off with a promise to dive deeper into how you remove duplicates once you’ve detected them. Before I get into the details, I want to emphasize that without customization, removing duplicates is not a...

Data Quality and Microsoft Dynamics CRM

October 2, 2009
By William Sharp

This month I'd like to talk about my recent experiences with some of the data quality features of Microsoft's Dynamics CRM package and how to put them to use in a typical enterprise environment.

August Edition of IAIDQ Festival del IDQ Bloggers

September 1, 2009
By William Sharp

This year the IAIDQ, an international not-for-profit dedicated to developing the profession of Information Quality Management, is 5 years old and is having a series of rolling celebrations, the Blog Carnival “Festival del IDQ Bloggers” being one of the strands of those celebrations.  I am glad to be hosting the Festival del IDQ Bloggers this month!  I’ve tried to capture the core of each message, but each of these is...

The DQ Two Step!

August 5, 2009
By William Sharp

To positively identify a non-unique individual, you need to pair their name with an additional piece of identifying information, usually an address. In other words, it is a two-part match on name and address that can identify a true duplicate with a relatively high level of confidence. If we used only a match on name to identify duplicates, we'd consolidate all the John Smiths in the dataset...
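To make the difference concrete, here is a minimal Python sketch that groups records by a name-only key and by a two-part name-and-address key. The `Person` record, the `normalize` helper (lowercasing, dropping punctuation, collapsing whitespace) and the sample records are all hypothetical, and the sketch uses exact matching on normalized keys; a real matching plan would typically apply fuzzy algorithms to each field instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable


@dataclass(frozen=True)
class Person:
    name: str
    address: str


def normalize(text: str) -> str:
    """Hypothetical light cleanup: lowercase, drop punctuation, collapse whitespace."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def name_key(p: Person) -> str:
    # One-part key: name only (over-matches common names).
    return normalize(p.name)


def name_address_key(p: Person) -> tuple:
    # Two-part key: name paired with address.
    return (normalize(p.name), normalize(p.address))


def duplicate_groups(
    records: Iterable[Person], key: Callable[[Person], Hashable]
) -> list[list[Person]]:
    """Group records by key and return only groups with more than one member."""
    groups: dict = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]


records = [
    Person("John Smith", "12 Oak St"),
    Person("JOHN SMITH", "12 Oak St."),
    Person("John Smith", "99 Elm Ave"),
]

if __name__ == "__main__":
    print(len(duplicate_groups(records, name_key)[0]))
    print(len(duplicate_groups(records, name_address_key)[0]))
```

With the name-only key, all three John Smith records fall into one group of three; with the name-and-address key, only the two records at 12 Oak St are grouped, and the John Smith on Elm Ave is correctly left alone.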

GUI or command line? Where to run an IDQ plan.

May 25, 2009
By William Sharp

Recently, on a data quality project, I stumbled across an anomaly that I thought I'd share with the data quality / Informatica community. It involves Informatica Data Quality (IDQ) and certain types of queries. With these basic switches you can deploy any IDQ plan regardless of the query required to source the data. I hope this post helps someone avoid hours...

Begin at the end … ensuring data quality success!

May 20, 2009
By William Sharp

Because data exists before a data quality project and still exists after it, data quality does not have as clear an impact on the business as a traditional application development project. This is particularly true of customer data management projects, where the primary objective is to "de-dup" or consolidate the data. After all, in the end there...

First Edition – An Introduction

April 18, 2009
By William Sharp

As the name implies, I want this blog to be a chronicle of data quality: the process of data quality, not just the concept, from project kick-off to implementation and each step in between. I intend to record my experiences on data quality initiatives in order to present a body of evidence regarding the opportunities and challenges that exist as part of those initiatives. I’ll also be...
