Latest Story

ABC and DQ: Codependent Initiatives?

August 26, 2011
By William Sharp

Activity Based Costing and Data Quality: Codependent Initiatives? Summary: Activity Based Costing, or ABC, is an exercise in which costs are assigned to the business activities required to support critical business operations. While it is often used in support of a business process redesign (BPR) effort, it can also serve an important role in data quality (DQ) initiatives. Conducting a data quality initiative requires a significant investment. ...

Read more »

Data Migration Best Practice: Orphan Analysis

July 3, 2011
By William Sharp

What’s an Orphan? An orphan transaction is a transaction in a “child” table without an associated transaction in a “parent” table. For instance, an address record in the Address table without a link to a customer record in the Customer table is an orphan. Orphaned records lead to various issues in business operations like marketing and business analytics. Challenges orphans pose to the business: An address without an associated customer record will pose...

Read more »
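The orphan definition in the excerpt above can be sketched as a simple anti-join: collect the parent keys, then keep every child row whose foreign key is not among them. This is a minimal illustration only, not the method from the full post. The function name `find_orphans`, the `customer_id` column, and the sample rows are all assumptions made up for the example.

```python
def find_orphans(child_rows, parent_rows,
                 child_fk="customer_id", parent_key="customer_id"):
    """Return child rows whose foreign key has no match in the parent rows.

    Column names are hypothetical; rows are plain dicts standing in for
    records from an Address (child) and a Customer (parent) table.
    """
    # Build the set of parent keys once so each lookup is O(1).
    parent_keys = {row[parent_key] for row in parent_rows}
    # A missing or None foreign key is also reported as an orphan.
    return [row for row in child_rows if row.get(child_fk) not in parent_keys]


# Made-up sample data for illustration only.
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
]
addresses = [
    {"address_id": 10, "customer_id": 1, "city": "Austin"},
    {"address_id": 11, "customer_id": 3, "city": "Boston"},      # no such customer
    {"address_id": 12, "customer_id": None, "city": "Chicago"},  # no link at all
]

orphans = find_orphans(addresses, customers)
print([row["address_id"] for row in orphans])  # prints [11, 12]
```

In a database the same check is usually written as a LEFT JOIN from child to parent with a filter on a null parent key; the set-based version here just keeps the sketch self-contained.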

The role of data quality in ETL design: DQETL

June 13, 2011
By William Sharp

Introduction: Data integration is nothing new. Since the concept of data warehousing, data integration has been a major initiative for most large organizations. One of the most common obstacles to integrating data into a warehouse has been the fact that assumptions about the state of the source data have been either false or flawed at best. One of the reasons for this is that very little investigation, or data profiling,...

Read more »

The Seven Habits of Highly Effective Data Quality

May 20, 2011
By William Sharp

7 Habits of Highly Effective Data Quality: I’ve been reading Stephen Covey’s The 7 Habits of Highly Effective People and I couldn’t help but notice the parallels between effective people and effective data management. In the book, Covey discloses that there are principles, centered on self-discipline, that lead to success and fulfillment. Sounds great, right? The seven habits include some ear-cringing buzzwords, but let’s take a look at them...

Read more »

Data Quality Poll: Data Profiling and Data Migration

May 13, 2011
By William Sharp

I’m interested to hear the thoughts of my fellow data quality practitioners about the role of data quality, more specifically data profiling, in the data migration process. Vote, leave a comment, whatever … I’m looking for some consensus around the approach. Thanks for taking the time to visit the weblog! William Sharp, sharp@thedataqualitychronicle.org

Read more »

Flexibility: The advantage to Talend’s Matching Techniques

May 1, 2011
By William Sharp

One of the most interesting things about Talend’s matching technology is that it provides both deterministic and probabilistic options. In my opinion, this is a unique approach that allows for flexibility in creating a match solution. I see advantages to using these techniques in combination, which could increase the number of true positives and, perhaps more importantly, decrease the number of false positives. Deterministic matching is a rules...

Read more »

Data Profiling & Scorecarding with Informatica Data Quality

April 30, 2011
By William Sharp

In my opinion, profiling and scoring data is a fundamental part of a sound data quality assessment.  I routinely use these processes to build my “current state” report for clients.  I recently used Informatica’s Data Quality developer and analyst tools to put together such a package.  I am of the opinion that these tools represent the “best in breed” available to do so.  The learning curve is not steep,...

Read more »

Soundex for String Matching

April 30, 2011
By William Sharp

Soundex is a useful function for performing data matching. While you can use a Soundex function in the process of identifying potential duplicate strings, I don’t recommend it. Here’s why … The algorithm encodes consonants. A vowel is not encoded unless it is the first letter. Consonants to the right of a vowel are not coded. Similar-sounding consonants share the same digit. C, G, J, K, Q, S, X, Z are all encoded with the same...

Read more »
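To make the concern about Soundex concrete, here is a minimal sketch of the commonly documented American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent equal digits, and pad to four characters. It is an illustration under those assumptions and may differ in detail from the variant the post describes or from any particular database or tool implementation. The function name `soundex` and the sample names are chosen for the example.

```python
# Digit groups for American Soundex; vowels, Y, H and W carry no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return a four-character Soundex code, or "" if name has no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = [letters[0]]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")       # its digit still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                       # H and W do not separate equal digits
        code = CODES.get(ch, "")           # vowels and Y map to ""
        if code and code != prev:
            encoded.append(code)
        prev = code                        # a vowel resets prev, allowing a repeat
        if len(encoded) == 4:
            break
    return "".join(encoded).ljust(4, "0")  # pad short codes with zeros


print(soundex("Robert"), soundex("Rupert"))       # prints R163 R163
print(soundex("Catherine"), soundex("Cotroneo"))  # prints C365 C365
```

The second pair shows why a matching code is weak evidence of a duplicate: two clearly different names collapse to the same four characters, so Soundex alone tends to produce false positives.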

Data Quality: where does it belong?

April 27, 2011
By William Sharp

Data Quality is not a technology issue, it’s a business issue. Here is my opinion on why people think it is about technology. Business initiatives like MDM/BI/DQ and the like are being presented, sold on, and driven by technology experts. Information technology has carried business forward to the point where we are the chauffeurs for change and progress. Without the ability to integrate new technologies into a business,...

Read more »

Data Quality: to whom does it belong?

April 27, 2011
By William Sharp

How should data ownership be addressed? In my opinion, a governance committee is the best option. There should be at least one, probably two, representatives each from the business, from technology and from budgeting. I’d suggest that budgeting head the committee so that solid cost-based decisions can be made. Business and technology can present their case for why money should/should not get spent on a data management issue. This...

Read more »

Data Cleansing every quarter?

April 24, 2011
By William Sharp

@jschwa1 Data cleansing every 3 months? http://ow.ly/1i0vd - Someones not addressing the right problem! This is a clip from a recent tweet from Julian Schwarzenbach of Data and Process Advantage Limited (DPA). My response to his tweet was “I can see validity esp. if the data is from external sources like customers”. I can see where Julian and others might see quarterly cleansing as a lack of attention to the...

Read more »