Data Quality Poll: Data Profiling and Data Migration

May 13, 2011
By William Sharp

I’m interested to hear the thoughts of my fellow data quality practitioners about the role of data quality, more specifically data profiling, in the data migration process.

Vote, leave a comment, whatever … I’m looking for some consensus around the approach.

Thanks for taking the time to visit the weblog!

William Sharp

2 Responses to Data Quality Poll: Data Profiling and Data Migration

  1. Tom Moseley on September 3, 2011 at 12:35 pm

    In the past few years, I’ve been involved exclusively with DW implementations, and profiling in some manner has been part of every data migration I’ve worked on. I think that most folks these days understand that data quality is important and that Data Profiling is an excellent method for helping validate DQ.

    But, with all the focus (hype & marketing) around Data Profiling lately, I see almost no discipline around the actual practice of data profiling – beyond a few line items in a project plan. Shouldn’t it be possible on any project to say that these are the 20 or so standard ‘potential’ anomalies you always look for, here is how they affect a data migration and here is how you discover them (this will obviously differ based on the toolset)? Additionally, here is the standard way to report the results… A real methodology maybe?…:)

    • William Sharp on September 4, 2011 at 9:25 am

      Agreed, profiling for profiling’s sake is a waste of time and resources. I would think that the methodology would differ from project to project, depending on the industry. For instance, pharma data would have different metrics than finance data. However, there are standard metadata metrics that can be included in each project. Data types and lengths are prime examples.
      A methodology that can be applied to each project is to measure quality on the 6/7 dimensions: Conformity, Consistency, Completeness, Integrity, Accuracy, Duplication, and Timeliness.
      In fact, I’m headed into a major data quality implementation this coming week and this is the template for my workstream. I’ve designated around 20 line items for each dimension. While I can see some needing fewer, some will need more.
      Perhaps this could be good material for my next post!?!?
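
As a rough illustration of the standard metadata metrics mentioned in the reply above (data types and lengths), plus simple stand-ins for the Completeness and Duplication dimensions, here is a minimal Python sketch. It is not taken from the discussion: the function name `profile_columns`, the sample records, and the exact metric definitions are all assumptions made for the example, and a real project would lean on whatever profiling toolset it already uses.

```python
from collections import Counter


def profile_columns(rows):
    """Return per-column metadata metrics for a list of dict records.

    Hypothetical helper for illustration only; metric definitions are assumptions.
    """
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and blank strings as missing values.
        present = [v for v in values if v is not None and str(v).strip() != ""]
        lengths = [len(str(v)) for v in present]
        report[col] = {
            # Data types actually observed in the column.
            "types": dict(Counter(type(v).__name__ for v in present)),
            # Shortest and longest value when rendered as text.
            "min_length": min(lengths) if lengths else None,
            "max_length": max(lengths) if lengths else None,
            # Completeness: share of records with a non-missing value.
            "completeness": len(present) / len(values) if values else 0.0,
            # Duplication: non-missing values beyond the first occurrence.
            "duplicates": len(present) - len(set(map(str, present))),
        }
    return report


# Made-up sample records with a blank, a null, a short code, and a mixed type.
sample = [
    {"id": 1, "name": "Ann", "zip": "30301"},
    {"id": 2, "name": "Bob", "zip": ""},
    {"id": 3, "name": "Ann", "zip": "3030"},
    {"id": 4, "name": None, "zip": 30301},
]

report = profile_columns(sample)
for column, metrics in report.items():
    print(column, metrics)
```

Even on four records, the `zip` column shows mixed types and inconsistent lengths, which is the kind of anomaly that affects a migration's target schema. The other dimensions (Conformity, Consistency, Integrity, Accuracy, Timeliness) generally need business rules or reference data and are not covered here.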
