• Informatica Posts

    Informatica Data Quality Posts

    I have used Informatica’s data quality tools for years and initially started this blog to document the things I learned on projects.  Now, after years of implementing data quality solutions, this page is where I will compile all of my Informatica-related posts.


    Data Migration Best Practice: Orphan Analysis

    What’s an Orphan? An orphan transaction is a transaction in a “child” table without an associated transaction in a “parent” table.  For instance, an address record in the Address table without a link to a customer record in the Customer table.  Orphaned records lead to various issues in business operations like marketing and business analytics. Challenges [...]
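
    The definition above can be sketched as an anti-join: keep every child row whose key finds no parent. This is a minimal illustration, not the method from the full post. The table and column names (`customer`, `address`, `customer_id`, and so on) and the sample rows are hypothetical, and SQLite is used only so the example is self-contained.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with hypothetical parent/child tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE address (
            address_id  INTEGER PRIMARY KEY,
            customer_id INTEGER,
            street      TEXT
        );
        INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO address VALUES
            (10, 1,    '1 Main St'),
            (11, 2,    '2 Oak Ave'),
            (12, 99,   '3 Elm Rd'),   -- customer 99 does not exist
            (13, NULL, '4 Pine Ln');  -- no customer link at all
        """
    )
    return conn


def find_orphan_addresses(conn):
    """Return address rows that have no matching row in customer."""
    query = """
        SELECT a.address_id, a.customer_id, a.street
        FROM address AS a
        LEFT JOIN customer AS c ON c.customer_id = a.customer_id
        WHERE c.customer_id IS NULL
        ORDER BY a.address_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in find_orphan_addresses(build_sample_db()):
        print(row)
```

    The LEFT JOIN keeps every address row, and the `IS NULL` filter then isolates the ones that found no customer, including rows whose link is missing entirely.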

    The role of data quality in ETL design: DQETL

    Introduction Data integration is nothing new.  Since the concept of data warehousing, data integration has been a major initiative for most large organizations.  One of the most common obstacles to integrating data into a warehouse has been the fact that assumptions about the state of the source data have been either false or flawed at best.  [...]

    It’s a date!

    I’ve started using the date-related functions in the data quality developer tool. I’ve found some fun ways to implement them and wanted to share. Is_Date Before you use any date function you need to be sure you’re dealing with a date string. The Is_Date function, available in the Expression transform, is how you test [...]
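
    The idea of testing a string before treating it as a date can be sketched outside the tool. The snippet below is a rough Python analogue, not Informatica's Is_Date implementation. The function name `is_date` and the default format `%m/%d/%Y` are assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional


def is_date(value: Optional[str], fmt: str = "%m/%d/%Y") -> bool:
    """Return True when value parses as a calendar date in the given format.

    Assumption: a single expected format is known up front. Missing values
    are treated as "not a date" rather than raising an error.
    """
    if value is None:
        return False
    try:
        datetime.strptime(value.strip(), fmt)
    except ValueError:
        # Raised for wrong layouts and for impossible dates such as Feb 30.
        return False
    return True


if __name__ == "__main__":
    for candidate in ["02/29/2012", "02/30/2011", "not a date", None]:
        print(repr(candidate), is_date(candidate))
```

    Checking first and converting second keeps malformed values from failing later date arithmetic, which is the point of the guard described above.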

    Master Data Management: Address Validation Series: Address Validation

    Why? There are plenty of aspects of address validation to write about.  Validating addresses can be done with many different tools, each with its own specific details on how to do it.  There are various ways to validate addresses within each tool to produce different outcomes.  And there are various ways to manage and integrate [...]

    Master Data Management: Address Validation Series

    Why? Address information, in particular customer address information, is a core asset of any business.  It plays a pivotal role in two fundamental business operations: revenue assurance and revenue generation. Without valid, deliverable customer address information, collecting payment for services or products is often a process that, at best, requires repetitive efforts that cost the business [...]

    Informatica v9 shifts and how to manage them: Source to Target

    Introduction I’ve recently made the shift from Informatica Data Quality / Data Explorer version 8.x to Informatica v9.  In the process I have discovered quite a few shifts in how certain tasks are performed.  Notice I called them shifts.  I did so because after learning them, I didn’t feel as though there were fundamental changes. A prime [...]

    On Cloud 9!

    I’ve been in the clouds lately, in more ways than one. I’ve been on the road performing another data quality assessment on an island in the Pacific. This means I’m gaining status on multiple airlines and becoming increasingly appreciative of noise-canceling technologies.

    I’m also gaining an appreciation for another technology: cloud-based data quality solutions! I am leveraging Informatica’s latest data quality platform, IDQ v9. IDQ v9 brings to mind a favorite ’80s commercial of mine where peanut butter and chocolate are combined into one tasty treat! For sure there is a little PowerCenter in your Data Quality and a little Data Quality in your PowerCenter …

    Data Quality Tips & Tricks: Using delimiters to your advantage

    Introduction While I am doing research on my next matching algorithm post, the Jaro-Winkler algorithm, I have decided to throw together some of my favorite “lessons learned” which I have discovered during my practice with Informatica Data Quality (IDQ) Workbench.  This eclectic bunch of tricks has helped me carry out various tasks such as more comprehensive data profiling and [...]

    Hamming Distance Matching Algorithm

    Among Richard Hamming’s many accomplishments is the development of an algorithm to compare various types of strings of the same length to determine how different they are. Due to the requirement of equal length, the algorithm is primarily used to detect differences in numeric strings but can be used with textual data as well.

    Informatica has incorporated the Hamming algorithm into the data quality workbench tool in order to produce a match score. The Hamming component requires the selection of at least two inputs, can be configured to handle data with nulls, and outputs a match score. In IDQ a Hamming match score of one (1) indicates a perfect match while a Hamming match score of zero (0) indicates that there was no correlation between the two values being analyzed.
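
    To make the scoring concrete, here is a minimal sketch of a Hamming-based match score. The distance itself is the standard count of differing positions. The normalization (one minus distance divided by length) is an assumption chosen to agree with the 0-to-1 scale described above, since the exact formula IDQ uses is not given here. The function names and sample values are hypothetical.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def hamming_match_score(a: str, b: str) -> float:
    """Normalize the distance to a score between 0.0 and 1.0.

    Assumption: score = 1 - distance / length, so 1.0 means identical
    strings and 0.0 means every position differs.
    """
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - hamming_distance(a, b) / len(a)


if __name__ == "__main__":
    # Hypothetical phone numbers and postal codes.
    print(hamming_match_score("8085551234", "8085551234"))
    print(hamming_match_score("8085551234", "8085551243"))
    print(hamming_match_score("96813", "10027"))
```

    Because every position is compared independently, a single transposed pair of digits costs two mismatches, which is why the measure is sensitive to small differences in fixed-length values such as phone numbers.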

    I’ve used the Hamming component in IDQ to analyze match possibilities in telephone numbers and postal codes. I’ve found it to be reliable in detecting true positive matches and sensitive enough to detect even slight differences (as indicated in the sample data above). I hope this review will help those of you interested in using the Hamming component in IDQ or those just interested in developing knowledge of the algorithm.

    Informatica Data Quality Workbench Matching Algorithms

    I’d like to begin a multi-part series of postings where I detail the various algorithms available in Informatica Data Quality (IDQ) Workbench.  In this post I’ll start by giving a quick overview of the algorithms available and some typical uses for each.  In subsequent postings I’ll get more detailed and outline the math behind each algorithm.  [...]

    GUI or command line? Where to run an IDQ plan.

    Recently on a data quality project I stumbled across an anomaly that I thought I’d share with the data quality / Informatica community. It involves the use of Informatica Data Quality (IDQ) and the use of certain types of queries.
    With these basic switches you can deploy any IDQ plan regardless of the query required to source the data. I hope this post helps someone avoid hours of debugging!

    4 Responses to Informatica Posts

    1. Pingback: Tweets that mention Informatica Data Quality « The Data Quality Chronicle -- Topsy.com

    2. February 7, 2011 at 10:25 am

      Good post William,
      I, for one, am very interested in seeing the Techie Tidbits for how Informatica provides data quality functionality. Please keep them coming. It will make a good repository of knowledge for when people Google topics like "informatica Data Quality confirm date". Hopefully you have other "bits in hand".
      Cheers, Gordon

    3. February 8, 2011 at 10:41 am

      Thanks, Gordon. More tidbits will be added as I get time to write them up. Subscribe and have them delivered to you! :) Thanks again for stopping by and commenting. Regarding "confirm date", are you referring to confirm that a string is indeed a date?

      • February 8, 2011 at 11:00 am

        Hi William, Sorry for my obfuscating verbiage, the "confirm date" reference was meant as an example query that might be made of Google. I thought your post described the function Is_Date very well, and that your blog post itself will become the subject of searches in the future. Good luck on building that subject repository!
