15 Responses

  1. Per Olsson
    Per Olsson December 10, 2009 at 1:02 pm

    This will be fun to follow. Thanks for an interesting post!

    1. wesharp
      wesharp December 10, 2009 at 1:45 pm

      Per Olsson: Glad you are interested in following the series! I love the enthusiasm you express by using the word "fun"! I'll try to meet that expectation!

  2. Informatica Data Quality Workbench Matching Algorithms « Data … | Suporte de Informática

    [...] post:  Informatica Data Quality Workbench Matching Algorithms « Data … [...]

  3. Dalton Cervo
    Dalton Cervo December 10, 2009 at 6:22 pm

    That sounds very interesting! I'll sure be following it too.

    Thanks!

    1. wesharp
      wesharp December 10, 2009 at 10:17 pm

      Dalton,
      Glad you are interested. I'll try and keep the postings spicy! Thanks for the comment. It is energizing!

  4. Peter Jaumann
    Peter Jaumann December 16, 2009 at 10:49 pm

    Interesting! We've implemented most of these plus more but not
    'Bigrams' yet. Will be curious to see further expos on this.
    How is validation done and what are the results from that?
    We use decision tree (DT) validation.

    1. wesharp
      wesharp December 17, 2009 at 9:37 am

      Peter, glad you enjoyed the post! I'll be sure to send you an alert when it is time to expand on Bigram algorithm matching and its benefits. As for validation, are you interested in match validation or address validation? Thanks for the comment! It is always beneficial to hear from readers!

  5. Peter Jaumann
    Peter Jaumann December 17, 2009 at 10:39 pm

    wesharp,
    I should have been more specific: match validation/analysis on both records that matched and records that didn't match.

    1. wesharp
      wesharp December 18, 2009 at 12:41 am

      To the best of my knowledge there is no automated way to do this. I typically facilitate this exercise with predefined use cases of known duplicates. I load my match results into a table and use SQL to analyze their validity.
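
A minimal sketch of that kind of check, assuming the match output can be reduced to pairs of record IDs and that the known duplicates are listed the same way. The table names (`known_dupes`, `match_results`), column names, and the `validate_matches` helper are hypothetical and not part of IDQ. SQLite stands in for whatever database holds the results.

```python
import sqlite3


def validate_matches(known_pairs, matched_pairs):
    """Compare match output against known duplicate pairs using SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE known_dupes (id_a INTEGER, id_b INTEGER)")
    conn.execute("CREATE TABLE match_results (id_a INTEGER, id_b INTEGER)")

    # Store each pair with the smaller ID first so (1, 2) and (2, 1) compare equal.
    def normalize(pairs):
        return sorted({(min(a, b), max(a, b)) for a, b in pairs})

    conn.executemany("INSERT INTO known_dupes VALUES (?, ?)", normalize(known_pairs))
    conn.executemany("INSERT INTO match_results VALUES (?, ?)", normalize(matched_pairs))

    # Matched pairs that are confirmed duplicates.
    true_pos = conn.execute(
        "SELECT COUNT(*) FROM match_results m "
        "JOIN known_dupes k ON m.id_a = k.id_a AND m.id_b = k.id_b"
    ).fetchone()[0]

    # Matched pairs that are not in the known-duplicate list.
    false_pos = conn.execute(
        "SELECT m.id_a, m.id_b FROM match_results m "
        "LEFT JOIN known_dupes k ON m.id_a = k.id_a AND m.id_b = k.id_b "
        "WHERE k.id_a IS NULL ORDER BY m.id_a, m.id_b"
    ).fetchall()

    # Known duplicates the matching step missed.
    false_neg = conn.execute(
        "SELECT k.id_a, k.id_b FROM known_dupes k "
        "LEFT JOIN match_results m ON m.id_a = k.id_a AND m.id_b = k.id_b "
        "WHERE m.id_a IS NULL ORDER BY k.id_a, k.id_b"
    ).fetchall()

    conn.close()
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }


# Illustrative data only: three known duplicate pairs, three pairs from the matcher.
report = validate_matches(
    known_pairs=[(1, 2), (3, 4), (5, 6)],
    matched_pairs=[(2, 1), (3, 4), (7, 8)],
)
print(report)
```

The false positives and false negatives come back as lists of ID pairs, so they can be reviewed by hand, which covers both records that matched and records that didn't.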

      1. vijji
        vijji February 18, 2011 at 11:42 am

        Hi wesharp,

        Thanks for your replies.
        I am working on an IDQ plan to eliminate duplicate records coming from the source.
        I am using the following components in my plan and am not able to export the plan into Informatica PowerCenter Designer as a mapplet.
        Components: Group Source and Group Target.

        Can you please advise me on this?

        Thanks very much

        1. William Sharp
          William Sharp February 18, 2011 at 12:16 pm

          Thanks for the comment, vijji. I hate to answer a question with a question, but I am afraid I need some additional information before I can give you a firm answer.
          What version of IDQ and PowerCenter are you using?
          Have you tried to validate your IDQ plan?
          Are you using an IDQ mapping or a mapplet?

          Looking forward to your answers!
          Regards,
          William

  6. Sue Corwin
    Sue Corwin December 28, 2009 at 8:46 pm

    Nice post. I don't have any experience with IDQ, but I've done quite a bit of matching work using the UTL_MATCH package and Jaro-Winkler in Oracle. Very interested in learning more about the DQ tools and how they simplify this work.

  7. wesharp
    wesharp December 28, 2009 at 10:10 pm

    Sue – Thanks for your comment. I am not familiar with the process in Oracle but I'd be happy to discuss the process using Informatica with you in depth. As we continue in this series, please feel free to ask specific questions.

  8. Information and Data Quality Blog Carnival, February 2010 « Liliendahl on Data Quality

    [...] Informatica Data Quality Workbench Matching Algorithms is part of a series of postings where William details the various algorithms available in Informatica Data Quality (IDQ) Workbench. In this post William starts by giving a quick overview of the algorithms available and some typical uses for each. The subsequent postings get more detailed and outline the math behind the algorithms, and the series will finish with some baseline comparisons using a single set of data. [...]

  9. 2010 in review « The Data Quality Chronicle

    [...] Informatica Data Quality Workbench Matching Algorithms December 2009 12 comments 3 [...]
