Talend Posts

April 30, 2011
By William Sharp

Talend is one of the largest pure play vendors of open source software, offering a breadth of middleware solutions that address both data management and application integration needs. [1]

Talend is quite a compelling data integration tool, particularly for smaller-budget organizations looking for an enterprise solution. Talend is rated in the Gartner Magic Quadrant for data integration tools, which adds to its credibility.

Talend is an open source vendor that offers various solutions, including data quality, master data management, and data integration.

I want to share my experiences with Talend data integration solutions here.

Flexibility: The Advantage of Talend’s Matching Techniques

One of the most interesting things about Talend’s matching technology is that it provides both deterministic and probabilistic options. In my opinion, this is a unique approach that allows for flexibility in creating a match solution. I see advantages to using these techniques in combination, which could increase the number of true positives and, perhaps more importantly, decrease the number of false positives.

Deterministic matching is a rules-based approach that uses “fuzzy” matching algorithms, which are based on the number of changes required to make two or more strings equivalent. This technique, one of my favorites, is based on the theory that data entry can be flawed, and it accounts for these flaws by counting the character edits that separate two values. For example, deterministic matching will identify Smith and Smyth as a possible match because only one substitution is required to make Smyth equal to Smith (the exchange of an i for the y).

Talend offers the following deterministic algorithms: Soundex, Metaphone, Levenshtein, and Jaro-Winkler.

I have covered Soundex in a previous post and, while it has its applications, I personally do not recommend it for matching. Soundex assigns a code to each string, and matching is done on the codes assigned. [...]
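To make the Smith and Smyth example concrete, here is a minimal Python sketch of two of the techniques named above: a Levenshtein edit distance and a basic American Soundex code. This is not Talend's implementation; the function names `levenshtein` and `soundex` are my own, and a real tool may normalize case, punctuation, and non-English characters differently.

```python
# Illustrative only: function names are hypothetical, not Talend APIs.

def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb
            ))
        previous = current
    return previous[-1]


# Letter-to-digit table used by American Soundex.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the four-character Soundex code for a name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # vowels reset prev, so a repeated code counts again
    return (first + "".join(digits) + "000")[:4]


print(levenshtein("Smith", "Smyth"))       # one substitution apart
print(soundex("Smith"), soundex("Smyth"))  # both names share one code
```

With the edit distance, a match rule can be expressed as a threshold (for instance, treat names within one edit as candidates). With Soundex, two names are candidates only when their codes are identical, so the comparison is all or nothing.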

Thanks for taking the time to visit the weblog!

William Sharp

