<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Data Quality Chronicle</title>
	<atom:link href="http://thedataqualitychronicle.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://thedataqualitychronicle.org</link>
	<description>Stuff I have learned, read, or think about ...</description>
	<lastBuildDate>Fri, 27 Apr 2012 19:36:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Model Citizen: Should Data Discovery Tools include modeling functionality?</title>
		<link>http://thedataqualitychronicle.org/model-citizen-should-data-discovery-tools-include-modeling-functionality/</link>
		<comments>http://thedataqualitychronicle.org/model-citizen-should-data-discovery-tools-include-modeling-functionality/#comments</comments>
		<pubDate>Fri, 13 Apr 2012 16:18:28 +0000</pubDate>
		<dc:creator>William Sharp</dc:creator>
				<category><![CDATA[data discovery]]></category>
		<category><![CDATA[data modeling]]></category>

		<guid isPermaLink="false">http://thedataqualitychronicle.org/?p=2194</guid>
		<description><![CDATA[Data models: Data Management 101 In my years as a consultant implementing data management solutions, my first question to a client would be … Can I see the data model? I have long felt that gaining a better understanding of an organization’s data landscape involves two primary artifacts: a data model and a data profile.  [...]]]></description>
			<content:encoded><![CDATA[<h2>Data models: Data Management 101</h2>
<p>In my years as a consultant implementing data management solutions, my first question to a client would be …</p>
<blockquote><p>Can I see the data model?</p></blockquote>
<p>I have long felt that gaining a better understanding of an organization’s data landscape involves two primary artifacts: a data model and a data profile.  This is because, in a lot of cases, these two artifacts represent the two most fundamental states of an organizational data landscape:</p>
<blockquote><p>what the data should look like and what it actually looks like</p></blockquote>
<p>With knowledge of these two states, I felt armed with the ability to quickly and easily identify areas of conformity and areas of anomaly.  These two perspectives tended to be the basis of most Information Technology questions I was there to solve.  In this way, I looked at a data model as the starter kit to a data management strategy, or a 101 crash course to an organization’s data management state.</p>
<p>If there were a lot of anomalies, I knew they would require a lot of data quality strategy and remediation, as well as a robust data governance initiative.  If there was a lot of conformity, I knew the organization was mature enough to handle new data management initiatives like Master Data Management or Big Data implementation.</p>
<p>The sad reality is that most organizations either did not have data models for critical applications or felt that the data model was so out-of-date that it was not going to be very helpful to me in my quest for understanding.</p>
<h2>Lack of Data Models: Data Management 100</h2>
<p>Without a viable data model, I was unable to reach these valuable conclusions quickly and was forced to, in a sense, reverse engineer the profiling results, which was time-consuming and based on some brash assumptions.</p>
<p>Here are some activities which helped me to mitigate the risks of performing educated guessing:</p>
<ol>
<li><span style="font-family: Arial;">Perform Orphan Analysis</span></li>
<ol>
<li><span style="font-family: Arial;">Analyzing orphans can help you determine the validity of a data model, how users are adding or deleting data and whether referential integrity constraints are even in place in production (which happens more than most will admit during interviews)</span></li>
</ol>
<li><span style="font-family: Arial;">Analyze Documented versus Actual Data Types</span></li>
<ol>
<li><span style="font-family: Arial;">Again, this addresses the validity of the design and how users are entering data (very often data is entered in formatted form, entry fields are used for purposes other than the original intention and developers build architectures without really understanding the scenarios that the app is required to support)</span></li>
</ol>
<li><span style="font-family: Arial;">Analyze most and least commonly occurring values</span></li>
<ol>
<li><span style="font-family: Arial;">This can help create a profile of how often standards are conformed to, word-of-mouth work-arounds in place and areas that are conforming and do not need attention (as valuable as identifying areas that <em>do</em> need attention)</span></li>
</ol>
</ol>
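<p>To make the first and third activities a little more concrete, here is a minimal Python sketch.  The tables, column names and sample rows are hypothetical stand-ins I made up for illustration; a real profiling tool would run the equivalent checks against the production database.</p>
<pre><code>from collections import Counter

# Hypothetical sample rows standing in for two production tables.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "state": "CA"},
    {"order_id": 11, "customer_id": 2, "state": "CA"},
    {"order_id": 12, "customer_id": 7, "state": "Calif."},
]

def find_orphans(children, parents, key):
    """Return child rows whose key value has no matching parent row."""
    parent_keys = {row[key] for row in parents}
    return [row for row in children if row[key] not in parent_keys]

def value_frequencies(rows, column):
    """Return (value, count) pairs ordered from most to least common."""
    return Counter(row[column] for row in rows).most_common()

orphans = find_orphans(orders, customers, "customer_id")
frequencies = value_frequencies(orders, "state")
print(orphans)
print(frequencies)
</code></pre>
<p>In this made-up data, order 12 points at a customer that does not exist, which suggests the referential integrity constraint is not enforced, and the rare value at the bottom of the frequency list hints at a standard that is not always followed.</p>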
<h2>Data Modeling Profiles: Data Management 102?</h2>
<p>Having been through this many times and knowing how much time and effort this requires (often not accounted for in project plans), I feel strongly about developing a tool that can turn data profiles into a data model.  Most of the functionality is there already; someone just needs to make a case for it (just call me somebody).</p>
<p>Such a solution could take profile results, which include actual data types and inferred relationships, and create a data model that supports data management best practices like data governance and data quality.</p>
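<p>As a rough illustration of what such a function might do, the sketch below turns a profile into table definitions.  The layout of the <code>profile</code> dictionary, the table names and the <code>profile_to_ddl</code> helper are all assumptions of mine, not the output format of any real profiling product.</p>
<pre><code># Hypothetical profile output: observed types, null counts, inferred keys.
profile = {
    "customer": {
        "columns": {
            "customer_id": {"type": "INTEGER", "nulls": 0},
            "postal_code": {"type": "VARCHAR(10)", "nulls": 42},
        },
        "primary_key": "customer_id",
        "foreign_keys": {},
    },
    "orders": {
        "columns": {
            "order_id": {"type": "INTEGER", "nulls": 0},
            "customer_id": {"type": "INTEGER", "nulls": 0},
        },
        "primary_key": "order_id",
        "foreign_keys": {"customer_id": "customer(customer_id)"},
    },
}

def profile_to_ddl(table, spec):
    """Build a CREATE TABLE statement from one table's profile."""
    lines = []
    for name, stats in spec["columns"].items():
        # A column with no observed nulls becomes a NOT NULL candidate.
        null_rule = "NOT NULL" if stats["nulls"] == 0 else "NULL"
        lines.append(f"  {name} {stats['type']} {null_rule}")
    lines.append(f"  PRIMARY KEY ({spec['primary_key']})")
    for column, target in spec["foreign_keys"].items():
        lines.append(f"  FOREIGN KEY ({column}) REFERENCES {target}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"

ddl = {table: profile_to_ddl(table, spec) for table, spec in profile.items()}
print(ddl["orders"])
</code></pre>
<p>The generated statements are only candidates.  A modeler would still need to review them, since a profile reflects what the data happens to look like today rather than what the business intends.</p>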
<p>In addition, a profiling-to-model function could go a long way in reducing the amount of time and error involved in building an MDM hub.  After all, profiling all the contributing sources is one of the best practices in defining an MDM hub, so why not take it to the next step and bake that in?</p>
<p>I completely agree that there are going to be cases where there were design considerations based on performance and that a profile is not <em>always</em> the most accurate source for a model’s design, but there are many cases where a profile-to-model function would increase accuracy and performance and decrease error and time required to model data landscapes.</p>
<p>What do you think?</p>
<p><a href="http://polldaddy.com/poll/6135039">Take Our Poll</a><br />
</p>

]]></content:encoded>
			<wfw:commentRss>http://thedataqualitychronicle.org/model-citizen-should-data-discovery-tools-include-modeling-functionality/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Discovery: a path to better ETL development</title>
		<link>http://thedataqualitychronicle.org/data-discovery-a-path-to-better-etl-development/</link>
		<comments>http://thedataqualitychronicle.org/data-discovery-a-path-to-better-etl-development/#comments</comments>
		<pubDate>Mon, 09 Apr 2012 16:25:12 +0000</pubDate>
		<dc:creator>William Sharp</dc:creator>
				<category><![CDATA[data discovery]]></category>
		<category><![CDATA[data profiling]]></category>
		<category><![CDATA[ETL]]></category>

		<guid isPermaLink="false">http://thedataqualitychronicle.org/?p=2166</guid>
		<description><![CDATA[In my last post I made the statement that one of the uses for data discovery was to produce better ETL design.  I wanted to back up that statement with a follow-up post on why I feel this way, some supporting research and how to go about achieving this enhanced design. Why data discovery leads [...]]]></description>
			<content:encoded><![CDATA[<p>In my last post I made the statement that one of the uses for data discovery was to produce better ETL design.  I wanted to back up that statement with a follow-up post on why I feel this way, some supporting research and how to go about achieving this enhanced design.</p>
<h1>Why data discovery leads to better ETL design</h1>
<p>Let’s start with why I feel this way.  Before I’d even heard of data quality I was doing it on a daily basis.  You see I spent several years as an ETL developer on many data warehousing implementation projects.</p>
<p>Typically after a couple of briefing meetings, I’d start developing ETL mappings.  Like any development effort, that was followed by some unit testing, where I would discover that although my ETL was written to specifications, the load didn’t “look right”.</p>
<p>After some digging, I usually found the culprit was that the source data did not match the expectations I had going into the development effort, and it was time to, at the very least, add some transforms to the mapping to accommodate the discrepancies.  In effect, I was performing two critical functions left out of the original development plan: data profiling and enhancement.  I feel strongly that had these two processes not been left out, I would have had a more complete and accurate ETL development experience from the get-go.</p>
<p>Unfortunately this was not an isolated event and, in fact, happened on almost every ETL project.  First-hand experience is why I feel so strongly that data discovery leads to a better development process and, ultimately, outcome.</p>
<h1>Supporting Research for Data Discovery in ETL Design</h1>
<p>In a fairly recent poll, the <a href="http://www.etltool.com/about-passionned-group/" target="_blank">Passionned Group</a>, an analyst and consultancy company based in the Netherlands that specializes in Business Intelligence, Data Integration and ETL tools, asked 2,000 participants what they thought were the most important requirements when choosing an ETL tool.  The results demonstrated just how important data discovery is to ETL developers.</p>
<p><a href="http://thedataqualitychronicle.org/wp-content/uploads/2012/04/etlpollproducesunexpectedresults.png"><img class="wp-image-2168" style="display: inline; margin-left: 0px; margin-right: 0px; border-width: 0px;" title="etl-poll-produces-unexpected-results" src="http://thedataqualitychronicle.org/wp-content/uploads/2012/04/etlpollproducesunexpectedresults_thumb.png" alt="etl-poll-produces-unexpected-results" width="345" height="187" border="0" /></a> As you can see, aside from performance, data profiling was the most important feature.  My intuition tells me that the people who responded to the poll had similar experiences to mine when developing ETL solutions.</p>
<p>Way back in 2001, William Laurent of Information Management wrote a piece entitled Best Practices for Data Warehouse Database Developers.  The number one best practice was <em>make sure you are provided with a usable data dictionary before starting heavy-duty development.</em>  Data discovery can help build that data dictionary without relying on assumptions and assertions made by business analysts and database administrators.</p>
<p>In defining what ETL is, the Passionned Group <a href="http://www.etltool.com/what-is-etl-extract-transform-and-load/" target="_blank">mentions</a> data profiling by explaining how it can help build a system</p>
<blockquote><p>that is robust and has a clear structure.</p></blockquote>
<p>The Data Warehouse Information organization, a site “Powered by &#8220;<strong>DWH Professionals</strong>&#8221;, &#8220;<strong>DWH Enthusiasts</strong>&#8221; and <strong>People alike</strong>”, graphically depicts data profiling in their recommended ETL design process.</p>
<p><a href="http://thedataqualitychronicle.org/wp-content/uploads/2012/04/ETLDataProfilingMain.gif"><img style="display: inline; margin-left: 0px; margin-right: 0px; border-width: 0px;" title="ETLDataProfilingMain" src="http://thedataqualitychronicle.org/wp-content/uploads/2012/04/ETLDataProfilingMain_thumb.gif" alt="ETLDataProfilingMain" width="292" height="163" align="left" border="0" /></a></p>
<p>Here is an important statement they make about the benefits of data profiling during ETL design.</p>
<blockquote><p>Data Profiling is a process that <strong>familiarizes you with the data you will be loading into the warehouse/mart</strong></p></blockquote>
<h1>So how do I use data discovery to achieve a better ETL design?</h1>
<p>As I mentioned in my previous <a href="http://thedataqualitychronicle.org/the-many-uses-of-data-discovery/" target="_blank">post</a>, I recommend starting with the following question:</p>
<blockquote><p>What are the critical data domains we are looking to integrate into the target?</p></blockquote>
<p>The reason I start with this seemingly basic question is so that you can build true discovery processes into the ETL design.  True discovery finds data unbeknownst to the data consumer that also needs to be included in the target.  To me, this is one of the most value-added services that the ETL team can provide to the data consumers.  Here is an example, taken from my previous experience, that demonstrates what I mean.</p>
<p>I had a marketing client that was looking to build a repository from which they could perform campaign management and analytics.  They had done a fair amount of quality due diligence and identified what <em>they felt </em>were the required sources.</p>
<p>When I asked my generic question there was a fair amount of dissent in the room and some even pointed to the source to target matrix (STTM) as my source of information.  However, I pressed on and discovered that some of the more executive users of the analytics were interested in performing analysis on customers who were marketed to but whose address of record, for which the source system was included in the STTM, was not deliverable (or was returned by the USPS).</p>
<p>As it turns out, this information was not stored in a source system but rather kept in a spreadsheet (of course) by one of the marketing administrators.  Knowing this allowed me to incorporate the spreadsheet in the ETL sources, but it also helped us build in another process that discovered and profiled address data in critical business applications, which were then included in an enrichment process so that undeliverable addresses could be updated with the proper addresses (where applicable).</p>
<p>Data discovery is a simple process once you know where to point the discovery tool.  This focus is obtained by asking the general but effective question I mentioned above.  Data domains, like address, help you ask more intelligent and specific questions like …</p>
<blockquote><p>what critical applications store, collect or consume address data?</p></blockquote>
<p>Once this is uncovered, data discovery works much the same way that data profiling works.  You define the source, build a connection, define and execute the profile jobs and decipher the results.</p>
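<p>Those four steps can be sketched in a few lines of Python.  This is only an illustration under my own assumptions: an in-memory SQLite database stands in for the real source, and the <code>contact</code> table and <code>profile_column</code> helper are hypothetical.</p>
<pre><code>import sqlite3

# Define the source and build a connection (an in-memory stand-in here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER, address TEXT)")
conn.executemany(
    "INSERT INTO contact VALUES (?, ?)",
    [(1, "12 Main St"), (2, None), (3, "12 Main St"), (4, "9 Elm Ave")],
)

def profile_column(connection, table, column):
    """Run one profile job: row, null, and distinct counts for a column."""
    # Table and column names are trusted constants here, not user input.
    sql = (
        f"SELECT COUNT(*), COUNT(*) - COUNT({column}), "
        f"COUNT(DISTINCT {column}) FROM {table}"
    )
    rows, nulls, distinct = connection.execute(sql).fetchone()
    return {"rows": rows, "nulls": nulls, "distinct": distinct}

# Execute the job, then decipher the results.
result = profile_column(conn, "contact", "address")
print(result)
</code></pre>
<p>Deciphering the result is the human part: one missing address and one duplicated address out of four rows would both be worth raising with the data consumers before any mapping is written.</p>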
<h1>Data Discovery for ETL Tips</h1>
<p>Here are a few tricks I use when performing data discovery for an ETL design proof of concept.</p>
<ol>
<li>Profile early and often</li>
<li>Translate data profiles into a metadata dictionary</li>
<li>Identify data anomalies</li>
<li>Never develop an ETL map from a specification, do it based on profile results</li>
<li>Communicate where metadata and data distributions do not match the business’s expectations and look for the root cause</li>
</ol>
<p>I know this list seems basic, but you’d be surprised how often it does not happen and how much rework and cost is incurred as a result.</p>
<p>Your thoughts?<br />
</p>

]]></content:encoded>
			<wfw:commentRss>http://thedataqualitychronicle.org/data-discovery-a-path-to-better-etl-development/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The many uses of data discovery</title>
		<link>http://thedataqualitychronicle.org/the-many-uses-of-data-discovery/</link>
		<comments>http://thedataqualitychronicle.org/the-many-uses-of-data-discovery/#comments</comments>
		<pubDate>Mon, 02 Apr 2012 21:18:34 +0000</pubDate>
		<dc:creator>William Sharp</dc:creator>
				<category><![CDATA[data discovery]]></category>
		<category><![CDATA[Application Lifecycle Management]]></category>
		<category><![CDATA[data management strategy]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[MDM]]></category>

		<guid isPermaLink="false">http://thedataqualitychronicle.org/?p=2036</guid>
		<description><![CDATA[Bloor Research defines data discovery as … the discovery of relationships between data elements, regardless of where the data is stored. If you expand your mind beyond the conventional relational database meaning of relationships, I agree with this definition.  Relationships in this context, or rather the context I chose to apply, mean much more than [...]]]></description>
			<content:encoded><![CDATA[<p>Bloor Research defines data discovery as …</p>
<blockquote><p>the discovery of relationships between data elements, regardless of where the data is stored.</p></blockquote>
<p>If you expand your mind beyond the conventional relational database meaning of relationships, I agree with this definition.  Relationships in this context, or rather the context I chose to apply, mean much more than a primary – foreign key relationship.</p>
<p>In this context a relationship is defined as commonality.  This commonality can be of a data type, value pattern, or business use.  If you can profile data and understand the relationships, you can set yourself up for more efficient data management practices in the areas of ETL, MDM, and application lifecycle (or application retirement).  Let’s take each of these and examine how data discovery can increase the quality of the effort.</p>
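<p>Before doing so, here is a small Python sketch of what commonality by value pattern can look like.  The column names and sample values are hypothetical, and the masking rule (digits become 9, letters become A) is just one common convention I am assuming for the example.</p>
<pre><code>from collections import Counter

def value_pattern(value):
    """Mask a value: digits become 9, letters become A, the rest is kept."""
    masked = []
    for char in value:
        if char.isdigit():
            masked.append("9")
        elif char.isalpha():
            masked.append("A")
        else:
            masked.append(char)
    return "".join(masked)

def dominant_pattern(values):
    """Return the most common mask among a column's values."""
    return Counter(value_pattern(v) for v in values).most_common(1)[0][0]

# Hypothetical columns from two unrelated systems.
columns = {
    "crm.contact_phone": ["404-555-0101", "770-555-0144"],
    "billing.tel": ["678-555-0199", "404-555-0123"],
    "billing.invoice_no": ["INV-00231", "INV-00232"],
}

# Group columns that share a dominant pattern, regardless of where they live.
groups = {}
for name, values in columns.items():
    groups.setdefault(dominant_pattern(values), []).append(name)
print(groups)
</code></pre>
<p>The two phone columns end up in the same group even though no key links them and their names differ, which is the kind of relationship a primary – foreign key view would miss.</p>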
<h2>ETL and Data Discovery</h2>
<p>Classical ETL takes data from a source and loads it to a target.  If you perform data discovery profiling on the sources before you build the ETL mapping you can achieve the following:</p>
<ul>
<li>a more accurate picture of the required data type of the attribute
<ul>
<li>By examining the profile you can determine if the assigned data type is most appropriate for the data element</li>
</ul>
</li>
<li>a more accurate specification for the type of transform required
<ul>
<li>If the data and metadata are not 100% coordinated you can build transforms to accommodate this</li>
</ul>
</li>
<li>identification of data anomalies and outliers which require further investigation for possible remediation prior to the migration of data
<ul>
<li>this leads to a more robust error handling and exception handling process</li>
</ul>
</li>
<li>the identification of data, previously unknown, that meets the business requirements and needs to be migrated
<ul>
<li>discovery can lead to uncovering data that was previously undefined or unobtainable for data migrations</li>
</ul>
</li>
</ul>
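<p>The first two points can be illustrated with a short Python sketch that compares documented data types against what the values actually look like.  The column names, the sample values and the very simple <code>infer_type</code> rules are assumptions for the example; real profiling tools use far richer inference.</p>
<pre><code>from datetime import datetime

def infer_type(values):
    """Infer the narrowest type that fits every non-empty value."""
    present = [v.strip() for v in values if v is not None and v.strip()]
    if not present:
        return "unknown"

    def all_parse(parser):
        # True only when the parser accepts every value without error.
        for value in present:
            try:
                parser(value)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "text"

# Hypothetical documented types and sampled source values.
documented = {"quantity": "integer", "ship_date": "date", "zip": "integer"}
samples = {
    "quantity": ["3", "12", "7"],
    "ship_date": ["2012-04-09", "04/10/2012"],
    "zip": ["30301", "30301-1234"],
}

# Keep only the columns where the data disagrees with the documentation.
mismatches = {
    column: (documented[column], infer_type(values))
    for column, values in samples.items()
    if infer_type(values) != documented[column]
}
print(mismatches)
</code></pre>
<p>Each mismatch is a candidate for a transform in the mapping (for example, standardizing the date format) or for a question back to the business about what the field is really used for.</p>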
<p>As discovery tools mature, it may also be possible to generate ETL mappings directly from the tool.  If the target is more richly defined in the discovery tool and the sources are more accurately identified, it makes sense to me that a discovery tool can build a better ETL mapping.</p>
<p>This will require a tight coupling between the discovery and ETL tool; however, there are vendors in the market with this type of coupling available to them.</p>
<h2>MDM and Data Discovery</h2>
<p>In the same way that data discovery can aid ETL, so too can it aid the efforts of an MDM implementation.  Since MDM implementations are so dependent on ETL, the same leverage is available and can lead to a better MDM hub definition and ETL specification.</p>
<p>Here, too, a feature to generate a data mapping can be particularly useful.  With so much configuration required for match and merge rules, cutting some development from the scope of the effort would only add benefit.</p>
<p>Another particularly interesting feature would be the ability to generate candidate schemas for the MDM hub based on the data and metadata obtained in the profiles of the sources.</p>
<h2>Application Lifecycle and Data Discovery</h2>
<p>Finally, during a data discovery investigation it is possible to segment data by date ranges derived from last create / update dates.  This can be leveraged to perform application and/or data lifecycle management, which would basically archive data past a certain date line or retire an application which has not been accessed since a predetermined, business-driven date.</p>
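<p>A minimal Python sketch of that segmentation follows.  The records, the <code>last_updated</code> field and the cutoff date are hypothetical; in practice the cutoff would come from the business and the dates from the profiled source.</p>
<pre><code>import bisect
from datetime import date

# Hypothetical records with the last update date found during discovery.
records = [
    {"id": 1, "last_updated": date(2005, 6, 30)},
    {"id": 2, "last_updated": date(2011, 11, 2)},
    {"id": 3, "last_updated": date(2008, 1, 15)},
]

def split_by_cutoff(rows, cutoff):
    """Return (archive_candidates, active_rows) split at the cutoff date."""
    ordered = sorted(rows, key=lambda row: row["last_updated"])
    dates = [row["last_updated"] for row in ordered]
    # bisect_left finds the first row updated on or after the cutoff.
    split = bisect.bisect_left(dates, cutoff)
    return ordered[:split], ordered[split:]

archive, active = split_by_cutoff(records, date(2010, 1, 1))
print([row["id"] for row in archive])
print([row["id"] for row in active])
</code></pre>
<p>Rows in the first list are candidates for archiving; an application whose data falls entirely in that list is a candidate for retirement.</p>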
<p>Here is another use for dynamically generated data mappings which would migrate the retired data to a target or archive destination.</p>
<h2>Discovery is only the first step</h2>
<p>As you can see from this quick summary, there are many uses for data discovery and as the tools mature there are many more things that can be done to leverage a discovery effort.</p>
<p>Your thoughts?<br />
</p>

]]></content:encoded>
			<wfw:commentRss>http://thedataqualitychronicle.org/the-many-uses-of-data-discovery/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When one door closes &#8230;</title>
		<link>http://thedataqualitychronicle.org/when-one-door-closes/</link>
		<comments>http://thedataqualitychronicle.org/when-one-door-closes/#comments</comments>
		<pubDate>Mon, 02 Apr 2012 15:17:39 +0000</pubDate>
		<dc:creator>William Sharp</dc:creator>
				<category><![CDATA[data quality]]></category>

		<guid isPermaLink="false">http://thedataqualitychronicle.org/?p=2031</guid>
		<description><![CDATA[The famous saying is that when one door closes, a window opens.&#160; Such is the case with my career in data management.&#160; I have hung up my consulting cleats and have donned the headset of product management.&#160; I will be joining the world class team at Informatica to take the Data Explorer product to the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thedataqualitychronicle.org/wp-content/uploads/2012/04/window.jpg"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="window" border="0" alt="window" src="http://thedataqualitychronicle.org/wp-content/uploads/2012/04/window_thumb.jpg" width="244" height="209"></a> The famous saying is that when one door closes, a window opens.&nbsp; Such is the case with my career in data management.&nbsp; I have hung up my consulting cleats and have donned the headset of product management.&nbsp; </p>
<p>I will be joining the world class team at Informatica to take the Data Explorer product to the next level.&nbsp; Data Explorer (DE) is a profiling and discovery engine that focuses on the reporting of data distributions and metadata definition.</p>
<p>There could be no better place for me to land than Informatica.&nbsp; I have been working with their products for close to 6 years now and find them among the easiest to learn and use in the marketplace.</p>
<p>I intend to use my knowledge gained as a consultant performing countless implementations to enhance the product through increased functionality, usability and transfer of functions with numerous products such as PowerCenter, Lifecycle Management, Metadata Manager, and, of course,&nbsp; Data Quality.</p>
<p>I’d like to extend my gratitude to the colleagues and clients that have helped me get to this point in my career.&nbsp; </p>
<p>With this move comes a shift for the blog.&nbsp; I will primarily focus on those topic areas that influence or affect data discovery.&nbsp; I’ll post updates about the Data Explorer product and poll the masses for ideas and feedback.&nbsp; </p>
<p>I hope you enjoy the new and exciting path I am about to embark on as much as I will!</p>


]]></content:encoded>
			<wfw:commentRss>http://thedataqualitychronicle.org/when-one-door-closes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Justin Bieber abuses Twitter but proves how similar phone numbers are</title>
		<link>http://thedataqualitychronicle.org/justin-bieber-abuses-twitter-but-proves-how-similar-phone-numbers-are/</link>
		<comments>http://thedataqualitychronicle.org/justin-bieber-abuses-twitter-but-proves-how-similar-phone-numbers-are/#comments</comments>
		<pubDate>Fri, 30 Mar 2012 03:45:57 +0000</pubDate>
		<dc:creator>William Sharp</dc:creator>
				<category><![CDATA[data matching]]></category>
		<category><![CDATA[data matching methodology]]></category>
		<category><![CDATA[data quality]]></category>

		<guid isPermaLink="false">http://thedataqualitychronicle.org/?p=2027</guid>
		<description><![CDATA[Look, I can&#8217;t believe I have managed to work Justin Bieber into a data management blog, but I have. When Bieber tweeted 9 digits of his phone number and asked his Twitter followers to guess the 10th digit and call him, he set two unsuspecting victims&#8217; phones on fire. He also proved how similar phone [...]]]></description>
			<content:encoded><![CDATA[<p>Look, I can&#8217;t believe I have managed to work Justin Bieber into a data management blog, but I have.</p>
<p>When Bieber tweeted 9 digits of his phone number and asked his Twitter followers to guess the 10th digit and call him, he set two unsuspecting victims&#8217; phones on fire. He also proved how similar phone numbers are and why they are bad candidates for match strategies.</p>
<p>Using phone numbers in match strategies is, in my opinion, a waste of time.  You are only going to increase your chances of generating false positives (unless you do an exact match).  My issue with the exact matches is that it is very easy to make a &#8220;fat-finger&#8221; error and still identify a false positive.</p>
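<p>To put a number on how similar phone numbers are, here is a small Python sketch.  The nine known digits are made up (they are not the digits from the tweet) and the similarity measure, the share of matching positions, is a simplification I am assuming in place of whatever algorithm a given matching tool uses.</p>
<pre><code>def digit_differences(a, b):
    """Count positions where two equal-length digit strings differ."""
    if len(a) != len(b):
        raise ValueError("numbers must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

# Hypothetical numbers: nine known digits plus every possible tenth digit.
known_digits = "555123456"
candidates = [known_digits + str(d) for d in range(10)]

# Every pair of distinct candidates differs in exactly one position, so a
# fuzzy rule that tolerates one wrong digit treats all ten as the same phone.
pair_distances = {
    digit_differences(a, b)
    for a in candidates
    for b in candidates
    if a != b
}
similarity = 1 - max(pair_distances) / len(candidates[0])
print(pair_distances, similarity)
</code></pre>
<p>Ten different households score as 90% similar to one another on this measure, which is why a fuzzy match on phone number alone invites false positives.</p>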
<p>If you care, here is the link to the Bieber event.  If you are a Bieber fan and found your way to my blog by some search engine failure, my apologies (please don&#8217;t terrorize me).</p>
<p><a href="http://www.technolog.msnbc.msn.com/technology/technolog/justin-bieber-abuses-twitter-phone-gag-may-get-sued-596123">Justin Bieber abuses Twitter with phone gag, may get sued &#8211; Technolog on msnbc.com</a>.<br />
</p>

]]></content:encoded>
			<wfw:commentRss>http://thedataqualitychronicle.org/justin-bieber-abuses-twitter-but-proves-how-similar-phone-numbers-are/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>MDM and address data cleansing tips &#124; LinkedIn</title>
		<link>http://thedataqualitychronicle.org/mdm-and-address-data-cleansing-tips-linkedin/</link>
		<comments>http://thedataqualitychronicle.org/mdm-and-address-data-cleansing-tips-linkedin/#comments</comments>
		<pubDate>Tue, 27 Mar 2012 00:32:44 +0000</pubDate>
		<dc:creator>William Sharp</dc:creator>
				<category><![CDATA[address validation]]></category>
		<category><![CDATA[data cleansing]]></category>
		<category><![CDATA[data quality]]></category>
		<category><![CDATA[data profiling]]></category>
		<category><![CDATA[MDM]]></category>

		<guid isPermaLink="false">http://thedataqualitychronicle.org/?p=2017</guid>
		<description><![CDATA[I came across this discussion on LinkedIn that I wanted to share.  A group member asks about cleansing address data with regard to an MDM solution. Having done this before, I recommend using an address service, Address Doctor, and profiling the data prior to building the cleansing rules. The latter is a topic I speak [...]]]></description>
			<content:encoded><![CDATA[<p>I came across this discussion on LinkedIn that I wanted to share.  A group member asks about cleansing address data with regard to an MDM solution.</p>
<p>Having done this before, I recommend using an address service, Address Doctor, and profiling the data prior to building the cleansing rules.</p>
<p>The latter is a topic I speak often about.  Profiling is an essential part of defining cleansing rules.  Well, meaningful cleansing rules.  Anyone can &#8220;stub&#8221; in those generic rules, but after profiling the data you can gain much more contextual insight and build customized data cleansing rules.</p>
<p>Check out the whole discussion here:</p>
<p><a href="http://www.linkedin.com/groupItem?view=&amp;gid=2390170&amp;type=member&amp;item=98666018&amp;commentID=74234124&amp;report%2Esuccess=8ULbKyXO6NDvmoK7o030UNOYGZKrvdhBhypZ_w8EpQrrQI-BBjkmxwkEOwBjLE28YyDIxcyEO7_TA_giuRN#commentID_74234124">I&#8217;m a long-time IT professional, but a newcomer to MDM and data cleansing. Any tips on cleansing Address data? | LinkedIn</a>.<br />
</p>

]]></content:encoded>
			<wfw:commentRss>http://thedataqualitychronicle.org/mdm-and-address-data-cleansing-tips-linkedin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Clues to a Great Business Intelligence (Data) Story</title>
		<link>http://thedataqualitychronicle.org/clues-to-a-great-business-intelligence-data-story/</link>
		<comments>http://thedataqualitychronicle.org/clues-to-a-great-business-intelligence-data-story/#comments</comments>
		<pubDate>Mon, 26 Mar 2012 18:11:12 +0000</pubDate>
		<dc:creator>William Sharp</dc:creator>
				<category><![CDATA[data discovery]]></category>

		<guid isPermaLink="false">http://thedataqualitychronicle.org/?p=2015</guid>
		<description><![CDATA[I highly recommend reading Cindy Harder&#8217;s interesting piece on telling stories with data.  Even though she was specifically referencing BI stories, she does allude to doing the same with data analysis and data discovery in the article. I was just thinking about this topic of making data discovery fun last night so this article really [...]]]></description>
			<content:encoded><![CDATA[<p>I highly recommend reading <a href="http://www.visualdatagroup.com/user/7" target="_blank">Cindy Harder&#8217;s</a> interesting piece on telling stories with data.  Even though she was specifically referencing BI stories, she does allude to doing the same with data analysis and data discovery in the article.</p>
<p>I was just thinking about this topic of making data discovery fun last night so this article really spoke to me.  I think Cindy is dead-on when she reminds us to make sure we engage our audience with a storyboard around data analysis that makes the user want to know the ending.</p>
<p>This is often forgotten, albeit challenging, with regard to presenting data analysis.  I struggle, at times, with how to make presentations interesting.  And let&#8217;s face it, snazzy icons in a PowerPoint deck do not count as entertaining.</p>
<p>What Cindy emphasizes in the article is to engage users with data stories using these five basic principles when writing your data story:</p>
<ol>
<li>Refresh your data often</li>
<li>Build a complete dashboard, but don&#8217;t over complicate it (tricky and important!)</li>
<li>Encourage further investigation with data discoveries (my favorite!)</li>
<li>Analyzing data is fun, not just a job</li>
<li>Draw conclusions with your analysis that are accurate and meaningful</li>
</ol>
<p>To me, points 3 &amp; 5 are the really important concepts here.  I think you can facilitate further investigation with meaningful results.  And the key lies in the term meaningful.  To do this effectively, you need to bear in mind your audience.  Meaningful to a controller is not the same as it is to a DBA.  However, data discovery activities can support both these roles and you need to be sure to deliver something they care about in your story.  I think, if you do, that individual will be compelled to conduct further investigations, which is where Cindy&#8217;s point of being accurate is important.  Make sure you are on point in your analysis!</p>
<p>Read the referenced article here:</p>
<p><a href="http://www.visualdatagroup.com/Clues_to_Great_Business_Intelligence_Story">Clues to a Great Business Intelligence Story | Visual Data Group</a>.</p>
<p>Check out Cindy on Twitter: <a href="https://twitter.com/#!/CindyBHarder" target="_blank">@CindyBHarder</a><br />
</p>

]]></content:encoded>
			<wfw:commentRss>http://thedataqualitychronicle.org/clues-to-a-great-business-intelligence-data-story/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Discovery Discussion on LinkedIn</title>
		<link>http://thedataqualitychronicle.org/data-discovery-discussion-on-linkedin/</link>
		<comments>http://thedataqualitychronicle.org/data-discovery-discussion-on-linkedin/#comments</comments>
		<pubDate>Fri, 23 Mar 2012 14:15:06 +0000</pubDate>
		<dc:creator>William Sharp</dc:creator>
				<category><![CDATA[data discovery]]></category>

		<guid isPermaLink="false">http://thedataqualitychronicle.org/?p=1996</guid>
		<description><![CDATA[Recently I posed a question in the LinkedIn group for The Data Quality Pro. Dylan Jones, editor of The Data Quality Pro, and I had a nice conversation about data discovery, when to use it and some tools to use to discover data. Join the discussion! I&#8217;m curious about how folks here are using data [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I posed a question in the LinkedIn group for <a href="http://www.dataqualitypro.com/" target="_blank">The Data Quality Pro</a>. Dylan Jones, editor of The Data Quality Pro, and I had a nice conversation about data discovery, when to use it and some tools to use to discover data.</p>
<p>Join the discussion!</p>
<p><a href="http://www.linkedin.com/groupItem?view=&amp;gid=1061007&amp;type=member&amp;item=100681247&amp;commentID=73868126&amp;goback=%2Egmr_1061007&amp;report%2Esuccess=8ULbKyXO6NDvmoK7o030UNOYGZKrvdhBhypZ_w8EpQrrQI-BBjkmxwkEOwBjLE28YyDIxcyEO7_TA_giuRN#commentID_73868126">I&#8217;m curious about how folks here are using data discovery and what tools they are using to do discovery #datadiscovery | LinkedIn</a>.<br />
</p>

]]></content:encoded>
			<wfw:commentRss>http://thedataqualitychronicle.org/data-discovery-discussion-on-linkedin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Data Explosion: Opportunities and Challenges Abound</title>
		<link>http://thedataqualitychronicle.org/the-data-explosion-opportunities-and-challenges-abound/</link>
		<comments>http://thedataqualitychronicle.org/the-data-explosion-opportunities-and-challenges-abound/#comments</comments>
		<pubDate>Fri, 23 Mar 2012 01:08:55 +0000</pubDate>
		<dc:creator>William Sharp</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[CRM]]></category>
		<category><![CDATA[value proposition]]></category>
		<category><![CDATA[data discovery]]></category>

		<guid isPermaLink="false">http://thedataqualitychronicle.org/?p=1968</guid>
		<description><![CDATA[It is an interesting time to be in data management.  There are more sources of data in so many varied formats than ever before.  There are new tools continuously evolving at light speed.  There is the promise of opportunity and with it enormous challenges. With regard to the opportunities, one of the most interesting things [...]]]></description>
			<content:encoded><![CDATA[<p>It is an interesting time to be in data management.  There are more sources of data, in more varied formats, than ever before.  New tools are evolving at light speed.  There is the promise of opportunity and, with it, enormous challenges.</p>
<p>With regard to the opportunities, one of the most interesting things I see developing is increased access to customers.  From traditional to mobile platforms, there are more avenues to interact with customers, giving product and service providers new ways to measure their effectiveness.  I have started researching things like sentiment analysis, which is an example of how access to customers and the data explosion provide insight into product / service perception.</p>
<p>With regard to the challenges, performing analysis on this data requires tools, methodologies, and resources that are unconventional.  For most organizations, it will take some time to align the resources to perform meaningful analysis.  That is not even taking into account the budget that needs to be set aside for this activity.</p>
<p>While the technology industry is thrilled with its new story filled with magical elephants and all the promise of a new reality, the boots-on-the-ground in data management must feel like a deer caught in the headlights of an oncoming 18-wheeler at 90 mph.  To some, the data explosion must feel like fireworks against the warm summer sky; to others, it must feel like the pounding of cannon fire against the office wall.</p>
<p>What I think is very important to realize is that this data explosion is really both at the same time.  We need to be realistic and remember that while there is a lot of data out there, and with it the promise of gaining new insights, it presents significant challenges to organizations in just how they are going to roll this into the mix of things they already need to do.</p>
<p>I intend to keep my eye on which organizations come out winners and, maybe even more so, which organizations come out as losers in this new data frontier.  One of the things I intend to pay particular attention to is ROI: what it costs to do this well and what it produces.</p>
<p>Until I see what that looks like, I am going to hold off getting giddy about big data / NoSQL … what about you?  Are you “all in” or waiting to see how this goes?</p>


]]></content:encoded>
			<wfw:commentRss>http://thedataqualitychronicle.org/the-data-explosion-opportunities-and-challenges-abound/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Next Hadoop confirms data as a platform &#124; Business Intelligence &#8211; InfoWorld</title>
		<link>http://thedataqualitychronicle.org/next-hadoop-confirms-data-as-a-platform-business-intelligence-infoworld/</link>
		<comments>http://thedataqualitychronicle.org/next-hadoop-confirms-data-as-a-platform-business-intelligence-infoworld/#comments</comments>
		<pubDate>Tue, 13 Mar 2012 00:46:17 +0000</pubDate>
		<dc:creator>William Sharp</dc:creator>
				<category><![CDATA[data quality]]></category>
		<category><![CDATA[BigData]]></category>
		<category><![CDATA[Hadoop]]></category>

		<guid isPermaLink="false">http://thedataqualitychronicle.org/?p=1958</guid>
		<description><![CDATA[I just finished reading this brief article and was quite interested in the implications of Hadoop maturing into a platform for data-driven applications. In the article, Brian Proffitt of IT World details Hadoop VP Arun Murthy&#8217;s Strata presentation, which describes how Hadoop is expanding the types of applications that Hadoop&#8217;s MapReduce will be able [...]]]></description>
			<content:encoded><![CDATA[<p>I just finished reading this brief article and was quite interested in the implications of Hadoop maturing into a platform for data-driven applications.</p>
<p>In the article, Brian Proffitt of IT World details Hadoop VP Arun Murthy&#8217;s Strata presentation, which describes how Hadoop is expanding the types of applications that Hadoop&#8217;s MapReduce will be able to support.</p>
<p>This expansion was compared to an operating system.  While I can&#8217;t quite see that analogy all the way through, I do see the strategic impact of being able to build big data applications against a Hadoop framework.</p>
<p>This type of expansion could free software developers to concentrate on the more front-end, user-facing aspects of application features, something I have long thought would be a significant challenge.</p>
<p>I am eager to see this framework in action.  I&#8217;ll hold off on judgement until then &#8230;</p>
<p><a href="http://www.infoworld.com/d/business-intelligence/next-hadoop-confirms-data-platform-188480">Next Hadoop confirms data as a platform | Business Intelligence &#8211; InfoWorld</a>.</p>
<p>&nbsp;<br />
</p>

]]></content:encoded>
			<wfw:commentRss>http://thedataqualitychronicle.org/next-hadoop-confirms-data-as-a-platform-business-intelligence-infoworld/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>


