With the advent of web applications like eBay and Twitter, big data solutions have become all the rage. Being a true data fanatic, I have decided to log what I am reading, and my thoughts on it, here on the blog.
In simplest terms, the phrase "big data" refers to the tools, processes and procedures that allow an organization to create, manipulate, and manage very large data sets and storage facilities. [1]
While I am sure the data quality freak in me will come out, I intend to cover big data from a technology perspective, as I find it very interesting.
Hope you enjoy!
Big Data … Little Data Quality
Is Big Data better Data Quality?
Big Data is everywhere. Chances are you’ve used a big data solution today. However, are big data solutions delivering big data quality?
High Availability versus High Data Quality
Typically, Big Data solutions are designed to ensure high availability. High availability is based on the concept that it is more important to collect and store data transactions than it is to determine the uniqueness or accuracy of each transaction. Some common examples of big data / high availability solutions are Twitter and Facebook.
It is possible to configure a big data solution to validate uniqueness and accuracy. I want to make sure I state that clearly. However, doing so means sacrificing some aspects of high availability. So, in some regard, big data and data quality are at odds.
This is because one of the fundamental aspects of high availability is to write transactions to whichever node is available. In this model, consistency of transactional data is sacrificed in the name of data capture. Most often, consistency is established eventually, on data inquiries (data reads) as opposed to data writes. In other words, at some given point in time [...]
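To make the "write to whichever node is available, sort out consistency on read" idea concrete, here is a minimal sketch in Python. It is a toy model of my own, not how Twitter, Facebook, or any particular product works: the names `Node`, `ToyCluster`, `write`, and `read` are hypothetical, versions are a simple counter rather than real timestamps, and conflicts are settled by "highest version wins".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical storage node holding key -> (version, value)."""
    name: str
    available: bool = True
    store: dict = field(default_factory=dict)


class ToyCluster:
    """Toy model: writes favor availability, reads reconcile replicas."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.clock = 0  # logical counter standing in for a timestamp

    def write(self, key, value):
        # Accept the write on the first available node.
        # No uniqueness or accuracy check is made against other nodes.
        self.clock += 1
        for node in self.nodes:
            if node.available:
                node.store[key] = (self.clock, value)
                return node.name
        raise RuntimeError("no node available")

    def read(self, key):
        # Reconcile on read: collect every reachable copy, keep the one
        # with the highest version, and repair the stale replicas.
        reachable = [n for n in self.nodes if n.available]
        copies = [n.store[key] for n in reachable if key in n.store]
        if not copies:
            return None
        newest = max(copies, key=lambda copy: copy[0])
        for n in reachable:
            n.store[key] = newest
        return newest[1]


cluster = ToyCluster(["a", "b"])
first = cluster.write("user:1", "old@example.com")   # lands on node "a"
cluster.nodes[0].available = False                   # node "a" goes down
second = cluster.write("user:1", "new@example.com")  # lands on node "b"
cluster.nodes[0].available = True                    # node "a" comes back

# At this point the two nodes disagree about user:1.
stale = cluster.nodes[0].store["user:1"][1]
current = cluster.read("user:1")  # the read resolves the disagreement
```

Notice that both writes succeed even though one node was down, which is the availability win. The price is the window between the second write and the read, during which node "a" still holds the old value. That window is exactly where data quality can suffer.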
Thanks for taking the time to visit the weblog!
William Sharp