Sunday, April 1, 2012

Traditional and New school of data management


Data management is constantly evolving. The emphasis has shifted toward availability, scalability, flexibility, multimedia, and large-scale ("big") data.

A few questions stand out:

What is the vision and concept behind this evolution?

What capabilities do big data systems offer compared with traditional databases?

What is the impact of these capabilities on consumer industries?

How can these new capabilities enable new ways of thinking about problems that could not be solved before?

I found that the following references answer some of these questions:

First, in [1], Jim Gray emphasized the need for new data stores that can accommodate large-scale data. Such stores can emerge from a “synthesis of database systems and file systems” as file systems grow into petabyte-scale archives with billions of files. I think this paper provides enough background to understand the vision and concept behind this evolution.

In [2], Yang et al. suggest that the “vision not only applies for scientific data management, but also applies to any data intensive system”.


In [3], R. Cattell examines the Not only SQL (so-called NoSQL) data stores “designed to scale simple OLTP style application loads over many servers”. His survey is perhaps the closest we can get to understanding the basis and strengths of the new data store systems, in contrast to traditional, scalable relational database systems.
Relational database systems are known to provide ACID transactional properties.
In NoSQL systems, by contrast, "updates are eventually propagated, but there are limited guarantees on the consistency of reads. Some authors suggest a “BASE” acronym in contrast to the “ACID” acronym:
• BASE = Basically Available, Soft state, Eventually consistent
• ACID = Atomicity, Consistency, Isolation, and Durability.
The idea is that by giving up ACID constraints, one can achieve much higher performance and scalability".



References:
1. J. Gray et al. 2005. Scientific data management in the coming decade. SIGMOD Rec. 34, 4 (2005), 34-41.
2. Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao, and D. Stott Parker. 2007. Map-reduce-merge: simplified relational data processing on large clusters. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (SIGMOD '07). ACM, New York, NY, USA, 1029-1040. DOI=10.1145/1247480.1247602 http://doi.acm.org/10.1145/1247480.1247602
3. Rick Cattell. 2011. Scalable SQL and NoSQL data stores. SIGMOD Rec. 39, 4 (May 2011), 12-27. DOI=10.1145/1978915.1978919