SQL v. NoSQL: We've been here before
Recently, Craig Mullins reflected on the "war" between object database management systems (ODBMS) and relational database management systems (RDBMS) 15-20 years ago, and on how it parallels the current "conflict" between SQL and NoSQL big data database systems.
One interesting thing about NoSQL is that it is not a single technology: NoSQL databases fall into different categories based on processing needs. A list of these categories with descriptions can be found at nosql-database.org.
According to this article published in June, MongoDB is the clear leader in document-oriented databases. The leaders in big table style databases are Cassandra and Apache Hadoop/HBase. Neo4j is the undisputed leader among graph databases, and Riak has emerged as the leader in key-value stores.
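To make those four categories a little more concrete, here is a rough sketch of how the same made-up customer record might be shaped in each model. It uses plain Python structures only; none of this is the API of MongoDB, Cassandra, HBase, Neo4j or Riak, and the record, field names and the two helper functions are all hypothetical.

```python
import json

# Hypothetical customer data shaped four ways. Plain Python only;
# this does not use or imitate any database product's API.

# Key-value: the store sees an opaque value behind a single key.
key_value = {"customer:42": json.dumps({"name": "Ada", "city": "Austin"})}

# Document: a nested, self-describing record that can be queried by field.
document = {
    "_id": 42,
    "name": "Ada",
    "orders": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
}

# Big table style (wide-column): row key -> column family -> column -> value.
wide_column = {
    "42": {
        "profile": {"name": "Ada", "city": "Austin"},
        "orders": {"A-100": 2, "B-200": 1},
    }
}

# Graph: nodes plus labeled edges; the relationships are the data.
nodes = {
    42: {"label": "Customer", "name": "Ada"},
    7: {"label": "Product", "sku": "A-100"},
}
edges = [(42, "BOUGHT", 7)]


def neighbors(node_id, relation):
    """Return the ids of nodes reachable from node_id over the given relation."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relation]


def total_quantity(doc):
    """Sum the order quantities stored inside a single document."""
    return sum(order["qty"] for order in doc["orders"])
```

The point of the sketch is only that each category favors a different access pattern: lookup by key, queries inside a nested record, sparse columns grouped by family, or traversal of relationships.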
Getting back to Craig's post, he's right when he says we've seen this before. New technologies come along, and a lot of them have value, but IT departments cannot and will never wholly discard a proven solution for every buzz-generating thing that emerges. As Craig puts it, it's "a good idea to remain skeptical of a wholesale replacement of relational" databases.
Obviously, big data is important and necessary, but that doesn't make operational data any less vital. Businesses clearly see the need to bring the two together. As a result, the lines are blurring as traditional relational database vendors integrate NoSQL into the RDBMS. DB2 11, for instance, supports simple integration of Hadoop with operational data. And with IBM introducing DB2 for z/OS table functions to access Hadoop data alongside DB2 data, I believe Apache Hadoop will be important going forward.
As you can imagine, though, there are many ways to integrate operational data with big data. If your strategy is to move data between DB2, Netezza and Hadoop, you may want to look at the Cloudera distribution. This piece from IBM developerWorks tells you how to get started.
Naturally, it won't be just SQL or NoSQL, but a combination based on individual business needs that will ultimately result in big data and analytic data being accessed together with operational data.
Has your company started a big data project? What challenges are you facing in enriching your operational data with big data?