Before we compare our list of CMS requirements against the universe of NoSQL, it helps to have an understanding of what NoSQL actually is. As it turns out, NoSQL doesn’t refer to a single technology but rather a grab bag of otherwise unrelated technologies that are only loosely “related” by virtue of not being based on relational (specifically SQL) technologies.
And you thought the NoSQL moniker was simply a clever marketing ploy to incite rage amongst relational aficionados! :-)
While Wikipedia lists no fewer than 7 major types of NoSQL solution, the “Big 4” that are discussed most frequently are:
- Key / Value
- Column
- Document
- Graph
Let’s look at each of them in turn.
Key / Value
These solutions are typically little more than sophisticated distributed hash tables, often adding direct support for some combination of persistence (durability of data across restarts), replication (physical duplication of data, typically across multiple servers) and sharding (partitioning of data into discrete subsets, each of which is stored on separate sets of servers). They often restrict what data can be used for keys, while allowing values to be any kind of data that can be serialised. Typically querying can only be done by key.
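To make that a little more concrete, here is a minimal sketch of the idea in Python. It isn't modelled on any particular product: the class name, the string-only key rule and the modulo sharding scheme are all assumptions made purely for illustration, and each in-memory dict merely stands in for a separate server.

```python
import hashlib
import pickle


class ShardedKeyValueStore:
    """Toy key/value store: string keys, opaque serialised values, lookup by key only."""

    def __init__(self, shard_count=3):
        # Each dict stands in for a separate server (or set of servers).
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # Hash the key to decide which shard owns it (simple modulo sharding).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys are restricted to strings in this sketch")
        # The store never looks inside the value; it only keeps the bytes.
        self._shard_for(key)[key] = pickle.dumps(value)

    def get(self, key):
        # The only query available: fetch by exact key.
        return pickle.loads(self._shard_for(key)[key])


store = ShardedKeyValueStore()
store.put("page:home", {"title": "Home", "tags": ["landing"]})
print(store.get("page:home")["title"])
```

Note that there is no way to ask this store for "all pages tagged landing" short of fetching every key, which is exactly the limitation described above.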
Column
These solutions are based on Google’s BigTable system (used internally by Google, as well as being the primary data storage facility available to Google App Engine developers). They can be thought of as being one step above a Key / Value store in that the values are not simply unstructured binary blobs that are opaque to the store, but are instead structured data elements that can be used for additional non-key based queries.
In some respects these solutions are quite similar to a relational database, minus explicit foreign keys, and with support for different “rows” in the same “table” having different “columns” (this is somewhat of an over-simplification, but from a data/content modeling perspective is reasonably accurate).
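The same over-simplification can be sketched in a few lines of Python. This is a data-modelling illustration only, not how BigTable or its descendants are implemented; the `ColumnTable` class and its methods are hypothetical names of my own.

```python
class ColumnTable:
    """Toy table in which each row is a dict of column name -> value."""

    def __init__(self):
        self.rows = {}  # row key -> {column name: value}

    def put(self, row_key, **columns):
        # Rows in the same table may carry entirely different columns.
        self.rows.setdefault(row_key, {}).update(columns)

    def get(self, row_key):
        return self.rows[row_key]

    def find(self, column, value):
        # Non-key query: possible because values are structured, not opaque blobs.
        return sorted(
            key for key, cols in self.rows.items()
            if column in cols and cols[column] == value
        )


pages = ColumnTable()
pages.put("page:1", title="Home", author="alice")
pages.put("page:2", title="News", author="bob", published="2010-07-01")
pages.put("page:3", title="About", author="alice", locale="en")
print(pages.find("author", "alice"))  # ['page:1', 'page:3']
```

Rows without a given column are simply skipped by `find`, rather than being padded out with nulls as they would be in a relational table.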
Document
Document databases store “flat” collections of structured documents – in this case “document” does not (as the name might suggest) mean binary documents (Word, PDF etc.), but rather structured data objects with potentially rich internal structures.
The current generation of document databases has, for the most part, standardised on JSON as the underlying data format for documents. However, I consider XML databases to fall under this umbrella as well (albeit many of those have additional facilities for slicing and dicing XML in various weird and unnatural ways that are illegal in some states).
Interestingly, query facilities vary widely between the extant document databases, with some offering query facilities on par with relational databases, while others don’t provide anything resembling a traditional query capability†.
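As a rough illustration of the data model, the sketch below holds a flat collection of JSON documents in a plain Python list and queries it by a dotted path into each document's internal structure. The `lookup` and `find` helpers and the sample documents are invented for this example; real document databases differ widely in what (if anything) they offer here, as noted above.

```python
import json

# A "flat" collection: documents sit side by side, but each can be richly nested.
collection = [
    json.loads(text)
    for text in (
        '{"_id": 1, "title": "Home", "body": {"sections": [{"heading": "Welcome"}]}}',
        '{"_id": 2, "title": "News", "author": {"name": "bob"}, "tags": ["news"]}',
    )
]


def lookup(document, path):
    """Follow a dotted path such as 'author.name'; return None if a step is missing."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(docs, path, value):
    """Return the ids of documents whose value at the given path equals value."""
    return [doc["_id"] for doc in docs if lookup(doc, path) == value]


print(find(collection, "author.name", "bob"))  # [2]
```

The two documents share a collection without sharing a shape, and the query reaches inside the nested `author` object instead of being limited to the key.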
Graph
Graph databases are unquestionably the oldest NoSQL solution, with at least one example predating the relational model by several years!
In these databases, data is stored as a series of discrete objects (“vertexes”) connected by zero or more relationships (“edges”) to one another. Typically the objects are simple hash table data structures and cannot have rich internal data structures (in contrast to a document in a Document database).
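Here is a small sketch of that model in Python, assuming vertices with flat property dicts and directed, typed edges. The `Graph` class, the relationship names and the sample site structure are all made up for illustration and do not reflect any particular graph database's API.

```python
from collections import deque


class Graph:
    """Toy property graph: flat vertex properties, directed typed edges."""

    def __init__(self):
        self.vertices = {}  # vertex id -> flat property dict
        self.edges = {}     # vertex id -> list of (relationship, target id)

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties
        self.edges.setdefault(vertex_id, [])

    def add_edge(self, source, relationship, target):
        self.edges[source].append((relationship, target))

    def reachable(self, start, relationship):
        """Breadth-first traversal following only edges of the given type."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            current = queue.popleft()
            for rel, target in self.edges[current]:
                if rel == relationship and target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


site = Graph()
site.add_vertex("home", title="Home")
site.add_vertex("news", title="News")
site.add_vertex("article", title="Launch day")
site.add_vertex("alice", name="Alice")
site.add_edge("home", "LINKS_TO", "news")
site.add_edge("news", "LINKS_TO", "article")
site.add_edge("home", "AUTHORED_BY", "alice")
print(site.reachable("home", "LINKS_TO"))  # ['news', 'article']
```

The interesting part is the traversal: the query walks relationships from vertex to vertex instead of matching on keys or field values.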
Borrowing a nice diagram from the Neo4J folks, one way of comparing the different classes of NoSQL solution is as follows:
Note: when looking at this graph I mentally replace the “Complexity” label with “Sophistication of data modeling” – the diagram is equally accurate with that substitution and to my mind that’s a more interesting picture (not to mention more relevant to the discussion of CMS and NoSQL).
There are any number of good NoSQL primers available on the interwebitubes, and I’d encourage you to read them if you’re new to the topic. I particularly like:
- Slides 11 through 17 of “A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)” by Neo4J’s Emil Eifrem.
- NoSQL – the Shift to a Non-Relational World
- NoSQL – Death to Relational Databases(?)
- Ricky Ho’s NOSQL Patterns (if you’re after a little more “meat”)
† Before I get flamed to a burnt crisp by the CouchDB fanbois, yes I’m quite familiar with map/reduce “materialised” views – I simply don’t consider that to be a “real” query mechanism. This feature also runs afoul of my “avoid crystal ball gazing at all costs” principle, but that’s a topic for another day.