Outside the Box: NoSQL Document Databases
For all the fun of talking about NoSQL's flexibility and promise, we should link that promise to some reality - particularly some product reality that grounds the ideas of NoSQL in real-world applications. My last blog talked about how graph data structures and analysis are different from what we might expect from the "box" that is structured data in an RDBMS.
Note, when I talk about specific products, there is no implied recommendation or endorsement of a particular product or brand. However, for the most part, I don't use "bad" products as examples in my writings, so take that for what it is worth.
When I talk about the ability to manage the multi-structured data sets associated with NoSQL, document data stores are some of the best examples to use. JSON (a lightweight text format that grew out of JavaScript's object notation and is often used where XML once was) provides an excellent format for "new age" applications to store and organize data that isn't tied to the "box" of a structured data management system. For example, JSON allows information to be stored on a particular "object" or "entity" without having to declare in advance all the attributes associated with that particular object.
For example, a customer can have standard attributes such as name, address and revenue. However, if a customer has a Twitter handle, JSON allows you to add that particular attribute just to the customer, or customers, who have Twitter handles. Another example might be a product that has a standard SKU and pricing. Yet if a new feature or product identifier is added for one region and not another, do you need to change all of the data structures or just the objects impacted by the change?
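To make the customer example concrete, here is a minimal sketch in Python. The customer names, addresses and values are invented for illustration; the only point is that one record carries a Twitter handle and the other simply does not.

```python
import json

# Hypothetical customer records; names and values are invented for illustration.
customers = [
    {"name": "Acme Corp", "address": "1 Main St", "revenue": 125000},
    {"name": "Globex", "address": "9 Elm Ave", "revenue": 98000,
     "twitter": "@globex"},  # only this customer carries a Twitter handle
]

for customer in customers:
    # .get() returns None when the attribute was never stored on this object.
    print(customer["name"], customer.get("twitter"))

# Each document serializes on its own; no shared schema is declared anywhere.
documents = [json.dumps(c) for c in customers]
```

Nothing had to be altered on the first customer to make room for the second customer's extra attribute.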
Here are some illustrations from Doug Finke on how different objects can have different attributes:
For a full set of examples including arrays and nested arrays of attributes (try that with a structured data set in a SQL RDBMS), I like to let the nice folks at jsonexample give some guidance.
Standard Format versus Formatting Standards
Now, if you are looking at these examples, you might say:
"Wait! These aren't unstructured data sets -- there's a structure, almost like the relational box above. NoSQL is supposed to be about unstructured data."
Well, yes and no. JSON provides a formatting standard similar to XML, but it is not a standard format such as the "box" of a relational database table. And this is the maddening part about JSON documents for relational databases. RDBMS/SQL databases really want to parse the documents and place them into a structured table. However, if there are different types of information or different levels of nested arrays/attributes, the parsing routine will either 1) not load data that doesn't match the "target table" (i.e., you will be missing data from your JSON) or 2) add lots of columns to the "target table", so you end up with a sparsely populated, wide table that constantly has new columns added.
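The two outcomes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the documents, the `TARGET_COLUMNS` list and the `load_fixed` and `load_wide` helpers are all hypothetical, and no real database loader is being modeled.

```python
# Hypothetical documents; field names are invented for illustration.
docs = [
    {"name": "Acme Corp", "revenue": 125000},
    {"name": "Globex", "revenue": 98000, "twitter": "@globex"},
    {"name": "Initech", "revenue": 40000, "regions": ["EMEA", "APAC"]},
]

TARGET_COLUMNS = ["name", "revenue"]  # the fixed "box" declared in advance


def load_fixed(documents, columns):
    """Outcome 1: keep only the declared columns and drop everything else."""
    return [{col: doc.get(col) for col in columns} for doc in documents]


def load_wide(documents):
    """Outcome 2: add a column for every attribute seen in any document."""
    columns = []
    for doc in documents:
        for key in doc:
            if key not in columns:
                columns.append(key)
    rows = [{col: doc.get(col) for col in columns} for doc in documents]
    return columns, rows


fixed_rows = load_fixed(docs, TARGET_COLUMNS)
wide_columns, wide_rows = load_wide(docs)

# Count empty cells in the wide table to show how sparse it becomes.
empty_cells = sum(value is None for row in wide_rows for value in row.values())
print(wide_columns, empty_cells)
```

With only three documents, the fixed load has already lost the Twitter handle and the regions array, while the wide load has grown two extra columns that are empty for most rows.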
What JSON allows is for an application developer to manage the information that they store about a particular object - customer, product, region, etc. - without having to go through the process of checking the database, asking for a change request, etc. The processes of the "modern" IT department have become too ordered (and some might say immovable) to make a nimble adjustment as business and technical requirements arise.
Then again, some might say that cowboy developers should at least let the data stewards know what exactly they are coding to see if it meets with corporate standards rather than just "winging" their data structures, but that is a blog for a different day.
JSON and the SQLonauts...
Probably the best known and most widely used JSON document data store is the MongoDB platform. Strictly speaking, MongoDB doesn't store JSON text; it stores JSON-like documents in the BSON format. BSON is a binary-encoded serialization of JSON-like documents that adds types JSON lacks, such as dates and binary data.
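To show what "binary encoded" means in practice, here is a minimal sketch of a BSON encoder in Python. It is deliberately tiny: the `encode_bson` function is a hypothetical helper written for this post, handles only flat documents with string and 32-bit integer values, and is not MongoDB's driver code.

```python
import struct


def encode_bson(document):
    """Encode a flat dict of str and int32 values as BSON bytes."""
    body = b""
    for key, value in document.items():
        name = key.encode("utf-8") + b"\x00"  # element name as a C string
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02 = string: int32 byte length (including terminator), then bytes
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int) and not isinstance(value, bool):
            # 0x10 = 32-bit signed integer, little-endian
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type for {key!r}")
    # Total size covers the 4-byte length prefix, the elements, and a final 0x00.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


encoded = encode_bson({"hello": "world"})
print(encoded.hex())
```

The JSON text `{"hello": "world"}` becomes 22 bytes that start with their own length, which lets a reader skip over a document or field without parsing every character.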
Accessing the document information stored in MongoDB is as the NoSQL concept describes: not about SQL. However, if you are willing to give it a chance and use your syntactical imagination (if such a beast truly exists), you can see that just as JSON offers a structure to the data, so does MongoDB with the access layer:
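To give a feel for that structure, here is a minimal sketch in Python. The `query` dictionary follows the shape of a MongoDB query filter (field equality plus the `$gt` operator), but the `matches` function and the sample customers are hypothetical stand-ins written for this example, not MongoDB's implementation or driver API.

```python
# Hypothetical documents; in MongoDB these would live in a collection.
customers = [
    {"name": "Acme Corp", "region": "West", "revenue": 125000},
    {"name": "Globex", "region": "West", "revenue": 98000, "twitter": "@globex"},
    {"name": "Initech", "region": "East", "revenue": 40000},
]

# A filter in the shape MongoDB uses; roughly
# WHERE region = 'West' AND revenue > 100000 in SQL terms.
query = {"region": "West", "revenue": {"$gt": 100000}}


def matches(document, query_filter):
    """Hypothetical matcher supporting only equality and $gt."""
    for field, condition in query_filter.items():
        if field not in document:
            return False
        value = document[field]
        if isinstance(condition, dict) and "$gt" in condition:
            if not value > condition["$gt"]:
                return False
        elif value != condition:
            return False
    return True


result = [doc["name"] for doc in customers if matches(doc, query)]
print(result)
```

The query is itself a JSON-like document, so the access layer has the same kind of structure as the data it reads.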
Yet as I described in the last blog posting, NoSQL in the form of JSON is starting to bleed into the wonderful world of SQL. Both Teradata and MemSQL are starting to support JSON documents in their stores and support access from SQL statements.
The above SQL syntax is from the folks at MemSQL; JSON highlighting is mine.
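For readers who want to try the SQL-over-JSON idea without either product, here is a rough sketch using Python's built-in `sqlite3` module. This is a stand-in, not MemSQL or Teradata syntax: it assumes your SQLite build includes the JSON functions (most recent builds do), and the table, column and customer values are invented for illustration.

```python
import json
import sqlite3

# Hypothetical customer documents stored as JSON text in one column.
docs = [
    {"name": "Acme Corp", "revenue": 125000},
    {"name": "Globex", "revenue": 98000, "twitter": "@globex"},
    {"name": "Initech", "revenue": 40000},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, doc TEXT)")
conn.executemany(
    "INSERT INTO customers (doc) VALUES (?)",
    [(json.dumps(d),) for d in docs],
)

# json_extract pulls a value out of the document by path; a missing
# attribute comes back as NULL instead of breaking the query.
rows = conn.execute(
    "SELECT json_extract(doc, '$.name'), json_extract(doc, '$.twitter') "
    "FROM customers "
    "WHERE json_extract(doc, '$.revenue') > 90000 "
    "ORDER BY id"
).fetchall()
print(rows)
conn.close()
```

The table itself stays a two-column "box", while the attributes inside each document remain free to vary.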
What say the readers?
- Do document data stores, and JSON data stores in particular, have a future in your organization?
- Are your application developers chomping at the bit to put any attributes on the objects they utilize?
- Did you get the Jason and the Argonauts reference?
- Would you have preferred a Colorado-based "Where the BSON roam" pun?
Provide your comments below and/or ping me via Twitter at @JohnLMyers44 with the hashtag #noodlingNoSQL.