Oftentimes we treat data as a natural resource, like water from a stream or minerals from the earth. The information is to be extracted and used like a non-perishable item. Water goes into a tank until required. Ore goes on a railcar to sit beside a plant until processed.
However, what if we treated our data as we treat our employees? We would have a “data resources” department. We would think about ways to attract and retain only the best data. We would also attempt to maintain a “satisfied” data workforce.
In the world of relational/structured data, this would never happen. The data is like water or coal. Sources are owned by the company in the form of point of sale or transaction systems. Customer information has been vetted into domains. Products have SKUs and inventories. Each aspect has a purpose and is processed to that purpose.
Multi-structured (read: NoSQL) data is more like a living concept than a static one such as ore to be extracted from a mine. NoSQL data can have multiple aspects. It can even perish. Web-based click stream analysis is a perfect example. Operational teams can load balance and optimize website performance using the aggregate information by web server. Marketing can look at purchasing paths using page analysis by click. Finance can look at pricing decisions and shopping cart close rates by transactions. Click stream analysis even has a shelf life if you consider that customer behavior prior to purchase may be indicative of either a rising prospect or a churning customer.
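To make the "multiple aspects" point concrete, here is a minimal sketch of one click stream read three ways. The event fields (`server`, `session`, `page`) and the page names are assumptions made up for illustration, not a real log format.

```python
from collections import Counter, defaultdict

# Hypothetical click events; every field name here is an assumption.
events = [
    {"server": "web-1", "session": "s1", "page": "/home"},
    {"server": "web-1", "session": "s1", "page": "/product"},
    {"server": "web-2", "session": "s1", "page": "/cart"},
    {"server": "web-2", "session": "s1", "page": "/checkout"},
    {"server": "web-1", "session": "s2", "page": "/home"},
    {"server": "web-2", "session": "s2", "page": "/cart"},
    {"server": "web-2", "session": "s3", "page": "/home"},
]

def hits_per_server(events):
    """Operations view: request volume by web server."""
    return Counter(e["server"] for e in events)

def paths_by_session(events):
    """Marketing view: the ordered pages each session clicked through."""
    paths = defaultdict(list)
    for e in events:
        paths[e["session"]].append(e["page"])
    return dict(paths)

def cart_close_rate(events):
    """Finance view: share of sessions with a cart that reached checkout."""
    paths = paths_by_session(events)
    carts = [p for p in paths.values() if "/cart" in p]
    if not carts:
        return 0.0
    closed = [p for p in carts if "/checkout" in p]
    return len(closed) / len(carts)

print(hits_per_server(events))
print(paths_by_session(events))
print(cart_close_rate(events))
```

The same raw events feed all three functions; nothing about the stored data had to be reshaped for any one team.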
Recently, I read one of the most fantastic articles that I have seen in a long time. It wasn’t about technology (however, I am about to turn it into a technology piece). It wasn’t about SQL schemas vs. NoSQL formats. It was a human resources (HR) piece about how to build your team. Mel Kleiman wrote on TLNT.com about the “Top 10 Ways to Guarantee Your Best People Will Quit.” It was more a cautionary tale about the top mistakes to avoid than it was about how to get your employees to quit. However, it was sooooooo very well written and laid out that I decided to borrow its themes to talk about data sets in the era of NoSQL and machine-to-machine (M2M):
Treat All Data Equally: The easiest way to wreck a data management strategy is to say that the sensor data from your building’s heating and cooling system is just as important as the sales data that is reported to the SEC and Wall Street. It isn’t. However, that doesn’t mean that you can’t find business value in sensor data. Don’t treat all the data the same, but have an appreciation for where value can come from in a data source.
Tolerate Mediocre Data: Data that sits there is mediocre data … In an M2M sensor data example, don’t “just” store it simply because it is available … Have a plan for what value the data can have. Cost reductions can come from operational sensor data … or even better, your operational web traffic logs can provide new revenue streams. That creates Superior Data as opposed to Mediocre Data!
Have Dumb Data Governance Rules: Readers know that I am not a huge fan of “legacy” (read old school) data governance rules for new NoSQL data sources. For data that comes from non-human sources, focus on pedigree and freshness, and not syntax. There are no data entry mistakes in a log file ... but there are source configuration and lineage issues to be managed.
Don’t Recognize Outstanding Data: Data that yields results should be championed as a data source that should be used and valued. If significant cost savings or revenue generation comes from a data source, no matter the format, then it should be communicated across the organization. Give examples of how to think outside the box and provide value to the organization.
Don’t Have Any Fun With Your Data: Speaking of thinking outside the box … Most of my examples above relate to cost reduction or revenue generation. However, not all data has to be sent off to “business school” and drive straight to the top or bottom line. Some data can tell you interesting things about customers or their habits that have nothing to do with costs or revenues … Though, you might just find that if you have fun with data, you can identify patterns that turn into revenues or cost savings …
Don’t Keep Your Data Informed: Silo-ing your NoSQL data is just as bad as attempting to treat it like structured/relational data. Find ways to link the customer information in your established intelligence platforms to your NoSQL data … and vice versa. You will find that creating those links between data sources will create additional value.
Micromanage Your Data: NoSQL data is NOT relational/structured data. Attempting to make sure that every value in every field meets the expectations of the Data Steward is not a worthwhile enterprise … Going blind on data quality “paperwork”/processing isn’t going to work for data sources that create information as prolifically as M2M …
Don’t Develop a Data Retention Strategy: As I said above, if you allow your data to “just” sit around as Mediocre Data, you will incur too many storage costs. Likewise, if you get rid of it too quickly, you run the risk of eliminating the very value that most NoSQL platforms hope to generate. Therefore, you need a plan and you need to execute on the plan. As most project managers like to say … Plan the work. Work the plan.
Don’t Do Data Retention Interviews: The key to the Data Retention Strategy above is to make adjustments as necessary. You need to determine which data sources are contributing and which are not. Regularly, if not constantly, exploring the information in NoSQL data stores will allow you to know where value can be found and when /dev/null (sorry, old Unix joke: http://en.wikipedia.org/wiki//dev/null) should be used …
Make Your Data Acquisition Process an Exercise in Tedium: Back to treating your data the same … NoSQL data shouldn’t be “tortured” via ETL into a schema before it can be stored in Hadoop or Mongo or Neo4j. NoSQL data should be profiled and stored quickly. The “beauty” (read value) is in the eye of the “beholder” (analyst). Use ETL as the next step as you process information from NoSQL into relational/structured environments.
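Two of the points in the list above, profiling data quickly on arrival and reviewing which sources still earn their storage, can be sketched in a few lines. This is only an illustration under stated assumptions: the sensor records, the source names, the dates, and the 90-day idle threshold are all hypothetical, and a real policy would weigh business value, not just recency of use.

```python
from collections import Counter
from datetime import datetime, timedelta

def profile(records):
    """Count how often each field appears, without enforcing a schema."""
    fields = Counter()
    for record in records:
        fields.update(record.keys())
    return {"records": len(records), "fields": dict(fields)}

def retention_review(sources, now, max_idle=timedelta(days=90)):
    """Split sources into keep/drop by how recently anyone used them.

    `sources` maps a source name to the last time it was queried.
    The 90-day default is an arbitrary assumption, not a recommendation.
    """
    keep, drop = [], []
    for name, last_used in sources.items():
        if now - last_used <= max_idle:
            keep.append(name)
        else:
            drop.append(name)
    return keep, drop

# Hypothetical M2M readings with uneven fields; stored as-is, profiled fast.
readings = [
    {"sensor": "hvac-1", "temp_c": 21.5},
    {"sensor": "hvac-2", "temp_c": 22.0, "humidity": 40},
    {"sensor": "hvac-1"},
]
summary = profile(readings)

# Hypothetical "retention interview": which sources were used recently?
now = datetime(2013, 6, 1)
sources = {
    "web_logs": datetime(2013, 5, 20),
    "hvac_sensors": datetime(2012, 11, 1),
}
keep, drop = retention_review(sources, now)

print(summary)
print("keep:", keep)
print("candidates for /dev/null:", drop)
```

The profile step only describes what arrived; it rejects nothing. The review step is where the plan gets worked, and the threshold is the part you would revisit regularly.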
What say the readers?
Have I read too much into Mel Kleiman’s piece? Can you treat data as you treat employees? Can these rules be used just as easily for relational/structured data as they can for NoSQL?
Provide your comments below and/or ping me via Twitter at @JohnLMyers44 with the hashtag #noodlingNoSQL.