Information Management Blogs
JAN 14, 2010 2:54am ET

Mining the Cloud? Are You Allowed?

I recently came across a trade-press article with the headline “Mining the Cloud.” The cynic in me immediately issued a silent scoff: How is that different from “crawling the Web”? Are we just mapping old wine to shinier new bottles? Or is there something different here?

But, seeing as how I too like to proliferate discussions of mining this or that information type, I was willing to cut the reporter some slack. The article was from Redmond Developer, and concerns “Project Dallas” under Microsoft’s Azure cloud initiative. Essentially, “Project Dallas” (still in beta) supports discovery, manipulation, visualization, and analysis of data retrieved from multiple public, commercial, and private data sources via the Azure cloud. “Dallas” allows enterprises to provide users (via REST, Excel PowerPivot, and/or Visual Basic applications) with online access to aggregated feeds via Azure, which essentially operates as an online information marketplace. Also, “Dallas” allows customers to have Azure host their data for them, or simply continue to host it on their own premises while the cloud service connects securely to it.

That’s all cool, and the screenshots are compelling, but I don’t see any actual data mining, in the strictest sense of that term. In other words, “Dallas” has one data mining feature—interactive information discovery—in spades, but appears to lack some other essential features, such as clustering, classification, regression, and predictive modeling. It’s not as if Microsoft lacks those technologies. After all, the vendor provides decent predictive modeling and data mining through add-ons to SQL Server and Excel, but those features don’t seem to be integrated into this so-called “mining the cloud” service. In a very real sense, this “Dallas” beta is traditional BI in the Azure cloud, with a strong visualization layer. As such, it bodes well for any future plans that Microsoft might develop to make Azure a full-blown BI cloud in its own right.

Rather than quibble anymore on this point, I’d like to call attention to another “Dallas” feature that I find very interesting. “Dallas” incorporates an information syndication and licensing model, which frees users from having to separately set up and manage diverse subscriptions. Though you might say, “so what, that’s a standard component of any online content aggregation service,” I’d argue that that’s at the heart of any future service that promises to let you “mine the cloud” (however you define “mine,” but with specific emphasis on public clouds and federated public/private clouds). Considering that a cloud is a highly distributed information environment, and that many public clouds will federate with and among private enterprise clouds, it’s absolutely essential to have federated content syndication and licensing. Essentially, federation provides a web of trust to ensure that you’re only given access to data sets for which you have permissions, and that you’re prevented from accessing any that you’re not allowed to (perhaps because you didn’t pay or because you don’t have valid credentials).
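To make that last point concrete, here is a minimal sketch of the kind of check a federated licensing layer would have to perform before handing over a data set. It is not how "Dallas" works. The names `Subscriber`, `TRUSTED_PROVIDERS`, and `can_access`, along with the example identity providers and feed names, are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical identity providers that the content cloud has agreed to trust.
TRUSTED_PROVIDERS = frozenset({"idp.enterprise.example", "idp.partner.example"})


@dataclass(frozen=True)
class Subscriber:
    """A user as seen by the content cloud (illustrative only)."""
    name: str
    identity_provider: str          # who vouches for this user's credentials
    licensed_datasets: frozenset    # data sets the user has paid for


def can_access(subscriber, dataset_id, trusted_providers=TRUSTED_PROVIDERS):
    """Grant access only if the credentials come from a trusted provider
    AND the subscriber holds a license for the requested data set."""
    if subscriber.identity_provider not in trusted_providers:
        return False  # invalid or unrecognized credentials
    return dataset_id in subscriber.licensed_datasets  # paid for or not


alice = Subscriber("alice", "idp.enterprise.example", frozenset({"weather-feed"}))
mallory = Subscriber("mallory", "idp.unknown.example", frozenset({"weather-feed"}))

print(can_access(alice, "weather-feed"))    # licensed and trusted
print(can_access(alice, "census-feed"))     # trusted but not licensed
print(can_access(mallory, "weather-feed"))  # licensed but not trusted
```

The two conditions map onto the two failure cases named above: you didn't pay, or you don't have valid credentials. In a real federation the trust list and the license records would live with different parties, which is exactly why standards matter here.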

That, of course, also points to the need for federated identity, authorization, digital rights management, and permissioning among content clouds. In a Microsoft context, I would expect to see them leverage their federated identity technologies, especially CardSpace, WS-Federation, and Identity Metasystem, in any federated cloud content permissioning environment. In a broader industry context, I would expect a role for such federation standards as SAML. Inasmuch as more and more clouds will involve peer-to-peer information provisioning through social networks and RSS feeds, I’d expect to see a role for user-centric federation standards such as OpenID. And, just to complete this thought, I’d expect to see all of this converged with BI-oriented data federation approaches, perhaps leveraging RDF-based ontology/taxonomy specifications for semantic interoperability.

But I’m not seeing that, at least not in Microsoft’s “Project Dallas,” nor, to be honest, in any industry initiatives aimed at making clouds a truly standards-based federated information environment. In terms of “mining” clouds, there are, of course, the Hadoop and MapReduce communities, who have developed a powerful framework for distributed processing that can support predictive analytics against complex, distributed information sources. I don’t see any clear commitment by Microsoft yet toward incorporating these technologies into Azure generally, or “Dallas” specifically.

I expect that Microsoft and others will evolve toward this comprehensive vision over time, but it’s not obvious right now. At least not based on what I’ve seen and heard.

James Kobielus also blogs at http://blogs.forrester.com/business_process/.

Comments (4)
One of the biggest roadblocks to adoption of the Cloud - whether SaaS, PaaS, or IaaS - is data ownership, security, and privacy. As such, any attempt to mine data - whether in a public, private, or hybrid cloud - would impact these three concerns and further hinder its adoption. This is a non-starter for me!
Posted by Luan N | Friday, January 15 2010 at 1:57PM ET
I have a perpendicular view :-).

Everyone expects that cloud computing will be the future and the data center will be history (not sure about the banks). This means there will be two types of data available in the cloud: the first (more than 90%) will be open for mining, and the second will be protected data. For the first kind of data, the better miners and analysts will be ahead in the race, while the second will be busy saving their skin from hackers.

Hence, in the yet-to-evolve ecosystem, all industries will be more than happy to share or expose their data for mining, for better reputation and customer acquisition.

Hence, no one is going to break their head over a federated information environment, but companies like Hadoop will mushroom.

Posted by Phnai B | Friday, January 15 2010 at 1:57PM ET