Data Modeling in the Cloud
Cloud computing is the outsourcing of hardware, software and networks, allowing us to view and change information through Web browsers. From a data perspective, I find the cloud concept fascinating. It is extremely challenging for most organizations to know where their data is and to obtain an enterprise view. What if the data is in the cloud, though? Are we going to need to expand the enterprise view to a universe view if the data can be anywhere? In other words, is the cloud going to make our data management jobs easier, harder or have no impact, and why?
Up in the Air
There are three general types of cloud computing: infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS). IaaS is the offering of servers, software and storage space as a service. PaaS is the offering of tools for application design, development and testing as a service. SaaS is the offering of applications to end users (through a Web browser) as a service. The consensus from our Design Challenger responses is that our data management jobs are going to be easier in some ways and harder in others at each of these three levels of cloud computing.
For IaaS, Enterprise Architect Principal Vasant Gadgil echoes this theme. "In theory, we don't have to spend hundreds of hours trying to 'smoothen' out the resource demand curve; try to estimate the hardware requirements as accurately as possible. In theory, all such tasks are somebody else's headache. One problem gone, another new arises. We have to think of network latency and response time in totally different dimensions."
Richard Kooijman, data warehouse architect, agrees. "Copying data might be slower than expected because cloud-specific details might get in the way. Since we do not know the technical details of the underlying architecture, we might stumble on unexpected difficulties which we would not have had in the traditional architecture where we have complete control and insight."
For PaaS, Vasant believes that although PaaS will ease the issues around development tools and deployment, most organizations will face a steep learning curve because few have much experience in this area. Mehmet Orun, senior manager, data architecture and analysis, describes an important learning curve many modelers will need to climb. "We will need to focus more on the semantic layer, similar to what BusinessObjects and other reporting universes did to simplify report creation. We still need to focus on identifying key entities and their relationships, but after creating these, focus on optimum design not just for storage, but security permissions and findability."
For SaaS, our modeling activities do not change. Professor Emeritus Gordon Everest says, "Modeling is getting an understanding of the user domain; we are modeling that portion (the scope) of the users' world of interest to a community of users/analysts/managers. This activity is unaffected by computing in the cloud." Data Analyst Mary Pat Suchy adds, "I think we have a hard enough time wrapping our heads around the data and its usage even when we know where it resides. No doubt, it creates an imperative for accurate and secure data definition, contextual meaning, and naming so as to ensure data integrity and integration across the universe."
Vasant emphasizes this integration effort. "The difficult part here is mapping what we understand of our data to the data/object model provided by SaaS, and the toughest part is the integration layer. In fact, integration is by far the most challenging part of the cloud." Chris Bielinski, principal architect, agrees. "Cloud computing will not solve data integration and management issues. It just introduces efficiency to the process along with a whole number of possible issues (security, connectivity, data integrity, etc.)."
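To make the mapping problem Vasant describes concrete, here is a minimal Python sketch of translating a record from our own model into a SaaS provider's object model. Everything in it is an assumption for illustration: the attribute names, the SaaS field names and the to_saas_record helper are hypothetical and do not come from any real vendor. The point is only that attributes with no counterpart in the provider's model are reported rather than silently dropped.

```python
from dataclasses import dataclass

# Hypothetical mapping: internal attribute name -> SaaS field name.
# Neither side reflects a real vendor's object model.
CUSTOMER_FIELD_MAP = {
    "customer_id": "AccountNumber",
    "legal_name": "Name",
    "primary_email": "Email",
}


@dataclass
class MappingResult:
    payload: dict   # the record expressed in SaaS field names
    unmapped: list  # internal attributes with no SaaS counterpart


def to_saas_record(internal_record: dict, field_map: dict) -> MappingResult:
    """Translate one internal record into the SaaS field names."""
    payload = {}
    unmapped = []
    for attribute, value in internal_record.items():
        target = field_map.get(attribute)
        if target is None:
            # Surface the gap so a modeler can decide how to handle it.
            unmapped.append(attribute)
        else:
            payload[target] = value
    return MappingResult(payload=payload, unmapped=sorted(unmapped))


if __name__ == "__main__":
    record = {"customer_id": 42, "legal_name": "Acme", "credit_rating": "A"}
    result = to_saas_record(record, CUSTOMER_FIELD_MAP)
    print(result.payload)
    print(result.unmapped)
```

In this sketch the unmapped list is where the modeling work surfaces: each entry is an attribute we understand in our own model that the provider's model has no place for, which is the kind of gap the integration layer has to resolve.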
Enterprise Architect Karen Lindokken neatly summarizes the impact of the cloud on data modeling. "The complexity of the impact of existing in the cloud depends on how the organization thinks about data currently. The current rigor of data governance and management within the organization prior to entering the cloud computing arena can make the transition easier or harder. An organization with documented metadata, known systems of record, identified data owners, well-understood data integration processes and mature governance practices likely can transition more easily; the organization well knows what data it uses and needs. An organization without governance in place and a lack of high quality metadata will simply add to their complexity of data management with cloud computing."
Join in the design challenge by becoming a Design Challenger. Sign up at stevehoberman.com.