
Information Management: Myths and Facts

  • December 29, 2008, 2:22pm EST

As information management gains acceptance, good governance is becoming even more critical for the whole process of retrieving, acquiring, organizing and maintaining information. The crucial factor in information and decision process analysis is an improved design-thinking attitude. Only when decision-makers use a good process and methodology for making decisions under constraints can people’s information management needs and desires be made technologically feasible. Because all information is ultimately managed by individuals, wherever there is human intervention, conflicts between facts and myths exist. I have come across the following eight myths in information management.

Myth #1: Performance management is not a BI solution.

For sure, performance management is the buzzword of the year, and it is closely tied to BI. The fact is, any new buzzword takes its own time and pattern to catch up with the reality of implementation and utilization. It’s like a new technology entering the market, creating hype, promises and anxieties. Some performance management vendors argue that they are different from BI. To me this appears to be a complete myth. To address it, let’s define business performance management: a solution that enhances processes and procedures by proactively identifying risks and problems using a specific methodology. It helps in predicting and answering what-if scenarios with a proactive rather than a reactive approach.

But before making it a strategic initiative, companies have to do some homework in assessing the true nature and purpose of the performance management initiative, and whether their environment is conducive to performance management tools. Any performance management tool does well only if the underlying system is stable and robust, and for performance management, the underlying system is a robust data warehousing and BI architecture and infrastructure. There are lots of companies where data warehouses still look like mirror images of their operational systems. In such cases, you will never receive any benefit from a performance management investment. At most, the performance management vendor will end up building a complete BI and DW solution for you, which is not what a performance management solution would typically offer.

Myth #2: Data warehousing and BI are technology solutions.


It’s a myth to call a DW/BI solution a technology solution because DW/BI is not a product, but a combination of tools and technology put together that enables a business to answer its decision-making questions in a much faster and more efficient way. DW/BI is actually a business solution. The BI/DW terminology has been around for a couple of decades, but the myth is still very strong in the IT community that BI/DW is all about delivering with tools, whether ETL tools or reporting tools. We always hear about BI/DW in terms of how many reports it delivered, how many dashboards were created and the load time taken by the batch. We hardly ever hear BI/DW solutions described in terms of “this initiative could address these business issues.” One can find a number of write-ups on the success criteria for BI/DW (good design, best tools, right staffing) and why BI/DW initiatives fail (data quality, inadequate requirements gathering). But there are hardly any pointers emphasizing that what matters is not how much data you bring into the DW, but its associated business value. This essentially means that a BI/DW solution finds its ultimate value in generating revenue and cost savings for the company. Many of us may know this fact, but it’s always good to get back in touch with the basics periodically, particularly the basics of data warehousing, to keep the investments, vision and direction right.


Myth #3: You cannot tangibly measure ROI on BI investments.


Measuring the return on any investment depends on various factors: your company, the person owning the metrics, tangible and intangible measurement types, and direct and indirect returns. All of these apply to BI/DW projects and programs. You can’t manage what you can’t measure. So, for sure, you can measure ROI. It’s as simple as that. What’s not simple is the way you measure it. The good news is that BI/DW has now reached a stable and mature state where guidelines can be used to calculate ROI along with the TCO (total cost of ownership) on BI investments. Some of the key factors used to capture this are:

  • Infrastructure costs
  • Service costs (software vendors and service vendors) and
  • Staffing costs (both onshore and offshore). 

Each of these factors can be expanded per the requirements of the engagement, and you can derive the ROI. All the measures coming from these factors are numeric, so the measurement should be quite tangible. One can drill down to the level where he or she can analyze various measurements, like the cost per user or the cost per terabyte of volume, derive information and plan for forecasted data growth. Having said that, ROI and TCO are still subjective to the individual person and business. The guidelines can definitely trigger more detailed research and areas of concentration, which will lead to specific measurements on an ongoing basis. Every detail associated with ROI and TCO is tangible: if investment is a number and cost is a number, there is no way that ROI and TCO are non-numeric measures.
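
To make the arithmetic concrete, here is a minimal sketch of the TCO and ROI calculation built on the three cost factors above. It is an illustration only: the figures and the annual-benefit input are hypothetical, and real engagements would break each factor down much further.

```python
# Minimal TCO/ROI sketch mirroring the three cost factors listed above.
# All figures are hypothetical.

def total_cost_of_ownership(infrastructure, services, staffing):
    """TCO is simply the sum of all cost factors for the period."""
    return infrastructure + services + staffing

def roi(annual_benefit, tco):
    """Classic ROI: net return divided by the cost of the investment."""
    return (annual_benefit - tco) / tco

tco = total_cost_of_ownership(
    infrastructure=400_000,  # hardware, storage, licenses
    services=250_000,        # software vendors and service vendors
    staffing=600_000,        # onshore and offshore staff
)
annual_benefit = 1_500_000   # revenue gains plus cost savings attributed to BI

print(f"TCO: ${tco:,}")                        # TCO: $1,250,000
print(f"ROI: {roi(annual_benefit, tco):.0%}")  # ROI: 20%

# The drill-down measures mentioned above, with hypothetical volumes.
users, terabytes = 500, 12
print(f"Cost per user: ${tco / users:,.0f}")
print(f"Cost per terabyte: ${tco / terabytes:,.0f}")
```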


Myth #4: Data governance is a project.

A project has a definite start date and an end date. But data governance is a process that involves people, processes, procedures and policies along with technology to handle an organization’s data in a secured and controlled way. Every BI/DW project will have an essence of data governance that is in line with enterprise standards. There is no end date for these programs. Data governance can never be a project; it is a program through and through. Its life is as long as the business is running and technology is supported to run that business. Bottom line, data governance programs should be systematic, continuous investments in the enterprise. This is very important because change is the only constant in any business, so data governance initiatives should be able to dynamically adapt to the changes coming from business initiatives and technology practices. Whether these programs take top-down or bottom-up approaches is left to the individual enterprise and the way it wants to implement them.


A data governance program might automatically start addressing other key issues like data quality and data security, other pillars where people are skeptical about investing or, at least, don’t know how to measure any returns from those investments. As far as I can tell, because the data governance concept is fairly new from an investment perspective, it represents a change in the culture and mindset of people and organizations, which can feel threatening. All of us resist change, which is quite normal. But this barrier can be overcome by educating people on these aspects and on the importance and returns they bring to the organization. Though data privacy and protection is a very old concept, data governance is new to the industry and addresses the processes and procedures to achieve it across the enterprise. Wherever there is human intervention in existing processes or protocols, there should be governance policies and controls around them.


Myth #5: Disaster recovery plans are only for financial firms dealing with critical data.


The fact is, disaster recovery is for every business running mission-critical applications and expecting continuity of the business should a disaster occur. When I proposed a DR solution, one client from a mid-sized organization said, “We don’t need a DR solution or strategy, because we don’t run any critical information.” I thought that was very interesting, and my next question was, “Do you care if your systems are on fire today, or doesn’t it matter to the business?” Obviously it did matter to them, but their thinking was, why waste money on something that may or may not happen? This line of thinking might be true only if the information they would lose does not matter to them in running their business.


I always keep a backup of my personal data. For me, that data is very critical, and I always take the necessary preventive steps just in case. I am sure most of us do this. So if we can’t afford to lose our personal data, I don’t think any company or business can ever afford to lose its data. In short, everyone requires some kind of disaster recovery plan. It can be as simple as a backup on an external drive or as elaborate as implementing hot sites; a minimal sketch of the simple end follows below. The magnitude depends on how long each business can afford to wait, or how much data it can afford to lose, should a disaster occur.
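
To illustrate that simple end of the spectrum, here is a minimal sketch of a timestamped backup to an external drive, in Python. The paths are hypothetical, and a real DR plan would add verification, retention and off-site rotation.

```python
# Minimal "backup on an external drive" sketch. Paths are hypothetical.
import shutil
from datetime import datetime
from pathlib import Path

SOURCE = Path("/home/me/critical-data")            # data you can't afford to lose
BACKUP_ROOT = Path("/mnt/external-drive/backups")  # the external drive

def backup():
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = BACKUP_ROOT / stamp
    # Full copy into a fresh timestamped folder; incremental tools
    # (rsync and the like) do the same job faster at scale.
    shutil.copytree(SOURCE, target)
    print(f"Backed up {SOURCE} -> {target}")

if __name__ == "__main__":
    backup()
```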


Disaster recovery started being recognized as a capability for a data warehouse only over the last couple of years - to be precise, when businesses started capturing transactions and making decisions in real time. Today, multiple implementation scenarios and architectures can be tailored to individual choices and solution requirements. There is always a trade-off between cost and benefit when designing a DR solution, which can range from having hot backups at a remote site to simply shipping tapes overnight to the DR center. For mature, progressive DW systems, careful capacity planning and design can ensure that the DR systems are utilized every day and can act as the DR center when a disaster happens. By architecting it carefully, you can make the DR investment work for your day-to-day business instead of leaving the DR systems simply waiting for a disaster to occur. Once criticality is defined, i.e., once a business is able to separate “must” information from “routine” information, a careful strategy can be planned by identifying the service levels around it.


Myth #6: A relational database model is the best data model for all decision support systems, and the dimensional data model works only with specific subject areas and domains.


A relational model, which is an ER model, is a design technique that addresses the relationships between data elements at the most granular level. This is a perfect technique for transaction processing systems, whereas the concept of dimensional modeling came into existence to address end-user queries and retrieval of data. The purposes of the two models, however, are quite different, and each should be used carefully in its respective place and scenario. The highest-level connection between the two design techniques that I can think of is that an ER model can perhaps be broken down into various dimensional models. Because the nature of BI/DW systems is to address end-user analysis and queries using recent and historical data, the dimensional model is the only design technique suitable to achieve this.
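
To make the distinction concrete, here is a minimal star schema sketch using SQLite from Python. The table and column names are hypothetical; the point is that facts sit at one grain while dimensions carry the attributes end users slice by.

```python
# Minimal dimensional (star schema) sketch: one fact table surrounded by
# dimension tables. Table names, columns and data are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INT, month INT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE fact_sales  (date_key INT, product_key INT, quantity INT, amount REAL);

    INSERT INTO dim_date    VALUES (20081201, 2008, 12), (20081215, 2008, 12);
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
    INSERT INTO fact_sales  VALUES (20081201, 1, 10, 99.90), (20081215, 2, 4, 59.96);
""")

# The kind of end-user question dimensional models exist for:
# slice the facts by dimension attributes.
rows = con.execute("""
    SELECT p.category, d.month, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_date d    ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category, d.month
""").fetchall()
print(rows)  # revenue by category and month
```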


Lots of consultants claim that the relational model is the best way to deal with data getting into the data warehouse, with particular emphasis on data warehouses being built for the first time. The argument I have heard for this approach is that “because you don’t know what you want to do with the data warehouse, it’s a good idea to build it with the relational model, because it eliminates redundancy, rather than to keep adding layers that denormalize the data when required.” My take on this kind of argument is: if you don’t know what you want to do with your data warehouse, it’s high time you went back and figured it out with the business sponsors. Then decide about modeling. On the technical side, ETL loads take the biggest chunk of a data warehouse’s effort and time, and with a relational model, loading even a few GBs might take a couple of hours; the scalability issues that follow can easily be imagined. Some amount of normalization may appear depending on the complexity of the data model, but claiming the relational data model is best for a DW is not appropriate. Dimensional modeling is the only powerful technique for an overall enterprise data warehouse. There are classic examples in the industry, with companies like P&G implementing dimensional modeling in building their EDW back in the early ‘80s.


Myth #7: Data warehouse performance is generally low if data is not summarized.


The general recommendation is to keep the lowest practical level of operational data in the dimensional model for a data warehouse. It’s a myth to think that DW performance is low because there is no summarized data. Summarized or pre-aggregated data in a warehouse without a specific goal or requirement can be dangerous. One has to really understand the appropriate situations and the necessity for having summarized data. There is no doubt that accessing pre-aggregated data might be much faster in certain defined scenarios, but one should not forget that a summarized or pre-aggregated view is only a performance tuning method, not an architecture replacement in the dimensional model. For standard reports where the requirements are predefined and there is no scope for ad hoc reporting by users, it might be a good idea to have summarized tables. If the user is performing ad hoc reporting, it’s always better to have the lowest level of detail to avoid reaching dead ends while trying to address new business requirements.
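
As an illustration, here is a minimal sketch (again SQLite with hypothetical tables) of a summary table built as a tuning layer from the detail for a predefined report, while the lowest-level facts remain in place for ad hoc questions.

```python
# Summary table as a tuning layer, not an architecture replacement.
# Tables and data are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_sales (date_key INT, product_key INT, amount REAL);
    INSERT INTO fact_sales VALUES
        (20081201, 1, 99.90), (20081201, 2, 14.99), (20081215, 1, 49.95);

    -- Pre-aggregate for a predefined standard report (daily totals).
    -- The detail in fact_sales stays as the system of record for the
    -- ad hoc questions the summary cannot answer.
    CREATE TABLE agg_sales_by_day AS
        SELECT date_key, SUM(amount) AS total_amount
        FROM fact_sales
        GROUP BY date_key;
""")

print(con.execute(
    "SELECT * FROM agg_sales_by_day ORDER BY date_key").fetchall())
# daily totals derived from, not replacing, the detail rows
```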


Myth #8: Data models for the data warehouses should be built as “generic.”


We have come across many instances where products and their vendors promise to deliver a generic data model with generic business rules. I wonder how this could add any real value to data warehouse delivery. Providing the most granular level of data in data models often helps to handle business questions that are not predefined. But having the lowest grain still does not qualify a data model as generic, meaning one from which any application can derive its results. You can scale the data model to use conformed data and define the BI application with specific business rules around it. Data models should always be customized to applications. Having an application-oriented data model is important for deriving the metrics related to that application. Metrics like profit-loss and cost-benefit cannot be queried or calculated on the fly, and that is why the ETL prepares the data in the backend and presents it for the reporting tools to use. In thinking of creating generic data models or business rules, one has to clearly understand that the calculations and the processing load then fall entirely on the reporting front, and the reporting tool will never be able to handle that kind of load, because it is not designed for it. It’s always beneficial to assess the application and its delivery requirements and make the data model application specific, taking enough care to leave room for scalability when required.
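
As a sketch of that backend preparation, the hypothetical ETL transform step below derives a profit metric during the load, so the reporting tool reads a prepared column instead of computing it on the fly. The field names and figures are made up for illustration.

```python
# Hypothetical ETL "transform" step: derive application-specific metrics
# in the backend so the reporting layer only reads prepared columns.

def transform(rows):
    """Add derived metrics to each extracted row before loading."""
    for row in rows:
        row["profit"] = row["revenue"] - row["cost"]
        row["margin_pct"] = 100.0 * row["profit"] / row["revenue"]
    return rows

extracted = [
    {"product": "Widget", "revenue": 1000.0, "cost": 650.0},
    {"product": "Gadget", "revenue": 400.0,  "cost": 380.0},
]

for row in transform(extracted):
    print(row)  # the reporting tool reads profit/margin_pct directly
```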


Information is power, and information management is much like a behavioral science theory of management in which the critical factor in making decisions lies with the individual’s limited ability to process information and make decisions under constraints. New information management techniques and technologies will keep emerging, and existing ones will keep undergoing transformation, which is normal. But the one crucial factor that differentiates “intelligent” decisions from “just” decisions is the capability of information management leaders. The demand for information management professionals will continue to increase for the foreseeable future. The leaders who can successfully control the power of information will be able to harness the right value from the unlimited information that already exists as well as from newly generated data. These leaders should be able to clearly differentiate between objective realities (facts) and fabricated statements (myths) in every single data point in the information management domain.
