The Importance of Enterprise Categorization Systems
InfoManagement Direct, November 1998
The profitability and market position of most companies are under attack. Forces such as an increasingly global economy, aggressive new competitors, technology advances, deregulation and rapid changes in customer behavior are changing the way the world does business. Companies that lack an understanding of these forces will have great difficulty growing. The tool that will help a company operate more intelligently, and gain competitive insight into how best to operate in this increasingly competitive world, is information: perhaps the simplest tool of all, yet potentially the most powerful.
To thrive, businesses must understand and manage information about their business processes and associated marketplaces. The key business measures contained within this information will become one of the most important assets of your enterprise. After all, measurements lie at the very heart of a company's vision and strategy - they shape the attitudes and behavior of the organization. Key business measures communicate values, channel employee thinking and set the priorities of a company. The key business measures and associated information will be contained within decision support systems, including data warehouses and data marts.
We often argue about the technology of these decision support systems rather than about business needs and issues. These arguments focus on the wrong topic and are typically fueled by vendor bias. The fact is that businesses are distributed, and their decision support systems, including data marts and data warehouses, should be delivered according to the needs and requirements of the users. What is truly required to deliver such a system is a clear understanding of the business requirements, which then translate into a proper enterprise categorization system. Only from this foundation can technology requirements be defined and physical products be selected to deliver the decision support system.
The Enterprise Categorization System
It used to be that when administrative employees were hired, they had to be able to build filing systems and file documents appropriately. What has happened to our filing systems? They are now computerized and handled by people who do not always possess good filing skills. If you want to deliver a decision support system that best supports your users, first determine how the business categorizes information. From these categorizations, you will be able to build information packages that can be distributed throughout your enterprise.
The information packaging methodology, first described in the book Data Warehousing: Building the Corporate Knowledge Base, focuses on defining your enterprise categorization system throughout your data warehousing architecture. Within this methodology, the enterprise categories are derived from the user requirements captured in information packaging diagrams.
For example, consider an information package that delivers student enrollment information for an academic institution. This student services information package encapsulates several categorizations used within academic institutions: the calendar and fiscal year, the enrollees, the instructional staff, the institution's locations, the majors offered, and the course content offered by the institution.
Each of these categorizations offers at least one hierarchical path into the data, and many offer multiple, alternate hierarchical paths to the detailed data. For example, the location categorization refers to the physical property of the academic institution, which includes campuses and the schools on those campuses. Indiana University, for instance, has several campuses, located in cities such as Bloomington and Indianapolis. Each campus has schools with classrooms for students and staff; these schools include Arts & Sciences, Engineering, Biology, Chemistry, Medicine and Law, to name a few. The student categorization, by contrast, has multiple hierarchical paths: it includes demographic information that allows users to categorize data about a student in several different ways.
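To make the idea of hierarchical paths concrete, here is a minimal Python sketch of one way a categorization might be represented. Each named path lists its levels from most summarized to most detailed, and each member records its value at every level. The class `Categorization`, its methods, and the student attribute names (`level`, `major`, `home_state`, `age_band`) are hypothetical, chosen for illustration rather than taken from the information packaging methodology.

```python
from dataclasses import dataclass, field


@dataclass
class Categorization:
    """A categorization (dimension) defined purely as hierarchies.

    Hypothetical structure: each named path lists level names from the most
    summarized level down to the most detailed one, and each member records
    its value at every level it participates in.
    """
    name: str
    paths: dict[str, list[str]] = field(default_factory=dict)
    members: dict[str, dict[str, str]] = field(default_factory=dict)

    def add_path(self, path_name: str, levels: list[str]) -> None:
        self.paths[path_name] = levels

    def add_member(self, member_id: str, values: dict[str, str]) -> None:
        self.members[member_id] = values

    def lineage(self, member_id: str, path_name: str) -> list[str]:
        """Values of one member along one path, top level first."""
        values = self.members[member_id]
        return [values[level] for level in self.paths[path_name]]


# Location: a single physical path, campus -> school (illustrative rows).
location = Categorization("location")
location.add_path("physical", ["campus", "school"])
location.add_member("loc-1", {"campus": "Bloomington", "school": "Law"})
location.add_member("loc-2", {"campus": "Indianapolis", "school": "Medicine"})

# Student: two alternate paths over the same member. The attribute names
# and values are hypothetical demographic fields, not from the article.
student = Categorization("student")
student.add_path("academic", ["level", "major"])
student.add_path("demographic", ["home_state", "age_band"])
student.add_member("stu-1", {"level": "junior", "major": "Chemistry",
                             "home_state": "Indiana", "age_band": "18-24"})
```

The same student member can be reached through either path, which is what lets users slice student data in more than one way without duplicating the member.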
Multiple Uses of Categories
While you are building your decision support system's enterprise categorizations, you will notice a convergence around 20 or so definitions. Many people would call these categorizations dimensions. Within the information packaging methodology, however, these dimensions are defined merely as hierarchies, not as fully attributed entities. The dimensions point to business metric information, often discussed as fact tables or key performance measures. They also point to related entities, known as category details, that provide the attributes needed to describe the nodes, or levels, within the dimensional hierarchy. The primary purpose of the enterprise categorizations within a decision support system is therefore to standardize how all forms of data are retrieved from the electronic storage environment. Secondary uses for category hierarchies include data optimization and security.
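A miniature of this structure might look like the following sketch, which assumes plain Python dictionaries rather than database tables. The dimension is nothing but a child-to-parent hierarchy, the category details attach descriptive attributes to its nodes, and the facts carry measures keyed by the most detailed node. The names `ancestors` and `facts_under` are hypothetical, and the enrollment figures are made-up sample values.

```python
# Dimension: a hierarchy only (child -> parent), no descriptive attributes.
location_hierarchy = {
    "Law": "Bloomington",
    "Medicine": "Indianapolis",
    "Bloomington": "Indiana University",
    "Indianapolis": "Indiana University",
}

# Category details: attributes describing individual nodes of the hierarchy.
location_details = {
    "Indiana University": {"level": "university"},
    "Bloomington": {"level": "campus"},
    "Indianapolis": {"level": "campus"},
    "Law": {"level": "school"},
    "Medicine": {"level": "school"},
}

# Facts: business measures keyed by the most detailed hierarchy node.
# The enrollment counts are invented sample numbers.
enrollment_facts = [
    {"school": "Law", "term": "Fall 1997", "enrolled": 120},
    {"school": "Medicine", "term": "Fall 1997", "enrolled": 80},
]


def ancestors(node: str, hierarchy: dict[str, str]) -> list[str]:
    """Walk from a node up to the root of the hierarchy, node first."""
    chain = [node]
    while chain[-1] in hierarchy:
        chain.append(hierarchy[chain[-1]])
    return chain


def facts_under(node: str, facts: list[dict],
                hierarchy: dict[str, str]) -> list[dict]:
    """Uniform retrieval: every fact whose detail node rolls up to `node`."""
    return [f for f in facts if node in ancestors(f["school"], hierarchy)]
```

Because every query goes through the same hierarchy, asking for "Bloomington" or "Indiana University" retrieves facts the same way regardless of which level the user starts from.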
Data Optimization
Many techniques are utilized today for optimizing data access. These techniques are often forced upon you by the inefficiencies of the underlying decision support systems technologies. These optimizations reduce the volume of data by either aggregating or partitioning the underlying detail data.
Aggregations reduce the overall number of rows available to the user by summarizing data along a categorization hierarchy. The resulting entity allows faster data retrieval at the expense of detail. The categorization hierarchy allows aggregations to be built easily while maintaining linkage to the detail entities, preserving overall accountability within the decision support data structures. For example, you could aggregate information annually to create entities that hold the data summarized for the years 1995, 1996 and 1997, while maintaining a larger entity with the detail for all time periods.
Partitioning also reduces the amount of data available to the user, by dividing rows according to one or more categorization hierarchies. Partitioning gives users a view of the information for particular values of a dimension column. Continuing with our academic example, you could partition on a student's level, giving your users entities that contain only freshman, sophomore, junior, senior or graduate data. The two techniques cut the data differently: aggregation reduces rows by summarizing them, while partitioning reduces rows by keeping only those that share a value of the partitioning column, so that within a partition that column no longer provides any detail, yet no summarization takes place.
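The difference between the two techniques can be sketched in a few lines of Python. This is an illustration under assumed data, not a description of any product: the hypothetical `aggregate` function rolls the detail up to one level of the time hierarchy, and the hypothetical `partition` function splits the detail by student level. The rows and fee amounts are invented sample values.

```python
from collections import defaultdict

# Hypothetical detail rows, one per enrollment (all values made up).
detail = [
    {"student": "s1", "level": "freshman", "year": 1996, "fees": 500},
    {"student": "s2", "level": "senior", "year": 1996, "fees": 700},
    {"student": "s3", "level": "freshman", "year": 1997, "fees": 550},
    {"student": "s4", "level": "graduate", "year": 1997, "fees": 900},
]


def aggregate(rows: list[dict], level: str, measure: str) -> dict:
    """Summarize a measure at one hierarchy level: fewer rows, no detail."""
    totals: dict = defaultdict(int)
    for row in rows:
        totals[row[level]] += row[measure]
    return dict(totals)


def partition(rows: list[dict], column: str) -> dict:
    """Split rows by a dimension column's value: full detail, fewer rows each."""
    parts: dict = defaultdict(list)
    for row in rows:
        parts[row[column]].append(row)
    return dict(parts)


by_year = aggregate(detail, "year", "fees")   # one summary row per year
by_level = partition(detail, "level")          # one detail entity per level

# Accountability: the aggregate still reconciles to the detail it came from.
assert sum(by_year.values()) == sum(r["fees"] for r in detail)
```

The aggregate holds one row per year and no student rows at all, while each partition keeps complete detail rows, but only for a single student level.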
Security
Aggregation and partitioning can also be used to apply security. All too often we build complicated security webs to keep people from viewing data, only to realize that the security required is finer-grained than what the underlying database management system provides: it is application- or data-oriented security.
Aggregations provide a level of security by letting users see data only at the summarized levels of a categorization hierarchy, not the detailed data beneath them. In operation, users will be unable to gain access to data outside of their purview. Partitioning also provides a level of security: if a partition is not present, the user cannot view the data for that partitioned value. Both techniques are enforced simply by not providing the data for one or more categorization hierarchies. Users cannot access detail data that is not physically present, nor can they report on or filter by a categorization that is not physically present.
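As a rough sketch of security by absence, the following Python code provisions a separate data mart per user from hypothetical entitlements. A user receives only the campus partitions granted to them, and a user without detail rights receives only a school-level aggregate. The `entitlements` table, the user names and the `build_mart` function are assumptions made for this example; in practice this step would more likely live in the load process that populates each data mart.

```python
from collections import defaultdict

# Made-up detail rows, for illustration only.
detail = [
    {"campus": "Bloomington", "school": "Law", "student": "s1", "fees": 500},
    {"campus": "Bloomington", "school": "Law", "student": "s2", "fees": 650},
    {"campus": "Indianapolis", "school": "Medicine", "student": "s3", "fees": 900},
]

# Hypothetical entitlements: which campus partitions each user may receive,
# and whether they get detail rows or only a school-level aggregate.
entitlements = {
    "law_dean": {"campuses": {"Bloomington"}, "detail": True},
    "regional_planner": {"campuses": {"Indianapolis"}, "detail": False},
}


def build_mart(user: str) -> list[dict]:
    """Provision a data mart containing only what the user may see.

    Security is enforced by absence: other partitions, and student-level
    detail for aggregate-only users, are never copied into the mart.
    """
    grant = entitlements.get(user)
    if grant is None:
        return []  # unknown users receive an empty mart
    # Partitioning: copy only the campus partitions this user is granted.
    rows = [dict(r) for r in detail if r["campus"] in grant["campuses"]]
    if grant["detail"]:
        return rows
    # Aggregation: summarize to the school level, dropping student detail.
    totals: dict = defaultdict(int)
    for r in rows:
        totals[(r["campus"], r["school"])] += r["fees"]
    return [{"campus": c, "school": s, "fees": f} for (c, s), f in totals.items()]
```

The aggregate-only mart has no `student` column to report on or filter by, which is the effect the paragraph describes.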
Getting Physical
Now that we have discussed some fundamental architectural concepts surrounding the delivery of decision support data, how are they leveraged into a physical implementation? Companies that develop the ability to analyze their processes quickly and change their strategy proactively will be better able to exploit key opportunities. In addition, companies that want to thrive must go beyond managing their own processes; they must manage the entire set of processes that affect their business, including better linkages with suppliers, customers, shareholders and employees. The enterprise categorization system will interlock the information assets across all of these knowledge centers, providing consistency in information and communication.
Providing information packages to each of these knowledge centers will allow users to evaluate how well they are achieving the objectives set by the enterprise. The enterprise categorization system will also permit these information packages to be networked together, allowing people to analyze cross-correlated information. In our academic example, the financial and student enrollment analyses will most likely share the location and time period dimensions. This common linkage between student enrollment and financial information will allow users to analyze financial trends alongside student services information to answer questions such as: Is technology affecting our income? Are cyber-universities decreasing our enrollment and fee revenues within specific graduate programs? In addition to this flexibility, the enterprise categorization of your data will allow you to better fulfill your users' information requests and to store information in the form that best suits their work patterns and behaviors.
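Because both packages share the location and time period categorizations, their measures can be combined member by member. The sketch below assumes both packages have already been summarized to a common (campus, year) grain and shows this kind of cross-package, or drill-across, analysis. `drill_across` is a hypothetical helper, and every figure is invented.

```python
# Two hypothetical information packages summarized to the shared
# (campus, year) grain; all figures are made up.
enrollment = {
    ("Bloomington", 1996): 1000,
    ("Bloomington", 1997): 950,
    ("Indianapolis", 1996): 780,   # no matching financial row
    ("Indianapolis", 1997): 800,
}
fee_revenue = {
    ("Bloomington", 1996): 2_500_000,
    ("Bloomington", 1997): 2_300_000,
    ("Indianapolis", 1997): 1_900_000,
}


def drill_across(left: dict, right: dict) -> dict:
    """Combine two packages on the categorization members they share.

    Only keys present in both packages are kept, so each result pairs
    measures that describe the same campus and year.
    """
    shared = left.keys() & right.keys()
    return {key: (left[key], right[key]) for key in sorted(shared)}


combined = drill_across(enrollment, fee_revenue)
revenue_per_student = {k: rev / n for k, (n, rev) in combined.items()}
```

The join works only because both packages use identical campus and year members, which is the consistency an enterprise categorization system is meant to guarantee.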
A decision support system is all about data: gathering the proper data, managing it throughout the overall process and providing access to it. If you do not get the data right, you might as well stop. A decision support system architecture will contain several physical data stores that map to the requirements of the users, as discussed above. We can therefore relate the decision support system to the architecture and to a set of standards, developed within an enterprise, for delivering successful business intelligence that supports all employees of a company, not just senior management.
The users of your decision support systems will have varying requirements in the area of data storage. Some of the users, for example sales personnel, are very mobile and require their information to be as mobile as they are - they need personal data marts. Other users, for example suppliers, are external to the enterprise and should only be provided with the data which they control or are responsible for - they need partitioned or aggregated data marts. Still other users, such as telemarketers, may require the ability to update information within their decision support system - they need mixed workload decision support systems. And others, such as brand managers, will require ad hoc access to highly dynamic data, allowing them to define target marketing promotions by analyzing large volumes of behavior-based data on customers or prospects - they need a highly scalable data warehouse or data mart.
Data Warehouses or Data Marts
You should not set out to create an enterprise data warehouse in one step; this scope is far too large for any development team, and such a project will more than likely never be completed. Data warehouses are complex: for most organizations that build one, it is probably the largest integration effort they will ever undertake. These projects require the coordination of multiple vendors' components as well as of internal organizations within a company. Given these complexities, you should start with a small, manageable aspect of the business, define the information packages required for its users, begin delivering the enterprise categorization system and evolve your decision support system from there. The enterprise categorizations will become the linkage between the underlying decision support data stores: the personal, workgroup and enterprise data marts and data warehouses.
In effect, you will build an enterprise data warehousing system implemented as a networked set of databases. Your development team must build an architecture that supports the concept of an enterprise data warehouse through a plug-and-play, building-block approach to decision support system development. The enterprise categorization system will lay the groundwork for integrating these data stores into an enterprise data warehouse in a manageable fashion.
In Summary
The work facing most information systems departments lies in improving data architecture so that all information assets are held in a highly optimized enterprise categorization system. Users want to see their data and information categorically, whether it is text and numbers or multimedia objects such as audio, video and images. Much work remains to bring our information systems in line with this categorization and to support an enterprise's overall mission. Most future innovations will focus on the nuts and bolts, but the most important innovation will be the advancement of data architecture and information accessibility.
The data architecture of the future will provide an all-encompassing definition of the real-world data that drives a business. This architecture will bring together the operational systems, which focus on the daily transactions that keep a company running, with the historical and external data used to analyze important measurements within the business's purview. Such an architecture will permit users to recreate past business events, analyze those events, learn from them to predict the future and make proper decisions based on the analysis. This shared data concept is not far from reality for those who have begun warehousing information and building their own enterprise categorization system. Fulfilling user and business requirements should be our focus, and the process begins by aptly defining the users' view of business intelligence data through enterprise categorizations.
Tom Hammergren is a specialist in the area of data warehousing. During his career, Hammergren has grown with the evolution of decision support systems, constantly focusing on better ways to improve communication between information systems and end users. He is currently the product manager of Data Warehousing Solutions at Sybase, where he has been working to craft the complete Sybase data warehousing solution, Warehouse Studio. Hammergren has assisted many large consumers of data with their decision support and data warehouse systems, including The Procter & Gamble Company, AT&T, Dun & Bradstreet, Equifax, Schlegal, Aetna, Clorox and State Farm Insurance, to name a few. He is under contract to write three books on data warehousing. The first of these books, DATA WAREHOUSING: BUILDING THE CORPORATE KNOWLEDGE BASE, was released in early 1997. The second, DATA WAREHOUSING ON THE INTERNET: ACCESSING THE CORPORATE KNOWLEDGE BASE, was released in 1998. He is currently working on the third book in this series, DATA WAREHOUSING ABSTRACT DATA: EXPANDING THE CORPORATE KNOWLEDGE BASE.