Continue in 2 seconds

Technical Considerations When Business Intelligence Applications Access ERP Databases

Published
  • May 30 2003, 1:00am EDT

Since its introduction in the early 1980s, enterprise resource planning (ERP) software has replaced custom-built applications and has become the primary transactional system of most established organizations. ERP software provides the ability to efficiently integrate departments and functions into a single application by automating business processes that conduct and record the activities of the organization.

ERP Data Access Complexities

Most ERP applications utilize relational database management systems (RDBMSs) as the primary software to manage and store the data that is collected and processed. While ERP applications are able to integrate the functions of the organization, accessing the data from the database is challenging. Having extensive experience in accessing ERP databases for reporting, ad hoc query and decision support, there are several technical aspects that should be considered.

Database Schema

Most ERP databases have been designed based upon the principles of data normalizations. These principles are a set of techniques for organizing data into tables in a manner that eliminates redundancy and incompleteness. This in turn reduces the amount of storage needed by databases and provides a technique to logically organize data. However, most database designers or data modelers loosely adhere to these principles when developing the database schema to support ERP applications. There is a tendency to stray from these principles when designing a database schema because it requires theoretical discipline, which is often difficult to achieve when trying to satisfy business requirements. ERP database schemas that utilize the principles of data normalization usually do not go beyond the third normal form (3NF), which means that the data is organized in the following manner:

  • First Normal Form (1NF) requires that a table contain columns in which data of the same kind is located, repeating groups are separated into new tables and that a column or columns are designated at the primary key to uniquely identify a row of data within the table.
  • Second Normal Form (2NF) requires further refinement of a table by identifying and separating subsets of data that appear in more that one row into new tables and the creation of relationships between tables by the use of primary and foreign keys.
  • Third Normal Form (3NF) requires that that every column within a table be dependent on the primary key for its value. If a column is not dependent, then a separate table should be created for it.

Each form of normalization builds upon the previous form. Therefore, in order to be in 3NF, the data must have been organized in a manner that achieves 1NF and 2NF. While these are the principles of data normalization, understanding the ERP database schema can be very challenging because the software vendors rarely provide a complete logical model, which means that you will have to piece together the limited information that they provide. Also, the database schema may not be intuitive if the creator significantly deviated from the principles of data normalization.
Naming Conventions

One of the many distinguishing characteristics between the various ERP software vendors is their naming convention for the database schema tables and columns. The naming conventions that are used by ERP software vendors are unique to their products and require a technical reference manual in order to decipher and understand the purpose of a table or the characteristics of a column. For example, if you were trying to locate the purchase order number within the database schema of one ERP vendor, you would have to locate the column named "segment1" within the table, "PO_HEADERS." This naming convention is not intuitive and requires an intimate knowledge of the ERP database schema in order to extract data.

Complex Relationships

The relationships that often exist within an ERP database are complex. Complex relationships, such as self- referencing and many to many can be a problem when trying to generate a structured query language (SQL) statement to extract the data. These relationships may require multiple passes of the database or nested subqueries within an SQL statement in order to extract the data that has been requested. Understanding these relationships and how to resolve them from a data extraction perspective requires an intimate knowledge of the ERP database schema and a set of technical reference manuals.

Indexing Strategies

As information is entered into the ERP application, data is inserted, updated or deleted in the database. The indexes on columns within a table are designed for data entry and processing by the ERP application. Changing or adding indexes to an ERP database can cause the application to have performance problems and with certain vendors, void the product warranty.

Approaches to Accessing ERP Data

There are several approaches that one can take to configure a business intelligence (BI) application to extract data from an ERP database. Each approach has its benefits and drawbacks. The most commonly used approaches are:

Direct Access. This approach is often performed first because it requires no additional work other than the configuration of the BI application to access the ERP database. There are several items to consider with this approach. First, can the BI application generate complex SQL statements? Second, does the query performance meet the expectations of the users of the BI application? If you answered "yes" to both of these questions, then this is the approach for you. If not, evaluate the next approach.

Views. If problems are encountered with direct access because the BI application cannot handle complex SQL statements or query performance is slow, then another approach to consider is the use of a database view. The BI application can access a view, which is a predefined SQL statement that can resolve the issues with complex relationships and can be tuned to provide faster query performance. It is a quick solution to resolving the problem of extracting a predefined set of data from a complex database schema. However, a view is predefined and can have limited value if users are developing ad hoc queries or requesting data that is not within the view. It can also become a maintenance nightmare if users’ requirements are constantly changing. The problem can be exacerbated if the views are not properly documented or cataloged. If you do not have the ability to efficiently maintain the views or if user requirements are constantly changing, you should probably consider another approach to accessing ERP data.

Materialized Views. Unlike a view, a materialized view does not need to be compiled every time it is referenced. The data is compiled when it is created and therefore, provides faster query performance. However, view maintenance must occur whenever there are changes to the underlying data structure. In addition, the maintenance and functionality issues are the same as with a view.

Multidimensional Cube. A multidimensional cube is a predefined set of data that users of a BI application can access. It provides optimal performance since the result set has already been generated. However, similar to the previous two approaches, the generation of multidimensional cubes can become a maintenance nightmare if users’ requirements are constantly changing or the underlying data needs to be frequently refreshed.

Denormalized Table. Reassembling data from several existing tables creates a denormalized table, which can be incorporated into an ERP database schema to improve performance while providing the functionality for ad hoc queries. This approach is a step toward creating a data mart solution since it requires the extraction, transformation and loading of data from the existing tables into the denormalized table. The drawback to this approach is the refresh and maintenance of the denormalized table.

Dimension Model. A dimension model is the optimal approach to creating an information solution for users who need fast query performance and functionality to address their ad hoc business questions. This solution provides the greatest flexibility from a user perspective because they are not constrained by the structure of the view or a predefined result set. However, it requires the extraction, transformation and loading of data from the ERP database schema into a dimensional or star schema.

Summary

While ERP databases are rich in data content, that ability to easily and quickly extract data for reporting, analysis and decision-making purposes can be difficult. There are several approaches that one should consider when embarking on this adventure: each has its benefits and drawbacks. The challenge is for you to select the best approach with the BI application that you have available to meet the needs of the user community.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access