Thanks to recent advances in storage technology, businesses today are gathering and storing more data than ever before, and this growth has been exponential in the last few years. Trends indicate that data volumes are only going to increase in the future, with no slowdown in growth rates. The phrase "single version of the truth" has been used by analysts, industry experts, vendors and consultants numerous times, as if it were a panacea for all data warehousing and business intelligence (BI)-related issues. We all tend to forget that there is no silver bullet that can deliver a single version of the truth with 100 percent success.
We have reached the stage where enterprises understand the urgent need to establish a single version of the truth, and they have just begun to add new architectural concepts such as master data management (MDM), customer data integration (CDI) and enterprise information integration (EII). Yet they are still drowning in an ever-growing population of data silos at the physical level and information silos at the logical level. Information silos are created when different business units in an organization develop their own independent understanding of the same data assets.
The newest tools and technologies have a much higher probability of failure if organizations do not develop a long-term vision for enterprise data management (EDM), which should include people and processes along with the data assets. The most critical process of EDM is data governance, which is the key to continued consistency and accuracy of data and information in an enterprise. EDM is too daunting a task, especially in large organizations, to cover the whole enterprise at the same time. An evolutionary approach is recommended: start small and build incrementally. Enterprise standards for naming logical and physical data assets are one of the foundational components of EDM; they facilitate and support data and information governance efforts in any organization. As with most processes of EDM, it is recommended to start small and follow a build-as-you-go approach to enterprise naming conventions for data and information assets.
Why Naming Standards?
The core objectives of building and implementing naming standards in an enterprise are:
- Business as well as technical users should be able to describe any data entity or data element just by looking at its name. Users can be internal as well as external (vendors) to the organization.
- Two or more professionals naming an entity or a data element should arrive at the same name if they are given the same business and technical descriptions of the data asset.
Describing and naming data correctly is critical. If it is done right, it can help an enterprise:
- Minimize misunderstandings among business functions, which can reduce the total effort needed in a BI/DW project.
- Facilitate operational efficiency and strategic use of the data.
- Reduce the time needed to bring new products to market.
- Set, describe and achieve common business goals.
- Improve customer satisfaction.
Key Components of Data Naming Standards
Class words: Class words help classify entities and attributes into broad categories. There are two types of class words:
Entity class: An entity in a logical model corresponds to a table in the physical model. Each entity is assigned to one entity class based on its primary business intent, e.g., asset (AT), document (DO), event (EV), location (Location), party (PA), rule (RU), structure (ST), transaction (TR), etc. Generally, the full name of an entity class should be used in a logical model, and it should be abbreviated to a two- to three-character code for naming the corresponding table in a physical model.
Attribute class: Attributes in a logical model correspond to columns in the physical model. Each attribute is assigned to one attribute class based on the business function supported by the attribute. Attribute classes are closely and directly linked to column domains such as name, address, quantity, code, etc. in a logical model, which in turn define the data type, format and kind of values that may be stored in the associated column in the physical model. Attribute classes can be refined to any level of detail, depending on the organization's requirements. Organizations should have a library of attribute class words under at least the following major categories:
- Chronology represents a point in or span of time.
- Measurement represents capacity, quantity or count.
- Identification identifies a person, place or thing.
- Text identifies free-form or narrative data.
For example, an attribute class could be quantity (QY) or could go up to the level of units, volume, weight, etc.
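One way to picture such a library is as a small tree in which each attribute class belongs to a major category and may be refined into more specific classes. The Python sketch below is only an illustration: the `AttributeClass` type and the `flatten` helper are hypothetical names, and only the QY code and the units, volume and weight refinements come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeClass:
    name: str
    category: str  # one of the major categories, e.g. "Measurement"
    code: str = ""  # abbreviation; left empty where none has been agreed
    children: list = field(default_factory=list)


# "quantity" refined one level down, as in the example above.
quantity = AttributeClass(
    "quantity",
    "Measurement",
    "QY",
    children=[
        AttributeClass("units", "Measurement"),
        AttributeClass("volume", "Measurement"),
        AttributeClass("weight", "Measurement"),
    ],
)


def flatten(node):
    """Yield a class and then all of its refinements, depth first."""
    yield node
    for child in node.children:
        yield from flatten(child)


print([c.name for c in flatten(quantity)])
# ['quantity', 'units', 'volume', 'weight']
```

Keeping the hierarchy explicit lets an organization start with the broad class and add finer levels later without renaming what already exists.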
Prime word or base noun: It identifies the application and subject area, major data category or model name, depending on the data object being defined. It may consist of a single word or a phrase, e.g., account (ACCT), budget (BDGT), organization (ORG), vendor (VNDR), transaction (TRANS), etc.
Prime word assignment, if done correctly, can also help in establishing the first level of data stewardship.
Modifier or qualifier: It qualifies and distinguishes prime and class words. It further describes data objects (entities and tables) and attributes (columns) beyond their class and prime words, e.g., Employee-Name versus Employee-First-Name, where "First" is the modifier.
Constructing Names in Data Models
You should build a library of entity classes, attribute classes, prime words and modifiers, along with their abbreviations, before you actually start building names in a data model. It is generally not possible to build a comprehensive and completely mature library at the beginning of enterprise modeling efforts, but the starting library should be mature enough not to require very frequent changes in the future. The libraries of class words and prime words should be built first and should be the most mature of all the libraries.
In a typical organization, logical modeling is done first, and the physical data model is then derived from the logical model, depending upon the physical database infrastructure targeted by the organization.
Names in logical and physical data models should:
- Be meaningful.
- Be self-explanatory (should not require very lengthy and detailed additional descriptions).
- Reflect business use or purpose.
- Be such that different people in the organization develop the same understanding of their content.
- Resemble each other as much as possible.
- Contain full words as long as the total length threshold is not exceeded.
- Use abbreviations, where required, only from the approved list of abbreviations in the organization's naming standard.
- Be singular, not plural.
Steps for constructing names in logical data models include:
- Step 1: Develop good business descriptions for each logical data object and data element (e.g., entity, attribute, etc.).
- Step 2: Develop a business name for each object from the previously described list and ensure that it contains a prime word and one or more modifiers. It is a good practice to ensure that the name of each entity additionally includes an entity class word. Abbreviations should only be used if necessary for meeting length restrictions.
- Step 3: Develop a business name for each attribute of the above business data objects and ensure that it contains at least one attribute class word and one modifier.
- Step 4: In a logical data model, class words, prime words and modifiers in a name may be separated by a single space.
Data object name: entity class word + prime word + one or more modifiers, e.g., CC Published Course Catalog (CC is an entity class word denoting the course catalog application).
Data attribute name: attribute class word + one or more modifiers, e.g., Course Catalog ID (ID is an attribute class word denoting identification).
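As a minimal sketch of steps 2 through 4, the Python below assembles a logical name from labeled parts, checks that the required kinds of words are present and joins them with single spaces. The `Part` type and `logical_name` function are hypothetical, and so is the labeling of "Published" as the modifier and "Course Catalog" as the prime word; the text gives the finished names but does not label each word.

```python
from typing import NamedTuple


class Part(NamedTuple):
    text: str
    kind: str  # "entity_class", "attribute_class", "prime" or "modifier"


def logical_name(parts, required_kinds):
    """Join name parts with single spaces after checking required kinds."""
    missing = set(required_kinds) - {p.kind for p in parts}
    if missing:
        raise ValueError(f"name is missing: {sorted(missing)}")
    # split() followed by join() collapses stray whitespace to single spaces.
    return " ".join(" ".join(p.text for p in parts).split())


entity = logical_name(
    [
        Part("CC", "entity_class"),
        Part("Published", "modifier"),
        Part("Course Catalog", "prime"),
    ],
    required_kinds={"entity_class", "prime", "modifier"},
)
attribute = logical_name(
    [Part("Course Catalog", "modifier"), Part("ID", "attribute_class")],
    required_kinds={"attribute_class", "modifier"},
)
print(entity)     # CC Published Course Catalog
print(attribute)  # Course Catalog ID
```

The parts are joined in the order given, so the ordering convention remains a decision for the organization's standard rather than something the helper imposes.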
Steps for constructing names in a physical data model include:
- Step 1: Table and column names in a physical data model should be derived from the corresponding business names in a logical data model using the approved list of abbreviations in the organization.
- Step 2: While converting logical names into physical names, each column should default to a specific data type and data length based on its attribute class word. If necessary, default values may be modified after conversion. For example, an attribute containing the TITLE (TTLE) class word may default to a 40-character VARCHAR2 (Oracle) data type, and an attribute containing the CODE (CD) class word may default to a 4-character VARCHAR2 (Oracle) data type.
Thus, implementing a data naming standard facilitates standardization of physical data types and data lengths, which helps tremendously in reducing overall development effort.
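The defaulting rule in step 2 could be sketched in Python as a lookup keyed by class word. Only the TTLE and CD defaults come from the text; the `default_column_type` function, the column names and the override mechanism are hypothetical.

```python
# Defaults keyed by attribute class word; extend per your own standard.
DEFAULT_TYPES = {
    "TTLE": "VARCHAR2(40)",
    "CD": "VARCHAR2(4)",
}


def default_column_type(column_name, overrides=None):
    """Return the default type implied by a class word in the column name.

    `overrides` maps full column names to types, for the cases where the
    default is modified after conversion. Returns None if no class word
    in the name has a registered default.
    """
    overrides = overrides or {}
    if column_name in overrides:
        return overrides[column_name]
    # Scan from the end so a trailing class word wins if several match.
    for token in reversed(column_name.upper().split("_")):
        if token in DEFAULT_TYPES:
            return DEFAULT_TYPES[token]
    return None


print(default_column_type("CRS_TTLE"))     # VARCHAR2(40)
print(default_column_type("CRS_STAT_CD"))  # VARCHAR2(4)
print(default_column_type("CRS_TTLE", {"CRS_TTLE": "VARCHAR2(80)"}))  # VARCHAR2(80)
```

Returning None for an unknown class word, instead of guessing a type, keeps gaps in the library visible so they can be closed in the standard itself.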
Table name: entity class word + prime word + one or more modifiers, e.g., CC_PBLSH_CRS_CAT
Column name: class word + one or more modifiers, e.g., CRS_CAT_ID
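Step 1 amounts to a word-by-word substitution from the approved abbreviation list. In the Python sketch below, the three abbreviations are inferred from the example names in this article, while the `to_physical_name` function, the underscore separator rule and the 30-character limit are assumptions for illustration.

```python
# Approved abbreviations, inferred from the example names in this article.
APPROVED_ABBREVIATIONS = {
    "PUBLISHED": "PBLSH",
    "COURSE": "CRS",
    "CATALOG": "CAT",
}


def to_physical_name(logical_name, max_length=30):
    """Convert a space-separated logical name to an underscore-separated one.

    Words without an approved abbreviation (such as ID or CC) are kept as is.
    """
    tokens = logical_name.upper().split()
    physical = "_".join(APPROVED_ABBREVIATIONS.get(t, t) for t in tokens)
    if len(physical) > max_length:
        raise ValueError(f"{physical} exceeds {max_length} characters")
    return physical


print(to_physical_name("CC Published Course Catalog"))  # CC_PBLSH_CRS_CAT
print(to_physical_name("Course Catalog ID"))            # CRS_CAT_ID
```

Because the conversion is deterministic, two modelers starting from the same logical name arrive at the same physical name, which is one of the core objectives of the standard.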
Useful rules for creating abbreviations:
- Use the singular form of each word.
- Use the root form of each word unless otherwise necessary (e.g., Estimation = Estimate).
- Remove unnecessary vowels (e.g., course = crs or crse), except in acronyms.
- If a word begins with a vowel or a combination of vowels, do not remove those vowels (e.g., automate = autmt).
- Drop the second consonant of a double consonant (e.g., payroll = pyrl).
- Drop unimportant modifiers.
- Avoid creating acronyms or abbreviations that result in an English word (e.g., number = no).
- Use industry standards such as ARTS for commonly accepted abbreviations, and use them as a base to build the organization's approved list of abbreviations.
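Two of these rules, vowel removal and double-consonant reduction, are mechanical enough to sketch in Python. The `abbreviate` function below is a hypothetical helper that only proposes candidates; it does not reduce words to their singular or root form, does not check for collisions with English words, and is no substitute for the approved list.

```python
VOWELS = set("aeiou")


def abbreviate(word):
    """Propose an abbreviation: collapse double consonants, keep leading
    vowels, and remove all other vowels."""
    deduped = []
    for ch in word.lower():
        # Skip the second letter of a double consonant (payroll -> payrol).
        if deduped and ch == deduped[-1] and ch not in VOWELS:
            continue
        deduped.append(ch)
    # Count the vowels the word starts with; these are preserved.
    lead = 0
    while lead < len(deduped) and deduped[lead] in VOWELS:
        lead += 1
    head = deduped[:lead]
    tail = [ch for ch in deduped[lead:] if ch not in VOWELS]
    return "".join(head + tail)


print(abbreviate("course"))    # crs
print(abbreviate("automate"))  # autmt
print(abbreviate("payroll"))   # pyrl
```

A candidate produced this way still needs review against the remaining rules before it is added to the organization's list.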
Data naming standards are a key foundational component of a sound data management practice. Early adopters of data naming standards can avoid costly rework in later stages of development. I recommend starting small and building as you grow when establishing data naming standards in an organization. Establishing a naming standard in an organization where a lot of development work has already taken place will pose a significant challenge to data architects, but gradual adoption of data naming standards is better late than never.