Before diving into some guidelines for constructing and testing classifications, let’s start with terminology.
Taxonomy: The science or technique of naming, identifying and classifying. This article explains aspects of a general taxonomy.
Domain: A group of items or objects characterized by a specific feature. The two domains used as examples in this article are financial securities and songs.
Classification (noun): A collection of ordered categories (delineated along one or more features) for a specific domain.
Category: A subset of similar domain objects. A category can be a group of more detailed categories.
Level: All the categories with the same parent category (as in multilevel classifications).
As a simple example, consider a classification of music – you might identify the categories (or genres) as classical, rock, country, pop, jazz, or hip hop. Or consider food: Vegetables, legumes, fruit, meat, fish and dairy are all categories of the classification. If we want more specificity for vegetables, we might have root vegetables (e.g., carrots, beets), plant vegetables (e.g., collards, lettuce, peas) and so on. Once we include subcategories of vegetables, we are talking about a multilevel classification.
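One way to picture a multilevel classification is as a small tree. The Python sketch below is only an illustration: the `Category` class and the `count_levels` helper are hypothetical names, not part of any library, and the root object simply stands for the classification itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Category:
    """A category; `children` holds its more detailed categories, if any."""
    name: str
    children: List["Category"] = field(default_factory=list)


def count_levels(node: Category) -> int:
    """Depth of category nesting below `node` (0 when it has no children)."""
    if not node.children:
        return 0
    return 1 + max(count_levels(child) for child in node.children)


# The food example from the text; the root stands for the classification.
food = Category("Food", [
    Category("Vegetables", [
        Category("Root vegetables"),
        Category("Plant vegetables"),
    ]),
    Category("Legumes"),
    Category("Fruit"),
    Category("Meat"),
    Category("Fish"),
    Category("Dairy"),
])

levels = count_levels(food)
```

Here `levels` comes out as 2 because vegetables have subcategories. Without them it would be 1, which is a single-level classification.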
Classification Benefits
The benefits of classifications are numerous. We refer to classifications when we communicate with colleagues. We name classifications so that systems can label them and attach qualities to them.
Well-documented classifications become part of our language. Classifications improve our communication; they reduce the probability of interpretation or reporting errors, and provide the ability to differentiate, compare and analyze. From a purely utilitarian standpoint, they enable us to locate the single data item we really want more quickly. A classification enables a tree-based search – with each level we traverse, we get closer to our desired target domain object.
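The tree-based search can be sketched in a few lines of Python. This is a minimal illustration, assuming the classification is stored as nested dictionaries with lists of items at the lowest level; the `narrow` and `count_items` helpers and the "Citrus" branch are hypothetical and only there to make the example complete.

```python
# Nested dicts are categories; lists at the bottom hold the domain objects.
classification = {
    "Vegetables": {
        "Root vegetables": ["carrots", "beets"],
        "Plant vegetables": ["collards", "lettuce", "peas"],
    },
    "Fruit": {
        "Citrus": ["oranges", "lemons"],  # hypothetical branch for illustration
    },
}


def count_items(node):
    """Count the domain objects at or below a node."""
    if isinstance(node, list):
        return len(node)
    return sum(count_items(child) for child in node.values())


def narrow(tree, path):
    """Follow category names down the tree.

    Returns the node reached and the number of candidate objects
    remaining before the first step and after each step.
    """
    node = tree
    remaining = [count_items(node)]
    for name in path:
        node = node[name]
        remaining.append(count_items(node))
    return node, remaining


found, remaining = narrow(classification, ["Vegetables", "Root vegetables"])
```

In this sketch the candidate set shrinks from 7 objects to 5 and then to 2 as each level is traversed, which is the narrowing effect described above.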
The larger the universe of data items being classified, the more valuable the classification. A large universe of domain objects lends itself well to a multilevel classification. For a large universe, a single-level classification likely assigns too many domain objects to each category, reducing its ability to isolate a small group of interest. In financial services, would an asset management firm be happy with classifying all its securities into just one level such as Equities, Fixed Income, Commodities, Currencies, and Derivatives? It is unlikely the firm would be content with this, because a good-sized asset management firm may keep track of more than 20,000 securities.
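A quick back-of-the-envelope calculation shows why. The sketch below assumes, purely for illustration, that securities are spread evenly across categories and that a second level would add ten categories under each of the five; neither figure comes from a real firm, and `average_items_per_category` is a hypothetical helper.

```python
def average_items_per_category(total_items, categories_per_level):
    """Average number of items per lowest-level category, assuming an even spread.

    `categories_per_level` lists how many categories sit under each parent
    at every level, e.g. [5, 10] means 5 top-level categories with 10
    subcategories each.
    """
    leaf_categories = 1
    for count in categories_per_level:
        leaf_categories *= count
    return total_items / leaf_categories


single_level = average_items_per_category(20_000, [5])     # 4000.0
two_levels = average_items_per_category(20_000, [5, 10])   # 400.0
```

Under these assumptions, one level leaves about 4,000 securities per category, while a second level brings that down to about 400.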
Constructing the Classification
When building classifications, ensure all categories apply to the same domain. Getting back to music, categories such as ‘70s or ‘80s songs would not be consistent with genres. However, you could have a separate time-based classification. Both genre and decade would be fair - but separate - classifications.
Here are a few points to consider when building a new classification:
- Give the classification a name so it clearly represents its domain.
- When considering a category, ask yourself whether it can apply to all objects. If so, it is probably an attribute of the domain object (like the recording technique of a song or whether a security’s income is taxable).
- If your new category is Boolean (usually yes or no), it may or may not be a good choice for a classification category. For example, in financial services, IS_TAXABLE is not a good category since it is really an attribute (every domain object has this attribute). However, the Boolean category IS_CONVERTIBLE is acceptable because it only applies to a subset of categories (e.g., fixed income).
- Do not define a category too narrowly. If very few domain objects fall into it, it will rarely (if ever) be assigned and, more importantly, rarely be used in reporting.
- Similarly, ensure some dispersion. If the expected time frame before a category is first assigned is relatively long (for instance, more than one year), it may be better to keep “unused/future” categories on the drawing board. With several categories having very few or no domain objects, you can expect business users to complain that a search using the category is not working, because seeing the category implies there are members.
- Stay brutally focused on the domain. Consider another example from financial services: Many firms tend to consider “municipals” as a type of security, putting it alongside other fixed income categories like convertibles, mortgage-backed, floating rate notes, etc. But the term “municipal” actually describes the issuer, not the security. When you test the classification (see below), you’ll see it fails the test. However, “municipal” is a very reasonable issuer classification.
- Socialize it, because feedback will foster consensus. Encourage multiple, frequent reviews to achieve incremental agreement and have several sample domain objects for each category in the classification. This aids understanding by the users of the classification.
- As a rule of thumb, define no more than 10 to 12 categories under the level above. More than that suggests a higher-level grouping or a level in between may be beneficial. For example, you may introduce a level in between, with three categories, where each fans out to four (which equals the original 12). Also consider how often the classification will be assigned manually.
- Try to select one or two words to name a category if possible, because they will likely be displayed in an application GUI tree or displayed on a report. If you make the category names too long, they’ll be harder to refer to, and users will abbreviate them with acronyms.
- In a multilevel classification, create as few “Other” categories as possible and place them at the leaf (lowest) level.
- In a multilevel classification, every category should be a formal subset of its parent category. If the new lower category can capture domain objects in any other level above, it is likely not a good candidate (or it may be an attribute of the domain object). An example of this is:
- Level 1: Debt
- Level 2: Issuer Backed
- Level 2: Money Market
- Level 3: Commercial Paper
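Some of the guidelines above lend themselves to a simple automated check. The following Python sketch is one possible approach, not a definitive implementation: `check_classification`, `MAX_CHILDREN` and `MAX_NAME_WORDS` are hypothetical names, the classification is assumed to be stored as nested dictionaries, Commercial Paper is assumed to sit under Money Market, and the "Other" branch is invented only to trigger the warnings.

```python
MAX_CHILDREN = 12    # upper end of the rule of thumb above
MAX_NAME_WORDS = 2   # "one or two words" per category name


def check_classification(tree, path=()):
    """Return a list of warnings for a nested-dict classification.

    `tree` maps each category name to a dict of its subcategories
    (an empty dict marks a leaf category).
    """
    warnings = []
    if len(tree) > MAX_CHILDREN:
        where = " > ".join(path) or "(top level)"
        warnings.append(f"{where}: {len(tree)} categories, consider a level in between")
    for name, children in tree.items():
        here = path + (name,)
        label = " > ".join(here)
        if len(name.split()) > MAX_NAME_WORDS:
            warnings.append(f"{label}: name is longer than {MAX_NAME_WORDS} words")
        if name.lower().startswith("other") and children:
            warnings.append(f"{label}: 'Other' category is not at the leaf level")
        warnings.extend(check_classification(children, here))
    return warnings


debt = {
    "Debt": {
        "Issuer Backed": {},
        "Money Market": {"Commercial Paper": {}},
        # Hypothetical branch that breaks two of the guidelines.
        "Other": {"Other Floating Rate Notes": {}},
    }
}

for warning in check_classification(debt):
    print(warning)
```

A check like this only covers the mechanical guidelines (fan-out, name length, placement of "Other"). Whether a category really belongs to the domain still needs human review.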










