Data Modeling and the Internet

  • Bob Schmidt, David Downs
  • October 01 1998, 1:00am EDT

Astounding, mundane, pornographic, democratic, exhibitionist, voyeuristic--the Internet is all these things. No one site can be so described, but the lines between sites become meaningless to the searcher who scans from Australia to Austria without even being conscious of it. Marshall McLuhan's global village implodes so that now the entire planet seems to fit in my shirt pocket; computer screens transfix us as if we were children with new magnifying glasses.

Attempts to retrofit structure to human language have proven impossible. Search engines return the UPI story, the way the same story was printed in the Times and again in the local paper. Search for "modeling" and you will miss stories on "modelling." You will be overwhelmed with stories on model trains and just models.

In his article, David Downs plays with the idea that the Internet is another great idea like the invention of fire. The changing technology would have us think we are moving at Web-speed when in fact we are all still working on the same basic problem of turning repetitive data into knowledge. Entertainment has evolved from stories told by elders at campfires to MTV, but the mission has remained the same. The entertainment/brochure-table part of the Web follows the rules of human language. Human language sells, seduces and entertains. Data processing has evolved far beyond parchment, but still--somehow--millions of observations have to be classified, measured, sorted and studied to create knowledge. The part of the Web that tracks packages, flights and books must be based on sound data modeling principles. The Internet is astounding, but it does not change the fundamental ways in which data is organized.

Once upon a time, a leading thinker of his era called Thag figured out how to harness the energy of fire for his fellow cave people. The innovation was recognized at once, and fire became a hot item. With such a lucrative market, anyone at all who thought he knew anything at all about fire was trying to market his hot ideas. The fire-hungry public didn't know how to judge their fire vendors and, unfortunately, many of them got burned.

Eventually, leading authorities identified certain fire regulations which became the guidelines for the industry. Then a new innovation came along. Someone, we'll call her Gruff, had figured out how to develop a fire with the use of rocks and sticks. This ignited a whole new era for the fire industry. Instead of having to carry a fire with them whenever they thought they might need one, the cave people could just take their fire producing tools and start one when there was a need. The cave folk rekindled their love affair with fire. Folks were lighting up everywhere, forgetting some of the principles of fire management that had been established before the advent of the portable fire--and unfortunately, again, many of them got burned.

There are two points to be made in relating the trials of fire. First, the emergence of a new idea or field of endeavor is characterized by a flurry of ad hoc and chaotic activity. It is hard to differentiate those who really know something about the subject from those who only know that it sells. Second, when changes and adjustments occur to what we have established as the accepted body of knowledge, we fail to recognize how much of what we already knew is still relevant.

It is easy enough to replace the development of fire, as related in the story, with that of the Internet and Web-based applications. The impact of the Web is just being determined, but its potential is great. The scramble to support the demand has produced a plethora of Web applications, but most are not living up to the potential. Yet the concepts needed to create successful Web-based applications are not all that different from those needed to succeed with pre-Web applications. Many of the essential concepts are found in data modeling, which is analogous to the fire regulations developed after Thag. Understanding and controlling the application's information needs through data models made sense in the pre-Web world. Applying good data modeling concepts in an Internet and Web world is a good practice for all the same reasons.

We're all familiar with the criticisms regarding the usefulness, friendliness and ease of use of the pages and sites encountered on the Web: "Too much junk;" "It takes too long to find what I want;" "Badly designed." Is good design just a matter of opinion or are there more objective criteria for judging the usefulness of a Web-based application? Good Web pages don't just happen by chance. Data modeling concepts serve as a good approach to Internet design and supply the means to address the characteristics that a well-designed Web page must meet.

Ready, Aim, Then Fire

Someone browsing the Web site for the Hierarchical Database Fire Company gets frustrated with the seemingly endless traversal of links to get to what is really wanted. Another person checking out the Flat File Fire Foundation is presented with a page with everything on it which takes forever to download--and then still needs to be sorted to obtain the desired information. Both of these individuals return to the same sites some time later looking for the latest update in a particular area and find that on the first site the information is inconsistent and on the second it is out of date.

Is there a set of criteria by which to plan, build and then judge the design and implementation of Web applications? It all starts with an idea, need or opportunity to provide information to someone for some purpose. Any criteria must indicate how well the implementation meets the intended purpose. Too much junk, bad organization and/or poor response time can detract from that purpose. As time goes by, other considerations impact the ability of the system to meet its objectives. These include out-of-date information, out-of-date links and not using the latest presentation add-ins. A good Internet system is designed to be efficient and robust. It doesn't happen by accident.

The reason we migrated to relational databases and modeled our data in relational terms to begin with was that data modeling allowed us to get to information more intuitively and reduce its redundancy. It allowed us to provide designs that were adaptable to change by localizing the changes and avoiding delete and update anomalies. The Internet has the same requirements. Thus, good Internet development requires a well-designed "data" model to be judged by data modeling concepts.

The Heat of the Matter

Yes, it's all about data: understanding it, managing it. Client/server hasn't eliminated the need to think about data clearly. Neither has data warehousing nor object-oriented design--and neither has the Internet.

But what exactly is data--entities, attributes, relationships and identifiers? An entity is a thing of relevance about which information can be kept. An attribute is a quantitative or descriptive characteristic of the entity. A relationship is a reason of relevance why one or more entities may be associated with another. An identifier (or key) is a way of distinguishing a unique occurrence of an entity.
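The four terms can be sketched in a few lines of Python. This is only an illustration, not something from the original article: the class names, the field names and the temperature values are all hypothetical, chosen to fit the fire story.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flame:                 # entity: a thing about which information is kept
    flame_id: int            # identifier (key): distinguishes one occurrence
    color: str               # attribute: descriptive characteristic
    temperature_c: int       # attribute: quantitative characteristic (made-up values)


@dataclass(frozen=True)
class Supplier:              # a second entity
    supplier_id: int         # identifier (key)
    name: str                # attribute


@dataclass(frozen=True)
class Supplies:              # relationship: associates a supplier with a flame
    supplier_id: int
    flame_id: int


# Hypothetical sample occurrences, stored by key.
flames = {f.flame_id: f for f in [Flame(1, "yellow", 1100), Flame(2, "blue", 1400)]}
suppliers = {1: Supplier(1, "Gruff")}
supplies = [Supplies(1, 1), Supplies(1, 2)]


def flames_supplied_by(supplier_id):
    """Follow the relationship from a supplier key to its flame entities."""
    return [flames[s.flame_id] for s in supplies if s.supplier_id == supplier_id]
```

Calling `flames_supplied_by(1)` walks the relationship and returns the two flame occurrences recorded for that supplier.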

Data has traditionally been understood as text or numeric fields of rather limited length. In an information technology context, data can be represented easily as a string of bits (1s and 0s). Information is provided when the values of the data are represented in a certain context or in relationship to something. Fire is fire but means something different when associated with a steak instead of a can of gasoline. Data becomes information based on its relationship and context. The adjustment some of us have to make is to appreciate the other incarnations of data. Pictures, audio and video files are other representations of data. Such data can still be represented as strings of 1s and 0s. Such data can still derive meaning in relationship to its environment. It can be the identifying key of a unique piece of information.

For the most part, the Internet is really an enabled document management system with pictures, video and sound files, and lots and lots of words. The value of the Internet is in disseminating to a diverse and dispersed population the library of information in all its interesting forms. The logical data model for these elements looks the same regardless of the final representation. This is a fundamental concept.

Containing the Fire

Another contextual change for some of us is in seeing the rules of normalization more broadly. Normalization applies to any data or information structure. Although originally oriented to data representation for storage purposes in a relational database context, the concepts are applicable to any information storage and retrieval application. The overriding principles of affinity, coupling and cohesion in order to group and store information apply.

Normalization is a design method which provides a straightforward transition from model to representation. The techniques reduce redundancies, provide update and delete integrity, provide stability for adding and deleting entities, attributes and relationships so that they don't require restructuring, decrease I/O and result in smaller transfer units. If the normalization rules are translated, they might provide a more useful and slightly different view of normalized data.

First Normal Form: An entity is in First Normal Form when every occurrence of a row type contains the same number of fields. In other words, there are no repeating groups; the entity contains atomic values only. If Gruff were developing a Web page to advertise the types of flames she could offer, the data model could not contain an entity of fire with a repeating group of attributes for the methods used to start the fire.
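As a rough sketch of what removing a repeating group might look like, the Python below flattens a list-valued field into one row per value. The flame names and starting methods are hypothetical sample data, and `to_first_normal_form` is an illustrative helper, not a standard function.

```python
# Not in first normal form: "methods" holds a repeating group,
# so rows carry a varying number of values.
unnormalized = [
    {"flame": "campfire", "methods": ["rocks", "sticks"]},
    {"flame": "torch", "methods": ["sticks"]},
]


def to_first_normal_form(rows):
    """Return one row per (flame, method) pair so every field is atomic."""
    return [
        {"flame": row["flame"], "method": method}
        for row in rows
        for method in row["methods"]
    ]


flame_methods = to_first_normal_form(unnormalized)
```

Every resulting row has exactly two atomic fields, and a new starting method becomes a new row rather than a change to the row layout.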

Second Normal Form: An entity is in Second Normal Form if it has a key to uniquely identify the row (occurrence) and every non-key attribute is dependent on the whole key, not just part of it. If a change to one part of the key causes attributes throughout to change as well, the entity is not in Second Normal Form. If Gruff's Web page model indicated suppliers for the various flames in a supplier/flame entity, it would not be in Second Normal Form, since certain attributes would be dependent on the supplier part of the key and certain attributes would be dependent on the flame product part of the key. If the last supplier of a particular flame were deleted from the flame supplier table, no information about the flame would remain either.
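The supplier/flame problem can be sketched in Python under some assumptions. The supplier names, cities, colors and prices below are hypothetical; the point is only that attributes depending on part of the key move to their own structures, so deleting an offer no longer erases the flame.

```python
# Combined rows keyed by (supplier, flame). supplier_city depends only on
# the supplier and flame_color only on the flame: partial dependencies.
supplier_flame = [
    {"supplier": "Gruff", "flame": "campfire", "supplier_city": "Cave Hill",
     "flame_color": "orange", "price": 3},
    {"supplier": "Thag", "flame": "campfire", "supplier_city": "River Bend",
     "flame_color": "orange", "price": 4},
    {"supplier": "Thag", "flame": "torch", "supplier_city": "River Bend",
     "flame_color": "yellow", "price": 2},
]


def split_second_normal_form(rows):
    """Separate facts about the supplier, the flame, and the pair."""
    suppliers = {r["supplier"]: r["supplier_city"] for r in rows}
    flames = {r["flame"]: r["flame_color"] for r in rows}
    offers = {(r["supplier"], r["flame"]): r["price"] for r in rows}
    return suppliers, flames, offers


suppliers, flames, offers = split_second_normal_form(supplier_flame)

# Deleting the last supplier of the torch removes only the offer;
# what we know about the torch itself survives.
del offers[("Thag", "torch")]
```

After the deletion, `flames` still records that the torch is yellow, which is exactly what the combined table could not guarantee.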

Third Normal Form: An entity is in Third Normal Form if the entire primary key or candidate key is needed to identify every other data item in the entity occurrence, and no data item is identified by a data item that is not the key. Third Normal Form resolves conditions where a non-key field is a fact about another non-key field.
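A small Python sketch may help here. It assumes a hypothetical flame entity in which `fuel_origin` is really a fact about the fuel (a non-key field), not about the flame; all names and values are made up for illustration.

```python
# Key: flame. fuel_origin is a fact about fuel, another non-key field,
# so the same fact is repeated for every flame that burns oak.
flame_rows = [
    {"flame": "campfire", "fuel": "oak", "fuel_origin": "forest"},
    {"flame": "torch", "fuel": "pitch", "fuel_origin": "pine"},
    {"flame": "hearth", "fuel": "oak", "fuel_origin": "forest"},
]


def split_third_normal_form(rows):
    """Move the fact about fuel into its own structure keyed by fuel."""
    flame_fuel = {r["flame"]: r["fuel"] for r in rows}
    fuel_origin = {r["fuel"]: r["fuel_origin"] for r in rows}
    return flame_fuel, fuel_origin


flame_fuel, fuel_origin = split_third_normal_form(flame_rows)


def origin_of(flame):
    """Look up the fuel's origin by way of the flame's fuel."""
    return fuel_origin[flame_fuel[flame]]


# A single update now corrects the fact for every flame that burns oak.
fuel_origin["oak"] = "grove"
```

Because the origin is stored once per fuel, `origin_of("campfire")` and `origin_of("hearth")` cannot disagree after the update.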

These three normal forms are always remembered best by the saying that each attribute must be a fact about "the key, the whole key and nothing but the key, so help me Codd." If we allow keys to be represented by images or other non-traditional representations, we are well on the way to designing good Web pages. Traditionally, the key we use for a person has been an employee number or social security number or some other artificial numeric assignment. How much more useful to use a picture or a fingerprint depending on our needs.

One challenge in many data modeling sessions is the name given to something. Often the same description is defined by more than one tag or, conversely, the same name will mean two (or more) different things to two different sets of people. The resolution always comes from understanding the description of the entity rather than the name it is given. If a description is really a better identifier, then why not use it as such? And if a picture is worth a thousand words, why not use a picture to describe the entity under consideration? We're just not used to thinking that way.

Turning Up the Heat

Developing a normalized data model of the application can go a long way toward making it user friendly. There are also other data modeling concepts that can be applied to the design. Consider the cardinality of the relationships: one-to-one, one-to-many, many-to-many. The direction of the cardinality in the one-to-many relationship will help to determine which entities get listed together and which ones need to be separated. For example, if each Flame Service Center has many qualified repair people, they should be listed by repair center. The Web page design should reflect the one-to-many nature of the relationship.
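One way to picture the service center example is to group the "many" side under the "one" side before rendering. The sketch below assumes hypothetical repair people and center names, and `render` produces plain text only as a stand-in for whatever page markup would actually be used.

```python
from collections import defaultdict

# Each repair person belongs to exactly one center (the "many" side).
repair_people = [
    ("Ash", "North Cave Center"),
    ("Ember", "South Cave Center"),
    ("Cinder", "North Cave Center"),
]


def list_by_center(people):
    """Group repair people under the center they belong to."""
    by_center = defaultdict(list)
    for person, center in people:
        by_center[center].append(person)
    return dict(by_center)


def render(by_center):
    """Render one heading per center with its people indented beneath it."""
    lines = []
    for center in sorted(by_center):
        lines.append(center)
        lines.extend(f"  - {person}" for person in sorted(by_center[center]))
    return "\n".join(lines)
```

Printing `render(list_by_center(repair_people))` gives one heading per center with its people listed underneath, which mirrors the direction of the relationship.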

Rarely does an entity participate in only one relationship. When many ends of a one-to-many relationship terminate at an entity, it is a good candidate to be a listing of the key or representation of a key to which the details can be linked. A useful concept is understanding few-to-many relationships. When a many-to-many relationship exists, often one part is not as numerous as the other. For example, a flame may come in several colors--red, orange, yellow and maybe blue--but generally not too many, while the number of things it can burn is great. This would be a few-to-many relationship. This few-to-many situation suggests that the things that can burn be listed by the flame color rather than the other way around.
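The few-to-many idea can be approximated by counting distinct values on each side of a many-to-many relationship and grouping on the smaller side. This is one possible reading of the text, not a rule it states; the color and material pairs are hypothetical, and `group_by_smaller_side` is an illustrative helper.

```python
# Many-to-many pairs: (flame color, thing it can burn).
pairs = [
    ("red", "wood"), ("red", "paper"), ("blue", "gas"),
    ("yellow", "wood"), ("yellow", "straw"), ("red", "straw"),
]


def group_by_smaller_side(pairs):
    """Group on whichever side has fewer distinct values (the 'few')."""
    left = {a for a, _ in pairs}
    right = {b for _, b in pairs}
    group_on_left = len(left) <= len(right)
    grouped = {}
    for a, b in pairs:
        key, item = (a, b) if group_on_left else (b, a)
        grouped.setdefault(key, []).append(item)
    return grouped


burnables_by_color = group_by_smaller_side(pairs)
```

With three colors and four burnable things, the grouping falls on color, so the listing has three short headings rather than four.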

Lessons from History

So, we've explored ways in which data modeling concepts can improve the design of Web pages, but how can data modeling concepts help the design be resilient to change? Perhaps if we were able to predict the future, our design would be in better shape for anticipated changes. The trick in accomplishing a design for the future is to logically apply the normalization principles and not be seduced into designing for performance considerations. An attribute in one context may have the meaning of an entity in another. The more time spent on understanding the data from all different perspectives, the more adaptable the design will be in the future.

History repeats itself. The history of new development paradigms is strewn with undisciplined approaches, and the results are strong evidence of the need for a solid design approach. Data modeling allowed us to develop good relational database systems by normalizing data to get to it more intuitively, reduce redundancy, avoid delete and update anomalies and provide design that was adaptable to change. Unfortunately we associate data modeling only with relational databases and, by so doing, neglect the valuable principles of modeling that still apply in the Internet world.

Although we may have the ability to start a fire anytime we like, we may still want to consider how we structure and fuel it.