Companies of all sizes are engaging in business intelligence. When done properly, nothing adds more value to a business in today's economy than BI; it is the face of the modern competitive landscape. However, in laying down the technical foundation to support the underlying data, these same companies face far more variations on, and distinct departures from, the traditional online transaction processing database management system than ever before. These options have emerged in recent years largely due to the high cost of ownership of the traditional approaches, as well as the hit-or-miss nature of achieving acceptable performance in analytics.

Analytic queries lie at the heart of BI, but analytic query and daily reporting have fundamental differences. With analytic query, results are desired immediately. Each query can be a spur-of-the-moment, fleeting thought - the first in a long series of thoughts that lead to business change. These thoughts can be capitalized on only if query performance is not just good, but great. Large data volumes exacerbate the gap between acceptable and unacceptable analytic query performance, and that performance continues to be difficult to achieve in traditional architectures.

Any organization that has lived with a data warehouse or other large data store in production for a while knows that manageability for change is just as essential. This includes the IT department that executes the changes, as well as the business users who deal with the service levels IT provides. Ironically, what usually leads to difficult-to-manage databases is the effort to achieve acceptable query performance; manageability and performance go hand in hand. Under time pressure to deliver reports, well-crafted data models - ones that give users a solid understanding of the database and deliver query performance - are increasingly hard to find.
Although it lacks standards, the quick-turn data model often stays around for years. Once the query environment adapts to a model, regardless of the difficulty in doing so, the model sticks, and a patchwork of performance techniques becomes embedded in the culture. Today, typical data management environments have several restrictions they have "learned to live with" in seeking acceptable analytical performance. Indexes, summary tables, cubes, utility executions such as reorganizations, and various denormalizations are often introduced into the environment. There is a point at which overcoming these challenges ceases to be easy and affordable, and therefore ceases to get done. Perhaps more important than these very real restrictions, though, are the limitations they place on thinking about possibilities and information exploitation. Well-crafted data models and performance tuning can be acceptably accomplished with expert data modelers and database administrators. However, these are highly mobile individuals, and tying environmental performance to their availability and cost is unappealing.

While results are what matter most when it comes to a database management system, understanding what goes on at the molecular (i.e., bits and bytes) level helps illuminate the fundamental differences between traditional and newer approaches.

Data is ultimately stored as bytes. These bytes are grouped several thousand at a time - anywhere from 4,000 to 64,000 - into "pages" or "blocks," which are the unit of input/output. One exception is the "prefetch" capability of row-oriented database management systems, which read ahead when they sense a pattern in the reads. In a row-oriented DBMS, blocks contain headers and footers comprising metadata that organizes the information the block contains. The footer holds pointers to where the records begin in the block. Records can be fixed or variable length. Each column in the record, stored consecutively in a traditional DBMS, comprises enough bytes to represent its value. A variable-length column is preceded by an extra 2 bytes containing its length; a nullable column is preceded by an extra byte containing a yes/no null indicator.

Data type determines column length. For example, an integer is 4 bytes, which can hold values up to roughly +/- 2.1 billion (it is the 2-byte smallint that tops out around +/- 32,000). A char(10) is 10 bytes, since each character occupies a byte. And so on. Records, which always begin with a few fixed header bytes, can range from a few bytes up to thousands of bytes, but they cannot span blocks. Some data, such as unstructured data kept in separate file structures, is referenced from columns that hold only pointers. Block headers, on the other hand, contain valuable statistics about the block, such as the number of records it contains, the timestamp of the last statistics run and so on. Blocks can contain records from multiple tables, and those records may be interwoven, depending on the planned or unplanned co-location strategy presented to the DBMS in the data definition language.
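As a rough sketch of the byte layout described above, the following Python snippet packs one hypothetical record: a small fixed header, a 4-byte integer column, a nullable char(10) column with its 1-byte null indicator, and a variable-length column with its 2-byte length prefix. The column names, header contents, and sizes are illustrative assumptions, not any particular vendor's on-disk format.

```python
import struct
from typing import Optional

# Hypothetical row-wise record layout, loosely following the description above:
#   2-byte record header | 4-byte integer column |
#   1-byte null indicator (+ char(10) value if not null) |
#   2-byte length prefix + variable-length column bytes
def pack_record(order_id: int, note: Optional[str], name: str) -> bytes:
    header = struct.pack("<H", 0)           # fixed record header (flags, etc. - illustrative)
    col_int = struct.pack("<i", order_id)   # 4-byte integer: roughly +/- 2.1 billion
    if note is None:
        col_null = b"\x01"                  # null indicator set; no value bytes follow
    else:
        # indicator clear, then the char(10) value padded to its full 10-byte width
        col_null = b"\x00" + note.encode("ascii").ljust(10, b" ")
    name_bytes = name.encode("ascii")
    col_var = struct.pack("<H", len(name_bytes)) + name_bytes  # 2-byte length prefix
    return header + col_int + col_null + col_var

# A record with a null note column: 2 + 4 + 1 + 2 + 4 = 13 bytes
print(len(pack_record(42, None, "Acme")))
```

Summing the pieces this way is exactly how a row-oriented DBMS arrives at a record's on-disk size, and it shows why every query that touches one column still pays to read all of them.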
