This is the final installment of a series covering the changing role of database administration. The first two parts of this series discussed how the Internet changes the role of the DBA and the need for additional DBA duties to manage procedural database objects such as triggers, stored procedures and user-defined functions.
In this installment we will take a look at additional features, functions and trends impacting database administration. First of all, organizations are storing more and more data for longer and longer periods of time. Sometimes, the same data is stored two, three or more times for different processing purposes. As the amount of data increases, so does the size of databases. The bigger the databases, the more difficult the job of the DBA becomes.
From VLDB to VHDB
Databases are growing in size. I have talked to hundreds of DBAs and visited many sites, and none of them report that their databases are getting smaller. This has been true for many years, but the pace at which database sizes are growing is at an all-time high.
For several years, database experts have used the acronym VLDB (very large database) to refer to the largest databases in production environments. However, some production databases are now approaching a petabyte in size. Figure 1 lists the storage abbreviations up to a petabyte and where we will inevitably go from there. It demeans a database of this size to call it merely large; this is a huge database, hence the updated terminology VHDB, or very huge database.
There are many factors causing this unprecedented growth. The need to store complex, unstructured data types such as audio and video data tends to cause database size to increase. Most relational database systems provide extensions to support multimedia data such as audio, video and images. Examples of such extensions include DataBlades for Informix and Type Extenders for DB2 UDB. Unstructured data is typically much larger than simpler data types. Consider the following: an hour of video requires about 1GB of storage, audio requires a little less than 1MB per minute, and images range from as little as 20KB to as much as 60MB depending on the type and quality of the image. Therefore, storing multimedia data in the database unquestionably increases its size.
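To put these figures in perspective, a rough back-of-the-envelope calculation shows how quickly multimedia content inflates a database. The workload below (hours of video, minutes of audio, image count and average image size) is purely hypothetical; only the per-item sizes come from the discussion above.

```python
# Rough multimedia storage estimate using the per-item sizes
# cited above. The workload figures themselves are hypothetical.
GB = 1024 ** 3
MB = 1024 ** 2
KB = 1024

video_hours = 100        # assumed: 100 hours of video at ~1GB/hour
audio_minutes = 5000     # assumed: 5,000 minutes of audio at ~1MB/minute
image_count = 50000      # assumed: 50,000 images at ~200KB each

total_bytes = (video_hours * 1 * GB
               + audio_minutes * 1 * MB
               + image_count * 200 * KB)

print(f"Estimated multimedia storage: {total_bytes / GB:.1f} GB")
```

Even this modest hypothetical collection exceeds 100GB before a single row of traditional structured data is stored.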
Figure 1: Storage Abbreviations
Another trend causing databases to increase in size is data warehousing. Organizations are storing more data for longer periods of time for analytical processing that creates business intelligence out of enterprise data. They are also creating a lot of indexes on this data to enable rapid data access. Indexes require even more storage space, further increasing the overall size of the database. As more data is maintained for longer periods, and sometimes duplicated multiple times throughout an organization, data storage and maintenance needs increase.
Data mining is fast becoming a requirement for modern businesses. Data mining applies heuristic algorithms to historical data to discover new patterns that can be exploited for competitive advantage. The more data there is, the better the ability to mine the data. Of course, other factors influence the quality of data mining including overall data quality and the robustness of the pattern discovery algorithms. Regardless, data mining causes organizations to store more data for longer periods of time.
Hardware improvements also spur this growth along. The ability to cheaply store multiple gigabytes of information enables the creation, storage and access of these VHDBs. However, the speed of access has not kept up with the volume of storage available. The amount of storage space on a disk drive has grown nearly three orders of magnitude in the past 25 years, but the data exchange rate has increased by only about one order of magnitude during that same time. To keep pace, hardware and DBMS vendors add main storage, cache data in memory, enable parallel data access and employ other techniques, all of which complicate database administration.
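The practical consequence of this capacity/throughput gap can be sketched with a simple calculation. The drive figures below are illustrative assumptions chosen to match the orders of magnitude cited above, not vendor specifications:

```python
# Illustrates the capacity vs. transfer-rate gap: capacity grows
# ~3 orders of magnitude while throughput grows only ~1.
old_capacity_gb = 0.1    # assumed: ~100MB drive, 25 years earlier
old_rate_mb_s = 1        # assumed: ~1MB/s transfer rate

new_capacity_gb = 100    # ~3 orders of magnitude more capacity
new_rate_mb_s = 10       # ~1 order of magnitude faster transfer

def full_scan_minutes(capacity_gb, rate_mb_s):
    """Time to read an entire drive sequentially, in minutes."""
    return capacity_gb * 1024 / rate_mb_s / 60

print(f"Then: {full_scan_minutes(old_capacity_gb, old_rate_mb_s):.1f} min")
print(f"Now:  {full_scan_minutes(new_capacity_gb, new_rate_mb_s):.1f} min")
```

Under these assumptions, a full sequential scan of a drive takes roughly 100 times longer today than it did 25 years ago, which is precisely why vendors lean on caching and parallel access.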
The net result of increasing database size is that the largest production databases are unmanageable even with the best tools that money can buy. Day-to-day administration becomes increasingly difficult. Some administration tasks may not make sense at all. If you can rebuild the entire database faster than you could recover it from a backup, why bother with a backup and recovery strategy? Instead, adopt a re-creation strategy. The manageability of a database is dependent upon many factors including:
- Your individual environment (hardware, software, DBMS, applications, etc.).
- Your staffing levels.
- Availability requirements (e.g., 24x7 vs. batch window).
- Concurrent workload requirements.
- Availability of DBA tools.
- Your overall software budget.
The bottom line is that DBAs need to monitor the pace of database growth and plan for the resources that are required to manage the ever-increasing database portfolio. Failure to do so can result in system downtime, lost data and system degradation.
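One concrete way to monitor the pace of growth is to keep periodic size samples and project when allocated capacity will be exhausted. The sketch below assumes roughly linear growth and uses hypothetical monthly samples; it is a planning aid, not a capacity-management product.

```python
# A minimal sketch of projecting when a database will exhaust its
# capacity, assuming roughly linear growth between monthly samples.
def months_until_full(size_history_gb, capacity_gb):
    """Project months until capacity is reached from monthly size samples."""
    if len(size_history_gb) < 2:
        return None  # not enough samples to estimate a trend
    growth = (size_history_gb[-1] - size_history_gb[0]) / (len(size_history_gb) - 1)
    if growth <= 0:
        return None  # not growing; no projection needed
    remaining = capacity_gb - size_history_gb[-1]
    return remaining / growth

# Hypothetical monthly size samples (GB) against a 500GB allocation
samples = [120, 140, 165, 185, 210]
print(f"Months until full: {months_until_full(samples, 500):.1f}")
```

A projection like this gives the DBA lead time to request storage or archive data before growth becomes an outage.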
The PDA DBA
Another information technology (IT) trend is the proliferation of small form-factor devices known as personal digital assistants (PDAs). Examples of such devices include the Palm Pilot, the Handspring Visor and the PocketPC. The major DBMS vendors are racing to provide small-footprint versions of their flagship DBMSs to run on PDAs. For example, IBM offers DB2 Everyplace, Oracle markets Oracle8i Lite and Sybase has the Adaptive Server Anywhere.
One characteristic of all PDA databases is their size; they are small. These databases are designed for low-cost, low-power, small form-factor devices such as PDAs and hand-held personal computers (HPCs). The general idea is to store a small amount of critical data on the PDA that is later synchronized to other, more complete and long-term data stores.
The DBA's job will indeed be impacted by this development. A PDA database should not require the in-depth tuning and administration required of enterprise databases. The DBA will need to be involved in the initial design of PDA databases to ensure they are optimal for the hand-held environment. However, design is a minimal concern.
The big impact will be in planning for and managing the data synchronization from hundreds or thousands of PDAs. When should synchronization be scheduled? How will it impact applications that use large production databases that are involved in the synchronization? How can you ensure that a mobile user will synchronize his data reliably and on schedule?
These are not minor issues. Before enabling a large contingent of PDA database users, make sure that your DBA staff is prepared for the impact on the enterprise databases. Be ready to support the inevitable request for this technology by understanding data synchronization technology and the potential need for remote database users at your organization. Next, think through which application systems in your organization might be impacted first. Those with remote workers, such as sales systems or delivery tracking, are likely candidates. Take some time to review the data requirements of those applications and how a large influx of remote connections might impact the current systems.
Mobile workers are here to stay, and DBAs need to be ready to support them with a valid, shared data infrastructure.
Yet another factor impacting the modern DBA is heterogeneity. Most organizations have multiple DBMS products. Studies have shown that most mid- to large-sized organizations have from three to ten different DBMS products. This heterogeneity makes database administration more difficult. Although most of today's development uses relational technology, there are many existing legacy applications using hierarchical (IMS) and network (IDMS) DBMSs. A thorough understanding of Oracle, for example, will not help a DBA who must also administer IMS databases. When the database model differs, the DBA must be prepared to perform almost completely different jobs.
Each of the relational products operates differently, too. You cannot take a DB2 DBA, sit him in front of an Oracle database and expect him to efficiently administer it. There are many differences in the implementation, data definition language (DDL), structured query language (SQL) and underlying mechanics of the DBMSs. For example, locking is implemented very differently in DB2 than in Oracle. Oracle does not take READ locks, whereas DB2 does, and this is just one small example. Consider how difficult it becomes to understand not just DB2 and Oracle, but perhaps Microsoft SQL Server and Informix as well.
There are DBA tools available that mask some of these differences, but not every shop owns these tools. Therefore, DBAs need to understand every nuance of every DBMS in their shops. This is a difficult, if not impossible, task.
The Speed of Change
A final impediment to DBA efficiency is the speed of change. Most organizations cannot implement new DBMS versions as fast as the vendors deliver them. In just the year 2000, we saw Microsoft move from SQL Server 7 to SQL Server 2000, IBM upgrade from DB2 UDB V6 to V7, and Oracle announce Oracle9i as the follow-on to Oracle8i.
Many users are still running old versions of their DBMSs because moving from one version to the next requires significant migration and testing effort. This is so even when the vendors make an effort to simplify the process. Programs need to be run to ensure that performance remains similar under the new release. Sometimes data must be migrated to a new format, and older features are increasingly being dropped from new releases, which can require reprogramming of working applications and databases.
Database management is becoming more complex which, in turn, is driving up the level of skills required to successfully implement database administration. Today's DBA requires skills not just in database design and tuning, but in other key areas such as:
- Integrating database applications with the Web.
- Incorporating logic into the database using triggers, UDFs and stored procedures.
- Storing and manipulating large and complex types of data such as multimedia data.
- Managing data transfer from intermittently connected devices.
- Dealing with rapidly changing technology and products.
The modern DBA must possess more skills than most technicians can master. One way to reduce this problem is through intelligent automation. As IT professionals, we have helped to deliver systems that automate multiple jobs throughout our organizations, but we have yet to intelligently automate our own DBA tasks. By automating some of the tedious day-to-day tasks of database administration, we can gain time to learn about new RDBMS features and implement them appropriately.
However, simple automation is not sufficient. The software should be able to intelligently monitor, analyze and optimize applications using past, present and future analysis of collected data. Simply stated, the software should work the way a consultant works, fulfilling the role of a trusted advisor.
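A toy sketch of this "advisor" style of automation: compare a freshly collected metric against its recent baseline and recommend an action only when it deviates sharply. The metric name, sample values and threshold below are all hypothetical, and a real product would track many metrics with far more sophisticated models.

```python
# Advisor-style check: flag a metric that deviates sharply from
# its recent baseline. Names and thresholds here are hypothetical.
from statistics import mean, stdev

def advise(metric_name, history, current, sigmas=2.0):
    """Flag a metric that deviates sharply from its recent baseline."""
    baseline, spread = mean(history), stdev(history)
    if current > baseline + sigmas * spread:
        return f"{metric_name}: {current} is unusually high; investigate"
    return f"{metric_name}: within normal range"

# Hypothetical recent buffer-pool miss percentages
buffer_miss_history = [2.1, 2.4, 2.0, 2.3, 2.2]
print(advise("buffer_pool_miss_pct", buffer_miss_history, 7.8))
```

The design point is that the automation filters the noise and surfaces only the anomalies a consultant would flag, leaving the judgment call to the DBA.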
By deploying intelligent automation, the underlying complexity of the database environment can be somewhat simplified, enabling the DBA to successfully manage the modern database environment. In this day and age, anything less is a recipe for failure.