Data warehouse development utilizes a surprising variety of technology types. An understanding of them will help you manage the warehousing process, make tool choices, determine skill requirements and plan your budget for successful data warehousing.
Data warehousing practice is aimed at understanding, collecting, organizing, storing, refreshing and delivering useful analytical information to a variety of end users. Let's look at the technology typically used in each of these six categories:
- Understanding data modeling, data dictionary, library
- Collecting data modeling, utilities, conversion and extraction, various hardware
- Organizing data transformation, data modeling, DBMS, DBMS platform
- Storing data transformation, DBMS, DBMS platform
- Refreshing data transformation, DBMS, DBMS platform
- Delivering DBMS, DBMS platform, query and reporting, data mining, data visualization, LAN/WAN, NOS, Internet and intranet, browser, workstation, personal productivity, operating systems, programming, usage monitoring
- Conversion. Conversion tools are required to convert formats, media and a variety of electronic forms into a common format that can be manipulated by transformation and loading tools.
- Data dictionary. A "place" where definitions of data elements are stored and retrieved. Such definitions are needed throughout development and operation of a data warehouse.
- Data mining. A form of analysis where special tools look for patterns and trends in data which may not be apparent or suspected by humans. (Optional)
- Data modeling. Tools for logical and physical definition of data elements and their relationships with each other.
- Data visualization. Specialized technology that can present data in visual forms, thereby possibly enabling insights into data not conceivable in standard forms. (Optional)
- DBMS (database management systems). Typically they are relational but may include other forms.
- Extraction. Tools that can extract subsets of data from data files based on sophisticated selection criteria.
- Internet and intranet browsers. Increasingly a common distribution mode for warehoused information. New warehouses and marts will invariably need Internet technology.
- LAN/WAN (local and wide area network), NOS (network operating system) and browser. Used to enable distribution of information to users of the data warehouse. A browser is a specialized tool for viewing data formatted for Internet/intranet documents.
- Library. 1) A storage and retrieval process for source data. 2) A collection of predefined analytical queries and/or reports for use by end users.
- Operating systems. Managers of platforms throughout the warehousing process. Windows and UNIX are examples.
- Personal productivity. Tools used in conjunction with delivered warehouse information spreadsheets, word processors, e-mail, presentation or graphics tools.
- Programming. Some specialized analytical requirements will necessitate program development often in such technology as Visual Basic, but may include other programming languages.
- Query and reporting. These tools enable sophisticated display and retrieval of data from the warehouse in response to users' analytical requests.
- Usage monitoring. After warehouses go into production, usage may vary dramatically and rapidly. You'll need tools to track and analyze information on usage patterns by user, time and a host of other factors.
- Utilities. A variety of tools for capture, replication, printing, copying and other file manipulations.
There are other kinds of software technology used in all these categories.
- Project management tools. Building a data warehouse is a complex process. Project management tools are usually required to define, sequence and track all the activities that must happen in the correct order.
- Meta data repository. Meta data is "data about data," and it is critical to successful data warehouse construction and operations. Specialized tools such as commercial repositories are available, or you can develop your own meta data management system using your warehouse DBMS.
- DASD (Direct Access Storage Device), DBMS platform. Hard disk or equivalent for storage of warehoused data. Rarely is a warehouse housed on the same platform as a production transaction system, because of the contention for resources and subsequent degradation of production work.
- Data input. For initial population and for periodic refresh of data, some form of input device and related software must be available.
- LAN/WAN. Used to enable distribution of information to users of the data warehouse. Includes related servers.
- Warehouse platform. Typically, a data warehouse or data mart is housed in its own processor/operating environment. These are often mini- computer or NT-class configurations for data marts, but they may also be very large mainframes in multi-terabyte warehouse configurations.
- Workstations. PC or equivalent desktop or laptop platforms, with operating systems, LAN/WAN or dial-up connections.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access