Information Management's Glossary
A dashboard is a reporting tool that consolidates, aggregates and arranges measurements, metrics (measurements compared to a goal) and sometimes scorecards on a single screen so information can be monitored at a glance. Dashboards differ from scorecards in being tailored to monitor a specific role or generate metrics reflecting a particular point of view; typically they do not conform to a specific management methodology.
Items representing facts, text, graphics, bit-mapped images, sound, analog or digital live-video segments. Data is the raw material of a system supplied by data producers and is used by information consumers to create information.
Data access tools
An end-user oriented tool that allows users to build SQL queries by pointing and clicking on a list of tables and fields in the data warehouse.
Identification, selection and mapping of source data to target data. Detection of source data changes, data extraction techniques, timing of data extracts, data transformation techniques, frequency of database loads and levels of data summary are among the difficult data acquisition challenges.
Data analysis and presentation tools
Software that provides a logical view of data in a warehouse. Some create simple aliases for table and column names; others create data that identify the contents and location of data in the warehouse.
A combination of hardware, software, DBMSs and storage, all under one umbrella—a black box that yields high performance in both speed and storage, making the BI environment simpler and more useful to the users.
An individual, group, or application that receives data in the form of a collection. The data is used for query, analysis, and reporting.
The individual assigned the responsibility of operating systems, data centers, data warehouses, operational databases, and business operations in conformance with the policies and practices prescribed by the data owner.
A database about data and database structures. A catalog of all data elements, containing their names, structures, and information about their usage. A central location for metadata. Normally, data dictionaries are designed to store a limited set of available metadata, concentrating on the information relating to the data elements, databases, files and programs of implemented systems.
A collection of definitions, rules and advisories of data, designed to be used as a guide or reference with the data warehouse. The directory includes definitions, examples, relations, functions and equivalents in other environments.
The most elementary unit of data that can be identified and described in a dictionary or repository which cannot be subdivided.
Data enrichment is an activity that supplements and/or improves the existing data. Some techniques used to enrich data include: use of fuzzy logic to assist a search activity; accessing related data from other sources and bringing the data into a single virtual (for example, providing a link) or physical location; and correcting misspellings (for example, if a city name of "New Yrok" is given in combination with a zip code of 10001, the city name may be corrected to "New York").
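The misspelling correction described above can be sketched as a lookup against a trusted reference table. This is only an illustration: the `ZIP_TO_CITY` table and the `enrich_city` function are hypothetical names, and a real enrichment step would use a full postal reference source.

```python
# Hypothetical reference table mapping ZIP codes to canonical city names.
ZIP_TO_CITY = {"10001": "New York", "60601": "Chicago"}


def enrich_city(record):
    """Return a copy of the record with the city corrected from its ZIP code.

    Records whose ZIP code is not in the reference table are returned unchanged.
    """
    fixed = dict(record)
    canonical = ZIP_TO_CITY.get(record.get("zip"))
    if canonical is not None and record.get("city") != canonical:
        fixed["city"] = canonical
    return fixed


corrected = enrich_city({"city": "New Yrok", "zip": "10001"})
```

The original record is left untouched so the uncorrected value stays available for auditing.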
Data extraction software
Software that reads one or more sources of data and creates a new image of the data.
This term refers to a method of linking data from two or more physically different locations and making the access/linkage appear transparent, as if the data was co-located. Contrast this with a data warehouse method of housing data in one place and accessing data from that single location.
Data flow diagram
A diagram that shows the normal flow of data between services as well as the flow of data between data stores and services.
Data governance is the practice of organizing and implementing policies, procedures and standards for the effective use of an organization's structured/unstructured information assets.
Pulling together and reconciling dispersed data for analytic purposes that organizations have maintained in multiple, heterogeneous systems. Data needs to be accessed and extracted, moved and loaded, validated and cleaned, and standardized and transformed.
The process of populating the data warehouse. Data loading is provided by DBMS-specific load processes, DBMS insert processes, and independent fastload processes.
Controlling, protecting, and facilitating access to data in order to provide information consumers with timely access to the data they need. The functions provided by a database management system.
Data management software
Software that converts data into a unified format by taking derived data to create new fields, merging files, summarizing and filtering data; the process of reading data from operational systems. Data Management Software is also known as data extraction software.
The process of assigning a source data element to a target data element.
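As a minimal sketch of assigning source elements to target elements, a mapping can be held as a simple lookup table. The field names and the `map_record` helper below are hypothetical and chosen only for illustration.

```python
# Hypothetical mapping: source data element -> target data element.
FIELD_MAP = {"cust_nm": "customer_name", "cust_zip": "postal_code"}


def map_record(source, field_map=FIELD_MAP):
    """Build a target record by renaming each mapped source element.

    Source elements with no entry in the mapping are not carried over.
    """
    return {target: source[src] for src, target in field_map.items() if src in source}


target = map_record({"cust_nm": "Acme", "cust_zip": "10001", "internal_flag": "x"})
```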
A subset of the data resource, usually oriented to a specific purpose or major data subject, that may be distributed to support business needs.
Data migration is the process of transferring data from one repository to another.
A technique using software tools geared for the user who typically does not know exactly what he's searching for, but is looking for particular patterns or trends. Data mining is the process of sifting through large amounts of data to produce data content relationships. It can predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. This is also known as data surfing.
A logical map that represents the inherent properties of the data independent of software, hardware or machine performance considerations. The model shows data elements grouped into records, as well as the association around those records.
A method used to define and analyze data requirements needed to support the business functions of an enterprise. These data requirements are recorded as a conceptual data model with associated data definitions. Data modeling defines the relationships between data elements and structures.
The individual responsible for the policy and practice decisions of data. For business data, the individual may be called a business owner of the data.
The process of logically and/or physically partitioning data into segments that are more easily maintained or accessed. Current RDBMS systems provide this kind of distribution functionality. Partitioning of data aids in performance and utility processing.
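Logical partitioning can be illustrated in a few lines. The sketch below splits rows into segments by a caller-supplied key; the `partition_by` helper and the `order_date` field are assumptions for the example and do not reflect how any particular RDBMS implements partitioning.

```python
from collections import defaultdict


def partition_by(rows, key_func):
    """Group rows into segments keyed by key_func(row)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[key_func(row)].append(row)
    return dict(partitions)


orders = [
    {"order_id": 1, "order_date": "2022-11-03"},
    {"order_id": 2, "order_date": "2023-01-15"},
    {"order_id": 3, "order_date": "2023-06-30"},
]
# Partition by year, taken from the first four characters of the ISO date.
by_year = partition_by(orders, lambda row: row["order_date"][:4])
```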
A process of rotating the view of data.
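One way to picture rotating a view is turning the values of one field into columns. The following is a small sketch under that assumption; the `pivot` helper and the sample field names are hypothetical.

```python
def pivot(rows, row_key, col_key, value_key):
    """Rotate flat rows into a nested table: table[row member][column member] = value."""
    table = {}
    for r in rows:
        table.setdefault(r[row_key], {})[r[col_key]] = r[value_key]
    return table


sales = [
    {"region": "East", "quarter": "Q1", "sales": 100},
    {"region": "East", "quarter": "Q2", "sales": 120},
    {"region": "West", "quarter": "Q1", "sales": 90},
]
by_region = pivot(sales, "region", "quarter", "sales")
by_quarter = pivot(sales, "quarter", "region", "sales")  # the same data, rotated
```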
A software service, organization, or person that provides data for update to a system-of-record.
Data profiling, a critical first step in data migration, automates the identification of problematic data and metadata and enables companies to correct inconsistencies, redundancies and inaccuracies in corporate databases.
The distribution of data from one or more source data warehouses to one or more local access databases, according to propagation rules. More generically, this term refers to a method of moving data from one location (a source) to another location (a target). This technique is most often used to populate a database but may also be used to move data so that it is more easily available to end users.
The narrow definition of data quality is that it's about data that is missing or incorrect. A broader definition is that data quality is achieved when a business uses data that is comprehensive, consistent, relevant and timely.
The process of copying a portion of a database from one environment to another and keeping the subsequent copies of the data in sync with the original source. Changes made to the original source are propagated to the copies of the data in other environments.
A logical partitioning of data where multiple databases that apply to specific applications or sets of applications reside. For example, several databases that support financial applications could reside in a single financial data repository.
The process of filtering, merging, decoding, and translating source data to create validated data for the data warehouse.
Data staging area
A data staging area is a system that stands between the legacy systems and the analytics system, usually a data warehouse and sometimes an ODS. It is considered the "back room" portion of the data warehouse environment: it is where extract, transform and load (ETL) takes place, and it is out of bounds for end users.
The data steward acts as the conduit between information technology (IT) and the business portion of a company with both decision support and operational help. The data steward has the challenge of guaranteeing that the corporation's data is used to its fullest capacity.
A place where data is stored; data at rest. A generic term that includes databases and flat files.
Data strategy reflects all the ways you capture, store, manage and use information.
The continuous harmonization of data attribute values between two or more different systems, with the end result being the data attribute values are the same in all of the systems.
The process of moving data from one environment to another environment. An environment may be an application system or operating environment. See Data Transport.
Creating "information" from data. This includes decoding production data and merging of records from multiple DBMS formats. It is also known as data scrubbing or data cleansing.
The mechanism that moves data from a source to target environment. See Data Transfer.
Techniques for turning data into information by using the high capacity of the human brain to visually recognize patterns and trends. There are many specialized techniques designed to make particular kinds of visualization easy.
An implementation of an informational database used to store sharable data sourced from an operational database-of-record. It is typically a subject database that allows users to tap into a company's vast store of operational data to track and respond to business trends and facilitate forecasting and planning efforts.
Data warehouse architecture
An integrated set of products that enable the extraction and transformation of operational data to be loaded into a database for end-user analysis and reporting.
Data warehouse engines
Relational databases (RDBMS) and Multi-dimensional databases (MDBMS). Data warehouse engines require strong query capabilities, fast load mechanisms, and large storage requirements.
Data warehouse incremental delivery
A program that delivers one data warehouse increment from design review through implementation.
Data warehouse infrastructure
A combination of technologies and the interaction of technologies that support a data warehousing environment.
Data warehouse management tools
Software that extracts and transforms data from operational systems and loads it into the data warehouse.
Data warehouse network
An integrated network of data warehouses that contain sharable data propagated from a source data warehouse on the basis of information consumer demand. The warehouses are managed to control data redundancy and to promote effective use of the sharable data.
A large collection of data organized for rapid search and retrieval by a computer.
Database auditing is the ability to continuously monitor, record, analyze and report on all user-level database activity.
Methods for creating successful marketing strategies and testing them.
The logical and physical definition of a database structure.
DDL (Data Definition Language)
A language enabling the structure and instances of a database to be defined in a human and machine readable form. SQL contains DDL commands that can be used either interactively or within programming language source code to define databases and their components.
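As an illustration of DDL issued from within programming language source code, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The `customer` table and its columns are hypothetical.

```python
import sqlite3


def create_schema(conn):
    """Run DDL statements that define a table and an index."""
    conn.execute(
        "CREATE TABLE customer ("
        " customer_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_customer_name ON customer (name)")


conn = sqlite3.connect(":memory:")
create_schema(conn)
# The database catalog now lists the structures defined by the DDL above.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
indexes = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
```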
A centralized database that has been partitioned according to a business or end-user defined subject area. Typically ownership is also moved to the owners of the subject area.
A remote data source that users can query/access via a central gateway that provides a logical view of corporate data in terms that users can understand. The gateway parses and distributes queries in real time to remote data sources and returns result sets back to users.
Decision support system (DSS)
A decision support system or tool is one specifically designed to allow business end users to perform computer generated analyses of data on their own. This system supports exception reporting, stop light reporting, standard repository, data analysis and rule-based analysis.
A decision tree is a graph of decisions and their possible consequences (including resource costs and risks), used to create a plan to reach a goal. Decision trees are constructed to help with making decisions and are a special form of tree structure. Two common variants are named for the kind of outcome they predict: a classification tree is used when the outcome Y is a categorical variable, such as sex (male or female) or the result of a game (lose or win); a regression tree approximates a real-valued function instead of performing classification, for example estimating the price of a house or a patient's length of stay in a hospital.
Deduplication, also known as record linkage, is the task of finding the same (duplicate) entry in multiple files. Deduplication is used when merging two or more data sets. Deduplication is a useful tool when performing data mining tasks, where the data originated from different sources or different organizations.
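A very simple form of deduplication when merging data sets compares records on a normalized key. The sketch below assumes records carry `name` and `email` fields, which is an assumption for illustration; practical record linkage usually also needs fuzzy matching.

```python
def normalize(record):
    """Build a comparison key that ignores case and extra whitespace."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)


def deduplicate(*datasets):
    """Merge data sets, keeping the first record seen for each normalized key."""
    seen = {}
    for dataset in datasets:
        for record in dataset:
            seen.setdefault(normalize(record), record)
    return list(seen.values())


crm = [{"name": "Ann Lee", "email": "ann@example.com"}]
billing = [
    {"name": "ann  lee", "email": "ANN@example.com "},
    {"name": "Bob Ray", "email": "bob@example.com"},
]
merged = deduplicate(crm, billing)
```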
A degenerate dimension acts as a dimension key in the fact table but does not join a corresponding dimension table because all its interesting attributes have already been placed in other analytic dimensions.
Delivery chain management
A strategy for interactively managing your customers and prospects at every touchpoint to create value, one customer at a time.
Only the data that was updated between the last extraction or snapshot process and the current execution of the extraction or snapshot.
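Selecting only the data updated since the previous extraction can be sketched with a timestamp comparison. The `updated_at` field and the `extract_delta` helper are hypothetical; real systems may rely on logs or change flags instead.

```python
from datetime import datetime


def extract_delta(rows, last_extracted_at):
    """Return only the rows modified after the previous extraction ran."""
    return [r for r in rows if r["updated_at"] > last_extracted_at]


rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 2, 14, 30)},
    {"id": 3, "updated_at": datetime(2024, 1, 3, 8, 15)},
]
last_run = datetime(2024, 1, 2, 0, 0)
delta = extract_delta(rows, last_run)
```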
Denormalized data store
A data store that does not comply with one or more of several normal forms. See Normalization.
Dependent data mart
Also called an architected data mart. Shares common business rules, semantics, and definitions. A dependent data mart reads metadata from a central metadata repository to define local metadata. This ensures that all components of the architecture are linked via common metadata.
Data that is the result of a computational step applied to reference or event data. Derived data is the result either of relating two or more elements of a single transaction (such as an aggregation), or of relating one or more elements of a transaction to an external algorithm or rule.
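Both kinds of derivation can be shown in a short sketch: one field computed from two elements of the same transaction, and one computed by applying an external rule. The field names and the `TAX_RATE` value are hypothetical.

```python
TAX_RATE = 0.08  # hypothetical external rule, not taken from any real source


def derive_fields(transaction):
    """Return a copy of the transaction with two derived fields added."""
    derived = dict(transaction)
    # Derived by relating two elements of the same transaction.
    derived["line_total"] = transaction["quantity"] * transaction["unit_price"]
    # Derived by relating a transaction element to an external rule.
    derived["tax"] = round(derived["line_total"] * TAX_RATE, 2)
    return derived


result = derive_fields({"quantity": 3, "unit_price": 10.0})
```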
The mechanism by which customers specify exactly what they need.
Query and analysis tools that access the source database or data warehouse across a network using an appropriate database interface. An application that manages the human interface for data producers and information consumers.
A dimension is a structural attribute of a cube that is a list of members, all of which are of a similar type in the user's perception of the data. For example, all months, quarters, years, etc., make up a time dimension; likewise all cities, regions, countries, etc., make up a geography dimension. A dimension acts as an index for identifying values within a multidimensional array. If one member of the dimension is selected, then the remaining dimensions in which a range of members (or all members) are selected defines a sub-cube. If all but two dimensions have a single member selected, the remaining two dimensions define a spreadsheet (or a "slice" or a "page"). If all dimensions have a single member selected, then a single cell is defined. Dimensions offer a very concise, intuitive way of organizing and selecting data for retrieval, exploration and analysis.
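The selection rules above can be sketched with a tiny cube stored as a dictionary keyed by one member per dimension. The dimension names, members and the `select` helper are assumptions made for the example.

```python
# Hypothetical cube: each key holds one member per dimension, in this order.
DIMENSIONS = ("time", "geography", "product")
cube = {
    ("2023-Q1", "Paris", "Widget"): 10,
    ("2023-Q1", "Paris", "Gadget"): 4,
    ("2023-Q1", "Lyon", "Widget"): 7,
    ("2023-Q2", "Paris", "Widget"): 12,
}


def select(cube, **fixed):
    """Keep the cells whose members match every fixed dimension."""
    position = {name: i for i, name in enumerate(DIMENSIONS)}
    return {
        key: value
        for key, value in cube.items()
        if all(key[position[dim]] == member for dim, member in fixed.items())
    }


sub_cube = select(cube, time="2023-Q1")  # one member fixed
cell = select(cube, time="2023-Q1", geography="Paris", product="Widget")  # all fixed
```

Fixing one member narrows the cube to a sub-cube, and fixing a member in every dimension leaves a single cell.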
A dimension outrigger is a second-level dimension table that further defines and gives meaning to the model.
Inconsistent, missing, incomplete, or erroneous data. Source data often contains a high percentage of "dirty" data.
A protocol and associated execution to recover lost computing-system usage (applications), data and data transactions committed up to the moment of system loss.
Document object model (DOM)
Document object model is a platform and language neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents.
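As a brief illustration, Python's standard `xml.dom.minidom` module exposes a DOM interface. The sketch below reads and updates the content, structure and attributes of a small document; the XML content itself is made up for the example.

```python
from xml.dom.minidom import parseString

doc = parseString("<glossary><term id='t1'>Data mart</term></glossary>")

# Access an element, then update its text content and an attribute.
term = doc.getElementsByTagName("term")[0]
term.firstChild.data = "Data warehouse"
term.setAttribute("status", "revised")

# Change the document structure by appending a new element.
new_term = doc.createElement("term")
new_term.appendChild(doc.createTextNode("Data store"))
doc.documentElement.appendChild(new_term)

term_count = len(doc.getElementsByTagName("term"))
```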
Document type definition (DTD)
Document type definition is a text file that specifies the meaning of each tag.
Internet-based companies that rely on digital technology and the use of the Web as the primary communication and interaction media.
A general condition wherein users cannot use or access computing systems, applications, data or information for a broad variety of reasons.
Distributed Relational Database Architecture. A database access standard defined by IBM.
The ability to "drill down" to any dimension without having to follow the predefined drill paths established by an organization's IT department.
A method of exploring detailed data that was used in creating a summary level of data. Drill down levels depend on the granularity of the data in the data warehouse.
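Moving from a summary to the detail behind it can be sketched as a summary at one level followed by a filtered summary at the next. The `region` and `city` levels and both helper functions are hypothetical.

```python
from collections import defaultdict


def summarize(rows, level):
    """Total the amount for each member of the given level."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[level]] += r["amount"]
    return dict(totals)


def drill_down(rows, level, member, next_level):
    """Show the finer-grained totals that make up one summary member."""
    return summarize([r for r in rows if r[level] == member], next_level)


rows = [
    {"region": "East", "city": "Boston", "amount": 40},
    {"region": "East", "city": "Albany", "amount": 25},
    {"region": "West", "city": "Reno", "amount": 30},
]
by_region = summarize(rows, "region")
east_detail = drill_down(rows, "region", "East", "city")
```

How far the drill can go is bounded by the finest level stored in the rows, which here is the city.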
See decision support system.
Data Warehouse Administrator.
Dynamic data exchange
An industry standard accepted by most horizontal application software for exchanging data among different software programs.
A data dictionary that an application program accesses at run time.
Dynamically constructed SQL that is usually constructed by desktop-resident query tools. Queries that are not preprocessed and are prepared and executed at run time.
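A query built and prepared at run time might look like the sketch below, which uses Python's standard `sqlite3` module. The `sales` table, the column whitelist and the `build_query` helper are assumptions for illustration; values are passed as bound parameters rather than pasted into the SQL text.

```python
import sqlite3

ALLOWED_COLUMNS = {"region", "product"}  # hypothetical filterable columns


def build_query(filters):
    """Assemble a SELECT statement at run time from the requested filters."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported filter column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT region, product, amount FROM sales"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Widget", 10), ("West", "Widget", 7), ("East", "Gadget", 4)],
)
sql, params = build_query({"region": "East"})
east_rows = conn.execute(sql, params).fetchall()
```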