With data growing exponentially, business professionals use search technologies to find the information they need. However, search is inadequate for finding the structured data in enterprises' transactional systems and data warehouses. Traditional business intelligence (BI) and reporting are the primary methods for locating this data, but their complexity normally requires IT assistance. The next generation of search will enable business users to easily and immediately search and discover the mission-critical structured data they need, without burdening IT.
Structured data discovery is the next wave in the search revolution. Data discovery changes how business professionals can leverage enterprises' ever-expanding data assets to quickly and effectively access the data necessary to answer questions, make decisions and solve problems - thus changing the competitive dynamic with its speed and simplicity.
How Structured Data Discovery Differs
It is helpful to briefly review how unstructured data search works. A search engine uses one or more keywords to match items and displays the sorted results in a search results page. From there, users navigate via a link to another page displaying details about the returned data. To perform faster matching, the search engine collects metadata beforehand using a process called indexing. Search engines store the indexed information, but not the full content of each item. The full content remains at the data source, typically a document repository or Web site.
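As a rough illustration of the indexing step just described, the sketch below builds a tiny inverted index that maps each term to the documents containing it. The document set, the function names and the whitespace tokenizer are all assumptions made for this example; real search engines also rank results by relevance, which is omitted here in favor of a plain sort by document ID.

```python
from collections import defaultdict

# Hypothetical document repository: id -> full content.
# The full content stays at the source; only terms and ids go into the index.
DOCUMENTS = {
    "doc1": "Quarterly shipment report for the western region",
    "doc2": "Customer returns policy and shipment exceptions",
    "doc3": "Western region sales forecast",
}

def build_index(documents):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every query keyword, sorted by id."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)

INDEX = build_index(DOCUMENTS)
```

Calling `search(INDEX, "western region")` looks up each keyword in the index and intersects the matches, so the documents themselves are never scanned at query time.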
Structured and unstructured data search each present distinct challenges in terms of what is being searched, why search is being used and how search needs to work to solve the business problem at hand. Figure 1 summarizes the major differences between the two approaches.
Structured data search, also known as data discovery, is optimized for discovering data that is typically stored in databases and used as a complement to BI applications such as reporting and ad hoc query. After the user enters keywords, the search results page displays both physical and virtual database entities where those keywords reside.
Metadata about the keyword and the relationships between keywords are as important as the keyword itself. For example, what does the number 2317 mean? Metadata in the form of a column name helps distinguish among 2317 B Street as a ship-to address, 2,317 as the number of units shipped, 23:17 as the shipment transaction time and 2317 as a shipped item's ID number. Like BI, which shows the relationships between disparate data elements (for example, customers, invoices, shipments and returns), data discovery also shows data relationships. However, data discovery avoids manual modeling, instead automating the discovery of both the explicit relationships, as defined by existing schema, and the implicit relationships found in the data values themselves. The latter capability is enabled using advanced, probabilistic correlation techniques.
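The two ideas in this paragraph can be sketched in a few lines. The example below is a simplified illustration, not the probabilistic correlation technique itself: it reports the column in which each keyword hit occurs, and it suggests implicit relationships using a plain value-overlap score. The tables, column names and the 0.8 threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical tables: table name -> column name -> list of values.
TABLES = {
    "shipments": {
        "item_id": [2317, 2318, 2319],
        "units_shipped": [2317, 40, 12],
        "ship_to": ["2317 B Street", "9 Elm Road", "41 Oak Lane"],
    },
    "returns": {
        "returned_item": [2317, 2319],
        "reason": ["damaged", "wrong size"],
    },
}

def find_keyword(tables, keyword):
    """Return (table, column, value) for every value that contains the keyword."""
    hits = []
    for table, columns in tables.items():
        for column, values in columns.items():
            for value in values:
                if keyword in str(value):
                    hits.append((table, column, value))
    return hits

def overlap(a, b):
    """Fraction of the smaller column's distinct values also found in the other."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))

def implicit_relationships(tables, threshold=0.8):
    """Suggest cross-table column pairs whose values overlap heavily."""
    columns = [(t, c, v) for t, cols in tables.items() for c, v in cols.items()]
    pairs = []
    for (t1, c1, v1), (t2, c2, v2) in combinations(columns, 2):
        if t1 != t2 and overlap(v1, v2) >= threshold:
            pairs.append((f"{t1}.{c1}", f"{t2}.{c2}"))
    return pairs
```

Here `find_keyword(TABLES, "2317")` returns the same keyword four times, each tagged with a different column, which is what lets a user tell an item ID from a unit count or an address. `implicit_relationships(TABLES)` links `shipments.item_id` to `returns.returned_item` even though no schema declares that relationship, while the coincidental match in `units_shipped` falls below the threshold.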
Accessing data is typically done using ODBC and JDBC standards, while leveraging high-performance federated query algorithms to optimize the SELECT statements required. When viewing the data, users want to see data rows along with appropriate column headers. As such, structured search requires a spreadsheet-style workspace where users can easily add, move and delete rows and columns as they work with their data.
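A minimal sketch of the federated pattern follows, assuming two independent sources that each answer a SELECT statement. Python's built-in `sqlite3` module stands in for the ODBC or JDBC connections mentioned above, and the table layouts, sample rows and the `federated_orders` helper are hypothetical. A production federated query engine would also optimize what work is pushed down to each source, which this example does not attempt.

```python
import sqlite3

def make_source(ddl, insert_sql, rows):
    """Create an in-memory database that stands in for one enterprise source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn

# Two separate sources, e.g. an order system and a CRM system.
orders_db = make_source(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 250.0), (2, 11, 90.5), (3, 10, 40.0)],
)
crm_db = make_source(
    "CREATE TABLE customers (customer_id INTEGER, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(10, "Acme Corp"), (11, "Globex")],
)

def federated_orders(orders_conn, crm_conn, min_total):
    """Run one SELECT per source, then join the rows on customer_id."""
    orders = orders_conn.execute(
        "SELECT order_id, customer_id, total FROM orders "
        "WHERE total >= ? ORDER BY order_id",
        (min_total,),
    ).fetchall()
    names = dict(crm_conn.execute("SELECT customer_id, name FROM customers"))
    header = ("order_id", "customer", "total")
    rows = [(order_id, names.get(cid), total) for order_id, cid, total in orders]
    return header, rows
```

The function returns a header tuple alongside the data rows, mirroring the spreadsheet-style view in which users expect column headers above their data.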
Security for structured data search is implemented by leveraging existing source-, column- and row-level security rules as implemented via corporate directories such as LDAP and Active Directory or via role-based rules specified in packaged applications.
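To make the row- and column-level idea concrete, the sketch below filters search results against per-role rules before they are displayed. The rule format, role names and sample rows are assumptions for illustration only; in practice the rules would be resolved from a corporate directory or from the roles defined in a packaged application rather than from a hard-coded dictionary.

```python
# Hypothetical role rules: which columns a role may see and which regions' rows.
ROLE_RULES = {
    "sales_rep": {
        "columns": {"customer", "region", "total"},
        "regions": {"West"},
    },
    "finance": {
        "columns": {"customer", "region", "total", "margin"},
        "regions": {"West", "East"},
    },
}

# Hypothetical search results before any security is applied.
ROWS = [
    {"customer": "Acme Corp", "region": "West", "total": 250.0, "margin": 0.31},
    {"customer": "Globex", "region": "East", "total": 90.5, "margin": 0.12},
]

def apply_security(rows, role, rules=ROLE_RULES):
    """Drop rows and columns the role is not entitled to see."""
    rule = rules.get(role)
    if rule is None:
        return []  # unknown roles see nothing
    visible = []
    for row in rows:
        if row["region"] in rule["regions"]:  # row-level check
            # column-level check: keep only permitted columns
            visible.append({k: v for k, v in row.items() if k in rule["columns"]})
    return visible
```

With these rules, `apply_security(ROWS, "sales_rep")` returns only the West row and omits the margin column, while the finance role sees both rows in full.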
Complementing Traditional BI and Reporting
In its original form, search was applied to BI as a way to complement hierarchical folder navigation or parameter-driven reporting methods. This assumed the data was available in an existing report.
However, BI, while providing significant top- and bottom-line benefits, has proven insufficient to meet the full range of structured data requirements with its existing reports and analytics. This gap is filled by structured data discovery.
Unanticipated requirements arise from new business opportunities. Unique combinations of data never before modeled or only partially covered in single reports are required as savvy business analysts and a range of analytical professionals such as engineers and researchers work with enterprise data. One-off requirements come and go in the course of daily business. Or, there is the immediate need to combine data from a newly acquired company, before the independent systems have been formally integrated into an enterprise-wide transaction system and data warehouse.
Prior to the availability of structured data discovery, only a few select users - those technically skilled to effectively use ad hoc query tools - could build the necessary reports. Industry analysts estimate that only 15 to 20 percent of users have these skills.[1] Therefore, the remaining 80 to 85 percent require IT to produce the necessary reports - which is a costly and less agile method.
Structured data discovery fills this information gap in a simple, self-service way. When applied by business users in a process as previously described, structured data discovery provides an end result that is similar to ad hoc query analysis and reporting. However, the answers are derived far more quickly and easily because structured data discovery uses new search and relationship discovery technologies that eliminate the need to know SQL or data modeling. Structured data discovery provides a complete solution that includes modern search paradigms, proper data security, intelligent leveraging of metadata and relationships, high-performance federated query, appropriate tabular display and refinement tools, and integration with Microsoft Excel. Together, these capabilities provide an easy and fast method for end users to retrieve and explore the data they need, with minimal to no IT assistance.
Adopting Structured Data Discovery
Enterprises typically adopt data discovery solutions in one of three ways: user-driven, IT-driven or embedded-solution adoption.
In the case of user-driven adoption, business professionals use data discovery as a self-service complement to IT-supported BI. For repeatable requirements, these professionals can create reusable reports, often called recipes, to discover up-to-the-minute information anytime. Recipes that stand the test of use and time serve as fully functioning prototypes for IT to jumpstart its traditional report development project lifecycles. This approach provides immediate benefit by allowing business users to handle their own fast-turn, ad hoc reporting requirements and by deferring IT's typical rigor and latency until they are absolutely required. By removing this noise from the queue, IT resources are freed to work on other priorities.
IT-driven adoption uses data discovery as a critical tool in the portfolio of BI and reporting solutions. Complementing existing BI analytics and reporting as well as the underlying physical data consolidation and virtual data federation infrastructures, data discovery solutions provide a fast and simple self-service tool for IT to provide to business users who lack deep SQL knowledge, data modeling skills or the ability to leverage advanced BI solutions. In this scenario, IT provides data discovery solutions to end users, just as it provides Microsoft Office products and the like.
A third adoption approach is for IT to embed data discovery functionality in high-level solutions, bolstering the business value of these solutions. For example, a large pharmaceutical company wanted to increase revenues, so it armed its sales representatives with up-to-date, aggregated views of relevant information about prospective customers (doctors) prior to sales calls. This data came from many sources, both internal (such as customer relationship management [CRM] activity history and prescription history) and external (such as speaking engagements and recent publications). To provide this data to the field, IT built a new portal solution that leveraged structured and unstructured search. In the field, sales representatives enter the doctor's name as keywords in a unified search box available in their Web browsers, on their mobile devices or via email. Structured search retrieves relevant internal data from the internal systems, while unstructured search identifies external data. The union of these result sets is then displayed on the device the representative used to enter the search.
With structured data discovery, justifying business value can be quite simple. Most enterprises spend millions of dollars every year on business analysts, managers and other analytical business professionals. Data is their livelihood. Yet, according to an Accenture study, managers typically spend a couple of hours per day looking for data. And discouragingly, these efforts fail to produce results nearly 50 percent of the time, even though the data exists.[2]
- What would it mean to the business if data search produced results 90 percent of the time instead of just 50 percent?
- What would it mean to the business to eliminate a big time waster and replace it with the effective means to discover and relate the data business users need to do their jobs?
- How much more value could IT deliver to the business by freeing 50 to 75 percent of the resources previously consumed to support the business's unique, one-off or temporal structured data needs?
Once justified, the route to implementation success can be quick and easy or long and hard, depending on the data discovery tool selected. The quick and easy approach is enabled by data discovery tools delivered within an appliance or via a software as a service (SaaS) model, both dramatically reducing installation time. In addition, data discovery tools that leverage existing schema and metadata while filling the gaps with automated relationship discovery are much quicker and easier to implement than those that require large, up-front investments in data modeling, natural language mapping and other manual activities. Remember, avoiding long and hard IT cycles is the goal. So, when considering data discovery, apply this litmus test to both implementation and operation.
Next-Generation Search Has Arrived
Daily, enterprises amass vast amounts of data to help their employees make informed decisions that increase customer satisfaction, contribute to profitability and improve day-to-day operational processes. Much of this information is stored in data warehouses and marts or intermediate virtual data stores, and is therefore not readily accessible using traditional search technology. Structured data discovery benefits both business users and IT teams by enabling business users to successfully find the information they're looking for with little to no IT involvement. Looking for the next generation of search? It is available today as structured data discovery.
References
1. "Gartner Says Emerging Technologies Will Marginalize IT's Role in BI." Gartner Press Release. March 18, 2008.
2. "Managers Say the Majority of Information Obtained for Their Work Is Useless, Accenture Survey Finds." Accenture Press Release. January 4, 2007.