What is the first thing Internet users do when they are asked to furnish some information, advice or data? It’s a no-brainer: they pull the information from one of the popular search engines. They key in words and then sift through the resulting Web pages to select those that are of interest or relevance. My research has revealed that most of a user’s search keywords are limited to his or her current personal and professional life. In other words, most users have a finite (and repetitive) set of keywords that they use to pull content from the vast Internet.
With this finding comes the thought that, instead of pulling content from the Internet using search engines, we should have a mechanism that pulls this finite (and repetitive) content from the Internet, sorts it by recency, frequency, time windows or relevancy (RFTR) and pushes it to us, either to our email inboxes or to our mobile phones as SMS. This would not only make our handheld gadgets more intelligent but also bring a paradigm shift in the search engine world. A combination of search engine functionality, text mining and text analytics makes such offline personal content push engines possible.
Current Search Engine Technology
Today, search engines help us pull content from the Web using a set of keywords that the user provides. They do this mostly through the information retrieval route, using pattern matching and querying and, to some extent, basic semantics to provide some additional relevant information. This works well for millions of users.
Overview of Information Retrieval, Text Analytics and Knowledge Discovery
Information retrieval is the process of taking a set of keywords and finding the relevant pattern matches in the vast body of content that is available in various forms (such as blogs, forums, Twitter, Facebook and news sites) on the Internet. The matching can also be narrowed with additional information, such as the time period of relevance.
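As a minimal sketch of keyword matching narrowed by a time period, the Python below filters a small in-memory collection. The `Document` class, the `retrieve` function and the sample data are hypothetical, made up for illustration. A real retrieval engine would query a crawler index or a search API rather than a list.

```python
from dataclasses import dataclass
from datetime import datetime
import re


@dataclass
class Document:
    source: str          # hypothetical label, e.g. "blog", "forum", "news"
    published: datetime
    text: str


def retrieve(docs, keywords, since=None):
    """Return documents that contain every keyword, newest first."""
    patterns = [
        re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE)
        for k in keywords
    ]
    hits = [
        d for d in docs
        if (since is None or d.published >= since)
        and all(p.search(d.text) for p in patterns)
    ]
    return sorted(hits, key=lambda d: d.published, reverse=True)


# Made-up sample content standing in for pages fetched from the Web.
docs = [
    Document("news", datetime(2024, 3, 1), "Gold prices rise as markets wobble."),
    Document("blog", datetime(2024, 1, 5), "Why gold prices matter to small investors."),
    Document("forum", datetime(2024, 3, 2), "Best hiking trails this spring?"),
]

all_hits = retrieve(docs, ["gold", "prices"])
recent_hits = retrieve(docs, ["gold", "prices"], since=datetime(2024, 2, 1))
```

Here `all_hits` holds the news and blog items, while `recent_hits` keeps only the news item because the blog post falls outside the time period.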
Text analytics is the process of analyzing relevant content that has been fetched from the information retrieval process. This analysis involves finding parts of the text with the most significance to the user-provided search keywords. It also involves other aspects, such as determining the opinions and/or sentiments within the relevant content and finding the affinity or context sensitivity of the content to user-provided search keywords.
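To make "finding parts of the text with the most significance" concrete, here is a deliberately simple sketch that ranks sentences by how many distinct keywords they mention. The function name `top_sentences` and the scoring rule are assumptions for illustration. Production text analytics would use far richer techniques, and sentiment or opinion analysis is not attempted here.

```python
import re


def top_sentences(text, keywords, limit=2):
    """Rank sentences by how many distinct keywords they mention."""
    wanted = {k.lower() for k in keywords}
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for position, sentence in enumerate(sentences):
        words = set(re.findall(r"[a-z0-9']+", sentence.lower()))
        score = len(wanted & words)
        if score:
            scored.append((score, position, sentence))
    # Highest score first; earlier sentences win ties.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [sentence for _, _, sentence in scored[:limit]]


# Made-up article text for illustration.
article = (
    "Markets were quiet on Monday. "
    "Gold prices rose for the third week. "
    "Analysts expect gold to stay firm. "
    "Weather was mild."
)
best = top_sentences(article, ["gold", "prices"])
```

With this sample, `best` contains the two sentences that mention the keywords, with the sentence matching both keywords ranked first.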
The most distilled form of text analytics involves analyzing the relevant content to extract intelligence from these information sets. This is also called knowledge discovery.
Offline Personal Content Push Components
Instead of sitting in front of the browser and typing in relevant keywords to pull out content, it is possible to create a profile and use this to automatically extract content and then push it to our mobile phones (in the form of SMS) or to our email inboxes. This goal is aided by text mining and text analytics technologies, which ensure that only the content that is recent, most frequent, most relevant or within a particular time window is pushed to our mobile device or email.
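One way to picture the RFTR selection is as a filter plus a sort. The sketch below is an assumption about how the four criteria might combine: the time window is treated as an optional filter, and recency, frequency or relevancy supplies the ordering. The `Item` fields, the `select` function and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Item:
    title: str
    published: datetime
    mentions: int       # assumed measure of how often the topic recurs
    relevance: float    # assumed 0..1 score from a text analytics step


def select(items, mode, limit=3, window=None):
    """Pick items to push, ordered by one RFTR criterion."""
    if window is not None:
        start, end = window
        items = [i for i in items if start <= i.published <= end]
    keys = {
        "recency": lambda i: i.published,
        "frequency": lambda i: i.mentions,
        "relevancy": lambda i: i.relevance,
    }
    if mode not in keys:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(items, key=keys[mode], reverse=True)[:limit]


# Made-up items for illustration.
items = [
    Item("A", datetime(2024, 3, 1), mentions=5, relevance=0.4),
    Item("B", datetime(2024, 3, 3), mentions=2, relevance=0.9),
    Item("C", datetime(2024, 2, 1), mentions=9, relevance=0.6),
]

by_recency = [i.title for i in select(items, "recency")]
by_frequency = [i.title for i in select(items, "frequency")]
march = (datetime(2024, 3, 1), datetime(2024, 3, 31))
march_by_relevancy = [i.title for i in select(items, "relevancy", window=march)]
```

Keeping the window as a filter means a user could ask for, say, the most relevant items from last month, although other combinations of the four criteria are equally plausible.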
Figure 1 shows a basic architecture for this type of offline content push technology. The primary components are:
- Personal profiler. This is the configurable arm of the offline content push technology. Here the user defines his or her current search preferences based on RFTR. A single user can create multiple profiles and can define any combination of search keywords, in any business area or domain and ideally in any language.
- Information retrieval engine. This is a simple Web crawler or search engine API that will use the information in the profiler as keywords and then search the Web for the relevant content based on the search preferences configured in the profiler.
- Landing database. This is the database in which the content fetched by the IR engine will be initially stored.
- Text analytics engine. This mines the data in the landing database to identify only those parts of the content which are of relevance to the user, based on the search configuration details provided in the personal profiler. This stage involves the standard text mining activities like text clustering, text classification and categorization, affinity analysis, sentiment analysis and opinion analysis.
- Distilled database. This will contain the personal profile-specific content that was distilled by the text analytics engine. The data can be stored in multiple forms, such as recent content, most frequent content, content falling in a particular time-window or most relevant content.
- Simple message builder. This is a process where, based on the data in the distilled database, highly concentrated and relevant content is chosen and built into simple messages. Semantic technologies are used to extract knowledge out of the chosen content. This knowledge and a chosen set of distilled content are used to build these simple messages, which could be a one-line eye-catching title or a five-line abstract, with or without reference to the underlying detailed content.
- Message alerter. This is a simple process that takes the messages built by the message builder and sends them to the user whose personal profile was used to generate these messages. These messages can be sent to the user either in the form of SMS or emails. The user communication information and preferences are part of the personal profiler.
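The flow through these components can be sketched end to end in a few lines of Python. Everything here is a hypothetical stand-in: the `Profile` and `PushEngine` names, the in-memory lists used as the landing and distilled databases, the substring matching used for retrieval, the simple truncation used in place of semantic message building, and the `outbox` list used in place of real SMS or email delivery.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    user: str
    keywords: list
    channel: str = "email"      # or "sms"
    address: str = ""


@dataclass
class PushEngine:
    landing: list = field(default_factory=list)     # landing database
    distilled: list = field(default_factory=list)   # distilled database
    outbox: list = field(default_factory=list)      # stands in for delivery

    def retrieve(self, profile, corpus):
        """IR engine: keep documents mentioning any profile keyword."""
        self.landing = [
            doc for doc in corpus
            if any(k.lower() in doc.lower() for k in profile.keywords)
        ]

    def analyze(self, profile):
        """Text analytics engine: rank landed documents by keyword hits."""
        def hits(doc):
            return sum(doc.lower().count(k.lower()) for k in profile.keywords)
        self.distilled = sorted(self.landing, key=hits, reverse=True)

    def build_messages(self, limit=1, max_chars=60):
        """Message builder: trim the top documents to short one-liners."""
        return [doc[:max_chars].rstrip() for doc in self.distilled[:limit]]

    def alert(self, profile, messages):
        """Message alerter: queue each message for the user's channel."""
        for message in messages:
            self.outbox.append((profile.channel, profile.address, message))

    def run(self, profile, corpus):
        self.retrieve(profile, corpus)
        self.analyze(profile)
        self.alert(profile, self.build_messages())
        return self.outbox


# Made-up corpus and profile for illustration.
corpus = [
    "Gold hits record high as gold demand surges",
    "Local team wins cup",
    "Silver and gold move together",
]
profile = Profile("asha", ["gold"], channel="email", address="asha@example.com")
engine = PushEngine()
sent = engine.run(profile, corpus)
```

After the run, `engine.landing` holds the two documents that mention the keyword, and `sent` holds a single queued message built from the one with the most keyword hits. Keeping each stage as a separate method mirrors the component list, so any stage could be swapped for an open source or commercial product without changing the others.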
Flexibility of the Personalized Content Push Engine Mechanism
The above architecture for an offline personal content push engine leaves ample scope for automating the push mechanism through Web services. The underlying technology is also flexible: it can be built with open source technologies, industry standard products or closed-architecture stacks.
Once this mechanism of generating personalized alerts using search engine and text analytics technologies is implemented, even regular users of the Internet and search engines will make more intelligent and efficient use of the information available on the Web. People will be better informed about the situations in their personal world, which could include their investments, friends, relatives, assets, health advice, shopping advice and alerts, office situations, competitor situations, politics, sports and more.
This brings smarter surfers to the world of Internet and browsing – freeing the user from having to sit in front of the browser to pull his or her relevant content.
Inverse Advertising Using the Push Content Mechanism
Advertising is all about eyeballs. When thousands of users view a particular website or a page on that website, advertising dollars flow to that website.
But with the personalized content push mechanism, it is possible to divide these advertising dollars among the users of Web content. This is the concept of inverse advertising: a mechanism that pays each user, in some form of currency, for viewing the content that is of current relevance to him or her.
Browsing the Internet is not only a pastime but also, at times, a serious activity. Even so, it depends on an individual’s patience, perseverance and knowledge of the subject being searched. This can mean long hours spent in front of the browser to retrieve a small but relevant piece of information.
The push will ensure that the user is always connected to his or her relevant content, whether he or she is sitting in front of the browser or not – thus taking the world of surfing into an offline and personal mode.
Wherever you go – your relevant content follows. Welcome to the world of content push engines.