We live in a digital era. When we’re not buying soaps or tablets from Amazon, listening to playlists on SoundCloud, or requesting a plumber to fix the kitchen sink from a smartphone app, we’re staying in touch with family, friends, and colleagues on Gmail, Facebook, Nimbuzz, or WhatsApp.

This state of connectedness offers a bottomless pit of data. In one minute, 48,000 apps are downloaded on iPhones, Pandora users listen to 61,141 hours of music, and over four million searches are completed on Google.

However, this type of data skims only the surface. According to experts, 80 to 90% of the available data is in amorphous, or unstructured, form. This data can be either textual or non-textual.

Textual data includes social chatter, electronic documents and word-processed content such as blogs, presentations, emails and other record-keeping material. Non-textual data usually exists in binary formats such as image files, video, audio and analog communications.

Harmonizing amorphous data could spell competitive advantage for organizations and also raise the customer satisfaction quotient.

So what does harmonizing amorphous data mean? It refers to the process of combining disparate sources of unstructured data for a business outcome. This process involves data munging or blending, aggregation and/or analytical processing, and interactive visualization for insight generation.

In the unstructured data world, data blending usually involves moving the data into Hadoop for storage and using languages like Pig to blend it. The blended data is then moved into stores like Hive or Impala to create a data mart or warehouse, where aggregations and joins are performed.
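At a much smaller scale, the blend-and-aggregate step can be sketched in plain Python. The customer and clickstream records below are invented; in practice the join would run as a Pig script over data in HDFS and the rollup as a Hive query:

```python
from collections import defaultdict

# Hypothetical records from two disparate sources: CRM data and web clickstream.
crm = [
    {"customer_id": "c1", "segment": "retail"},
    {"customer_id": "c2", "segment": "wholesale"},
]
clicks = [
    {"customer_id": "c1", "page": "checkout"},
    {"customer_id": "c1", "page": "home"},
    {"customer_id": "c2", "page": "home"},
]

# Blend: join clickstream events onto customer records by key,
# mirroring what a Pig JOIN would do across HDFS datasets.
segment_by_customer = {r["customer_id"]: r["segment"] for r in crm}
blended = [
    {**c, "segment": segment_by_customer.get(c["customer_id"], "unknown")}
    for c in clicks
]

# Aggregate: page views per segment, the kind of rollup a Hive
# GROUP BY would perform over the blended data mart.
views_per_segment = defaultdict(int)
for row in blended:
    views_per_segment[row["segment"]] += 1

print(dict(views_per_segment))  # {'retail': 2, 'wholesale': 1}
```

The shape of the work is the same at scale; only the engines change.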

Where analytical processing is required, the data is passed through SAS, R or Spark. For predictive analytics, the trained model is pushed to a real-time system for operational use. Once the data store is ready for consumption, different groups generate insights using BI software such as Tableau or Qlik, or through customized, in-house visualization software.
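The hand-off from offline training to real-time scoring can be illustrated with a deliberately tiny model. The churn data and the threshold rule below are invented stand-ins for what SAS, R or Spark would actually produce:

```python
# Offline: "train" a toy churn model on historical data. The feature
# (support call rate) and labels here are invented for illustration.
history = [
    (0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1),  # (support_calls_rate, churned)
]
# Learn a simple decision threshold: midpoint between the class means.
mean_stay = sum(x for x, y in history if y == 0) / sum(1 for _, y in history if y == 0)
mean_churn = sum(x for x, y in history if y == 1) / sum(1 for _, y in history if y == 1)
threshold = (mean_stay + mean_churn) / 2

# Online: the trained artifact (here, just `threshold`) is what gets
# pushed to the real-time system for operational scoring.
def score(support_calls_rate: float) -> int:
    return int(support_calls_rate >= threshold)

print(score(0.25), score(0.85))  # 0 1
```

The point is the separation of concerns: training happens in batch on the data store, while the deployed artifact stays small and fast.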

However, processing amorphous data requires special techniques such as natural language processing, part-of-speech tagging, image processing and data mining. With open source big data processing software that can run on commodity hardware, and with the democratization of analytics, enterprises are increasingly looking to generate insights from unstructured data for revenue generation or optimization.
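To give a flavor of the textual side, here is a minimal lexicon-based sentiment pass. The lexicon is invented and tiny; production systems use trained NLP models, but the tokenize-tag-score shape is the same:

```python
import re

# Toy sentiment lexicons (invented for illustration).
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"slow", "broken", "hate"}

def sentiment(text: str) -> int:
    """Score text by counting positive minus negative lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())  # crude tokenization
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)

print(sentiment("Love the new app, checkout is fast"))       # 2
print(sentiment("The update is slow and search is broken"))  # -2
```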

Here are some scenarios showcasing how amorphous data is being leveraged by different industry players:

Social networking providers use online communication channels and networks for targeted marketing campaigns. By constantly mining information from conversations using an ensemble of natural language processing models and creating network graphs of our social connections, providers can segment customers based on a range of factors.
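The network-graph side of this segmentation can be sketched with a made-up edge list: one simple way to group users into communities is a connected-components pass over the friendship graph.

```python
from collections import defaultdict

# Hypothetical follower/friend edges mined from social conversations.
edges = [("ana", "bo"), ("bo", "cy"), ("dee", "eli")]

# Build an undirected adjacency list.
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

def components(graph):
    """Group users into connected communities via depth-first search."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(group)
    return groups

print(components(graph))  # two communities: {ana, bo, cy} and {dee, eli}
```

Real providers use far richer community-detection methods, but the underlying object is the same graph of connections.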

Television network providers use similar techniques for charting campaign plans. In addition to having good demographic data and customer segments, TV network providers also use social communication channels to gauge the general sentiment and overall impact of a campaign. This helps them make changes during campaign runs, if need be.

Voice and data operators use call records (IVR and call center calls) to measure first call resolution (FCR). This can be leveraged to analyze customer satisfaction, agent effectiveness and workforce optimization.
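FCR itself reduces to a simple rollup over call records: the share of issues resolved with a single call. A sketch with invented data:

```python
from collections import defaultdict

# Hypothetical call records: one (customer, issue) tuple per call taken.
calls = [
    ("c1", "billing"), ("c2", "outage"), ("c2", "outage"),
    ("c3", "billing"), ("c3", "billing"), ("c3", "billing"),
]

# Count calls per distinct issue.
counts = defaultdict(int)
for customer, issue in calls:
    counts[(customer, issue)] += 1

# First call resolution: fraction of issues that needed exactly one call.
fcr = sum(1 for n in counts.values() if n == 1) / len(counts)
print(f"FCR: {fcr:.0%}")  # FCR: 33%
```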

Interactive Voice Response (IVR) data is usually present in large XML files, which need to be processed and mined to determine the IVR option patterns that can improve customer experience. Voice records from ongoing conversations can be parsed using natural language processing (NLP) and used to determine the sentiment of the caller (customer). Unifying all of these channels can help manage customer expectations better and improve their satisfaction score.
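Mining IVR option patterns from XML can be sketched with Python's standard library. The schema below is hypothetical; real vendor exports are much larger and differ in structure:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A hypothetical IVR session log (invented schema for illustration).
xml_doc = """
<sessions>
  <session caller="555-0101"><option>1</option><option>3</option></session>
  <session caller="555-0102"><option>1</option><option>0</option></session>
  <session caller="555-0103"><option>1</option><option>3</option></session>
</sessions>
"""

root = ET.fromstring(xml_doc)
# Count the path each caller took through the menu tree.
paths = Counter(
    "->".join(opt.text for opt in session.findall("option"))
    for session in root.findall("session")
)
print(paths.most_common(1))  # [('1->3', 2)]
```

Frequent paths (or frequent drop-offs to "0" for an agent) point at menu options worth redesigning.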

E-commerce sites use predictive models on historical content for recommendations. As customers browse or buy products from their sites, enterprises use content-based and collaborative filtering models to predict similar items, improving sales and enhancing the customer experience. The underlying data includes a trove of customer information, clickstream data and any feedback or social content referring to the site.
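Collaborative filtering at its simplest can be sketched as item co-occurrence counting over purchase baskets (the baskets below are invented; production recommenders use matrix factorization or learned embeddings):

```python
from collections import defaultdict

# Hypothetical purchase baskets, one set of items per order.
baskets = [
    {"soap", "towel"},
    {"soap", "towel", "shampoo"},
    {"tablet", "case"},
]

# Count how often each pair of items is bought together.
co_counts = defaultdict(lambda: defaultdict(int))
for basket in baskets:
    for item in basket:
        for other in basket - {item}:
            co_counts[item][other] += 1

def recommend(item: str) -> str:
    """Return the item most often bought alongside `item`."""
    return max(co_counts[item], key=co_counts[item].get)

print(recommend("soap"))  # towel
```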

Airlines are taking customer engagement beyond mere travel to provide a more complete customer experience. Using customer information from transactional systems along with social data, personalized information about events, accommodations, and food and dining packages is pushed to users' smart devices. This not only keeps customers engaged with their preferred airline, it ensures loyalty in the long run, bringing in additional revenue by way of channel partners.

Banks and financial institutions can apply deep learning to bolster security with facial recognition software, and leverage video processing for event correlation. Biometric data (retina scans, facial images and fingerprints) forms the non-textual data for national identity and large transaction guarantees. It can be mined and used to create new business models for third-party validations for banks. These techniques can be employed at a variety of locations for purposes ranging from personal security and workforce monitoring to national security.

Enterprise information management

Learning organizations keep generating a plethora of content with respect to their work, including presentations, Word documents, Excel reports and a range of sourced data. Information management becomes critical for effective utilization of all this knowledge content, which is primarily textual in nature. Using text mining, NLP, and semantic and taxonomy-based search systems, a metadata/keyword knowledge index can be created for context-based consumption.
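Such a keyword index is, at its core, a classic inverted index mapping terms to the documents that contain them. A minimal sketch, with invented document names and snippets:

```python
import re
from collections import defaultdict

# Hypothetical document snippets standing in for enterprise content.
docs = {
    "q3_report.xlsx": "quarterly revenue by region",
    "roadmap.pptx": "product roadmap and revenue targets",
    "onboarding.docx": "new hire onboarding checklist",
}

# Build an inverted index: keyword -> set of documents containing it.
index = defaultdict(set)
for name, text in docs.items():
    for token in re.findall(r"[a-z]+", text.lower()):
        index[token].add(name)

def search(keyword: str) -> set:
    """Look up all documents containing the keyword."""
    return index[keyword.lower()]

print(sorted(search("revenue")))  # ['q3_report.xlsx', 'roadmap.pptx']
```

Semantic and taxonomy-based systems layer synonym expansion and concept hierarchies on top of this basic structure.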

Businesses can benefit tremendously by harmonizing the myriad structured and unstructured data under their purview, using the right ecosystem of people, processes and platforms. Armed with the right tools, skills and cross-industry learnings, it is possible to tame the data deluge and extract the signals from the noise.

(About the author: Ganesh Moorthy is an apprentice leader at Mu Sigma, where he serves as program manager and senior solution architect for R&D engagements. Moorthy has more than 16 years of experience in leading enterprise solution development for Fortune 500 clients. He is currently involved in building industrial Internet, augmented reality, and analytics and visualization platforms for both descriptive and predictive analytics.)


