June 26, 2012 - Chicago - Live updating throughout day two of Predictive Analytics World conference.
8:58 a.m. CST - Seats filling up and, thanks to coffee (and is that Lady Gaga on the PA?), everyone's eyes opening up ahead of the opening presentation by Bruno Aziza, former BI director at Microsoft. Just spotted him chatting w/ Eric Siegel, Ph.D. and president of Prediction Impact, the consultancy that leads the PAW conferences here and in D.C., London, Germany, Boston, and ones already held in Toronto and San Francisco. Check out what Eric had to say ahead of their Toronto event in this "10 Minutes ..." Q&A w/ Information-Management.com.
9:15 a.m. - Aziza, now at SiSense, starts with a partial clip from the classic SNL fake TV ad for "Bad Idea Jeans" ... Delving into why BI is broken: an estimated $7 billion is spent on business intelligence every year, but the adoption rate is 30 percent, even less at some enterprises, amounting to about one out of 10 end-users harnessing business analytics. It's "pretty depressing." Maybe, like Mel Brooks' Moses in "History of the World, Part 1," some old rules need to be cut out of the conversation.
Most companies are not equipped to handle "small data," much less big data. "When we think about our space, it's a very hot space to be in right now." That's creating a lot of fear around big data: that it's only available to large enterprises, that you need a data scientist ... but it's a "fairly simple problem." Ex. of a U.K. railroad monitor asking how to check the quality of the tracks. Currently, workers physically walk along the tracks doing spot checks. A never-ending, expensive and not entirely safe project. So the CIO proposes cameras taking pictures under trains every three seconds. But, Aziza said, you'd need a massive database to index, analyze and store the images.
Aziza: "The data piece is one small sliver of the problem. You have to think of what service this data is creating."
On expressing "big data" to most enterprises/executives: Nobody cares about data volumes and speed; they only care about the final number. You must think about your data as a service (not to be entirely confused with DaaS). You need applications that are mature enough for users to drill down to that one number they're looking for. The ability to map that semantic layer and create a direct service is where there will be "wins," Aziza says. If you can't explain the value of your data to ... your end users, you should probably reassess your approach, he says.
With frequently low returns on streams of data, Aziza stresses to just "get the data." Storage can be expensive, but in-memory capabilities are growing (enabling far more data access). Audience responding with a wild range of believed storage costs -- from $20,000 to $80 ... Aziza pegs it at $30 for 1 terabyte of disk storage today, down from $14 million for 1 terabyte in 1980. That's a drastic reduction, but also over, in computing terms, a long time frame. "Incredible opportunity for us to say to people, 'Don't discriminate' ... store everything." (Take Bruno's LinkedIn poll on what makes a data scientist here.) RAM on commodity hardware alone definitely faces storage limitations, but meld it w/ a data warehouse system for crunching and deeper storage where needed. Two other pillars Aziza says are crucial: 1) address the scarcity of human resources for large data volumes via an as-a-service or competition approach, like at Kaggle ... seek and combine resources because "you don't know where the best ideas are going to come from"; and 2) the importance of marketing by folks dealing with/developing data solutions ... think about how people make decisions, and get outside of your own bias toward particular approaches to data.
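As a back-of-the-envelope check on those figures (the $14 million and $30 price points are as quoted in the talk; the compound-rate calculation is my own illustration):

```python
# Aziza's storage-cost claim, as quoted: $14M per TB in 1980 vs. $30 per TB in 2012.
cost_1980 = 14_000_000  # USD per terabyte, 1980
cost_2012 = 30          # USD per terabyte, 2012
years = 2012 - 1980

# Total reduction factor across the period
reduction_factor = cost_1980 / cost_2012

# Compound annual rate of price decline implied by those two data points
annual_decline = 1 - (cost_2012 / cost_1980) ** (1 / years)

print(f"Price per TB dropped by a factor of about {reduction_factor:,.0f}x")
print(f"Implied average annual decline: {annual_decline:.1%}")
```

Roughly a 450,000x drop, or about a one-third price cut every year for three decades, which is what makes "store everything" a defensible default.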
10:45 a.m. - James Taylor, self-described "decision management guy," from Decision Management Solutions starts his talk on profitable decisions through analytics: "what decision you're trying to improve." ("You can't just deliver a bunch of math.") Taylor dives right in --
Analytics team, IT and business: all three have to collaborate. Quick framework:
1) Be clear what decisions you're improving. Begin with the decisions in mind. In analytics, that means: risk (credit, insurance, supply chain); fraud detection; and customer-centered (maximizing interactions and potential ... fast-moving and real-time). Use your business questions to define decisions.
2) How do the decisions impact the business (good or bad)? Target decision-making on KPIs. Link KPIs to your definition of what is to be improved or isn't working. From here you can put analytics in context: what processes will be improved; what events trigger those decisions; and who, internally, cares about analytics. "See where this decision making is going to fit."
3) Decompose decisions to understand them. Taylor says: "It's not typically enough to know that it exists to be able to improve it." Dig into what is required to make decisions: guidelines, expertise, regulations, existing system logic, external reference data, etc. Example from an insurance company: it used a decision model to find five risk areas in life insurance policies, and found that no matter how predictive the model was, it did not change the underpinning underwriter governance. The company wouldn't have known that w/out the model in place, so it was able to address the end-around instead of just layering on top.
4) Still have to deploy analytics back to the processes that need them. Taylor says, remember that operational systems and analytic systems are often pointed in separate directions. So it has to go back based on agility, analytics embedding and adaptive capabilities.
11:10 a.m. - Meta Brown, GM of Analytics for LinguaSys ... on "cross-language text analytics and overcoming language barriers"
Opportunity in U.S. as most businesses only gear data toward English. "You have competition from all over the globe, and you may not be comfortable doing business in other languages, but your competition is."
11:19 a.m. - Across the conference, Dean Abbott, president of Abbott Analytics, on best practices for models and hiring processes. Worked w/ U.S. Special Forces (in true Special Forces fashion, Abbott can't give too many details on the government side of things) on who is likely to be a successful operator, what that means, and how to measure numerous mental, physical and experience levels.
U.S. Special Forces, obviously, wants the best of everything. But Abbott says that truly amounts to "about two guys." "There are some trade-offs ... and levels of how quickly they can be trained and learn." Similar to NFL tests for quarterback smarts, analytics doesn't always equate to greatness in your enterprise picks. "But don't throw that data out," Abbott says. Peer assessment is "critical" for Special Forces because it gives an obvious window into real-life functions that tests don't always show.
Where predictive analytics fits in: you could build models that will show you some results, some things you may already know. But the question is how much does each factor matter? You can find acceptable trade-offs (read: trainable) with a model that gives pertinent weight to questions more vital to intelligence and learning (or whatever is most vital for the position you're looking to fill). And a good target variable is the people who have stayed and succeeded (though that can take a while and not always give a huge pool of results), and a small, quick pool to avoid "too much noise."
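The weighting idea here -- learn how much each factor actually matters for a "stayed and succeeded" target -- can be sketched with a plain logistic regression. Everything below (feature names, synthetic data, coefficients) is invented for illustration and is not Abbott's actual model:

```python
import math
import random

# Synthetic candidates: three hypothetical assessment scores per person, and a
# binary "stayed and succeeded" outcome driven mostly by aptitude and peer rating.
random.seed(0)

def make_candidate():
    aptitude = random.random()   # learning/intelligence-style measure
    fitness = random.random()    # physical test score
    peer = random.random()       # peer-assessment score
    # Invented ground truth: aptitude and peer assessment matter most
    logit = 3.0 * aptitude + 0.5 * fitness + 2.0 * peer - 2.5
    success = 1 if random.random() < 1 / (1 + math.exp(-logit)) else 0
    return [aptitude, fitness, peer], success

data = [make_candidate() for _ in range(1000)]

# Plain logistic regression via batch gradient descent (no external libraries)
w, b, lr = [0.0, 0.0, 0.0], 0.0, 0.1
for _ in range(200):
    gw, gb = [0.0, 0.0, 0.0], 0.0
    for x, y in data:
        p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
        err = p - y
        for i in range(3):
            gw[i] += err * x[i]
        gb += err
    for i in range(3):
        w[i] -= lr * gw[i] / len(data)
    b -= lr * gb / len(data)

print("Learned weights (aptitude, fitness, peer):", [round(v, 2) for v in w])
```

The fitted weights recover the relative importance of each factor, which is the point of the exercise: fitness isn't useless, it just matters less than trainability for this (invented) target.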
11:45 a.m. - John Hassman, director of Marketing Analytics for wholesaler United Stationers, on forecasting the economic future: "No one knows, but it's better to look at the data you have and make informed decisions."
To forecast wholesale sales, United Stationers mixes data from white collar unemployment, overall unemployment, industrial production, and long-range overlays for GDP and information provided by banks, doing the forecasting with its own IBM SPSS. (Also recommends quick and dirty modeling with in-house data and publicly available data using Ycharts.com.) Predictive models for their forward-looking business have improved the accuracy of internal forecasts by 30 percent (and are now consistently within 5 percent). United Stationers has fewer budget revisions from the finance department w/ predictive info in place, and now holds a cross-functional meeting w/ the analytics team and finance team each month.
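A toy version of the indicator-based forecasting Hassman describes: regress past sales on one leading indicator and project the next period. All numbers below are invented for illustration; the real setup blends several series (white-collar unemployment, industrial production, GDP overlays) in IBM SPSS:

```python
# Hypothetical history: an industrial-production index and wholesale sales ($M)
# over the same six periods. Figures are made up for the sketch.
industrial_production = [96.0, 97.5, 99.0, 100.2, 101.8, 103.1]
sales = [410, 418, 426, 431, 440, 447]

# Ordinary least squares for a single predictor, by hand
n = len(sales)
mean_x = sum(industrial_production) / n
mean_y = sum(sales) / n
slope = (sum((x - mean_x) * (y - mean_y)
             for x, y in zip(industrial_production, sales))
         / sum((x - mean_x) ** 2 for x in industrial_production))
intercept = mean_y - slope * mean_x

# Project sales for an assumed next-period indicator reading
next_ip = 104.5
forecast = intercept + slope * next_ip
print(f"Forecast sales at IP={next_ip}: ${forecast:.1f}M")
```

The "quick and dirty" approach Hassman recommends is exactly this shape: public series in, one fitted line, a sanity-check number out, before anything heavier is built.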
Question from crowd -- Where are you running this data? Hassman: desktop version of SPSS and only ingest the high-level data.
Question from crowd -- What about onboarding times and staffing? Hassman: Went on a trial run of what this predictive model looked like, piqued their interest, and the analytics team is starting to position analytics as a shared service within the organization.
Question from crowd -- How long do you keep the models and expect them to stay stable, viable? Hassman: Four to five years for longer-term economic reviews, which is admittedly less certain as they go along. Monthly run of other individual models, and introduce data points (e.g., different aspects of unemployment, like white collar workers who are no longer looking for work).
1:20 p.m. - At the podium, Roger Craig, CEO and co-founder of predictive analytics consultancy Cotinga, but more known as the person who dominated TV trivia show "Jeopardy!" with analytics-based preparation (no, not the other "person" who dominated "Jeopardy!").
Craig starts with the broad education on machine learning provided by IBM Watson's "Jeopardy!" performance. Describes his luck in being chosen for the show, and then the human doubt around his own record-setting big-money wager (including training himself to raise his appetite for risk). "It was bar-none the most surreal day of my life." (Later, on the big win: "We all know about sampling errors, and that was just one day.")
But it was partly by design, namely through predictive analytics. In the process of learning about bioinformatics, Craig steered toward statistical natural language processing "to predict what I should read next, what was next in my field." And that led him to quiz questions and his childhood appreciation of "Jeopardy!" Via previous winner Ken Jennings, a fan-fueled archive of questions and answers surrounding the show gave Craig a data source. He started with text mining and clustering on the "answers" to "bring order to this unstructured textual data that was comprising the 'Jeopardy!' archive." On a friend's advice, he re-coded the new answer archive for right or wrong answers (via random samples among the clusters) along with metadata for those answers, and gave more weight to categories/answers with higher dollar values. Then, he wrote a predictor for what he'd get right or not to "create a highly efficient learning scheme."
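A hypothetical sketch of that weighting step: once clusters of questions have been sampled for right/wrong and tagged with dollar values, rank them by expected dollars at stake. Cluster names and figures below are invented, not from Craig's actual archive:

```python
# Invented clusters of "Jeopardy!" material, each with sampling results:
# (times sampled, times answered correctly, average dollar value of its clues)
clusters = {
    "opera":          (40, 10, 1600),
    "state capitals": (40, 36, 400),
    "world rivers":   (40, 22, 800),
}

def study_priority(sampled, correct, avg_value):
    """Expected dollars lost per clue: miss rate weighted by dollar value."""
    miss_rate = 1 - correct / sampled
    return miss_rate * avg_value

# Study the clusters leaving the most money on the table first
ranked = sorted(clusters, key=lambda c: study_priority(*clusters[c]), reverse=True)
print("Study order:", ranked)
```

The ordering is the "highly efficient learning scheme" idea in miniature: a high-value category you mostly miss (opera here) outranks a low-value category you mostly know, even if the raw miss counts look similar.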
Craig: "If we have data on what someone is learning, then we can really refine what they need to learn. Then, when they learn that, we can sample them again. And the cycle goes on and on. Your kids and your grandkids, this will be one part of the piece of how they learn into the future."
Uses the Anki online "flash card" teaching tool for spaced repetition (modeling how you'll learn and forget, and how reminders will keep you from forgetting). So, with analytics and learning, the key is when you get reminded and how important the information is that you're reminded of.
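A minimal sketch of the spaced-repetition idea behind Anki (the real SM-2-style scheduler also tracks a per-card ease factor; this toy version just doubles the interval on a correct recall and resets it on a miss):

```python
from datetime import date, timedelta

def next_interval(last_interval_days, recalled_correctly):
    """Toy scheduler: remembered -> wait twice as long; forgot -> see it tomorrow."""
    if not recalled_correctly:
        return 1
    return max(1, last_interval_days * 2)

# Simulate one card through a run of review outcomes
interval = 1
today = date(2012, 6, 26)
schedule = []
for recalled in [True, True, True, False, True]:
    interval = next_interval(interval, recalled)
    today += timedelta(days=interval)
    schedule.append((today.isoformat(), interval))

print(schedule)  # (next review date, gap in days) after each outcome
```

The growing gaps (2, 4, 8 days) are the point: reminders arrive just before you'd forget, and a miss snaps the card back to a short interval.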
Craig: "We can augment our human capital with predictive analytics. Let's automate the mindless tasks." (That's now the focus in his venture Cotinga.)
Question from crowd -- How could this be applied to public school education? Craig -- In college, especially, because of their academic support and how everything is already digitized, but increasingly down to other aspects of learning/teaching.
Craig: "Once you have that [digitized element], you're going to be able to predict the student's predictive performance ... and you're going to have personalized education." Craig also suggests it could be an early indicator of learning disabilities or diseases like Alzheimer's.
2:39 p.m. - Next up, Max Kuhn, director of nonclinical statistics at Pfizer and "R guru," on figuring out the right medicine for patients and minimizing risk.
To build a predictive model for patients, we want info on: positive response to medication (within a certain time period), change from the baseline, likelihood of risk/benefit, errors at clinical trial sites, adherence to prescribed medication, impact on sales from key opinion leaders. All of that layered on top of patient history, symptoms, genetic markers, demographic results.
Pfizer relies on biomarkers, which are "surrogates for the disease," because it's "very hard to wait five or 10 years to decide if this is really good." Kuhn: "If we get measures of these things, hopefully without being invasive, it gives us a good idea on how someone will react to a medication."
With medical predictive analytics, event rates can be very low or very high, the data can be structured and/or unstructured, and there are other drastic differences to contend with, such as geography.
Main challenges in predictive models for medicine: Non-technical user acceptance of models -- "It is very difficult to get quantitative results to people and have them accept that" -- as well as consistently disruptive technology, visualization of results, and accuracy vs. interpretability.
A focus on visualization, especially while keeping up requirements for anonymity, brings treatments to life: "That cold quantitative result doesn't necessarily make them feel comfortable." Describes a back-and-forth with the FDA over a 2,000-point data set; neither side had visuals of the direct results, though they were in the data.
Even with disruptive technology intended to hedge against risk, Kuhn says big pharma firms carry much of the same stigma against new methods/tools.
3:01 p.m. - MeiMei Lim, principal at Accenture and a specialist in next-generation mobile analytics
Telco a lead industry with predictive analytics, especially with customer churn and new revenue. Lim says that it's because, in part, customers are directly engaged with telecommunications, particularly on social media. Lim's primary examples are from Asia, where adoption of cell phones and other consumer tech is at its greatest. For telco customer data, much of it now comes from unstructured sources (e.g., Facebook threads, blogs, Twitter chatter), but by the time analysts could use it to predict that a customer is going to churn, "it's way too late." So immediacy is key in this customer-direct industry, leading to models geared to paring down the thousands of variables, and funneling customer campaigns for certain offers and outreach from that honed question set. At some telcos, this unstructured social content has grown so important to marketing/sales that IT has piped the data directly into the enterprise data warehouse, Lim says. There, social ID information and questions are tied to responses from in-house retention offers.
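The variable-paring step Lim describes can be sketched as a simple correlation screen: rank candidate variables by how strongly they track the churn flag and keep the strongest. Variable names and data below are invented for illustration, not from any Accenture engagement:

```python
import math
import random

random.seed(1)

def correlation(xs, ys):
    """Pearson correlation, computed by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Synthetic churn flags and three invented candidate variables:
# one strongly related, one weakly related, one pure noise.
n = 1000
churn = [random.randint(0, 1) for _ in range(n)]
features = {
    "dropped_calls": [c * 3 + random.random() for c in churn],
    "support_tickets": [c + random.random() * 2 for c in churn],
    "handset_color": [random.random() for _ in range(n)],
}

# Keep variables in order of how strongly they track churn
ranked = sorted(features,
                key=lambda f: abs(correlation(features[f], churn)),
                reverse=True)
print("Variables to keep, strongest first:", ranked)
```

In practice telcos layer far more sophisticated selection on top, but the shape is the same: thousands of candidates in, a honed set out, fast enough to act before the customer walks.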