for Information Management Blogs
SEP 21, 2010 10:12am ET

Bias in BI

A few weeks ago, I wrote on the U.S. News Best Colleges for 2011. One of my observations was that the rankings seemed to favor small, highly selective and well-endowed private schools over larger public state universities. Indeed, for 2011, no state school cracked the top 20 national universities, and only two, Berkeley and UCLA, made the top 25.

Not to be outdone, the Wall Street Journal just published its top schools as rated by corporate recruiters, but with very different findings. Of the top 25 schools in its rankings, all but six are public state universities, and those with the very largest enrollments, such as Penn State, Texas A&M, Illinois, Purdue, Arizona State and Ohio State, seem disproportionately represented. What gives? Are these rankings in any way “biased”?

My read of the methodology behind the WSJ survey suggests the design itself is in large part responsible for the WSJ's surprising findings. For this study, 842 recruiting managers from many of the largest public, private and not-for-profit employers were surveyed. Of the 842, 479, or 57 percent, responded, reporting a total of 43,000 college hires for 2009.

Recruiters were asked to name, in rank order, their top schools overall and their top schools by major, from a final list of eight majors. Respondents could rank only schools and majors from which they actively recruit. The ranked majors included Accounting, Marketing, Engineering, Business/Economics, Finance, Computer Science, MIS and Liberal Arts.

“To calculate the final ranking we did the following: First we assigned 10 points to each No. 1 ranking, 9 points to each No. 2 ranking, 8 points to each No. 3 ranking — and so on — for each school. For the overall ranking, those ratings were weighted by the number of total graduates that a company reported hiring in the prior year ... For the ranking of schools by major, those ratings were weighted by the number of graduates each company hired in that specific major. To be considered for the majors ranking, a school had to have at least seven companies rank it; most had more.”
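The scoring rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the WSJ does not say exactly how the weighting was applied, so the sketch assumes each rank's points are simply multiplied by the company's hires. The function names, school names and hire counts are all hypothetical.

```python
from collections import defaultdict

def rank_points(position, top=10):
    """10 points for No. 1, 9 for No. 2, and so on, never below zero."""
    return max(top - (position - 1), 0)

def weighted_scores(responses):
    """responses: list of (company_hires, [school ranked 1st, 2nd, ...]).

    Assumption: points are multiplied by the company's total hires.
    """
    totals = defaultdict(float)
    for hires, ranked_schools in responses:
        for position, school in enumerate(ranked_schools, start=1):
            totals[school] += rank_points(position) * hires
    return dict(totals)

# Hypothetical responses: one large employer and two small ones.
responses = [
    (500, ["Big State U", "Small College"]),
    (20, ["Small College", "Big State U"]),
    (15, ["Small College", "Big State U"]),
]
scores = weighted_scores(responses)
print(scores)
```

In this made-up example, two of the three recruiters rank Small College first, yet Big State U finishes ahead because the one recruiter who prefers it hires far more graduates.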

Just as the U.S. News ratings seem predisposed towards private schools, the WSJ rankings, weighted by the total hires of the recruiters who rank each school, clearly favor larger institutions with hefty graduating classes. And with a business, technology and engineering focus, it's not surprising that state and engineering schools fare so well in contrast to elite arts and sciences institutions, many of which don't even offer undergraduate business degrees. Perhaps combining the U.S. News and WSJ ratings would cancel out their respective biases!

There are a number of meanings of bias pertinent to BI. The most prominent, and the one generally referenced in BI research, is prejudice, wherein findings are inclined to specific outcomes by design from the get-go. Indeed, the whole purpose of some “research” is to show a pre-ordained result. An example of this is a report based primarily on responses from a vendor's customers that shows the vendor's product in a market leadership position. Go figure. When BI research claims its “unbiasedness,” it generally means the results are not prejudiced by design, i.e., they are not simply marketing hubris. That's a start but not enough.

Statisticians have their own definitions of bias. An unbiased estimate is one that, on average, hits the mark. A biased estimate, in contrast, systematically overshoots or undershoots the true population value over the long haul. I well remember best linear unbiased estimates (BLUE) from grad school days many years ago. In the past, statisticians were obsessed with unbiased estimation, but now they are willing to tolerate small amounts of bias in exchange for less estimate variability, the so-called bias/variance tradeoff. Better to be approximately right with certainty.
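A small simulation can make the statistical definition concrete. The sketch below is an illustration of my choosing, not something from the studies discussed: it compares two textbook estimators of a population variance, one dividing by n (biased) and one dividing by n - 1 (unbiased), on repeated small samples from a normal population. The population parameters, sample size and trial count are arbitrary assumptions.

```python
import random
import statistics

random.seed(0)
TRUE_VARIANCE = 4.0      # population is normal with standard deviation 2
N, TRIALS = 5, 20000     # small samples, many repetitions

biased, unbiased = [], []
for _ in range(TRIALS):
    sample = [random.gauss(0.0, 2.0) for _ in range(N)]
    biased.append(statistics.pvariance(sample))   # divides by n
    unbiased.append(statistics.variance(sample))  # divides by n - 1

def mean(values):
    return sum(values) / len(values)

def mse(estimates):
    """Mean squared error around the true population variance."""
    return mean([(e - TRUE_VARIANCE) ** 2 for e in estimates])

print("average of biased estimates:  ", round(mean(biased), 2))
print("average of unbiased estimates:", round(mean(unbiased), 2))
print("MSE biased:  ", round(mse(biased), 2))
print("MSE unbiased:", round(mse(unbiased), 2))
```

The divide-by-n estimator undershoots the true variance on average, yet for normal data its mean squared error is lower than that of the unbiased version. That is the bias/variance tradeoff in miniature.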

Research surveys are biased to the extent that the profile of sample respondents differs systematically from the underlying population of interest. Variations of random sampling to select survey respondents can go a long way toward minimizing this bias, as illustrated by modern political polling techniques. This is in part because, with randomization, samples don't choose themselves. Alas, BI surveys are generally completed by voluntary, self-selecting web respondents, who are therefore often radically different from the population they're supposed to represent. In short, a biased sample.
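To see how self-selection distorts a survey, consider the hypothetical simulation below. It assumes a population of 100,000 users with satisfaction scores spread evenly from 0 to 10, and it assumes (purely for illustration) that the chance of volunteering for a web survey rises in proportion to satisfaction. None of these numbers come from a real survey.

```python
import random

random.seed(1)

# Hypothetical population: satisfaction scores spread evenly from 0 to 10.
population = [random.uniform(0, 10) for _ in range(100_000)]

def mean(values):
    return sum(values) / len(values)

# Random sample: every member has the same chance of selection.
random_sample = random.sample(population, 1000)

# Self-selected sample: assumed response probability grows with satisfaction,
# from 0 percent at a score of 0 up to 2 percent at a score of 10.
volunteers = [x for x in population if random.random() < 0.02 * x / 10]

print("population mean:   ", round(mean(population), 2))
print("random sample mean:", round(mean(random_sample), 2))
print("volunteer mean:    ", round(mean(volunteers), 2))
```

Both samples have roughly the same size, but only the random one tracks the population. A larger volunteer sample would estimate the wrong number more precisely; it would not move it closer to the truth.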

The overall design or methodology of BI performance measurement might also be biased, especially if random assignment to the strategic comparison groupings isn't feasible. Natural groups often differ systematically out of the gate, making intervention comparisons problematic. For example, a market for the pilot test of a new product might differ on socio-economic measures from an existing comparison market. Are differences in pre-post measurement between the groups then due to the pilot product? Or rather differences in the markets? Random assignment to the comparison groups minimizes the threat of the latter.
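The difference between natural groups and randomly assigned ones can be sketched with made-up data. The example below assumes 200 hypothetical markets, each with a median household income, and compares two ways of choosing 100 pilot markets: picking the richest ones (a stand-in for any systematic, non-random choice) versus shuffling and splitting at random.

```python
import random

random.seed(2)

# Hypothetical markets: median household income in thousands of dollars.
incomes = [random.gauss(50, 10) for _ in range(200)]

def mean(values):
    return sum(values) / len(values)

# Natural grouping: the pilot runs in the 100 richest markets.
by_income = sorted(incomes)
natural_gap = mean(by_income[100:]) - mean(by_income[:100])

# Random assignment: shuffle the markets, then split them in half.
shuffled = incomes[:]
random.shuffle(shuffled)
random_gap = mean(shuffled[:100]) - mean(shuffled[100:])

print("income gap, natural groups:   ", round(natural_gap, 1))
print("income gap, random assignment:", round(random_gap, 1))
```

With natural groups, the pilot and comparison markets differ sharply in income before the product even launches, so any pre-post difference is confounded. Random assignment leaves only a small chance gap.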

Most BI surveys/research go to great pains to establish their “unbiasedness.” And while I don't doubt the researchers' sincerity, I think the term is used to connote the absence of prejudice rather than the more precise statistical definitions. Indeed, the methods used today to elicit survey responses almost ensure methodological bias in BI survey research. And without random assignment to the competing groups of performance measurement, it's pretty hard for BI analysts to refute alternative explanations for their findings.

Fortunately, that BI research is almost certainly biased isn't necessarily devastating. Often, large sample sizes provide protection from its complications. But to mitigate the potential damage of bias, researchers must recognize and acknowledge the problems built into their designs and also investigate the consequences. What randomization and other fastidious methods buy is the comfort of knowing that factors outside the control of the research that could influence results are at least somewhat neutralized.

With self-selecting, voluntary response surveys, however, certain geographies, products, sectors, company sizes, BI maturity levels, etc. might be disproportionately represented, potentially skewing the findings. At a minimum, researchers should tabulate those extraneous variables, comparing their distributions to what is known of the population. The more the sample distributions of these factors look like those of the population, the more comfort the researcher can have that her results aren't spurious. On the other hand, if respondents differ from the population on outside factors in meaningful ways, the researcher must test whether those differences are responsible for the results. In some instances, adjustments can be made to compensate for the impact of bias.
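One common adjustment of this kind is post-stratification weighting, sketched below with hypothetical numbers. The example assumes the population mix by company size is known, tabulates the sample mix against it, and then reweights respondents so each size group counts in proportion to its population share. The shares and satisfaction scores are invented for illustration.

```python
from collections import Counter

# Hypothetical known population shares by company size.
population_share = {"small": 0.60, "medium": 0.30, "large": 0.10}

# Hypothetical respondents: (company size, satisfaction score).
respondents = (
    [("small", 3.0)] * 20 + [("medium", 3.5)] * 30 + [("large", 4.5)] * 50
)

# Step 1: tabulate the sample distribution against the population.
n = len(respondents)
counts = Counter(size for size, _ in respondents)
sample_share = {group: counts[group] / n for group in population_share}
for group in population_share:
    print(group, "sample:", sample_share[group],
          "population:", population_share[group])

# Step 2: weight each respondent by population share / sample share.
weights = {g: population_share[g] / sample_share[g] for g in population_share}

raw_mean = sum(score for _, score in respondents) / n
weighted_mean = (
    sum(weights[size] * score for size, score in respondents)
    / sum(weights[size] for size, _ in respondents)
)
print("raw mean:     ", round(raw_mean, 2))
print("weighted mean:", round(weighted_mean, 2))
```

Here large companies make up half the respondents but only a tenth of the population, so the raw average overstates satisfaction. Weighting corrects only for the factors that are tabulated; respondents may still differ from non-respondents in ways the weights cannot see.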

Going forward, it's my hope that consumers of BI research and performance measurement make the search for and handling of bias a driving consideration in their evaluation of quality.

Comments (6)
Well done article, Steve. Indeed, when TDWI first started doing Web surveys in 2002 (we were the first in the BI space), our sample sizes of 600 to 1200 mitigated the inherent bias of self-selecting responders. We were told we were "safe" by one of our trusted academic advisors who teaches stats in a business school.

However, as research houses and media firms discovered Web surveys, our response rates have fallen as the market has become saturated. We are now seriously contemplating moving to an unbiased panel format to ensure the integrity of our research findings as well as to minimize wear and tear on our lists.

Posted by Wayne E | Wednesday, September 22 2010 at 12:54PM ET
Why is this article tagged with BI? BI has no role in or relevance to this topic. That said, these surveys are by their nature "biased". They are based on opinions, just like the opinions section in these newspapers. So why take issue with them by claiming they are biased? Of course they are biased!

Anyone who gives credence to these subjective surveys deserves their blissful ignorance. There is no peer review or validation of the survey or its biases. What is the process for ensuring that surveys, or the interpretation of their results, are not biased? The process is anecdotal. There are no published guidelines, and few in industry subject themselves to peer review, news organizations even less so. News has become entertainment. Consider these surveys as a source of comic relief.

The whole arena of surveys is fictional, and for that matter so are those who profess they are doing "business intelligence." There are few that apply any rigorous review or validation process to the development of the surveys or analytical models.

I suggest that what is needed is a set of guidelines and a review process to validate the results of any analysis. If the survey or analytic has not followed this rigorous process, then it is nothing more than an opinion. There are established processes to do this, but few know of them and even fewer apply them. That's a pragmatic solution. Until then, accept these surveys as entertainment!

Posted by Richard O | Wednesday, September 22 2010 at 6:16PM ET