I came across an interesting read the other day while browsing the digital edition of The Economist. The article summarizes an academic paper by three foreign professors entitled "Directors as Connectors: The Impact of the External Networks of Directors on Firms".

The paper's point of departure is that in the U.S., the external networks of a company's board of directors (BOD) significantly affect firm value and decisions. The authors hypothesize that such relationships drive business in proportion to the strength of the links between the BOD and state-level politicians, especially newly elected governors.

“Surrounding close gubernatorial elections, local firms with directors connected to winners increase value by 4.1% over firms connected to losers. Director network’s value increases with network strength and activities, and is not due to network homophily. Connected firms are more likely to receive state subsidies, loans, and tax credits. They obtain better access to bank loans, borrow more, pay lower interest, invest and employ more, and enjoy better long-term performance. Network benefits are concentrated on connected firms, possibly through quid pro quo deals, and unlikely spread to industry competitors.”

The researchers purport to quantify the associations between BOD ties to lawmakers and measures of subsequent good fortune. They aren't content, however, to simply measure the strength of relationships through regression models. In addition, they set out to test whether the BOD relationships to successful governors caused the good fortune that befell their companies, looking to eliminate alternative explanations for the associations they found. The investigation is thus preoccupied with the design by which the study data were generated. If, for example, the results of this observational analysis had derived instead from a controlled experiment, the authors would have quite a bit more confidence in their explanatory thinking.

The study operationalized close ties between BOD members and political figures by assessing those “who graduated from the same campus and degree programme within five years of each other ... It then looked at companies connected to winners and losers in 34 elections decided by less than 5% of the vote. The pool of relevant information included observations of 516 firms and 483 directors.”

The details of the methodology are many, but the one that caught my eye was the use of a Regression Discontinuity Design (RDD) to model the generation of the study data. RDD is now a popular quasi-experimental approach to gauging causality for interventions in which randomization isn't feasible, “by applying a treatment assignment mechanism based on a continuous eligibility index.”

In essence, RDD is applicable in situations where a cutoff score is used to “assign” subjects to treatment or control. ‘If “treatment” is assigned on the basis of this variable, and if there is discontinuity in the probability of being treated at some cut-off value, and if the cut-off value of the index for treatment is arbitrary, and thus units on either side of the cut-off point are identical, on average, except for the presence of the treatment’, then a statistical case can be made that differences in the means of treatment and control can be used to determine the impact of treatment.
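For readers who like notation, the quantity a sharp RDD targets can be written compactly. The symbols here are my own shorthand, not the paper's: X is the eligibility index (the running variable), c is the cutoff, Y is the outcome, and tau is the treatment effect at the cutoff.

```latex
\tau_{\mathrm{RDD}} \;=\; \lim_{x \downarrow c} \mathbb{E}[\,Y \mid X = x\,] \;-\; \lim_{x \uparrow c} \mathbb{E}[\,Y \mid X = x\,]
```

In words: compare the expected outcome just above the cutoff with the expected outcome just below it. If nothing else changes abruptly at c, the gap is attributed to the treatment.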

The authors cleverly use voting shares in gubernatorial elections as the RDD treatment assignment variable, looking at close elections where the “event of winning is practically randomized between the winner and the loser.” In these cases the RDD “allows an estimation of the average treatment effect of connections to elected contenders versus defeated ones.”
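To make the close-election idea concrete, here is a minimal sketch in Python on simulated data. It is not the authors' code or their specification: the function name `rdd_estimate`, the bandwidth, the noise level, and the simulated "true" effect are all my own assumptions, chosen only to show the mechanics. It fits a separate straight line on each side of the cutoff and reads off the jump between the two fits at the cutoff.

```python
import numpy as np

def rdd_estimate(running, outcome, cutoff=0.0, bandwidth=0.05):
    """Sharp RDD sketch: gap between two local linear fits, evaluated at the cutoff.

    `running` is the assignment variable (here, a vote margin); units at or
    above `cutoff` are treated. Only observations within `bandwidth` of the
    cutoff are used. All names and defaults are illustrative assumptions.
    """
    running = np.asarray(running, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    centered = running - cutoff
    in_window = np.abs(centered) <= bandwidth
    right = in_window & (centered >= 0)   # narrowly "won"
    left = in_window & (centered < 0)     # narrowly "lost"
    # For degree 1, np.polyfit returns [slope, intercept]; because the running
    # variable is centered, each intercept is the fitted value at the cutoff.
    _, at_cutoff_right = np.polyfit(centered[right], outcome[right], 1)
    _, at_cutoff_left = np.polyfit(centered[left], outcome[left], 1)
    return at_cutoff_right - at_cutoff_left

# Simulated data (entirely made up): margin of the connected candidate,
# restricted to races decided by less than five points either way.
rng = np.random.default_rng(0)
n = 5000
margin = rng.uniform(-0.05, 0.05, size=n)
connected_to_winner = margin >= 0
true_effect = 0.041  # assumed value, used only to check the estimator recovers it
returns = 0.2 * margin + true_effect * connected_to_winner + rng.normal(0, 0.05, size=n)

estimate = rdd_estimate(margin, returns)
print(f"estimated jump at the cutoff: {estimate:.3f}")
```

The design choice worth noticing is that the comparison is made only at the threshold, where winners and losers are most alike, rather than across all elections. A serious analysis would also examine bandwidth sensitivity and standard errors, which this sketch leaves out.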

What were the conclusions from the investigation? “We find that the external networks of directors positively and significantly impact firm value and decisions. Local firms with directors connected to a narrowly elected governor increase their value by 4.1% surrounding elections, equivalent to $211.7 million and $27.4 million for our sample’s average and median firms, respectively, over local firms connected to a closely defeated candidate.”

Explanatory modeling in search of cause and effect is pervasive in academia, but probably less so now in the business world, where prediction or forecasting accuracy dominates. Indeed, Viktor Mayer-Schönberger and Kenneth Cukier, authors of the seminal book Big Data: A Revolution That Will Transform How We Live, Work, and Think, wrote: “As humans we have been conditioned to look for causes, even though searching for causality is often difficult and may lead us down the wrong paths. In a big-data world, by contrast, we won't have to be fixated on causality; instead we discover patterns and correlations in the data that offer us novel and invaluable insights. The correlations may not tell us precisely why something is happening, but they alert us that it is happening.”

While I acknowledge that prediction rules in my world, I can't seem to give up on explanation, and ultimately feel unfulfilled without reasonable cause-and-effect connections to predictive models. And that's where data generation design comes into play. If our designs are strong enough to mimic the random assignment of "subjects" to levels of the "features" in our models, we can have more confidence that findings supporting our hypotheses are true, and not the consequence of disparate group makeup or bias.

So what's the lesson here for analytics and data science? For investigations seeking to establish a cause-and-effect relationship between features and target (that is, to explain the established relationships), data generation design is a critical step in the DS pipeline. Without such design, it's difficult for relationships to withstand alternative explanations and challenges of bias.

Even for prediction models, where forecasting accuracy is the priority, consideration of design is important for testing hypotheses and bringing explanatory lift to the modeling exercise. A prediction-oriented model with a strong data generation design is more credible than one without. For my money, if design's not given equal status with data and algorithms, it should at least be a prominent component of storytelling, helping data scientists "sell" plausible narratives of modeling results to stakeholders.
