Donald Trump and statistical analysis don’t mix
(Bloomberg View) -- The Trump administration has taken on three ambitious statistical projects: tracking down cases of voter registration fraud, identifying racism in college admissions and developing an algorithm for “extreme vetting” of visa applications.
These would all be very tricky even for a trained professional. I doubt the president’s people are up to the task.
Let’s start with voter fraud. It’s actually thought to be vanishingly rare: Multiple independent studies, performed by reputable researchers in different locations and at different times, have found little or no evidence. To demonstrate to a statistically minded audience that fraud is nonetheless a problem, one would have to find either very convincing yet hidden evidence, a flaw in the methodology of all previous studies, or both.
With that in mind, consider how Trump’s Voter Fraud Commission is likely to proceed. It won’t have a lot of non-public data from states, so it will probably look for cases where two people with the same name and same birthday -- or even birth year -- have voted.
According to a Washington Post analysis, roughly 99.5 percent of the matches this method produces are false positives. In other words, 99.5 percent of the time, if you have two people with the same name and birthday, they’re not actually the same person. Even in the remaining 0.5 percent of cases, you can’t infer voter fraud without more information. So the results are highly unlikely to be convincing.
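To see why coincidental matches swamp real ones, here is a rough back-of-the-envelope sketch in Python. It does not reproduce the Washington Post analysis. The function names and the input figures (1,000 voters sharing a common name and birth year, five genuine double votes) are hypothetical, and it assumes birthdays are spread evenly across 365 days.

```python
from math import comb


def expected_coincidental_pairs(same_name_voters: int, days_in_year: int = 365) -> float:
    """Expected number of pairs of *different* people who share a birthday by chance.

    Assumes everyone in the group has the same name and birth year, and that
    birthdays are uniformly distributed (a simplification).
    """
    possible_pairs = comb(same_name_voters, 2)  # every way to pick two voters
    return possible_pairs / days_in_year        # each pair matches with chance 1/365


def share_of_matches_that_are_wrong(coincidental_pairs: float, true_double_votes: float) -> float:
    """Fraction of all name-and-birthday matches that are merely coincidences."""
    return coincidental_pairs / (coincidental_pairs + true_double_votes)


# Hypothetical inputs, chosen only for illustration.
coincidences = expected_coincidental_pairs(same_name_voters=1_000)
wrong_share = share_of_matches_that_are_wrong(coincidences, true_double_votes=5)

print(f"Expected coincidental matches: {coincidences:.0f}")
print(f"Share of matches that are coincidences: {wrong_share:.1%}")
```

The number of possible pairs grows with the square of the group size, so common names generate chance matches far faster than any plausible amount of fraud could.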
How about race in college admissions? To a large extent, colleges’ right to consider race is protected by the Supreme Court. To make the statistical case that certain applicants have been grossly discriminated against, one would have to establish what a “qualified applicant” looks like. This is not straightforward: Some schools care more about grades than others do, some insist on SAT scores while others don’t, and some look for well-rounded class presidents while others are in search of brilliant loners. For that matter, colleges are often just as interested in a given student’s likelihood of accepting an offer as they are in how polished their letters are.
Given colleges’ argument that diversity is a worthy goal in itself that improves the educational environment for everyone, it will be difficult to establish a clear line of racist policy except in egregious cases, such as the quotas on Jews that the Ivy League schools once employed. In particular, it would be statistically difficult if not impossible to prove that a given individual was denied admission because of race. At the most, one might demonstrate a systemic undervaluing of “qualified” applicants of a certain demographic. If we take the current evidence of legacy admissions into consideration, this demographic is unlikely to be white, and much more likely to be Asian American.
Finally, let’s consider the algorithm that the U.S. Immigration and Customs Enforcement’s Homeland Security Investigations division wants to deploy for vetting visa applications. According to the requisition document, it should be able to predict the chances of a visa applicant’s becoming a terrorist or a contributing member of society, and should be based on all publicly available data, including social media and anything collected online.
The technical challenges are vast. For one, the algorithm will face the same problem as the voter fraud investigation: lots of people with the same name, which leads to lots of false alarms. Also, the algorithm needs to be trained on historical data to identify desirable and undesirable people. This will be difficult for several reasons.
First, the number of immigrants who come to the country and actually commit acts of terror is tiny, so there will be very few examples to analyze.
Second, you would need years of data on visa applicants to find any useful patterns -- and I doubt Homeland Security has been scraping their social media profiles for that long.
Third, similar to a “qualified applicant” for colleges, it’s hard to define a “contributing member of society” -- and the results will depend heavily on the attributes (employment? income? church membership?) one chooses to emphasize.
Finally and most importantly, it’s difficult if not impossible to imagine how we’d track wrongful denials -- people who would be denied a visa due to their high “terrorist score” or low “contributing member of society” score, but who would have ended up law-abiding members of their adopted communities. Once someone is turned away, there is no outcome left to observe, so the algorithm’s mistakes would never show up in the data. Much more likely, we’d end up with a deeply unfair and arbitrary process.
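The rarity problem can be made concrete with a little arithmetic. The Python sketch below uses entirely hypothetical numbers: 10 million applicants, one future attacker per million, and an algorithm that is right 99 percent of the time in both directions, which is far better than anything one could realistically expect. The function name and parameters are illustrative and not taken from the requisition document.

```python
def flagged_breakdown(applicants: int, prevalence: float,
                      sensitivity: float, specificity: float) -> tuple[float, float]:
    """Split the people an algorithm flags into true alarms and false alarms.

    prevalence  -- fraction of applicants who really are threats (hypothetical)
    sensitivity -- chance a real threat is flagged
    specificity -- chance a harmless applicant is cleared
    """
    real_threats = applicants * prevalence
    harmless = applicants - real_threats
    true_alarms = real_threats * sensitivity
    false_alarms = harmless * (1 - specificity)
    return true_alarms, false_alarms


# Hypothetical inputs, chosen only to illustrate the base-rate effect.
true_alarms, false_alarms = flagged_breakdown(
    applicants=10_000_000, prevalence=1e-6, sensitivity=0.99, specificity=0.99
)
precision = true_alarms / (true_alarms + false_alarms)

print(f"Real threats flagged: about {true_alarms:.0f}")
print(f"Harmless people flagged: about {false_alarms:,.0f}")
print(f"Chance a flagged applicant is a real threat: {precision:.4%}")
```

Under these assumptions, harmless applicants make up almost everyone the algorithm flags. In this toy calculation we can count the false alarms only because we invented the ground truth. In real life, the people turned away never get the chance to prove the score wrong.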
As a trained data scientist, I’d be extremely wary of taking on any of these projects without years of preparation, including a deep dive into the related ethical considerations. And I’m not sure how many of Trump’s people have the time, or even the inclination.