Is the devil in the digits?
Wednesday, June 24th, 2009
In the Washington Post, two Columbia political science students claim to “use statistics more systematically to show [that the Iranian election results were altered behind closed doors].” They are confident that this leaves “very little room for reasonable doubt” that the results were at least partially fabricated. They identify two unexpected patterns in the last two digits of the vote counts received by the four candidates in Iran’s provinces (116 numbers in total):
1. In the last digit, 7 occurs 17% of the time (N=20) and 5 occurs only 4% of the time (N=5). The probability that 116 random numbers would show one digit at least this often and another at most this rarely is 3.5%.
2. The last and penultimate digits are distinct and non-adjacent only 62% of the time. Because 0 is adjacent to both 9 and 1, every digit has exactly two neighbors, so a random pair of digits is distinct and non-adjacent 70% of the time. The probability of seeing 62% or less is 4.2%.
3. The probability of both occurrences happening simultaneously is 0.5% [sic: it is actually 0.15%, the product of the probabilities of the two events, assuming they are independent]
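The three figures above can be roughly checked with a short simulation. This is a minimal sketch under my own assumptions, not the authors’ method: it treats the observed counts (20 sevens, 5 fives; 20/116 ≈ 17.2%, 5/116 ≈ 4.3%) as the thresholds, estimates the first probability by Monte Carlo rather than exactly, and assumes the 62% figure corresponds to 72 of the 116 digit pairs. The function names are hypothetical.

```python
import random
from collections import Counter
from math import comb

N = 116  # vote counts in Iran's provincial results, per the op-ed


def last_digit_pattern_prob(trials=20_000, seed=1):
    """Monte Carlo estimate of P(some digit appears >= 20 times and another
    appears <= 5 times) among N uniformly random last digits."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        counts = Counter(rng.choices(range(10), k=N))
        freq = [counts[d] for d in range(10)]
        # A digit cannot be both >= 20 and <= 5, so max and min are different digits.
        if max(freq) >= 20 and min(freq) <= 5:
            hits += 1
    return hits / trials


def non_adjacent_share():
    """Exact share of digit pairs (a, b) that are distinct and not neighbors,
    with 0 and 9 counted as adjacent."""
    good = sum(1 for a in range(10) for b in range(10)
               if (a - b) % 10 not in (0, 1, 9))
    return good / 100


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


p_digits = last_digit_pattern_prob()
p_share = non_adjacent_share()        # expected share of non-adjacent pairs
p_pairs = binom_cdf(72, N, p_share)   # assumption: 62% means 72 of 116 pairs
print(f"7s/5s-style pattern:       {p_digits:.3f}")
print(f"non-adjacent share:        {p_share:.2f}")
print(f"<= 72 non-adjacent pairs:  {p_pairs:.3f}")
print(f"product, if independent:   {p_digits * p_pairs:.4f}")
```

Multiplying the two probabilities is only valid if the two patterns are independent, which the sketch simply assumes.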
They correctly state that the odds of this happening in a fair election are extremely low; they incorrectly infer that this leaves little doubt of fraud. Focusing on the first point, let’s see what the authors have to say here:
Why would fraudulent numbers look any different? The reason is that humans are bad at making up numbers. Cognitive psychologists have found that study participants in lab experiments asked to write sequences of random digits will tend to select some digits more frequently than others.
…
The numbers look suspicious. We find too many 7s and not enough 5s in the last digit. We expect each digit (0, 1, 2, and so on) to appear at the end of 10 percent of the vote counts. But in Iran’s provincial results, the digit 7 appears 17 percent of the time, and only 4 percent of the results end in the number 5.
Cognitive psychologists do say this. The authors had also looked into possible election fraud in Nigeria earlier and discussed the point further:
We showed that we can expect the last digits of electoral results to occur with equal frequency given a wide range of distributional assumptions, and we then emphasized the fact that humans tend to be biased in the production of random numbers: They tend to select small digits, avoid repetition, and favor adjacent numerals.
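None of the literature they cite says anything about the digits 5 and 7 specifically, and the pattern observed here actually runs counter to the experimental evidence on how humans produce random numbers: people tend to favor small digits, yet in Iran’s results the large digits 6 through 9 account for 64 of the 116 last digits, against 46.4 expected.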
They equate the probability of seeing one digit too frequently and another too infrequently with the probability that the last digits are random. These probabilities are not the same. It’s easy to see that there are any number of other, similarly improbable patterns that would have looked just as suspicious:
1. X appears too frequently and Y appears too infrequently
2. Both X and Y appear too frequently
3. Both X and Y appear too infrequently
4. X, Y, and Z appear too frequently
5. X, Y, and Z appear too infrequently
etc.
It’s trivial to extend this list to dozens of similarly improbable patterns, each with a probability of roughly 3.5%. In fact, there is a 100% chance that a string of 116 random digits will feature such a pattern (update: I suspect this, but I’m not remotely capable of proving it).
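To make the multiple-comparisons point concrete, here is a rough Monte Carlo sketch. The pattern list and its thresholds are hypothetical choices made purely for illustration (only the first mirrors the 7s/5s finding). Whatever patterns you pick, the chance of seeing at least one of them can never be smaller than the chance of any single one; how much larger it is depends on how many patterns you would have been willing to call suspicious after the fact.

```python
import random
from collections import Counter

# Hypothetical "suspicious" patterns over the ten digit counts f[0..9].
# Thresholds are illustrative only; the first mirrors the op-ed's 7s/5s finding.
PATTERNS = {
    "one digit >= 20, another <= 5": lambda f: max(f) >= 20 and min(f) <= 5,
    "two digits >= 18": lambda f: sum(c >= 18 for c in f) >= 2,
    "two digits <= 6": lambda f: sum(c <= 6 for c in f) >= 2,
    "three digits >= 16": lambda f: sum(c >= 16 for c in f) >= 3,
    "three digits <= 7": lambda f: sum(c <= 7 for c in f) >= 3,
}


def pattern_rates(n=116, trials=20_000, seed=2):
    """Estimate how often each pattern, and at least one of them, shows up
    in n uniformly random digits."""
    rng = random.Random(seed)
    hits = {name: 0 for name in PATTERNS}
    any_hit = 0
    for _ in range(trials):
        counts = Counter(rng.choices(range(10), k=n))
        f = [counts[d] for d in range(10)]
        matched = False
        for name, check in PATTERNS.items():
            if check(f):
                hits[name] += 1
                matched = True
        any_hit += matched  # bool counts as 0 or 1
    rates = {name: h / trials for name, h in hits.items()}
    return rates, any_hit / trials


rates, p_any = pattern_rates()
for name, r in rates.items():
    print(f"{name:32s} {r:.3f}")
print(f"{'at least one of the above':32s} {p_any:.3f}")
```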
The standard way to test whether a set of digits is consistent with uniform randomness is Pearson’s chi-square goodness-of-fit test. Under the null hypothesis that the last digits are uniformly random, each digit is expected 11.6 times in 116 numbers. The digits 0 through 9 are actually observed 9, 11, 8, 9, 10, 5, 14, 20, 17, and 13 times. Summing (observed - expected)² / expected over the ten digits gives a test statistic of 15.6. Since our data has 10 possible values there are 9 degrees of freedom, and the critical value required to reject the null hypothesis at the 95% confidence level is 16.9. Because 15.6 falls short of that, you simply can’t conclude with a high degree of confidence that the numbers aren’t entirely random.
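The chi-square calculation can be reproduced in a few lines of standard-library Python. This is a sketch: the p-value comes from a hand-rolled series for the regularized incomplete gamma function (the helper `chi2_sf` is my own, not a library call), and 16.919 is the usual table value for 9 degrees of freedom at the 5% level.

```python
from math import exp, lgamma, log

observed = [9, 11, 8, 9, 10, 5, 14, 20, 17, 13]  # counts of last digits 0-9


def chi_square_stat(obs):
    """Pearson statistic against a uniform expectation (11.6 per digit here)."""
    expected = sum(obs) / len(obs)
    return sum((o - expected) ** 2 / expected for o in obs)


def chi2_sf(x, k, terms=200):
    """Upper-tail probability P(X >= x) for a chi-square with k degrees of
    freedom, via the series for the regularized lower incomplete gamma
    P(a, z) = z^a e^-z * sum_{n>=0} z^n / Gamma(a + n + 1), a = k/2, z = x/2."""
    a, z = k / 2, x / 2
    term = exp(a * log(z) - z - lgamma(a + 1))  # n = 0 term
    total = term
    for n in range(1, terms):
        term *= z / (a + n)  # ratio of consecutive series terms
        total += term
    return 1 - total


stat = chi_square_stat(observed)
df = len(observed) - 1
CRITICAL_95 = 16.919  # table value, df = 9, alpha = 0.05
p_value = chi2_sf(stat, df)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3f}")
print("reject uniformity at 5%" if stat > CRITICAL_95
      else "cannot reject uniformity at 5%")
```

If SciPy is available, `scipy.stats.chisquare(observed)` runs the same test against a uniform expectation and should agree.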
What’s more, here is the authors’ example of the results for a fair election:
As a point of comparison, we can analyze the state-by-state vote counts for John McCain and Barack Obama in last year’s U.S. presidential election. The frequencies of last digits in these election returns never rise above 14 percent or fall below 6 percent, a pattern we would expect to see in seventy out of a hundred fair elections.
And why look only at the last digit, when the second-to-last digit should also be random? In this same data set, the second-to-last digit is 7 in 20% of cases and 8 in only 5%. The odds of that happening are 1.5%, well below the odds of the 7s-and-5s pattern in Iran.





