Is the devil in the digits?

June 24th, 2009

In the Washington Post, two Columbia political science students claim to “use statistics more systematically to show [that the Iranian election results were altered behind closed doors].”  They are confident that this leaves “very little room for reasonable doubt” that the results were at least partially fabricated.  They identify two unexpected occurrences in the last two digits of the number of votes received by the four candidates in Iran’s provinces (116 total numbers):

1. In the last digit, the number 7 occurs 17% of the time (N=20) and the number 5 occurs 4% of the time (N=5).  The probability of this phenomenon for 116 random numbers is 3.5%.

2. The last digit and the penultimate digit are non-adjacent only 62% of the time (0 is adjacent to both 9 and 1, so a random digit has a 70% probability of being neither equal nor adjacent to any given digit).  The probability of this is 4.2%.

3. The probability of both occurrences happening simultaneously is 0.5% [sic - it is actually 0.15%, the product of the probabilities of each event]
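The 70% figure and the 0.15% correction can be checked with a few lines of arithmetic. This is a minimal sketch, not the authors' code: the helper name `is_adjacent_or_equal` is made up for illustration, and the product assumes the two events are independent, which is the assumption behind the correction above.

```python
def is_adjacent_or_equal(a, b):
    """True if digits a and b are equal or neighbours, with 0 next to both 9 and 1."""
    return (a - b) % 10 in (0, 1, 9)

# For every possible last digit, count the penultimate digits that are
# neither a repeat nor a neighbour.
non_adjacent_per_digit = [
    sum(1 for prev in range(10) if not is_adjacent_or_equal(last, prev))
    for last in range(10)
]
p_non_adjacent = sum(non_adjacent_per_digit) / 100

# Joint probability of the two reported events, assuming independence.
p_digits = 0.035        # "too many 7s, too few 5s", as reported
p_adjacency = 0.042     # "too few non-adjacent pairs", as reported
p_joint = p_digits * p_adjacency

print(non_adjacent_per_digit)       # 7 for each of the ten digits
print(p_non_adjacent)               # 0.7
print(f"{p_joint:.4%}")             # about 0.15%
```

Every last digit leaves exactly 7 of the 10 possible penultimate digits non-adjacent, so the 70% baseline does not depend on which digit comes last.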

They correctly state that the odds of this happening in a fair election are extremely low; they incorrectly infer that this leaves little doubt of fraud.  Focusing on the first point, let’s see what the authors have to say here:

“Why would fraudulent numbers look any different? The reason is that humans are bad at making up numbers. Cognitive psychologists have found that study participants in lab experiments asked to write sequences of random digits will tend to select some digits more frequently than others.

“The numbers look suspicious. We find too many 7s and not enough 5s in the last digit. We expect each digit (0, 1, 2, and so on) to appear at the end of 10 percent of the vote counts. But in Iran’s provincial results, the digit 7 appears 17 percent of the time, and only 4 percent of the results end in the number 5.”

Indeed, cognitive psychologists say this.  What’s more, the authors previously looked into possible Nigerian election fraud and discussed this further:

“We showed that we can expect the last digits of electoral results to occur with equal frequency given a wide range of distributional assumptions, and we then emphasized the fact that humans tend to be biased in the production of random numbers: They tend to select small digits, avoid repetition, and favor adjacent numerals.”

None of the literature they cite says anything about the numbers 5 and 7, and the phenomenon observed here actually runs counter to experimental evidence of human attempts at producing random numbers.

They equate the probability of seeing one number too frequently and one number too infrequently with the probability that the last digits are random.  These probabilities are not equivalent.  It’s easy to see that there are any number of equivalent, similarly improbable events:

1. X appears too frequently and Y appears too infrequently

2. Both X and Y appear too frequently

3. Both X and Y appear too infrequently

4. X, Y, and Z appear too frequently

5. X, Y, and Z appear too infrequently

etc.

It’s trivial to continue and think of dozens of equivalent events all with a 3.5% probability.  In fact, there is a 100% chance that a string of 116 random digits will feature such a pattern (update: I suspect this, but I’m not remotely capable of proving it).
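One rough way to explore this is by simulation. The sketch below is only an illustration under stated assumptions: it draws 116 uniform random digits many times and counts how often some digit appears at least 20 times and some digit appears at most 5 times (the counts observed for 7 and 5), and how often at least one of those two things happens. The function name, trial count, and seed are arbitrary choices, and nothing here claims to reproduce the authors' 3.5% figure.

```python
import random
from collections import Counter

def simulate_digit_patterns(trials=20000, n=116, high=20, low=5, seed=0):
    """Estimate how often uniform random digits show extreme counts.

    Returns (p_both, p_either):
      p_both   - share of trials where some digit appears >= high times
                 AND some digit appears <= low times
      p_either - share of trials where at least one of those happens
    """
    rng = random.Random(seed)
    both = either = 0
    for _ in range(trials):
        counts = Counter(rng.choices(range(10), k=n))
        most = max(counts.values())
        # A digit that never appears is absent from the Counter,
        # so look up all ten digits explicitly (missing keys count as 0).
        least = min(counts[d] for d in range(10))
        too_many = most >= high
        too_few = least <= low
        both += too_many and too_few
        either += too_many or too_few
    return both / trials, either / trials

if __name__ == "__main__":
    p_both, p_either = simulate_digit_patterns()
    print(f"some digit >= 20 and some digit <= 5: {p_both:.3f}")
    print(f"some digit >= 20 or  some digit <= 5: {p_either:.3f}")
```

The second number is never smaller than the first, which is the point of the list above: the more patterns one is willing to call suspicious after the fact, the more likely it is that random data will show one of them.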

The correct way to investigate whether a set of numbers might be random is using Pearson’s chi-square test.  We first calculate the chi-square test statistic for an expected digit frequency of 11.6 per 116 numbers.  The digits 0 through 9 are observed 9, 11, 8, 9, 10, 5, 14, 20, 17, and 13 times.  The test statistic is 15.6.  Since our data has 10 possible values there are 9 degrees of freedom, and the critical value required to reject the null hypothesis at a 95% confidence level is 16.9 – you simply can’t conclude with a high degree of confidence that the numbers aren’t entirely random.
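The test statistic above can be recomputed from the digit counts in a few lines. This is a minimal sketch in plain Python: the function name `chi_square_uniform` is made up for illustration, and the critical value is hard-coded from a standard chi-square table (9 degrees of freedom, 5% significance) rather than computed.

```python
# Observed counts of the last digits 0 through 9, as listed above.
observed = [9, 11, 8, 9, 10, 5, 14, 20, 17, 13]

def chi_square_uniform(counts):
    """Pearson's chi-square statistic against a uniform expectation."""
    total = sum(counts)
    expected = total / len(counts)   # 11.6 for 116 numbers over 10 digits
    return sum((c - expected) ** 2 / expected for c in counts)

statistic = chi_square_uniform(observed)

# Table value for 9 degrees of freedom at the 5% significance level (assumed, not computed).
CRITICAL_95_DF9 = 16.919
reject_uniform = statistic > CRITICAL_95_DF9

print(f"chi-square statistic: {statistic:.1f}")       # 15.6
print(f"reject uniformity at 5%: {reject_uniform}")   # False
```

If SciPy is available, `scipy.stats.chisquare(observed)` should give the same statistic along with a p-value, since it tests against equal expected frequencies by default.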

What’s more, here is the authors’ example of the results for a fair election:

“As a point of comparison, we can analyze the state-by-state vote counts for John McCain and Barack Obama in last year’s U.S. presidential election. The frequencies of last digits in these election returns never rise above 14 percent or fall below 6 percent, a pattern we would expect to see in seventy out of a hundred fair elections.”

Why look at the last digit when the second-to-last digit should also be random?  If you look at the second-to-last digit in this same data set, you’ll find 20% 7s and 5% 8s.  The odds of this happening are 1.5% – well below the odds of the 7s and 5s phenomenon in Iran.

  • Michael Wales
    Wow. You can't do math. Whatever you think of the election or these students, attacking their argument with such blithe ignorance does more to hurt your apparent 'cause' than convince your readers. The summary you critique is a remarkably simplified report of accurate statistical analysis of the election. I haven't read the math, nor even know if it's been published, but I can reproduce their results with the same numbers. Your ignorance of their calculations is a pitiful attempt to discredit logical argument while remaining blissfully unaware of your own shortcomings. And no, the first two probabilities are not multiplied. They are not independent.
  • What 'cause' do you think I'm taking up? I can reproduce their numbers as well (except for one error which they've recognized).

    The two probabilities are entirely independent in the case where the last two digits are random. In the definition used here, any last digit has 7 out of 10 unique, non-adjacent digits (everything that's not a repeated digit or an adjacent digit, with 0 being adjacent to both 9 and 1). No matter what the last digit, a random second-to-last digit will have a 70% chance of being non-adjacent. Beber and Scacco agree with this in the annotated version of their article.