More on that devil
June 25th, 2009

Following up on the last post, here's an exercise in applying Beber and Scacco's analysis to random numbers. I'm going to generate 10 sets of 116 random digits (0 through 9) and see how many of them contain similarly suspicious patterns. Here is the code and output from MATLAB:
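```matlab
% Tally the digits 0-9 in each of 10 sets of 116 random digits
freqs = zeros(10, 10);                  % row = set, column = digit 0..9
for k = 1:10
    a = randi([0 9], 116, 1);           % 116 uniform random digits
    for d = 0:9
        freqs(k, d+1) = sum(a == d);    % how many times digit d appears
    end
end
freqs
```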
```
freqs =

     9    14    13    14     8    10     7    16    12    13
    11    12    11    10     9    16    10    13    12    12
    20    11     6     9    15    13    11    15     9     7
    12    13     6    13    14    12     6    16    15     9
    16    10    15    12     7    11    14    14     9     8
    10    12    11    11    10    12    11    13    16    10
    16    11    14    10    14     7    11    10     9    14
    10    11    12    10    18    12    10     9     9    15
     8    10    12    11    24    10     9     9    10    13
    13     7    13     4    11    16    12    16    17     7
```
Each row gives the counts of the digits 0 through 9 in one set of 116 random digits. Beber and Scacco identify fraud based on the premise that "humans are bad at making up numbers. Cognitive psychologists have found that study participants in lab experiments asked to write sequences of random digits will tend to select some digits more frequently than others." In the Iranian example, they see 20 sevens and only 5 fives among the last digits of 116 vote counts from the Iranian election. How often can we find an equivalent phenomenon in genuinely random numbers?
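For a rough sense of scale, here is a sketch (my own calculation, not Beber and Scacco's) that assumes the 116 last digits are independent and uniform on 0 through 9. Under that assumption the count of any one digit is Binomial(116, 0.1), with mean 11.6, and the exact tail probabilities can be computed with base MATLAB's `gammaln`, so no toolbox is needed:

```matlab
% Exact binomial tails for one digit's count among 116 random last digits
% (assumes the digits are independent and uniform on 0-9)
n = 116;              % number of last digits (vote counts)
p = 0.1;              % chance each digit equals a particular value
k = 0:n;              % every possible count for that digit
% log of the binomial pmf; gammaln avoids the overflow nchoosek would hit
logPmf = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) + ...
         k*log(p) + (n-k)*log(1-p);
pmf = exp(logPmf);    % pmf(j) = P(count = k(j))

pHigh = sum(pmf(k >= 20));   % like the 20 sevens
pLow  = sum(pmf(k <= 5));    % like the 5 fives
fprintf('Expected count %.1f, standard deviation %.2f\n', n*p, sqrt(n*p*(1-p)));
fprintf('P(a given digit appears 20+ times)        = %.4f\n', pHigh);
fprintf('P(a given digit appears 5 or fewer times) = %.4f\n', pLow);
```

These are probabilities for a digit picked in advance. Since any of the ten digits could have come out high or low, the chance that some digit does so is larger.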
I'll estimate how rare each pattern is by counting how often it turns up in 10,000 simulations. Here is the code for the first one:
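```matlab
% Count simulations in which digits 1, 2 and 3 each appear 13 or more times
ct = 0;
for sim = 1:10000
    a = randi([0 9], 116, 1);            % 116 uniform random digits 0-9
    aFreq = zeros(10, 1);
    for d = 0:9
        aFreq(d+1) = sum(a == d);        % aFreq(d+1) = count of digit d
    end
    if all(aFreq(2:4) >= 13)             % entries 2:4 hold digits 1, 2, 3
        ct = ct + 1;
    end
end
ct
```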
Here, the condition I'm looking for is an overabundance of the digits 1, 2 and 3, which is what Beber and Scacco identify as a sign of human manipulation in their work on Nigerian elections. Seeing each of the digits 1 through 3 at least 13 times happens in only 365 of 10,000 simulations (about 3.7%). That is as rare as the phenomenon observed in Iran, and it fits better with experimental observations of human-fabricated random numbers.
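The loop above hard-codes one pattern. A minimal sketch of a reusable version, assuming the same setup (116 uniform digits, 10,000 runs): the pattern is an anonymous function of the 1x10 vector of digit counts, and the script also reports the binomial standard error sqrt(p(1-p)/N) of the estimate. The names `isPattern`, `hits` and `pHat` are just illustrative.

```matlab
% Monte Carlo estimate of how often a digit-count pattern occurs by chance
nSims   = 10000;   % number of simulated sets
nDigits = 116;     % digits per set, matching the 116 vote counts
% Pattern as a function of f = 1x10 counts of digits 0..9 (f(1) is digit 0).
% Here: digits 1, 2 and 3 each appear at least 13 times.
isPattern = @(f) all(f(2:4) >= 13);

hits = 0;
for sim = 1:nSims
    a = randi([0 9], nDigits, 1);          % one set of uniform random digits
    f = sum(bsxfun(@eq, a, 0:9), 1);       % 1x10 counts: column d+1 counts digit d
    hits = hits + isPattern(f);
end
pHat = hits / nSims;                       % estimated probability of the pattern
se = sqrt(pHat * (1 - pHat) / nSims);      % binomial standard error of pHat
fprintf('%d of %d runs: p = %.4f (s.e. %.4f)\n', hits, nSims, pHat, se);
```

Swapping in another condition, e.g. `isPattern = @(f) any(f >= 24);`, estimates how often a single digit shows up 24 or more times. For 365 hits in 10,000 runs the standard error is about 0.0019 (roughly 19 runs), so counts like these are only good to within a few tens of runs.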
Let's go through all ten sets and see which ones show an unexpectedly rare pattern:
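| Row | Times (of 10,000) | Condition | Pattern |
| --- | --- | --- | --- |
| 1 | 365 | digits 1, 2, 3 each >=13 | too many low digits |
| 2 | 284 | N>=9 with 9<=X<=13 | too little variation |
| 3 | 227 | N>=3 with X>=15, and N>=1 with X>=20 | 3 high, 1 very high |
| 4 | 573 | N>=2 with X>=15, and N>=2 with X<=6 | 2 high, 2 low |
| 5 | (none found) | | |
| 6 | 50 | N>=8 with 10<=X<=12 | too little variation |
| 7 | (none found) | | |
| 8 | 228 | N>=8 with 9<=X<=12 | too little variation |
| 9 | 53 | N>=1 with X>=24 | too many 4s |
| 10 | 97 | N>=3 with X>=16, and N>=1 with X<=4 | too many 5s, 7s and 8s, too few 3s |

Times is how many of the 10,000 simulations met the condition. N is the number of digits (out of 10) whose count X meets the bound after it; for example, row 2's condition means at least 9 of the 10 digit counts lie between 9 and 13.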
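As a sanity check on the table, this sketch re-enters the ten rows of `freqs` printed earlier and tests each listed pattern against its row, reading N as the number of digits and X as a digit's count. The encodings of the conditions, and the helper `countIn`, are my own reading of the table:

```matlab
% Check each row of the earlier freqs output against its listed pattern.
% Columns are counts of digits 0..9 (column 1 is digit 0).
freqs = [ 9 14 13 14  8 10  7 16 12 13
         11 12 11 10  9 16 10 13 12 12
         20 11  6  9 15 13 11 15  9  7
         12 13  6 13 14 12  6 16 15  9
         16 10 15 12  7 11 14 14  9  8
         10 12 11 11 10 12 11 13 16 10
         16 11 14 10 14  7 11 10  9 14
         10 11 12 10 18 12 10  9  9 15
          8 10 12 11 24 10  9  9 10 13
         13  7 13  4 11 16 12 16 17  7];

countIn = @(f, lo, hi) sum(f >= lo & f <= hi);   % N for lo <= X <= hi

ok = false(10, 1);                               % rows 5 and 7 have no pattern
f = freqs(1,:);  ok(1)  = all(f(2:4) >= 13);
f = freqs(2,:);  ok(2)  = countIn(f, 9, 13) >= 9;
f = freqs(3,:);  ok(3)  = sum(f >= 15) >= 3 && any(f >= 20);
f = freqs(4,:);  ok(4)  = sum(f >= 15) >= 2 && sum(f <= 6) >= 2;
f = freqs(6,:);  ok(6)  = countIn(f, 10, 12) >= 8;
f = freqs(8,:);  ok(8)  = countIn(f, 9, 12) >= 8;
f = freqs(9,:);  ok(9)  = any(f >= 24);
f = freqs(10,:); ok(10) = sum(f >= 16) >= 3 && any(f <= 4);
disp(find(ok)')   % lists every row except 5 and 7
```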
So for 10 random sets of digits, it's pretty easy to find a pattern in 8 of them that is as rare as, or rarer than, what happened in Iran. Samples 5 and 7 are now the suspicious ones, because they don't display any obvious rare pattern… was the person faking this data on to my game?