
You're reading from Mastering Clojure Data Analysis

Product type: Book
Published in: May 2014
Reading level: Beginner
ISBN-13: 9781783284139
Edition: 1st Edition
Author: Eric Richard Rochester

Eric Richard Rochester studied medieval English literature and linguistics at UGA and wrote his dissertation on lexicography. He now programs in Haskell and writes. He's also a husband and parent.

Chapter 5. Benford's Law – Detecting Natural Progressions of Numbers

In this chapter, we'll look at Benford's Law, an interesting set of properties inherent in many naturally occurring sequences of numbers. For these sets of numbers, the law predicts the distribution of initial digits.

This odd rule captures an interesting observation about the way numbers are distributed, and it's useful too. Benford's Law has been used as evidence of fraud. If a sequence of numbers should be naturally occurring but Benford's Law indicates that it is not, then the sequence is likely to be fraudulent. For example, the daily balances in your bank account should follow Benford's Law; if they don't, that may be evidence that someone is cooking the books.

Learning about Benford's Law


Benford's Law was first observed by the astronomer Simon Newcomb in 1881. He was consulting logarithm tables, which were tomes listing the values of logarithms of different numbers. He noticed that the pages of the books were more worn and discolored at the beginning than they were at the end. In fact, the pages that dealt with numbers beginning with 1 were significantly more worn than the pages for numbers beginning with 9. As the initial digits climbed, the pages were less and less worn.

This phenomenon was noticed again in 1938 by the physicist Frank Benford. He tested this against data in a number of domains, and the principle now bears his name.

In practical terms, this means that about 30 percent of the numbers in the sequence begin with the digit 1, a little more than 17 percent begin with 2, about 12 percent begin with 3, and the shares for the remaining digits up to 9 are all below 10 percent. A little under 5 percent of the numbers begin with 9. The following is a graphical representation...
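These percentages follow from the usual statement of Benford's Law, in which the probability of a leading digit d is log10(1 + 1/d). As a minimal sketch in plain Clojure, the expected shares and the observed leading-digit shares of a dataset could be computed as follows. The names `benford-probability`, `first-digit`, and `first-digit-frequencies` are illustrative helpers, not functions from Incanter or from the book.

```clojure
;; Illustrative helpers (hypothetical names, not part of Incanter).

(defn benford-probability
  "Expected share of values whose leading digit is d (1-9) under Benford's Law."
  [d]
  (Math/log10 (+ 1.0 (/ 1.0 d))))

(defn first-digit
  "Leading decimal digit of a positive integer n."
  [n]
  (Character/digit (first (str n)) 10))

(defn first-digit-frequencies
  "Observed share of each leading digit 1-9 in a collection of positive integers."
  [xs]
  (let [counts (frequencies (map first-digit xs))
        total  (count xs)]
    (into (sorted-map)
          (for [d (range 1 10)]
            [d (/ (double (get counts d 0)) total)]))))

;; Expected shares for digits 1 through 9, roughly
;; 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046
(map benford-probability (range 1 10))
```

Comparing the output of `first-digit-frequencies` on a dataset with these expected shares gives a quick, informal view of how closely the data follows the law.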

Failing Benford's Law


So far, we've seen several datasets, all of which conform to Benford's Law, most of them quite strongly. We haven't yet seen a dataset that does not conform to this distribution of initial digits. What would a failing dataset look like?

There are many ways in which we could get data that doesn't conform. Any linear data, for example, would have a more uniform distribution of the initial digits. However, we can also simulate fraudulent data easily, and in the process, we can learn just how much noise a dataset can handle before Benford's Law begins to have trouble with it.

We'll start this experiment with the population data that we looked at earlier. We'll progressively introduce more and more junk into the dataset. We'll randomly replace items in the dataset with a random value and re-run incanter.stats/benford-test on it. When it finally fails, we can note how many items we've replaced and how far off the new distribution is.
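To make the idea concrete, here is a rough sketch of this kind of experiment in plain Clojure. It is not the book's function and it does not call incanter.stats/benford-test. Instead it assumes a hand-rolled Pearson χ² statistic over the nine leading digits and a critical value of 15.507 (8 degrees of freedom at the 0.05 level). All names (`benford-chi-square`, `contaminate`, `replacements-until-failure`) and the `step` and `max-value` parameters are hypothetical.

```clojure
;; Sketch only: hypothetical helpers, not the book's code or Incanter's API.

(defn first-digit
  "Leading decimal digit of a positive integer n."
  [n]
  (Character/digit (first (str n)) 10))

(defn benford-chi-square
  "Pearson chi-square statistic comparing the leading digits of xs
  with the counts Benford's Law predicts."
  [xs]
  (let [n      (count xs)
        counts (frequencies (map first-digit xs))]
    (reduce +
            (for [d (range 1 10)]
              (let [expected (* n (Math/log10 (+ 1.0 (/ 1.0 d))))
                    observed (get counts d 0)]
                (/ (Math/pow (- observed expected) 2) expected))))))

(defn contaminate
  "Overwrite k randomly chosen positions of xs with uniform random
  integers between 1 and max-value (inclusive)."
  [xs k max-value]
  (let [v    (vec xs)
        idxs (take k (shuffle (range (count v))))]
    (reduce (fn [acc i] (assoc acc i (inc (rand-int max-value)))) v idxs)))

;; Assumed threshold: chi-square with 8 degrees of freedom, alpha = 0.05.
(def critical-value 15.507)

(defn replacements-until-failure
  "Progressively contaminate xs, `step` replacements at a time, and return
  the number of replacements made when the statistic first exceeds
  critical-value, or nil if it never does. Later rounds may overwrite
  positions replaced earlier, so this counts replacement operations."
  [xs step max-value]
  (loop [data (vec xs), replaced 0]
    (cond
      (> (benford-chi-square data) critical-value) replaced
      (>= replaced (count data)) nil
      :else (recur (contaminate data step max-value) (+ replaced step)))))
```

Because the replacements are random, repeated runs on the same data will generally stop at different counts. Averaging over several runs would give a steadier estimate of how much junk the dataset tolerates.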

The primary function is shown as follows...

Case studies


This has all been very interesting but not exactly useful. So, can Benford's Law be useful? The answer is yes. In fact, analyses using Benford's Law are admissible in United States courts. To get an idea of some uses of this analysis, let's take a look at a moderately well-publicized case where Benford's Law was used.

The 2009 Iranian presidential election prompted a number of analyses into whether the election was fraudulent. Some of these used Benford's Law. One major article on this was "A first-digit anomaly in the 2009 Iranian presidential election" by Boudewijn F. Roukema (http://arxiv.org/abs/0906.2789). In this study, the author analyzed the first digit of vote counts in the election results published by the Iranian Ministry of the Interior on June 14, 2009. First, he analyzed first-round results for elections in immediately preceding years in other countries. This established a baseline or control to compare with. He also took into account the pre-election...

Summary


In many ways, Benford's Law seems like the perfect test for fraud and other misdeeds. It's intriguing, simple, and computationally cheap. However, as we've seen, it's not always reliable; χ² tests can be finicky, and as evidence, it doesn't stand on its own. It needs to be buttressed by other data, and it can only help to support a case of fraud.

However, it is a piece of evidence. It provides a distribution that is difficult to mimic, and it describes a wide class of number sequences accurately. In combination with other information and evidence, it can provide support in cases of wrongdoing.

We've also learned about χ² tests, a very useful statistical procedure. Although they are sensitive to sample size, these tests still have a lot to offer and are highly recommended. They're cheap to perform, and they work well with categorical data, that is, data that counts a limited, fixed number of possibilities, such as sex or color. When used with appropriate sample sizes, they're straightforward...
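For reference, the standard Pearson form of the statistic, written for the Benford case, may help explain the sensitivity to sample size. The symbols here are assumptions for illustration: N is the number of values, O_d the observed count of leading digit d, E_d the expected count, and p_d and q_d the observed and expected shares.

```latex
\[
\chi^2 = \sum_{d=1}^{9} \frac{(O_d - E_d)^2}{E_d},
\qquad E_d = N \log_{10}\!\left(1 + \frac{1}{d}\right)
\]
\[
O_d = N p_d,\quad E_d = N q_d
\;\Longrightarrow\;
\chi^2 = N \sum_{d=1}^{9} \frac{(p_d - q_d)^2}{q_d}
\]
```

With nine digit categories the test has 8 degrees of freedom. The second line shows that for a fixed pattern of deviations in the shares, the statistic grows in proportion to N, so a very large sample can fail the test on deviations that would pass unnoticed in a small one.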
