
You're reading from Clojure for Data Science

Product type: Book
Published in: Sep 2015
Reading level: Intermediate
ISBN-13: 9781784397180
Edition: 1st Edition
Author: Henry Garner

Henry Garner is a graduate from the University of Oxford and an experienced developer, CTO, and coach. He started his technical career at Britain's largest telecoms provider, BT, working with a traditional data warehouse infrastructure. As a part of a small team for 3 years, he built sophisticated data models to derive insight from raw data and use web applications to present the results. These applications were used internally by senior executives and operatives to track both business and systems performance. He then went on to co-found Likely, a social media analytics start-up. As the CTO, he set the technical direction, leading to the introduction of an event-based append-only data pipeline modeled after the Lambda architecture. He adopted Clojure in 2011 and led a hybrid team of programmers and data scientists, building content recommendation engines based on collaborative filtering and clustering techniques. He developed a syllabus and copresented a series of evening classes from Likely's offices for professional developers who wanted to learn Clojure. Henry now works with growing businesses, consulting in both a development and technical leadership capacity. He presents regularly at seminars and Clojure meetups in and around London.

Quantiles


The median is one way to calculate the middle value of a list, and the variance provides a way to measure the spread of the data about the mean. If the entire spread of data were represented on a scale of zero to one, the median would be the value at 0.5.

For example, consider the following sequence of numbers:

[10 11 15 21 22.5 28 30]

There are seven numbers in the sequence, so the median is the fourth, or 21. This is also referred to as the 0.5 quantile. We can get a richer picture of a sequence of numbers by looking at the 0, 0.25, 0.5, 0.75, and 1.0 quantiles. Taken together, these numbers not only show the median, but also summarize the range of the data and how the numbers are distributed within it. They're sometimes referred to as the five-number summary.

One way to calculate the five-number summary for the UK electorate data is shown as follows:

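(defn quantile
  "Returns the element of xs at quantile q (from 0 to 1),
  using the nearest index into the sorted sequence."
  [q xs]
  (let [n (dec (count xs))   ; index of the last element
        i (-> (* n q)
              (+ 1/2)        ; add 1/2 so that int rounds to nearest
              (int))]
    (nth (sort xs) i)))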

(defn ex-1-10 []
  (let [xs (->> (load-data :uk-scrubbed)
                (i/$ "Electorate"))
        f (fn [q]
            (quantile q xs))]
    (map f [0 1/4 1/2 3/4 1])))

;; (21780.0 66219.0 70991.0 75115.0 109922.0)
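
To see how the rounding in quantile behaves, the function can be applied to the seven-number sequence from earlier. This is a small sketch that repeats the definition so that it runs on its own in plain Clojure; the name small-xs is introduced here for illustration only.

```clojure
;; Same nearest-index quantile as defined above, repeated so that
;; this snippet is self-contained.
(defn quantile [q xs]
  (let [n (dec (count xs))
        i (-> (* n q)
              (+ 1/2)
              (int))]
    (nth (sort xs) i)))

;; small-xs is a hypothetical name for the example sequence.
(def small-xs [10 11 15 21 22.5 28 30])

(map #(quantile % small-xs) [0 1/4 1/2 3/4 1])

;; (10 15 21 28 30)
```

Every value returned is an element of the original sequence, because this version picks the nearest index rather than interpolating between neighbors.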

Quantiles can also be calculated in Incanter directly with the s/quantile function. A sequence of desired quantiles is passed as the keyword argument :probs.

Note

Incanter's quantile function uses a variant of the algorithm shown earlier called the phi-quantile, which performs linear interpolation between consecutive numbers in certain cases. There are many alternative ways of calculating quantiles—consult https://en.wikipedia.org/wiki/Quantile for a discussion of the differences.
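
To illustrate what linear interpolation between consecutive numbers can look like, here is a minimal sketch in plain Clojure. It shows one common interpolation scheme and is not claimed to be the exact method Incanter uses; the name interpolated-quantile is hypothetical.

```clojure
;; Hypothetical helper: one common way to interpolate, not
;; necessarily the variant that Incanter implements.
(defn interpolated-quantile [q xs]
  (let [sorted (vec (sort xs))
        h      (* (double q) (dec (count sorted))) ; fractional index
        lo     (int (Math/floor h))                ; neighbor below
        hi     (int (Math/ceil h))                 ; neighbor above
        frac   (- h lo)]                           ; distance past lo
    ;; Move from the lower neighbor toward the upper one by frac.
    (+ (nth sorted lo)
       (* frac (- (nth sorted hi) (nth sorted lo))))))

(map #(interpolated-quantile % [10 11 15 21 22.5 28 30])
     [0 1/4 1/2 3/4 1])

;; (10.0 13.0 21.0 25.25 30.0)
```

With this scheme the 0.25 and 0.75 quantiles fall between elements of the sequence (13.0 and 25.25), whereas the nearest-index version returns 15 and 28.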

When quantiles split the data into four equal-sized groups, as in the earlier example, they are called quartiles. The difference between the lower and upper quartiles is referred to as the interquartile range, often abbreviated to IQR. Like the variance about the mean, the IQR gives a measure of the spread of the data about the median.
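
As a rough sketch, the IQR can be computed from the nearest-index quantile function shown earlier, which is repeated here so that the snippet runs on its own. The name iqr is an assumption made for this example rather than a function from the book or from Incanter.

```clojure
;; Nearest-index quantile, as defined earlier in this section.
(defn quantile [q xs]
  (let [n (dec (count xs))
        i (-> (* n q)
              (+ 1/2)
              (int))]
    (nth (sort xs) i)))

;; Hypothetical helper: upper quartile minus lower quartile.
(defn iqr [xs]
  (- (quantile 3/4 xs)
     (quantile 1/4 xs)))

(iqr [10 11 15 21 22.5 28 30])

;; 13
```

For the seven-number sequence the quartiles are 15 and 28, so the middle half of the data spans 13 units.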
