You're reading from  Data Visualization with D3.js Cookbook

Product type: Book
Published in: Oct 2013
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781782162162
Edition: 1st Edition
Author: Nick Zhu

Nick Zhu is a professional programmer and data engineer with more than a decade of experience in software development, big data, and machine learning. Currently, he is one of the founders and CTO of Yroo.com, a meta search engine for online shopping. He is also the creator of dc.js, a popular multidimensional charting library built on D3.

The crossfilter.js library


Crossfilter is also a library created by D3's author Mike Bostock, initially used to power analytics for Square Register.

Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records.

-Crossfilter Wiki (August 2013)

In other words, Crossfilter is a library that you can use to generate data dimensions on large and typically flat multivariate datasets. So what is a data dimension? A data dimension can be thought of as a way of grouping or categorizing the data, where each dimensional data element is a categorical variable. Since this is still a pretty abstract concept, let's take a look at the following JSON dataset and see how it can be transformed into a dimensional dataset using Crossfilter. Assume that we have the following flat dataset in JSON describing payment transactions in a bar:

[
  {"date": "2011-11-14T01:17:54Z", "quantity": 2, "total": 190, "tip": 100, "type": "tab"},
  {"date": "2011-11-14T02:20:19Z", "quantity": 2, "total": 190, "tip": 100, "type": "tab"},
  {"date": "2011-11-14T02:28:54Z", "quantity": 1, "total": 300, "tip": 200, "type": "visa"},
..
]

Note

Sample dataset borrowed from Crossfilter Wiki: https://github.com/square/crossfilter/wiki/API-Reference.

How many dimensions do we see in this sample dataset? The answer is: as many as there are different ways to categorize the data. For example, since this data describes customer payments, which are observations in a time series, "date" is obviously a dimension. Secondly, the payment type is a natural way to categorize the data; therefore, "type" is also a dimension. The next dimension is a bit tricky, since technically we can model any field in the dataset, or a derivative of it, as a dimension; however, we don't want to turn a field into a dimension unless it helps us slice the data more efficiently or provides more insight into what the data is trying to say. The total and tip fields have very high cardinality, which is usually an indicator of a poor dimension (though tip/total, that is, the tip as a percentage, could be an interesting dimension); the "quantity" field, however, is likely to have relatively small cardinality, assuming people don't buy thousands of drinks in this bar, so we choose quantity as our third dimension. Now, here is what the dimensional logical model looks like:

Dimensional Dataset
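
The cardinality argument above can be checked with a few lines of plain JavaScript. This is only a rough sketch: it uses just the three sample rows shown earlier, and the cardinality helper is a hypothetical name introduced for illustration, not part of Crossfilter or D3.

```javascript
// The three sample rows shown in the dataset above (the elided rows are not reproduced).
const records = [
  {date: "2011-11-14T01:17:54Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:20:19Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:28:54Z", quantity: 1, total: 300, tip: 200, type: "visa"}
];

// Hypothetical helper: the number of distinct values produced by an accessor.
function cardinality(rows, accessor) {
  return new Set(rows.map(accessor)).size;
}

// Cardinality of every raw field.
const fieldCardinality = {};
for (const field of Object.keys(records[0])) {
  fieldCardinality[field] = cardinality(records, function (d) { return d[field]; });
}

// A derived candidate dimension: the tip as a whole-number percentage of the total.
const tipPercentCardinality = cardinality(records, function (d) {
  return Math.round(100 * d.tip / d.total);
});

console.log(fieldCardinality, tipPercentCardinality);
```

On a realistic dataset, fields such as total and tip would be expected to show far more distinct values than type or quantity, which is the signal used above to pick dimensions.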

These dimensions allow us to look at the data from different angles and, when combined, let us ask some pretty interesting questions, for example:

  • Are customers who pay by tab more likely to buy in larger quantity?

  • Are customers more likely to buy larger quantity on Friday night?

  • Are customers more likely to tip when using tab versus cash?

Now you can see why a dimensional dataset is such a powerful idea. Essentially, each dimension gives you a different lens through which to view your data, and when combined, they can quickly turn raw data into knowledge. A good analyst can use this kind of tool to quickly formulate and test a hypothesis, hence gaining knowledge from data.
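
To make the first question concrete, here is a minimal plain-JavaScript sketch that compares the average quantity per payment type. It does not use Crossfilter; the averageBy helper is a hypothetical name, and three sample rows are of course far too few to support any real conclusion.

```javascript
// The three sample rows shown in the dataset above.
const records = [
  {date: "2011-11-14T01:17:54Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:20:19Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:28:54Z", quantity: 1, total: 300, tip: 200, type: "visa"}
];

// Hypothetical helper: average of valueOf(d) for each distinct keyOf(d).
function averageBy(rows, keyOf, valueOf) {
  const totals = new Map();
  for (const d of rows) {
    const key = keyOf(d);
    const entry = totals.get(key) || {sum: 0, count: 0};
    entry.sum += valueOf(d);
    entry.count += 1;
    totals.set(key, entry);
  }
  const result = {};
  for (const [key, entry] of totals) {
    result[key] = entry.sum / entry.count;
  }
  return result;
}

// Combine the "type" dimension with the quantity field.
const avgQuantityByType = averageBy(
  records,
  function (d) { return d.type; },
  function (d) { return d.quantity; }
);

console.log(avgQuantityByType);
```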

How to do it...

Now that we understand why we would want to establish dimensions on our dataset, let's see how this can be done using Crossfilter:

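// D3's built-in ISO 8601 time format, used here to parse the date strings
var timeFormat = d3.time.format.iso;
// json is the array of transaction records loaded beforehand
var data = crossfilter(json); // <-A

// Dimension: transaction time, rounded down to the hour
var hours = data.dimension(function(d){
  return d3.time.hour(timeFormat.parse(d.date)); // <-B
});
// Group: sum of the total field for each hour
var totalByHour = hours.group().reduceSum(function(d){
  return d.total;
});

// Dimension: payment type; group: number of transactions per type
var types = data.dimension(function(d){return d.type;});
var transactionByType = types.group().reduceCount();

// Dimension: quantity; group: number of transactions per quantity
var quantities = data.dimension(function(d){return d.quantity;});
var salesByQuantity = quantities.group().reduceCount();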

How it works...

As shown in the preceding section, creating dimensions and groups is quite straightforward in Crossfilter. The first step, before we can create anything, is to feed our JSON dataset, loaded using D3, through Crossfilter by calling the crossfilter function (line A). Once that's done, you can create a dimension by calling the dimension function and passing in an accessor function that retrieves the data element used to define the dimension. In the case of type, we simply pass in function(d){return d.type;}. You can also perform data formatting or other tasks in the dimension function (for example, the date parsing and rounding to the hour on line B).

After creating the dimensions, we can perform the categorization or grouping by using the dimension, so totalByHour is a grouping that sums up the total amount of sales for each hour, while salesByQuantity is a grouping that counts the number of transactions by quantity. To better understand how a group works, we will take a look at what the group object looks like. If you invoke the all function on the transactionByType group, you will get the following objects back:

Crossfilter Group Objects

We can clearly see that the transactionByType group is essentially a grouping of the data elements by type that counts the number of data elements within each group, since we called the reduceCount function when creating the group.
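
The shape of such a result, an array of objects with key and value properties, can be imitated in plain JavaScript. The sketch below is not Crossfilter's implementation; countByKey is a hypothetical stand-in for group().reduceCount().all(), applied only to the three sample rows shown earlier.

```javascript
// The three sample rows shown in the dataset above.
const records = [
  {date: "2011-11-14T01:17:54Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:20:19Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:28:54Z", quantity: 1, total: 300, tip: 200, type: "visa"}
];

// Hypothetical stand-in for dimension.group().reduceCount().all():
// counts records per key and returns {key, value} pairs in ascending key order.
function countByKey(rows, keyOf) {
  const counts = new Map();
  for (const d of rows) {
    const key = keyOf(d);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Array.from(counts, function (entry) {
    return {key: entry[0], value: entry[1]};
  }).sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
}

const transactionByType = countByKey(records, function (d) { return d.type; });
const salesByQuantity = countByKey(records, function (d) { return d.quantity; });

console.log(transactionByType);
console.log(salesByQuantity);
```

For the three sample rows, this gives two entries keyed by "tab" and "visa", and two entries keyed by the quantities 1 and 2.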

The following are descriptions of the functions we used in this example:

  • crossfilter: Creates a new crossfilter with given records if specified. Records can be any array of objects or primitives.

  • dimension: Creates a new dimension using the given value accessor function. The function must return naturally-ordered values, that is, values that behave correctly with respect to JavaScript's <, <=, >=, and > operators. This typically means primitives: Booleans, numbers, or strings.

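  • dimension.group: Creates a new grouping for the given dimension, based on an optional groupValue function, which takes a dimension value as input and returns the corresponding rounded value. When groupValue is omitted, as in this recipe, it defaults to the identity function, so each distinct dimension value becomes its own group.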

  • group.all: Returns all groups, in ascending natural order by key.

  • group.reduceCount: A shortcut function to count the records; returns this group.

  • group.reduceSum: A shortcut function to sum records using the specified value accessor function; returns this group.
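
The combination of rounding a value to a coarser key and then summing per key, as totalByHour does, can be sketched without the libraries. The helper names utcHour and sumByKey below are hypothetical, and the sketch rounds in UTC so that the result does not depend on the machine's time zone, whereas d3.time.hour in the recipe rounds in local time.

```javascript
// The three sample rows shown in the dataset above.
const records = [
  {date: "2011-11-14T01:17:54Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:20:19Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:28:54Z", quantity: 1, total: 300, tip: 200, type: "visa"}
];

// Hypothetical helper: round an ISO 8601 timestamp down to the start of its hour.
// Assumption: UTC is used here; the recipe's d3.time.hour uses local time.
function utcHour(isoString) {
  const date = new Date(isoString);
  date.setUTCMinutes(0, 0, 0); // zero out minutes, seconds, and milliseconds
  return date.toISOString();
}

// Hypothetical stand-in for group().reduceSum(valueOf).all():
// sums valueOf(d) per key and returns {key, value} pairs in ascending key order.
function sumByKey(rows, keyOf, valueOf) {
  const sums = new Map();
  for (const d of rows) {
    const key = keyOf(d);
    sums.set(key, (sums.get(key) || 0) + valueOf(d));
  }
  return Array.from(sums, function (entry) {
    return {key: entry[0], value: entry[1]};
  }).sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
}

const totalByHour = sumByKey(
  records,
  function (d) { return utcHour(d.date); },
  function (d) { return d.total; }
);

console.log(totalByHour);
```

The keys here are ISO strings, which sort chronologically when compared as text; this mirrors the requirement above that dimension values be naturally ordered.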

At this point we have everything we want to analyze. Now, let's see how this can be done in minutes instead of hours or days.

There's more...

We have only touched a very limited number of Crossfilter functions. Crossfilter provides a lot more capability when it comes to how dimension and group can be created; for more information please check out its API reference: https://github.com/square/crossfilter/wiki/API-Reference.
