You're reading from  Data Visualization with D3.js Cookbook

Product type: Book
Published in: Oct 2013
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781782162162
Edition: 1st Edition
Author: Nick Zhu

Nick Zhu is a professional programmer and data engineer with more than a decade of experience in software development, big data, and machine learning. Currently, he is one of the founders and the CTO of Yroo.com, a meta search engine for online shopping. He is also the creator of dc.js, a popular multidimensional charting library built on D3.

Appendix A. Building Interactive Analytics in Minutes

In this appendix we will cover:

  • The crossfilter.js library

  • Dimensional charting – dc.js

Introduction


Congratulations! You have finished an entire book on data visualization with D3. Together we have explored various topics and techniques. At this point you will probably agree that building interactive, accurate, and aesthetically appealing data visualizations is not a trivial matter, even with the help of a powerful library like D3. A professional data visualization project typically takes days or even weeks to finish, not counting the effort usually required on the backend. But what if you need to build interactive analytics quickly, or a proof of concept before a full-fledged visualization project can begin, and you need to do it not in weeks or days but in minutes? In this appendix we will introduce two JavaScript libraries that let you do exactly that: build in-browser, interactive, multidimensional data analytics in minutes.

The crossfilter.js library


Crossfilter is also a library created by D3's author Mike Bostock, initially used to power analytics for Square Register.

Crossfilter is a JavaScript library for exploring large multivariate datasets in browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records.

-Crossfilter Wiki (August 2013)

In other words, Crossfilter is a library that you can use to generate data dimensions on large and typically flat multivariate datasets. So what is a data dimension? A data dimension can be thought of as a type of data grouping or categorization, where each dimensional data element is a categorical variable. Since this is still a pretty abstract concept, let's take a look at the following JSON dataset and see how it can be transformed into a dimensional dataset using Crossfilter. Assume that we have the following flat dataset in JSON describing payment transactions in a bar:

[
  {"date": "2011-11-14T01:17:54Z", "quantity": 2, "total": 190, "tip": 100, "type": "tab"},
  {"date": "2011-11-14T02:20:19Z", "quantity": 2, "total": 190, "tip": 100, "type": "tab"},
  {"date": "2011-11-14T02:28:54Z", "quantity": 1, "total": 300, "tip": 200, "type": "visa"},
..
]

Note

Sample dataset borrowed from Crossfilter Wiki: https://github.com/square/crossfilter/wiki/API-Reference.

How many dimensions do we see in this sample dataset? The answer is that it has as many dimensions as there are different ways to categorize the data. For example, since this data describes customer payments, which are observations in a time series, "date" is obviously a dimension. Secondly, payment type is a natural way to categorize the data; therefore, "type" is also a dimension. The next dimension is a bit tricky, since technically we can model any field in the dataset, or a derivative of it, as a dimension; however, we don't want to turn anything into a dimension that does not help us slice the data more efficiently or provide more insight into what the data is trying to say. The total and tip fields have very high cardinality, which is usually an indicator of a poor dimension (though tip/total, that is, the tip as a percentage, could be an interesting dimension). The "quantity" field, however, is likely to have relatively low cardinality, assuming people don't buy thousands of drinks in this bar; therefore, we choose quantity as our third dimension. Now, here is what the dimensional logical model looks like:

Dimensional Dataset
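
To make the cardinality argument concrete, here is a minimal plain-JavaScript sketch that counts the distinct values each candidate dimension produces. It uses only the three sample records shown earlier, so the numbers are purely illustrative; cardinality is a hypothetical helper name and is not part of Crossfilter.

```javascript
// The three sample records shown above (the rest of the dataset is elided).
const records = [
  {date: "2011-11-14T01:17:54Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:20:19Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:28:54Z", quantity: 1, total: 300, tip: 200, type: "visa"}
];

// Hypothetical helper: the number of distinct values an accessor yields.
function cardinality(rows, accessor) {
  return new Set(rows.map(accessor)).size;
}

const byType = cardinality(records, d => d.type);
const byQuantity = cardinality(records, d => d.quantity);
// The raw timestamp is unique per record here, so it cannot group anything.
const byDate = cardinality(records, d => d.date);
// A derived dimension: the tip as a rounded percentage of the total.
const byTipPercent = cardinality(records, d => Math.round(100 * d.tip / d.total));

console.log({byType, byQuantity, byDate, byTipPercent});
```

In this tiny sample the raw date yields one distinct value per record, which hints at why a time dimension is usually rounded to a coarser unit such as the hour before grouping.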

These dimensions allow us to look at the data from different angles and, when combined, allow us to ask some pretty interesting questions, for example:

  • Are customers who pay by tab more likely to buy in larger quantities?

  • Are customers more likely to buy larger quantities on Friday night?

  • Are customers more likely to tip when using tab versus cash?

Now you can see why the dimensional dataset is such a powerful idea. Essentially, each dimension gives you a different lens through which to view your data, and when combined, they can quickly turn raw data into knowledge. A good analyst can use this kind of tool to quickly formulate a hypothesis, hence gaining knowledge from data.
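
Combining two dimensions amounts to slicing the data by one and viewing the slice through another. The following plain-JavaScript sketch approaches the first question above that way, using only the three sample records; countBy is a hypothetical helper, and a sample this small obviously cannot support any real conclusion.

```javascript
// The three sample records shown earlier (the rest of the dataset is elided).
const records = [
  {date: "2011-11-14T01:17:54Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:20:19Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:28:54Z", quantity: 1, total: 300, tip: 200, type: "visa"}
];

// Hypothetical helper: count records per key, returned as a plain object.
function countBy(rows, accessor) {
  const counts = {};
  for (const row of rows) {
    const key = accessor(row);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Slice by the "type" dimension, then look through the "quantity" dimension.
const tabOnly = records.filter(d => d.type === "tab");
const quantityForTab = countBy(tabOnly, d => d.quantity);
const quantityForAll = countBy(records, d => d.quantity);

console.log(quantityForTab, quantityForAll);
```

Comparing the two distributions is the kind of check an analyst would run on the full dataset to test the hypothesis.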

How to do it...

Now that we understand why we would want to establish dimensions on our dataset, let's see how this can be done using Crossfilter:

var timeFormat = d3.time.format.iso;
var data = crossfilter(json); // <-A

// Dimension: transaction time rounded down to the hour
var hours = data.dimension(function(d){
  return d3.time.hour(timeFormat.parse(d.date)); // <-B
});
// Group: total sales amount per hour
var totalByHour = hours.group().reduceSum(function(d){
  return d.total;
});

// Dimension and group: number of transactions per payment type
var types = data.dimension(function(d){return d.type;});
var transactionByType = types.group().reduceCount();

// Dimension and group: number of transactions per purchase quantity
var quantities = data.dimension(function(d){return d.quantity;});
var salesByQuantity = quantities.group().reduceCount();

How it works...

As shown in the preceding section, creating dimensions and groups is quite straightforward in Crossfilter. Before we can create anything, we first feed our JSON dataset, loaded using D3, to Crossfilter by calling the crossfilter function (line A). Once that is done, you can create a dimension by calling the dimension function and passing in an accessor function that retrieves the data element used to define the dimension. For type, we simply pass in function(d){return d.type;}. You can also perform data formatting or other tasks in the accessor function (for example, the date parsing and rounding on line B).

After creating the dimensions, we can use them to perform categorization or grouping: totalByHour is a grouping that sums up the total sale amount for each hour, while salesByQuantity is a grouping that counts the number of transactions by quantity. To better understand how a group works, let's take a look at what a group object looks like. If you invoke the all function on the transactionByType group, you will get the following objects back:

Crossfilter Group Objects

We can clearly see that the transactionByType group is essentially a grouping of the data elements by type that counts the number of data elements within each group, since we called the reduceCount function when creating the group.
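
The shape of that result can be sketched without the library. The following plain-JavaScript stand-in mimics what a counting group returns from all: an array of {key, value} objects in ascending key order. groupCount is a hypothetical helper and not Crossfilter's implementation, and the output covers only the three sample records shown earlier.

```javascript
// The three sample records shown earlier (the rest of the dataset is elided).
const records = [
  {date: "2011-11-14T01:17:54Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:20:19Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:28:54Z", quantity: 1, total: 300, tip: 200, type: "visa"}
];

// Hypothetical stand-in for dimension.group().reduceCount().all():
// one {key, value} entry per distinct key, in ascending key order.
function groupCount(rows, accessor) {
  const counts = new Map();
  for (const row of rows) {
    const key = accessor(row);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Array.from(counts, ([key, value]) => ({key, value}))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const transactionByType = groupCount(records, d => d.type);
console.log(transactionByType);
// → [ { key: 'tab', value: 2 }, { key: 'visa', value: 1 } ]
```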

The following are descriptions of the functions we used in this example:

  • crossfilter: Creates a new crossfilter with given records if specified. Records can be any array of objects or primitives.

  • dimension: Creates a new dimension using the given value accessor function. The function must return naturally-ordered values, that is, values that behave correctly with respect to JavaScript's <, <=, >=, and > operators. This typically means primitives: Booleans, numbers, or strings.

  • dimension.group: Creates a new grouping for the given dimension, based on the given groupValue function, which takes a dimension value as input and returns the corresponding rounded value.

  • group.all: Returns all groups, in ascending natural order by key.

  • group.reduceCount: A shortcut function to count the records; returns this group.

  • group.reduceSum: A shortcut function to sum records using the specified value accessor function.
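
To illustrate how a rounding key function and a sum reducer work together, here is a minimal plain-JavaScript sketch of the totalByHour idea. It is a stand-in and not Crossfilter code: floorToHourUTC and groupSum are hypothetical helpers, and the rounding is done in UTC as a simplifying assumption, whereas d3.time.hour in the recipe works in local time.

```javascript
// The three sample records shown earlier (the rest of the dataset is elided).
const records = [
  {date: "2011-11-14T01:17:54Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:20:19Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:28:54Z", quantity: 1, total: 300, tip: 200, type: "visa"}
];

// Round a timestamp down to the start of its hour, in UTC (an assumption).
function floorToHourUTC(isoString) {
  const date = new Date(isoString);
  date.setUTCMinutes(0, 0, 0); // zero the minutes, seconds, and milliseconds
  return date.toISOString();
}

// Hypothetical stand-in for dimension.group().reduceSum(valueFn).all().
function groupSum(rows, keyFn, valueFn) {
  const sums = new Map();
  for (const row of rows) {
    const key = keyFn(row);
    sums.set(key, (sums.get(key) || 0) + valueFn(row));
  }
  return Array.from(sums, ([key, value]) => ({key, value}))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const totalByHour = groupSum(records, d => floorToHourUTC(d.date), d => d.total);
console.log(totalByHour);
```

The two transactions made after 02:00 fall into the same hourly bucket, so their totals are added together, while the earlier transaction forms a bucket of its own.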

At this point we have everything we want to analyze. Now, let's see how this can be done in minutes instead of hours or days.

There's more...

We have only touched on a very limited number of Crossfilter functions. Crossfilter provides many more capabilities for creating dimensions and groups; for more information, please check out its API reference: https://github.com/square/crossfilter/wiki/API-Reference.

Dimensional charting – dc.js


Visualizing Crossfilter dimensions and groups is precisely the reason why dc.js was created. This handy JavaScript library was created by your humble author and is designed to let you visualize Crossfilter dimensional datasets easily and quickly.

Getting ready

Open your local copy of the following file as reference:

https://github.com/NickQiZhu/d3-cookbook/blob/master/src/appendix-a/dc.html

How to do it...

In this example we will create three charts:

  • A line chart visualizing the total transaction amount as a time series

  • A pie chart to visualize number of transactions by payment type

  • A bar chart showing number of sales by purchase quantity

Here is what the code looks like:

<div id="area-chart"></div>
<div id="donut-chart"></div>
<div id="bar-chart"></div>
…
dc.lineChart("#area-chart")
    .width(500)
    .height(250)
    .dimension(hours)
    .group(totalByHour)
    .x(d3.time.scale().domain([
        timeFormat.parse("2011-11-14T01:17:54Z"),
        timeFormat.parse("2011-11-14T18:09:52Z")
    ]))
    .elasticY(true)
    .xUnits(d3.time.hours)
    .renderArea(true)
    .xAxis().ticks(5);

dc.pieChart("#donut-chart")
    .width(250)
    .height(250)
    .radius(125)
    .innerRadius(50)
    .dimension(types)
    .group(transactionByType);

dc.barChart("#bar-chart")
    .width(500)
    .height(250)
    .dimension(quantities)
    .group(salesByQuantity)
    .x(d3.scale.linear().domain([0, 7]))
    .y(d3.scale.linear().domain([0, 12]))
    .centerBar(true);

dc.renderAll();

This generates a group of coordinated interactive charts:

Interactive dc.js charts

When you click or drag your mouse across these charts you will see the underlying Crossfilter dimensions being filtered accordingly on all charts:

Filtered dc.js charts
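
The coordination between charts follows from how Crossfilter groups respond to filters: a group reflects the filters applied on every dimension except its own, which is why a selected chart keeps showing all of its categories while the other charts shrink. The following plain-JavaScript sketch models that rule under simplifying assumptions; accessors, filters, and groupCount are hypothetical names, and this is not how Crossfilter or dc.js is implemented internally.

```javascript
// The three sample records shown earlier (the rest of the dataset is elided).
const records = [
  {date: "2011-11-14T01:17:54Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:20:19Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T02:28:54Z", quantity: 1, total: 300, tip: 200, type: "visa"}
];

// Hypothetical mini-model: one accessor and one optional filter per dimension.
const accessors = {type: d => d.type, quantity: d => d.quantity};
const filters = {type: null, quantity: null};

// Count records per key of `dimension`, applying the filters of all OTHER
// dimensions. Keys that are filtered out stay in the result with a count of 0.
function groupCount(dimension) {
  const counts = new Map();
  for (const row of records) counts.set(accessors[dimension](row), 0);
  for (const row of records) {
    const passes = Object.keys(filters).every(name =>
      name === dimension ||
      filters[name] === null ||
      filters[name](accessors[name](row)));
    if (passes) {
      const key = accessors[dimension](row);
      counts.set(key, counts.get(key) + 1);
    }
  }
  return Array.from(counts, ([key, value]) => ({key, value}))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Simulate clicking the "visa" slice of the pie chart.
filters.type = value => value === "visa";

const quantityCounts = groupCount("quantity"); // narrowed by the type filter
const typeCounts = groupCount("type");         // ignores its own filter
console.log(quantityCounts, typeCounts);
```

After the simulated click, the quantity counts only include visa transactions, while the type counts are unchanged, so the pie chart can still draw every slice and highlight the selected one.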

How it works...

As we have seen in this example, dc.js is designed to generate standard chart-based visualizations on top of Crossfilter. Each dc.js chart is interactive, so the user can apply a dimensional filter simply by interacting with the chart. dc.js is built entirely on D3; therefore, its API is very D3-like, and I am sure that with the knowledge you have gained from this book you will feel quite at home using dc.js. Charts are usually created in the following steps:

  1. The first step creates a chart object by calling one of the chart creation functions and passing in a D3 selector for its anchor element, which in our example is the div element that hosts the chart:

    <div id="area-chart"></div>
    ...
    dc.lineChart("#area-chart")

  2. Then we set the width, height, dimension, and group for each chart:

    chart.width(500)
         .height(250)
         .dimension(hours)
         .group(totalByHour)

    For coordinate charts rendered on a Cartesian plane, you also need to set the x and y scales:

    chart.x(d3.time.scale().domain([
      timeFormat.parse("2011-11-14T01:17:54Z"), 
      timeFormat.parse("2011-11-14T18:09:52Z")
    ])).elasticY(true)

    In this first case, we explicitly set the x-axis scale while letting the chart calculate the y scale automatically. In the next case, we set both the x and y scales explicitly:

    chart.x(d3.scale.linear().domain([0, 7]))
         .y(d3.scale.linear().domain([0, 12]))
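
The difference between an explicit y scale and an elastic one can be pictured with a small plain-JavaScript sketch. It assumes that an elastic axis spans from zero to the largest current group value; this is a conceptual simplification and not dc.js's actual computation, and yDomain is a hypothetical helper.

```javascript
// Hypothetical group output in the {key, value} shape returned by group.all(),
// here the per-quantity counts for the three sample records shown earlier.
const salesByQuantity = [
  {key: 1, value: 1},
  {key: 2, value: 2}
];

// Hypothetical helper: use the explicit domain if one is given,
// otherwise derive the domain from the data (the "elastic" case).
function yDomain(groups, explicitDomain) {
  if (explicitDomain) return explicitDomain;
  const max = Math.max(0, ...groups.map(g => g.value));
  return [0, max];
}

const elastic = yDomain(salesByQuantity);           // follows the data
const explicit = yDomain(salesByQuantity, [0, 12]); // fixed by the caller
console.log(elastic, explicit);
```

An elastic domain would be recomputed whenever filtering changes the group values, so the bars always fill the chart, whereas an explicit domain keeps the axis stable and makes it easier to compare bar heights before and after filtering.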

There's more...

Different charts have different functions for customizing their look-and-feel and you can see the complete API reference at https://github.com/NickQiZhu/dc.js/wiki/API.

Leveraging crossfilter.js and dc.js allows you to build sophisticated data analytics dashboards fairly quickly. The following is a demo dashboard for analyzing the NASDAQ 100 Index over the last 20 years (http://nickqizhu.github.io/dc.js/):

dc.js NASDAQ demo

At the time of writing this book, dc.js supports the following chart types:

  • Bar chart (stackable)

  • Line chart (stackable)

  • Area chart (stackable)

  • Pie chart

  • Bubble chart

  • Composite chart

  • Choropleth map

  • Bubble overlay chart

For more information on the dc.js library, please check out our Wiki page at https://github.com/NickQiZhu/dc.js/wiki.

See also

The following are some other useful D3-based reusable charting libraries. Although, unlike dc.js, they are not designed to work natively with Crossfilter, they tend to be richer and more flexible when tackling general visualization challenges:
