Tooling for the Lazy Programmer: DRYing up D3

D3.js is a powerful, extensible library for data visualization. It makes some fairly advanced data visualization ideas available to anybody who can bind data to DOM elements. The set of supported features is vast, including everything from layouts like the streamgraph to an arsenal of map projections for the geodesist. However, D3 achieves this amazing breadth of utility through an unconventional programming pattern. For example, to add new SVG circle elements to an existing <svg> element, you'd do something like:

// select an existing <svg> element
var svg = d3.select("svg")

// select all the circles in `svg` (there may be none yet)
var existing_circles = svg.selectAll("circle")

// bind `your_data_array` to the circles
var data = existing_circles.data(your_data_array)

// make a placeholder for each data element without a circle
var placeholders = data.enter()

// and append a circle for each of the placeholders above;
// this returns only the newly appended circles
var new_circles = placeholders.append("circle")

This chained, declarative style is a double-edged sword. It is terse and powerful: you can bind data to DOM elements, execute transitions, enter new elements, and remove obsolete elements, all in a single block of code. On the other hand, it defies naive attempts to avoid repetition, which can lead to some pretty awful spaghetti code. Hence, it's important to take a proactive stance on code repetition: noting where it happens and abstracting it away. The first step is a decent working definition of the task. This semi-formal definition is my working concept of what a data visualization is:
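To make the enter/update/exit vocabulary concrete, here is a minimal sketch of what a data join does when data are matched to existing elements by index, which is D3's default when no key function is given. The `joinByIndex` function and the plain-object "elements" are hypothetical stand-ins for illustration, not part of D3's API.

```javascript
// Hypothetical sketch of an index-based data join.
// "Elements" here are plain objects, not DOM nodes.
function joinByIndex(elements, data) {
  var shared = Math.min(elements.length, data.length)
  var update = []
  var enter = []
  var exit = []

  // Elements that already exist and have a datum at the same index
  for (var i = 0; i < shared; i++) {
    update.push({ element: elements[i], datum: data[i] })
  }
  // Data with no element yet: placeholders, like selection.enter()
  for (var j = shared; j < data.length; j++) {
    enter.push({ datum: data[j] })
  }
  // Elements with no datum left: candidates for removal, like selection.exit()
  for (var k = shared; k < elements.length; k++) {
    exit.push(elements[k])
  }
  return { update: update, enter: enter, exit: exit }
}

var circles = [{ id: "a" }, { id: "b" }]
var join = joinByIndex(circles, [10, 20, 30])
// join.update has 2 entries, join.enter has 1 (for 30), join.exit is empty
```

Keeping the three groups separate is what lets one block of D3 code update existing elements, enter new ones, and remove obsolete ones.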

Chart: Let data be the set of possible data points. Let the set of graphical elements be called graphics. Finally, let mappings be a collection of functions transforming dimensions of data into graphical elements on the screen. Then a Chart is a triple (mappings, data, graphics).

The data are exogenous, and choosing graphical elements is largely a design decision. Of course, that choice can get complicated, but much of D3's rich feature set is aimed at resolving its intricacies. D3's treatment of the mappings element of the triple, however, is bare-bones.
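Written out in symbols, one way to read this definition is below. Factoring each mapping into an accessor followed by a scale is my own gloss, chosen to match the accessor/scale split used in the rest of this post; the names a_i, s_i, X_i, and Y_i are introduced here only for illustration.

```latex
\begin{aligned}
\mathrm{Chart} &= (M, D, G), & M &= \{\, m_1, \dots, m_k \,\} \\
m_i &= s_i \circ a_i, & a_i &: D \to X_i, \quad s_i : X_i \to Y_i
\end{aligned}
```

Here D is the set of data points and G the set of graphics; a_i picks one dimension X_i out of a data point (the accessor), and s_i carries that dimension onto a visual attribute Y_i of the graphics, such as an x position in pixels or a fill color (the scale).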

D3 does the heavy lifting with its d3.scale namespace, which implements a variety of common mappings from data space to screen space. For example, in Mike Bostock's area chart example, d3.scale.linear and d3.time.scale map the data to the vertical and horizontal dimensions of the screen.
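At heart, a linear scale is linear interpolation from a domain interval onto a range interval. The sketch below imitates the shape of d3.scale.linear() (a callable object with chainable domain() and range() getter/setters) so it runs without D3. `linearScale` is a hypothetical helper; the real D3 scale does considerably more, such as clamp(), nice(), ticks(), and invert().

```javascript
// Hypothetical stand-in for d3.scale.linear(): maps [d0, d1] onto [r0, r1]
function linearScale() {
  var domain = [0, 1]
  var range = [0, 1]

  function scale(x) {
    var t = (x - domain[0]) / (domain[1] - domain[0])
    return range[0] + t * (range[1] - range[0])
  }
  // D3-style getter/setter: with no argument, return the current value;
  // otherwise set it and return the scale so calls can be chained
  scale.domain = function (d) {
    if (!arguments.length) return domain
    domain = d
    return scale
  }
  scale.range = function (r) {
    if (!arguments.length) return range
    range = r
    return scale
  }
  return scale
}

// Map prices in [0, 50] onto a 500-pixel-wide axis
var x = linearScale().domain([0, 50]).range([0, 500])
x(20) // 200
```

Because SVG's y axis points down, charts commonly pass an inverted range such as range([height, 0]) so that larger values sit higher on the screen; the same interpolation handles that without special cases.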

(Sidenote: d3.time.scale lives apart from d3.scale.linear and its other relatives because of how strange time is, not because it is conceptually different.)
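To see why time is not conceptually different, note that a Date converts to a count of milliseconds since the Unix epoch, after which ordinary linear interpolation applies. The hypothetical `timeToPixels` helper below shows only that core idea. What d3.time.scale adds on top is mostly the strange part, such as choosing tick intervals and labels that respect irregular months, leap years, and daylight-saving changes.

```javascript
// Hypothetical sketch: a time scale is, at heart, a linear scale on
// milliseconds since the Unix epoch
function timeToPixels(start, end, width) {
  var t0 = start.getTime()
  var t1 = end.getTime()
  return function (date) {
    return (date.getTime() - t0) / (t1 - t0) * width
  }
}

var place = timeToPixels(
  new Date(Date.UTC(2013, 0, 1)),   // 1 Jan 2013 (months are 0-based)
  new Date(Date.UTC(2013, 0, 11)),  // 11 Jan 2013
  1000
)
place(new Date(Date.UTC(2013, 0, 6))) // 500: halfway through the interval
```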

However, D3's scales still leave a few steps to the user, and people tend to implement those steps over and over again, once for each mapping function in their data visualization.

This isn't good: it's tedious, error-prone, and certainly not DRY. Two operations are responsible for most of the repetition: accessing the relevant value of a data element, and computing the output of the mapping function for that value.
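The repetition usually looks something like the hypothetical snippet below: an accessor plus a wrapper that pushes its output through a scale, written once for x, again for y, and again for every other encoding. Plain arithmetic functions stand in for D3 scales so it runs on its own; `placer` is an illustrative helper, not part of D3, and is only the smallest possible step toward factoring the pattern out.

```javascript
// Stand-ins for two D3 scales (hypothetical; real code would use d3.scale.*)
function x(value) { return value * 10 }
function y(value) { return 300 - value * 2 }

// The pattern that gets rewritten for every encoding: an accessor...
function x_value(d) { return d.weight }
function y_value(d) { return d.price }

// ...and a wrapper that pushes the accessor's output through the scale
function x_place(d) { return x(x_value(d)) }
function y_place(d) { return y(y_value(d)) }

// A first step toward DRY: build the wrapper from a scale and an accessor
function placer(scale, accessor) {
  return function (d) { return scale(accessor(d)) }
}

var point = { weight: 4, price: 30 }
x_place(point)             // 40
placer(y, y_value)(point)  // 240, same as y_place(point)
```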

Initially, my solution addressed only these two operations, but there are a few more we almost always perform once we have the accessor function, the scale, and the data. The first is computing the extent of the data, so we can work out the ratio between units of data and units on the screen (whether those are Cartesian coordinates or RGB values). The second is drawing an axis for the graph, which we should almost always include, although interaction can relieve some of that pressure. In any event, I can't imagine a case in which you would use d3.svg.axis without also using a scale, so it makes sense to keep these ideas together.
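Computing the extent is a single pass for the minimum and maximum of the accessor's output. D3 ships d3.extent(array, accessor) for exactly this; the hypothetical `extent` below just spells the idea out so it runs without D3, then uses the result to get the data-to-screen ratio for a linear axis of an assumed 400 pixels.

```javascript
// Hypothetical sketch of an extent helper (compare d3.extent(array, accessor)):
// returns [min, max] of accessor(d), skipping null/undefined/NaN values
function extent(data, accessor) {
  var lo, hi
  data.forEach(function (d) {
    var v = accessor(d)
    if (v == null || v !== v) return // skip missing values and NaN
    if (lo === undefined || v < lo) lo = v
    if (hi === undefined || v > hi) hi = v
  })
  return [lo, hi]
}

var data = [{ x: 3 }, { x: -1 }, { x: 7 }, { x: NaN }]
var domain = extent(data, function (d) { return d.x }) // [-1, 7]

// Ratio of screen units to data units for a 400px-wide linear axis
var pixels_per_unit = 400 / (domain[1] - domain[0]) // 50
```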

I think the amount of repeated thought involved in creating D3 scales and axes is a bit of a wart, so I wrote a class, called Mapping, to relieve the pain. This class lets you stop thinking about what a scale actually is and provides a few convenience methods for the common scale-related tasks described above.

(N.B. I use fellow Airshipper Chris Dickinson's style guide. It's a little idiosyncratic, but before you flame his inbox over the absence of semicolons, you should understand his reasoning, laid out in the unabridged version.)

You initialize the class with the base d3.scale object of your choice and the accessor function responsible for computing inputs to the scale. Commonly the accessor is as simple as function(d){ return d.x }, but even then, once you've defined it you never need to think about it again. An example Mapping might be:

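// price_per_pound is a function declaration, so it is hoisted and can be
// passed to Mapping before the line where it is defined
var mapping = new Mapping(d3.scale.linear(), price_per_pound)

// unary + converts string fields (as loaded by d3.csv, say) to numbers
function price_per_pound(d) {
  return (+d.price)/(+d.weight)
}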

If you need the value of a data element p, just call mapping.accessor(p). More commonly, you'd call mapping.place(p) to map p into screen space.

Finally, there are two convenience methods: mapping.compute_domain, which takes the data array, calculates the extent of the data set, and updates the domain of the d3.scale object; and mapping.create_axis, which returns a newly created d3.svg.axis() with its scale already set.

Methods involving the scale or the axis return the appropriate object, so you can continue method chaining like you're used to.
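The post describes Mapping's interface rather than its source, so what follows is only a minimal sketch reconstructed from the description above, not the original implementation. It assumes the wrapped scale behaves like a D3 v3 scale (callable, with a chainable domain() getter/setter), and create_axis assumes D3 v3 is loaded as the global d3. A small hand-rolled `linear` function, hypothetical, stands in for d3.scale.linear() so the example runs without D3.

```javascript
// Minimal sketch of a Mapping class, reconstructed from this post's
// description (not the original source). `scale` is assumed to behave like
// a D3 v3 scale: callable, with a domain() getter/setter.
function Mapping(scale, accessor) {
  this.scale = scale
  this.accessor = accessor
}

// Map a data element into screen space: scale(accessor(d))
Mapping.prototype.place = function (d) {
  return this.scale(this.accessor(d))
}

// Set the scale's domain to [min, max] of the accessor over `data` and
// return the scale, so you can keep chaining, e.g. .range([0, width]).
// NaN values fail both comparisons and are skipped; an empty array
// would leave [Infinity, -Infinity].
Mapping.prototype.compute_domain = function (data) {
  var lo = Infinity
  var hi = -Infinity
  for (var i = 0; i < data.length; i++) {
    var v = this.accessor(data[i])
    if (v < lo) lo = v
    if (v > hi) hi = v
  }
  return this.scale.domain([lo, hi])
}

// Assumes D3 v3 is loaded as the global `d3`; returns the new axis
Mapping.prototype.create_axis = function () {
  return d3.svg.axis().scale(this.scale)
}

// Hypothetical stand-in for d3.scale.linear() so this sketch runs without D3
function linear() {
  var domain = [0, 1]
  var range = [0, 1]
  function scale(x) {
    var t = (x - domain[0]) / (domain[1] - domain[0])
    return range[0] + t * (range[1] - range[0])
  }
  scale.domain = function (d) {
    if (!arguments.length) return domain
    domain = d
    return scale
  }
  scale.range = function (r) {
    if (!arguments.length) return range
    range = r
    return scale
  }
  return scale
}

function price_per_pound(d) {
  return (+d.price) / (+d.weight)
}

var mapping = new Mapping(linear(), price_per_pound)
var data = [{ price: "6", weight: "2" }, { price: "10", weight: "1" }]

mapping.compute_domain(data).range([0, 700]) // domain is now [3, 10]
mapping.accessor(data[0])                      // 3
mapping.place(data[0])                         // 0
mapping.place(data[1])                         // 700
```

In real D3 code, d3.extent(data, this.accessor) would be the more idiomatic way to get the same [min, max] inside compute_domain. Because create_axis only touches d3 when it is called, the rest of the sketch works with any scale-like function.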