Building a Bubble Cloud

For the 2012 Republican and Democratic national conventions, Mike Bostock, Shan Carter, and Matthew Ericson have created a series of visualizations highlighting the words being used in the speeches of both gatherings. These word-cloud-like word bubble clouds (which is what I'll call them, unless you can think of a better name) serve as a great interface for looking at the differences in the two conventions and for browsing through quotes from the talks.

While there is a lot that could be discussed about all the little things that contribute to the quality and polish of these visualizations, in this tutorial we will look at some of the implementation details that make them tick.

I've created a basic bubble cloud visualization that tries to replicate some of the functionality of the NYT version. Click on the image below to see the demo.

Try clicking on the bubbles, then try dragging them around. Change the source text with the drop down in the upper-left corner.

The source code is available for download and use in your own projects.

This visualization uses D3's force-directed layout, so if you aren't familiar with that, you might check out my post on creating animated bubble charts, or the designing interactive network visualizations tutorial on Flowing Data.

While I won't be going over the basics of the force layout, hopefully there is still enough in this implementation to keep things interesting. The main topics I want to cover here are:

The use of SVG and plain html components in the same visualization

Saving the state of the visualization using links

Creating a custom collision detection mechanism

Here is all the CoffeeScript that makes this visualization, in case you want to follow along in the actual code. Sorry in advance to the CoffeeScript haters.

Typically when using D3, the more advanced visualizations are made with SVG. In a previous tutorial, we dabbled in an SVG-less D3 world, but the trade-offs were steep.

These bubble cloud visualizations actually combine SVG and HTML elements elegantly! Specifically, the bubble elements themselves are circles inside of an SVG element, but the text on top of them is actually maintained in regular divs. As a bonus, both sets of visual elements are backed by the same data, so there is very little code duplication or overhead using this structure.

Why would you want this duality? My initial thought when I looked at the implementation was that this was for increased browser compatibility. I figured that in IE8 users might just see the html elements, but miss out on the bubble backgrounds.

This turned out not to be the case. Navigating to the site with an old browser just gets you a static picture. The interactive versions still require browsers with SVG support.

My current guess is that this implementation route was chosen because of how SVG deals with text wrapping.

Surprisingly, the 1.1 version of the SVG specification does not deal with word wrapping. There are workarounds, and the 1.2 spec includes the textArea element, which is supposed to solve this deficiency. But none of the solutions look particularly clean, dynamic, or free of speed costs.

So the words and phrases that the visualization will be showing might span multiple lines. SVG doesn't support this natively, and it looks like there are some trade-offs with the SVG-only workarounds. What should we do?

Why not just implement this part in HTML? This looks to be the path Mike et al. at the New York Times took.

And this combo platter actually works out pretty well. To get started, here's how we set up the elements that will hold the nodes and labels:

In the code, node is used to group the bubbles, and label is the container div for all the labels that sit on top of them. Remember that we are keeping the labels in plain HTML and the bubbles in SVG.

Here this refers to the vis div. For the labels, we add a container div called bubble-labels. Adding the circles that make up the nodes for the visualization is stuff we have seen before, so let's focus on the labels. First we bind the same data we use to build up the nodes to .bubble-label elements in our label selection:

updateLabels is more involved, as we need to deal with getting the sizing right. As in updateNodes, we use idValue to define the unique id for each data point.

So we can still use data bindings, just like in any other D3 visualization!

Now we build up our .bubble-labels by entering our (currently empty) selection and appending an anchor and some divs:

Labels are anchors with divs inside them. labelEnter holds our enter selection so that it is easier to append multiple elements to this selection.

Here we can see that each label is an a with two div elements inside it: one to hold the word/phrase name, the other to hold the count. We will look at how these anchors work in the next section.

The last thing we need to do when creating these labels is determine the size of the elements holding this text. We want the text to be able to spill out a bit on either side of the bubble, but also to wrap around if it gets too long. Let's look at one way this can be done. Here's the code used to style and position the text:

The code works in four steps: first, set the font-size and cap the maximum width to enable word wrapping; next, adjust if the width set is too big for the text; then reset the width of the label to the actual width; finally, compute dy, the value to shift the text from the top.

Let's analyze this in more detail. As the comments say, we set our font size based on the same scale we are using to scale the bubbles. Good. Bigger bubbles will have bigger text. We also get word wrapping for free by setting the width to an appropriate value.

But if the text is actually too small for the bubble, it will skew to the left. The width will pad it out and make it not line up correctly with the bubble. So we need to find out if the bubble or the word-wrapped text is bigger.

We can do this by adding the text to a temporary span element and grabbing its actual width using getBoundingClientRect. each is used here so we can also set the dx property of our labels, which will be used when positioning them.

With this corrected width known, we can reset our label's width to account for smaller words.

Finally, we use getBoundingClientRect again, this time for the label's height, to get dy, the amount to shift the label down.

It's a bit confusing, but I think it's a great solution to this issue, and now that it's been worked out, we can all use it.

To close off this label issue, here is how the dx and dy properties are used in the tick function to position the labels:
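As a rough illustration of the arithmetic involved, here is a minimal sketch in TypeScript (rather than the CoffeeScript used in the demo). The names LabelNode, Margin, labelWidth, and labelStyle are made up for this example, and it assumes that the label should end up as wide as the larger of the wrap cap and the measured text width, with the measuring itself done elsewhere.

```typescript
// Illustrative types and names; none of these come from the original code.
interface LabelNode {
  x: number;  // bubble center, as maintained by the force layout
  y: number;
  dx: number; // final label width in pixels
  dy: number; // final label height in pixels
}

interface Margin {
  left: number;
  top: number;
}

// Assumption: the label is as wide as the larger of the wrap cap and the
// measured text width (measured elsewhere, for example with
// getBoundingClientRect on a temporary span).
function labelWidth(wrapCap: number, measuredTextWidth: number): number {
  return Math.max(wrapCap, measuredTextWidth);
}

// Center the label box on the bubble: shift left by half the width and up
// by half the height. HTML style values need explicit "px" units.
function labelStyle(d: LabelNode, margin: Margin): { left: string; top: string } {
  return {
    left: `${margin.left + d.x - d.dx / 2}px`,
    top: `${margin.top + d.y - d.dy / 2}px`,
  };
}

const labelNode: LabelNode = { x: 100, y: 50, dx: labelWidth(40, 60), dy: 20 };
const labelCss = labelStyle(labelNode, { left: 10, top: 5 });
// labelCss.left is "80px" and labelCss.top is "45px"
```

In a D3 tick handler, the two strings would be applied to each label's left and top styles.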

And there you have it. A nice way to use SVG and HTML together, keeping with the D3 paradigm of binding data and using the same data backend for both.

By saving state, I mean being able to return to or share a particular view of a visualization. The way to do this in web-based interactive visualizations is by modifying the url so as to encode the current state of the visualization in the url itself.

It's an important detail, and one that is commonly missing from even professionally created visualizations. I liked Bryan Connor's remark on this subject when discussing the Money Ball WSJ political donor visualization:

Having the ability to share a specific view as a link on Twitter or Facebook is definitely important and, as far as I'm concerned, should be standard for almost any contemporary visualization.

With this bubble cloud implementation, we really only have to keep track of two variables: the text being viewed and the currently selected word. I'll focus on tracking the selected word, which we try to manage in a simple and elegant manner.

Clicking a bubble causes an immediate change to the url. And it is this url change that serves as a signal to update the visualization. This keeps the code clean and neat. Let's see how it works.

The functionality depends on modifying the url's hash section, which is the text that follows the hash symbol (#). Here is the click callback function, which is executed each time a bubble is clicked:

We use location.replace to kill the previous selection (if anything was selected before) and switch to the newly highlighted bubble. The use of the replace function also means the history won't be polluted with a bunch of versions of the same visualization.
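A minimal sketch of the idea, in TypeScript rather than the demo's CoffeeScript. The helper name hashForId is hypothetical, and it assumes the bubble's id is a plain string that should be URL-encoded before it goes into the hash.

```typescript
// Hypothetical helper: build the hash that identifies a selected bubble.
// Encoding keeps spaces and punctuation in phrases safe inside a url.
function hashForId(id: string): string {
  return "#" + encodeURIComponent(id);
}

// In a browser, a click callback could then do something like:
//   window.location.replace(hashForId(id));
// location.replace() swaps the current history entry instead of adding a
// new one, so the back button is not filled with intermediate selections.
const exampleHash = hashForId("middle class");
// exampleHash is "#middle%20class"
```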

The hash component of the url is accessible via location.hash. What's more, we can register to be notified when this component of the url changes by using the hashchange event. Combining these insights, the bubble cloud first hooks into hashchange in the visualization's initialization function:

And then uses this event to trigger a change in the active bubble.

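The handler is called when the part of the url after the # changes. If no node is selected, id will be empty. When a word is selected, the status element is updated with an h3 that shows the word followed by "is now active"; otherwise the h3 reads "No word is active".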

This decouples the event handling implementation from the visualization and would allow us to add more callbacks for hashchange to modify other parts of the page if we wanted to. Here, we are just modifying a div, but think of the possibilities!
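The reading side can be sketched the same way. This is a TypeScript illustration under a few assumptions: idFromHash and statusText are hypothetical helper names, the id was URL-encoded when it was written into the hash, and the status wording follows the two messages described above.

```typescript
// Hypothetical helper: recover the selected id from a hash string such as
// window.location.hash. An empty result means nothing is selected.
function idFromHash(hash: string): string {
  const raw = hash.charAt(0) === "#" ? hash.slice(1) : hash;
  return decodeURIComponent(raw).trim();
}

// Hypothetical helper: the status text to show for a given id.
function statusText(id: string): string {
  return id.length > 0 ? `${id} is now active` : "No word is active";
}

// In a browser this might be wired up roughly as:
//   window.addEventListener("hashchange", () => {
//     const id = idFromHash(window.location.hash);
//     // update the active bubble and the status element here
//   });
const activeId = idFromHash("#middle%20class");
// activeId is "middle class"
```

Because the handler only reads the hash, the same code path runs whether the change came from a click or from someone opening a shared link.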

Of course you can use your own custom event to perform the same decoupling, but I think it is nice having state saving via url modification and user interaction changes wrapped up in this nice little package.

In D3, the gravity component of a force layout draws nodes towards the center of the layout. It is useful to ensure your nodes don't fly off the screen. But the default gravity implementation is symmetrical, meaning it pulls on a node's vertical and horizontal positions equally.

But what if you want a non-symmetrical gravity? Say your visualization is wider than it is tall (like in this example) and you want nodes to spread out along the x axis a bit more? Well, then you can implement your own gravity function to push your nodes around how you like!

To get your own forces working on the nodes, we call them on each iteration of the force simulation by binding the force's tick event to a callback function:

The force variable is the force layout controlling the bubbles. Here we disable gravity and charge, as we implement custom versions of gravity and collisions for this visualization.

Every time the tick event is triggered, it will call our function, which happens to be called tick. We also remove the built-in gravity and charge forces by setting them to 0. We will implement both of these forces ourselves.

The tick callback function is executed for every iteration of the force simulation. It moves force nodes towards their destinations, deals with collisions of force nodes, and updates the visual bubbles to reflect the new force node locations. Most of the work is done by the gravity and collide functions. As the labels are created in raw HTML and not SVG, we need to make sure we specify px when moving them based on pixels.

You can see that for each node we are calling gravity and collide, which perform all the work on our nodes. After that, we update our node and label positions to their new x and y coordinates. collide will be covered in the next section, so let's focus on gravity now:

The custom gravity skews the bubble placement. It starts with the center of the display, uses alpha to affect how much to push towards the horizontal or vertical, and returns a function that will modify the node's x and y position.

You can see how the only thing we really tweak is how much alpha affects the x and y components of the gravitational pull. The alpha parameter comes from D3's force layout and is the cooling temperature for the layout simulation. alpha starts at 0.1 and decreases as the force simulation continues, getting to around 0.005 before stopping.

Here we are reducing the alpha value applied to the x movement by a factor of 8. When we multiply the movement towards the center by ax and ay, this makes the y movement stronger than the x movement. Thus, while the nodes should remain centered vertically, they will be allowed to drift along the x axis.
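For readers who want the arithmetic spelled out, here is a minimal TypeScript sketch of an asymmetric gravity along these lines. It is not the demo's CoffeeScript: the Point type and the width, height, and xDamping parameters are names chosen for this example, with xDamping defaulting to the factor of 8 mentioned above.

```typescript
interface Point {
  x: number;
  y: number;
}

// Sketch of an asymmetric gravity. `width`, `height`, and `xDamping` are
// illustrative parameter names, not part of D3.
function gravity(alpha: number, width: number, height: number, xDamping = 8) {
  // start with the center of the display
  const cx = width / 2;
  const cy = height / 2;
  const ax = alpha / xDamping; // weaker horizontal pull
  const ay = alpha;            // full vertical pull
  // return a function that nudges one node towards the center
  return (d: Point): void => {
    d.x += (cx - d.x) * ax;
    d.y += (cy - d.y) * ay;
  };
}

const corner: Point = { x: 0, y: 0 };
gravity(0.1, 800, 400)(corner);
// corner.x moves by about 5 (400 * 0.1 / 8), corner.y by about 20 (200 * 0.1)
```

Even though the node starts twice as far from the center horizontally as vertically, a single step moves it four times further in y than in x, which is what lets the bubbles spread out sideways.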

Redistributing nodes like this keeps the visualization short, which allows more of the content below it to sit above the fold without running the risk of cutting off the bottom of the bubbles.

As we saw in the bubble chart tutorial, a form of collision avoidance can be implemented in D3 by making the charge associated with each node a function of the size of the node. This provides a visually interesting, organic experience, where bubbles push on one another in a particle-like way.

The bubbles in these word visualizations act in a subtly but significantly different manner. Here the nodes bounce off one another, maintaining a rigid space around themselves. So how is this effect achieved in D3? By implementing a custom collision detection and avoidance algorithm!

It's actually not as complicated as it might sound. Here is the code:

The custom collision function keeps nodes from overlapping. For each pair, it checks that we aren't comparing a node with itself, uses the distance formula to find the distance between the two nodes, and finds the current minimum space allowed between them using the forceR value set on each node. If the current distance is less than the minimum allowed, then we need to push both nodes away from one another, scaling the distance based on the jitter variable.

We compute the distance between each node pair and see if it is less than the minimum distance allowed by our visualization. If so, then we move the nodes away from one another. The fact that D3 stores the current x and y position of each node as part of the data associated with that node makes it easy to get the values required to perform these calculations.

Notice the input variable jitter. What's it do? We use it to help scale the distance that will be used to move the colliding nodes. Higher values will make this distance larger, smaller values will reduce it.

I've connected the value of jitter to a range control underneath the visualization. This lets you explore what the collisions look like when the nodes really start pushing each other hard. The default value, 0.5, visually seems to make the interactions look natural.
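To make the steps concrete, here is a brute-force collision sketch in TypeScript (not the demo's CoffeeScript). The Bubble type and the collisionPadding parameter are assumptions for this example, and the guard against two nodes sitting at exactly the same point is an addition of mine to avoid dividing by zero.

```typescript
interface Bubble {
  x: number;
  y: number;
  forceR: number; // radius used for collisions
}

// Brute-force collision sketch. `collisionPadding` is an assumed extra gap
// between bubbles and defaults to none.
function collide(nodes: Bubble[], jitter: number, collisionPadding = 0) {
  return (d: Bubble): void => {
    for (const d2 of nodes) {
      if (d === d2) continue; // never compare a node with itself
      const x = d.x - d2.x;
      const y = d.y - d2.y;
      const distance = Math.sqrt(x * x + y * y);
      const minDistance = d.forceR + d2.forceR + collisionPadding;
      // Skip exact overlaps too, since the push direction is undefined.
      if (distance === 0 || distance >= minDistance) continue;
      // Negative fraction of the separation vector, scaled by jitter.
      const scale = ((distance - minDistance) / distance) * jitter;
      const moveX = x * scale;
      const moveY = y * scale;
      d.x -= moveX;
      d.y -= moveY;
      d2.x += moveX;
      d2.y += moveY;
    }
  };
}

const first: Bubble = { x: 0, y: 0, forceR: 10 };
const second: Bubble = { x: 10, y: 0, forceR: 10 };
collide([first, second], 0.5)(first);
// first.x is now -5 and second.x is now 15, so the centers are 20 apart
```

With jitter at 0.5, each node moves by half of the overlap, so a single pass separates an isolated pair exactly. Larger values overshoot, which is where the harder pushing comes from.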

The implementation as shown above is a brute-force strategy. All nodes are compared with all other nodes, regardless of how close they are in the visualization. While this works decently for a small number of nodes, it will start bogging down as the node count increases.

If we had some rough-grained insight as to which nodes were nearest one another, we might be able to speed things up significantly by only performing these calculations for these nearby nodes.

This is actually how the NYT version of the collision algorithm is implemented. How do they know which nodes are nearby? By using a Quadtree.

A Quadtree is a tree-like data structure in which each internal node has four children (hence the name). You can use a Quadtree to partition up a two-dimensional space such that areas with more nodes will have more partitions. You can then visit the nodes in an efficient manner and stop traversing once you have made it out of the quadrants that could affect the node in question.

D3 has a sparsely documented Quadtree implementation, which is used in the force layout in this manner to make it fast. While this Quadtree implementation is a great option to have, and should be considered for collisions between many nodes, I think the brute-force version conveys the same basic idea with less technical overhead.

That does it for this tutorial. Hopefully this provides a bit more insight into these great pieces from the New York Times (and hopefully they don't mind me continuing to exploit their great pieces).

Again, the code is on github, so grab it and let your bubble clouds accumulate!

Taking this to the next level would involve splitting the bubbles based on some variable, like the dual convention version.

For that, check out Mike's great demonstration of how they split the bubbles to get a sense of how to add this kind of split to your own visualization.

Jim [email protected]