Visual Critique #1: On Continuous Scales of Text


As more and more textual information is brought into the digital realm, inheriting structure and little descriptive tags, visualizations are put forward to combat the “Overnewsed but Underinformed” syndrome (on that note, Neil Postman should be seen as nothing but an entry point for more serious reading on media effects).

In recent years, visualizations of textual elements have come and gone. Speaking of a boom may not be entirely misguided now that Processing makes creating them less tedious. Across many of them, one crucial problem has caught my eye: people don’t seem to care about the scale of their variables. Text, at least to my knowledge, is still a discrete (i.e. only integer counts) nominal (i.e. no inherent order) variable.

To put this clearly: text items in their original form cannot be ordered along an axis.

There are several solutions around for this problem. All of them are less than convincing.

  • One way is to just throw data points – visual representations of one textual item – at the recipient in an unsorted way, maybe with a physical simulation engine for organic movement. If someone can explain in what way the visual metaphor of fluids in a bacterium benefits visual data display, I’ll be glad to call it useful.
  • Some people think it’s possible to sort text items by another nominal variable, like author or topic. That, however, does not really sort but rather group them.
  • By far the most common approach is to use the continuous metadata of time and arrange items by publishing date. Although this approach conveys some meaning (recency can be seen as related to importance), it pays no attention whatsoever to other characteristics of text items. A wonderful example is the new visualization (fig. 1) for digg, the largest social news site, which is about to launch.
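To make the difference between grouping and sorting concrete, here is a minimal Python sketch. The items, field names and helper functions are all invented for illustration and have nothing to do with digg's actual data. The point is only that a nominal variable like category can partition the items, whereas a timestamp yields a real order.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical text items; titles, categories and timestamps are made up.
items = [
    {"title": "Story A", "category": "technology", "published": datetime(2006, 1, 1, 4, 34, 0)},
    {"title": "Story B", "category": "gaming", "published": datetime(2006, 1, 1, 4, 32, 40)},
    {"title": "Story C", "category": "technology", "published": datetime(2006, 1, 1, 4, 33, 15)},
]


def group_by(items, key):
    """Partition items by a nominal variable; the groups have no inherent order."""
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item["title"])
    return dict(groups)


def sort_by_time(items):
    """Order items along a continuous variable (publishing time)."""
    return [item["title"] for item in sorted(items, key=lambda i: i["published"])]


print(group_by(items, "category"))
print(sort_by_time(items))
```

Any order among the category groups themselves (alphabetical, by size) is imposed from outside and not read from the data, which is why grouping is not sorting.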



Figure 1 displays the main view of “Incoming”, an upcoming visualization of activity on digg. It features a bar graph with the horizontal axis representing an adjustable time window and bar height representing story weight (number of “diggs” – for an explanation of the concept see the FAQ). There is no absolute y-axis, which makes it remarkably similar to an oversized sparkline (a small data graphic popularized by Tufte, which will be covered in an upcoming visual explanation of the week).
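What a missing absolute y-axis costs can be sketched in a few lines of Python. I am assuming here that bars are scaled against the tallest bar in the current window; that is my guess at how such a display works, not something digg has documented, and the function name and numbers are made up.

```python
def relative_heights(weights, max_height=8):
    """Scale weights so the largest fills max_height (assumed scaling rule)."""
    if not weights:
        return []
    peak = max(weights)
    if peak == 0:
        return [0] * len(weights)
    return [round(w / peak * max_height) for w in weights]


# Two hypothetical time windows; the second has ten times the activity.
quiet_window = [10, 20, 40]
busy_window = [100, 200, 400]

print(relative_heights(quiet_window))
print(relative_heights(busy_window))
```

Both windows produce identical bars. Without an absolute axis, only the comparison within one window survives.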

The project has been covered by Max Kiesler and his partner Emily Chang, and photos of a presentation are available on Flickr. One closeup shot reveals that the text labels below the bars are actually category names (technology, gaming etc.). Translated into one sentence, the visual message of this display says:

“Here are relative popularity and the category of 17 items that were active from 4:32:33 to 4:34:57 compared to each other.”

This in itself is a very limited statement. The digg homepage, by comparison, offers two basic modes for story display: the most popular stories accumulated from a fixed point in time up to now (Top Today, Top This Month), and the most popular stories from the last n hours (above a certain threshold, I assume, either for the absolute number of stories or for popularity). The graph seems to convey the same information as the standard digg homepage, minus the title, text and other information related to the stories. How exactly should users profit from this form of data display? Emily Chang states the benefits as follows:

Given the sheer volume of stories and activity on Digg, it’s becoming impossible to find new, noteworthy stories or to see what was popular at any given time unless it was on the home page. This certainly gives me a method to explore and discovery the stories I’m interested in based on live activity.

To state that it is impossible to find new or noteworthy stories is of course wrong. The quote does, however, get something right: this chart is about “live activity” and “any given time”. It reduces stories to their popularity rating, providing an overview of how popularity is distributed among the site’s categories at a given moment in time.

“Incoming” thus does not concern itself with the actual data (or text); it shows behavioral patterns among people casting votes for nondescript items. I assume the visualization was never designed for the public anyway, but rather as a sort of installation for the offices of digg inc.

Verdict: Aesthetic but irrelevant and source-data-agnostic.



“Amusing Ourselves to Death: Public Discourse in the Age of Show Business” (Neil Postman)

