Visual Explanation of the Week #1: Stem-and-Leaf Graph


Showing the distribution of one variable along a single dimension is among the most common tasks in data display. In his 1977 book “Exploratory Data Analysis”, John W. Tukey introduced an ingenious way to do this: the stem-and-leaf graph.

[Figure 1: stem-and-leaf display]

The scale values (the stems) are listed to the left of a vertical line. These may be discrete numbers, as in my example (fig. 1), clustered groups, or any other measurable variable. To the right of the line, each measurement (a leaf) is recorded in the row of its scale value, with equal horizontal spacing between entries. The horizontal axis thus becomes a linear scale: the length of each row conveys the absolute number of values in it.

The fascinating aspect of this “graph” is its variability. By swapping the roles of its two components (the scale on the left and the measurements on the right), Tukey makes it anything between a bar chart of sorts and a one-digit-precision table of sorted and grouped data points. Figure 2 shows a typical five-point scale as employed in the social sciences, with age plotted as the value of the data points.
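To make the construction concrete, here is a minimal sketch in Python. It assumes non-negative integers, takes the tens as stems and the last digit as the leaf, and prints one character per leaf so that row length reflects the count. The function name `stem_and_leaf` and the sample values are my own illustration, not taken from Tukey.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a simple stem-and-leaf display.

    Assumes a non-empty list of non-negative integers.
    The stem is the value without its last digit, the leaf is the last digit.
    """
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)

    lines = []
    # Walk every stem in range so that empty rows stay visible as gaps.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    for line in stem_and_leaf([12, 15, 21, 23, 23, 38, 40]):
        print(line)
```

In a monospaced font, each leaf takes up the same width, so the rows can be read like the bars of a sideways bar chart while every individual value remains recoverable.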

[Figure 2: stem-and-leaf display of a five-point scale, with ages as data points]
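The swapped variant, with scale points as stems and the respondents' ages as leaves, might be sketched as follows. This is only an illustration under my own assumptions: the function name `grouped_leaf_display`, the input format of (scale point, age) pairs, and the sample responses are all hypothetical.

```python
def grouped_leaf_display(responses, scale_points):
    """Return one row per scale point, listing the sorted ages recorded for it.

    `responses` is an iterable of (scale_point, age) pairs; every scale point
    in `responses` is assumed to appear in `scale_points`.
    """
    rows = {point: [] for point in scale_points}
    for point, age in responses:
        rows[point].append(age)

    lines = []
    for point in scale_points:
        leaves = " ".join(str(age) for age in sorted(rows[point]))
        lines.append(f"{point} | {leaves}".rstrip())
    return lines


if __name__ == "__main__":
    # Hypothetical survey answers on a five-point scale, for illustration only.
    answers = [(1, 34), (3, 27), (3, 19), (5, 52), (1, 41)]
    for line in grouped_leaf_display(answers, range(1, 6)):
        print(line)
```

Because all ages here have two digits, row length still tracks the number of answers per scale point, while the leaves themselves carry a second variable.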

How does this offer any advantage over other styles of visual representations? Typically, such data would be portrayed with bar or line graphs. Both of those would certainly omit the individual data items. Additionally, a line graph hides discrete data groups, suggesting a continuous scale of measurement where in reality we only have a nine-point scale.

The verdict: Use stem-and-leaf displays for one-dimensional distributions with fewer than 10 measurements per scale point or group. Tukey suggests quite a few tricks to adapt the display to a variety of situations. For a short description in a more readily available book, cf. Wainer.

Tukey, John W. Exploratory Data Analysis. Reading: Addison-Wesley, 1977, pp. 7ff.

Wainer, Howard. Graphic Discovery. Princeton: Princeton UP, 2005, pp. 119f.
