Eyes on data — Civic Infographics

 
 
 


 

Levels of measurement and visual variables in graphing data

Ironically, the proliferation of infographics and the popularization of their techniques sometimes lead to some of the field's fundamental principles, which are useful for avoiding mistakes in data representation, being overlooked or ignored altogether.

Choosing the most suitable way to display a certain dataset is often the first obstacle for an inexperienced author. To overcome it, in addition to acquiring a clear idea of the basic cognitive aspects that regulate the design and use of infographics, it is useful to reflect on two fundamental concepts: the level of data measurement, and visual variables.

These form the link between the realm of values to be displayed and the visible form that they acquire by means of infographic artifice.

The level of measurement can be defined as the set of rules that determine how a name or number is associated with an aspect of reality.

An effective classification, formalized by psychologist Stanley Smith Stevens in 1946, isolates and describes four main levels: nominal, ordinal, interval and ratio.

 

Nominal level

At the nominal level there are no hypotheses about the correlation between the values assigned to data. Each value is an independent category, used only to label the phenomenon by giving it a name.

The only rule at this level of measurement is that the nominal categories to which objects are assigned must be inclusive (exhaustive) and mutually exclusive. The first requirement entails that every object must find a place within some nominal category; the second that no object may belong to more than one category. Taken together, they mean that every object corresponds to a nominal category, and to one only.

There are no assumptions about the hierarchy or the distance between categories, and only a relationship of equality within each class is allowed: all the objects in a class are nominally equal.

At the nominal level, even numerical values such as those in the (1, 2, 3, n) series have the mere function of symbols, and cannot be processed mathematically.

Dichotomies are some of the simplest examples of nominal data (for example, yes/no or male/female).
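The two requirements for nominal categories can be checked mechanically. The following is a minimal sketch in Python, not part of the original article; the function name, the respondent labels and the sample data are hypothetical and serve only as an illustration.

```python
from collections import Counter


def check_nominal_assignment(objects, categories, assignment):
    """Check a nominal classification.

    `assignment` is a list of (object, category) pairs.
    Returns two lists: objects with no category (the classification is
    not inclusive) and objects with more than one category (the
    categories are not mutually exclusive).
    """
    counts = Counter(obj for obj, cat in assignment if cat in categories)
    unassigned = [o for o in objects if counts[o] == 0]
    multiple = [o for o in objects if counts[o] > 1]
    return unassigned, multiple


# Hypothetical yes/no dichotomy with one missing and one double answer.
answers = ["yes", "no"]
respondents = ["r1", "r2", "r3"]
pairs = [("r1", "yes"), ("r2", "no"), ("r2", "yes")]

unassigned, multiple = check_nominal_assignment(respondents, answers, pairs)
print(unassigned)  # ['r3']
print(multiple)    # ['r2']
```

A classification is valid at the nominal level only when both lists come back empty.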

 

Ordinal level

At the ordinal level, data can be assigned to ordered, qualitative classes, and they can be compared by means of three relational symbols (> greater than, < less than, = equal to).

On these bases, data can be ordered but not spaced along a predetermined scale. Therefore, as with nominal data, mathematical operations are not applicable.

Qualitative attributes cannot be added, subtracted, or divided: for example, it does not make sense to think that the sum of two “small” objects can result in one “medium” or “large” object, or to say that a “small” object is n times larger than any other.

The relationship between ordinal data maintains the equality between elements of the same class, and also features two other important properties: symmetry and transitivity. The first means that the relations between objects are symmetrical, in the sense that they can be read in both directions (given two objects A and B, if A is greater than B then B must also be less than A). The second implies the transitive nature of the relationships between objects (if A > B and B > C, then it must also be that A > C).
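The behavior of ordinal data can be sketched in code as a type that supports comparison but not arithmetic. The example below is an assumption-laden illustration in Python: the class name `Size` and its three members are hypothetical, chosen to echo the small/medium/large example above.

```python
from enum import Enum


class Size(Enum):
    """Hypothetical ordinal scale: ordered classes, no arithmetic."""
    SMALL = 1
    MEDIUM = 2
    LARGE = 3

    def __lt__(self, other):
        # Only the order of the members is meaningful, not the numbers.
        if not isinstance(other, Size):
            return NotImplemented
        return self.value < other.value


# Ordering works: the classes can be sorted and compared.
ranked = sorted([Size.LARGE, Size.SMALL, Size.MEDIUM])

# Transitivity: LARGE > MEDIUM and MEDIUM > SMALL, so LARGE > SMALL.
a, b, c = Size.LARGE, Size.MEDIUM, Size.SMALL
transitive = (a > b) and (b > c) and (a > c)

# Arithmetic is not defined: adding two "small" objects raises TypeError.
try:
    Size.SMALL + Size.SMALL
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False

print(ranked)
print(transitive, arithmetic_allowed)
```

The integers 1, 2 and 3 act only as rank labels here; any increasing sequence would produce the same ordering.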

 

Interval level

At the interval level, in addition to the properties of the nominal and ordinal level holding true, it is possible to define the distances between categories along a predetermined scale.

The interval scale does not have a true, non-arbitrary zero (the value 0 °C, for example, does not imply the absence of temperature) and therefore can only be used to measure differences and not absolute values. Temperature scales offer a useful example to clarify this aspect. If in a location A there is a temperature of 10 °C, and in a different location B there is one of 20 °C, we cannot say that B is twice as hot as A. The ratio between the temperatures in the two places depends on the unit of measurement used. Indeed, under the same conditions but using the Fahrenheit scale, we would have 50 °F in A and 68 °F in B: two values of which one is no longer double the other. In this case we can only say that there is a 10-degree difference between A and B when measuring temperature on the Celsius scale, or an 18-degree difference on the Fahrenheit scale. Temperatures are a typical example of a measure based on intervals.

Data based on intervals can be subject to the mathematical operations of addition and subtraction.
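The temperature example can be verified with a few lines of arithmetic. This is a minimal sketch in Python; the function and variable names are chosen here for illustration only.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


a_c, b_c = 10.0, 20.0                  # locations A and B, in Celsius
a_f, b_f = c_to_f(a_c), c_to_f(b_c)    # 50.0 and 68.0 in Fahrenheit

# Ratios depend on the scale, so they carry no meaning at this level.
ratio_c = b_c / a_c    # 2.0
ratio_f = b_f / a_f    # about 1.36

# Differences are meaningful: they convert consistently between scales.
diff_c = b_c - a_c     # 10.0
diff_f = b_f - a_f     # 18.0, which equals diff_c * 9 / 5

print(ratio_c, ratio_f)
print(diff_c, diff_f)
```

The ratio changes from 2 to roughly 1.36 when the scale changes, while the difference of 10 Celsius degrees always corresponds to the same 18 Fahrenheit degrees.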

 

Ratio level

Finally, at the ratio level measures maintain all the properties of the previous levels, and in addition have a non-arbitrary zero value; this makes ratios between values meaningful, and not only their differences.

Linear distance is a typical case of measurement based on a ratio scale. The null value of the horizontal distance between two points indicates the absence of distance between them. With regard to the relations of ratio equality or inequality, we can consider for example a location A that is 10 km away from B and a location C that is 5 km away from B: regardless of the unit adopted, we will always be able to say that A is twice as far from B as C is. This is precisely due to the fact that distance is measured on a ratio scale.
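The same check applied to distances shows the opposite result from the temperature case. The sketch below, in Python, converts kilometers to miles; the names are illustrative, and the conversion factor is the standard international mile.

```python
import math

KM_PER_MILE = 1.609344  # international mile, in kilometers


def km_to_miles(km):
    """Convert kilometers to miles (a pure rescaling, with no offset)."""
    return km / KM_PER_MILE


ab_km, cb_km = 10.0, 5.0   # A is 10 km from B, C is 5 km from B

ratio_km = ab_km / cb_km
ratio_mi = km_to_miles(ab_km) / km_to_miles(cb_km)

# The ratio survives the change of unit (up to floating-point rounding).
same_ratio = math.isclose(ratio_km, ratio_mi)

# Zero distance is zero in every unit: the zero is not arbitrary.
zero_is_zero = km_to_miles(0.0) == 0.0

print(ratio_km, same_ratio, zero_is_zero)
```

Because the conversion is a multiplication with no added constant, zero maps to zero and every ratio is preserved, which is exactly what the Celsius to Fahrenheit conversion fails to do.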

A basic knowledge of levels of measurement is useful for fully understanding how the choice of visual variables to represent quantities affects data-based graphics in practice.

Visual variables

The expression visual variable is used to describe the differences perceived between different symbols.

The concept was first formalized by Jacques Bertin [1967, 1977], and later furthered by authors such as McCleary [1983], Morrison [1984], DiBiase [1990] and MacEachren [1994], mostly in the field of cartography.

Bertin identified two groups of visual variables: six retinal ones (shape, size, orientation, texture or grain, tone or color, and value) and two relating to position.

Positional variables correspond to Cartesian coordinates on a chart or to geographical coordinates on a map.

Bertin’s analysis on positional variables is often overlooked or considered relevant more to mapping than to the study of symbols for the representation of values.

By setting apart the six retinal variables from the two positional ones, Bertin separates the relationships between symbols in Cartesian space (charts) and Euclidean space (maps) from their other visible properties.

The retinal variables represent unique ways to organize the traits that make up a cartographic symbol. According to Bertin’s model, first positional variables set a symbol on the plane, and then the retinal ones elevate it, offering the eye a pattern of light that is different from the background’s.

Thus, to express similarity, order and proportionality relationships, graphics employs eight classes of differences that the eye can perceive between “spots”, that is between changes in light patterns.

Each retinal variable requires a brief explanation.

Shape and size

 




While shape is rather self-explanatory, size is the area of the plane on which a symbol – be it a point, line or surface – extends.

A “point symbol” of course is not restricted to circular points, but includes any symbol, of any shape, as long as a pair of positional variables can define it, locating it on the plane.

In abstract terms, point symbols are geometric objects with a position but no dimensions; linear symbols have one dimension (length); areal symbols have two dimensions to delimit their surface.

Once represented, all geometric primitives (points, lines and surfaces) occupy a part of the plane. The measure of their extension corresponds to the visual variable that Bertin called size.

Size is quantitative in itself. Thus the difference between two sizes can express a ratio: if we say A is twice as big as B, we are using B as the unit to measure A.

When quantities are expressed in sizes, the perception system is based on the geometric progression of the symbols’ surface.

In cartographic design, for example, we can use size in two ways: scaling the size of the entire symbol, or changing the size of the individual traits that form it.
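Since size corresponds to the area a symbol occupies, a data ratio is reproduced only if areas, and not linear dimensions, are scaled with the values. The sketch below illustrates this in Python under stated assumptions: the symbols are circles, areas are made proportional to the data, and the function name, reference value and reference radius are hypothetical. It shows one common convention and is not a method prescribed by Bertin.

```python
import math


def circle_radius(value, ref_value, ref_radius):
    """Radius of a circle whose AREA is proportional to `value`.

    `ref_radius` is the radius drawn for `ref_value` (both assumed).
    Area grows with the square of the radius, so the radius must grow
    with the square root of the value.
    """
    return ref_radius * math.sqrt(value / ref_value)


r_b = circle_radius(100, ref_value=100, ref_radius=10.0)  # quantity B
r_a = circle_radius(200, ref_value=100, ref_radius=10.0)  # A is twice B

area_a = math.pi * r_a ** 2
area_b = math.pi * r_b ** 2
area_ratio = area_a / area_b   # close to 2: area carries the data ratio
radius_ratio = r_a / r_b       # close to 1.414, not 2

print(round(area_ratio, 3), round(radius_ratio, 3))
```

Doubling the radius instead would quadruple the area and visually overstate the quantity.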

 

Value




Bertin defined value as the percentage of white in a fine halftone screen. If the value refers to a grayscale image, the complete opposite of pure white will be a zero value corresponding to black.

Value itself is not quantitative because white cannot be used to measure black. Value thus expresses quantities that can only be sorted: intuitively, one value is associated with a quantity that is bigger or smaller than the one associated with another, but without the aid of a legend we are unable to assess, even approximately, how much bigger or smaller.

When data are transcribed by means of this variable, the perception system bases itself on the visual equidistance of value levels.

Value is one of the variables that were most frequently affected by developments of Bertin’s theory. With the advent of video-graphics, the use of halftone screens to create areas of color on paper was gradually abandoned, and value was revised in order to better define its meaning.

Bertin had characterized this visual variable in a way that later became ambiguous. Within graphic design software, value can be observed by maintaining the same color tone while varying brightness, moving the cursor within the color mixer from the beginning shade to pure white.

Size and value (or brightness) are both ordered: we do not need a legend to sort them. They create a spontaneous visual order, running from largest to smallest or from lightest to darkest, and the progression of the visual variable must clearly correspond to the order of the represented objects.

Failing to match the visual order to the order of the data is a graphic and logical mistake. In an ordered transcription, the legend should only serve to indicate the numerical values that define the data classes associated with a certain size or value.
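Both ideas, visually equidistant value levels and a visual order that follows the data order, can be sketched in code. The Python example below rests on several assumptions that do not come from the article: lightness is expressed on a 0 to 1 scale (1 is white), equal numeric steps stand in for visually equal steps, darker is taken to mean "more", and all names and default limits are hypothetical.

```python
def lightness_steps(n_classes, lightest=0.9, darkest=0.2):
    """Evenly spaced lightness levels, from lightest to darkest."""
    if n_classes == 1:
        return [lightest]
    step = (lightest - darkest) / (n_classes - 1)
    return [lightest - i * step for i in range(n_classes)]


def order_matches(class_ranks, lightness):
    """True if a higher class rank is always drawn darker."""
    pairs = sorted(zip(class_ranks, lightness))
    return all(l1 > l2 for (_, l1), (_, l2) in zip(pairs, pairs[1:]))


class_ranks = [1, 2, 3, 4]      # four ordered data classes
levels = lightness_steps(4)     # one lightness level per class

good = order_matches(class_ranks, levels)

# Swapping two levels breaks the correspondence with the data order.
scrambled = [levels[0], levels[2], levels[1], levels[3]]
bad = order_matches(class_ranks, scrambled)

print(good, bad)  # True False
```

The scrambled assignment is the mistake described above: the reader would spontaneously rank the third class below the second.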

 

Color, grain, orientation

The last retinal variables we have to analyze are color, grain or texture, and orientation.

Color

We have already talked about color in terms of shade. A shade (or hue) can be defined as the dominant wavelength in the visible spectrum that gives rise to a color. Outside of Bertin’s classification, one of the most often cited visual variables linked to color is saturation or chroma, defined as a mixture between a shade of gray and any pure hue.

 

Texture

 

The grain, or texture, refers to the spacing and size of the point or linear elements that are repeated and combined into a pattern to form a symbol.

Orientation

For linear and areal symbols, orientation refers to the direction of the individual parts within the symbol, while for point symbols it refers to the direction of the entire symbol itself. By logic, this rule is necessary to distinguish orientation as a retinal variable of a linear or areal symbol from its orientation on the map, for example, which instead depends on its positional variables.

Only in the case of points is this distinction unnecessary, because changing the orientation of the entire symbol, pivoting on its center, does not change its position on the plane: its orientation is independent of positional variables. This is a rather subtle distinction, and can be misleading especially if, as often happens, one forgets positional variables when analyzing Bertin’s theory.

Thus even a careful and experienced author like Mark Monmonier [1993], in writing about Bertin’s variables, fell into the trap when he said that “Orientation is an important visual variable for symbols that represents features with an identifiable direction, such as lines portraying roads or rivers or arrows showing winds or ocean currents” [Monmonier 1993, p. 60].

The author’s confusion between orientation as a retinal variable and the spatial orientation of symbols on paper is obvious. In the case of wind vane arrows, which are point symbols according to the definition we gave above, you can use the retinal variable of orientation by rotating the whole symbol to convey the different wind directions, without changing the positional variables of the symbol itself.

In the case of roads, rivers and ocean currents, which are necessarily represented as linear symbols, the retinal variable of orientation can be applied only to internal sections of each symbol, and therefore does not distinguish these objects at all in relation to their spatial orientation. The latter remains a function of the positional variables, and indeed changes from point to point.

The different hatching inside linear symbols only serves to distinguish, for example, one road from another, carrying out the same function as shape or color.

From all this we can draw some final conclusions, which are essential for the production of charts and maps.

The retinal variables of shape, color and orientation (the latter only when applied to linear and areal symbols) are suitable for representing qualitative data that are not ordered along a scale of values but are separate from one another, that is, data characterized by a nominal level of measurement.

The orientation variable applied to point symbols is suitable for representing data characterized by nominal, ordinal or interval levels of measurement.

When direction is related to an angular scale of values with a zero value, as is the case with wind direction measured relative to north, the retinal variable of orientation can convey quantitative data, but not at the ratio level of measurement, because the zero value remains arbitrary.

Size and value, finally, are the most powerful and versatile retinal variables because they can be used to represent qualitative and quantitative data at any level of measurement.
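These conclusions can be condensed into a small lookup, shown here as a sketch in Python. The function name, the level labels and the symbol type labels are hypothetical; the mapping simply restates the conclusions above and leaves out grain or texture, which they do not cover.

```python
ALL_LEVELS = ("nominal", "ordinal", "interval", "ratio")


def suitable_levels(variable, symbol_type="point"):
    """Levels of measurement a retinal variable can represent,
    following the conclusions stated in the text.

    `symbol_type` is "point", "line" or "area" and only matters
    for the orientation variable.
    """
    if variable in ("size", "value"):
        return set(ALL_LEVELS)
    if variable in ("shape", "color"):
        return {"nominal"}
    if variable == "orientation":
        if symbol_type == "point":
            # Quantitative use is possible, but the zero stays arbitrary.
            return {"nominal", "ordinal", "interval"}
        return {"nominal"}
    raise ValueError(f"no conclusion stated for variable: {variable!r}")


print(sorted(suitable_levels("orientation", "point")))
print(sorted(suitable_levels("orientation", "line")))
print("ratio" in suitable_levels("size"))
```

An author could use such a lookup as a quick check before assigning a data field to a visual variable: if the field's level of measurement is not in the returned set, the variable is a poor choice.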

(By Giulio Frigieri)

Bibliography

 Stevens Stanley Smith (1946), “On the theory of scales of measurement”, in Science, 103 (2684), pp. 677-680. For a valid and more recent alternative, featuring 9 levels of measurement, see Chrisman Nicholas (1998), “Rethinking Levels of Measurement for Cartography”, in Cartography and Geographic Information Science, vol. 25 (4), pp. 231-242.

 Bertin Jacques (1967), Sémiologie graphique: les diagrammes, les réseaux, les cartes, Gauthier-Villars & Mouton, Paris (English translation: Semiology of graphics: diagrams, networks, maps; in particular, see the chapter titled “The properties of the graphic system”, pp. 41-49).

 Bertin Jacques (1977), La graphique et le traitement graphique de l’information, Flammarion, Paris (English translation: Graphics and graphic information processing, Walter de Gruyter & Co).

 McCleary G.F.J. (1983), “An effective graphic vocabulary", in IEEE Computer Graphics & Applications 3, n. 2, pp. 46-53.

 Morrison J.L. (1984), “Applied cartographic communication: map symbolization for atlases", in Cartographica 21, n. 1, pp. 44-84.

 DiBiase D. (1990), “Visualization in the earth sciences”, in Earth and Mineral Sciences, Bulletin of the College of Earth and Mineral Sciences, Vol. LIX, pp. 13-18.

 MacEachren A.M. (1994), Visualization in modern cartography, Pergamon Press, Oxford.

 MacEachren A.M. (1994b), Some truth with maps: a primer on symbolization and design, Association of American Geographers, Washington.

 Monmonier Mark (1993), Mapping it out: expository cartography for the humanities and social sciences, University of Chicago Press, p. 60.

© 2012 Fondazione <ahref | Sede legale: Vicolo Dallapiccola 12 - 38122 Trento - Italy | P. IVA 02178080228 Creative Commons License
 