3 Visualize
Weeks 3 & 4: February 7 - February 27, 2022
Over the past few days, we had a Tableau tutorial and looked at data visualization from two perspectives: that of the creator (how do we encode data into visual signals that viewers can decode effectively?) and that of the viewer (how do humans decode visual signals?). For the former, we explored the works of Jacques Bertin and Jock MacKinlay’s effectiveness ranking; for the latter, we briefly examined a variety of human cognition theories on how the brain sees and interprets what it sees.
3.1 From Data to Visualization
Data visualization begins by understanding our task and modeling our data. This includes:
- Task Analysis. What are the main questions we hope to answer with our data? What are our data exploration goals? Do we have any underlying assumptions?
- Domain Analysis. What are the conventions in this domain? Are there meta-data that augment or provide meaning to our dataset?
- Data Model. Data models are formal descriptions of our data. They include the schema of the data: the names of the different features, their types, and how they are represented (e.g. integers, strings, and decimals are physical representations describing how the data are stored in a file or a database). A higher-level data type can simply categorize these values into Nominal (N), Ordinal (O), or Quantitative (Q). A conceptual model helps us think about the semantics of the data: e.g. given two decimal values (41.25, -120.97), a conceptual model that describes these values as latitude and longitude ascribes a spatial location to them as well as constraints on their ranges.
Visualization researchers have examined multiple data-type-to-visualization mappings, or taxonomies. We will focus on the simple but powerful nominal (N), ordinal (O), quantitative (Q) data type system and how it allows us to construct effective visualizations.
3.1.1 Data types
Nominal (N) data types represent labels or categories of things that can’t be meaningfully compared. e.g. fruits: apples, oranges, bananas, etc. We don’t have a meaningful way to rank or compare apples to oranges.
Ordinal (O) data types represent ordered values: e.g. the quality of meat A, AA, AAA where AAA is a better quality meat than A, or the 5-star hotel rating where 5 stars is better than 1 star. The ordered values are not amenable to difference or other algebraic computations: e.g. AAA - AA has no meaning, and there are no 4.5 star hotels.
Quantitative (Q-interval) data types represent quantitative values where comparing differences between values is allowed but there is no fixed “zero” value. E.g. dates, or locations represented by latitude and longitude.
Quantitative (Q-ratio) data types represent quantitative values where we can compare ratios or proportions. They denote physical measurements: e.g. a time period rather than a date, or the distance from a fixed point rather than lat-long coordinates. There is a fixed “zero” value.
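To make this concrete, here is a minimal Python sketch (the column names are hypothetical) of a schema that records both the physical representation and the higher-level data type of each feature:

```python
# A minimal sketch of a data model: each (hypothetical) feature is tagged
# with its physical representation and its high-level data type.
from enum import Enum

class DType(Enum):
    N = "nominal"              # labels with no order (fruit names)
    O = "ordinal"              # ordered, but differences are meaningless (star ratings)
    Q_INTERVAL = "q-interval"  # differences meaningful, no fixed zero (dates, lat/long)
    Q_RATIO = "q-ratio"        # ratios meaningful, fixed zero (distance, duration)

# Schema: physical representation plus the higher-level type.
schema = {
    "fruit":       ("string",  DType.N),
    "hotel_stars": ("integer", DType.O),
    "latitude":    ("decimal", DType.Q_INTERVAL),  # conceptual model: constrained to [-90, 90]
    "trip_km":     ("decimal", DType.Q_RATIO),
}

for name, (physical, high_level) in schema.items():
    print(f"{name}: stored as {physical}, treated as {high_level.value}")
```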
3.1.2 Bertin’s Levels of Organization
Jacques Bertin published the “Semiology of Graphics” in 1967. In it, he laid out the following foundational ideas on visualization:
- An image is a set of signs.
- A sender encodes information in these signs.
- A receiver decodes information from these signs.
Consider a simple graphic with 3 marks, A, B, and C, like the one below. What can we decode from it?
We can decode that:
1. A, B, and C are distinct.
2. B is between A and C.
3. The distance between A and B is half the distance between B and C.
In this image, we used position as an information-encoding channel to communicate properties (e.g. the distances between them, their relative magnitudes) of three quantitative data points represented with marks (x’s).
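As an illustration, here is a small matplotlib sketch (with positions chosen to satisfy the decoded facts above) that reproduces such a graphic:

```python
# A minimal sketch of three marks whose x-positions encode quantitative
# values: the distance between A and B is half the distance between B and C.
import matplotlib.pyplot as plt

positions = {"A": 0, "B": 1, "C": 3}  # |AB| = 1 is half of |BC| = 2

fig, ax = plt.subplots(figsize=(6, 1.5))
for label, x in positions.items():
    ax.plot(x, 0, marker="x", color="black", markersize=10)
    ax.annotate(label, (x, 0), textcoords="offset points", xytext=(0, 10), ha="center")
ax.set_yticks([])           # only the x-position carries information
ax.set_xlim(-0.5, 3.5)
plt.show()
```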
Bertin defined multiple visual encoding variables such as position (along the x and y dimensions), size, color (hue and value), texture, orientation, and shape, and mapped each variable to how well it encodes a data variable given its data type (nominal, ordinal, or quantitative).
Following in this tradition, Jock MacKinlay (later of Tableau) created an effectiveness ranking of several visual encoding variables for the different data types, and used this ranking to automatically search for visualizations that best encode a dataset given an ordering of how important each variable is for the visualization task.
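The sketch below illustrates the idea in Python with an abbreviated, illustrative ranking (not MacKinlay’s full list): given variables ordered by importance, greedily assign each one the most effective encoding channel still available.

```python
# An illustrative (abbreviated) effectiveness ranking of visual encoding
# channels per data type, and an APT-style greedy assignment. The real APT
# system distinguishes x- and y-position; here "position" is one channel.
RANKING = {
    "Q": ["position", "length", "angle", "area", "color value"],
    "O": ["position", "color value", "texture", "color hue"],
    "N": ["position", "color hue", "texture", "shape"],
}

def assign_channels(variables):
    """variables: list of (name, dtype) pairs sorted by task importance."""
    used = set()
    assignment = {}
    for name, dtype in variables:
        # pick the highest-ranked channel for this type not yet taken
        channel = next(c for c in RANKING[dtype] if c not in used)
        used.add(channel)
        assignment[name] = channel
    return assignment

# Most important variable first, per the task analysis.
print(assign_channels([("price", "Q"), ("rating", "O"), ("brand", "N")]))
# {'price': 'position', 'rating': 'color value', 'brand': 'color hue'}
```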
3.2 In the Eye of the Beholder
The effectiveness rankings of visual encoding variables by Bertin and MacKinlay, while largely conjectured, have theoretical roots in psychology and theories of human cognition.
The following readings provide further details on four main perception ideas:
Signal Detection: When can humans detect a difference? Of note here is Weber’s Law, which states that the “Just Noticeable Difference” is a constant fraction \(k\) of the initial stimulus. Thus, most continuous variation in stimuli is perceived as discrete steps. Hence, a gray scale better encodes ordinal variables than quantitative variables as humans may not detect the slight variations in the shades of gray.
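In symbols, if \(I\) is the intensity of the initial stimulus and \(\Delta I\) is the smallest change a viewer can reliably detect, Weber’s Law states that
\[
\frac{\Delta I}{I} = k
\]
for some constant \(k\) that depends on the stimulus. For example, with \(k = 0.1\), a stimulus of intensity 100 must change by about 10 before the difference becomes noticeable.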
Magnitude Estimation: How big is the difference? Of note here is Stevens’ Power Law, which empirically describes the relationship between increases in the intensity of a stimulus and our perceived sensation of the increase. We tend to underestimate increases in the area of a circle but accurately estimate increases in length. Heer and Bostock conducted crowdsourced studies in their 2010 paper to determine human accuracy at a variety of magnitude estimation tasks with visualizations.
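In symbols, Stevens’ Power Law models the perceived sensation \(S\) of a stimulus with physical intensity \(I\) as
\[
S = c \, I^{a}
\]
where \(c\) is a scaling constant and \(a\) is an empirically fitted exponent: approximately 1 for length (which we estimate accurately) but closer to 0.7 for area, which is why increases in the area of a circle are underestimated.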
Visual Salience: How quickly can you find information? Which visual features lead to pre-attentive processing or completing a task such as finding a target in a sea of distractors within 200-250 ms? Which display dimensions are perceived holistically vs. which ones are analytically processed, i.e. judged separately?
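As a quick demonstration, the matplotlib sketch below (with illustrative parameters) generates a pop-out display: a single red target among gray distractors is found pre-attentively, almost independently of the number of distractors.

```python
# A small sketch of a pre-attentive "pop-out" display: one red target in a
# sea of gray distractors. Because color hue is processed pre-attentively,
# the target is found within ~200-250 ms regardless of how many distractors
# surround it.
import random
import matplotlib.pyplot as plt

random.seed(0)
n = 60
xs = [random.random() for _ in range(n)]
ys = [random.random() for _ in range(n)]
colors = ["red"] + ["gray"] * (n - 1)  # the first mark is the lone target

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(xs, ys, c=colors)
ax.set_xticks([])
ax.set_yticks([])
plt.show()
```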
“Visual Salience and Finding Information” Chapter 5 by C. Ware in his book, “Information visualization: Perception for design”, neatly ties concepts such as pre-attentive processing, the theory of integral and separable dimensions with visualization design.
Gestalt Grouping: How do humans construct patterns and groups in what they see? A series of posts by Elijah Meeks shows how understanding Gestalt principles is not only useful for improving traditional visualizations but crucial for creating complex ones like network and hierarchy diagrams, where the principles of connectedness, common fate, and parallelism come into play.
“How Maps Are Seen” Chapter 3 by Alan MacEachren in his book, “How Maps Work”, explores how the eyes and the brain work to see and interpret cartographic visualizations. He also explores psychology and the Gestalt principles of perceptual organization to explain why certain map visualizations are preferred to others.
3.3 Design & Redesign
What is the best way to improve our visualization skills? Understanding the cognitive underpinnings behind how humans see can immensely help improve our designs. Here are a few more things you can do:
“Ask a friend” to look at your visualization and critique it: Did they get the message you intended to communicate? How easy was it for them to grasp that message? What was most confusing to them?
“Learn by example” A few books provide illustrative examples of master visualizers at work, explaining why certain design choices worked and which ones didn’t. Here is a list of books that you can access online through the NYU Library:
- The functional art: an introduction to graphics and information visualization, by Alberto Cairo
- Fundamentals of data visualization: a primer on making informative and compelling figures, by Claus Wilke
“Critique by Redesign” A practice common to visualization experts is to take an existing visualization and critique it by redesigning an improved version. In this process, one starts to understand some of the complexities of visualization design: what compromises did the designer make? What were their constraints and goals? What are your compromises, constraints and goals?