2 See

Week 2: January 31 - February 7, 2022

It seems strange that we need to argue for the importance of data visualization. By now we have all heard the adage “a picture is worth a thousand words” ad infinitum. But pictures are no longer competing with just words or raw data tables; they are competing with statistical summaries imbued with magical, scientific-sounding powers.

Nothing illustrates the value of seeing your data better than Anscombe’s quartet: four synthetic data sets of \(x, y\) values that have nearly identical statistical summaries and linear regression fits, yet look completely different when plotted.
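To make the point concrete, here is a minimal sketch in Python that computes the usual summary statistics for the four data sets using only the standard library. The data values are the published Anscombe values; the helper name `summarize` and the dictionary layout are our own choices for illustration, not part of the course spreadsheet.

```python
from statistics import mean

# Anscombe's quartet: sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, correlation and least-squares line."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "corr": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (xs, ys) in ANSCOMBE.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows agree to about two decimal places, which is exactly why the summaries alone cannot tell the data sets apart; only plotting them does.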

Researchers at Autodesk have taken this idea to a new, entertaining level with their dino dozen dataset and a tool that can generate a data set that looks a certain way when graphed but shares many statistical properties with another data set.

You can access the Anscombe’s quartet dataset, along with the spreadsheet formulas we used to compute the summary statistics, here.

2.1 What are visualizations good for?

Visualization can serve many purposes, but we can group them into three main categories:

  • Support reasoning
  • Convey information
  • Record information

We briefly described two case studies on how visualization can:

  • support our ability to formulate plausible explanations for the cause of the cholera outbreak, by following John Snow’s visual reasoning journey.
  • hinder the ability of the Challenger decision-makers to see the relationship between low temperatures and O-ring failure.

Edward Tufte beautifully explains these case studies in his book “Visual Explanations”, along with a summary of tips for designing visualizations that aid reasoning. The required reading chapter can be accessed here.

“The Power of Representation”, a chapter in Don Norman’s book “Things That Make Us Smart”, explores how humans interpret different symbolic representations. This optional reading is an excellent candidate for earning participation credit: reflect on the chapter and write a critique of it.

2.2 How to design good visualizations?

In his foundational work on designing tools that can automatically generate visualizations for a given data set, Jock Mackinlay, who later joined Tableau, codifies two principles for designing a good visualization:

  • Expressiveness. A set of facts is expressible in a visual language if the language contains a sentence (i.e. a visualization) that expresses all the facts in the set, and only those facts. In other words, a visualization should “tell the truth, the whole truth and nothing but the truth!”

  • Effectiveness. One visualization is more effective than another if the information it conveys is more readily perceived. We can interpret this as “use visual encodings that people can decode better (faster and more accurately).”
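As a rough illustration of the effectiveness principle, the sketch below picks the most readily decoded encoding from a set of available ones. The list `QUANTITATIVE_RANKING` is an abridged ordering for quantitative data in the spirit of Mackinlay's ranking; treat both the exact order and the function name `most_effective` as assumptions made for this example, not as a definitive implementation.

```python
# Abridged, illustrative ordering of visual encodings for quantitative data,
# from easiest to hardest to decode accurately (an assumption for this sketch).
QUANTITATIVE_RANKING = [
    "position",
    "length",
    "angle",
    "slope",
    "area",
    "volume",
    "color saturation",
    "color hue",
]


def most_effective(available):
    """Return the highest-ranked encoding among those still available."""
    for encoding in QUANTITATIVE_RANKING:
        if encoding in available:
            return encoding
    return None  # none of the available encodings appears in the ranking


# Position is already used by another field, so the next best choice is length.
print(most_effective({"length", "area", "color hue"}))
```

A tool that generates charts automatically could apply a rule like this field by field: give the most important quantitative field the best free encoding, then move down the list for the remaining fields.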

Over the next few weeks we will dive deeper into these two principles, seeing examples of visualizations that adhere to or violate these principles.

2.3 Slides