Visualising data

Peter Geelan-Small – Stats Central, UNSW

13/06/2019

What this talk is about

  • making good graphics for publication
  • basic principles of graphing
  • building blocks of graphs
  • examples of common types of graph

What this talk is not about

  • animations
  • 3-D graphics
  • web-based graphics
  • interactive graphics
  • how to make graphs in particular software packages (the examples here were produced in R [1] because you can make great graphs with it!)

[1] https://www.r-project.org

Statistical graphs

“… statistical graphics are instruments to help people reason about quantitative information” [2]

[2] Tufte, E. R. 2001. 'The Visual Display of Quantitative Information', p. 91

Basic elements of a plot

  • graph panel
  • scales, labels, tick marks
  • plotting symbols, line types, colour
  • reference lines
  • keys
  • captions


Maximise the “data-ink ratio” [3]

  • use ink for data, not unnecessary decoration

[3] Tufte, E. R. 2001. 'The Visual Display of Quantitative Information', p. 96

Basic guidelines of graphing data

Make the data stand out

  • let the data fill the graph panel
  • don’t force axes to start at zero when the data values are far from zero (axis “break” marks can show that an axis does not start at zero)
  • label the axes clearly
  • use symbols, lines, colours etc. that highlight the data
  • use reference lines or grids with care
  • put legends outside the graph panel
  • put notes in the caption or text
  • make sure the graph remains clear if made smaller (e.g. for publication)
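
One way to let the data fill the graph panel is to take the axis limits from the range of the data plus a small margin, rather than starting at zero. The sketch below is illustrative Python, not the code behind the talk's figures; the function name `axis_limits`, the 4% padding and the measurements are all assumptions made for the example.

```python
def axis_limits(values, pad=0.04):
    """Return (lower, upper) axis limits that hug the data.

    `pad` is the fraction of the data range added at each end
    (an illustrative default, not a rule from the talk).
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values equal: base the margin on the value's size instead
        span = abs(lo) if lo != 0 else 1.0
    margin = pad * span
    return lo - margin, hi + margin


# Hypothetical measurements that sit far from zero
heights_cm = [152.0, 160.5, 171.2, 168.4, 181.0]
lower, upper = axis_limits(heights_cm)
print(f"axis from {lower:.1f} to {upper:.1f}")
```

Most plotting software applies a rule of this kind by default; the point is to leave it alone unless there is a good reason to include zero.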

Colour - symbols and lines

  • some people are colourblind
  • differences among reds, oranges, yellows and greens are especially hard to see
  • there are different types of colourblindness
  • don’t use red and green together
  • check out ColorBrewer for colour palettes (in R, RColorBrewer package)


Avoid “chartjunk” [4] (more about this later)

[4] Tufte, E. R. 2001. 'The Visual Display of Quantitative Information', p. 107

Make sure the graph is properly understood

  • Put all necessary details about the graph in the caption
    • describe everything that is graphed
    • clearly explain any error bars
    • note important features in the data
    • state the conclusions drawn from the data shown
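
Error bars need explaining because the same data give bars of very different lengths depending on what the bars represent. The sketch below uses hypothetical measurements (not data from the talk) to compare a standard deviation, a standard error and an approximate 95% confidence interval half-width; the 1.96 multiplier is a normal approximation assumed for simplicity.

```python
import math
import statistics

# Hypothetical replicate measurements for one treatment group
values = [4.1, 5.0, 5.6, 6.2, 7.1]

n = len(values)
mean = statistics.mean(values)
sd = statistics.stdev(values)   # sample standard deviation
se = sd / math.sqrt(n)          # standard error of the mean
# Rough 95% interval using the normal quantile 1.96; with n this small a
# t quantile would give a wider interval, so treat this as approximate.
half_width = 1.96 * se

print(f"mean                      = {mean:.2f}")
print(f"error bar as +/- 1 SD     = {sd:.2f}")
print(f"error bar as +/- 1 SE     = {se:.2f}")
print(f"error bar as approx 95% CI = {half_width:.2f}")
```

A caption that says only "error bars shown" leaves the reader unable to tell which of these three lengths is plotted.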

Variation

Two areas for showing variation:

  • empirical variation of data - exploring data
  • variation of statistics - exploring data and model output

Empirical variation - discrete data

Data: A sample of three types of gastropod shell was scored as occupied by a hermit crab or empty [5].

Question: Do hermit crabs prefer a certain shell type?

Species         Occupied   Empty
Austrocochlea         47      42
Bembicium             10      41
Cerithiidae          125      49

[5] Glover, T. & Mitchell, K. 2015. 'An Introduction to Biostatistics.' Waveland Press, p. 369.
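
The raw counts are hard to compare because the three shell types were sampled in different numbers. A first step towards the question is to convert each row to the proportion of shells occupied. The sketch below does this with the counts from the table; the variable names are illustrative.

```python
# Counts from the table above: (occupied, empty) for each shell type
counts = {
    "Austrocochlea": (47, 42),
    "Bembicium": (10, 41),
    "Cerithiidae": (125, 49),
}

# Proportion of each shell type that was occupied
proportion_occupied = {
    shell: occupied / (occupied + empty)
    for shell, (occupied, empty) in counts.items()
}

# Overall proportion occupied, ignoring shell type
total_occupied = sum(occupied for occupied, _ in counts.values())
total_shells = sum(occupied + empty for occupied, empty in counts.values())
overall = total_occupied / total_shells

for shell, p in proportion_occupied.items():
    print(f"{shell:<14} {p:.2f}")
print(f"{'All shells':<14} {overall:.2f}")
```

If crabs had no preference, the three proportions would all be close to the overall proportion.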

Comparing variation - discrete data

Mosaic plot
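
A mosaic plot draws one tile per cell of a contingency table, with the area of each tile proportional to the cell count. The sketch below computes that geometry for the hermit crab counts without drawing anything. It assumes one common layout (tile width from the share of each shell type in the sample, tile height from the proportion occupied or empty within that type), which is not necessarily the layout used in the talk's figure.

```python
# Counts from the hermit crab table: (occupied, empty) for each shell type
counts = {
    "Austrocochlea": (47, 42),
    "Bembicium": (10, 41),
    "Cerithiidae": (125, 49),
}

grand_total = sum(occupied + empty for occupied, empty in counts.values())

tiles = []  # one (shell, status, width, height) tuple per table cell
for shell, (occupied, empty) in counts.items():
    row_total = occupied + empty
    width = row_total / grand_total          # share of the whole sample
    for status, count in (("Occupied", occupied), ("Empty", empty)):
        height = count / row_total           # share within this shell type
        tiles.append((shell, status, width, height))

for shell, status, width, height in tiles:
    print(f"{shell:<14} {status:<9} width={width:.2f} height={height:.2f}")
```

Because width times height equals the cell count divided by the grand total, the tile areas sum to one and a reader can compare both sample sizes and occupancy rates in a single picture.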

Data set - Sleep in mammals

Data: Measurements of brain and body weight, life span, gestation time, time sleeping, predation and danger indices for 62 species of mammals [6].

Question: What is the best prediction model for total sleep time?

[6] http://www.statsci.org/data/general/sleep.html

Empirical variation - Box plots

Empirical variation - Scatter plot matrix

Empirical variation - Scatter plot matrix

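Log-transform positively skewed variables: the transformation spreads out the small values and pulls in the large ones, which makes relationships between variables easier to see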
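
The effect of a log transformation on skewness can be checked numerically. The sketch below uses made-up weights spanning several orders of magnitude (hypothetical values, not the mammal sleep data) and a simple moment-based skewness coefficient; the helper name `skewness` is illustrative.

```python
import math


def skewness(xs):
    """Moment-based skewness: third central moment over variance^1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical body weights in kg, strongly positively skewed
body_weight_kg = [0.02, 0.1, 1.0, 3.0, 10.0, 60.0, 500.0, 6000.0]
log_weight = [math.log10(w) for w in body_weight_kg]

print(f"skewness of raw weights:   {skewness(body_weight_kg):.2f}")
print(f"skewness of log10 weights: {skewness(log_weight):.2f}")
```

On the raw scale one very large value dominates the axis and squeezes the rest of the points together; on the log scale the values are spread far more evenly.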