Visualising data

Peter Geelan-Small – Stats Central, UNSW

13/06/2019

What this talk is about

  • making good graphics for publication
  • basic principles of graphing
  • building blocks of graphs
  • examples of common types of graph

What this talk is not about

  • animations
  • 3-D graphics
  • web-based graphics
  • interactive graphics
  • how to make graphs in particular software packages (examples here are produced in R because you can make great graphs with it! [1])
1: https://www.r-project.org

Statistical graphs

“… statistical graphics are instruments to help people reason about quantitative information” [2]

2: Tufte, E. R. 2001. 'The Visual Display of Quantitative Information', p. 91

Basic elements of a plot

  • graph panel
  • scales, labels, tick marks
  • plotting symbols, line types, colour
  • reference lines
  • keys
  • captions


Maximise the “data-ink ratio” [3]

  • use ink for data, not unnecessary decoration
3: Tufte, E. R. 2001. 'The Visual Display of Quantitative Information', p. 96
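
As a reminder of what the ratio measures, Tufte's definition can be written as a simple fraction. The wording of the numerator and denominator below follows the usual statement of the definition; treat it as a summary rather than a quotation.

```latex
\[
\text{data-ink ratio} \;=\; \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
\]
```

A ratio close to 1 means that almost all of the ink shows data; ink that can be erased without losing information lowers the ratio.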

Basic guidelines of graphing data

Make the data stand out

  • let the data fill the graph panel
  • don’t force an axis to start at zero if the data values are not close to zero (“break” marks can be used to show that the axis does not start at zero)
  • label the axes clearly
  • use symbols, lines, colours etc. that highlight the data
  • use reference lines or grids with care
  • put legends outside the graph panel
  • put notes in the caption or text
  • make sure the graph remains clear if made smaller (e.g. for publication)
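
One way to let the data fill the graph panel is to set the axis limits from the data range plus a small margin, rather than starting at zero. The sketch below is only an illustration: the function name `padded_limits`, the 4% margin and the height values are all assumptions made for this example, not part of the talk.

```python
def padded_limits(values, pad_fraction=0.04):
    """Return (low, high) axis limits that hug the data with a small margin."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # All values are equal: base the margin on the value itself instead.
        span = abs(low) or 1.0
    margin = pad_fraction * span
    return low - margin, high + margin


# Hypothetical measurements that sit far from zero.
heights_cm = [171.2, 175.8, 180.4, 183.9, 188.1]
low, high = padded_limits(heights_cm)
print(f"axis from {low:.1f} to {high:.1f}")
```

Most plotting software chooses limits in a similar way by default; the point is to check that a forced zero has not squeezed the data into a corner of the panel.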

Make the data stand out (continued)

Colour - symbols and lines

  • some people are colourblind
  • hard to see differences especially among reds, oranges, yellows and greens
  • there are different types of colourblindness
  • don’t use red and green together
  • check out ColorBrewer for colour palettes (in R, RColorBrewer package)


Avoid “chartjunk” [4] (more about this later)

4: Tufte, E. R. 2001. 'The Visual Display of Quantitative Information', p. 107

Make sure the graph is properly understood

  • Put all necessary details about the graph in the caption
    • describe everything that is graphed
    • clearly explain any error bars
    • note important features in the data
    • state the conclusions drawn from the data shown

Variation

Two areas for showing variation:

  • empirical variation of data - exploring data
  • variation of statistics - exploring data and model output

Empirical variation - discrete data

Data: A sample of three types of gastropod shell was scored as occupied by a hermit crab or empty [5].

Question: Do hermit crabs prefer a certain shell type?


Species         Occupied   Empty
Austrocochlea         47      42
Bembicium             10      41
Cerithiidae          125      49
5: Glover, T. & Mitchell, K. 2015. 'An Introduction to Biostatistics.' Waveland Press, p. 369.

Comparing variation - discrete data

Mosaic plot
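
A mosaic plot draws one tile per cell of the table: each column's width is proportional to the number of shells of that species, and the column is split according to the proportion occupied. The sketch below computes those proportions from the counts in the table above. It assumes that species are laid out as columns (the original figure is not available here), and the variable names are chosen only for this example.

```python
# Counts from the hermit crab table (occupied vs empty shells).
counts = {
    "Austrocochlea": {"occupied": 47, "empty": 42},
    "Bembicium": {"occupied": 10, "empty": 41},
    "Cerithiidae": {"occupied": 125, "empty": 49},
}
grand_total = sum(sum(row.values()) for row in counts.values())

tiles = {}
for species, row in counts.items():
    species_total = row["occupied"] + row["empty"]
    tiles[species] = {
        # Column width: share of all shells belonging to this species.
        "width": species_total / grand_total,
        # Height of the "occupied" tile: proportion of this species occupied.
        "occupied_height": row["occupied"] / species_total,
    }

for species, tile in tiles.items():
    print(f"{species:14s} width {tile['width']:.2f}  "
          f"occupied {tile['occupied_height']:.2f}")
```

The proportions occupied come out at roughly 0.53, 0.20 and 0.72, which is the contrast the mosaic plot makes visible at a glance.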

Data set - Sleep in mammals

Data: Measurements of brain and body weight, life span, gestation time, time sleeping, predation and danger indices for 62 species of mammals [6].

Question: What is the best prediction model for total sleep time?

6: http://www.statsci.org/data/general/sleep.html

Empirical variation - Box plots

Empirical variation - Scatter plot matrix

Log-transform positively skewed data to clarify relationships
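
To see why a log transformation helps, compare the skewness of a variable before and after taking logs. The values below are made up for illustration (they are not the mammal data) and span several orders of magnitude, as body weights do. The `skewness` helper is a plain moment-based coefficient written for this sketch.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by sd cubed."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


# Illustrative values only, spanning five orders of magnitude.
raw = [0.01, 0.1, 1, 10, 100, 1000]
logged = [math.log10(v) for v in raw]

print(f"skewness on the raw scale: {skewness(raw):.2f}")
print(f"skewness on the log scale: {skewness(logged):.2f}")
```

On the raw scale the largest value dominates and the rest are bunched near zero; on the log scale the points are evenly spread, so relationships with other variables are easier to see.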

Empirical variation - Groups of data and trends

Data set - sport, gender and iron

Data: Measurements on 102 male and 100 female athletes collected at the Australian Institute of Sport [7].

Question: What variables are important in explaining ferritin concentration?

Variables recorded:

  • Sport
  • Gender - female, male
  • Height (cm)
  • Weight (kg)
  • Lean body mass
  • Red cell count
  • White cell count
  • Hematocrit
  • Hemoglobin
  • Plasma ferritin concentration
  • Body mass index
  • Sum of skin folds
  • Percent body fat
7: http://www.statsci.org/data/oz/ais.html

Empirical variation - comparing distributions

Empirical variation in data

Could the axis labels be more informative?

(A dot plot may even be clearer)

Pie charts

  • No scale
  • Difficult to infer percentages based on sizes of angles
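
A little arithmetic shows why angles are hard to read. The sketch below converts percentage shares to pie-slice angles; the four shares are hypothetical and chosen only to show how small the angular differences between similar categories are.

```python
# Hypothetical shares of four categories (they sum to 1).
shares = {"A": 0.30, "B": 0.27, "C": 0.23, "D": 0.20}

# A full circle is 360 degrees, so each slice gets 360 * share degrees.
angles = {name: 360 * share for name, share in shares.items()}

for name, angle in angles.items():
    print(f"{name}: {shares[name]:.0%} -> {angle:.1f} degrees")
```

A three-point difference in share is an angle of under 11 degrees, which is hard to judge by eye when the slices do not share a common baseline. On a bar chart or dot plot the same difference is read directly against a scale.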

3-D pie charts

  • Aargh! Worse.
  • “Chartjunk”

Clustered histograms

Clustered density plots

Clustered box plots

Violin plots

Variation of statistics

A useful picture of variation in means?

Don’t start scale at zero if data values are not close to zero.

What do the error bars show?

Describe what the error bars show!
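
The same data can produce error bars of very different lengths, which is why the caption must say what they show. The sketch below computes three common choices for one hypothetical sample: the standard deviation, the standard error of the mean, and the half-width of an approximate 95% confidence interval. The interval uses a normal quantile for simplicity; that is an assumption of this sketch, and a t quantile would give a somewhat wider interval for a sample this small.

```python
import math
import statistics

# Hypothetical sample of eight measurements.
sample = [41.0, 55.0, 62.0, 70.0, 48.0, 66.0, 59.0, 51.0]

n = len(sample)
mean = statistics.fmean(sample)
sd = statistics.stdev(sample)        # spread of the individual observations
se = sd / math.sqrt(n)               # uncertainty of the sample mean
z = statistics.NormalDist().inv_cdf(0.975)
ci_half_width = z * se               # approximate 95% CI half-width

print(f"mean = {mean:.1f}")
print(f"bar = SD:      +/- {sd:.1f}")
print(f"bar = SE:      +/- {se:.1f}")
print(f"bar = 95% CI:  +/- {ci_half_width:.1f}")
```

For this sample the SD bar is almost three times as long as the SE bar, so an unlabelled error bar can easily be misread.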

Much clearer!

“Chartjunk” again

Display data with fitted model - modelling scale

Variation of fitted values

Back to the sleeping mammals data

Display data with fitted model - natural scale
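
Moving from the modelling scale to the natural scale usually means back-transforming the fitted values and their interval limits. The sketch below assumes a model fitted to a log-transformed response (the slides do not state which variables were transformed), and the fitted value and standard error are hypothetical numbers chosen for illustration. The interval is built on the log scale with a normal quantile and then exponentiated.

```python
import math
import statistics

# Hypothetical fitted value and standard error on the log (modelling) scale.
fitted_log = 2.3
se_log = 0.15

z = statistics.NormalDist().inv_cdf(0.975)
lower_log = fitted_log - z * se_log
upper_log = fitted_log + z * se_log

# Back-transform the estimate and the interval limits to the natural scale.
fitted, lower, upper = (math.exp(v) for v in (fitted_log, lower_log, upper_log))

print(f"log scale:     {fitted_log:.2f} ({lower_log:.2f}, {upper_log:.2f})")
print(f"natural scale: {fitted:.2f} ({lower:.2f}, {upper:.2f})")
```

The interval is symmetric on the log scale but not on the natural scale, so it should be back-transformed from its limits rather than drawn as the back-transformed estimate plus or minus a fixed amount.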

Summary

Think about what the purpose of the graph is

Put the components of the graph together to make its message clear

Look at the outcome - show it to a colleague

Do it all again … and again …

Creating effective graphs is an iterative process


Happy graphing!

Resources

Cleveland, William S. 1994. The Elements of Graphing Data. AT&T Laboratories: Murray Hill, USA. 2nd edition.

Chen, Chun-houh et al. (eds) 2008. Handbook of Data Visualization. Springer: Berlin, Germany.

Edward Tufte’s principles for visualising data - one of many websites

Colour palettes and specifications: ColorBrewer (Also R package: RColorBrewer)

R Graph Gallery

Also for R users: Hadley Wickham’s ggplot2 package

Another R package: pals