Visualising Data

Leighton Pritchard

6 September 2016

Use and License

Recording this talk, taking photos, and discussing the content by email, Twitter, blogs, etc. are permitted (and encouraged), provided that distraction to others during the presentation is minimised.

These slides and source code are available on GitHub: https://widdowquinn.github.io/Presentation-DataVis-Barplots

These slides are made available under CC-BY v4.0

IMAGINE…

  • YOUR FIGURES ARE AMAZING!
  • BUT MISLEADING

A bar chart

Two effectors

  • Knocked out independently
  • Host chlorosis measured

Communication

  • Stories told through figures

Scales matter

  • Indication of quantities

Context matters

  • The same or not the same?

Figures can mislead

Storytelling

  • Figures are what you remember of a ‘story’

  • What about uncertainty?

Another Bar Chart

Four effectors

  • Bacterial effectors
  • Inoculate wild-type plants
  • Measure growth (CFU)

Four bar plots

  • Do the effectors have the same effect?

Add error bars

  • Do the effectors have the same effect?

Error bars

  • Estimates of uncertainty
  • But uncertainty of what?
  • standard deviation (sd):
    • describes the data: how much members of the group differ from the mean
  • standard error (of the mean) (sem):
    • describes the estimate of the mean: the standard deviation of the sample mean across repeated samples (sd / √n)
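
The difference between the two kinds of error bar can be sketched in a few lines of Python, using only the standard library. The measurements are invented for illustration and the function name `sd_and_sem` is hypothetical. The point is that sd describes the spread of the data, while sem shrinks as the sample size grows.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(values)       # sample sd, n - 1 denominator
    sem = sd / math.sqrt(len(values))   # sem = sd / sqrt(n)
    return sd, sem


# Hypothetical measurements, invented for illustration only
small_group = [4.0, 6.0, 5.0, 7.0, 3.0]
large_group = small_group * 4           # the same values, repeated four times

for name, group in [("n=5", small_group), ("n=20", large_group)]:
    sd, sem = sd_and_sem(group)
    print(f"{name}: mean={statistics.mean(group):.2f} sd={sd:.2f} sem={sem:.2f}")
```

Both groups have the same mean and a similar sd, but the larger group has a much smaller sem, so the same data can look more or less "certain" depending on which error bar is drawn.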

SD or SEM?

  • Which was used (& which do you need to know)?

Raw data

  • Are they the same biological responses?

What does mean mean?

  • Same mean implies the same response?

What does mean mean?

  • Unequal sample sizes (cf. barplot)

What does mean mean?

  • Outliers (cf. barplot)

What does mean mean?

  • Bimodal distribution (cf. barplot)
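
A small Python sketch of the cases above, using invented datasets that all share a mean of 5. The group names and values are hypothetical and chosen only so the arithmetic is easy to check by hand. A bar plot of the means would draw four identical bars.

```python
import statistics

# Hypothetical datasets, invented for illustration; every group has mean 5
groups = {
    "symmetric": [4, 5, 5, 5, 6],
    "small n": [4, 6],
    "outlier": [3, 3, 3, 3, 13],
    "bimodal": [1, 1, 1, 9, 9, 9],
}

summary = {
    name: {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }
    for name, values in groups.items()
}

for name, stats in summary.items():
    print(f"{name:>9}: n={stats['n']} mean={stats['mean']} "
          f"median={stats['median']} sd={stats['sd']:.2f}")
```

The means agree, but the sample sizes, medians and standard deviations do not, which is exactly the information a bar of the mean leaves out.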

But stats, right?

We only use figures as guides…

  • “Figures tell a story, but we actually only believe the stats”
  • Typical paper:
    • P<0.05, t-test (NHST), a description if you’re lucky
  • Do the distributions support use of a t-test? Assumptions for a 2-sample t-test include:
    • both populations Normal
    • equal standard deviations
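
To see why the equal-standard-deviation assumption matters, the sketch below compares the pooled (Student's) t statistic with Welch's t statistic, which does not pool the variances. It is a minimal illustration in standard-library Python, not a full test: it computes no P-value and does not check Normality. The function name `t_statistics` and the data are hypothetical.

```python
import math
import statistics


def t_statistics(a, b):
    """Return (Student's pooled t, Welch's t) for two independent samples."""
    na, nb = len(a), len(b)
    mean_diff = statistics.mean(a) - statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    # Student's t pools the variances, which assumes equal population SDs
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    student = mean_diff / math.sqrt(pooled * (1 / na + 1 / nb))
    # Welch's t keeps the two variances separate, dropping that assumption
    welch = mean_diff / math.sqrt(va / na + vb / nb)
    return student, welch


# Hypothetical data: a large, tight group and a small, widely spread group
tight = [10, 11, 9, 10, 10, 11, 9, 10]
spread = [4, 12, 8]

student, welch = t_statistics(tight, spread)
print(f"Student's t = {student:.2f}, Welch's t = {welch:.2f}")
```

With unequal spreads and unequal sample sizes the two statistics diverge, so the conclusion can depend on an assumption that a bar plot never shows.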

…we trust the P-values

  • Bar plots can hide inappropriate assumptions

Source: Weissgerber et al. (2015)

Figures can mislead

  • reinforce poor practice
    • binary thinking
    • overlooking data distributions and unmet statistical assumptions of tests
    • overlooking uncertainty
  • suggest neat stories (P<0.05)
    • data, like life, can be messy

Ways forward

Your analysis?

  • What you did…
    • Open package foo. Click. Click, drag. Click, Click. Undo. Click. Right-click. Save results.csv
    • Load into Excel. Click, drag. Generate graph. Right-click. Save pretty-graph.png

Your analysis?

  • What you said you did in the paper…
    • I analysed my data in foo using the bar analysis. Results are shown in Figure 1.

How reproducible is a mouse click?

Reproducible research

  • Automate (i.e. learn to program)
  • Write code in a (very) high-level language
  • Get some training
  • Use version control
  • Get a code buddy
  • Share code and data openly
  • Write tests
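
As a rough sketch of what "automate" and "write tests" can look like in practice, the Python below replaces a click-and-drag summary step with a function, plus a test whose answer can be checked by hand. The function name, the column names `effector` and `cfu`, and the data are all hypothetical; a real analysis would read a data file kept under version control.

```python
import csv
import io
import statistics


def summarise_by_group(csv_text, group_col, value_col):
    """Return {group: (n, mean, sd)} from CSV text with a header row."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row[group_col], []).append(float(row[value_col]))
    return {
        name: (len(vals), statistics.mean(vals), statistics.stdev(vals))
        for name, vals in groups.items()
    }


def test_summarise_by_group():
    """A tiny test with a hand-checkable answer."""
    text = "effector,cfu\nA,1\nA,3\nB,10\nB,20\n"
    result = summarise_by_group(text, "effector", "cfu")
    assert result["A"][:2] == (2, 2.0)
    assert result["B"][:2] == (2, 15.0)


test_summarise_by_group()
```

Unlike a sequence of mouse clicks, this step can be rerun, reviewed by a code buddy, and shared alongside the data.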

Now what?

Other visualisations

Anscombe’s Quartet

  • Four datasets: same means and standard deviations

geom_bar(), Source: Anscombe (1973)
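
The claim can be checked directly from the published y values (Anscombe, 1973). The sketch below uses standard-library Python rather than the plotting code behind the slides, and the variable names are illustrative.

```python
import statistics

# y values of Anscombe's four datasets (Anscombe, 1973)
anscombe_y = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}

for name, y in anscombe_y.items():
    print(f"{name:>3}: mean={statistics.mean(y):.2f} "
          f"sd={statistics.stdev(y):.2f} "
          f"median={statistics.median(y):.2f} max={max(y):.2f}")
```

The means and standard deviations agree to two decimal places, so bars with error bars look the same for all four datasets, while the medians and maxima show that the underlying data differ.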

Boxplots

  • Median, interquartile range, outliers

geom_boxplot()
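
The quantities a boxplot draws can be computed by hand. This standard-library Python sketch assumes the common convention that points more than 1.5 × IQR beyond the quartiles are flagged as outliers. Quartile definitions differ slightly between tools, so the exact values may not match those of a given plotting package. The function name and data are hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Return the quartiles, IQR and 1.5 x IQR outliers for a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "outliers": outliers}


# Hypothetical data with one extreme value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(boxplot_summary(data))
```

The extreme value inflates the mean of these data but barely moves the median, and the boxplot reports it separately as an outlier.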

Raw data

  • 1D scatterplots

geom_jitter()

Box and raw data

  • Boxplots and jittered 1D scatterplots

geom_boxplot() + geom_jitter()

Violin plot

  • Data density estimate

geom_violin()
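
The outline of a violin plot is a kernel density estimate of the data. The sketch below is a minimal Gaussian version in standard-library Python. The function name, the data and the fixed bandwidth of 1.0 are assumptions made for illustration; plotting packages choose a bandwidth automatically.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a function estimating the density of `values` at a point x."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Sum one Gaussian bump per data point, centred on that point
        return sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        ) / norm

    return density


# Hypothetical bimodal data with mean 5
bimodal = [1, 1, 1, 9, 9, 9]
density = gaussian_kde(bimodal, bandwidth=1.0)

for x in (1, 5, 9):
    print(f"density at {x}: {density(x):.4f}")
```

The estimated density is high at the two clusters and close to zero at the mean, which is why a violin plot reveals a bimodal distribution that a bar of the mean conceals.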

Violin and raw data

  • Stacked, not jittered, data

geom_violin() + geom_dotplot()

Acknowledgements

Where do ideas come from?

References

Anscombe, F. J. (1973). “Graphs in Statistical Analysis.” The American Statistician, 27(1), 17–21.

Weissgerber, T. L. et al. (2015). “Beyond bar and line graphs: Time for a new data presentation paradigm.” PLoS Biology, 13(4), e1002128. doi:10.1371/journal.pbio.1002128