Visualising Data

Leighton Pritchard

6 September 2016

Use and License

Recording this talk, taking photos, and discussing the content by email, Twitter, blogs, etc. are permitted (and encouraged), provided that distraction to others during the presentation is minimised.

These slides and source code are available on GitHub: https://widdowquinn.github.io/Presentation-DataVis-Barplots

These slides are made available under CC BY 4.0

IMAGINE…

YOUR FIGURES ARE AMAZING!
BUT MISLEADING

A bar chart

Two effectors

Knocked out independently
Host chlorosis measured

Communication

Stories told through figures

Scales matter

Indication of quantities

Context matters

The same or not the same?

Figures can mislead

Storytelling

Figures are what you remember of a ‘story’

What about uncertainty?

Another Bar Chart

Four effectors

Bacterial effectors
Inoculate wild-type plants
Measure growth (CFU)

Four bar plots

Do the effectors have the same effect?

Add error bars

Do the effectors have the same effect?

Error bars

Estimates of uncertainty
But uncertainty of what?
standard deviation (SD):
- describes the data: how much individual members of the group differ from the mean
standard error of the mean (SEM):
- describes the estimate of the mean: the standard deviation of the sampling distribution of the mean, SD/√n, which shrinks as the sample size n grows
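A minimal Python sketch of the two quantities, using the standard-library `statistics` module. The CFU counts are hypothetical, not data from the talk; the function name `sd_and_sem` is illustrative.

```python
import math
import statistics

def sd_and_sem(values):
    """Return (sample SD, standard error of the mean) for one group."""
    sd = statistics.stdev(values)        # spread of the observations (n - 1 denominator)
    sem = sd / math.sqrt(len(values))    # uncertainty in the estimated mean
    return sd, sem

# Hypothetical CFU counts (x 10^6) for one effector: illustrative only
small = [4.1, 5.3, 6.0, 4.8, 5.9, 6.3]
large = small * 4                        # similar spread, four times as many points

for label, group in [("n=6", small), ("n=24", large)]:
    sd, sem = sd_and_sem(group)
    print(f"{label}: mean={statistics.mean(group):.2f} sd={sd:.2f} sem={sem:.2f}")
```

The SD changes little between the two groups, but the SEM roughly halves, so an SEM error bar can look reassuringly small simply because n is large. Given an SEM and n, the SD can be recovered as SEM × √n, which is why a figure legend needs to say which one was plotted.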

SD or SEM?

Which was used (& which do you need to know)?

Raw data

Are they the same biological responses?

What does mean mean?

Same mean implies the same response?

What does mean mean?

Unequal sample sizes (cf. barplot)

What does mean mean?

Outliers (cf. barplot)

What does mean mean?

Bimodal distribution (cf. barplot)
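The point of the last four slides can be checked numerically. This Python sketch uses three hypothetical groups (made up for illustration, not the talk's data) that share a mean of 5 but differ in shape, spread and sample size; a bar of height 5 would look identical for all three.

```python
import statistics

# Hypothetical groups (illustrative only), all with a mean of 5
groups = {
    "unimodal": [4, 5, 5, 5, 6],
    "outlier": [3, 3, 3, 3, 13],     # one extreme value drags the mean up
    "bimodal": [1, 1, 1, 9, 9, 9],   # two clusters, and a different sample size
}

for name, values in groups.items():
    print(
        f"{name:>8}: n={len(values)} mean={statistics.mean(values):.1f} "
        f"median={statistics.median(values):.1f} sd={statistics.stdev(values):.2f}"
    )
```

Even mean plus SD does not separate the outlier and bimodal groups, whose SDs are similar; only plotting the individual points does.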

But stats, right?

We only use figures as guides…

“Figures tell a story, but we actually only believe the stats”
Typical paper:
- P<0.05, t-test (NHST), a description if you’re lucky
Do the distributions support the use of a t-test? E.g. assumptions of Student's two-sample t-test:
- both populations Normal
- equal standard deviations
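The equal-SD assumption is what lets Student's test pool the two variances. A minimal Python sketch (standard library only; the group values are hypothetical) computes the pooled Student t statistic alongside Welch's version, which drops that assumption and uses the Welch-Satterthwaite degrees of freedom. P-values need the t distribution, which the standard library lacks; `scipy.stats.ttest_ind(a, b, equal_var=False)` is one existing option for Welch's test.

```python
import math
import statistics

def student_t(a, b):
    """Two-sample t statistic assuming equal population SDs (pooled variance)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2

def welch_t(a, b):
    """Two-sample t statistic without the equal-SD assumption (Welch)."""
    na, nb = len(a), len(b)
    sa, sb = statistics.variance(a) / na, statistics.variance(b) / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sa + sb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df

# Hypothetical groups with very different spreads and sample sizes
control = [9.8, 10.1, 10.0, 9.9, 10.2]
mutant = [6.0, 14.5, 8.2, 12.9, 5.1, 15.3, 7.7, 13.0]

print(f"SD ratio: {statistics.stdev(mutant) / statistics.stdev(control):.1f}")
print("Student: t=%.2f, df=%d" % student_t(control, mutant))
print("Welch:   t=%.2f, df=%.1f" % welch_t(control, mutant))
```

When the SDs differ this much, the pooled test's assumption is doubtful. Neither statistic says anything about Normality; a bar plot hides both problems, while the raw data would show them.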

…we trust the P-values

Bar plots can hide inappropriate assumptions

Source: Weissgerber et al. (2015)

Figures can mislead

reinforce poor practice
- binary thinking
- overlooking data distributions, and applying tests whose statistical assumptions don't hold
- overlooking uncertainty
suggest neat stories (P<0.05)
- data, like life, can be messy

Ways forward

Your analysis?

What you did…
- Open package foo. Click. Click, drag. Click, click. Undo. Click. Right-click. Save results.csv
- Load into Excel. Click, drag. Generate graph. Right-click. Save pretty-graph.png

Your analysis?

What you said you did in the paper…
- I analysed my data in foo using the bar analysis. Results are shown in Figure 1.

How reproducible is a mouse click?

Reproducible research

Automate (i.e. learn to program)
Write code in a (very) high-level language
Get some training
Use version control
Get a code buddy
Share code and data openly
Write tests
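A scripted analysis can be rerun and tested, unlike a sequence of mouse clicks. The sketch below is a minimal Python example under stated assumptions: the data arrive as CSV text with hypothetical columns `effector` and `cfu`, and the function names `summarise` and `test_summarise` are illustrative. The test function follows the pytest naming convention but also runs directly.

```python
import csv
import io
import math
import statistics
from collections import defaultdict

def summarise(csv_text, group_col="effector", value_col="cfu"):
    """Per-group n, mean, SD and SEM from CSV text.

    The default column names are hypothetical; pass your own.
    Each group needs at least two values for the SD to be defined.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[group_col]].append(float(row[value_col]))
    summary = {}
    for name, values in groups.items():
        sd = statistics.stdev(values)
        summary[name] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "sd": sd,
            "sem": sd / math.sqrt(len(values)),
        }
    return summary

def test_summarise():
    # Tiny hand-checkable input: group means are 2 and 10
    data = "effector,cfu\nA,1\nA,2\nA,3\nB,10\nB,10\n"
    result = summarise(data)
    assert result["A"]["n"] == 3
    assert math.isclose(result["A"]["mean"], 2.0)
    assert math.isclose(result["A"]["sd"], 1.0)
    assert result["B"]["sd"] == 0.0

if __name__ == "__main__":
    test_summarise()
    print("ok")
```

Kept under version control alongside the data, a script like this is also what a code buddy can review and what a methods section can point to.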

Now what?

“Thanks for undermining me. Now what do I do about it?”
Other data representations are available (UseR!)
Data visualisation/statistics/programming training courses

Other visualisations

Anscombe’s Quartet

Four datasets: near-identical means and standard deviations

geom_bar(), Source: Anscombe (1973)
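The summary statistics behind this slide can be checked in a few lines. The values below are Anscombe's published quartet (also shipped with R as the `anscombe` data frame); the Python sketch uses only the standard library.

```python
import statistics

# Anscombe's (1973) quartet: datasets I-III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (x, y) in quartet.items():
    print(
        f"{name:>3}: mean x={statistics.mean(x):.2f} sd x={statistics.stdev(x):.2f} "
        f"mean y={statistics.mean(y):.2f} sd y={statistics.stdev(y):.2f}"
    )
```

All four rows agree to two decimal places, so bars with SD error bars would be indistinguishable, even though scatterplots of the four datasets look completely different.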

Boxplots

Median, interquartile range, outliers

geom_boxplot()
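What a box plot draws can be computed directly. This Python sketch assumes the common Tukey convention (whiskers reach the most extreme points within 1.5 × IQR of the box; anything beyond is an outlier) and linear-interpolation quartiles, matching R's default quantile type 7; that this mirrors `geom_boxplot()` exactly is an assumption. The function name `box_summary` is illustrative.

```python
import statistics

def box_summary(values, coef=1.5):
    """Box-plot statistics: quartiles, whisker ends and outliers.

    Uses the 'inclusive' quantile method (linear interpolation, like R's
    default quantile type 7) and Tukey's fences at coef * IQR.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - coef * iqr, q3 + coef * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),  # most extreme non-outlying points
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

print(box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

Unlike a bar of the mean (14.5 for this input), the median and box are barely affected by the single extreme value, which is flagged separately.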

Raw data

1D scatterplots

geom_jitter()
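A jittered 1D scatterplot places each group at a fixed position and adds a small random horizontal offset so overlapping points separate. A minimal Python sketch of the idea, with hypothetical values and an illustrative function name `jitter`; it jitters only x so the measured values stay exact. In ggplot2, `geom_jitter()` jitters vertically as well by default, so setting `height = 0` is usually wise for continuous measurements.

```python
import random

def jitter(groups, width=0.2, seed=1):
    """Return (x, y) points for a 1D scatterplot with horizontal jitter.

    Each group sits at integer position 1, 2, ...; only x is jittered,
    so the measured y values are plotted unchanged.
    """
    rng = random.Random(seed)  # fixed seed: the same figure on every run
    points = []
    for position, values in enumerate(groups.values(), start=1):
        for y in values:
            points.append((position + rng.uniform(-width, width), y))
    return points

# Hypothetical CFU values for two effectors (illustrative only)
pts = jitter({"effector A": [5.1, 5.3, 4.9, 5.0], "effector B": [2.0, 8.1, 2.2, 7.9]})
for x, y in pts:
    print(f"{x:.2f}\t{y}")
```

Both groups have a mean near 5, but the plotted points show at once that effector B splits into two clusters.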

Box and raw data

Boxplots and jittered 1D scatterplots

geom_boxplot() + geom_jitter()

Violin plot

Kernel density estimate of the data

geom_violin()
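The outline of a violin is a kernel density estimate, mirrored about the group's axis. This Python sketch uses a Gaussian kernel with Silverman's rule-of-thumb bandwidth, the formula behind R's `bw.nrd0`; that these match ggplot2's violin defaults is an assumption here, and unlike R's implementation the sketch has no fallback when the data are all identical. The group values are hypothetical.

```python
import math
import statistics

def silverman_bandwidth(data):
    """Silverman's rule-of-thumb bandwidth (the formula behind R's bw.nrd0)."""
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * len(data) ** (-1 / 5)

def gaussian_kde(data, bandwidth=None):
    """Return a function estimating the probability density at y."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(data)
    norm = 1 / (len(data) * h * math.sqrt(2 * math.pi))

    def density(y):
        # Sum of Gaussian bumps, one centred on each observation
        return norm * sum(math.exp(-0.5 * ((y - d) / h) ** 2) for d in data)

    return density

# Hypothetical bimodal group (illustrative only): the density dips between the modes
f = gaussian_kde([0.8, 1.0, 1.2, 4.8, 5.0, 5.2])
for y in [1, 3, 5]:
    print(f"density at {y}: {f(y):.3f}")
```

The estimate is lower at 3 (the group mean) than at either mode, which is exactly the bimodality a bar of the mean would hide.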

Violin and raw data

Stacked, not jittered, data

geom_violin() + geom_dotplot()
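A dot plot stacks points that fall in the same bin instead of scattering them randomly. This Python sketch is a simplification: it uses fixed-width bins, closer to `geom_dotplot(method = "histodot")` than to its default dot-density binning. The values and the function name `stack_dots` are hypothetical.

```python
from collections import defaultdict

def stack_dots(values, binwidth):
    """Return (bin_centre, stack_height) for each value, for a stacked dot plot.

    Simplified fixed-width binning: each value goes into the bin
    [k * binwidth, (k + 1) * binwidth) and is stacked on top of the
    values already in that bin.
    """
    counts = defaultdict(int)
    dots = []
    for v in sorted(values):
        k = int(v // binwidth)  # index of the bin containing v
        counts[k] += 1
        dots.append(((k + 0.5) * binwidth, counts[k]))
    return dots

for centre, height in stack_dots([1.1, 1.3, 1.9, 2.2, 2.4, 2.45, 4.0], binwidth=0.5):
    print(f"bin centre {centre:.2f}: dot #{height}")
```

Because every dot is one observation, the reader can count the sample size directly, something neither a bar nor a violin shows.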

Acknowledgements

Where do ideas come from?

Christina Bergmann: “Visualization in Biology or Why #barbarplots”
- most of this presentation: RDVW presentation
- #barbarplots
Mike Croucher: “Is Your Research Software Correct?”
- most of the rest of this presentation
Software Carpentry
Data Carpentry
Software Sustainability Institute

References

Anscombe, F. J. (1973). “Graphs in Statistical Analysis.” The American Statistician, 27(1), 17–21.

Weissgerber, T. L. et al. (2015). “Beyond bar and line graphs: Time for a new data presentation paradigm.” PLoS Biology, 13(4), e1002128. doi:10.1371/journal.pbio.1002128