Building Programs With Python (2)

Leighton Pritchard

Introduction

Goal 1

  • To learn the basic concepts of programming

We use Python for this:

  • we need something ;)
  • free, well-documented, cross-platform
  • widely-used

Use what your colleagues (tend to) use

Goal 2

  • To analyse and visualise experimental data

  • Effectiveness of a new treatment for arthritis
  • Several patients, recording inflammation on each day
  • Tabular (comma-separated) data

We can do this with a little programming

  • Why programming? AUTOMATION

Setup

Setting up - 1

Before we begin…

  • make a neat working environment
  • obtain data
cd ~/Desktop
cd python-novice-inflammation

LIVE DEMO

Getting Started

Starting Jupyter

At the command-line, start Jupyter notebook:

jupyter notebook

Jupyter landing page

Create a new notebook

Motivation

  • We have code to plot values of interest from multiple datasets
  • BUT the code is long and complicated
  • It’s not flexible enough to deal with thousands of files
  • We can’t modify it easily
  • SO we need to package our code for reuse: FUNCTIONS

FUNCTIONS

What is a function?

  • Functions in code work like mathematical functions

\[y = f(x)\]

  • \(f()\) is the function
  • \(x\) is an input (or inputs)
  • \(y\) is the returned value, or output(s)

  • The output \(y\) depends in some way on the value of \(x\) - defined by \(f()\).
  • Not all functions in code take an input, or produce a usable output, but the principle is generally the same.

My first function

  • fahr_to_kelvin() to convert Fahrenheit to Kelvin

\[f(x) = ((x - 32) \times \frac{5}{9}) + 273.15\]

LIVE DEMO

Calling the function

  • Calling fahr_to_kelvin() is the same as calling any other function
print('freezing point of water:', fahr_to_kelvin(32))
print('boiling point of water:', fahr_to_kelvin(212))

LIVE DEMO

Composing functions

  • Composing Python functions works like mathematical functions: \(y = f(g(x))\)
  • Suppose there’s a function converting K to C: kelvin_to_celsius()
  • We could convert F (temp_f) to C (temp_c) by executing the code:
temp_c = kelvin_to_celsius(fahr_to_kelvin(temp_f))

LIVE DEMO

New functions from old

  • We can wrap this composed function inside a new function: fahr_to_celsius:
def fahr_to_celsius(temp_f):
    return kelvin_to_celsius(fahr_to_kelvin(temp_f))
print('freezing point of water in Celsius:', fahr_to_celsius(32.0))
  • This is how programs are built: combining small bits into larger bits until the function we want is obtained

LIVE DEMO

Exercise 01 (10min)

Can you write a function called outer() that:

  • takes a single string argument
  • returns a string comprising only the first and last characters of the input, e.g.
print(outer("helium"))
hm

Scope

  • Variables defined within a function, including parameters, are not ‘visible’ outside the function
  • This is called function scope
a = "Hello"

def my_fn(a):
  a = "Goodbye"

my_fn(a)  
print(a)

LIVE DEMO

Exercise 02 (10min)

What would be printed if you ran the code below?

a = 3
b = 7

def swap(a, b):
    temp = a
    a = b
    b = temp

swap(a, b)
print(b, a)
  1. 7 3
  2. 3 7
  3. 3 3
  4. 7 7

ANALYSIS

Tidying Up

  • Now we can write functions
  • Let’s make the inflammation analysis easier to reuse: one function per operation
%pylab inline

import matplotlib.pyplot
import numpy as np
import os
import seaborn

analyse()

  • We’ll write a function called analyse() that plots the data
def analyze(data):
    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)

    axes1.set_ylabel('average')
    axes1.plot(numpy.mean(data, axis=0))

    axes2.set_ylabel('max')
    axes2.plot(numpy.max(data, axis=0))

    axes3.set_ylabel('min')
    axes3.plot(numpy.min(data, axis=0))

    fig.tight_layout()
    matplotlib.pyplot.show()

LIVE DEMO

detect_problems()

  • We noticed that some data was questionable
  • This function spots problems with the data
def detect_problems(data):
    if numpy.max(data, axis=0)[0] == 0 and numpy.max(data, axis=0)[20] == 20:
        print('Suspicious looking maxima!')
    elif numpy.sum(numpy.min(data, axis=0)) == 0:
        print('Minima add up to zero!')
    else:
        print('Seems OK!')

LIVE DEMO

Code reuse

  • Loop over the files and analyse() and detect_problems()
filenames = [os.path.join('data', f) for f in os.listdir('data')
             if f.startswith('inflammation')]
             
for fname in filenames:
    data = np.loadtxt(fname, delimiter=",")
    print(fname)
    analyse(data)
    detect_problems(data)       

LIVE DEMO

TESTING AND DOCUMENTATION

Motivation

  • Once written, functions are reused
  • Functions are reused without further checks
  • When functions are written:
    • test for correctness
    • document their function
  • Example: centring a numerical array
def centre(data, desired):
    return (data - np.mean(data)) + desired

LIVE DEMO

Test datasets

  • We could try centre() on real data
    • but we don’t know the answer!
  • Use numpy to create an artificial dataset
z = np.zeros((2, 2))
print(centre(z, 3.0))

LIVE DEMO

Real data

  • On real data…
data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')
print(centre(data, 0))
  • But how do we know it worked?

LIVE DEMO

Check properties

  • We can check properties of the original and centred data
  • mean, min, max, std
centred = centre(data, 0)
print('original min, mean, and max are:', numpy.min(data), numpy.mean(data), numpy.max(data))
print('min, mean, and max of centered data are:', numpy.min(centred),
      numpy.mean(centred), numpy.max(centred))
print('std dev before and after:', numpy.std(data), numpy.std(centred))      

LIVE DEMO

Documenting functions

  • Writing comments in the code (using the hash #) is a good thing
  • Python provides for docstrings
    • These go after the function definition
    • Hook into Python’s help system
def centre(data, desired):
    """Returns the array in data, recentered around the desired value."""
    return (data - numpy.mean(data)) + desired

help(centre)

LIVE DEMO

Default arguments

  • The centre() function requires two arguments
  • We can specify a default argument in how we define the function
def centre(data, desired=0.0):
    """Returns the array in data, recentered around the desired value.
    
    Example: centre([1, 2, 3], 0) => [-1, 0, 1]
    """
    return (data - np.mean(data)) + desired
centre(data, 0.0)
centre(data, desired=0.0)
centre(data)

LIVE DEMO

Exercise 03 (15min)

Can you write a function called rescale() that

  • takes an array as input
  • returns an array with values scaled in the range [0.0, 1.0]
  • has an informative docstring
  • HINT: If L and H are the lowest and highest values in the original array, then the replacement for a value v should be (v-L) / (H-L).

ERRORS AND EXCEPTIONS

Create a new notebook

Errors

  • Programming: the process of making errors and correcting them until the code works
  • All programmers make errors
  • Identifying, fixing, and coping with errors is a valuable skill

Traceback

  • Python tries to tell you what has gone wrong by providing a traceback
def favourite_ice_cream():
    ice_creams = [
        "chocolate",
        "vanilla",
        "strawberry"
    ]
    print(ice_creams[3])

favourite_ice_cream()

LIVE DEMO

Parts of a traceback

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-1-b0e1f9b712d6> in <module>()
      8     print(ice_creams[3])
      9 
---> 10 favourite_ice_cream()

<ipython-input-1-b0e1f9b712d6> in favourite_ice_cream()
      6         "strawberry"
      7     ]
----> 8     print(ice_creams[3])
      9 
     10 favourite_ice_cream()

IndexError: list index out of range
  • (mostly, you can just look at the last couple of levels)

LIVE DEMO

Syntax errors

  • Logic errors occur when the code is ‘correct’ but does something ‘illegal’
  • Syntax errors occur when the code is not understandable as Python
def some_function()
    msg = "hello, world!"
    print(msg)
     return msg

LIVE DEMO

Syntax traceback

  File "<ipython-input-3-dbf32ad5d3e8>", line 1
    def some_function()
                       ^
SyntaxError: invalid syntax

LIVE DEMO

Fixed?

def some_function():
    msg = "hello, world!"
    print(msg)
     return msg

LIVE DEMO

Not quite

  File "<ipython-input-4-e169556d667b>", line 4
    return msg
    ^
IndentationError: unexpected indent

LIVE DEMO

Name errors

  • NameErrors occur when a variable is not defined in scope
  • (often due to a typo!)
print(a)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-5-c5a4f3535135> in <module>()
----> 1 print(a)

NameError: name 'a' is not defined

LIVE DEMO

Index Errors

  • If you try to access an element of a collection that does not exist, you’ll get an IndexError
letters = ['a', 'b', 'c']
print("Letter #1 is", letters[0])
print("Letter #2 is", letters[1])
print("Letter #3 is", letters[2])
print("Letter #4 is", letters[3])

Letter #1 is a
Letter #2 is b
Letter #3 is c
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-656a22fa6ec5> in <module>()
      3 print("Letter #2 is", letters[1])
      4 print("Letter #3 is", letters[2])
----> 5 print("Letter #4 is", letters[3])

IndexError: list index out of range

LIVE DEMO

Exercise 04 (15min)

  • Can you read the code below, and (without running it) identify what the errors are?
  • Can you fix all the errors so the code prints abbabbabba?
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) == 0:
        message = message + a
    else:
        message = message + "b"
print(message)

DEFENSIVE PROGRAMMING

Create a new notebook

Defensive programming

  • We’ve focused on the basics of building code: variables, loops, functions, etc.
  • We’ve not focused on whether the code is ‘correct’
  • Defensive programming is expecting your code to have mistakes, and guarding against them
  • Write code that checks its own operation

Assertions

  • Assertions are a Pythonic way to see if code runs correctly
    • 80-90% of the Firefox source code is assertions!
  • We assert that a condition is True
    • If it’s True, the code may be correct
    • If it’s False, the code is not correct
assert <condition>, "Some text describing the problem"

Example assertion

numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
total = 0.0
for n in numbers:
    assert n > 0.0, 'Data should only contain positive values'
    total += n
print('total is:', total)

QUESTION: What does this assertion do?