In this workshop we're using a software platform called the Jupyter Notebook, which lets you run Python code inside your web browser. For example, click on the next cell and press Ctrl+Enter to run this snippet of Python:
print("Hello world")
Python can also be run interactively at the command line in your terminal window, where >>> represents the interactive Python prompt and quit() is the simplest way to exit.
$ python
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 12:04:33)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print("Hello world")
Hello world
>>> quit()
More commonly, people write Python scripts. These are plain text files, usually ending with the .py extension, which can be run like this:
$ python example.py
...
Any self-contained snippet of Python (including many of the examples here) could be run this way, but doing it in the notebook is in many ways easier, especially for interactive work and for keeping notes alongside the code. Jupyter notebooks really shine when producing graphics with Python, as we will see this afternoon.
Python strings can be defined using either double quotes or single quotes. It doesn't matter which you use, but the opening and closing quotes have to match. Strings can be added together (concatenated) with the + operator, or repeated by multiplying by an integer:
name = "Hello"
message = name + " world"
print(message)
print(message * 3)
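The following sketch checks that the two quoting styles are interchangeable and that * repeats a string. You could run it in a new cell. The variable names are only for illustration.

```python
# Single and double quotes produce exactly the same string value
single = 'Hello'
double = "Hello"
print(single == double)  # True

# * repeats a string a whole number of times; + joins strings together
separator = "-" * 5
banner = separator + " Hello world " + separator
print(banner)  # ----- Hello world -----

# Using double quotes lets a single quote (apostrophe) appear inside without escaping
quote = "It's easy"
print(quote)
```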
It is very common to want to combine strings together, often including numbers or other values. A widely used approach to string formatting uses percent-sign placeholders:
%s to insert a string
%i to insert an integer
%f to insert a floating point number
(This convention is borrowed from the C programming language, which was enormously influential in later programming language design.)
name = "Peter"
message = "Hello %s, your name has %i letters" % (name, len(name))
print(message)
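A few more details of %-formatting are easy to trip over, so here is a small sketch. On its own, %f shows six digits after the decimal point, and a precision such as %.2f rounds the number. A single value can be given directly. Several values must be given as a tuple, in the same order as the placeholders. The example values are made up for illustration.

```python
pi_estimate = 3.14159

# %f on its own always shows six decimal places
default_text = "Pi is roughly %f" % pi_estimate
print(default_text)  # Pi is roughly 3.141590

# %.2f rounds to two decimal places
short_text = "Pi is roughly %.2f" % pi_estimate
print(short_text)  # Pi is roughly 3.14

# A single value does not need to be wrapped in a tuple
single_text = "Hello %s" % "Sue"
print(single_text)

# Several values go in a tuple, matched to the placeholders left to right
seq_text = "Sequence %s has %i bases" % ("ACGT", 4)
print(seq_text)
```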
The Python list serves as a general-purpose data structure for holding an ordered collection of values. This is similar to an 'array' in other languages.
You can have lists of strings, lists of integers, etc. The length of a list is defined as the number of elements in the list.
names = ["Peter", "Sue", "Leighton"]
print(len(names))
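Two other everyday list operations are picking out individual elements by position and growing the list. This sketch uses a separate list called people, so that it does not change the names list used in later cells. The extra name "Ana" is just an example. Note that Python counts positions from zero.

```python
people = ["Peter", "Sue", "Leighton"]

# Elements are indexed from zero; negative indices count back from the end
print(people[0])   # Peter
print(people[-1])  # Leighton

# append adds one element to the end, so the length grows by one
people.append("Ana")
print(len(people))  # 4
```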
Most programming languages, including Python, have several ways to repeat a block of code multiple times. Python's for loop works with a loop variable (letter in the example below) which takes in turn each of the values to be looped over (here the letters in the string variable message):
message = "Hello world"
for letter in message:
    print(letter)
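A loop can also do more than print each value. For example, it can keep a running count as it goes. This sketch counts how many times the letter "l" appears in the message. It uses an if statement, which has not been introduced above; the indented line under it only runs when the condition is true.

```python
message = "Hello world"
count = 0

# Each time round the loop, letter holds the next character of message
for letter in message:
    if letter == "l":
        count = count + 1

print(count)  # 3
```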
Another common situation is to loop over a list of values:
for value in ["alpha", "beta", "gamma", "delta"]:
    print(value)
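When looping over a list, it is often useful to accumulate a result or to know each element's position. The built-in enumerate function provides that position. Here is a short sketch of both ideas, using the same Greek letter names as above.

```python
greek = ["alpha", "beta", "gamma", "delta"]

# Add up the lengths of all the words in the list
total = 0
for value in greek:
    total = total + len(value)
print(total)  # 19

# enumerate yields (position, value) pairs, counting from zero
for position, value in enumerate(greek):
    print(position, value)
```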
Later in the workshop you'll see this syntax used with other constructs, including parsing a sequence file, where we loop over each sequence record in the file.
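To give a feel for "looping over each sequence record", here is a hand-written sketch. It assumes a simple FASTA-style layout, where a line starting with > names a record and the following lines hold its sequence. The data is made up and held in a string rather than a file, and read_fasta_records is a hypothetical helper written just for this illustration. The approach used later in the workshop may well rely on a dedicated library instead.

```python
# A tiny made-up FASTA-style example held in a string
fasta_text = """>seq1
ACGTACGT
>seq2
GGCC
TTAA
"""

def read_fasta_records(text):
    """Hypothetical helper: return a list of (identifier, sequence) pairs."""
    records = []
    identifier = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record, so save the previous one first
            if identifier is not None:
                records.append((identifier, "".join(chunks)))
            identifier = line[1:]
            chunks = []
        else:
            # Sequence lines may be split, so collect the pieces and join them later
            chunks.append(line)
    if identifier is not None:
        records.append((identifier, "".join(chunks)))
    return records

# The same for loop syntax, now looping over one record at a time
for identifier, sequence in read_fasta_records(fasta_text):
    print(identifier, len(sequence))
```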
Often, as your Python code gets longer, you will find that you repeat snippets of code. In this situation it is usually best to turn the repeated code into a function, which can be defined once and then used multiple times. That way, a fix or change only has to be made in one place.
# Python keyword def is short for define
# Here defining a function taking one argument
def make_message(name):
    length = len(name)
    # Python keyword return exits the function with this value:
    return "Hello %s, your name is %i characters long" % (name, length)
print(make_message("Peter"))
print(make_message("Sue"))
print(make_message("Leighton"))
For loops are also very important for reducing duplicated code. In this little example, rather than calling our function three times, we could do this:
# Assumes you've already executed the cells above which defined
# the list *names* and the function *make_message*
for name in names:
    print(make_message(name))
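Because make_message returns its result rather than printing it, the loop can also collect the messages in a list for later use. This sketch repeats the earlier definitions so that it runs on its own.

```python
def make_message(name):
    length = len(name)
    return "Hello %s, your name is %i characters long" % (name, length)

names = ["Peter", "Sue", "Leighton"]

# Start with an empty list and append one returned message per name
messages = []
for name in names:
    messages.append(make_message(name))

print(len(messages))  # 3
print(messages[1])    # Hello Sue, your name is 3 characters long
```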
The examples we have shown so far are functions taking a single argument, but functions can take multiple arguments. This example is a function which requires two arguments:
def letter_frequency(text, letter):
    return text.count(letter) / len(text)
sequence = "AGTGACACAGGT"
for base in "ACGT":
    print("Frequency of letter %s is %f" % (base, letter_frequency(sequence, base)))
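Small functions like letter_frequency can be built upon by other functions. As an illustration, this sketch defines a hypothetical gc_content helper that adds up the G and C frequencies. It relies on Python 3, where / always returns a floating point number even when both values are integers.

```python
def letter_frequency(text, letter):
    return text.count(letter) / len(text)

def gc_content(sequence):
    # Hypothetical helper: fraction of the sequence that is G or C
    return letter_frequency(sequence, "G") + letter_frequency(sequence, "C")

sequence = "AGTGACACAGGT"
print("GC content is %.2f" % gc_content(sequence))  # GC content is 0.50
```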
This example also introduced something new for counting the letters in a string. Python strings have lots of methods, a special kind of function that acts on the object itself and is called using the .method(...) syntax.
print(message.upper())
print(message.lower())
print(message.count("l"))
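Here are a few more built-in string methods that are handy for sequence work, shown on the DNA string used earlier. Strings cannot be changed in place, so a method like replace returns a new string and leaves the original as it was.

```python
dna = "AGTGACACAGGT"

# replace returns a new string; dna itself is unchanged
rna = dna.replace("T", "U")
print(rna)                    # AGUGACACAGGU

print(dna.startswith("AGT"))  # True
# find gives the index (from zero) of the first match
print(dna.find("CAC"))        # 5
print(dna.lower())            # agtgacacaggt
```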
We've tried to introduce a minimum of concepts and syntax here. There will be more Python examples later on, which we won't have time to explain in detail as we want to focus on the Bioinformatics instead.