MCB112: Biological Data Analysis (Fall 2017)

Section 01: Python basics

Notes by William Mallard [9/8/2017]



# Anything to the right of a hash symbol (#) is a comment.
# Use comments to document what blocks of code are doing.
# When you revisit the code you wrote last month, you'll
# be glad you left yourself some clues!

x = 1  # you can also comment lines of code like this ...

# ... but generally it's more helpful to write high-level
# descriptions of blocks of code, rather than commenting
# each line individually.


Variables are containers for data. A variable name can be any combination of letters, numbers, and underscores, as long as it does not start with a number.

You assign values to variables with an equal sign.

My_1st_Variable = 3

You can overwrite the value stored in a variable the same way.

x = 3
x = 4


A function takes some input and generates an output. You must first “define” your function, i.e., describe exactly what your function does. Once you’ve defined a function, you can “call” it. Functions can take as many or as few “arguments” as you want, and they can “return” a value when they are done.

# Define a function "add_one"
# that takes one argument,
# increments it by 1,
# and returns the result.
def add_one(x):
    return x + 1

# Call add_one with an argument of 4,
# assign the result to the variable y,
# and then print the value of y.
y = add_one(4)
print(y)

# Call add_one with an argument of 0,
# assign the result to the variable y,
# overwriting the old value stored there,
# and then print the value of y.
y = add_one(0)
print(y)

# Call add_one with an argument of 4,
# and then directly print the result,
# without first storing the value in y.
print(add_one(4))
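As a small sketch of the point that functions can take several arguments, here is a function with three of them. The name `mean_of_three` is made up for this illustration.

```python
# Hypothetical example: a function that takes three arguments
# and returns their average.
def mean_of_three(a, b, c):
    total = a + b + c
    return total / 3

m = mean_of_three(1, 2, 6)
print(m)  # 3.0
```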

Data types

Computing is all about representation and manipulation of data. As far as your laptop’s processor is concerned, the world is 1s and 0s, and it spends its existence happily flipping bits and shuffling them around. Fortunately for us humans, programming languages provide friendlier ways of representing data.

There are three main types of data that we will work with: numbers, text, and booleans.


Python handles numbers in a fairly intuitive way. It supports standard arithmetic operations, follows the usual precedence rules, and supports parentheses.

Operator Operation
+ add
- subtract
* multiply
/ divide
% modulus
** exponentiate

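Multiplication, division, and modulus are evaluated before addition and subtraction, and exponentiation is evaluated before all of them. When there are two operations in a row from the same precedence class, they are evaluated left to right (except for **, which groups right to left). You can override this with parentheses.

E.g., in the first line below, the multiplication is evaluated first and the additions second; in the second line, the parentheses force the additions to happen first.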

print(1 + 2 * 3 + 4)
print((1 + 2) * (3 + 4))
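To make the precedence rules concrete, here is a short sketch that stores each result in a variable (the variable names are arbitrary) so you can check them one at a time.

```python
a = 1 + 2 * 3 + 4       # multiplication first: 1 + 6 + 4
b = (1 + 2) * (3 + 4)   # parentheses first: 3 * 7
c = 10 - 4 - 3          # same precedence, left to right: (10 - 4) - 3
d = 2 ** 3 ** 2         # ** groups right to left: 2 ** (3 ** 2)

print(a, b, c, d)       # 11 21 3 512
```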

When you assign a number to a variable, Python keeps track of the fact that the variable contains a number, and it will allow you to perform numeric operations on that variable.

x = 5  # x contains a number
y = 3  # y contains a number
z = x + y  # so python lets you add them

There are two types of numbers: integers (ints) and floating point numbers (floats).

So far we’ve been using integers – whole numbers, without decimals. This is fine for things like counting, but what happens when we start dividing things?

print(5 / 2)

Note that 5 and 2 are ints, but in Python 3 the / operator always performs floating point division, so the resulting answer is a float (2.5).
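The sketch below shows the difference between the two number types in practice. It also uses the floor-division operator `//`, which is not covered in these notes; it is included only as a contrast to `/`.

```python
q = 5 / 2     # true division: the result is always a float
f = 5 // 2    # floor division: rounds down, ints stay ints
r = 5 % 2     # modulus: the remainder after division

print(q, f, r)            # 2.5 2 1
print(type(q), type(f))   # <class 'float'> <class 'int'>
print(4 / 2)              # 2.0 (still a float, even though it divides evenly)
```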


A string is a sequence of text characters (letters, numbers, symbols, spaces, etc) surrounded by double or single quotes.

x = "This is a string."
print(x)
	This is a string.

Just as numbers have intuitive operations defined on them (ie, arithmetic), strings also have various operations defined on them.

We can concatenate two strings:

"abc" + "def" + "ghi"

We can ask how long a string is:

len("abcdefghi")

We can convert string representations of numbers into actual numbers:

int('1') + float('2.5')

or vice versa:

str(1 + 2)
str(5 / 2)
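Putting the string operations together, a minimal sketch might look like this. The variable names are arbitrary.

```python
first = "abc"
second = "def"
combined = first + second       # concatenation
print(combined)                 # abcdef
print(len(combined))            # 6

total = int("1") + float("2.5") # strings converted to numbers
print(total)                    # 3.5

label = "total = " + str(total) # number converted back to a string
print(label)                    # total = 3.5
```

Note that the `str()` call is needed in the last step: adding a string and a number directly raises an error.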


A boolean is a data type with only two possible values: True or False. Whenever you perform a comparison, the result is a boolean.

Operator Operation
== equal
!= not equal
< less than
<= less than or equal to
> greater than
>= greater than or equal to

Note that = is used for assignment, while == is used to test for equality.

For example, you can compare numbers:

1 < 2
3 >= 3
4 == 5
7 != 8

You can also compare text:

'apple' == 'apple'
'apple' != 'orange'

Comparisons will become useful when we get to control flow.
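Because the result of a comparison is an ordinary value, you can store it in a variable. A brief sketch, with made-up variable names:

```python
is_small = 1 < 2
same_word = 'apple' == 'Apple'   # string comparison is case-sensitive

print(is_small)         # True
print(same_word)        # False
print(type(is_small))   # <class 'bool'>
```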

Data Structures

We can organize collections of data into structures with various useful properties. Here we cover the three most useful examples.

Structure Description
List an ordered set of items
Set an unordered set of unique items
Dict an unordered set of pairs of items


A list is an ordered set of items.

The items of a list can be anything – numbers, strings, booleans … or even more lists, sets, or dicts.

# Here is a list of numbers:
x = [1, 2, 3, 4, 5]

# and a list of characters:
y = ['w', 'o', 'r', 'd']

# and a mixed list of numbers and strings:
z = [1, 'apples', 2, 'bananas']

# Here is a list of lists:
L = [[1, 2, 3], ['x', 'y', 'z']]

Lists support comparisons.

L1 = [1, 2, 3]
L2 = [1, 2, 3]
L3 = [1, 3, 2]

print(L1 == L2)
print(L1 == L3)

Lists support a number of operations.

Ask how many items are in a list:

L = ['apple', 'banana', 'cherry', 'durian', 'elderberry']
print(len(L))


Add elements to a list:

L = ['apple', 'banana', 'cherry']
L.append('durian')
print(L)
	['apple', 'banana', 'cherry', 'durian']

Because lists keep track of the order of their elements, each element can be accessed by its “index”, the number representing that element’s distance from the first position in the list. Item numbering starts from zero, as in C, Java, and most other languages; Matlab and R, which start from one, are notable exceptions.

Retrieve an element from a list:

L = ['apple', 'banana', 'cherry', 'durian', 'elderberry']
print(L[0])
print(L[2])

Slicing – retrieve a subset of items from a list.

x = ['apple', 'banana', 'cherry', 'durian', 'elderberry']
print(x[1:4])
print(x[2:])
print(x[-2:])
	['banana', 'cherry', 'durian']
	['cherry', 'durian', 'elderberry']
	['durian', 'elderberry']

Slicing is done with a colon: the number before the colon is the index of the first element you want, and the number after the colon is the index of the element _after_ the last one you want. In this example, we wanted elements 1 through 3, so we slice [1:4]. If we want to take everything from item #2 onward, we would use [2:], i.e., just leave off the end index. We can also use negative numbers to index from the end of the list, so if we want to take the last two items, we use [-2:].
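The indexing and slicing rules above can be sketched in one place. The list name `fruits` is arbitrary.

```python
fruits = ['apple', 'banana', 'cherry', 'durian', 'elderberry']

print(fruits[0])     # apple (the first item has index 0)
print(fruits[-1])    # elderberry (negative indices count from the end)
print(fruits[1:4])   # ['banana', 'cherry', 'durian']
print(fruits[:2])    # ['apple', 'banana'] (leaving off the start means "from the beginning")
print(fruits[-2:])   # ['durian', 'elderberry']
```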

Sort a list.

L = [1, 4, 7, 2, 5, 8, 3, 6]
print(sorted(L))

L = ['jim', 'bob', 'jane', 'alice', 'john', 'charlie']
print(sorted(L))
	[1, 2, 3, 4, 5, 6, 7, 8]
	['alice', 'bob', 'charlie', 'jane', 'jim', 'john']
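There are two common ways to sort a list, and it is worth knowing the difference. In this sketch (with a made-up list called `scores`), `sorted()` returns a new sorted list and leaves the original alone, while the list's own `sort()` method reorders the list in place.

```python
scores = [4, 1, 3]

ordered = sorted(scores)   # new list; scores is unchanged
print(ordered)             # [1, 3, 4]
print(scores)              # [4, 1, 3]

scores.sort()              # sorts scores itself, in place
print(scores)              # [1, 3, 4]
```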

Check if a value is in our list.

L = [1, 4, 7, 2, 5, 8, 3, 6]
print(3 in L)
print('apple' in L)


A set is an unordered set of unique items. Sets are useful for any sort of task where you would use a Venn diagram.

S = {1, 2, 3, 4, 5, 6, 7, 8}

Sets support comparisons.

S1 = {1, 2, 3}
S2 = {1, 3, 2}
S3 = {1, 2, 3, 2, 2, 2}
S4 = {1, 2}

print(S1 == S2)
print(S1 == S3)
print(S1 == S4)
	True
	True
	False

Note that since sets contain unique items, you can add the same value to a set multiple times, and the set will only record that it contains that value.

As with lists, you can check if an item is in a set:

S = {1, 2, 3}
print(3 in S)
print(9 in S)
print('apple' in S)

Sets support union, intersection, difference, and symmetric difference.

S1 = {1, 2, 3, 4, 5}
S2 = {3, 4, 5, 6, 7}

# union:
S1 | S2

# intersection:
S1 & S2

# difference:
S1 - S2

# symmetric difference:
S1 ^ S2
	{1, 2, 3, 4, 5, 6, 7}
	{3, 4, 5}
	{1, 2}
	{1, 2, 6, 7}
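As a sketch of the Venn-diagram use case, suppose you have two sets of gene names from two experiments. The names below are invented for illustration. Since sets are unordered, the example sorts the results before printing them.

```python
# Hypothetical gene names, made up for this example.
genes_a = {'geneA', 'geneB', 'geneC'}
genes_b = {'geneB', 'geneC', 'geneD'}

both = genes_a & genes_b      # in both experiments
only_a = genes_a - genes_b    # in the first experiment only
either = genes_a | genes_b    # in at least one experiment

print(sorted(both))     # ['geneB', 'geneC']
print(sorted(only_a))   # ['geneA']
print(len(either))      # 4
```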


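Dictionaries map one item to another item. We call these keys and values, and a dict is just an unordered set of key-value pairs. (Since Python 3.7 a dict does remember the order in which keys were added, but you still look values up by key, not by position.)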

You can build a dict all at once:

office_hours = {
    'Alice': 'Mon 3-4pm',
    'Bob': 'Tue 4-5pm',
    'Charlie': 'Wed 10-11am',
}
print(office_hours)
	{'Alice': 'Mon 3-4pm', 'Bob': 'Tue 4-5pm', 'Charlie': 'Wed 10-11am'}

You can build a dict one entry at a time:

office_hours = {}
office_hours['Alice'] = 'Mon 3-4pm'
office_hours['Bob'] = 'Tue 4-5pm'
office_hours['Charlie'] = 'Wed 10-11am'
print(office_hours)
	{'Alice': 'Mon 3-4pm', 'Bob': 'Tue 4-5pm', 'Charlie': 'Wed 10-11am'}

You can access dict values by key:

office_hours = {
    'Alice': 'Mon 3-4pm',
    'Bob': 'Tue 4-5pm',
    'Charlie': 'Wed 10-11am',
}

print('Alice:', office_hours['Alice'])

You can update/overwrite values in a dict:

office_hours = {
    'Alice': 'Mon 3-4pm',
    'Bob': 'Tue 4-5pm',
    'Charlie': 'Wed 10-11am',
}
print(office_hours)

office_hours['Charlie'] = 'Fri 4-5pm'
print(office_hours)
	{'Alice': 'Mon 3-4pm', 'Bob': 'Tue 4-5pm', 'Charlie': 'Wed 10-11am'}
	{'Alice': 'Mon 3-4pm', 'Bob': 'Tue 4-5pm', 'Charlie': 'Fri 4-5pm'}

Similarly to lists and sets, you can check if a key is in a dict:

office_hours = {
    'Alice': 'Mon 3-4pm',
    'Bob': 'Tue 4-5pm',
    'Charlie': 'Wed 10-11am',
}

'Alice' in office_hours
'Dave' in office_hours
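Checking for a key matters because looking up a key that is not in the dict raises a KeyError. The sketch below checks membership before and after adding an entry. The entry for 'Dave' is made up for this example.

```python
office_hours = {'Alice': 'Mon 3-4pm', 'Bob': 'Tue 4-5pm'}

print('Alice' in office_hours)      # True
print('Dave' in office_hours)       # False

office_hours['Dave'] = 'Thu 1-2pm'  # hypothetical new entry
print('Dave' in office_hours)       # True
print(len(office_hours))            # 3
print(office_hours['Dave'])         # Thu 1-2pm
```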

Control Flow

So far we’ve treated scripts as a list of Python commands that run sequentially – every line runs, one after another, from top to bottom. But sometimes we need to skip chunks of code under certain conditions, or run different chunks of code depending on some condition. And sometimes we want to run the same chunk of code multiple times.


We use if statements to control the path our data follows through our code.

x = 5

if x > 0:
    print('x is greater than zero')
	x is greater than zero

We can also specify multiple conditions with elif (short for else if). An if statement can have as many elif statements as you want.

Each if/elif statement is evaluated in sequence, and only the code for the first condition that evaluates to True is run – and then Python jumps to the end of the if statement, and continues through your script line by line as usual.

We can also use else to specify what to do if none of the if or elif statements evaluate to True. An if statement can only have one else statement, and it must come last.

You can try this in a Python shell running in Terminal:

x = int(input("Please enter an integer: "))

if x < 0:
    x = 0
    print('changed negative x to zero')
elif x == 0:
    print('x is zero')
elif x == 1:
    print('x is one')
else:
    print('x is more than one')
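If you would rather not type input interactively, the same if/elif/else chain can be wrapped in a function and called with a few test values. This is only a sketch; the function name `describe` is made up here.

```python
# Hypothetical helper: returns a description of an integer x.
def describe(x):
    if x < 0:
        return 'negative'
    elif x == 0:
        return 'zero'
    elif x == 1:
        return 'one'
    else:
        return 'more than one'

print(describe(-3))  # negative
print(describe(0))   # zero
print(describe(1))   # one
print(describe(7))   # more than one
```

Only the first condition that evaluates to True is run, so `describe(0)` never reaches the `x == 1` test.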


We use loops to run chunks of code multiple times. Each trip through the loop is called an iteration. We will mostly use for loops. In Python, for loops iterate over a list of items like so:

people = ['Alice', 'Bob', 'Charlie']

for person in people:
    print(person)

We can combine iteration and conditionals to do things like filter lists.

people = ['Alice', 'Bob', 'Charlie']

for person in people:
    if person[-1] == 'e':
        print(person)
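Instead of printing each match, you can collect the matches in a new list. A minimal sketch, assuming you want to keep the names ending in 'e' (the list name `ends_in_e` is arbitrary):

```python
people = ['Alice', 'Bob', 'Charlie']
ends_in_e = []

for person in people:
    if person[-1] == 'e':          # last character of the name
        ends_in_e.append(person)   # keep this one

print(ends_in_e)  # ['Alice', 'Charlie']
```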



We’ve been using the print() command without any discussion because its basic usage is fairly intuitive. Call print() with an argument, and it displays the value of that argument as text on your screen.

Sometimes you want to print to a file. It turns out print has an optional argument, “file”, which we’ll cover in the next two sections. But first, we’ll cover string formatting.

If you have multiple variables, and you want to combine them all into a single string with a specific formatting, say, for writing to a file, you’ll want to use the format() command. This command is built into all strings, so you call it like this:

S1 = "{}"
S2 = "{} + {}"
S3 = "{} = {} + {}"

x = 3
y = 2
z = 1

print(S2.format(y, z))
print(S3.format(x, y, z))
	2 + 1
	3 = 2 + 1

The pairs of braces ({}) indicate where the values should be substituted into the resulting string. If you provide empty braces with nothing in between, Python will figure out the data type and use the default formatting for that data type.

If for some reason you need to specify some particular formatting:

Op Result
{:d} int
{:f} float
{:s} str
{:10s} str, in a field 10 characters wide
{:.3f} float, with 3 digits past the decimal
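Here is a short sketch of the format codes from the table in action. The variable names and values are invented for illustration.

```python
name = 'cherry'
count = 12
ratio = 2 / 3

line_d = '{:d} items'.format(count)
line_f = '{:.3f}'.format(ratio)
line_s = '[{:10s}]'.format(name)
line_all = '{:s}: {:d} ({:.1f}%)'.format(name, count, 100 * ratio)

print(line_d)     # 12 items
print(line_f)     # 0.667
print(line_s)     # [cherry    ]
print(line_all)   # cherry: 12 (66.7%)
```

The brackets in the third line are there only to make the padding visible: the string is padded with spaces on the right to fill a field 10 characters wide.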

Reading Files

To read a file, we use the open() command. By default, this command opens files in read mode. The open() command returns a file object (sometimes loosely called a file descriptor or file handle). Since we opened the file in read mode, you can iterate over the file object to retrieve the file’s contents line by line.

L = []

for line in open('name_of_data_file_to_read'):
    L.append(line)

When reading lines of text from a file, each line includes the newline character (\n) at the end. You can strip this from the end using the strip() command.

line = "cherries are red\n"
line = line.strip()
# line is now: "cherries are red"

You can also split a line of text into a list of fields using the split() command.

line = "cherries are red\n"
row = line.strip().split()
print(row)
	['cherries', 'are', 'red']
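The reading steps above can be combined into one sketch. To keep it self-contained, the example first writes a small temporary file; the file name and contents are made up, and the `with` statement and the `file` argument to print() are explained in the Writing Files section.

```python
import os
import tempfile

# Create a small example file (hypothetical name and contents).
path = os.path.join(tempfile.mkdtemp(), 'fruit_colors.txt')
with open(path, 'w') as fd:
    print('cherries are red', file=fd)
    print('bananas are yellow', file=fd)

# Read it back: strip the newline from each line, then split into fields.
rows = []
with open(path) as fd:
    for line in fd:
        rows.append(line.strip().split())

print(rows)  # [['cherries', 'are', 'red'], ['bananas', 'are', 'yellow']]
```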

Writing Files

To write to a file, you must open your file in write mode. Specify you want to open the file this way by passing open() a second argument of “w”.

If the file you specified does not already exist, calling open("your_filename", "w") will create an empty file. If the file already exists, open() will overwrite it with an empty file.

To fill the file line by line, iterate over the data you want to write to the file, and at each iteration use print() with the “file” argument set.

L = ['my first line',
     'my second line',
     'my third line']

fd = open('name_of_file_to_write', 'w')
for line in L:
    print(line, file=fd)
fd.close()

Note that we are using print with an extra argument: “file=”.

Alternatively, if you want Python to take care of closing the file for you automatically:

L = ['my first line',
     'my second line',
     'my third line']

with open('name_of_file_to_write', 'w') as fd:
    for line in L:
        print(line, file=fd)
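String formatting and file writing work well together. The sketch below writes a small two-column table to a temporary file and then reads it back to check the result. The dict `counts` and the file name are invented for this example.

```python
import os
import tempfile

counts = {'apple': 3, 'banana': 12}   # hypothetical data
path = os.path.join(tempfile.mkdtemp(), 'counts.txt')

# Write one formatted line per entry: name in a 10-character field, then the count.
with open(path, 'w') as fd:
    for name in sorted(counts):
        print('{:10s}{:d}'.format(name, counts[name]), file=fd)

# Read the file back, dropping the trailing newline from each line.
lines = []
with open(path) as fd:
    for line in fd:
        lines.append(line.rstrip('\n'))

print(lines)  # ['apple     3', 'banana    12']
```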