# Section 01: Jupyter and Python basics

(From Friday 9/9/16 lecture by Tim Dunn)

### Jupyter Notebook

Jupyter notebooks can be launched by typing jupyter notebook into the command line and are typically edited in a web browser. Jupyter notebooks are organized into individual cells that can be used to group and organize code, plots, text, and equations. Ultimately, a Jupyter notebook should deliver a clear, step-wise narrative that effectively communicates your code and data to yourself and others.

As programming environments go, Jupyter Notebook is fairly simple. There are only a handful of keyboard shortcuts you will need to learn to use Jupyter effectively, and there are only two different types of cells you will use: markdown and code.

#### Code Cells

If you open a new Jupyter notebook, you’ll be greeted with a single input cell set to code mode. Anything typed into a code cell needs to be Python code, and Jupyter will use the Python interpreter to run the code in the cell and print its output just below the code itself. Code in a code cell can be run using the cell menu at the top of the page, or using one of the following three keyboard shortcuts:

• CTRL + ENTER : Run code in selected cell
• SHIFT + ENTER : Run code in selected cell, move to next cell
• ALT + ENTER : Run code in selected cell, insert new cell below

# This is a code cell.
# The Python interpreter will run this code, and Jupyter will print its output just below.
import numpy as np #Allows easy-to-use matrices
import matplotlib.pyplot as plt #Allows us to use all main plotting tools

# Makes sure plots and graphs are printed directly into the notebook file
%matplotlib inline

# Manually draw message
x_narrow = np.asarray(range(10,21))
x_wide = np.asarray(range(5,41))
y_narrow = np.zeros((1,11), dtype=bool)
y_wide = np.zeros((1,36), dtype=bool)
im = np.zeros((50,50))
im[y_narrow + 5, x_narrow + 20] = 1
im[y_narrow + 40, x_narrow + 20] = 1
im[x_wide, y_wide + 35] = 1
im[x_wide, y_wide + 9] = 1
im[x_wide, y_wide + 19] = 1
im[y_narrow + 21, x_narrow - 1] = 1

# Display output
plt.imshow(im, cmap = plt.cm.gray)
plt.show()

#### Markdown Cells

Often, you’ll want to surround code with some text as part of an explanation or narrative. Text (and other stylistic flourishes) can be added using a cell in markdown mode. A cell can be switched into a markdown cell by using the dropdown menu at the top of the page or by pressing ESC and then M. Text in markdown cells follows the markdown specification for text and page formatting. This cheatsheet will be helpful when using markdown cells. To switch back over to code mode, press ESC and then Y. Markdown cells also need to be rendered using CTRL + ENTER, but their output is displayed over the input markdown syntax. To edit a markdown cell after it has been rendered, simply double click the section.

The text below is the underlying markdown for the above paragraph:

    #### Markdown Cells
    Often, you'll want to surround code with some text as part of an explanation or narrative. Text (and other stylistic flourishes) can be added using a cell in **markdown** mode. A cell can be switched into a markdown cell by using the dropdown menu at the top of the page or by pressing <kbd>ESC</kbd> and then <kbd>M</kbd>. Text in markdown cells follows the [markdown](https://daringfireball.net/projects/markdown/) specification for text and page formatting.
    This [cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) will be helpful when using markdown cells. To switch back over to code mode, press <kbd>ESC</kbd> and then <kbd>Y</kbd>. Markdown cells also need to be rendered using <kbd>CTRL</kbd> + <kbd>ENTER</kbd>, but their output is displayed over the input markdown syntax. To edit a markdown cell after it has been rendered, simply double click the section.

A few other useful keyboard shortcuts are:

• ESC and then A : Insert cell before selected cell
• ESC and then B : Insert cell after selected cell
• ESC and then D and then D : Delete selected cell
• CTRL + SHIFT + - : Split cell

See the help menu at the top of your Jupyter notebook for more keyboard shortcuts.

### LaTeX

Jupyter markdown cells also support $\LaTeX$ formatting for math and equations. To use $\LaTeX$ formatting inline, surround your statement in dollar signs (\$statement\$). This tells Jupyter to interpret the statement in math type and font.

statement becomes $statement$ in $\LaTeX$

To center equations in a cell, there is an additional command you’ll have to use.

not in $\LaTeX$: [x^2 + y^2]/z^2 = c

pre-$\LaTeX$:

\begin{align} \frac{x^2 + y^2}{z^2} = c \end{align}

processed $\LaTeX$:

\begin{align} \frac{x^2 + y^2}{z^2} = c \end{align}

Anything written between \begin{align} and \end{align} will be centered and processed as $\LaTeX$ text. In $\LaTeX$, symbols and commands generally begin with \, and groups are defined with {}. If there are multiple lines of math (separated by \\), the lines can be aligned at the equals sign using &=.

\begin{align}
\frac{x^2 + y^2}{z^2} &= c \\
x^2 + y^2 &= z^2 c
\end{align}



Now let’s render an equation from lecture this week.

\begin{align}
\tau_{i} = \frac{\nu_{i}}{\ell_{i}} \left( \sum_{j} \frac{\nu_{j}}{\ell_{j}} \right)^{-1}
\end{align}


\begin{align} \tau_{i} = \frac{\nu_{i}}{\ell_{i}} \left( \sum_{j} \frac{\nu_{j}}{\ell_{j}} \right)^{-1} \end{align}

You can use this $\LaTeX$ cheatsheet when working with your own equations, but it contains way more than you will ever need for this course. A more reasonable starting point is this table, which I’ve reproduced from a Caltech tutorial by Justin Bois.

| LaTeX | symbol |
|---|---|
| \approx | $\approx$ |
| \sim | $\sim$ |
| \propto | $\propto$ |
| \le | $\le$ |
| \ge | $\ge$ |
| \pm | $\pm$ |
| \in | $\in$ |
| \ln | $\ln$ |
| \exp | $\exp$ |
| \prod_{i\in D} | ${\displaystyle \prod_{i\in D}}$ |
| \sum_{i\in D} | ${\displaystyle \sum_{i\in D}}$ |
| \frac{\partial f}{\partial x} | ${\displaystyle \frac{\partial f}{\partial x}}$ |
| \sqrt{x} | $\sqrt{x}$ |
| \bar{x} | $\bar{x}$ |
| \hat{x} | $\hat{x}$ |
| \langle x \rangle | $\langle x \rangle$ |
| \left\langle \frac{x}{y} \right\rangle | $\left\langle \frac{x}{y} \right\rangle$ |

### Basic Python Concepts and Syntax

# Everything to the right of a '#' symbol is ignored by the Python interpreter
# It is good practice to comment all of the code you ever write.
# Even if something seems incredibly obvious, it probably won't a year later (trust me)!


#### import

The import command allows you to access Python packages for use with analysis. import only needs to be used once per package, at the beginning of a Python session or Jupyter notebook. There are a few ways to use import:

# Import the entire package
import random
random.randrange(0,10)

5

# Import the entire package but give it a shorter, easy-to-type name
import random as rn
rn.randrange(0,10)

3

# Import only a specific part of the package
from random import randrange
randrange(0,10)

8

# Import only a specific part of the package and give it a shorter, easy-to-type name
from random import randrange as rr
rr(0,10)

5


#### magic commands

In addition to its other features, Jupyter provides magic commands, a set of shortcuts that only Jupyter understands. For example, %lsmagic lists all of the available magic commands:

%lsmagic

Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cd  %clear  %cls  %colors  %config  %connect_info  %copy  %ddir  %debug  %dhist  %dirs  %doctest_mode  %echo  %ed  %edit  %env  %gui  %hist  %history  %install_default_config  %install_ext  %install_profiles  %killbgscripts  %ldir  %less  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %macro  %magic  %matplotlib  %mkdir  %more  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %ren  %rep  %rerun  %reset  %reset_selective  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%cmd  %%debug  %%file  %%html  %%javascript  %%latex  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.


One magic command that you will probably use at the beginning of every notebook for this course is:

%matplotlib inline


or

%matplotlib notebook


These commands tell Jupyter to embed all graphical output directly into the notebook, rather than plotting everything in separate pop-up windows. The latter differs from the former in that it embeds interactive plots that allow panning and zooming.

Another useful magic command is %load, which loads the contents of a file straight into a jupyter cell.

%load 'genenames_extract.py'
#! /usr/bin/python

import re
import sys

# genenames_extract.py
# Get a list of human protein-coding gene names, from a UCSC GTF annotation file.
#
# Usage:
#   ./genenames_extract.py <GTF file>
#
#
# The human genome annotation file, in GTF ("gene transfer format") format, is at:
#    ftp://ftp.ensembl.org/pub/release-85/gtf/homo_sapiens/Homo_sapiens.GRCh38.85.gtf.gz
# To get it:
#    wget ftp://ftp.ensembl.org/pub/release-85/gtf/homo_sapiens/Homo_sapiens.GRCh38.85.gtf.gz
#    gunzip Homo_sapiens.GRCh38.85.gtf.gz
#
# You could get fancier with your Python, with other standard Python modules:
#    gzip   : open a gzip-compressed file directly.
# Using both modules together, I'm pretty sure you could slurp in the UCSC compressed GTF file
# over the net.

filename = sys.argv[1]

seen = {}    # seen['genename'] = True, a dict, keeps track of whether we already have
             # this gene name or not. Alas, if we haven't seen it, seen['nosuchgene']
             # throws an exception, which is not what we want. But the seen.get('nosuchgene')
             # method returns None, so it's a better method for testing seen/not seen.

for line in open(filename):
    if line[0] == '#': continue      # Skip comment lines
    line   = line.rstrip('\n')       # Remove the trailing newline
    fields = line.split()            # Split into fields on whitespace

    if (fields[2] == 'gene'):
        # Lines of GTF files have a bunch of optional tags formatted as <key1> "<value1>"; <key2> "<value2>";
        # Here we use regexp matching to pull out the gene_biotype and gene_name tags, if they're there.
        m1 = re.search(r'gene_biotype\s*"([^"]+)";', line)    # r'...' is a raw string:
        m2 = re.search(r'gene_name\s*"([^"]+)";',    line)    #  you can use regexp metachars w/o escaping them.
        if m1 and m2:
            biotype  = m1.group(1)  # biotypes include "protein_coding"
            genename = m2.group(1)
            if biotype == 'protein_coding':
                if not seen.get(genename):
                    print(genename)
                    seen[genename] = True
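
To see what the two regular expressions and the seen.get() test in the script do, here is a minimal sketch run on a single line. The line itself (chr1, example_source, GENE0001, ABC1) is invented for illustration and is not taken from the real annotation file; only the two patterns and the dict logic come from the script.

```python
import re

# A made-up, tab-separated line in the tag style described above (not real data)
line = 'chr1\texample_source\tgene\t100\t200\t.\t+\t.\tgene_id "GENE0001"; gene_name "ABC1"; gene_biotype "protein_coding";'

fields = line.split()                                # split into fields on whitespace
m1 = re.search(r'gene_biotype\s*"([^"]+)";', line)   # same pattern as in the script
m2 = re.search(r'gene_name\s*"([^"]+)";', line)      # same pattern as in the script

print(fields[2])      # gene
print(m1.group(1))    # protein_coding
print(m2.group(1))    # ABC1

# dict.get() returns None for a missing key instead of raising a KeyError
seen = {}
first_check = seen.get(m2.group(1))     # None: this name has not been recorded yet
seen[m2.group(1)] = True
second_check = seen.get(m2.group(1))    # True: the name is now recorded
print(first_check, second_check)
```

The parentheses in each pattern mark the part of the match that group(1) returns, which is why the surrounding quotes and semicolon are not part of the extracted name.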


#### data types & structures

For this week’s homework, you’ll make use of 5 main Python data types: bools, numbers, strings (i.e. text/words), lists (i.e. lists of strings or numbers), and dicts (i.e. dictionaries, a special structure for storing data with labels).

#### boolean

A boolean, named after George Boole, is a data type that can be either True or False. Booleans will become important for making comparisons and controlling the flow of your code.
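
As a small sketch of where booleans come from in practice, the example below produces them with comparisons and combines them with and, or, and not. The variable names are made up for illustration.

```python
x = 5

# Comparisons evaluate to a boolean
is_small = x < 10       # True
is_ten = x == 10        # False

# Booleans can be combined with and, or, not
in_range = x > 0 and x < 10    # True
print(is_small, is_ten, in_range, not is_small)

# A boolean decides whether the indented code under an if statement runs
if is_small:
    print("x is less than 10")
```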

#### floats vs. ints

Throughout your journey with Python, you’ll make use of at least two types of numbers: floating points and integers. At this point, all you need to know is that floating point numbers (or floats) are numbers with decimal points (e.g. 12.34546789) and integers (or ints) are whole numbers (negative, zero, or positive) with no decimal point.

x = 10 # int
y = 1.23456789 # float

print("{}: This is a float".format(y))
print("{}: This is a float".format(x + y))
print("{}: This is a float".format(x * y))
print("{}: This is a float".format(x / y))

z = int(y) # convert float into an int by dropping the fractional part (truncation, not rounding)
print("{}: This is now an int".format(z))

print("{}: This is an int".format(x//z))
print("{}: This is an int".format(z//x))
print("{}: This is a float".format(x/z))
print("{}: This is a float".format(z/x))

1.23456789: This is a float
11.23456789: This is a float
12.3456789: This is a float
8.100000073710001: This is a float
1: This is now an int
10: This is an int
0: This is an int
10.0: This is a float
0.1: This is a float
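
One detail worth checking for yourself: int() drops the fractional part of a float rather than rounding it, while the built-in round() goes to the nearest integer. A minimal comparison, with values chosen only for illustration:

```python
truncated_pos = int(1.9)      # 1: the fractional part is dropped
truncated_neg = int(-1.9)     # -1: truncation moves toward zero
rounded_pos = round(1.9)      # 2: nearest integer
rounded_neg = round(-1.9)     # -2: nearest integer

print(truncated_pos, truncated_neg)
print(rounded_pos, rounded_neg)
```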


#### strings

A string is a piece of data that holds text, not numbers. In Python, strings are created by surrounding text with double or single quotes. Many operations can be used on strings; see this list. Importantly, if you see a number in a string, that number cannot be used as a number until it is parsed (or converted) into a number.

x = " is a string"
y = "this"
z = "9.2"

print(y + x)
print(z + x)

#z can't be used to do any math until it is converted into a number
z_ = float(z) # float() converts the string, z, into a float
print(z_)

this is a string
9.2 is a string
9.2

print(z_/10)

# to access an individual character, use its position in the string (starting from 0)
print(x[6])

0.9199999999999999
s
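
A few of the built-in string operations, as a short sketch. The example string is made up; the functions and methods used (len, lower, split, replace, in, int) are all part of standard Python.

```python
s = "ATG GCC TAA"

length = len(s)                  # number of characters, spaces included
lowered = s.lower()              # a lowercase copy; s itself is unchanged
words = s.split()                # split on whitespace into a list of strings
first_three = s[0:3]             # slicing works on strings too
has_gcc = "GCC" in s             # substring test, gives a boolean
swapped = s.replace("T", "U")    # a copy with every T replaced by U
number = int("42") + 1           # parse a string into an int before doing math

print(length, lowered, words, first_three, has_gcc, swapped, number)
```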


#### lists

A list is a convenient data structure for organizing variables. Lists group data of any type into a structure that can be sampled via indexing, where an item or items in a list are referred to by their position in the list. For instance,

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] # This is a list of numbers. [] is the syntax for list

print(x[0]) # x[i] is the item at position i; positions start at 0, so x[0] is the first item
print(x[9])

1
10

joke = ["why is 6 afraid of 7?", "because", 7, 8, 9] # This is a list of strings *and* numbers

print(joke[0])
print(joke[1])
print(joke[2])
print(joke[3])
print(joke[4])

why is 6 afraid of 7?
because
7
8
9


Python lists can also be sliced, where multiple items in the list are referred to in a single index statement. When slicing a list, : is used.

print(joke[:]) # Alone, a colon means "everything in the list"

['why is 6 afraid of 7?', 'because', 7, 8, 9]

# With numbers you get a range of items from the list, starting with the element at the
# index to the left of : and ending with the element just before the index to the
# right of : (the right-hand index itself is not included)

print(joke[0:2])

['why is 6 afraid of 7?', 'because']
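
Slices can also leave out one of the two indices, and indices can be negative to count from the end of the list. A brief sketch using the same joke list:

```python
joke = ["why is 6 afraid of 7?", "because", 7, 8, 9]

first_two = joke[:2]     # a missing left index means "from the start"
the_rest = joke[2:]      # a missing right index means "through the end"
last_item = joke[-1]     # -1 is the last item
last_two = joke[-2:]     # the final two items

print(first_two)
print(the_rest)
print(last_item)
print(last_two)
```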


Here is a short table of other useful list operations, taken from the Python tutorial.

| operation | result |
|---|---|
| x in s | True if an item of s is equal to x, else False |
| x not in s | False if an item of s is equal to x, else True |
| s[i:j:k] | slice of s from i to j with step k |
| len(s) | length of s |
| min(s) | smallest item of s |
| max(s) | largest item of s |
| s.index(x) | index of the first occurrence of x in s |
| s.count(x) | total number of occurrences of x in s |
| sorted(s) | for a list of numbers, returns a sorted list in ascending order |

nums = [5.3, 6.1, 34.0, 8.999, 8.998]
sorted(nums)

[5.3, 6.1, 8.998, 8.999, 34.0]
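
The remaining operations from the table can be tried the same way. This is a small sketch on a made-up list of integers:

```python
s = [3, 1, 4, 1, 5, 9, 2, 6]

print(4 in s)          # True
print(7 not in s)      # True
print(s[1:6:2])        # [1, 1, 9]  (positions 1, 3, 5)
print(len(s))          # 8
print(min(s), max(s))  # 1 9
print(s.index(1))      # 1  (position of the first 1)
print(s.count(1))      # 2  (there are two 1s)
print(sorted(s))       # [1, 1, 2, 3, 4, 5, 6, 9]
```

Note that sorted(s) returns a new list and leaves s in its original order.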


#### dicts

A dict, or dictionary, is a special data structure that allows you to associate values with specific keys (usually some name). The data structure takes its name from a dictionary, because a word dictionary is organized similarly: take a word, look it up in the dictionary, see some associated description. In a Python dict, you take a word (the key), look it up in the dict, and see some list of numbers or strings associated with that word (the values).

OfficeHours = {}
OfficeHours['Tim'] = ["Monday", "17:30 - 19:00", "BL1008"]
OfficeHours['Sean'] = ["Friday", "14:00 - 15:00", "BL1008"]
OfficeHours['Laura'] = ["Monday", "10:00 - 11:00", "NW463"]
OfficeHours['Chris'] = ["Tuesday", "10:30 - 11:30", "BL1008"]
OfficeHours['Jack'] = ["Thursday", "14:30 - 15:30", "NW330"]
OfficeHours['Kaia'] = ["Thursday", "15:00 - 16:00", "Bauer304"]
OfficeHours['Marco'] = ["Sunday", "16:30 - 17:30", "BL1008"]

print(OfficeHours['Sean'])

['Friday', '14:00 - 15:00', 'BL1008']

print(OfficeHours['Sean'][0]) # This is the first item of Sean's list

Friday
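
Two other things you will often do with a dict are check whether a key is present and loop over all of its keys. The sketch below uses a shortened copy of the office hours data; the variable name office_hours and the absent key "Alex" are made up for illustration.

```python
office_hours = {
    "Tim": ["Monday", "17:30 - 19:00", "BL1008"],
    "Sean": ["Friday", "14:00 - 15:00", "BL1008"],
    "Laura": ["Monday", "10:00 - 11:00", "NW463"],
}

has_tim = "Tim" in office_hours        # True: 'in' tests the keys
has_alex = "Alex" in office_hours      # False: no such key
missing = office_hours.get("Alex")     # None, where office_hours["Alex"] would raise a KeyError

# Looping over a dict visits its keys
for name in office_hours:
    day = office_hours[name][0]
    print("{} holds office hours on {}".format(name, day))
```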


#### functions and methods

Functions operate on and transform your data. Functions take some inputs (arguments) and return some outputs. In programming lingo, functions are called when you use them, and functions are passed arguments:

return1, return2 = function(argument1, argument2).

You’ll often use functions that look like this:

thing.function(argument1, argument2)

These functions are technically called methods and they are, by construction, associated with a specific object (e.g. a list) and sneakily take the object as one of the arguments.
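
A short sketch of the difference, using a made-up list. The last append is written out in its long form only to show that the object really is passed as the first argument; you would not normally write it that way.

```python
nums = [3, 1, 2]

count = len(nums)       # a function: the list is passed in explicitly
nums.append(4)          # a method: called on the list with a dot
list.append(nums, 5)    # the same method spelled out, with nums as the first argument
nums.sort()             # another method; it reorders nums in place

print(count)    # 3
print(nums)     # [1, 2, 3, 4, 5]
```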

#### for loops

Loops allow you to perform a task repeatedly with iteration, meaning after each task repetition, something changes. In a Python for loop, the loop addresses each item in a group, in sequence, until the last item in the group. For instance:

# Python's range() function produces the integers in a given range (the end value is excluded); list() collects them into a list
list(range(0,10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
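
A few more calls to range(), as a sketch for experimenting. In Python 3, range() gives back a range object rather than a list, which is why list() is used here to display its contents.

```python
r = range(0, 10)

print(type(r))                  # <class 'range'>
print(list(range(5)))           # [0, 1, 2, 3, 4]  (start defaults to 0)
print(list(range(2, 6)))        # [2, 3, 4, 5]     (the end value 6 is excluded)
print(list(range(0, 10, 2)))    # [0, 2, 4, 6, 8]  (third argument is the step)
print(len(r), r[0], r[9])       # 10 0 9           (a range can be indexed like a list)
```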

# Using range(), a for loop can be used to loop over the same code a certain number of times
for i in range(0,10):
    print("This is the {}th iteration of the loop".format(i))

This is the 0th iteration of the loop
This is the 1th iteration of the loop
This is the 2th iteration of the loop
This is the 3th iteration of the loop
This is the 4th iteration of the loop
This is the 5th iteration of the loop
This is the 6th iteration of the loop
This is the 7th iteration of the loop
This is the 8th iteration of the loop
This is the 9th iteration of the loop


Here is a breakdown of what happened in the last cell:

for i in range(0, 10):


range(0, 10) produces the sequence 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

In the first iteration of the loop, the variable i was assigned the first item in the sequence (in this case, the int 0). We then used the value assigned to i by inserting it into a printed string. After the print command, the loop moved on to the second item (the int 1) and assigned this value to i, which was again inserted into the printed string. This pattern continued until i was assigned the last item (the int 9) and used in the string. Note that at the end of the loop, i retained its final value.

i

9


More generally, a for loop can be used to address every item in any kind of list.

# In this example, the variable *name* is sequentially assigned each string from the list of strings *allnames*
allnames = ["Tim", "Laura", "Sean", "Marco", "Chris", "Jack", "Kaia"]
for name in allnames:
    print("{} is cool".format(name))

Tim is cool
Laura is cool
Sean is cool
Marco is cool
Chris is cool
Jack is cool
Kaia is cool


Make specific note of the syntax associated with each for loop. A colon ends the for statement, and each line to be grouped into the loop is indented. In Python, indentation, not “{}” or “end,” is used to group code.
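
The effect of indentation is easiest to see with one line inside the loop and one line outside it. A minimal sketch with made-up numbers:

```python
total = 0
for n in [1, 2, 3]:
    total = total + n                  # indented: runs once per item
    print("running total:", total)     # indented: runs once per item
print("final total:", total)           # not indented: runs once, after the loop ends
```

If the last line were indented as well, it would become part of the loop and print three times.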

#### conditional statements

Conditional statements allow you to write code that only runs when certain conditions are met. The most common conditional structures are if, elif, and else statements. Syntax:

for i in range(0,10):
    flip = randrange(0,11)
    if flip < 2:
        print("low")
    elif flip < 10:
        print("higher")
    else:
        print("highest")

higher
higher
low
higher
higher
higher
low
higher
higher
highest


In conditional statements, you’ll make use of the following comparison operators:

| symbol | meaning |
|---|---|
| < | less than but not equal to |
| > | greater than but not equal to |
| <= | less than or equal to |
| >= | greater than or equal to |
| == | equal to |
| != | not equal to |

A common mistake is to use = for comparison rather than the double ==. The former is telling Python to make something equal to something else. The latter is telling Python to see if two somethings are equal.
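
A small sketch of the difference, with a made-up variable. The comparison leaves x alone and produces a boolean; the assignment changes x.

```python
x = 3

result = (x == 4)     # comparison: asks a question, x is unchanged
print(x, result)      # 3 False

x = 4                 # assignment: x now holds 4
print(x, x == 4)      # 4 True
```

Writing = where == was intended inside an if statement, as in if x = 4:, is a syntax error in Python, so this particular mistake is at least reported right away.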

Note that the if, elif, and else blocks in the above code belong to the same group; once a condition is met, its associated code is run, and the interpreter skips past all other conditionals in the group.

Note how this code behaves identically to the code above:

for i in range(0,10):
    flip = randrange(0,11)
    if flip < 2:
        print("low")
    if flip >= 2 and flip < 10:
        print("higher")
    if flip >= 10:
        print("highest")

higher
higher
highest
higher
higher
higher
highest
higher
higher
higher


But this code does not:

for i in range(0,10):
    flip = randrange(0,11)
    # In this version, if flip is 0 or 1, both "low" and "higher" get printed. elif prevents this.
    if flip < 2:
        print("low")
    if flip < 10:
        print("higher")
    else:
        print("highest")

highest
low
higher
higher
higher
higher
higher
low
higher
higher
higher
higher
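
Because the examples above use random numbers, the difference is easier to pin down with fixed inputs. The sketch below runs the chained version and the version with two separate if statements on the same three values (0, 5, and 10, chosen for illustration) and collects what each would print. The list names are made up.

```python
chained_output = []
separate_output = []

for flip in [0, 5, 10]:
    # One if/elif/else group: exactly one branch runs per value
    if flip < 2:
        chained_output.append("low")
    elif flip < 10:
        chained_output.append("higher")
    else:
        chained_output.append("highest")

    # Two independent groups: a value below 2 satisfies both tests
    if flip < 2:
        separate_output.append("low")
    if flip < 10:
        separate_output.append("higher")
    else:
        separate_output.append("highest")

print(chained_output)     # ['low', 'higher', 'highest']
print(separate_output)    # ['low', 'higher', 'higher', 'highest']
```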


#### user-defined functions

When you notice a chunk of code is getting reused over and over across many analyses, it is probably time to write your own Python function. A function takes in some input and generates some output. As such, a function starts with a definition that includes its name and the types of inputs (arguments) it takes. A function ends with a return statement that signals what the function produces as output. Syntax:

# whoscool(names)
# Takes list of strings, NAMES, and returns a random string from the list
def whoscool(names):
    numberOfNames = len(names)
    flip = randrange(0, numberOfNames)
    name = names[flip]
    return name

allnames = ["Tim", "Laura", "Sean", "Marco", "Chris", "Jack", "Kaia"]

# Use whoscool() to see who is cool
print("{} is cool".format(whoscool(allnames)))


Tim is cool
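
A function can also return more than one value, separated by commas in the return statement. Here is a minimal sketch; the function name smallest_and_largest is made up for illustration and is not part of the course code.

```python
# smallest_and_largest(numbers)
# Takes a list of numbers, NUMBERS, and returns its smallest and largest items
def smallest_and_largest(numbers):
    smallest = min(numbers)
    largest = max(numbers)
    return smallest, largest

nums = [5.3, 6.1, 34.0, 8.999, 8.998]

# The two returned values are assigned to two variables, in order
low, high = smallest_and_largest(nums)
print("{} is the smallest and {} is the largest".format(low, high))
```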