MCB112: Biological Data Analysis (Fall 2017)
answers 04: a plague of sand mice
Various answers to this week’s problem set, as Jupyter Notebook pages for download:

Harleen: An example of a biologist with not so much Python experience (yet) working through it!

Sean: One of my tricks is to ‘digitize’ the sequences, so I can treat them as arrays of (0,1,2,3) instead of A,C,G,T. That lets me use numpy arrays, indexed by residue number. I also compute an ‘encoding’ of k-mers as a base-4 number: 0..15 for dinucleotides AA..TT, 0..63 for 3-mers AAA..TTT, and so on. That lets me do arbitrary Nth-order Markov models in terms of $P(x_i \mid x_{i-N} \ldots x_{i-1})$, for a previous context of any length.
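A minimal sketch of the digitizing and base-4 k-mer encoding idea, assuming the sequences contain only uppercase A, C, G, T (no N or other ambiguity codes). The function names (`digitize`, `kmer_codes`, `markov_counts`, `markov_logprobs`, `log_likelihood`) and the pseudocount of 1 are hypothetical choices for illustration, not taken from Sean's notebook.

```python
import numpy as np

# Map each nucleotide to an integer 0..3 (assumes uppercase ACGT only).
ALPHABET = "ACGT"
CODE = {c: i for i, c in enumerate(ALPHABET)}

def digitize(seq):
    """Convert a DNA string to a numpy array of integers 0..3."""
    return np.array([CODE[c] for c in seq], dtype=np.int64)

def kmer_codes(x, k):
    """Encode every overlapping k-mer of digitized sequence x as a base-4 integer.

    The k-mer starting at position i gets code
    x[i]*4**(k-1) + x[i+1]*4**(k-2) + ... + x[i+k-1],
    so AA..TT -> 0..15 for k=2 and AAA..TTT -> 0..63 for k=3.
    For k=0 every "empty context" gets code 0.
    """
    L = len(x)
    codes = np.zeros(L - k + 1, dtype=np.int64)
    for j in range(k):
        # Shift previous digits up one base-4 place and add the next residue.
        codes = codes * 4 + x[j : L - k + 1 + j]
    return codes

def markov_counts(x, order):
    """Count (context, next residue) pairs for an order-N Markov model.

    Returns a (4**order, 4) array: row = base-4 code of the preceding
    `order` residues, column = the residue that follows.
    Assumes len(x) >= order.
    """
    L = len(x)
    contexts = kmer_codes(x, order)[: L - order]  # context ending just before each next residue
    nexts = x[order:]                              # the residue that follows each context
    counts = np.zeros((4 ** order, 4), dtype=np.int64)
    np.add.at(counts, (contexts, nexts), 1)        # unbuffered add handles repeated indices
    return counts

def markov_logprobs(counts, pseudocount=1.0):
    """Log conditional probabilities log P(next | context) from counts.

    The pseudocount (an assumption here) keeps unseen contexts from giving log(0).
    """
    smoothed = counts + pseudocount
    return np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

def log_likelihood(x, logp, order):
    """Sum of log P(x_i | previous `order` residues) over the sequence."""
    L = len(x)
    contexts = kmer_codes(x, order)[: L - order]
    return logp[contexts, x[order:]].sum()
```

Because a context of N residues is just an integer in 0..4**N - 1, the same two-dimensional count table works for any order; for example, `markov_counts(digitize("ACGTACGT"), 1)[0, 1]` is 2, since A is followed by C twice.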

Will: A more advanced Python solution, introducing and demonstrating the use of generators, map(), iter(), zip(), and broadcasting numpy arrays.
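A short sketch of the tools named above, each on a toy task. This is not Will's code: the example sequences, the two 0th-order models, and the helper names (`kmers`, `adjacent_pairs`, `base_counts`) are made up for illustration.

```python
import numpy as np

def kmers(seq, k):
    """Generator: lazily yield each overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def adjacent_pairs(seq):
    """Pair each residue with the one after it, using iter() and zip()."""
    ahead = iter(seq)
    next(ahead, None)          # shift the second iterator forward by one
    return zip(seq, ahead)     # zip stops when the shorter iterator runs out

def base_counts(seq):
    """Counts of A, C, G, T in seq, as a length-4 numpy array."""
    return np.array([seq.count(c) for c in "ACGT"])

seqs = ["ACGTAC", "GGGCCA", "TTATAA"]   # made-up example sequences

# map() applies base_counts to every sequence; stack into an (nseq, 4) array.
counts = np.array(list(map(base_counts, seqs)))

# Broadcasting: (nseq, 4) / (nseq, 1) normalizes each row to frequencies.
freqs = counts / counts.sum(axis=1, keepdims=True)

# Hypothetical 0th-order models, one row per model: shape (nmodels, 4).
logp = np.log(np.array([[0.25, 0.25, 0.25, 0.25],
                        [0.40, 0.10, 0.10, 0.40]]))

# Broadcasting: (nmodels, 1, 4) * (1, nseq, 4) -> (nmodels, nseq, 4);
# summing over the last axis gives each sequence's log-likelihood per model.
loglik = (logp[:, np.newaxis, :] * counts[np.newaxis, :, :]).sum(axis=2)
best_model = loglik.argmax(axis=0)      # index of the best model per sequence
```

Here `list(kmers("ACGT", 2))` gives `["AC", "CG", "GT"]`, and `best_model` picks the uniform model for the first two sequences and the AT-rich model for `"TTATAA"`. Inserting a length-1 axis with `np.newaxis` lets one expression score every sequence against every model without an explicit loop.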