# a plague of sand mice

• Sean: One of my tricks is to ‘digitize’ the sequences, so I can treat them as arrays of (0,1,2,3) instead of A,C,G,T. That lets me use numpy arrays, indexed by residue number. I also compute an ‘encoding’ of k-mers as a base-4 number: 0..15 for dinucleotides AA..TT, 0..63 for 3-mers AAA..TTT, and so on. That lets me do arbitrary Nth-order Markov models in terms of $P(x_i \mid c)$, for a previous context $c$ of any length.
• Will: A more advanced Python solution, introducing and demonstrating the use of generators, map(), iter(), zip(), and broadcasting numpy arrays.
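
As a rough illustration of the digitizing and base-4 k-mer encoding that Sean describes, here is a minimal sketch in Python with numpy. It is not taken from either solution: the function names (`digitize`, `encode_kmers`, `markov_counts`, `conditional_probs`), the A=0, C=1, G=2, T=3 ordering, and the +1 pseudocount are all assumptions made for this example.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed ordering: A=0, C=1, G=2, T=3


def digitize(seq):
    """Map a DNA string to a numpy array of integer codes 0..3."""
    lookup = {ch: i for i, ch in enumerate(ALPHABET)}
    return np.array([lookup[ch] for ch in seq.upper()], dtype=np.int64)


def encode_kmers(dsq, k):
    """Base-4 code of every overlapping k-mer in a digitized sequence.

    For k=2 the codes run 0..15 (AA..TT); for k=3, 0..63 (AAA..TTT).
    Assumes len(dsq) >= k.
    """
    n = len(dsq) - k + 1              # number of overlapping k-mers
    codes = np.zeros(n, dtype=np.int64)
    for j in range(k):
        # Shift the running code one base-4 digit left, then add position j.
        codes = codes * 4 + dsq[j:j + n]
    return codes


def markov_counts(dsq, order):
    """Count table C[c, x]: how often residue x follows context code c."""
    counts = np.zeros((4 ** order, 4), dtype=np.int64)
    contexts = encode_kmers(dsq, order)[:-1]  # last k-mer has no successor
    nexts = dsq[order:]                       # residue following each context
    np.add.at(counts, (contexts, nexts), 1)   # accumulates repeated indices
    return counts


def conditional_probs(counts, pseudocount=1.0):
    """Estimate P(x | c) per context row; the pseudocount is an assumption."""
    smoothed = counts + pseudocount
    # keepdims=True keeps a (rows, 1) shape so the division broadcasts by row.
    return smoothed / smoothed.sum(axis=1, keepdims=True)


dsq = digitize("AATT")
codes = encode_kmers(dsq, 2)   # AA=0, AT=3, TT=15
```

Because the context is just an integer row index, the same count table layout works for any order: a 2nd-order model has 16 rows, a 3rd-order model has 64, and looking up $P(x_i \mid c)$ is a single array access `probs[c, x]`.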