the case of the dead sand mouse
Several different answers from us, as Jupyter Notebook pages you can download:
Sean: My solutions each week will generally be the bare-bones example, and this is no exception. I shortcut the data exploration in step (3) by using Unix commands, including sort, inside the Jupyter notebook page.
Joe: Includes using SciPy’s stats module to calculate a correlation coefficient between mRNA half-life and relative abundance at 96h. A correlation coefficient turns out to make a ton of assumptions (including linearity between the x and y variables being compared), most of which are violated here, but we still use correlation coefficients as rough tests for detecting dependencies between two variables. We’ll get more into this later.
James: Includes visualization of the correlation between mRNA half-life and relative abundance at 96h… a sneak peek ahead at using matplotlib to plot graphs inside a Jupyter notebook.
Nathan: A version that uses Pandas, a powerful module for manipulating tables of data - a sneak peek ahead in the course. We’re going to see four main modules in the course: numpy, scipy, matplotlib, and pandas; and maybe a bit of a fifth, seaborn.
Will: A more advanced version, worth studying! Here you’ll find examples of pythonic incantations for list comprehensions, set comprehensions, and dictionary comprehensions, which, among other things, let Will suck in the data and rearrange it as he needs in one-liners. Will also catches that the Excel corruption is even more severe than you might originally think: Excel maps both MARC1 and MARCH1 to 1-Mar (and, similarly, both MARC2 and MARCH2 to 2-Mar), so you can’t tell which is which.
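
To make the MARC1/MARCH1 problem concrete, here is a minimal sketch using the three kinds of comprehension mentioned above. It is not taken from Will's notebook. The `excel_converted` table just hardcodes the four conversions described in the text, and the `genes` list (including ACTB as a gene that is left alone) is a made-up example, not the real data file.

```python
# Toy table of the conversions described above: original name -> what Excel shows.
# (Hardcoded for illustration; this is not a model of Excel's actual rules.)
excel_converted = {
    "MARC1": "1-Mar",
    "MARCH1": "1-Mar",
    "MARC2": "2-Mar",
    "MARCH2": "2-Mar",
}

# Hypothetical gene list; ACTB stands in for a name that is not converted.
genes = ["MARC1", "MARCH1", "MARC2", "MARCH2", "ACTB"]

# List comprehension: the names as they would appear after the corruption.
corrupted = [excel_converted.get(g, g) for g in genes]

# Set comprehension: corrupted labels that appear more than once.
ambiguous = {label for label in corrupted if corrupted.count(label) > 1}

# Dictionary comprehension: each ambiguous label -> the original names
# that collapse onto it, so the loss of information is visible.
collisions = {
    label: sorted(g for g in genes if excel_converted.get(g, g) == label)
    for label in ambiguous
}

print(corrupted)
print(collisions)
```

Once two different genes share the label 1-Mar, nothing in the corrupted file says which row was which, so the mapping cannot be inverted without going back to an uncorrupted source.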