### MCB112: Biological Data Analysis (Fall 2018)

• Steffan: Uses straight Python for reservoir sampling, then switches to pandas. Like Daniel's answer, uses the pandas describe and info methods to notice that the data come in as objects (strings), not numbers, and need to be converted to floats. Shows a different way of tidying the data. Shows how you can use the Seaborn hue argument to highlight M vs. F data not just in a catplot but even in a boxplot: Lestrade could've seen the bimodality in his data even in his boxplots.
• Sean: My bare-bones version. Straight Python until the last possible moment, when the pset actually specified using pandas to read in the tidy data. Plus an apology for a subtle error I made in generating the synthetic data (the TPMs don't sum to $$10^6$$ over all genes), and a link to the sandmouse script I used to generate the simulated data in the first place.
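
For anyone who hasn't seen reservoir sampling before, here is a minimal sketch of the idea in straight Python. It is not taken from any of the answers above; the function name `reservoir_sample` and its arguments `items`, `k`, and `rng` are made up for illustration. It assumes the standard one-pass scheme: keep the first `k` items, then let each later item replace a random slot with probability `k/(i+1)`.

```python
import random

def reservoir_sample(items, k, rng=random):
    """Choose k items uniformly at random from an iterable in a single pass.

    Hypothetical helper for illustration only. Works without knowing the
    length of `items` in advance, so it can be fed an open file's lines.
    """
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in 0..i (inclusive); the new item survives
            # only if that slot falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Something like `reservoir_sample(open(path), 10)` (with `path` standing in for whatever data file you have) would then give you ten random lines without reading the whole file into memory. If the input has fewer than `k` items, you just get all of them back.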