Lestrade's lost labels
Assorted answers to this week's problem set, as Jupyter Notebook pages for download:
Kate: Walks through a detailed example of how edgeR obtains its FDRs. Includes a nice visualization of abundance vs. log fold change (essentially what people call an "MA plot", but with its axes rotated), which makes it obvious that there are two very different clusters of genes in these data, one of which is strongly differentially expressed.
Michael: Neatly organizes external input and output files in directories, and shows how to use the Python os (operating system) module to interact with directories and files. Uses a different visualization than Kate to see the strongly differentially expressed subset of genes in the data, the telltale sign that one might want to turn on the optional TMM normalization in edgeR. Adds some extra explanation and visualization of p-values, including the fact that when the null hypothesis is true (as in comparing wt to wt sets of samples), p-values are uniformly distributed from 0 to 1, and the effect of TMM normalization on all genes' p-values.
Sean: A bare-bones solution.
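The point about p-values being uniform under the null can be checked with a quick simulation. The sketch below is not taken from any of the notebooks above and does not use edgeR's negative binomial test. It assumes a much simpler stand-in: two groups of Gaussian samples drawn from the same distribution, compared with a two-sided z-test with known variance. The function names, sample sizes, and seed are all made up for illustration.

```python
import math
import random
import statistics


def two_sided_p(sample_a, sample_b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means (sigma assumed known)."""
    n_a, n_b = len(sample_a), len(sample_b)
    diff = statistics.fmean(sample_a) - statistics.fmean(sample_b)
    z = diff / (sigma * math.sqrt(1.0 / n_a + 1.0 / n_b))
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_null_pvalues(n_tests=10000, n_per_group=3, seed=0):
    """Compare two groups drawn from the SAME distribution, n_tests times."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_tests):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        pvals.append(two_sided_p(a, b))
    return pvals


def histogram(pvals, n_bins=10):
    """Count p-values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvals:
        # p == 1.0 goes in the last bin rather than off the end.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


pvals = simulate_null_pvalues()
counts = histogram(pvals)
for i, c in enumerate(counts):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f}: {c}")
```

Each bin should hold roughly a tenth of the tests, and about 5% of the p-values should fall below 0.05 even though nothing is truly different. That is why a pile-up of small p-values in a real comparison is informative, and why some form of multiple-testing correction is needed.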