Section 3: Data Exploration & Visualization

Section notes for 09/28/2019 by Mary Richardson

(You can download these notes in [Jupyter notebook .ipynb format].)

First, let's import all the modules we will need in this section. I usually import everything at the beginning of my script.

In [ ]:
import numpy as np                # So we can do useful things with arrays
import pandas as pd               # So we can store our *tidy* data
import matplotlib.pyplot as plt   # So we can plot all the things
import seaborn as sns             # So we can make even prettier plots of all the things

Pandas

Pandas (the Python Data Analysis Library) makes it easy to store data in a nice, tidy format. It plays particularly well with Jupyter notebooks, since it makes your data easy to read too! We'll focus on a Pandas data structure called a DataFrame, which is a 2D table with labeled rows and columns. We'll start by running some of the examples from the lecture notes, just to get a handle on Pandas.

Create a DataFrame

We can create a DataFrame from data stored in a list of lists, a 2D numpy array, or a dictionary.

In [ ]:
# Create a pandas dataframe from a list of lists
D   =  [[ 12.0, 16.0,  4.0, 8.0  ],
        [  7.0, 21.0, 14.0, 28.0 ],
        [  5.0, 25.0, 20.0, 10.0 ]]
df  = pd.DataFrame(D)
df
In [ ]:
# Create a pandas dataframe from a 2D numpy array
D   =  np.array(
       [[ 12.0, 16.0,  4.0, 8.0  ],
        [  7.0, 21.0, 14.0, 28.0 ],
        [  5.0, 25.0, 20.0, 10.0 ]])
df  = pd.DataFrame(D, index=['tamarind','caraway','kohlrabi'], # Set row names
                   columns=['sample1', 'sample2', 'sample3', 'sample4']) # Set column names
df
In [ ]:
# Create a pandas dataframe from a dictionary
D = { 'tamarind': [ 12.0, 16.0,  4.0, 8.0 ],
      'caraway':  [  7.0, 21.0, 14.0, 28.0],
      'kohlrabi': [  5.0, 25.0, 20.0, 10.0] } # Automatically sets column names
df = pd.DataFrame(D, index=['sample1', 'sample2', 'sample3', 'sample4']) # Set row names
df

Exercise: Can you make a DataFrame using the dictionary below?

In [ ]:
D = { 'tamarind': [ 12.0,   16.0,  4.0,  8.0 ],
      'caraway':  [  7.0,   21.0, 14.0, 28.0 ],
      'kohlrabi': [  5.0,   25.0, 20.0, 10.0 ],
      'ginger':   [ np.nan,  9.0, 16.0, 17.0 ],   # Use np.nan (not the string 'NaN') to mark missing values
      'epazote':  [ 10.0,   12.0,  3.0,  6.0 ],
      'valerian': [ 27.0,   25.0, np.nan, 19.0 ] }

Question: Which of the above tables would you consider a tidy dataset and why? (Remember that in a tidy dataset variables form columns and observations form rows.)

Read data from a file

We can also create a DataFrame directly from data in a file. When we do this, we have a few options for naming the rows and columns. When we call the pd.read_table() function, we should consider specifying the following options:

  • header=None if there is no header
  • header=n if the column names are in line n (remember we're counting the lines starting at 0)
  • names=['col1','col2'] if we need to input our own column names (for example if we set header=None)
  • index_col=n if the row names are in column n
  • comment='#' if we need to ignore all lines starting with a certain character
  • skiprows=n if we need to ignore the first n rows
In [ ]:
# Read data from a tsv file into a new pandas dataframe
df = pd.read_table('section-data.tsv', # Default assumes delimiter is tabs
                    header=3) # Set the column names using line 3 (remember we count lines from 0)
df.head() # We can use df.head() to just peek at the first few rows
In [ ]:
# Read data from a csv file into a new pandas dataframe
df = pd.read_table('section-data.csv', 
                   sep=',', # Set the delimiter to commas; this is the same as using pd.read_csv('section-data.csv')
                   comment='#', # Set the comment character to ignore comment lines
                   names=['gene_name', 'sample1', 'sample2', 'sample3', 'sample4']) # Set the column names yourself
df.head()
In [ ]:
# Read data from a column-aligned file into a new pandas dataframe
df = pd.read_table('section-data.tbl', 
                   delim_whitespace=True, # Set the delimiter to whitespace
                   index_col=0, # Set the row names using the 1st column
                   skiprows=4) # Skip all the comment lines up to the header
df.head()

Exercise: It looks like the column names in the above DataFrame aren't quite right. Can you read in the section-data.tbl file and fix the column names?

If you get this to work, also try looking through the pandas.read_table() documentation. Test out some of the other parameters we haven't tried here (I've barely scratched the surface with these examples!). There are lots of cool things you can do with this function to customize how you read in a file, and you might find some of them useful on this week's pset and beyond.

In [ ]:
# Read data from a column-aligned file into a new pandas dataframe
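
For example, here are a couple of other pandas.read_table() parameters you might experiment with. This is just a sketch: the particular column choices and the extra missing-value strings below are made up for illustration, not something the section data requires.

In [ ]:
# Try out a few more read_table options (illustrative values)
df_peek = pd.read_table('section-data.csv',
                        sep=',',             # Comma-delimited, as before
                        comment='#',         # Ignore comment lines, as before
                        names=['gene_name', 'sample1', 'sample2', 'sample3', 'sample4'],
                        usecols=['gene_name', 'sample1'],  # Only keep these two columns
                        nrows=3,             # Only read the first three data rows
                        na_values=['NA', '-'])  # Also treat these strings as missing values
df_peek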

Question: Why do we typically use column-justified data formats? What's the trick to reading a column-justified file using Pandas?

Explore the data

Now we'll explore the data. We want to consider:

  • does our data make sense?
  • what's the range of values in our data?
  • are there values that we might need to remove from our analysis? (e.g. invalid values or values that don't make sense intuitively)
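
One quick way to start answering these questions (a minimal sketch, assuming the df we just read in from section-data.tbl):

In [ ]:
# Get summary statistics (count, mean, std, min, quartiles, max) for each numeric column
df.describe()

df.info() is also handy here: it reports each column's dtype and its count of non-null values, which is a quick way to spot missing data.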

To access values in the DataFrame, we can specify rows and columns in several ways:

  • df['col_name'] gets the column as a Series
  • df.loc['row_name'] gets the row as a Series
  • df[2:5] gets the rows at positions 2 through 4 as a DataFrame
  • df.iloc[2:5] also gets the rows at positions 2 through 4 as a DataFrame, making the integer-position indexing explicit
  • df['col_name']['row_name'] gets the single value at that column and row
  • df.loc['row_name','col_name'] also gets the single value at that row and column
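
As a quick sketch of a few accessors from this list that we don't use below (assuming the df we just read in, with gene names as the row index):

In [ ]:
# A few more ways to pull out rows (only the last expression in a cell is displayed)
df.loc['tamarind']   # Get the tamarind row as a Series
df[2:5]              # Get the rows at positions 2 through 4 as a DataFrame
df.iloc[2:5]         # Same rows, with the integer-position indexing made explicit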
In [ ]:
# Find the sample1 values for all genes
df['sample1'] # Get df[col]
In [ ]:
# Find the sample1 value for the tamarind gene
df['sample1']['tamarind'] # Get df[col][row]
In [ ]:
# Equivalently, find the sample1 value for the tamarind gene
df.loc['tamarind','sample1'] # Get df.loc[row, col]
In [ ]:
# Find the row with the max value for sample1
dfmax = df['sample1'].max() # Find the max value of the sample1 column (i.e. over all rows)
df[df['sample1']==dfmax] # Find the row(s) where the sample1 value equals this max value
In [ ]:
# Find the row with the min value for sample1
dfmin = df['sample1'].min() # Find the min value of the sample1 column (i.e. over all rows)
minrow = df['sample1'].idxmin() # Find the index (row name) of the row with this min value
df.loc[minrow] # Get the row at this index

Exercise: Find the max and min value over the whole DataFrame instead of just in one column.

In [ ]:
# Find the max and min value over the whole dataframe

Question: Are there any values that might be problematic in this dataset?

Tidy the data

To clean up our DataFrame, first we'll remove rows with missing values.

In [ ]:
# Find positions with NaN
isna = df.isna()
isna

Exercise: Find all the rows that contain NaN.

In [ ]:
# Find rows that contain NaN

Exercise: Remove the rows that you found contain NaN from the DataFrame.

In [ ]:
# Remove rows that don't have data for our analysis

Now that we've cleaned up the data, it's time to make it tidy. This part is hard and I always find myself looking back at the pandas.melt() documentation to check the syntax. Let's start by looking at some of the examples in the documentation and then we'll come back to our DataFrame.
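
For example, here's a tiny made-up DataFrame (not our section data) melted from wide format to long format, along the lines of the examples in the documentation:

In [ ]:
# A toy wide-format dataframe (made-up values, just to illustrate melt)
toy = pd.DataFrame({ 'name':  ['a', 'b', 'c'],
                     'meas1': [ 1.0, 3.0, 5.0 ],
                     'meas2': [ 2.0, 4.0, 6.0 ] })
# Melt to long form: one row per (name, measurement) pair
pd.melt(toy,
        id_vars=['name'],                # Keep 'name' as the identifier column
        value_vars=['meas1', 'meas2'],   # Unpivot these columns into rows
        var_name='measurement',          # Name of the new variable column
        value_name='value')              # Name of the new value column

Once you see how id_vars and value_vars behave on a toy example, melting our expression DataFrame is the same move with different column names.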

In [ ]:
# First make gene_name a column again instead of the index, so that we can use it to melt the dataframe
df = df.reset_index() 
df

Exercise: Melt the dataframe so that you have just three columns (gene_name, sample, and expression). Hints: Keep gene_name as the ID variable, and make 'sample1', 'sample2', 'sample3', and 'sample4' the values in the new sample column. You should end up with one row per gene/sample pair.

In [ ]:
# Melt the dataframe, so that each row only has one value

Exercise (Challenge): Add the wt and mut labels that were originally in the header of the section-data.tbl file to this new tidy DataFrame. In the end, you should have an extra column that holds the wt or mut designation for each row.

In [ ]:
# Add the wt/mut labels from the original header to the tidy dataframe

Matplotlib and Seaborn

Now that our data is tidy, we can plot it pretty easily using Matplotlib and/or Seaborn, which are two super useful modules for visualizing data. We're not going to go into the details of plotting here (trust me, you can spend hours trying to make a plot pretty with these modules). But the basic idea is we want to set our x values using one column of our DataFrame and our y values using another column of our DataFrame. This is why we care so much about having tidy data before we try to plot – it makes life much easier!

Exercise: Time to practice what we learned earlier. Read the tidy-data.tsv file into a new DataFrame.

In [ ]:
# Read in the tidy data
In [ ]:
# Plot with matplotlib
fig,ax = plt.subplots()
ax.scatter(x=df['gene_name'], y=df['expression'])
ax.set_title('Gene Expression Levels')
ax.set_xlabel('Gene Name')
ax.set_ylabel('Expression (TPM)')
In [ ]:
# Plot with seaborn
fig2,ax2 = plt.subplots()
sns.swarmplot(ax=ax2, data=df, x='gene_name', y='expression')
ax2.set_title('Gene Expression Levels')
ax2.set_xlabel('Gene Name')
ax2.set_ylabel('Expression (TPM)')

Exercise: Try to color the data points based on the sample number using seaborn. (Hint: your answer should look almost the same as the cell above, but you'll need to add an argument. Check out the seaborn swarmplot documentation.)

In [ ]:
# Plot with seaborn, but color the data points by sample number