
Prof. James Bagrow

Email: james.bagrow [at]
Lectures: M/W 15:30–16:45 in Perkins 107
Office Hours:  Tu 9:30–10:30, Th 9:00–10:00, or by appointment
Office: Farrell Hall 206 (Map to my office)
Course syllabus

Extracting meaning from data remains one of the biggest tasks of science and industry. The Internet and modern computers have given us vast amounts of data, so it is more important than ever to understand how to collect, process, and analyze these data. A picture is worth a thousand words, so visualizations, from scientific plots and infographics to interactive data explorers, are crucial to summarize and communicate new discoveries.


Lecture Materials

Lecture 00 (Aug 27)

Course overview, motivation, logistics and computer setup. Introduction to Python.


Lecture 01 (Aug 29)

More Python review, reading and writing files, tour of first- and third-party Python libraries; the "central dogma" of statistics.

Notes (screen) Notes (board)

Lecture 02 (Sep 5)

The "central dogma" of statistics, inference and prediction, brief review of probability


Lecture 03 (Sep 10)

Random variables and their statistics, finish review


Lecture 04 (Sep 12)

Social ratings: using statistical models to capture uncertainty, part 1.

Slides Notes

Lecture 05 (Sep 17)

Social ratings: using statistical models to capture uncertainty, part 2.

Notes (board) Notes (computer)

Lecture 06 (Sep 19)

Finish social ratings problem, how to rank items with uncertainty; Q01 review; start HW01 review.

Notes (board) Notes (computer)
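
As a rough illustration of ranking under uncertainty (not taken from the lecture notes), the sketch below sorts items by the lower bound of a Wilson score interval on the fraction of positive ratings. The choice of the Wilson interval, the function name `wilson_lower_bound`, and the example items are all assumptions made for this example; the lectures may use a different model.

```python
import math

def wilson_lower_bound(positive, total, z=1.96):
    """Lower end of the Wilson score interval for a proportion.

    z=1.96 corresponds to a roughly 95% interval. Items with no
    ratings get a score of 0.0 so they sort last.
    """
    if total == 0:
        return 0.0
    p_hat = positive / total
    denom = 1 + z * z / total
    centre = p_hat + z * z / (2 * total)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total))
    return (centre - margin) / denom

# Hypothetical items: (name, positive ratings, total ratings)
items = [("A", 5, 5), ("B", 90, 100), ("C", 1, 1)]

# Rank by the pessimistic estimate rather than the raw average.
ranked = sorted(items, key=lambda item: wilson_lower_bound(item[1], item[2]), reverse=True)
for name, positive, total in ranked:
    print(name, positive, total, round(wilson_lower_bound(positive, total), 3))
```

Ranking by the lower bound penalizes items with few ratings: item B (90 of 100 positive) ends up ahead of item A (5 of 5 positive), even though A has the higher raw average.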

Lecture 07 (Sep 24)

A data science workflow, the typology of data and levels of measurement, storing data


Lecture 08 (Sep 26)

Accessing JSON data over the internet, a case study. Asides: dates and times, plotting


Lecture 09 (Oct 1)

Data cleaning: rejecting bad data, combining data, filtering and processing data; Start on HISTOGRAMS!!!


Lecture 10 (Oct 3)

More on density estimation: histograms, box plots, kernel densities, violin plots, cumulative distributions


Lecture 11 (Oct 10)

Review Q02, HW02. Building the XY-toolbox with scatterplots and trendlines.


Lecture 12 (Oct 15)

Missing data and imputation (part 1).


Lecture 13 (Oct 17)

Missing data and imputation (part 2).


Lecture 14 (Oct 22)

Spam filtering with an intro to natural language processing (NLP).


Lecture 15 (Oct 24)

Text classification (direct continuation of previous lecture).


Lecture 16 (Oct 29)

Representing text on the computer: ASCII, Unicode and the UTF-8 miracle (Part 1).
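
A small sketch of the idea (an illustration, not part of the lecture notes): UTF-8 stores each Unicode code point in one to four bytes, and plain ASCII text is unchanged. The helper name `utf8_summary` and the sample characters are chosen only for this example.

```python
def utf8_summary(ch):
    """Return (code point, UTF-8 bytes as hex, byte count) for one character."""
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    return ord(ch), hex_bytes, len(encoded)

# Sample characters written as escapes: Latin A, e-acute, euro sign, an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    code_point, hex_bytes, n_bytes = utf8_summary(ch)
    print(f"U+{code_point:04X} -> {hex_bytes} ({n_bytes} bytes)")

# ASCII text is already valid UTF-8: both encodings produce the same bytes.
assert "data".encode("ascii") == "data".encode("utf-8")
```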


Lecture 17 (Oct 31)

Representing text on the computer: ASCII, Unicode and the UTF-8 miracle (Part 2).

See Part 1 for notes.

Lecture 18 (Nov 5)

Linear and logistic regression.

Notes (linear) Notes (logistic)

Lecture 19 (Nov 12)

Designing visualizations.


Lecture 20 (Nov 14)

More on visualizations. Maps, drawings, colors, graph visualizations.

Notes 1 Notes 2

Lecture 21 (Nov 26)

Intro to Bayesian inference. How to check if a coin is fair?

Notes (board) Notes (computer)
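
One possible way to approach the coin question, sketched here under stated assumptions rather than taken from the lecture notes: put a uniform prior on the heads probability p, evaluate the posterior on a grid, and ask how much posterior mass lies near p = 0.5. The flip counts (7 heads, 3 tails), the "near fair" window of 0.45 to 0.55, and the function names are all hypothetical.

```python
def grid_posterior(heads, tails, n_grid=1001):
    """Posterior over the heads probability p on an evenly spaced grid.

    Assumes a uniform prior, so the posterior is proportional to the
    binomial likelihood p**heads * (1 - p)**tails.
    """
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    unnormalized = [p**heads * (1 - p)**tails for p in grid]
    total = sum(unnormalized)
    weights = [u / total for u in unnormalized]
    return grid, weights

def prob_between(grid, weights, low, high):
    """Posterior probability that p lies in [low, high]."""
    return sum(w for p, w in zip(grid, weights) if low <= p <= high)

# Hypothetical data: 7 heads and 3 tails.
grid, weights = grid_posterior(heads=7, tails=3)
posterior_mean = sum(p * w for p, w in zip(grid, weights))
near_fair = prob_between(grid, weights, 0.45, 0.55)

print("posterior mean of p:", round(posterior_mean, 3))
print("posterior mass in [0.45, 0.55]:", round(near_fair, 3))
```

With a uniform prior the exact posterior is Beta(heads + 1, tails + 1), so the grid mean should land close to (heads + 1) / (heads + tails + 2).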

Lecture 22 (Nov 28)

Bayesian inference of text message rates: building a two-level Poisson statistical model, proving the Poisson limit theorem, and seeing how data warps a prior distribution into a posterior.
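
For reference, the standard statement of the Poisson limit theorem is given below (the notation here is not necessarily the one used in lecture): a Binomial(n, λ/n) distribution converges to a Poisson(λ) distribution as n grows with λ held fixed.

```latex
\lim_{n \to \infty} \binom{n}{k}
  \left(\frac{\lambda}{n}\right)^{k}
  \left(1 - \frac{\lambda}{n}\right)^{n-k}
  = \frac{\lambda^{k} e^{-\lambda}}{k!},
  \qquad k = 0, 1, 2, \dots
```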

Lecture 23 (Dec 03)

Performing Bayesian inference; intro to Markov chain Monte Carlo (MCMC).

Notes (board) Notes (computer)
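
To give a flavor of MCMC, here is a minimal random-walk Metropolis sketch (an illustration under assumptions, not the code from lecture). It samples the posterior of a coin's heads probability under a uniform prior; the data (7 heads, 3 tails), the step size, and the function names are hypothetical choices for this example.

```python
import math
import random

def log_posterior(p, heads, tails):
    """Log of the unnormalized posterior for p under a uniform prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return heads * math.log(p) + tails * math.log(1 - p)

def metropolis(heads, tails, n_samples=20000, step=0.1, seed=0):
    """Random-walk Metropolis sampler with a Gaussian proposal.

    The proposal is symmetric, so the Hastings correction cancels and
    the acceptance probability is min(1, posterior ratio).
    """
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, heads, tails)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, heads, tails)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)  # a rejected proposal repeats the current state
    return samples

samples = metropolis(heads=7, tails=3)
kept = samples[2000:]  # discard an initial burn-in stretch (length is arbitrary)
print("estimated posterior mean:", round(sum(kept) / len(kept), 3))
```

The sample mean after burn-in should sit near the exact Beta(8, 4) posterior mean of 2/3; how near depends on the step size and the number of samples.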

Lecture 24 (Dec 05)

Diagnosing MCMC samples and proving Metropolis-Hastings for Bayesian Inference.

Notes (board) Notes (computer)
Further resources:

Reading Materials

Reading 00 Assigned 2018-08-27 | Due 2018-09-05

A Whirlwind Tour of Python for STAT/CS 287 (as a PDF) (on GitHub).

Reading 01 Assigned 2018-10-01 | Due 2018-10-08

Spreadsheets considered harmful!

Gene name errors are widespread in the scientific literature. Genome Biology, 2016.

Reading 02 Assigned 2018-10-08 | Due 2018-10-15

Error bars considered harmful!

Researchers Misunderstand Confidence Intervals and Standard Error Bars. Psychological Methods, 2005. (Journal link.)


Homework and projects are posted on Blackboard.