
Prof. James Bagrow

Email: james.bagrow [at]
Lectures: M/W 15:30–16:45 in Votey 254
Office Hours: Tu/Th 14:00–15:00, or by appointment
Office: Farrell Hall 206
Course syllabus

Extracting meaning from data remains one of the central challenges of science and industry. The Internet and modern computers have given us vast amounts of data, so it is more important than ever to understand how to collect, process, and analyze these data. A picture is worth a thousand words, so visualizations, from scientific plots and infographics to interactive data explorers, are crucial for summarizing and communicating new discoveries.


Lecture Materials

Lecture 00 (Aug 28)

Course overview, motivation, logistics and computer setup. Introduction to python.


Lecture 01 (Aug 30)

More python review, reading and writing files, tour of first- and third-party python libraries.


Lecture 02 (Sep 6)

The "central dogma" of statistics, inference and prediction, brief review of probability

Slides | Notes

Lecture 03 (Sep 11)

Random variables and their statistics, modeling social ratings and their uncertainty


Lecture 04 (Sep 13)

Modeling social ratings: confidence intervals on the binomial proportion, generating and counting random data, Wilson interval and smoothing!

Notes (board) | Notes (screen)
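As an illustration of the Wilson interval mentioned above, here is a minimal pure-Python sketch. The function name and the 95% default are my own choices, not taken from the lecture notes:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    if n == 0:
        return (0.0, 1.0)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return (center - half, center + half)

# An item rated 90/100 positive vs. one rated 9/10 positive:
lo_big, hi_big = wilson_interval(90, 100)
lo_small, hi_small = wilson_interval(9, 10)
# Same observed proportion, but the smaller sample gets a much wider
# (less certain) interval -- the key idea behind ranking by lower bound.
```

Unlike the naive normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly for small samples, which is why it suits the social-ratings example.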

Lecture 05 (Sep 18)

Wrap up social ratings (see previous lecture), review HW 01 (see Blackboard).

Lecture 06 (Sep 20)

Introduction to Jupyter notebooks and Markdown.

Lecture 07 (Sep 25)

A data science workflow, the typology of data and levels of measurement, storing data


Lecture 08 (Sep 27)

Accessing JSON data over the internet, a case study. Asides: dates and times, plotting


Lecture 09 (Oct 2)

Data cleaning: rejecting bad data, combining data, filtering and processing data; Start on HISTOGRAMS!!!


Lecture 10 (Oct 4 & Oct 11)

More on density estimation: histograms, box plots, kernel densities, violin plots, cumulative distributions
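To make the density-estimation idea concrete, here is a tiny sketch of a Gaussian kernel density estimate in plain Python; the data and bandwidth are illustrative, not from the lecture notes:

```python
import math

def gaussian_kde(data, x, bandwidth=1.0):
    """KDE at point x: the average of Gaussian 'bumps' centered on each datum."""
    n = len(data)
    total = sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
    return total / (n * bandwidth * math.sqrt(2 * math.pi))

data = [1.2, 1.9, 2.1, 2.4, 5.0]
# Evaluate on a grid; unlike a histogram, the estimate is smooth and
# does not depend on arbitrary bin edges.
grid = [i * 0.1 for i in range(-30, 101)]
density = [gaussian_kde(data, x, bandwidth=0.5) for x in grid]
```

The bandwidth plays the role that bin width plays for a histogram: too small and the estimate is spiky, too large and real structure is smoothed away.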


Lecture 11 (Oct 18)

Review HW02. Building the XY-toolbox with scatterplots and trendlines.
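The trendline part of the XY-toolbox can be sketched with ordinary least squares; this small pure-Python version (my own illustrative code, not the course's) recovers the slope and intercept of a best-fit line:

```python
def fit_trendline(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

a, b = fit_trendline([0, 1, 2, 3], [1, 3, 5, 7])  # exactly y = 2x + 1
```

In practice one would use `numpy.polyfit` or a statistics library, but the arithmetic above is all a straight trendline needs.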


Lecture 12 (Oct 23)

Missing data and imputation (part 1).


Lecture 13 (Oct 25)

Missing data and imputation (part 2).
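As a concrete, deliberately simple example of imputation, mean imputation fills each missing entry with the mean of the observed values. A sketch, with `None` standing in for missing data:

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

filled = impute_mean([1.0, None, 3.0])
```

A standard caveat: mean imputation preserves the sample mean but artificially shrinks the variance, one reason imputation deserves the two lectures it gets here.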


Lecture 14 (Oct 30)

Spam filtering with an intro to natural language processing (NLP).
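A classic baseline for spam filtering is a bag-of-words naive Bayes classifier. The sketch below (toy data, names, and the add-one smoothing choice are my own, not necessarily the lecture's approach) shows the core idea:

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (text, label) pairs -> per-label word counts and doc counts."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for text, label in docs:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Naive Bayes with Laplace (add-one) smoothing."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    best_label, best_score = None, -math.inf
    for label in counts:
        # log prior + sum of log word likelihoods
        score = math.log(totals[label] / sum(totals.values()))
        n_label = sum(counts[label].values())
        for w in text.lower().split():
            score += math.log((counts[label][w] + 1) / (n_label + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [("win free money now", "spam"),
        ("free prize click now", "spam"),
        ("lunch meeting tomorrow", "ham"),
        ("see you at the meeting", "ham")]
counts, totals = train(docs)
```

The smoothing keeps a single unseen word from zeroing out an entire class, and working in log space avoids underflow on long messages.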


Lecture 15 (Nov 01)

Text classification (direct continuation of previous lecture).


Lecture 16 (Nov 6)

Quiz and Homework review. Start on text data (see Lec 17 notes).

Lecture 17 (Nov 8)

Representing text on the computer: ASCII, Unicode, and the UTF-8 miracle.
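Part of the "miracle" is that UTF-8 is backward-compatible with ASCII while covering all of Unicode with variable-length byte sequences. A quick demonstration in Python:

```python
# ASCII characters keep their one-byte encodings under UTF-8;
# other characters take two to four bytes.
for ch in ["A", "é", "中", "🎉"]:
    encoded = ch.encode("utf-8")
    print(repr(ch), len(encoded), encoded.hex())

# Pure-ASCII text is byte-for-byte identical in both encodings:
assert "data science".encode("utf-8") == "data science".encode("ascii")
```

This is why old ASCII-only tools often keep working on UTF-8 text without ever knowing it.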


Lecture 18 (Nov 13)

Intro to Bayesian inference.

Notes (board) | Notes (computer)

Lecture 19 (Nov 15)

Bayesian inference of text message rates, building a two-level Poisson statistical model, proving the Poisson Limit Theorem, seeing how data warps a prior distribution into a posterior.
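For a conjugate illustration of that prior-to-posterior warping: a Gamma prior on a Poisson rate updates in closed form. The specific prior parameters and counts below are illustrative, not taken from the lecture:

```python
def gamma_poisson_posterior(alpha, beta, counts):
    """Gamma(alpha, beta) prior on a Poisson rate, plus observed counts,
    yields a Gamma(alpha + sum(counts), beta + n) posterior (conjugacy)."""
    return alpha + sum(counts), beta + len(counts)

# Four days of text-message counts pull a weak prior toward the data:
alpha_post, beta_post = gamma_poisson_posterior(2.0, 1.0, [17, 22, 19, 25])
posterior_mean = alpha_post / beta_post  # close to the sample mean of 20.75
```

With only a handful of observations the posterior mean already sits near the data; as more counts arrive, the prior's influence fades further.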

Lecture 20 (Nov 27)

Performing Bayesian Inference, Intro to Markov Chain Monte Carlo.

Notes (board) | Notes (computer)

Lecture 21 (Nov 29)

Diagnosing MCMC samples and proving Metropolis-Hastings for Bayesian Inference.

Notes (board) | Notes (computer)
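To make the Metropolis-Hastings idea concrete, here is a minimal random-walk sampler targeting a standard normal; the toy target, step size, and sample count are my own choices for illustration:

```python
import math
import random

def metropolis_normal(n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler targeting a standard normal."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0, step)
        # Accept with probability min(1, target(proposal)/target(x)).
        # The proposal is symmetric, so the Hastings correction cancels.
        log_accept = -0.5 * proposal**2 + 0.5 * x**2
        if rng.random() < math.exp(min(0.0, log_accept)):
            x = proposal
        samples.append(x)  # on rejection, the current state repeats
    return samples

samples = metropolis_normal(20000)
mean = sum(samples) / len(samples)              # should be near 0
var = sum(s * s for s in samples) / len(samples)  # should be near 1
```

The repeated states on rejection are not a bug: they are exactly what makes the chain's stationary distribution match the target, and they are one of the things MCMC diagnostics look at.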

Lecture 22 (Dec 4)

Designing visualizations


Lecture 23 (Dec 6)

More on visualizations. Maps, drawings, colors, graph visualizations.

Notes 1 | Notes 2

Reading Materials

Reading 00 Assigned 2017-08-28 | Due 2017-09-07

A Whirlwind Tour of Python for STAT/CS 287 (as a pdf) (on GitHub).

Reading 01 Assigned 2017-10-12 | Due 2017-10-18

Spreadsheets considered harmful!

Gene name errors are widespread in the scientific literature. Genome Biology, 2016.

Reading 02 Assigned 2017-10-12 | Due 2017-10-23

Error bars considered harmful!

Researchers Misunderstand Confidence Intervals and Standard Error Bars. Psychological Methods, 2005.

Reading 03 Assigned 2017-10-12 | Due 2017-10-25

Ten Simple Rules for Effective Statistical Practice. PLoS Computational Biology, 2016.


Homework and projects are posted on Blackboard.