## Prof. James Bagrow

| | |
|---|---|
| Email: | james.bagrow [at] uvm.edu |
| Lectures: | M/W 15:30–16:45 in Perkins 107 |
| Office Hours: | Tu 9:30–10:30, Th 9:00–10:00, or by appointment |
| Office: | Farrell Hall 206 (Map to my office) |

Course syllabus

Extracting meaning from data remains one of the biggest tasks of science and industry. The Internet and modern computers have given us vast amounts of data, so it is more important than ever to understand how to collect, process, and analyze these data. A picture is worth a thousand words, so visualizations, from scientific plots and infographics to interactive data explorers, are crucial to summarize and communicate new discoveries.

- The **course syllabus**. Be sure to check this out.
- The course introductory reading, A Whirlwind Tour of Python (as a pdf) (on GitHub).
- Anaconda. The scientific Python environment we will use.
- Some datasets to consider for your final project.

Course overview, motivation, logistics, and computer setup. **Introduction to Python**.

More Python review; reading and writing files; a tour of first- and third-party Python libraries; the "central dogma" of statistics.

Notes (screen) Notes (board)

The "central dogma" of statistics, inference and prediction, brief review of probability.

Notes

Social ratings: using statistical models to capture uncertainty, part 2.

Notes (board) Notes (computer)

Finish social ratings problem, how to rank items with uncertainty; Q01 review; start HW01 review.

Notes (board) Notes (computer)

A data science workflow, the typology of data and levels of measurement, storing data.

Notes

Data cleaning: rejecting bad data, combining data, filtering and processing data; start on *HISTOGRAMS!!!*

More on density estimation: histograms, box plots, kernel densities, violin plots, **cumulative distributions**.
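As a rough illustration of two of the density summaries named above, the sketch below builds a fixed-width histogram and an empirical cumulative distribution using only the standard library. The function names, the bin range, and the synthetic Gaussian sample are assumptions made for this example, not material from the course notes.

```python
import random


def histogram(samples, num_bins, lo, hi):
    """Count samples in num_bins equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in samples:
        if lo <= x <= hi:
            # Clamp the index so that x == hi falls in the last bin.
            idx = min(int((x - lo) / width), num_bins - 1)
            counts[idx] += 1
    return counts


def ecdf(samples):
    """Return sorted values and the fraction of samples <= each value."""
    xs = sorted(samples)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


# Synthetic data (an assumption for illustration): 1000 standard normal draws.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(1000)]

counts = histogram(data, num_bins=10, lo=-4.0, hi=4.0)
xs, cdf = ecdf(data)
print(counts)
print(xs[0], cdf[0], xs[-1], cdf[-1])
```

Unlike the histogram, the empirical cumulative distribution needs no bin width, which is one reason it is a useful companion to binned plots.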

Review Q02, HW02. Building the XY-toolbox with scatterplots and trendlines.

Notes

Missing data and imputation (part 2).

- Imputation Illustrations (Courtesy Gelman & Hill).

Representing text on the computer: ASCII, Unicode and the UTF-8 miracle (Part 1).

Notes

Representing text on the computer: ASCII, Unicode and the UTF-8 miracle (Part 2).
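A small sketch of what the topic above looks like in practice: Python's built-in `str.encode` turns each Unicode code point into one to four UTF-8 bytes, and ASCII characters keep their single-byte values. The helper name `utf8_bytes` and the sample characters are choices made for this example only.

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of one character as space-separated hex."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


# Sample characters: A, e with acute accent, euro sign, grinning face emoji.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {utf8_bytes(ch)}")

# U+0041: 41
# U+00E9: c3 a9
# U+20AC: e2 82 ac
# U+1F600: f0 9f 98 80

# Pure ASCII text is byte-for-byte identical under ASCII and UTF-8.
print("data".encode("ascii") == "data".encode("utf-8"))  # True
```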

See Part 1 for notes.

Intro to Bayesian inference. How to check if a coin is fair?
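One common way to approach the coin question is a grid approximation to the posterior over the heads probability. The sketch below assumes a uniform prior and made-up data (7 heads, 3 tails); the function name and the grid size are illustrative, and the lecture notes may take a different route.

```python
def coin_posterior(heads, tails, grid_size=1001):
    """Grid approximation to the posterior over p = P(heads).

    Assumes a uniform prior on [0, 1], so the posterior is
    proportional to the binomial likelihood p**heads * (1 - p)**tails.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    unnormalized = [p**heads * (1.0 - p) ** tails for p in grid]
    total = sum(unnormalized)
    return grid, [w / total for w in unnormalized]


# Hypothetical data: 7 heads and 3 tails in 10 flips.
grid, posterior = coin_posterior(heads=7, tails=3)

posterior_mean = sum(p * w for p, w in zip(grid, posterior))
prob_heads_favored = sum(w for p, w in zip(grid, posterior) if p > 0.5)

print(f"posterior mean of p: {posterior_mean:.3f}")
print(f"P(p > 0.5 | data):   {prob_heads_favored:.3f}")
```

With a uniform prior the exact posterior is Beta(heads + 1, tails + 1), so the grid mean can be checked against (heads + 1) / (heads + tails + 2).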

Notes (board) Notes (computer)

Bayesian inference of text message rates: building a two-level Poisson statistical model, proving the Poisson Limit Theorem, and seeing how data warps a prior distribution into a posterior.

Performing Bayesian Inference, Intro to Markov Chain Monte Carlo.

Notes (board) Notes (computer)

Diagnosing MCMC samples and proving Metropolis-Hastings for Bayesian Inference.
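For a concrete picture of the sampler, here is a minimal random-walk Metropolis sketch, the special case of Metropolis-Hastings with a symmetric proposal. It targets the posterior of a coin's heads probability under a uniform prior; the data (7 heads, 3 tails), step size, and burn-in length are all assumptions for illustration rather than values from the course.

```python
import math
import random


def log_posterior(p, heads, tails):
    """Unnormalized log posterior for p = P(heads) under a uniform prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero posterior mass outside (0, 1)
    return heads * math.log(p) + tails * math.log(1.0 - p)


def metropolis(heads, tails, num_samples, step=0.2, seed=0):
    """Random-walk Metropolis sampler with a Gaussian proposal."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, heads, tails)
    samples = []
    for _ in range(num_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio); the proposal
        # is symmetric, so the Hastings correction cancels.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


samples = metropolis(heads=7, tails=3, num_samples=20000)
kept = samples[2000:]  # discard an assumed burn-in period
print(f"posterior mean estimate: {sum(kept) / len(kept):.3f}")
```

A simple diagnostic is to rerun with a different `seed` or `step` and check that the estimates agree; for this target the exact posterior mean is 8/12.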

Notes (board) Notes (computer)

Further resources:

- Andrieu *et al.*, An Introduction to MCMC for Machine Learning, *Machine Learning* (2003) [especially pages 13-15].
- Stochastic processes:
  - Gardiner, Stochastic Methods: A Handbook for the Natural and Social Sciences (2009)
  - Van Kampen, Stochastic Processes in Physics and Chemistry (2007)

- SciPy 2014 talk [YouTube] on the history of Bayesian computing, MCMC, and PyMC from PyMC's creator (20 min)
- The Stan programming language for Bayesian inference, with interfaces for Python, R, MATLAB, and more.

Spreadsheets considered harmful!

Gene name errors are widespread in the scientific literature. *Genome Biology*, 2016.

Error bars considered harmful!

Researchers Misunderstand Confidence Intervals and Standard Error Bars. *Psychological Methods*, 2005. (Journal link.)

Homework and projects are posted on **Blackboard**.