## Prof. James Bagrow

| | |
| --- | --- |
| Email: | james.bagrow [at] uvm.edu |
| Lectures: | M/W 15:30–16:45 in Votey 254 |
| Office Hours: | Tu/Th 14:00–15:00, or by appointment |
| Office: | Farrell Hall 206 (Map to my office) |
| Course syllabus | |

Extracting meaning from data remains one of the biggest tasks of science and industry. The Internet and modern computers have given us vast amounts of data, so it is more important than ever to understand how to collect, process, and analyze these data. A picture is worth a thousand words, so visualizations, from scientific plots and infographics to interactive data explorers, are crucial to summarize and communicate new discoveries.

- The **course syllabus**. Be sure to check this out.
- The course introductory reading, A Whirlwind Tour of Python (as a pdf) (on GitHub).
- Anaconda. The scientific Python environment we will use.

Course overview, motivation, logistics, and computer setup. **Introduction to Python**.

More Python review, reading and writing files, tour of first- and third-party Python libraries.

- Proof that spaces beat tabs! Fun!

Random variables and their statistics, modeling social ratings and their uncertainty

Notes

Modeling social ratings: confidence intervals on the binomial proportion, generating and counting random data, Wilson interval and smoothing!
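As a rough illustration of the Wilson interval mentioned above, here is a minimal sketch using only the standard library. The function name `wilson_interval` and the default `z=1.96` (roughly a 95% two-sided level) are choices made for this example, not taken from the lecture notes.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    `z` is the normal quantile; 1.96 corresponds to roughly 95% coverage.
    (Hypothetical helper written for illustration.)
    """
    if trials == 0:
        return (0.0, 1.0)  # no ratings yet: the proportion is unconstrained
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    # The center is pulled toward 1/2, which acts like smoothing for small samples.
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return (center - half_width, center + half_width)

# Same observed proportion of positive ratings, very different uncertainty.
print(wilson_interval(9, 10))
print(wilson_interval(90, 100))
```

Ranking items by the lower end of this interval, rather than by the raw proportion, is one common way to keep an item with only a handful of ratings from outranking a well-rated item with many.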

Notes (board) Notes (screen)

Wrap up social ratings (see previous lecture), review HW 01 (see Blackboard).

A data science workflow, the typology of data and levels of measurement, storing data

Notes

Data cleaning: rejecting bad data, combining data, filtering and processing data; Start on *HISTOGRAMS!!!*

More on density estimation: histograms, box plots, kernel densities, violin plots, **cumulative distributions**
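Two of the estimators listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the helper names `histogram_counts` and `ecdf` are made up for this example, and in practice a library such as NumPy or matplotlib would do the same job.

```python
from bisect import bisect_right

def histogram_counts(data, n_bins, lo, hi):
    """Counts in `n_bins` equal-width bins covering [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        if lo <= x <= hi:
            # Points on the right edge go into the last bin.
            i = min(int((x - lo) / width), n_bins - 1)
            counts[i] += 1
    return counts

def ecdf(data):
    """Return F, where F(x) is the fraction of data points <= x."""
    if not data:
        raise ValueError("ecdf needs at least one data point")
    ordered = sorted(data)
    n = len(ordered)

    def F(x):
        return bisect_right(ordered, x) / n

    return F

values = [0.0, 0.5, 1.0, 1.5, 2.0]
print(histogram_counts(values, n_bins=2, lo=0.0, hi=2.0))
F = ecdf(values)
print(F(1.0))
```

Unlike the histogram, the empirical cumulative distribution needs no bin width, which is one reason it is a useful companion to binned plots.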

Missing data and imputation (part 2).

- Imputation Illustrations (Courtesy Gelman & Hill).
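For orientation, the simplest imputation strategy, filling each gap with the mean of the observed values, might look like the sketch below. The helper name `impute_mean` and the use of `None` to mark missing entries are assumptions made for this example; the lecture may well cover other schemes.

```python
from statistics import mean, pvariance

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

raw = [1.0, None, 3.0, None, 5.0]
filled = impute_mean(raw)
print(filled)

# Mean imputation keeps the mean but shrinks the spread of the data.
observed = [v for v in raw if v is not None]
print(pvariance(observed), pvariance(filled))
```

The shrinking variance in the last line is a known weakness of mean imputation, since every filled-in value sits exactly at the center of the data.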

Quiz and Homework review. Start on text data (see Lec 17 notes).

Bayesian inference of text message rates: building a two-level Poisson statistical model, proving the Poisson limit theorem, and seeing how data warps a prior distribution into a posterior.
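The Poisson limit theorem says that a Binomial(n, λ/n) distribution approaches a Poisson(λ) distribution as n grows. The sketch below is a numerical check rather than a proof; the function names and the choice λ = 3 are illustrative assumptions.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_pmf_gap(n, lam, k_max=20):
    """Largest pointwise gap between Binomial(n, lam/n) and Poisson(lam) for k <= k_max."""
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(min(n, k_max) + 1)
    )

# The gap shrinks as n grows while n * p = lam stays fixed.
for n in (10, 100, 10_000):
    print(n, max_pmf_gap(n, lam=3.0))
```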

Performing Bayesian Inference, Intro to Markov Chain Monte Carlo.
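Before turning to MCMC, it can help to see a posterior computed by brute force. The sketch below evaluates the posterior of a single Poisson rate on a grid. The daily counts are made-up illustration data, and `grid_posterior` is a hypothetical helper, much simpler than the two-level model from the lecture.

```python
import math

def grid_posterior(counts, rate_grid, log_prior):
    """Normalized posterior weights for a Poisson rate evaluated on `rate_grid`."""
    log_post = []
    for lam in rate_grid:
        # Poisson log-likelihood of all counts at this candidate rate.
        loglik = sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)
        log_post.append(log_prior(lam) + loglik)
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

counts = [18, 22, 20, 19, 21]                 # made-up daily message counts
grid = [0.5 * i for i in range(1, 101)]       # candidate rates 0.5, 1.0, ..., 50.0

flat = grid_posterior(counts, grid, log_prior=lambda lam: 0.0)
skeptical = grid_posterior(counts, grid, log_prior=lambda lam: -lam / 5.0)

print(grid[flat.index(max(flat))])            # mode under a flat prior
print(grid[skeptical.index(max(skeptical))])  # mode pulled toward smaller rates
```

Grid evaluation only works for a handful of parameters, which is part of the motivation for sampling methods such as MCMC.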

Notes (board) Notes (computer)

Diagnosing MCMC samples and proving Metropolis-Hastings for Bayesian Inference.
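A minimal random-walk Metropolis sampler, the special case of Metropolis-Hastings with a symmetric proposal, might look like the sketch below. The function name, the Gaussian proposal, the step size, and the standard-normal target are all assumptions made for illustration; this is not the code used in lecture.

```python
import math
import random

def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log density.

    Assumes `x0` has nonzero density under the target.
    Returns the list of samples and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric proposal
        logp_new = log_target(proposal)
        # Accept with probability min(1, p_new / p_old).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
            accepted += 1
        samples.append(x)  # a rejected proposal repeats the current state
    return samples, accepted / n_samples

# Stand-in target: a standard normal, known only up to a constant.
samples, rate = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=20_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(rate, mean, var)
```

Two quick diagnostics are the acceptance rate and a comparison of the sample mean and variance against known values, when a test target like this one is available.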

Notes (board) Notes (computer)

Further resources:

- Andrieu *et al.*, An Introduction to MCMC for Machine Learning, *Machine Learning* (2003) [especially pages 13-15].
- Stochastic processes:
  - Gardiner, Stochastic Methods: A Handbook for the Natural and Social Sciences (2009)
  - Van Kampen, Stochastic Processes in Physics and Chemistry (2007)

- SciPy 2014 talk [YouTube] on the history of Bayesian computing, MCMC, and PyMC from PyMC's creator (20 min)
- The new Stan programming language for Bayesian inference, with interfaces for Python, R, MATLAB, and more.

Spreadsheets considered harmful!

Gene name errors are widespread in the scientific literature.
*Genome Biology*, 2016.

Error bars considered harmful!

Researchers Misunderstand Confidence Intervals and Standard Error Bars.
*Psychological Methods*, 2005.

Ten Simple Rules for Effective Statistical Practice.
*PLoS Computational Biology*, 2016.

Homework and projects are posted on **Blackboard**.