Prof. James Bagrow
|Email:||james.bagrow [at] uvm.edu|
|Lectures:||M/W 15:30–16:45 in Votey 254|
|Office Hours:||Tu/Th 14:00–15:00, or by appointment|
|Office:||Farrell Hall 206 (Map to my office)|
Extracting meaning from data remains one of the biggest tasks of science and industry. The Internet and modern computers have given us vast amounts of data, so it is more important than ever to understand how to collect, process, and analyze these data. A picture is worth a thousand words, so visualizations, from scientific plots and infographics to interactive data explorers, are crucial to summarize and communicate new discoveries.
Course overview, motivation, logistics and computer setup. Introduction to Python. Slides
Random variables and their statistics, modeling social ratings and their uncertainty. Notes
Wrap up social ratings (see previous lecture), review HW 01 (see Blackboard).
A data science workflow, the typology of data and levels of measurement, storing data. Notes
Data cleaning: rejecting bad data, combining data, filtering and processing data; start on HISTOGRAMS!!! Notes
More on density estimation: histograms, box plots, kernel densities, violin plots, cumulative distributions. Notes
Missing data and imputation (part 1). Notes
Spam filtering with an intro to natural language processing (NLP). Notes
Text classification (direct continuation of previous lecture). Notes
Quiz and homework review. Start on text data (see Lec 17 notes).
Representing text on the computer: ASCII, Unicode and the UTF-8 miracle. Notes
Diagnosing MCMC samples and proving Metropolis-Hastings for Bayesian inference. Notes (board) Notes (computer)
Spreadsheets considered harmful!
Gene name errors are widespread in the scientific literature. Genome Biology, 2016.
Error bars considered harmful!
Researchers Misunderstand Confidence Intervals and Standard Error Bars. Psychological Methods, 2005.
Ten Simple Rules for Effective Statistical Practice. PLoS Computational Biology, 2016.
Homework and projects are posted on Blackboard.