Statistics 159/259, Spring 2022 Course Summary


This course teaches “the why and how” of reproducible and collaborative research by combining questions of good computational practice in science, open science and statistical data analysis, in the context of today’s research environment. We will interleave practical topics in software engineering and statistical computing with broader discussions on elements of the philosophy of science and the foundations of statistics.

More details can be found in the syllabus.

Key Resources

  • Communication: class Piazza.

  • Lectures will be recorded and posted in the Kaltura system (visible via bCourses), but attendance is mandatory: much of the pedagogical value of the class is in participating in discussions and code reviews.

  • Course readings that are not easy to find free on the web or through the UC Berkeley Library will be posted to bCourses.

  • Computing resources

    • We will use Jupyter notebooks. We will start with hosted notebooks on our Stat 159 JupyterHub. Later in the term, we will discuss installing Jupyter on your own device. The JupyterHub server will have all the packages you need pre-installed.

    • The sources for class notes and most other materials are available on GitHub, with a rendered version here.

    • Assignments should be submitted by pull request to your private repositories using GitHub Classroom.

    • Whenever you need to work with GitHub, remember to activate GitHub authentication from the JupyterHub by running the command github-app-user-auth at a terminal and following the instructions. If you still can’t push to a given repo once authenticated, you may have forgotten to add that repo/org to your setup of the authentication app; go here to configure the app’s permissions.

  • A note on the Berkeley Library EZProxy: Some of the resources listed here are scientific articles available only behind journal paywalls. If you haven’t already, you should configure your web browser to use the campus library EZProxy so you can access them even if you are working from an off-campus network.

Textbook and supporting materials

While not strictly a textbook for this course, we will rely heavily on the excellent, openly licensed: Research software engineering in Python. We will complement it with these other scientific Python resources:

Other bibliography

Above is a list of books and websites mostly focusing on computational skills; below is the full bibliography we’ll refer to in the course. Some of these will become assigned readings, while others are available for your reference.

PLOS Ten Simple Rules

The PLOS Ten Simple Rules collection has many short, valuable papers full of relevant, practical advice in this space. A few that stand out, though many (if not most) are worth your time, are “Ten simple rules for …”:

Computational research

Open Source Software and Open Science

Data Management

The art of research

National Academies Reports

These are key reports produced by the National Academies of Sciences, Engineering, and Medicine. They were created by teams of world experts in the field, and inform policy in multiple areas:

Other general references on reproducibility and open science