Statistics 159/259: Weekly Plan
Prof. Pérez and GSI Sapienza, Department of Statistics, UC Berkeley
Below is our current plan for the course. This is not a contract: it is a plan, and it may change substantially as the semester unfolds, especially given the additional uncertainty caused by COVID-19.
The table of contents in the left pane has links to lectures and labs. The links below are live and executable: they use the nbgitpuller service to pull a current copy of the given content from git into the Spring 2022 Berkeley hub, ready to run. With them you can run the content conveniently, without any manual git work.
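For the curious, an nbgitpuller link is an ordinary URL that points at the hub's `git-pull` endpoint and passes the repository, branch, and the path to open as query parameters. The sketch below builds such a link in Python. It assumes nbgitpuller's default behavior of cloning into a folder named after the repository, and the hub and repository URLs in the usage example are hypothetical placeholders, not the course's actual addresses.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url, repo_url, branch="main", path=""):
    """Build a link asking nbgitpuller to clone (or update) `repo_url` on the
    hub, then open `path` inside that clone in JupyterLab."""
    # Assumption: nbgitpuller clones into a folder named after the repo.
    repo_dir = repo_url.rstrip("/").split("/")[-1]
    if repo_dir.endswith(".git"):
        repo_dir = repo_dir[: -len(".git")]
    # Where JupyterLab should land once the pull has finished.
    urlpath = f"lab/tree/{repo_dir}/{path}".rstrip("/")
    # urlencode percent-escapes each value so it survives as a query string.
    query = urlencode({"repo": repo_url, "branch": branch, "urlpath": urlpath})
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


if __name__ == "__main__":
    # Hypothetical hub and repository, for illustration only.
    print(nbgitpuller_link(
        "https://example-hub.datahub.berkeley.edu",
        "https://github.com/example-org/example-course",
        path="lectures/intro.ipynb",
    ))
```

Clicking a link like this a second time does not re-clone from scratch. nbgitpuller updates the existing folder and tries to preserve your local edits, which is why these links are safe to reuse throughout the semester.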
Lectures
Note that all lecture videos are posted on the bCourses “Lectures” playlist for the course (this link is accessible only to Berkeley personnel).
Jan 19. Logistics and intro to Git.
Jan 26. JupyterHub, JupyterLab and its various tools, dotfiles for reproducible personal configuration.
Feb 2. GitHub, Git tutorial continued. (Note: this lecture has damaged audio for the second half.)
Feb 9. Git visuals, an overview of Project Jupyter. IPython - beyond plain Python.
Feb 16. Rich output in Jupyter, VNC and virtual desktops in JupyterLab. Introduction to nbdime.
Feb 23. Climate data, xarray and open science at NASA. Guest lecture by Dr. Chelle Gentemann.
March 2. Merge conflicts with nbdime (sample repo). Custom display logic in Jupyter. (Note: this lecture has damaged audio for the last ~45 minutes.)
March 9. Automation and Make, based on the Carpentries’ tutorial.
March 16. Python Testing and Continuous Integration, based on the Carpentries’ tutorial.
March 30. Environments and Makefiles, Binder.
April 6. Packaging Python software (illustrated via a toy example). A conceptual overview of matplotlib, including a quick intro to Object-Oriented Programming.
April 13. Documentation, JupyterBook and GitHub Pages & Actions.
April 20. Data Serialization.
April 27. Four vignettes in Open Science (each name links to that speaker's slides):
Lisa Rennels - PhD student at Berkeley who works on integrated modeling of the social and economic impact of climate change with Julia; co-lead on the Mimi Framework project.
Jordi Bolibar - Postdoc in glaciology at Utrecht University, working on projects that combine machine learning and physics to understand the fate of glaciers, with Julia.
Whyjay Zheng - Postdoc at UC Berkeley in my group, who works on both modeling glaciers and understanding them with remote sensing data, with Python.
Jarrod Millman - researcher at Berkeley who has been one of the leaders in the Scientific Python community since the early days, and today co-leads an effort to guide the ecosystem into the next decade.
Live links for labs on the Berkeley DataHub
Assigned Readings
When an assignment consists of multiple articles, you should submit a summary paragraph and an idea-highlight paragraph for each separate article.
#1, due Jan 31, 2022: Developing open source scientific practice.
#2, due Feb 14, 2022: Keith Baggerly and the Potti & Nevins Cancer Scandal. These are actually two videos, not readings, but the same format applies (short summary, then idea highlight). In this case, submit only one paragraph of each kind: the 60 Minutes video is a short overview meant to give you context, while the talk contains denser ideas:
#3, due Feb 22: Earth and Climate Science in the Cloud
Perspectives on Data Reproducibility and Replicability in Paleoclimate and Climate Science.
Gentemann et al. “Satellite sea surface temperatures along the West Coast of the United States during the 2014–2016 northeast Pacific marine heat wave.” For this paper, start thinking about its reproducibility issues. The lead author, Dr. Chelle Gentemann, will be our guest lecturer on Feb 23, and she will discuss aspects of this work. Later we will follow up with a reproducibility attempt around this paper.
#4, due Feb 28: Core concepts
Reproducibility and Replicability in Science, chapter 2 (SCIENTIFIC METHODS AND KNOWLEDGE).
#5, due Mar 7: Foundational classics
#6, due Mar 14: Jupyter in Computational Research
No assignment Mar 21, Spring Recess.
#7, due Mar 28: Computational challenges
#8, due Apr 4: Open Source Software and Open Science - these are four easy “ten simple rules for…”:
#9, due Apr 11: Computational challenges
#10, due Apr 18: Open Source and Open Science
Open Source Software Policy Options for NASA Earth and Space Sciences, 2018. Chapter 4, Lessons Learned from Community Perspectives.
Open Science by Design, Realizing a Vision for 21st Century Research (2018). Chapter 4, A VISION FOR OPEN SCIENCE BY DESIGN.
#11, due Apr 25: Earth and Climate Science in the Cloud
No assignment May 2, RRR Week, but if you’d like some fun reading for intellectual inspiration, I recommend this perspective on Opening Up to Open Science by Chelle Gentemann, which covers many of the questions you have explored around incentives and the future of science.
#12, due May 9: Earth and climate science