class: middle, center

# Teaching Reproducible Workflows

---

## Reproducible vs Replicable

<img src="imgs/leek_repro.jpeg" width="50%" style="display: block; margin: auto;" />

.footnote[
Source: Patil, Peng, Leek (2019) A visual tool for defining reproducibility and replicability. <i>Nature Human Behaviour</i>
]

---

## Reproducibility in practice

<br/>

- Can you recreate the tables and figures from the code and data?

- Does the code actually do what you think it does?

- In addition to what was done, is it clear *why* it was done? (e.g. how were hyper / tuning parameters chosen?)

- Can the code be used for other data?

- Can you hand the code off to someone else and expect it to work?

---

## Core pieces

<img src="imgs/repro_pieces.png" width="50%" style="display: block; margin: auto;" />

---

## Context

* I am the course organizer for Math 11176 - Statistical Programming
* Course with ~200 Maths MSc students enrolled
* 100% coursework, multiple marked assignments (individual and team-based)
* For each assignment we distribute:
    * Instruction document
    * Template `Rmd` for solutions
    * Data and other support files
* Need to collect:
    * Completed template `Rmd`
    * Rendered output (`pdf`, `html`, `md`, etc.)
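
---

## Checking collected work (sketch)

The collection requirement above can be checked mechanically. The following is a minimal sketch in base R, not part of `ghclass`: the helper name `check_submission()`, the file name `hw1.Rmd`, and the directory layout are all hypothetical, chosen only for illustration.

```r
# Hypothetical helper (not part of ghclass): check that a collected
# submission contains the completed Rmd and at least one rendered output.
check_submission <- function(dir, template = "hw1.Rmd",
                             formats = c("pdf", "html", "md")) {
  stem <- tools::file_path_sans_ext(template)
  outputs <- file.path(dir, paste0(stem, ".", formats))
  list(
    has_rmd    = file.exists(file.path(dir, template)),
    has_output = any(file.exists(outputs))
  )
}

# Usage with a throwaway directory standing in for a cloned repo
repo_dir <- file.path(tempdir(), "hw01-team01")
dir.create(repo_dir, showWarnings = FALSE)
file.create(file.path(repo_dir, c("hw1.Rmd", "hw1.html")))
str(check_submission(repo_dir))
```

A check like this could be run over every cloned repository before marking, to flag teams that forgot to commit the rendered document.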
---

## GitHub Organization

* 1 organization / course
* Students are added (anonymously) as members of the organization
* 1 template repository / assignment
* 1 private repository / assignment / (team | individual)
* Automate the distribution, collection, and feedback using GitHub's API (`ghclass`)

---

## GitHub Organization

<img src="imgs/github_org.png" width="100%" style="display: block; margin: auto;" />

---

## Template Example - hw1

<img src="imgs/github_hw1.png" width="80%" style="display: block; margin: auto;" />

---

## `ghclass`

An R package that enables instructors to automate the management of courses on GitHub.

Key features:

- Repository creation, mirroring, updating, collecting, etc.
- Organization management (members, teams, etc.)
- Summary statistics (e.g. commits) by repo or over the org
- Many other common tasks (issues, PRs, etc.)

For more details see the package website - https://rundel.github.io/ghclass/

---

## Creating a team assignment

```r
# repo, user, and team are parallel vectors: user i joins team i,
# and team i gets access to repo i (mirrored from source_repo)
org_create_assignment(
  org = "ghclass-demo",
  repo = c("hw01-team01", "hw01-team01", "hw01-team02", "hw01-team02"),
  user = c("ghclass-anya", "ghclass-bruno", "ghclass-celine", "ghclass-diego"),
  team = c("hw01-team01", "hw01-team01", "hw01-team02", "hw01-team02"),
  source_repo = "statprog-s1-2019/hw1"
)
```

```
## ✓ Mirrored repo 'statprog-s1-2019/hw1' to repo 'ghclass-demo/hw01-team01'.
```

```
## ✓ Mirrored repo 'statprog-s1-2019/hw1' to repo 'ghclass-demo/hw01-team02'.
```

```
## ✓ Created team 'hw01-team01' in org 'ghclass-demo'.
```

```
## ✓ Created team 'hw01-team02' in org 'ghclass-demo'.
```

```
## ✓ Added user 'ghclass-anya' to team 'hw01-team01'.
```

```
## ✓ Added user 'ghclass-bruno' to team 'hw01-team01'.
```

```
## ✓ Added user 'ghclass-celine' to team 'hw01-team02'.
```

```
## ✓ Added user 'ghclass-diego' to team 'hw01-team02'.
```

```
## ✓ Added team 'hw01-team01' to repo 'ghclass-demo/hw01-team01' with 'push' access.
```

```
## ✓ Added team 'hw01-team02' to repo 'ghclass-demo/hw01-team02' with 'push' access.
```

---

## Collecting student work

```r
# Clone every repo in the org whose name matches "hw01-" into ./hw01
local_repo_clone(
  repo = org_repos(org = "ghclass-demo", "hw01-"),
  local_path = "hw01"
)
```

```
## ✓ Cloned 'ghclass-demo/hw01-team01'.
```

```
## ✓ Cloned 'ghclass-demo/hw01-team02'.
```

--

<img src="imgs/github_clone.png" width="65%" style="display: block; margin: auto;" />

---

## Contributor statistics

```r
# Usernames are replaced with letters to anonymize the output
repo_contributors(repo = "statprog-s1-2019/hw02-lab01-team03") %>%
  mutate(username = LETTERS[1:4]) %>%
  arrange(desc(commits))
```

```
## # A tibble: 4 x 3
##   repo                               username commits
##   <chr>                              <chr>      <int>
## 1 statprog-s1-2019/hw02-lab01-team03 D              8
## 2 statprog-s1-2019/hw02-lab01-team03 B              5
## 3 statprog-s1-2019/hw02-lab01-team03 C              5
## 4 statprog-s1-2019/hw02-lab01-team03 A              3
```

```r
repo_contributors(repo = "statprog-s1-2019/hw02-lab01-team10") %>%
  mutate(username = LETTERS[12+1:5]) %>%
  arrange(desc(commits))
```

```
## # A tibble: 5 x 3
##   repo                               username commits
##   <chr>                              <chr>      <int>
## 1 statprog-s1-2019/hw02-lab01-team10 Q             17
## 2 statprog-s1-2019/hw02-lab01-team10 P              9
## 3 statprog-s1-2019/hw02-lab01-team10 O              5
## 4 statprog-s1-2019/hw02-lab01-team10 M              1
## 5 statprog-s1-2019/hw02-lab01-team10 N              1
```

---

## Automated feedback

<img src="imgs/github_actions0.png" width="100%" style="display: block; margin: auto;" />

---

## Automated feedback

<img src="imgs/github_actions1.png" width="100%" style="display: block; margin: auto;" />

---

## Automated feedback

<img src="imgs/github_actions2.png" width="100%" style="display: block; margin: auto;" />

---

## Related ongoing work

* Peer evaluation (Mine Çetinkaya-Rundel and Therese Anders)
* Simplifying the automated feedback process:
    * `checklist` - R package for simplifying automated checks <br/> https://github.com/rundel/checklist
    * `parsermd` - R package for programmatic interaction with R Markdown documents <br/> https://rundel.github.io/parsermd/

---

## Additional Resources

* [Happy Git and GitHub for the useR](https://happygitwithr.com/) <br/> Jenny Bryan, Jim Hester

* [Excuse me, do you have a moment to talk about version control?](https://peerj.com/preprints/3159/) <br/> Jenny Bryan (2018), *The American Statistician*.

* [Using GitHub Classroom To Teach Statistics](https://www.tandfonline.com/doi/full/10.1080/10691898.2019.1617089) <br/> Jacob Fiksel, Leah Jager, Johanna Hardin, and Margaret Taub (2019), <br/> *Journal of Statistics Education*.

* [Implementing version control with Git as a learning objective in statistics courses](https://arxiv.org/abs/2001.01988) <br/> Matthew Beckman, Mine Çetinkaya-Rundel, Nicholas Horton, Colin Rundel, Adam Sullivan, Maria Tackett (2020), *Journal of Statistics Education (in review)*

* [Teaching Statistics and Data Science Online Workshops](https://centreforstatistics.maths.ed.ac.uk/cfs/events/past-events-and-recordings/2020-events/teaching-statistics-and-data-science-online) <br/> Mine Çetinkaya-Rundel, Colin Rundel (2020), *Centre for Statistics Online Workshop*

---

# Thank you!