Course management with ghclass
Mine Çetinkaya-Rundel, Colin Rundel, Therese Anders
2024-03-26
Source:vignettes/articles/ghclass.Rmd
ghclass.Rmd
What is ghclass?
ghclass is an R package that is designed to enable instructors to efficiently manage their courses on GitHub. It has a wide range of functionality for managing organizations, teams, repositories, and users on GitHub and helps automate most of the tedious and repetitive tasks around creating and distributing assignments.
Who is this package for?
This package is for everyone! But really, if you’re an instructor who uses GitHub for your class management, e.g. students submit assignments via GitHub repos, this package is definitely for you! The package also assumes that you’re familiar with R, but teaching with R is not a requirement since this package is entirely agnostic to the contents of your repositories.
(If you’re a Python user, see this post for a Python based alternative.)
What is this vignette about?
This vignette is about the nitty-gritty of setting your class up in GitHub with ghclass. For a higher level discussion of why and how to use Git/GitHub in data science education, see this paper by the package authors.
Structuring your class on GitHub
The general framework is outlined below. This is not the only way to structure a class on GitHub, but it’s a good way, and one that ghclass is optimized to work with.
We outline steps for achieving this structure in the next section. This section is meant to give a high level view of what your course looks like on GitHub.
One organization per class: If you teach at a university, this means one semester of a given course. If you teach workshops, this would be one workshop. The instructor and any additional instructional staff, e.g. teaching assistants, are owners. Going forward we will refer to this group of people as “instructors”. The students are members.
One repo per student (or team) per assignment: The instructors have admin access to repos, i.e. they can clone, read, and write to every repository. Additionally, they can adjust repo and team memberships by adding or removing collaborators to assignment repositories as well as delete them. The students have write access to their assigned repo, which means that they can clone, read, and write to their assigned repositories but they cannot delete them and they cannot add or remove collaborators to them. This can help with minimizing accidents that cannot be undone and makes sure students cannot peek into each others’ repositories unless you explicitly allow them to do so.
If you have a teamwork component to your course, you can also set up teams on GitHub with your organization and each team can be given similar repository level access privileges for team assignments.
Suppose you have 40 students in your class, and they are in 10 teams of 4 students each. Suppose also that students turn in the following throughout a semester:
- Individual: 10 homework assignments + 2 exams
- Teamwork: 8 lab assignments + 1 project
Then, throughout the semester you will need to create total of 570 repositories.
That is a lot of repos to create and permissions to set manually! It’s also a lot of repos to clone when it comes time to grade. ghclass addresses this problem, and more! It does not, however, address the problem that that’s a lot of grading. Sorry, you’re on your own there!
That being said, ghclass does facilitate setting up continuous integration tools using GitHub actions for students’ assignment repos. This can allow for some automatic checking and feedback each time students push to a repo, more on this in a future vignette.
Authentication
This package uses GitHub personal access tokens for authentication
with GitHub, these
values can be supplied via environmental variables
GITHUB_PAT
or GITHUB_TOKEN
or saved as text in
~/.github/token
.
If this is your first time setting up a personal access token (PAT),
generate a token in the browser after logging into Github (Settings >
Developer Settings > Personal access tokens) or use usethis::browse_github_token
.
You can test that your token is working correctly using the
github_test_token()
function. If everything is working
correctly you should see something like the following:
#> ✓ Your GitHub PAT authenticated correctly.
If your token is not working you will see an error message like this instead:
github_test_token("bad token")
#> Warning in gh_auth(x$token %||% gh_token(x$api_url)): Token contains whitespace
#> characters
#> x Your GitHub PAT failed to authenticate.
#> └─GitHub API error (401):
#> ├─ API message: Bad credentials
#> └─ API docs: https://docs.github.com/rest
Step-by-step guide
Start with creating an organization on GitHub for the course. We
recommend using the course number, semester/quarter, and year in the
organization name, e.g. for a course numbered Sta 199 in Spring 18, you
can use something like Sta199-Sp18
. The exact format is not
critical, but being consistent is helpful so that you can keep track of
all of your different courses.
Previously, it was necessary to apply for GitHub’s Education Discount in order to obtain the ability to create private organization repositories for free. Recently, GitHub announced that they would be providing free unlimited private repositories for all users, making this step no longer necessary.
GitHub still provides educational benefits which are available here via a simple verification process. The list of the available benefits for teachers is provided in the teacher toolbox. Of particular note is the availability of GitHub swag for your students and free GitHub Teams plans for academic organizations and a free GitHub Pro plan for educators.
All of this is an optional step, but one that many will want to do. Approval is usually very quick, but it is not something you would want to do the night before classes begin. Give yourself at least a week to be safe.
Permissions
By default, each new GitHub organization defaults to repositories being readable by all members, regardless of whether they are private or public. This is clearly undesirable for most classroom settings.
Individual-level permissions can be set via the “People” tab on the organization page. We recommend the course instructor to be the owner of the organization and teaching assistants to receive admin privileges. Students should receive member privileges.
Github allows further permissions for accessing and changing repositories to be set for each individual member or at the organization-level (under Settings > Member Privileges). We suggest the organization-level settings below.
Member repository permissions
- Base permissions: None
- Repository creation (both Public and Private): Disabled
- Repository forking: Disabled
Admin repository permissions
- Repository visibility change: Disabled
- Repository deletion and transfer: Disabled
- Issue deletion: Disabled
Member team permissions
- Allow members to create teams: Disabled
We can get a quick snapshot of our organization using the
org_sitrep
function which reports on this permission as
well as other important details.
org_sitrep("ghclass-vignette")
#>
#> ── ghclass-vignette sitrep: ─────────────────────────────────────────────────────────────
#> • Admins: "mine-cetinkaya-rundel", "rundel", and "thereseanders"
#> • Members: 0
#> • Public repos: 0
#> • Private repos: 0
#> • Default repository permission: "read" <- Warning: members can currently view all repos
#> in this org.
#> • Members can create public repos: TRUE
#> • Members can create private repos: TRUE
We can see that this function indicates that the current default
repository permission settings is “read” and provides a helpful warning
that this enables members to view all repositories. This permission can
easily be addressed within the Organization Settings page under Member
privileges. Alternatively, we can also use ghclass to change this
directly with org_set_repo_permission
org_set_repo_permission("ghclass-vignette", permission = "none")
#> ✓ Set org "ghclass-vignette"'s repo permissions to "none".
After changing this setting we can once again check the org’s sitrep and see that the warning is now resolved.
org_sitrep("ghclass-vignette")
#>
#> ── ghclass-vignette sitrep: ─────────────────────────────────────────────────────────────
#> • Admins: "mine-cetinkaya-rundel", "rundel", and "thereseanders"
#> • Members: 0
#> • Public repos: 0
#> • Private repos: 0
#> • Default repository permission: "none"
#> • Members can create public repos: TRUE
#> • Members can create private repos: TRUE
Adding students to the organization
Next, collect your students’ GitHub usernames. You can do this using your web form tool of choice (e.g. Google Forms, MS Forms, etc.) or via a quiz or survey on your school’s learning management system (LMS). We will assume that you are able to then read in these data into an R data frame.
For example, your roster file might look something like the following:
roster = readr::read_csv( system.file("roster.csv", package = "ghclass") )
roster
#> # A tibble: 6 × 5
#> email github hw1 hw2 hw3
#> <chr> <chr> <chr> <chr> <chr>
#> 1 anya@school.edu ghclass-anya hw1-team01 hw2-team01 hw3-team01
#> 2 bruno@school.edu ghclass-bruno hw1-team02 hw2-team02 hw3-team02
#> 3 celine@school.edu ghclass-celine hw1-team03 hw2-team03 hw3-team03
#> 4 diego@school.edu ghclass-diego hw1-team01 hw2-team03 hw3-team02
#> 5 elijah@school.edu ghclass-elijah hw1-team02 hw2-team01 hw3-team03
#> 6 francis@school.edu ghclass-francis hw1-team03 hw2-team02 hw3-team01
Here we are using the student’s school email address as a unique identifier, we also have their GitHub username and we have also assigned them to different teams for our three homework assignments.
Using the roster
data frame, we can then invite the
students to the class’ organization. Each of these students will be
notified via email from GitHub asking them to join the
ghclass-vignette
organization.
org_invite(org = "ghclass-vignette", user = roster$github)
#> ✓ Invited user "ghclass-anya" to org "ghclass-vignette".
#> ✓ Invited user "ghclass-bruno" to org "ghclass-vignette".
#> ✓ Invited user "ghclass-celine" to org "ghclass-vignette".
#> ✓ Invited user "ghclass-diego" to org "ghclass-vignette".
#> ✓ Invited user "ghclass-elijah" to org "ghclass-vignette".
#> ✓ Invited user "ghclass-francis" to org "ghclass-vignette".
We now need to wait for the students to accept these invitations
before they will have access to the organization. We can check the
status of these acceptances using the org_members()
and
org_pending()
functions to see which students have accepted
or not accepted the invitation.
org_members("ghclass-vignette")
#> [1] "mine-cetinkaya-rundel" "rundel" "thereseanders"
org_members("ghclass-vignette", include_admins = FALSE)
#> character(0)
org_pending("ghclass-vignette")
#> [1] "ghclass-anya" "ghclass-bruno" "ghclass-diego" "ghclass-elijah"
#> [5] "ghclass-francis" "ghclass-celine"
After some time, some of the students will have accepted the invitation.
org_members("ghclass-vignette", include_admins = FALSE)
#> [1] "ghclass-anya" "ghclass-celine" "ghclass-francis"
org_pending("ghclass-vignette")
#> [1] "ghclass-bruno" "ghclass-diego" "ghclass-elijah"
We can now see that Anya, Celine, and Francis have accepted the invite and we are still waiting on Bruno, Diego, and Elijah. Gentle prodding and reminder emails are often necessary to get all of the students into the organization.
Preparing an assignment
As described above, ghclass uses a workflow where each team or
individual is given access to a single repo for each assignment. To
create these repositories we use a single template repository which
contains all of the files necessary for the assignment. Generally, this
will consist of things like a README.md
with instructions,
a scaffolded Rmd
or R
file where the students
will enter their answers, and any other necessary support files
(e.g. data, images, support scripts, etc.). For an example of such a
repository you can take a look at hw1 from a Statistical
Computing course offered at Duke in the Spring of 2019. Note that this
repository is public and viewable by anyone, but this is not necessary
for your template repository.
Once you have created the repository and are ready to distribute it
to students there is one more suggest step - setting the repo’s template
status to TRUE
. This is a GitHub specific detail, but doing
so makes it much more efficient to create copies of the repo for your
students. We can set this option under the repo’s Settings on GitHub,
just check the box labelled “Template Repository” on the main settings
page, or use the repo_set_template
function. This status
can also be checked with ghclass using
repo_is_template
.
repo_is_template("Sta323-Sp19/hw1")
#> [1] FALSE
repo_set_template("Sta323-Sp19/hw1")
#> ✓ Changed the template status of repo "Sta323-Sp19/hw1" to TRUE.
repo_is_template("Sta323-Sp19/hw1")
#> [1] TRUE
Distributing a team assignment
Once you have created your template repository, it is then straight
forward process to create the team or individual repositories for your
students. The recommended process is to use the
org_create_assignment
function, which is a high level
function that takes care of each of the underlying steps for you. To
start we will create the hw1 team-based assignment given the teams in
roster
.
org_create_assignment(
org = "ghclass-vignette",
user = roster$github,
repo = roster$hw1,
team = roster$hw1,
source_repo = "Sta323-Sp19/hw1",
private = TRUE
)
#> ✓ Mirrored repo "Sta323-Sp19/hw1" to repo "ghclass-vignette/hw1-team01".
#> ✓ Mirrored repo "Sta323-Sp19/hw1" to repo "ghclass-vignette/hw1-team02".
#> ✓ Mirrored repo "Sta323-Sp19/hw1" to repo "ghclass-vignette/hw1-team03".
#> ✓ Created team "hw1-team01" in org "ghclass-vignette".
#> ✓ Created team "hw1-team02" in org "ghclass-vignette".
#> ✓ Created team "hw1-team03" in org "ghclass-vignette".
#> ✓ Added user "ghclass-anya" to team "hw1-team01".
#> ✓ Added user "ghclass-bruno" to team "hw1-team02".
#> ✓ Added user "ghclass-celine" to team "hw1-team03".
#> ✓ Added user "ghclass-diego" to team "hw1-team01".
#> ✓ Added user "ghclass-elijah" to team "hw1-team02".
#> ✓ Added user "ghclass-francis" to team "hw1-team03".
#> ✓ Team "hw1-team01-1" given "push" access to repo "ghclass-vignette/hw1-team01"
#> ✓ Team "hw1-team02" given "push" access to repo "ghclass-vignette/hw1-team02"
#> ✓ Team "hw1-team03" given "push" access to repo "ghclass-vignette/hw1-team03"
Based on the output we can see that multiple steps are involved in this process:
The repositories are created by mirroring the contents of “Sta323-Sp19/hw1” into the new repositories. The names of these repositories are given by the
repo
argument, and in this case match the team names.Each of the teams is created within the organization.
The students are added to their assignment teams.
Teams are added to the repositories with “push” permission, allowing them to write and make changes to the repo.
Distributing an individual assignment
If instead of hw1 being a team assignment, we wanted to distribute it
as an individual assignment, we can also achieve this using the
org_create_assignment
function by simply excluding the
team
argument (and providing appropriate repo names).
org_create_assignment(
org = "ghclass-vignette",
user = roster$github,
repo = paste0("hw1-ind-", roster$github),
source_repo = "Sta323-Sp19/hw1",
private = TRUE
)
#> ✓ Mirrored repo "Sta323-Sp19/hw1" to repo "ghclass-vignette/hw1-ind-ghclass-anya".
#> ✓ Mirrored repo "Sta323-Sp19/hw1" to repo "ghclass-vignette/hw1-ind-ghclass-bruno".
#> ✓ Mirrored repo "Sta323-Sp19/hw1" to repo "ghclass-vignette/hw1-ind-ghclass-celine".
#> ✓ Mirrored repo "Sta323-Sp19/hw1" to repo "ghclass-vignette/hw1-ind-ghclass-diego".
#> ✓ Mirrored repo "Sta323-Sp19/hw1" to repo "ghclass-vignette/hw1-ind-ghclass-elijah".
#> ✓ Mirrored repo "Sta323-Sp19/hw1" to repo "ghclass-vignette/hw1-ind-ghclass-francis".
#> ✓ User "ghclass-anya" given "push" access to repo "ghclass-vignette/hw1-ind-ghclass-anya"
#> ✓ User "ghclass-bruno" given "push" access to repo "ghclass-vignette/hw1-ind-ghclass-bruno"
#> ✓ User "ghclass-celine" given "push" access to repo "ghclass-vignette/hw1-ind-ghclass-celine"
#> ✓ User "ghclass-diego" given "push" access to repo "ghclass-vignette/hw1-ind-ghclass-diego"
#> ✓ User "ghclass-elijah" given "push" access to repo "ghclass-vignette/hw1-ind-ghclass-elijah"
#> ✓ User "ghclass-francis" given "push" access to repo "ghclass-vignette/hw1-ind-ghclass-francis"
The underlying process here is very similar with the only difference being that we no longer need to create teams and instead add the users directly to the repositories.
Listing Repos
Once the repos are created we can interact with them with ghclass, one of the most common needs is simply to list which repos exist within our organization and or selecting some subset of them.
org_repos("ghclass-vignette")
#> [1] "ghclass-vignette/hw1-team01" "ghclass-vignette/hw1-team02"
#> [3] "ghclass-vignette/hw1-team03" "ghclass-vignette/hw1-ind-ghclass-anya"
#> [5] "ghclass-vignette/hw1-ind-ghclass-bruno" "ghclass-vignette/hw1-ind-ghclass-celine"
#> [7] "ghclass-vignette/hw1-ind-ghclass-diego" "ghclass-vignette/hw1-ind-ghclass-elijah"
#> [9] "ghclass-vignette/hw1-ind-ghclass-francis"
org_repos("ghclass-vignette", filter="hw1-team")
#> [1] "ghclass-vignette/hw1-team01" "ghclass-vignette/hw1-team02"
#> [3] "ghclass-vignette/hw1-team03"
Modifying Repos
Mirroring repos is somewhat heavy handed, since it forces the target repo to be identical to the source repo. In some cases we only want to add or modify a single file in the repository. Most often this occurs after distributing an assignment and discovering that there is an issue with the instructions, the data, etc.
ghclass allows you to automate the process of adding, modifying, or replacing files across repos after they have been created. While this process does overwrite existing files in the repo everything is being done within the context of git and changes can be rolled back or merged if they conflict.
Lets assume that we distributed hw1 with the wrong version of the
README.md
included, if we want to replace this with the
correct version across all of the hw1 repositories then we could do the
following,
file = system.file("README.md", package = "ghclass")
repo_add_file(
org_repos("ghclass-vignette","hw1-team"),
message = "Replace README.md with the correct version",
file = file,
overwrite = TRUE
)
#> ✓ Added file "README.md" to repo "ghclass-vignette/hw1-team01".
#> ✓ Added file "README.md" to repo "ghclass-vignette/hw1-team02".
#> ✓ Added file "README.md" to repo "ghclass-vignette/hw1-team03".
The updated version looks like the following,
repo_get_readme("ghclass-vignette/hw1-team01", include_details = FALSE)
#> [1] "## Homework 01\n\nThis is the corrected version of the HW01 Readme\n"
We can also use the function repo_modify_file
to make
changes to existing files,
repo_modify_file(
repo = org_repos("ghclass-vignette","hw1-team"),
path = "README.md",
pattern = "## Homework 01\n\n",
content = "Due: Tomorrow\n",
method = "after"
)
#> ✓ Modified file "ghclass-vignette/hw1-team01/README.md".
#> ✓ Modified file "ghclass-vignette/hw1-team02/README.md".
#> ✓ Modified file "ghclass-vignette/hw1-team03/README.md".
repo_get_readme("ghclass-vignette/hw1-team01", include_details = FALSE)
#> [1] "## Homework 01\n\nDue: Tomorrow\nThis is the corrected version of the HW01 Readme\n"
Collecting Student Work
Eventually the students will be finished with the work and or the assignment deadline will have passed. ghclass makes it easy to collect all of the student work off of GitHub and make it accessible on your local computer for grading. We make use of the gert package to provide basic git functionality within ghclass.
local_repo_clone(
repo = org_repos("ghclass-vignette", "hw1-team"),
local_path = repo_dir
)
#> ✔ Cloned "ghclass-vignette/hw1-team01".
#> ✔ Cloned "ghclass-vignette/hw1-team02".
#> ✔ Cloned "ghclass-vignette/hw1-team03".
fs::dir_tree(repo_dir)
#> /var/folders/v7/wrxd7cdj6l5gzr0191__m9lr0000gn/T//RtmpEkGyKu/hw1
#> ├── hw1-team01
#> │ ├── README.md
#> │ ├── fizzbuzz.png
#> │ ├── hw1.Rmd
#> │ ├── hw1.Rproj
#> │ ├── hw1_whitelist.R
#> │ └── wercker.yml
#> ├── hw1-team02
#> │ ├── README.md
#> │ ├── fizzbuzz.png
#> │ ├── hw1.Rmd
#> │ ├── hw1.Rproj
#> │ ├── hw1_whitelist.R
#> │ └── wercker.yml
#> └── hw1-team03
#> ├── README.md
#> ├── fizzbuzz.png
#> ├── hw1.Rmd
#> ├── hw1.Rproj
#> ├── hw1_whitelist.R
#> └── wercker.yml
FAQ
Do I really need private repositories for my students’ assignments?
You might not care, but the law might. For example, in the United States, FERPA regulations stipulate that student information should be kept private. If you use public repositories, anyone can find out who is enrolled in your course. Additionally, you will likely be using GitHub issues for providing feedback on the students’ work, and potentially even mention their grade in a given assignment. This information should not be publicly available to anyone.
Also, your students may not want their coursework to be publicly available. They are bound to make mistakes as they learn and it should be up to them whether they want those to be a piece of their public profile on GitHub.-
Why not use GitHub Classroom?
Actually you don’t have to choose between ghclass and GitHub Classroom, your workflow can use either or both - they are just different interfaces to the same underlying API. Generally, it is mostly a matter of preference, but there are a couple of features in ghclass that are not present in GitHub Classroom:- Instructor defined teams – GitHub Classroom asks students to choose their teammates when creating their repository.
- Editing existing repositories – being able to push changes to student repositories after an assignment is released can be quite valuable.
- Command-line interface – if you like writing R code to solve your problems this may be a better fit for you as it provides a greater level of control and more flexibility.
-
Does the default branch of my repository matter (
master
vsmain
)?
Yes and no - recently, GitHub has announced that they will be changing the default branch for all new repositories on their platform tomain
frommaster
. Details on this change and the timeline for implementation are available here. In anticipation of these changes we have updatedghclass
to support alternative default branch names across the entire package. For the vast majority of use cases you will not see any differences as the GitHub API and/or Git will already use the default branch without any additional specification. In a small number of cases a branch name is required, in which case the package no longer provides a default value and you will be prompted to specify that argument. Hopefully these changes will have a minimal impact on our users in terms of both backwards and forwards compatibility.
A couple other quick points about this change:- The default GitHub behavior is expected to change in mid-October 2020, existing repositories and organizations will not be affected.
- Currently, our recommendation is if your classroom org is already
using
master
to leave it as is, particularly for repos already distributed to students. GitHub will be providing migration tools later in the year which hopefully will be useful for migrating your entire organization. - For new classroom orgs, before the GitHub wide change, you can
choose a new default branch name organization wide under
Org > Settings > Repository Defaults > Repository default branch
. Note that this will only affect newly created repositories, not existing repositories.