What is ghclass?

ghclass is an R package that is designed to enable instructors to efficiently manage their courses on GitHub. It has a wide range of functionality for managing organizations, teams, repositories, and users on GitHub and helps automate most of the tedious and repetitive tasks around creating and distributing assignments.

Who is this package for?

This package is for everyone! But really, if you’re an instructor who uses GitHub for your class management, e.g. students submit assignments via GitHub repos, this package is definitely for you! The package also assumes that you’re familiar with R, but teaching with R is not a requirement since this package is entirely agnostic to the contents of your repositories.

(If you’re a Python user, see this post for a Python based alternative.)

What is this vignette about?

This vignette is about the nitty-gritty of setting your class up in GitHub with ghclass. For a higher level discussion of why and how to use Git/GitHub in data science education, see this paper by the package authors.

Structuring your class on GitHub

The general framework is outlined below. This is not the only way to structure a class on GitHub, but it’s a good way, and one that ghclass is optimized to work with.

We outline steps for achieving this structure in the next section. This section is meant to give a high level view of what your course looks like on GitHub.

  • One organization per class: If you teach at a university, this means one semester of a given course. If you teach workshops, this would be one workshop. The instructor and any additional instructional staff, e.g. teaching assistants, are owners. Going forward we will refer to this group of people as “instructors”. The students are members.

  • One repo per student (or team) per assignment: The instructors have admin access to repos, i.e. they can clone, read, and write to every repository. Additionally, they can adjust repo and team memberships by adding or removing collaborators on assignment repositories, and they can delete repositories. The students have write access to their assigned repo, which means that they can clone, read, and write to their assigned repositories but they cannot delete them and they cannot add or remove collaborators. This helps minimize accidents that cannot be undone and ensures students cannot peek into each other’s repositories unless you explicitly allow them to do so.

If you have a teamwork component to your course, you can also set up teams on GitHub with your organization and each team can be given similar repository level access privileges for team assignments.

Suppose you have 40 students in your class, and they are in 10 teams of 4 students each. Suppose also that students turn in the following throughout a semester:

  • Individual: 10 homework assignments + 2 exams
  • Teamwork: 8 lab assignments + 1 project

Then, throughout the semester you will need to create a total of 570 repositories: 40 × 12 = 480 individual repos plus 10 × 9 = 90 team repos.
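The count can be reproduced with a few lines of base R. This is only a sketch of the arithmetic for the example class above; the variable names are ours and are not part of ghclass.

```r
# Class layout taken from the example in the text
n_students <- 40
n_teams    <- 10

individual_assignments <- 10 + 2  # homework assignments + exams
team_assignments       <- 8 + 1   # lab assignments + project

# One repo per student (or team) per assignment
individual_repos <- n_students * individual_assignments
team_repos       <- n_teams * team_assignments
total_repos      <- individual_repos + team_repos

total_repos
```

Changing the class size or the number of assignments shows how quickly the total grows.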

That is a lot of repos to create and permissions to set manually! It’s also a lot of repos to clone when it comes time to grade. ghclass addresses this problem, and more! It does not, however, address the problem that that’s a lot of grading. Sorry, you’re on your own there!

That being said, ghclass does facilitate setting up continuous integration tools using GitHub Actions for students’ assignment repos. This allows for some automatic checking and feedback each time students push to a repo; more on this in a future vignette.

Authentication

This package uses GitHub personal access tokens for authentication with GitHub. A token can be supplied via the environment variables GITHUB_PAT or GITHUB_TOKEN, or saved as text in ~/.github/token.

If this is your first time setting up a personal access token (PAT), generate a token in the browser after logging into GitHub (Settings > Developer Settings > Personal access tokens) or use usethis::browse_github_token() (superseded by usethis::create_github_token() in recent versions of usethis).
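To make the three token locations concrete, here is a minimal base R sketch of such a lookup. The helper find_github_token() is hypothetical and not part of ghclass, and the order in which it checks the locations is our assumption (it simply follows the order in which they are listed above).

```r
# Hypothetical helper, not part of ghclass: look for a token in the
# places named above, in the order they are listed (assumed order).
find_github_token <- function(token_file = "~/.github/token") {
  for (var in c("GITHUB_PAT", "GITHUB_TOKEN")) {
    value <- Sys.getenv(var, unset = "")
    if (nzchar(value)) return(value)
  }
  if (file.exists(token_file)) {
    return(trimws(readLines(token_file, n = 1, warn = FALSE)))
  }
  stop("No GitHub token found in GITHUB_PAT, GITHUB_TOKEN, or ", token_file)
}

# Set the variable for the current R session only. The value below is a
# placeholder, not a real token, and it replaces any GITHUB_PAT already set
# in this session.
Sys.setenv(GITHUB_PAT = "placeholder-token")
find_github_token()
```

To keep a token across sessions, a line of the form GITHUB_PAT=... can be placed in ~/.Renviron, which R reads at startup.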

You can test that your token is working correctly using the github_test_token() function. If everything is working correctly you should see something like the following:

github_test_token()


If your token is not working you will see an error message like this instead:

github_test_token("bad token")


Step-by-step guide

Start with creating an organization on GitHub for the course. We recommend using the course number, semester/quarter, and year in the organization name, e.g. for a course numbered Sta 199 in Spring 18, you can use something like Sta199-Sp18. The exact format is not critical, but being consistent is helpful so that you can keep track of all of your different courses.

Previously, it was necessary to apply for GitHub’s Education Discount in order to obtain the ability to create private organization repositories for free. Recently, GitHub announced that they would be providing free unlimited private repositories for all users, making this step no longer necessary.

GitHub still provides educational benefits which are available here via a simple verification process. The list of the available benefits for teachers is provided in the teacher toolbox. Of particular note are GitHub swag for your students, free GitHub Teams plans for academic organizations, and a free GitHub Pro plan for educators.

All of this is an optional step, but one that many will want to do. Approval is usually very quick, but it is not something you would want to do the night before classes begin. Give yourself at least a week to be safe.

Permissions

By default, repositories in a new GitHub organization are readable by all members, regardless of whether they are private or public. This is clearly undesirable for most classroom settings.

Individual-level permissions can be set via the “People” tab on the organization page. We recommend that the course instructor be the owner of the organization and that teaching assistants receive admin privileges. Students should receive member privileges.

GitHub allows further permissions for accessing and changing repositories to be set for each individual member or at the organization level (under Settings > Member Privileges). We suggest the organization-level settings below.

Member repository permissions

  • Base permissions: None
  • Repository creation (both Public and Private): Disabled
  • Repository forking: Disabled

Admin repository permissions

  • Repository visibility change: Disabled
  • Repository deletion and transfer: Disabled
  • Issue deletion: Disabled

Member team permissions

  • Allow members to create teams: Disabled

We can get a quick snapshot of our organization using the org_sitrep function which reports on this permission as well as other important details.

org_sitrep("ghclass-vignette")


We can see that this function indicates that the current default repository permission setting is “read” and provides a helpful warning that this enables members to view all repositories. This permission can easily be changed within the Organization Settings page under Member privileges. Alternatively, we can use ghclass to change this directly with org_set_repo_permission:

org_set_repo_permission("ghclass-vignette", permission = "none")


After changing this setting we can once again check the org’s sitrep and see that the warning is now resolved.

org_sitrep("ghclass-vignette")


Adding students to the organization

Next, collect your students’ GitHub usernames. You can do this using your web form tool of choice (e.g. Google Forms, MS Forms, etc.) or via a quiz or survey on your school’s learning management system (LMS). We will assume that you are then able to read these data into an R data frame.

For example, your roster file might look something like the following:

roster = readr::read_csv( system.file("roster.csv", package = "ghclass") )
roster

Here we are using the student’s school email address as a unique identifier. We also have their GitHub username, and we have assigned them to different teams for our three homework assignments.
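Before sending invitations it can be worth checking the roster for gaps. The following is a minimal sketch using a made-up roster: the column names email, github, and hw1 follow the description above, but every value is a placeholder and the checks themselves are our suggestion rather than something ghclass requires.

```r
# Made-up roster for illustration; all values are placeholders.
roster_example <- data.frame(
  email  = c("anya@example.edu", "bruno@example.edu",
             "celine@example.edu", "diego@example.edu"),
  github = c("anya-gh", "bruno-gh", "celine-gh", NA),
  hw1    = c("hw1-team01", "hw1-team01", "hw1-team02", "hw1-team02"),
  stringsAsFactors = FALSE
)

# Students who have not supplied a GitHub username yet
missing_username <- roster_example$email[is.na(roster_example$github)]

# Usernames that appear more than once (usually a copy-paste mistake)
duplicated_username <- roster_example$github[
  !is.na(roster_example$github) & duplicated(roster_example$github)
]

missing_username
duplicated_username
```

Rows with a missing username would need to be resolved, or dropped, before passing the github column to the invitation step.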

Using the roster data frame, we can then invite the students to the class’ organization. Each student will receive an email from GitHub asking them to join the ghclass-vignette organization.

org_invite(org = "ghclass-vignette", user = roster$github)


We now need to wait for the students to accept these invitations before they will have access to the organization. We can check the status of these acceptances using the org_members() and org_pending() functions to see which students have accepted or not accepted the invitation.

org_members("ghclass-vignette")


org_members("ghclass-vignette", include_admins = FALSE)


org_pending("ghclass-vignette")


After some time, some of the students will have accepted the invitation.

org_members("ghclass-vignette", include_admins = FALSE)


org_pending("ghclass-vignette")


We can now see that Anya, Celine, and Francis have accepted the invite and we are still waiting on Bruno, Diego, and Elijah. Gentle prodding and reminder emails are often necessary to get all of the students into the organization.
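With a larger class it is easier to compute the list of stragglers than to read it off the output. A minimal sketch, assuming the accepted members are available as a character vector of usernames; the usernames below are placeholders standing in for roster$github and for the output of org_members().

```r
# Placeholder usernames; in practice these vectors would come from the
# roster and from the list of current organization members.
invited  <- c("anya-gh", "bruno-gh", "celine-gh",
              "diego-gh", "elijah-gh", "francis-gh")
accepted <- c("anya-gh", "celine-gh", "francis-gh")

# GitHub usernames are case-insensitive, so compare in lower case
still_waiting <- setdiff(tolower(invited), tolower(accepted))
still_waiting
```

The resulting vector can be matched back against the roster to look up the email addresses for a reminder.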

Preparing an assignment

As described above, ghclass uses a workflow where each team or individual is given access to a single repo for each assignment. To create these repositories we use a single template repository which contains all of the files necessary for the assignment. Generally, this will consist of things like a README.md with instructions, a scaffolded Rmd or R file where the students will enter their answers, and any other necessary support files (e.g. data, images, support scripts, etc.). For an example of such a repository you can take a look at hw1 from a Statistical Computing course offered at Duke in the Spring of 2019. Note that this repository is public and viewable by anyone, but this is not necessary for your template repository.

Once you have created the repository and are ready to distribute it to students, there is one more suggested step: setting the repo’s template status to TRUE. This is a GitHub-specific detail, but doing so makes it much more efficient to create copies of the repo for your students. We can set this option under the repo’s Settings on GitHub by checking the box labelled “Template Repository” on the main settings page, or by using the repo_set_template function. This status can also be checked with ghclass using repo_is_template.

repo_is_template("Sta323-Sp19/hw1")


repo_set_template("Sta323-Sp19/hw1")


repo_is_template("Sta323-Sp19/hw1")


Distributing a team assignment

Once you have created your template repository, it is then a straightforward process to create the team or individual repositories for your students. The recommended process is to use the org_create_assignment function, which is a high level function that takes care of each of the underlying steps for you. To start we will create the hw1 team-based assignment given the teams in roster.

org_create_assignment(
  org = "ghclass-vignette",
  user = roster$github,
  repo = roster$hw1,
  team = roster$hw1,
  source_repo = "Sta323-Sp19/hw1",
  private = TRUE
)


Based on the output we can see that multiple steps are involved in this process:

  1. The repositories are created by mirroring the contents of “Sta323-Sp19/hw1” into the new repositories. The names of these repositories are given by the repo argument, and in this case match the team names.

  2. Each of the teams is created within the organization.

  3. The students are added to their assignment teams.

  4. Teams are added to the repositories with “push” permission, allowing them to write and make changes to the repo.
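The user, repo, and team arguments in the call above are parallel vectors with one entry per student. The following base R sketch shows how such vectors imply a set of teams and their members. It uses a made-up roster with placeholder values and illustrates the data layout only, not how ghclass performs these steps internally.

```r
# Made-up roster; one row per student, with the team name in hw1
roster_example <- data.frame(
  github = c("anya-gh", "bruno-gh", "celine-gh", "diego-gh"),
  hw1    = c("hw1-team01", "hw1-team01", "hw1-team02", "hw1-team02"),
  stringsAsFactors = FALSE
)

# The distinct team names, which double as repo names in this example
teams_to_create <- unique(roster_example$hw1)

# The usernames that belong to each team
members_by_team <- split(roster_example$github, roster_example$hw1)

teams_to_create
members_by_team
```

Inspecting members_by_team before creating an assignment is a quick way to catch a student assigned to the wrong team.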

Distributing an individual assignment

If, instead of hw1 being a team assignment, we want to distribute it as an individual assignment, we can achieve this using the org_create_assignment function by simply omitting the team argument (and providing appropriate repo names).

org_create_assignment(
  org = "ghclass-vignette",
  user = roster$github,
  repo = paste0("hw1-ind-", roster$github),
  source_repo = "Sta323-Sp19/hw1",
  private = TRUE
)


The underlying process here is very similar with the only difference being that we no longer need to create teams and instead add the users directly to the repositories.

Listing Repos

Once the repos are created we can interact with them using ghclass. One of the most common needs is simply to list the repos that exist within our organization, or to select some subset of them.

org_repos("ghclass-vignette")


org_repos("ghclass-vignette", filter="hw1-team")

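The effect of the filter can be imitated on a plain character vector. This sketch assumes the filter is matched against repo names as a regular expression, which is our reading of the example above; the repo names are placeholders.

```r
# Placeholder repo names in "org/repo" form
repos <- c(
  "ghclass-vignette/hw1-team01",
  "ghclass-vignette/hw1-team02",
  "ghclass-vignette/hw1-ind-anya-gh",
  "ghclass-vignette/hw2-team01"
)

# Keep only the names that match the pattern
hw1_team_repos <- repos[grepl("hw1-team", repos)]
hw1_team_repos
```

A consistent naming scheme for assignment repos is what makes this kind of selection reliable.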

Modifying Repos

Mirroring repos is somewhat heavy-handed, since it forces the target repo to be identical to the source repo. In some cases we only want to add or modify a single file in the repository. Most often this occurs after distributing an assignment and discovering that there is an issue with the instructions, the data, etc.

ghclass allows you to automate the process of adding, modifying, or replacing files across repos after they have been created. While this process does overwrite existing files in the repo, everything is done within the context of git, so changes can be rolled back or merged if they conflict.

Let’s assume that we distributed hw1 with the wrong version of the README.md included. If we want to replace this with the correct version across all of the hw1 repositories, we could do the following:

file = system.file("README.md", package = "ghclass")

repo_add_file(
  org_repos("ghclass-vignette","hw1-team"),
  message = "Replace README.md with the correct version",
  file = file,
  overwrite = TRUE
)


The updated version looks like the following:

repo_get_readme("ghclass-vignette/hw1-team01", include_details = FALSE)


We can also use the function repo_modify_file to make changes to existing files:

repo_modify_file(
  repo = org_repos("ghclass-vignette","hw1-team"),
  path = "README.md",
  pattern = "## Homework 01\n\n",
  content = "Due: Tomorrow\n",
  method = "after"
)


repo_get_readme("ghclass-vignette/hw1-team01", include_details = FALSE)

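The idea behind method = "after" can be illustrated on a plain string. This is a base R sketch of inserting content directly after a matched pattern, not the ghclass implementation: the README text is a placeholder, and matching the pattern literally (rather than as a regular expression) is our assumption.

```r
# Placeholder README text; the pattern and content are the ones used above
readme  <- "## Homework 01\n\nAnswer the questions below.\n"
pattern <- "## Homework 01\n\n"
content <- "Due: Tomorrow\n"

# Insert `content` directly after the first match of `pattern`.
# fixed = TRUE treats the pattern as literal text rather than a regex.
updated <- sub(pattern, paste0(pattern, content), readme, fixed = TRUE)
cat(updated)
```

Trying a pattern locally like this before modifying every student repo helps confirm that it matches exactly where intended.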

Collecting Student Work

Eventually the students will be finished with the work and/or the assignment deadline will have passed. ghclass makes it easy to collect all of the student work off of GitHub and make it accessible on your local computer for grading. We make use of the gert package to provide basic git functionality within ghclass.

local_repo_clone(
  repo = org_repos("ghclass-vignette", "hw1-team"),
  local_path = "hw1"
)


fs::dir_tree("hw1/")

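Once the repos are on disk, ordinary file system functions are enough to get an overview of what was submitted. The sketch below builds a throwaway directory tree that imitates the result of cloning two team repos and then checks each one for an expected file. The repo names and the file name hw1.Rmd are placeholders, and nothing here depends on ghclass.

```r
# Build a throwaway tree that imitates two cloned repos under "hw1"
submissions <- file.path(tempdir(), "hw1")
for (repo in c("hw1-team01", "hw1-team02")) {
  dir.create(file.path(submissions, repo),
             recursive = TRUE, showWarnings = FALSE)
}
# Only the first team has "submitted" the expected file
invisible(file.create(file.path(submissions, "hw1-team01", "hw1.Rmd")))

# One row per cloned repo, flagging whether the expected file is present
repo_dirs <- list.dirs(submissions, recursive = FALSE)
status <- data.frame(
  repo    = basename(repo_dirs),
  has_rmd = file.exists(file.path(repo_dirs, "hw1.Rmd")),
  stringsAsFactors = FALSE
)
status
```

For real submissions, submissions would point at the local_path used when cloning.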

FAQ

  1. Do I really need private repositories for my students’ assignments?

    You might not care, but the law might. For example, in the United States, FERPA regulations stipulate that student information should be kept private. If you use public repositories, anyone can find out who is enrolled in your course. Additionally, you will likely be using GitHub issues for providing feedback on the students’ work, and potentially even mention their grade in a given assignment. This information should not be publicly available to anyone.

    Also, your students may not want their coursework to be publicly available. They are bound to make mistakes as they learn and it should be up to them whether they want those to be a piece of their public profile on GitHub.

  2. Why not use GitHub Classroom?

    Actually, you don’t have to choose between ghclass and GitHub Classroom. Your workflow can use either or both, since they are just different interfaces to the same underlying API. Generally, it is mostly a matter of preference, but there are a few features in ghclass that are not present in GitHub Classroom:

    • Instructor defined teams – GitHub Classroom asks students to choose their teammates when creating their repository.
    • Editing existing repositories – being able to push changes to student repositories after an assignment is released can be quite valuable.
    • Command-line interface – if you like writing R code to solve your problems this may be a better fit for you as it provides a greater level of control and more flexibility.