Example - Assignment with Key

library(parsermd)
library(stringr)

Introduction

A common workflow in educational settings involves creating homework assignments that contain both student scaffolding and instructor solutions within the same document. This vignette demonstrates how to use parsermd to process such documents and automatically generate separate versions for students and instructors.

The typical workflow involves four documents:

Primary Document: A single Qmd/Rmd file containing both student prompts and complete solutions
Student Version: Contains only student chunks with scaffolding and instructions
Instructor Key: Contains only solution chunks with complete answers
Minimalist Key: A streamlined version with solutions only (no instructional text)

Sample Assignment Structure

Let’s start by examining a sample homework assignment that follows this pattern. The assignment includes multiple exercises, each with two code chunks:

Student chunk: labeled with -student suffix, contains scaffolding code
Solution chunk: labeled with -key suffix, contains complete solutions
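The naming convention above lends itself to a quick sanity check. The sketch below uses only base R and a hand-typed vector of the labels from this assignment. The helper name unpaired_exercises() is hypothetical and is not part of parsermd.

```r
# Labels as they appear in the sample assignment (typed by hand here)
labels = c(
  "setup",
  "ex1-student", "ex1-key",
  "ex2-student", "ex2-key",
  "ex3-student", "ex3-key",
  "ex4-student", "ex4-key",
  "bonus-student", "bonus-key"
)

# Hypothetical helper: return exercise stems that lack a student or a key chunk
unpaired_exercises = function(labels) {
  student = sub("-student$", "", grep("-student$", labels, value = TRUE))
  key     = sub("-key$", "", grep("-key$", labels, value = TRUE))
  # Stems present on only one side of the pairing
  sort(c(setdiff(student, key), setdiff(key, student)))
}

# Every exercise in the sample assignment is paired
unpaired_exercises(labels)

# Adding a student chunk without a key reports its stem
unpaired_exercises(c(labels, "ex5-student"))
```

Chunks without either suffix, such as setup, are ignored by this check.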

# Load the sample assignment
assignment_path = system.file("examples/hw03-full.qmd", package = "parsermd")
cat(readLines(assignment_path), sep = "\n")

#> ---
#> title: "Homework 3 - Data Analysis with R"
#> author: "Your Name"
#> date: "Due: Friday, March 15, 2024"
#> format: html
#> execute:
#>   warning: false
#>   message: false
#> ---
#>
#> ## Setup
#>
#> Load the required packages for this assignment:
#>
#> ```{r setup}
#> library(tidyverse)
#> library(palmerpenguins)
#> ```
#>
#> ## Exercise 1: Basic Data Exploration
#>
#> Examine the `penguins` dataset from the `palmerpenguins` package. Your task is to create a summary of the dataset that shows the number of observations and variables, and identify any missing values.
#>
#> ```{r ex1-student}
#> # Write your code here to:
#> # 1. Display the dimensions of the penguins dataset
#> # 2. Show the structure of the dataset
#> # 3. Count missing values in each column
#>
#> ```
#>
#> ```{r ex1-key}
#> # Solution: Basic data exploration
#> # 1. Display dimensions
#> cat("Dataset dimensions:", dim(penguins), "\n")
#> cat("Rows:", nrow(penguins), "Columns:", ncol(penguins), "\n\n")
#>
#> # 2. Show structure
#> str(penguins)
#>
#> # 3. Count missing values
#> cat("\nMissing values by column:\n")
#> penguins %>%
#>   summarise(across(everything(), ~ sum(is.na(.))))
#> ```
#>
#> ## Exercise 2: Data Visualization
#>
#> Create a scatter plot showing the relationship between flipper length and body mass for penguins. Color the points by species and add appropriate labels and a title.
#>
#> ```{r ex2-student}
#> # Create a scatter plot with:
#> # - x-axis: flipper_length_mm
#> # - y-axis: body_mass_g
#> # - color by species
#> # - add appropriate labels and title
#>
#> ggplot(data = penguins, aes(x = ___, y = ___)) +
#>   geom_point(aes(color = ___)) +
#>   labs(
#>     title = "___",
#>     x = "___",
#>     y = "___"
#>   )
#> ```
#>
#> ```{r ex2-key}
#> # Solution: Scatter plot of flipper length vs body mass
#> ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
#>   geom_point(aes(color = species), alpha = 0.8, size = 2) +
#>   labs(
#>     title = "Penguin Flipper Length vs Body Mass by Species",
#>     x = "Flipper Length (mm)",
#>     y = "Body Mass (g)",
#>     color = "Species"
#>   ) +
#>   theme_minimal() +
#>   scale_color_viridis_d()
#> ```
#>
#> ## Exercise 3: Statistical Analysis
#>
#> Calculate summary statistics for bill length by species. Create a table showing the mean, median, standard deviation, and count for each species.
#>
#> ```{r ex3-student}
#> # Calculate summary statistics for bill_length_mm by species
#> # Include: mean, median, standard deviation, and count
#> # Remove missing values before calculating
#>
#> penguins %>%
#>   # Add your code here
#>
#> ```
#>
#> ```{r ex3-key}
#> # Solution: Summary statistics for bill length by species
#> penguins %>%
#>   filter(!is.na(bill_length_mm)) %>%
#>   group_by(species) %>%
#>   summarise(
#>     count = n(),
#>     mean_bill_length = round(mean(bill_length_mm), 2),
#>     median_bill_length = round(median(bill_length_mm), 2),
#>     sd_bill_length = round(sd(bill_length_mm), 2),
#>     .groups = "drop"
#>   ) %>%
#>   arrange(desc(mean_bill_length))
#> ```
#>
#> ## Exercise 4: Advanced Data Manipulation
#>
#> Filter the dataset to include only penguins with complete data (no missing values), then create a new variable called `bill_ratio` that represents the ratio of bill length to bill depth. Finally, identify which species has the highest average bill ratio.
#>
#> ```{r ex4-student}
#> # Step 1: Filter for complete cases
#> # Step 2: Create bill_ratio variable (bill_length_mm / bill_depth_mm)
#> # Step 3: Calculate average bill_ratio by species
#> # Step 4: Identify species with highest average ratio
#>
#> ```
#>
#> ```{r ex4-key}
#> # Solution: Advanced data manipulation
#> complete_penguins = penguins %>%
#>   # Remove rows with any missing values
#>   filter(complete.cases(.)) %>%
#>   # Create bill_ratio variable
#>   mutate(bill_ratio = bill_length_mm / bill_depth_mm)
#>
#> # Calculate average bill ratio by species
#> bill_ratio_summary = complete_penguins %>%
#>   group_by(species) %>%
#>   summarise(
#>     avg_bill_ratio = round(mean(bill_ratio), 3),
#>     n = n(),
#>     .groups = "drop"
#>   ) %>%
#>   arrange(desc(avg_bill_ratio))
#>
#> print(bill_ratio_summary)
#>
#> # Identify species with highest average bill ratio
#> highest_ratio_species = bill_ratio_summary %>%
#>   slice_max(avg_bill_ratio, n = 1) %>%
#>   pull(species)
#>
#> cat("\nSpecies with highest average bill ratio:", as.character(highest_ratio_species))
#> ```
#>
#> ## Bonus Exercise: Conditional Logic
#>
#> Write a function that categorizes penguins as "small", "medium", or "large" based on their body mass. Use the following criteria:
#> - Small: body mass < 3500g
#> - Medium: body mass between 3500g and 4500g
#> - Large: body mass > 4500g
#>
#> Apply this function to create a new column and create a summary table.
#>
#> ```{r bonus-student}
#> # Create a function to categorize penguins by size
#> categorize_size = function(mass) {
#>   # Add your conditional logic here
#>
#> }
#>
#> # Apply the function and create summary
#> ```
#>
#> ```{r bonus-key}
#> # Solution: Conditional logic for size categorization
#> categorize_size = function(mass) {
#>   case_when(
#>     is.na(mass) ~ "Unknown",
#>     mass < 3500 ~ "Small",
#>     mass >= 3500 & mass <= 4500 ~ "Medium",
#>     mass > 4500 ~ "Large"
#>   )
#> }
#>
#> # Apply the function and create summary
#> penguins_with_size = penguins %>%
#>   mutate(size_category = categorize_size(body_mass_g))
#>
#> # Create summary table
#> size_summary = penguins_with_size %>%
#>   count(species, size_category) %>%
#>   pivot_wider(names_from = size_category, values_from = n, values_fill = 0)
#>
#> print(size_summary)
#>
#> # Overall size distribution
#> penguins_with_size %>%
#>   count(size_category) %>%
#>   mutate(percentage = round(n / sum(n) * 100, 1))
#> ```

Parsing the Document

First, let’s parse the assignment document to understand its structure:

# Parse the assignment
rmd = parse_rmd(assignment_path)

# Display the document structure
print(rmd)
#> ├── YAML [5 fields]
#> ├── Heading [h2] - Setup
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 2 lines] - setup
#> ├── Heading [h2] - Exercise 1: Basic Data Exploration
#> │   ├── Markdown [1 line]
#> │   ├── Chunk [r, 5 lines] - ex1-student
#> │   └── Chunk [r, 12 lines] - ex1-key
#> ├── Heading [h2] - Exercise 2: Data Visualization
#> │   ├── Markdown [1 line]
#> │   ├── Chunk [r, 13 lines] - ex2-student
#> │   └── Chunk [r, 11 lines] - ex2-key
#> ├── Heading [h2] - Exercise 3: Statistical Analysis
#> │   ├── Markdown [1 line]
#> │   ├── Chunk [r, 7 lines] - ex3-student
#> │   └── Chunk [r, 12 lines] - ex3-key
#> ├── Heading [h2] - Exercise 4: Advanced Data Manipulation
#> │   ├── Markdown [1 line]
#> │   ├── Chunk [r, 5 lines] - ex4-student
#> │   └── Chunk [r, 25 lines] - ex4-key
#> └── Heading [h2] - Bonus Exercise: Conditional Logic
#>     ├── Markdown [6 lines]
#>     ├── Chunk [r, 7 lines] - bonus-student
#>     └── Chunk [r, 25 lines] - bonus-key

We can also examine the document as a tibble to better understand the chunk labels and structure:

# Convert to tibble for easier inspection
as_tibble(rmd)
#> # A tibble: 24 × 4
#>    sec_h2                             type         label       ast           
#>    <chr>                              <chr>        <chr>       <list>        
#>  1 <NA>                               rmd_yaml     <NA>        <yaml>        
#>  2 Setup                              rmd_heading  <NA>        <heading [h2]>
#>  3 Setup                              rmd_markdown <NA>        <markdown>    
#>  4 Setup                              rmd_chunk    setup       <chunk [r]>   
#>  5 Exercise 1: Basic Data Exploration rmd_heading  <NA>        <heading [h2]>
#>  6 Exercise 1: Basic Data Exploration rmd_markdown <NA>        <markdown>    
#>  7 Exercise 1: Basic Data Exploration rmd_chunk    ex1-student <chunk [r]>   
#>  8 Exercise 1: Basic Data Exploration rmd_chunk    ex1-key     <chunk [r]>   
#>  9 Exercise 2: Data Visualization     rmd_heading  <NA>        <heading [h2]>
#> 10 Exercise 2: Data Visualization     rmd_markdown <NA>        <markdown>    
#> # ℹ 14 more rows
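Because the tibble has one row per node, ordinary subsetting is enough to pull out the chunk labels. This is a minimal sketch, assuming the installed parsermd ships the same example file and that the tibble carries the type and label columns shown above.

```r
library(parsermd)

# Parse the example assignment that ships with parsermd
assignment_path = system.file("examples/hw03-full.qmd", package = "parsermd")
rmd = parse_rmd(assignment_path)

# One row per node; keep only the code chunks
nodes = as_tibble(rmd)
chunk_labels = nodes$label[nodes$type == "rmd_chunk"]
chunk_labels

# Split the labels by role using their suffix
student_labels = grep("-student$", chunk_labels, value = TRUE)
key_labels     = grep("-key$", chunk_labels, value = TRUE)
```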

Creating the Student Version

To create the student version, we need to:

Keep all markdown content (instructions, problem statements)
Keep only the student chunks (those with -student suffix)
Remove all solution chunks

# Select student chunks and all non-chunk content
student_version = rmd |>
  rmd_select(
    # Easier to specify the nodes we want to remove
    !has_label("*-key")
  )

# Display the student version structure
student_version
#> ├── YAML [5 fields]
#> ├── Heading [h2] - Setup
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 2 lines] - setup
#> ├── Heading [h2] - Exercise 1: Basic Data Exploration
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 5 lines] - ex1-student
#> ├── Heading [h2] - Exercise 2: Data Visualization
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 13 lines] - ex2-student
#> ├── Heading [h2] - Exercise 3: Statistical Analysis
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 7 lines] - ex3-student
#> ├── Heading [h2] - Exercise 4: Advanced Data Manipulation
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 5 lines] - ex4-student
#> └── Heading [h2] - Bonus Exercise: Conditional Logic
#>     ├── Markdown [6 lines]
#>     └── Chunk [r, 7 lines] - bonus-student
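The "*-key" pattern passed to has_label() is a glob-style wildcard. As a rough illustration of what such a pattern matches, base R's glob2rx() converts a glob into a regular expression. This only approximates the matching on a hand-typed subset of the labels and is not parsermd's own implementation.

```r
labels = c("setup", "ex1-student", "ex1-key", "bonus-student", "bonus-key")

# glob2rx() turns the wildcard pattern into an anchored regular expression
key_regex = utils::glob2rx("*-key")
key_regex

# Labels that the pattern matches (the ones dropped from the student version)
is_key = grepl(key_regex, labels)
labels[is_key]

# Negating the match keeps everything else, including the setup chunk
labels[!is_key]
```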

If we don’t want to let students in on the fact that these chunks were written specifically for them, we can use rmd_modify() to remove the -student suffix:

student_version = student_version |>
  rmd_modify(
    function(node) {
      rmd_node_label(node) = stringr::str_remove(rmd_node_label(node), "-student")
      node
    },
    has_label("*-student")
  )

# Display the structure again to see the label changes
student_version
#> ├── YAML [5 fields]
#> ├── Heading [h2] - Setup
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 2 lines] - setup
#> ├── Heading [h2] - Exercise 1: Basic Data Exploration
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 5 lines] - ex1
#> ├── Heading [h2] - Exercise 2: Data Visualization
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 13 lines] - ex2
#> ├── Heading [h2] - Exercise 3: Statistical Analysis
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 7 lines] - ex3
#> ├── Heading [h2] - Exercise 4: Advanced Data Manipulation
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 5 lines] - ex4
#> └── Heading [h2] - Bonus Exercise: Conditional Logic
#>     ├── Markdown [6 lines]
#>     └── Chunk [r, 7 lines] - bonus
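When stripping a suffix with str_remove(), an unanchored pattern such as "-student" matches the first occurrence anywhere in a label, while "-student$" matches only at the end. For the labels in this assignment the two behave identically, but the anchored form is the more defensive choice. The labels below are made up for illustration.

```r
library(stringr)

# Hypothetical labels; the last one contains "-student" twice
labels = c("ex1-student", "bonus-student", "non-student-survey-student")

# Unanchored: removes the first "-student" found in each label
unanchored = str_remove(labels, "-student")
unanchored

# Anchored: removes "-student" only when it ends the label
anchored = str_remove(labels, "-student$")
anchored
```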

Let’s see what the student version looks like as a document:

# Convert to document and display it
as_document(student_version) |>
  cat(sep = "\n")

#> ---
#> title: Homework 3 - Data Analysis with R
#> author: Your Name
#> date: 'Due: Friday, March 15, 2024'
#> format: html
#> execute:
#>   warning: false
#>   message: false
#> ---
#>
#> ## Setup
#>
#> Load the required packages for this assignment:
#>
#>
#> ```{r setup}
#> library(tidyverse)
#> library(palmerpenguins)
#> ```
#>
#> ## Exercise 1: Basic Data Exploration
#>
#> Examine the `penguins` dataset from the `palmerpenguins` package. Your task is to create a summary of the dataset that shows the number of observations and variables, and identify any missing values.
#>
#>
#> ```{r ex1}
#> # Write your code here to:
#> # 1. Display the dimensions of the penguins dataset
#> # 2. Show the structure of the dataset
#> # 3. Count missing values in each column
#>
#> ```
#>
#> ## Exercise 2: Data Visualization
#>
#> Create a scatter plot showing the relationship between flipper length and body mass for penguins. Color the points by species and add appropriate labels and a title.
#>
#>
#> ```{r ex2}
#> # Create a scatter plot with:
#> # - x-axis: flipper_length_mm
#> # - y-axis: body_mass_g
#> # - color by species
#> # - add appropriate labels and title
#>
#> ggplot(data = penguins, aes(x = ___, y = ___)) +
#>   geom_point(aes(color = ___)) +
#>   labs(
#>     title = "___",
#>     x = "___",
#>     y = "___"
#>   )
#> ```
#>
#> ## Exercise 3: Statistical Analysis
#>
#> Calculate summary statistics for bill length by species. Create a table showing the mean, median, standard deviation, and count for each species.
#>
#>
#> ```{r ex3}
#> # Calculate summary statistics for bill_length_mm by species
#> # Include: mean, median, standard deviation, and count
#> # Remove missing values before calculating
#>
#> penguins %>%
#>   # Add your code here
#>
#> ```
#>
#> ## Exercise 4: Advanced Data Manipulation
#>
#> Filter the dataset to include only penguins with complete data (no missing values), then create a new variable called `bill_ratio` that represents the ratio of bill length to bill depth. Finally, identify which species has the highest average bill ratio.
#>
#>
#> ```{r ex4}
#> # Step 1: Filter for complete cases
#> # Step 2: Create bill_ratio variable (bill_length_mm / bill_depth_mm)
#> # Step 3: Calculate average bill_ratio by species
#> # Step 4: Identify species with highest average ratio
#>
#> ```
#>
#> ## Bonus Exercise: Conditional Logic
#>
#> Write a function that categorizes penguins as "small", "medium", or "large" based on their body mass. Use the following criteria:
#> - Small: body mass < 3500g
#> - Medium: body mass between 3500g and 4500g
#> - Large: body mass > 4500g
#>
#> Apply this function to create a new column and create a summary table.
#>
#>
#> ```{r bonus}
#> # Create a function to categorize penguins by size
#> categorize_size = function(mass) {
#>   # Add your conditional logic here
#>
#> }
#>
#> # Apply the function and create summary
#> ```

We can also save this to a file:

# Save student version (not run in vignette)
as_document(student_version) |>
  writeLines("homework-student.qmd")
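Before distributing the saved file, it can be reassuring to confirm that no solution chunk made it in. The sketch below writes to a temporary file rather than the working directory and searches the written text for chunk headers with a -key label. The output path and the regular expression are illustrative choices, and the example file is assumed to be available in the installed parsermd.

```r
library(parsermd)

assignment_path = system.file("examples/hw03-full.qmd", package = "parsermd")

# Build the student version with the same selection as above
student_version = parse_rmd(assignment_path) |>
  rmd_select(!has_label("*-key"))

# Write to a temporary file so nothing in the working directory is touched
out_path = file.path(tempdir(), "homework-student.qmd")
as_document(student_version) |>
  writeLines(out_path)

# Read the file back and look for chunk headers that carry a -key label
written = readLines(out_path)
leaked = grep("^`{3}\\{r [^}]*-key", written, value = TRUE)

# Zero leaked headers means no solution chunk reached the student file
length(leaked)
```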

Creating the Instructor Key

For the instructor key, we want to:

Keep all markdown content for context
Keep only the solution chunks (those with -key suffix)
Remove all student chunks

# Select solution chunks and all non-chunk content
instructor_key = rmd |>
  rmd_select(
    # Again, it is easier to specify the nodes we want to remove
    !has_label("*-student")
  )

# Display the instructor key structure
instructor_key
#> ├── YAML [5 fields]
#> ├── Heading [h2] - Setup
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 2 lines] - setup
#> ├── Heading [h2] - Exercise 1: Basic Data Exploration
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 12 lines] - ex1-key
#> ├── Heading [h2] - Exercise 2: Data Visualization
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 11 lines] - ex2-key
#> ├── Heading [h2] - Exercise 3: Statistical Analysis
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 12 lines] - ex3-key
#> ├── Heading [h2] - Exercise 4: Advanced Data Manipulation
#> │   ├── Markdown [1 line]
#> │   └── Chunk [r, 25 lines] - ex4-key
#> └── Heading [h2] - Bonus Exercise: Conditional Logic
#>     ├── Markdown [6 lines]
#>     └── Chunk [r, 25 lines] - bonus-key

Let’s examine the instructor key document:

# Convert to document
instructor_doc = as_document(instructor_key)

# Display the first 50 lines of the document
# (as_document() appears to return one element per line, so head() applies directly)
cat(head(instructor_doc, 50), sep = "\n")
#> ---
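A structural check on the key can complement reading the rendered text. This sketch, which again assumes the example file and the type and label tibble columns shown earlier, lists the chunks that remain and confirms that each exercise kept its solution.

```r
library(parsermd)

assignment_path = system.file("examples/hw03-full.qmd", package = "parsermd")
rmd = parse_rmd(assignment_path)

instructor_key = rmd |>
  rmd_select(!has_label("*-student"))

# Labels of the chunks that remain in the key
key_nodes = as_tibble(instructor_key)
remaining = key_nodes$label[key_nodes$type == "rmd_chunk"]
remaining

# No student chunk should remain, and every exercise should keep its solution
any(grepl("-student$", remaining))
sum(grepl("-key$", remaining))
```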

Creating a Minimalist Key

Sometimes instructors may want a very streamlined version that contains only the solution code without all the instructional text.

We can create this by:

Keeping only exercise headings and solution chunks
Removing all markdown instructions
Setting #| include: false for the setup chunk

# Select only headings and solution chunks
minimalist_key = rmd |>
  rmd_select(
    # Keep yaml and exercise headings for structure
    has_type("rmd_yaml"),
    has_heading(c("Exercise *", "Bonus*")),
    # Keep only solution chunks
    has_label(c("*-key", "setup"))
  ) |>
  rmd_modify(
    function(node) {
      rmd_node_options(node) = list(include = FALSE)
      node
    },
    has_label("setup")
  )

# Display the minimalist key structure
minimalist_key
#> ├── YAML [5 fields]
#> ├── Chunk [r, 2 lines] - setup
#> ├── Heading [h2] - Exercise 1: Basic Data Exploration
#> │   └── Chunk [r, 12 lines] - ex1-key
#> ├── Heading [h2] - Exercise 2: Data Visualization
#> │   └── Chunk [r, 11 lines] - ex2-key
#> ├── Heading [h2] - Exercise 3: Statistical Analysis
#> │   └── Chunk [r, 12 lines] - ex3-key
#> ├── Heading [h2] - Exercise 4: Advanced Data Manipulation
#> │   └── Chunk [r, 25 lines] - ex4-key
#> └── Heading [h2] - Bonus Exercise: Conditional Logic
#>     └── Chunk [r, 25 lines] - bonus-key

# Convert to document
minimalist_doc = as_document(minimalist_key)
cat(minimalist_doc, sep = "\n")

#> ---
#> title: Homework 3 - Data Analysis with R
#> author: Your Name
#> date: 'Due: Friday, March 15, 2024'
#> format: html
#> execute:
#>   warning: false
#>   message: false
#> ---
#>
#> ```{r setup}
#> #| include: false
#> library(tidyverse)
#> library(palmerpenguins)
#> ```
#>
#> ## Exercise 1: Basic Data Exploration
#>
#> ```{r ex1-key}
#> # Solution: Basic data exploration
#> # 1. Display dimensions
#> cat("Dataset dimensions:", dim(penguins), "\n")
#> cat("Rows:", nrow(penguins), "Columns:", ncol(penguins), "\n\n")
#>
#> # 2. Show structure
#> str(penguins)
#>
#> # 3. Count missing values
#> cat("\nMissing values by column:\n")
#> penguins %>%
#>   summarise(across(everything(), ~ sum(is.na(.))))
#> ```
#>
#> ## Exercise 2: Data Visualization
#>
#> ```{r ex2-key}
#> # Solution: Scatter plot of flipper length vs body mass
#> ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
#>   geom_point(aes(color = species), alpha = 0.8, size = 2) +
#>   labs(
#>     title = "Penguin Flipper Length vs Body Mass by Species",
#>     x = "Flipper Length (mm)",
#>     y = "Body Mass (g)",
#>     color = "Species"
#>   ) +
#>   theme_minimal() +
#>   scale_color_viridis_d()
#> ```
#>
#> ## Exercise 3: Statistical Analysis
#>
#> ```{r ex3-key}
#> # Solution: Summary statistics for bill length by species
#> penguins %>%
#>   filter(!is.na(bill_length_mm)) %>%
#>   group_by(species) %>%
#>   summarise(
#>     count = n(),
#>     mean_bill_length = round(mean(bill_length_mm), 2),
#>     median_bill_length = round(median(bill_length_mm), 2),
#>     sd_bill_length = round(sd(bill_length_mm), 2),
#>     .groups = "drop"
#>   ) %>%
#>   arrange(desc(mean_bill_length))
#> ```
#>
#> ## Exercise 4: Advanced Data Manipulation
#>
#> ```{r ex4-key}
#> # Solution: Advanced data manipulation
#> complete_penguins = penguins %>%
#>   # Remove rows with any missing values
#>   filter(complete.cases(.)) %>%
#>   # Create bill_ratio variable
#>   mutate(bill_ratio = bill_length_mm / bill_depth_mm)
#>
#> # Calculate average bill ratio by species
#> bill_ratio_summary = complete_penguins %>%
#>   group_by(species) %>%
#>   summarise(
#>     avg_bill_ratio = round(mean(bill_ratio), 3),
#>     n = n(),
#>     .groups = "drop"
#>   ) %>%
#>   arrange(desc(avg_bill_ratio))
#>
#> print(bill_ratio_summary)
#>
#> # Identify species with highest average bill ratio
#> highest_ratio_species = bill_ratio_summary %>%
#>   slice_max(avg_bill_ratio, n = 1) %>%
#>   pull(species)
#>
#> cat("\nSpecies with highest average bill ratio:", as.character(highest_ratio_species))
#> ```
#>
#> ## Bonus Exercise: Conditional Logic
#>
#> ```{r bonus-key}
#> # Solution: Conditional logic for size categorization
#> categorize_size = function(mass) {
#>   case_when(
#>     is.na(mass) ~ "Unknown",
#>     mass < 3500 ~ "Small",
#>     mass >= 3500 & mass <= 4500 ~ "Medium",
#>     mass > 4500 ~ "Large"
#>   )
#> }
#>
#> # Apply the function and create summary
#> penguins_with_size = penguins %>%
#>   mutate(size_category = categorize_size(body_mass_g))
#>
#> # Create summary table
#> size_summary = penguins_with_size %>%
#>   count(species, size_category) %>%
#>   pivot_wider(names_from = size_category, values_from = n, values_fill = 0)
#>
#> print(size_summary)
#>
#> # Overall size distribution
#> penguins_with_size %>%
#>   count(size_category) %>%
#>   mutate(percentage = round(n / sum(n) * 100, 1))
#> ```
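To confirm that the selection dropped every block of instructional text, the surviving node types can be tallied. This is a sketch under the same assumptions as before (the bundled example file and a type column in the tibble). It repeats only the rmd_select() step, since the rmd_modify() step changes chunk options rather than which nodes are kept.

```r
library(parsermd)

assignment_path = system.file("examples/hw03-full.qmd", package = "parsermd")

minimalist_key = parse_rmd(assignment_path) |>
  rmd_select(
    has_type("rmd_yaml"),
    has_heading(c("Exercise *", "Bonus*")),
    has_label(c("*-key", "setup"))
  )

# Tally the node types that survived the selection
types = as_tibble(minimalist_key)$type
table(types)
```

No rmd_markdown entry should appear in the tally, matching the structure printed above.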

Best Practices

When creating homework assignments for processing with parsermd, consider these best practices:

Clear Structure: Use headings to organize exercises and maintain hierarchy
Meaningful Labels: Use descriptive chunk labels that identify the document components and their type (e.g., ex1-student, ex2-key)
Testing: Always test the generated versions to ensure they work correctly and you haven’t lost anything important (e.g. your YAML front matter or your setup chunk)
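The testing advice can be partly automated. The helper below, check_version(), is hypothetical and not part of parsermd. It inspects the tibble form of a selection and reports whether the YAML front matter and the setup chunk are still present, assuming the type and label columns shown earlier.

```r
library(parsermd)

# Hypothetical helper: report whether key pieces survived a selection
check_version = function(ast) {
  nodes = as_tibble(ast)
  c(
    has_yaml  = any(nodes$type == "rmd_yaml"),
    has_setup = any(nodes$label == "setup", na.rm = TRUE)
  )
}

assignment_path = system.file("examples/hw03-full.qmd", package = "parsermd")
rmd = parse_rmd(assignment_path)

# Removing the key chunks keeps both the YAML and the setup chunk
good = check_version(rmd_select(rmd, !has_label("*-key")))
good

# Selecting only the student chunks drops the setup chunk
narrow = check_version(rmd_select(rmd, has_label("*-student")))
narrow
```

A check like this could be run on each generated version before it is written to disk.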