Introduction
A common workflow in educational settings involves creating homework assignments that contain both student scaffolding and instructor solutions within the same document. This vignette demonstrates how to use parsermd to process such documents and automatically generate separate versions for students and instructors.
The typical workflow revolves around four documents:
- Primary Document: A single Qmd/Rmd file containing both student prompts and complete solutions
- Student Version: Contains only student chunks with scaffolding and instructions
- Instructor Key: Contains only solution chunks with complete answers
- Minimalist Key: A streamlined version with solutions only (no instructional text)
Sample Assignment Structure
Let’s start by examining a sample homework assignment that follows this pattern. The assignment includes multiple exercises, each with two code chunks:
- Student chunk: labeled with a -student suffix, contains scaffolding code
- Solution chunk: labeled with a -key suffix, contains complete solutions
# Load the sample assignment
assignment_path = system.file("examples/hw03-full.qmd", package = "parsermd")
cat(readLines(assignment_path), sep = "\n")
#> ---
#> title: "Homework 3 - Data Analysis with R"
#> author: "Your Name"
#> date: "Due: Friday, March 15, 2024"
#> format: html
#> execute:
#> warning: false
#> message: false
#> ---
#>
#> ## Setup
#>
#> Load the required packages for this assignment:
#>
#> ```{r setup}
#> library(tidyverse)
#> library(palmerpenguins)
#> ```
#>
#> ## Exercise 1: Basic Data Exploration
#>
#> Examine the `penguins` dataset from the `palmerpenguins` package. Your task is to create a summary of the dataset that shows the number of observations and variables, and identify any missing values.
#>
#> ```{r ex1-student}
#> # Write your code here to:
#> # 1. Display the dimensions of the penguins dataset
#> # 2. Show the structure of the dataset
#> # 3. Count missing values in each column
#>
#> ```
#>
#> ```{r ex1-key}
#> # Solution: Basic data exploration
#> # 1. Display dimensions
#> cat("Dataset dimensions:", dim(penguins), "\n")
#> cat("Rows:", nrow(penguins), "Columns:", ncol(penguins), "\n\n")
#>
#> # 2. Show structure
#> str(penguins)
#>
#> # 3. Count missing values
#> cat("\nMissing values by column:\n")
#> penguins %>%
#> summarise(across(everything(), ~ sum(is.na(.))))
#> ```
#>
#> ## Exercise 2: Data Visualization
#>
#> Create a scatter plot showing the relationship between flipper length and body mass for penguins. Color the points by species and add appropriate labels and a title.
#>
#> ```{r ex2-student}
#> # Create a scatter plot with:
#> # - x-axis: flipper_length_mm
#> # - y-axis: body_mass_g
#> # - color by species
#> # - add appropriate labels and title
#>
#> ggplot(data = penguins, aes(x = ___, y = ___)) +
#> geom_point(aes(color = ___)) +
#> labs(
#> title = "___",
#> x = "___",
#> y = "___"
#> )
#> ```
#>
#> ```{r ex2-key}
#> # Solution: Scatter plot of flipper length vs body mass
#> ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
#> geom_point(aes(color = species), alpha = 0.8, size = 2) +
#> labs(
#> title = "Penguin Flipper Length vs Body Mass by Species",
#> x = "Flipper Length (mm)",
#> y = "Body Mass (g)",
#> color = "Species"
#> ) +
#> theme_minimal() +
#> scale_color_viridis_d()
#> ```
#>
#> ## Exercise 3: Statistical Analysis
#>
#> Calculate summary statistics for bill length by species. Create a table showing the mean, median, standard deviation, and count for each species.
#>
#> ```{r ex3-student}
#> # Calculate summary statistics for bill_length_mm by species
#> # Include: mean, median, standard deviation, and count
#> # Remove missing values before calculating
#>
#> penguins %>%
#> # Add your code here
#>
#> ```
#>
#> ```{r ex3-key}
#> # Solution: Summary statistics for bill length by species
#> penguins %>%
#> filter(!is.na(bill_length_mm)) %>%
#> group_by(species) %>%
#> summarise(
#> count = n(),
#> mean_bill_length = round(mean(bill_length_mm), 2),
#> median_bill_length = round(median(bill_length_mm), 2),
#> sd_bill_length = round(sd(bill_length_mm), 2),
#> .groups = "drop"
#> ) %>%
#> arrange(desc(mean_bill_length))
#> ```
#>
#> ## Exercise 4: Advanced Data Manipulation
#>
#> Filter the dataset to include only penguins with complete data (no missing values), then create a new variable called `bill_ratio` that represents the ratio of bill length to bill depth. Finally, identify which species has the highest average bill ratio.
#>
#> ```{r ex4-student}
#> # Step 1: Filter for complete cases
#> # Step 2: Create bill_ratio variable (bill_length_mm / bill_depth_mm)
#> # Step 3: Calculate average bill_ratio by species
#> # Step 4: Identify species with highest average ratio
#>
#> ```
#>
#> ```{r ex4-key}
#> # Solution: Advanced data manipulation
#> complete_penguins = penguins %>%
#> # Remove rows with any missing values
#> filter(complete.cases(.)) %>%
#> # Create bill_ratio variable
#> mutate(bill_ratio = bill_length_mm / bill_depth_mm)
#>
#> # Calculate average bill ratio by species
#> bill_ratio_summary = complete_penguins %>%
#> group_by(species) %>%
#> summarise(
#> avg_bill_ratio = round(mean(bill_ratio), 3),
#> n = n(),
#> .groups = "drop"
#> ) %>%
#> arrange(desc(avg_bill_ratio))
#>
#> print(bill_ratio_summary)
#>
#> # Identify species with highest average bill ratio
#> highest_ratio_species = bill_ratio_summary %>%
#> slice_max(avg_bill_ratio, n = 1) %>%
#> pull(species)
#>
#> cat("\nSpecies with highest average bill ratio:", as.character(highest_ratio_species))
#> ```
#>
#> ## Bonus Exercise: Conditional Logic
#>
#> Write a function that categorizes penguins as "small", "medium", or "large" based on their body mass. Use the following criteria:
#> - Small: body mass < 3500g
#> - Medium: body mass between 3500g and 4500g
#> - Large: body mass > 4500g
#>
#> Apply this function to create a new column and create a summary table.
#>
#> ```{r bonus-student}
#> # Create a function to categorize penguins by size
#> categorize_size = function(mass) {
#> # Add your conditional logic here
#>
#> }
#>
#> # Apply the function and create summary
#> ```
#>
#> ```{r bonus-key}
#> # Solution: Conditional logic for size categorization
#> categorize_size = function(mass) {
#> case_when(
#> is.na(mass) ~ "Unknown",
#> mass < 3500 ~ "Small",
#> mass >= 3500 & mass <= 4500 ~ "Medium",
#> mass > 4500 ~ "Large"
#> )
#> }
#>
#> # Apply the function and create summary
#> penguins_with_size = penguins %>%
#> mutate(size_category = categorize_size(body_mass_g))
#>
#> # Create summary table
#> size_summary = penguins_with_size %>%
#> count(species, size_category) %>%
#> pivot_wider(names_from = size_category, values_from = n, values_fill = 0)
#>
#> print(size_summary)
#>
#> # Overall size distribution
#> penguins_with_size %>%
#> count(size_category) %>%
#> mutate(percentage = round(n / sum(n) * 100, 1))
#> ```
Parsing the Document
First, let’s parse the assignment document to understand its structure:
# Parse the assignment
rmd = parse_rmd(assignment_path)
# Display the document structure
print(rmd)
#> ├── YAML [5 fields]
#> ├── Heading [h2] - Setup
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 2 lines] - setup
#> ├── Heading [h2] - Exercise 1: Basic Data Exploration
#> │ ├── Markdown [1 line]
#> │ ├── Chunk [r, 5 lines] - ex1-student
#> │ └── Chunk [r, 12 lines] - ex1-key
#> ├── Heading [h2] - Exercise 2: Data Visualization
#> │ ├── Markdown [1 line]
#> │ ├── Chunk [r, 13 lines] - ex2-student
#> │ └── Chunk [r, 11 lines] - ex2-key
#> ├── Heading [h2] - Exercise 3: Statistical Analysis
#> │ ├── Markdown [1 line]
#> │ ├── Chunk [r, 7 lines] - ex3-student
#> │ └── Chunk [r, 12 lines] - ex3-key
#> ├── Heading [h2] - Exercise 4: Advanced Data Manipulation
#> │ ├── Markdown [1 line]
#> │ ├── Chunk [r, 5 lines] - ex4-student
#> │ └── Chunk [r, 25 lines] - ex4-key
#> └── Heading [h2] - Bonus Exercise: Conditional Logic
#> ├── Markdown [6 lines]
#> ├── Chunk [r, 7 lines] - bonus-student
#> └── Chunk [r, 25 lines] - bonus-key
We can also examine the document as a tibble to better understand the chunk labels and structure:
# Convert to tibble for easier inspection
as_tibble(rmd)
#> # A tibble: 24 × 4
#> sec_h2 type label ast
#> <chr> <chr> <chr> <list>
#> 1 <NA> rmd_yaml <NA> <yaml>
#> 2 Setup rmd_heading <NA> <heading [h2]>
#> 3 Setup rmd_markdown <NA> <markdown>
#> 4 Setup rmd_chunk setup <chunk [r]>
#> 5 Exercise 1: Basic Data Exploration rmd_heading <NA> <heading [h2]>
#> 6 Exercise 1: Basic Data Exploration rmd_markdown <NA> <markdown>
#> 7 Exercise 1: Basic Data Exploration rmd_chunk ex1-student <chunk [r]>
#> 8 Exercise 1: Basic Data Exploration rmd_chunk ex1-key <chunk [r]>
#> 9 Exercise 2: Data Visualization rmd_heading <NA> <heading [h2]>
#> 10 Exercise 2: Data Visualization rmd_markdown <NA> <markdown>
#> # ℹ 14 more rows
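Because the label column exposes every chunk label, the tibble view also lends itself to a quick consistency check. The following is a minimal sketch, not a parsermd feature, that confirms every exercise has both a -student and a -key chunk. The variable names (student_ex, key_ex, unpaired) are illustrative, and the sketch assumes the tibble package is installed.

```r
library(parsermd)

# Parse the sample assignment that ships with parsermd (same file as above)
assignment_path = system.file("examples/hw03-full.qmd", package = "parsermd")
rmd = parse_rmd(assignment_path)

# Pull the chunk labels out of the tibble representation
nodes = tibble::as_tibble(rmd)
labels = nodes$label[nodes$type == "rmd_chunk"]

# Strip the suffixes to recover the exercise names on each side
student_ex = sub("-student$", "", grep("-student$", labels, value = TRUE))
key_ex = sub("-key$", "", grep("-key$", labels, value = TRUE))

# Exercises that are missing one half of the pair
unpaired = union(setdiff(student_ex, key_ex), setdiff(key_ex, student_ex))
unpaired
#> character(0)
```

An empty result means that every student chunk has a matching solution chunk, so both generated versions will cover the same exercises.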
Creating the Student Version
To create the student version, we need to:
- Keep all markdown content (instructions, problem statements)
- Keep only the student chunks (those with the -student suffix)
- Remove all solution chunks
# Select student chunks and all non-chunk content
student_version = rmd |>
  rmd_select(
    # Easier to specify the nodes we want to remove
    !has_label("*-key")
  )
# Display the student version structure
student_version
#> ├── YAML [5 fields]
#> ├── Heading [h2] - Setup
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 2 lines] - setup
#> ├── Heading [h2] - Exercise 1: Basic Data Exploration
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 5 lines] - ex1-student
#> ├── Heading [h2] - Exercise 2: Data Visualization
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 13 lines] - ex2-student
#> ├── Heading [h2] - Exercise 3: Statistical Analysis
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 7 lines] - ex3-student
#> ├── Heading [h2] - Exercise 4: Advanced Data Manipulation
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 5 lines] - ex4-student
#> └── Heading [h2] - Bonus Exercise: Conditional Logic
#> ├── Markdown [6 lines]
#> └── Chunk [r, 7 lines] - bonus-student
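Since the student version must never contain solutions, it may be worth confirming programmatically that no -key chunk survived the selection. This is a minimal sketch under the assumption that the tibble package is installed; the variable name remaining is illustrative.

```r
library(parsermd)

assignment_path = system.file("examples/hw03-full.qmd", package = "parsermd")
rmd = parse_rmd(assignment_path)

# Same selection as above: drop every node whose label ends in -key
student_version = rmd |>
  rmd_select(!has_label("*-key"))

# Collect the labels that survived the selection (non-chunk nodes have NA labels)
remaining = tibble::as_tibble(student_version)$label
remaining = remaining[!is.na(remaining)]

# No solution chunk should remain
any(grepl("-key$", remaining))
#> [1] FALSE
```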
If we would rather not reveal to students that these chunks were written specifically for them, we can use rmd_modify() to remove the -student suffix:
student_version = student_version |>
  rmd_modify(
    function(node) {
      # Strip the suffix only where it ends the label
      rmd_node_label(node) = stringr::str_remove(rmd_node_label(node), "-student$")
      node
    },
    has_label("*-student")
  )
# Display the structure to see the label changes
student_version
#> ├── YAML [5 fields]
#> ├── Heading [h2] - Setup
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 2 lines] - setup
#> ├── Heading [h2] - Exercise 1: Basic Data Exploration
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 5 lines] - ex1
#> ├── Heading [h2] - Exercise 2: Data Visualization
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 13 lines] - ex2
#> ├── Heading [h2] - Exercise 3: Statistical Analysis
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 7 lines] - ex3
#> ├── Heading [h2] - Exercise 4: Advanced Data Manipulation
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 5 lines] - ex4
#> └── Heading [h2] - Bonus Exercise: Conditional Logic
#> ├── Markdown [6 lines]
#> └── Chunk [r, 7 lines] - bonus
Let’s see what the student version looks like as a document:
# Convert to a document and display it
as_document(student_version) |>
  cat(sep = "\n")
#> ---
#> title: Homework 3 - Data Analysis with R
#> author: Your Name
#> date: 'Due: Friday, March 15, 2024'
#> format: html
#> execute:
#> warning: false
#> message: false
#> ---
#>
#> ## Setup
#>
#> Load the required packages for this assignment:
#>
#>
#> ```{r setup}
#> library(tidyverse)
#> library(palmerpenguins)
#> ```
#>
#> ## Exercise 1: Basic Data Exploration
#>
#> Examine the `penguins` dataset from the `palmerpenguins` package. Your task is to create a summary of the dataset that shows the number of observations and variables, and identify any missing values.
#>
#>
#> ```{r ex1}
#> # Write your code here to:
#> # 1. Display the dimensions of the penguins dataset
#> # 2. Show the structure of the dataset
#> # 3. Count missing values in each column
#>
#> ```
#>
#> ## Exercise 2: Data Visualization
#>
#> Create a scatter plot showing the relationship between flipper length and body mass for penguins. Color the points by species and add appropriate labels and a title.
#>
#>
#> ```{r ex2}
#> # Create a scatter plot with:
#> # - x-axis: flipper_length_mm
#> # - y-axis: body_mass_g
#> # - color by species
#> # - add appropriate labels and title
#>
#> ggplot(data = penguins, aes(x = ___, y = ___)) +
#> geom_point(aes(color = ___)) +
#> labs(
#> title = "___",
#> x = "___",
#> y = "___"
#> )
#> ```
#>
#> ## Exercise 3: Statistical Analysis
#>
#> Calculate summary statistics for bill length by species. Create a table showing the mean, median, standard deviation, and count for each species.
#>
#>
#> ```{r ex3}
#> # Calculate summary statistics for bill_length_mm by species
#> # Include: mean, median, standard deviation, and count
#> # Remove missing values before calculating
#>
#> penguins %>%
#> # Add your code here
#>
#> ```
#>
#> ## Exercise 4: Advanced Data Manipulation
#>
#> Filter the dataset to include only penguins with complete data (no missing values), then create a new variable called `bill_ratio` that represents the ratio of bill length to bill depth. Finally, identify which species has the highest average bill ratio.
#>
#>
#> ```{r ex4}
#> # Step 1: Filter for complete cases
#> # Step 2: Create bill_ratio variable (bill_length_mm / bill_depth_mm)
#> # Step 3: Calculate average bill_ratio by species
#> # Step 4: Identify species with highest average ratio
#>
#> ```
#>
#> ## Bonus Exercise: Conditional Logic
#>
#> Write a function that categorizes penguins as "small", "medium", or "large" based on their body mass. Use the following criteria:
#> - Small: body mass < 3500g
#> - Medium: body mass between 3500g and 4500g
#> - Large: body mass > 4500g
#>
#> Apply this function to create a new column and create a summary table.
#>
#>
#> ```{r bonus}
#> # Create a function to categorize penguins by size
#> categorize_size = function(mass) {
#> # Add your conditional logic here
#>
#> }
#>
#> # Apply the function and create summary
#> ```
We can also save this to a file:
# Save student version (not run in vignette)
as_document(student_version) |>
  writeLines("homework-student.qmd")
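One way to gain confidence in the saved file is to parse it again and compare its chunk labels with the in-memory version. The sketch below writes to a temporary file instead of the working directory; out_path, labels_before, and labels_after are illustrative names, and the tibble package is assumed to be installed.

```r
library(parsermd)

assignment_path = system.file("examples/hw03-full.qmd", package = "parsermd")
student_version = parse_rmd(assignment_path) |>
  rmd_select(!has_label("*-key"))

# Write to a temporary file rather than the working directory
out_path = tempfile(fileext = ".qmd")
as_document(student_version) |>
  writeLines(out_path)

# Re-parse the written file and compare its labels with the in-memory version
reparsed = parse_rmd(out_path)
labels_before = tibble::as_tibble(student_version)$label
labels_after = tibble::as_tibble(reparsed)$label
identical(labels_before, labels_after)
```

If the comparison does not return TRUE, the written file no longer has the same sequence of labeled chunks as the object it was generated from, and it should be inspected before distribution.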
Creating the Instructor Key
For the instructor key, we want to:
- Keep all markdown content for context
- Keep only the solution chunks (those with the -key suffix)
- Remove all student chunks
# Select solution chunks and all non-chunk content
instructor_key = rmd |>
  rmd_select(
    # Again, it is easier to specify the nodes we want to remove
    !has_label("*-student")
  )
# Display the instructor key structure
instructor_key
#> ├── YAML [5 fields]
#> ├── Heading [h2] - Setup
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 2 lines] - setup
#> ├── Heading [h2] - Exercise 1: Basic Data Exploration
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 12 lines] - ex1-key
#> ├── Heading [h2] - Exercise 2: Data Visualization
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 11 lines] - ex2-key
#> ├── Heading [h2] - Exercise 3: Statistical Analysis
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 12 lines] - ex3-key
#> ├── Heading [h2] - Exercise 4: Advanced Data Manipulation
#> │ ├── Markdown [1 line]
#> │ └── Chunk [r, 25 lines] - ex4-key
#> └── Heading [h2] - Bonus Exercise: Conditional Logic
#> ├── Markdown [6 lines]
#> └── Chunk [r, 25 lines] - bonus-key
Let’s examine the instructor key document:
# Convert to document
instructor_doc = as_document(instructor_key)
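# Display the first part of the document; as_document() returns one
# element per line, so head() can be applied to it directly
cat(head(instructor_doc, 5), sep = "\n")
#> ---
#> title: Homework 3 - Data Analysis with R
#> author: Your Name
#> date: 'Due: Friday, March 15, 2024'
#> format: html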
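With both selections defined, the student version and the instructor key can be written out in a single pass. The following is a sketch only: the hw03- file naming scheme and the use of tempdir() as the output location are assumptions made for illustration.

```r
library(parsermd)

assignment_path = system.file("examples/hw03-full.qmd", package = "parsermd")
rmd = parse_rmd(assignment_path)

# The two selections used in this vignette, collected in a named list
versions = list(
  student = rmd |> rmd_select(!has_label("*-key")),
  key     = rmd |> rmd_select(!has_label("*-student"))
)

# Hypothetical output location and file naming scheme
out_dir = tempdir()
out_files = file.path(out_dir, paste0("hw03-", names(versions), ".qmd"))

# Write each version to its own file
for (i in seq_along(versions)) {
  writeLines(as_document(versions[[i]]), out_files[i])
}

basename(out_files)
#> [1] "hw03-student.qmd" "hw03-key.qmd"
```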
Creating a Minimalist Key
Sometimes instructors may want a very streamlined version that contains only the solution code without all the instructional text.
We can create this by:
- Keeping only exercise headings and solution chunks
- Removing all markdown instructions
- Setting #| include: false for the setup chunk
# Select only headings and solution chunks
minimalist_key = rmd |>
  rmd_select(
    # Keep yaml and exercise headings for structure
    has_type("rmd_yaml"),
    has_heading(c("Exercise *", "Bonus*")),
    # Keep only solution chunks
    has_label(c("*-key", "setup"))
  ) |>
  rmd_modify(
    function(node) {
      rmd_node_options(node) = list(include = FALSE)
      node
    },
    has_label("setup")
  )
# Display the minimalist key structure
minimalist_key
#> ├── YAML [5 fields]
#> ├── Chunk [r, 2 lines] - setup
#> ├── Heading [h2] - Exercise 1: Basic Data Exploration
#> │ └── Chunk [r, 12 lines] - ex1-key
#> ├── Heading [h2] - Exercise 2: Data Visualization
#> │ └── Chunk [r, 11 lines] - ex2-key
#> ├── Heading [h2] - Exercise 3: Statistical Analysis
#> │ └── Chunk [r, 12 lines] - ex3-key
#> ├── Heading [h2] - Exercise 4: Advanced Data Manipulation
#> │ └── Chunk [r, 25 lines] - ex4-key
#> └── Heading [h2] - Bonus Exercise: Conditional Logic
#> └── Chunk [r, 25 lines] - bonus-key
# Convert to document
minimalist_doc = as_document(minimalist_key)
cat(minimalist_doc, sep = "\n")
#> ---
#> title: Homework 3 - Data Analysis with R
#> author: Your Name
#> date: 'Due: Friday, March 15, 2024'
#> format: html
#> execute:
#> warning: false
#> message: false
#> ---
#>
#> ```{r setup}
#> #| include: false
#> library(tidyverse)
#> library(palmerpenguins)
#> ```
#>
#> ## Exercise 1: Basic Data Exploration
#>
#> ```{r ex1-key}
#> # Solution: Basic data exploration
#> # 1. Display dimensions
#> cat("Dataset dimensions:", dim(penguins), "\n")
#> cat("Rows:", nrow(penguins), "Columns:", ncol(penguins), "\n\n")
#>
#> # 2. Show structure
#> str(penguins)
#>
#> # 3. Count missing values
#> cat("\nMissing values by column:\n")
#> penguins %>%
#> summarise(across(everything(), ~ sum(is.na(.))))
#> ```
#>
#> ## Exercise 2: Data Visualization
#>
#> ```{r ex2-key}
#> # Solution: Scatter plot of flipper length vs body mass
#> ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
#> geom_point(aes(color = species), alpha = 0.8, size = 2) +
#> labs(
#> title = "Penguin Flipper Length vs Body Mass by Species",
#> x = "Flipper Length (mm)",
#> y = "Body Mass (g)",
#> color = "Species"
#> ) +
#> theme_minimal() +
#> scale_color_viridis_d()
#> ```
#>
#> ## Exercise 3: Statistical Analysis
#>
#> ```{r ex3-key}
#> # Solution: Summary statistics for bill length by species
#> penguins %>%
#> filter(!is.na(bill_length_mm)) %>%
#> group_by(species) %>%
#> summarise(
#> count = n(),
#> mean_bill_length = round(mean(bill_length_mm), 2),
#> median_bill_length = round(median(bill_length_mm), 2),
#> sd_bill_length = round(sd(bill_length_mm), 2),
#> .groups = "drop"
#> ) %>%
#> arrange(desc(mean_bill_length))
#> ```
#>
#> ## Exercise 4: Advanced Data Manipulation
#>
#> ```{r ex4-key}
#> # Solution: Advanced data manipulation
#> complete_penguins = penguins %>%
#> # Remove rows with any missing values
#> filter(complete.cases(.)) %>%
#> # Create bill_ratio variable
#> mutate(bill_ratio = bill_length_mm / bill_depth_mm)
#>
#> # Calculate average bill ratio by species
#> bill_ratio_summary = complete_penguins %>%
#> group_by(species) %>%
#> summarise(
#> avg_bill_ratio = round(mean(bill_ratio), 3),
#> n = n(),
#> .groups = "drop"
#> ) %>%
#> arrange(desc(avg_bill_ratio))
#>
#> print(bill_ratio_summary)
#>
#> # Identify species with highest average bill ratio
#> highest_ratio_species = bill_ratio_summary %>%
#> slice_max(avg_bill_ratio, n = 1) %>%
#> pull(species)
#>
#> cat("\nSpecies with highest average bill ratio:", as.character(highest_ratio_species))
#> ```
#>
#> ## Bonus Exercise: Conditional Logic
#>
#> ```{r bonus-key}
#> # Solution: Conditional logic for size categorization
#> categorize_size = function(mass) {
#> case_when(
#> is.na(mass) ~ "Unknown",
#> mass < 3500 ~ "Small",
#> mass >= 3500 & mass <= 4500 ~ "Medium",
#> mass > 4500 ~ "Large"
#> )
#> }
#>
#> # Apply the function and create summary
#> penguins_with_size = penguins %>%
#> mutate(size_category = categorize_size(body_mass_g))
#>
#> # Create summary table
#> size_summary = penguins_with_size %>%
#> count(species, size_category) %>%
#> pivot_wider(names_from = size_category, values_from = n, values_fill = 0)
#>
#> print(size_summary)
#>
#> # Overall size distribution
#> penguins_with_size %>%
#> count(size_category) %>%
#> mutate(percentage = round(n / sum(n) * 100, 1))
#> ```
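The two properties that define the minimalist key, no instructional markdown and a hidden setup chunk, can also be checked in code. This is a minimal sketch that rebuilds the key with the same calls as above and inspects the result; the variable name types is illustrative and the tibble package is assumed to be installed.

```r
library(parsermd)

assignment_path = system.file("examples/hw03-full.qmd", package = "parsermd")

# Rebuild the minimalist key with the same calls as above
minimalist_key = parse_rmd(assignment_path) |>
  rmd_select(
    has_type("rmd_yaml"),
    has_heading(c("Exercise *", "Bonus*")),
    has_label(c("*-key", "setup"))
  ) |>
  rmd_modify(
    function(node) {
      rmd_node_options(node) = list(include = FALSE)
      node
    },
    has_label("setup")
  )

# No markdown nodes should remain after the selection
types = tibble::as_tibble(minimalist_key)$type
"rmd_markdown" %in% types
#> [1] FALSE

# The setup chunk should carry the include option in the generated text
any(grepl("#| include: false", as_document(minimalist_key), fixed = TRUE))
#> [1] TRUE
```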
Best Practices
When creating homework assignments for processing with parsermd, consider these best practices:
- Clear Structure: Use headings to organize exercises and maintain hierarchy
- Meaningful Labels: Use descriptive chunk labels that identify the document components and their type (e.g., ex1-student, ex2-key)
- Testing: Always test the generated versions to ensure they work correctly and you haven’t lost anything important (e.g. your YAML front matter or your setup chunk)
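The testing advice can be partly automated. The sketch below defines check_version(), a hypothetical helper that is not part of parsermd, which reports whether a generated version still contains its YAML front matter and its setup chunk. It assumes the tibble package is installed.

```r
library(parsermd)

# Hypothetical helper: report whether a generated version still has
# its YAML front matter and its setup chunk
check_version = function(x) {
  nodes = tibble::as_tibble(x)
  c(
    has_yaml  = "rmd_yaml" %in% nodes$type,
    has_setup = "setup" %in% nodes$label
  )
}

assignment_path = system.file("examples/hw03-full.qmd", package = "parsermd")
rmd = parse_rmd(assignment_path)

student_version = rmd |> rmd_select(!has_label("*-key"))
instructor_key = rmd |> rmd_select(!has_label("*-student"))

# Stop with an error if either version lost something important
stopifnot(check_version(student_version), check_version(instructor_key))
```

A check like this only covers structure. Rendering each generated file is still the most reliable way to confirm that the code in it runs.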