Introduction
The parsermd package parses R Markdown and Quarto documents into an Abstract Syntax Tree (AST) representation. This vignette introduces the different types of AST nodes and their properties, helping you understand how parsermd represents document structure.
AST Container - rmd_ast
The rmd_ast object serves as the container for all parsed document nodes. It holds a linear sequence of nodes representing different document elements, where each node type corresponds to a specific R Markdown or Quarto construct (headings, code chunks, text, etc.).
Important: The AST represents documents as a linear sequence of nodes, not a nested tree structure. This means that structural elements like fenced divs are represented as separate opening and closing nodes in the sequence, rather than as nodes with children.
The default print method for rmd_ast (flat = FALSE) presents an implicit tree structure based on heading levels. This provides a hierarchical view that reflects the document’s logical organization, where content is grouped under headings based on their level.
Properties:
- nodes: A list containing all the parsed nodes in document order
Example:
Raw text that would be parsed:
---
title: "Example Document"
---
# Introduction
This is some text.
```{r}
x <- 1:5
mean(x)
```
This would create an rmd_ast object containing:
- rmd_yaml node with the title
- rmd_heading node with “Introduction”
- rmd_markdown node with “This is some text.”
- rmd_chunk node with the R code
Programmatic creation:
ast = rmd_ast(list(
  rmd_yaml(list(title = "Example Document")),
  rmd_heading(name = "Introduction", level = 1L),
  rmd_markdown(lines = "This is some text."),
  rmd_chunk(
    engine = "r",
    code = c("x <- 1:5", "mean(x)")
  )
))
Hierarchical view (flat = FALSE):
print(ast, flat = FALSE)
#> ├── YAML [1 field]
#> └── Heading [h1] - Introduction
#>     ├── Markdown [1 line]
#>     └── Chunk [r, 2 lines] -
Linear view (flat = TRUE):
print(ast, flat = TRUE)
#> ├── YAML [1 field]
#> ├── Heading [h1] - Introduction
#> ├── Markdown [1 line]
#> └── Chunk [r, 2 lines] -
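The relationship between the two views can be illustrated with a small base R sketch. This is not how parsermd implements its print method; it is a simplified illustration that assumes each non-heading node is nested one step below the most recent heading. The vectors node_type and node_level and the helper implied_depth() are hypothetical names introduced only for this example.

```r
# Hypothetical flat description of the example document: one entry per node.
node_type  = c("yaml", "heading", "markdown", "chunk")
# Heading level for heading nodes, NA for every other node.
node_level = c(NA, 1L, NA, NA)

# Depth of each node in the implied tree (a simplification):
# a heading sits at depth (level - 1), any other node sits one step
# below the most recent heading, or at depth 0 before the first heading.
implied_depth = function(level) {
  depth = integer(length(level))
  current = 0L  # level of the most recent heading seen so far
  for (i in seq_along(level)) {
    if (!is.na(level[i])) {
      current = level[i]
      depth[i] = current - 1L
    } else {
      depth[i] = current
    }
  }
  depth
}

depth = implied_depth(node_level)

# Print the flat sequence with indentation that reflects the implied depth.
for (i in seq_along(node_type)) {
  cat(strrep("  ", depth[i]), node_type[i], "\n", sep = "")
}
```

The node list itself never changes; only the indentation used for display is derived from the heading levels.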
Core Node Types
Document Structure Nodes
YAML Header - rmd_yaml
The rmd_yaml node represents YAML front matter at the beginning of documents.
Properties:
- yaml: List containing the parsed YAML content
Example:
Raw text that would be parsed:
---
title: "My Document"
author: "John Doe"
date: "2023-01-01"
---
Programmatic creation:
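A minimal sketch of how this node might be built, assuming rmd_yaml() accepts the parsed fields as a named list in the same way as the rmd_yaml() call in the rmd_ast example earlier in this vignette. The field values mirror the raw text above, and the printed output is not shown because it may differ between package versions.

```r
library(parsermd)

# Build the YAML node from a named list of front matter fields
# (assumed call form, based on the rmd_ast example in this vignette).
yaml_node = rmd_yaml(list(
  title  = "My Document",
  author = "John Doe",
  date   = "2023-01-01"
))

yaml_node
```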
Markdown Headings - rmd_heading
The rmd_heading node represents section headings in markdown.
Properties:
- name: Character string containing the heading text
- level: Integer from 1 to 6 indicating the heading level (# = 1, ## = 2, etc.)
Example:
Raw text that would be parsed:
# Introduction
Programmatic creation:
heading_node = rmd_heading(
  name = "Introduction",
  level = 1L
)
heading_node
#> <rmd_heading>
#> @ name : chr "Introduction"
#> @ level: int 1
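The @ markers in the printed output suggest that node properties can be read individually with the @ operator. The sketch below assumes this S7-style property access is available in your installed version of parsermd and your version of R; on older R versions S7::prop() may be needed instead.

```r
library(parsermd)

heading_node = rmd_heading(name = "Introduction", level = 1L)

# Read individual properties (assumes S7-style `@` access works here).
heading_name  = heading_node@name
heading_level = heading_node@level

heading_name
heading_level
```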
Markdown Text - rmd_markdown
The rmd_markdown node represents plain markdown text content.
Properties:
- lines: Character vector containing the markdown text lines
Example:
Raw text that would be parsed:
This is a paragraph.
With multiple lines.
Programmatic creation:
markdown_node = rmd_markdown(
  lines = c("This is a paragraph.", "With multiple lines.")
)
markdown_node
#> <rmd_markdown>
#> @ lines: chr [1:2] "This is a paragraph." "With multiple lines."
Code and Execution Nodes
Executable Code Chunks - rmd_chunk
The rmd_chunk node represents executable code chunks with options and metadata.
Properties:
- engine: The code engine (default: “r”)
- name: Optional chunk name/label
- options: List of chunk options containing both traditional and YAML options
- code: Character vector containing the code lines
- indent: Indentation string
- n_ticks: Number of backticks used (default: 3)
Chunk Option Formats:
Chunks support two option formats that can be used independently or together:
Traditional format: Options specified in the chunk header after the engine and label
YAML format: Options specified as YAML in #| comment lines within the chunk
Option Conflict Resolution:
When the same option is specified in both formats, YAML options take precedence over traditional options. A warning is emitted when conflicts occur:
For example, when a chunk header sets eval=TRUE and the chunk body contains #| eval: false, the YAML value eval: false wins over the traditional eval=TRUE, and the parser emits: “YAML options override traditional options for: eval”
Type Handling:
- Traditional options: Always stored as strings (e.g., "TRUE", "5")
- YAML options: Preserve proper R types (e.g., TRUE, 5L, 3.14)
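The precedence and type rules above can be sketched in base R. The helper merge_chunk_options() is hypothetical and is not part of parsermd; it only illustrates the described behaviour (YAML values replace traditional values with the same name, with a warning), and the warning text is copied from the message quoted above.

```r
# Hypothetical helper, not part of parsermd: merge two option lists so that
# YAML options win, warning about any option names present in both.
merge_chunk_options = function(traditional, yaml) {
  conflicts = intersect(names(traditional), names(yaml))
  if (length(conflicts) > 0) {
    warning(
      "YAML options override traditional options for: ",
      paste(conflicts, collapse = ", "),
      call. = FALSE
    )
  }
  # modifyList() replaces entries of its first argument with same-named
  # entries of its second, so the YAML values take precedence.
  modifyList(traditional, yaml)
}

traditional = list(eval = "TRUE")              # traditional: stored as a string
yaml = list(eval = FALSE, message = FALSE)     # YAML: stored as logicals

# The warning is suppressed here only to keep the example output short.
merged = suppressWarnings(merge_chunk_options(traditional, yaml))
str(merged)
```

After merging, eval holds the logical value from the YAML options rather than the string from the chunk header.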
Examples:
Traditional format chunk:
```{r example, eval=TRUE, echo=FALSE}
x <- 1:10
mean(x)
```
YAML format chunk:
```{r example}
#| eval: true
#| echo: false
x <- 1:10
mean(x)
```
Mixed format chunk (with conflict):
```{r example, eval=TRUE}
#| eval: false
#| message: false
x <- 1:10
mean(x)
```
In this case, eval: false (YAML) overrides eval=TRUE (traditional).
Programmatic creation:
# Traditional-style options
chunk_node_traditional = rmd_chunk(
  engine = "r",
  name = "example",
  options = list(eval = "TRUE", echo = "FALSE"),
  code = c("x <- 1:10", "mean(x)")
)
# YAML-style options with proper types
chunk_node_yaml = rmd_chunk(
  engine = "r",
  name = "example",
  options = list(eval = TRUE, echo = FALSE),
  code = c("x <- 1:10", "mean(x)")
)
chunk_node_yaml
#> <rmd_chunk>
#> @ engine : chr "r"
#> @ name : chr "example"
#> @ options:List of 2
#> .. $ eval: logi TRUE
#> .. $ echo: logi FALSE
#> @ code : chr [1:2] "x <- 1:10" "mean(x)"
#> @ indent : chr ""
#> @ n_ticks: int 3
Raw Output Chunks - rmd_raw_chunk
The rmd_raw_chunk node represents raw output chunks for specific formats.
Properties:
- format: The output format (e.g., “html”, “latex”)
- code: Character vector containing the raw content
- indent: Indentation string
- n_ticks: Number of backticks used
Example:
Raw text that would be parsed:
```{=html}
<div class='custom'>
<p>Custom HTML content</p>
</div>
```
Programmatic creation:
raw_chunk_node = rmd_raw_chunk(
  format = "html",
  code = c(
    "<div class='custom'>",
    " <p>Custom HTML content</p>",
    "</div>"
  )
)
raw_chunk_node
#> <rmd_raw_chunk>
#> @ format : chr "html"
#> @ code : chr [1:3] "<div class='custom'>" " <p>Custom HTML content</p>" ...
#> @ indent : chr ""
#> @ n_ticks: int 3
Fenced Code Blocks - rmd_code_block
The rmd_code_block node represents non-executable fenced code blocks.
Properties:
- attr: Attributes string (language, classes, etc.)
- code: Character vector containing the code lines
- indent: Indentation string
- n_ticks: Number of backticks used
Example:
Raw text that would be parsed:
```python
def hello():
    print('Hello, World!')
```
Programmatic creation:
code_block_node = rmd_code_block(
  attr = "python",
  code = c(
    "def hello():",
    " print('Hello, World!')"
  )
)
code_block_node
#> <rmd_code_block>
#> @ attr : chr "python"
#> @ code : chr [1:2] "def hello():" " print('Hello, World!')"
#> @ indent : chr ""
#> @ n_ticks: int 3
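To make the roles of attr, indent and n_ticks concrete, here is a base R sketch of how the text of a fenced block could be reassembled from these properties. The helper render_code_block() is hypothetical and not part of parsermd, and it assumes that indent is a prefix applied to every line of the block and that n_ticks is the length of the opening and closing fences.

```r
# Hypothetical helper, not part of parsermd: rebuild the lines of a fenced
# code block from its properties.
render_code_block = function(attr, code, indent = "", n_ticks = 3L) {
  fence = strrep("`", n_ticks)        # fence made of n_ticks backticks
  paste0(
    indent,                           # assumed: same prefix on every line
    c(paste0(fence, attr), code, fence)
  )
}

block_lines = render_code_block(
  attr = "python",
  code = c("def hello():", " print('Hello, World!')")
)

writeLines(block_lines)
```

A larger n_ticks value would be needed when the code itself contains a run of three backticks, which is one reason the fence length is stored on the node.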
Inline Elements
Inline Code - rmd_inline_code
The rmd_inline_code node represents inline code expressions.
Properties:
- engine: The code engine (empty string for static code)
- code: The inline code content
- braced: Whether the code uses braced syntax
- start: Starting position in the source
- length: Length of the inline code
Example:
Raw text that would be parsed:
The result is `r 2 + 2`.
Programmatic creation:
inline_code_node = rmd_inline_code(
  engine = "r",
  code = "2 + 2",
  braced = FALSE
)
inline_code_node
#> rmd_inline_code[-1,-1] `r 2 + 2`
Shortcode Function Calls - rmd_shortcode
The rmd_shortcode node represents shortcode function calls (Quarto feature).
Properties:
- func: The shortcode function name
- args: Character vector of arguments
- start: Starting position in the source
- length: Length of the shortcode
Example:
Raw text that would be parsed:
{{< embed type=video src=example.mp4 >}}
Programmatic creation:
shortcode_node = rmd_shortcode(
  func = "embed",
  args = c(
    "type=video",
    "src=example.mp4"
  )
)
shortcode_node
#> rmd_shortcode[-1,-1] {{< embed type=video src=example.mp4 >}}
Structural Elements
Fenced Divs - rmd_fenced_div_open & rmd_fenced_div_close
Fenced divs are represented as pairs of nodes in the linear AST sequence. The rmd_fenced_div_open node marks the beginning of a fenced div block, and the rmd_fenced_div_close node marks the end. Any content between these nodes is considered to be inside the fenced div.
rmd_fenced_div_open Properties:
- attr: Character vector of div attributes
rmd_fenced_div_close Properties: None (just a marker)
Example:
Raw text that would be parsed:
::: {.warning #important}
This content is inside the fenced div.
More content here.
:::
This would create a sequence of nodes:
1. rmd_fenced_div_open with attributes
2. rmd_markdown with “This content is inside the fenced div.”
3. rmd_markdown with “More content here.”
4. rmd_fenced_div_close
Programmatic creation:
# Create the opening node
fenced_div_open_node = rmd_fenced_div_open(
  attr = c("class=warning", "id=important")
)
# Create the closing node
fenced_div_close_node = rmd_fenced_div_close()
# These would typically be combined with content nodes in an rmd_ast
ast_with_div = rmd_ast(list(
  fenced_div_open_node,
  rmd_markdown(
    lines = "This content is inside the fenced div."
  ),
  rmd_markdown(
    lines = "More content here."
  ),
  fenced_div_close_node
))
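Because the open and close markers are ordinary entries in the flat sequence, membership in a div has to be worked out from their positions. The base R sketch below shows one way this could be done with running counts. It works on a hypothetical character vector of node type names (node_type) rather than on a real rmd_ast object, and it is an illustration of the idea, not parsermd's own logic.

```r
# Hypothetical vector of node type names, in document order, matching the
# four-node sequence built above.
node_type = c(
  "rmd_fenced_div_open",
  "rmd_markdown",
  "rmd_markdown",
  "rmd_fenced_div_close"
)

opens  = node_type == "rmd_fenced_div_open"
closes = node_type == "rmd_fenced_div_close"

# Number of divs still open after each node has been processed.
depth_after = cumsum(opens) - cumsum(closes)

# A content node is inside a div when at least one div is open at that
# point and the node is not itself an open or close marker.
inside_div = depth_after > 0 & !opens & !closes

# The markers are balanced when the count never goes negative and ends at 0.
balanced = all(depth_after >= 0) && depth_after[length(depth_after)] == 0

data.frame(node_type, depth_after, inside_div)
```

The same running count also handles nested divs, since each additional opening marker raises the depth by one until its matching close is reached.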