Custom Parsers
Weave handles CSV, JSON, JSONL, YAML, TOML, INI, XML, and HTML out of the box with sensible defaults. This page covers both configuring the builtin parsers when the defaults don’t fit, and writing your own parsers from scratch for custom formats.
Configuring Builtin Parsers
When you need different behavior from a builtin format — a pipe-separated CSV, headerless data, pretty-printed JSON — use builder functions. Each builder returns a configured parser or formatter function that works as a drop-in replacement for format symbols in read() and write().
# Default behavior — format symbol
data = read("data.csv", :csv)
# Configured behavior — builder function
my_parser = csv_parser(separator: "|", headers: false)
data = read("data.psv", my_parser)

CSV
CSV has the most configuration options. Use csv_parser for reading and csv_formatter for writing.
csv_parser options:
| Option | Default | Description |
|---|---|---|
| separator | "," | Field delimiter |
| headers | true | First row contains column names |
| quote | "\"" | Quote character for fields |
| escape | "\"" | Escape character within quoted fields |
| comment | none | Comment line prefix character |
| trim | false | Trim whitespace from fields |
# Read a pipe-separated file with no headers
parser = csv_parser(separator: "|", headers: false)
rows = read("data.psv", parser)
# rows is a container of containers (list of lists — no headers means no keys)
# Write pipe-separated output
formatter = csv_formatter(separator: "|")
write("output.psv", data, formatter)

# Read a CSV with comment lines and trimmed fields
parser = csv_parser(comment: "#", trim: true)
data = read("messy.csv", parser)

JSON
json_parser() has no configuration options.
json_formatter controls output style:
| Option | Default | Description |
|---|---|---|
| pretty | false | Pretty-print with indentation |
| indent | 2 | Spaces per indent level (when pretty is true) |
# Pretty-print JSON output
formatter = json_formatter(pretty: true, indent: 4)
write("config.json", data, formatter)

YAML
yaml_parser() has no configuration options.
yaml_formatter controls indentation:
| Option | Default | Description |
|---|---|---|
| indent | 2 | Spaces per indent level |
formatter = yaml_formatter(indent: 4)
write("config.yaml", data, formatter)

TOML
toml_parser() and toml_formatter() have no configuration options. They exist for API consistency (the format symbols map to these functions under the hood), but there is nothing to configure.
INI
ini_parser handles files with different comment and delimiter conventions:
| Option | Default | Description |
|---|---|---|
| comment | ";" and "#" | Comment prefix characters |
| delimiter | "=" | Key-value delimiter |
ini_formatter controls output format:
| Option | Default | Description |
|---|---|---|
| delimiter | "=" | Key-value delimiter |
# Parse an INI file that uses : instead of =
parser = ini_parser(delimiter: ":")
config = read("app.conf", parser)
# Write it back with the same convention
formatter = ini_formatter(delimiter: ":")
write("app.conf", config, formatter)

XML
xml_parser controls how attributes and text nodes are represented in the parsed result:
| Option | Default | Description |
|---|---|---|
| attr_prefix | "@" | Prefix for attribute keys in the result |
| text_key | "#text" | Key for text content nodes |
| collapse_text | true | Collapse single text children to string values |
| trim_text | true | Trim whitespace from text nodes |
xml_formatter controls XML output:
| Option | Default | Description |
|---|---|---|
| attr_prefix | "@" | Prefix identifying attribute keys |
| text_key | "#text" | Key for text content |
| pretty | true | Pretty-print with indentation |
| indent | 2 | Spaces per indent level |
| root_name | "root" | Root element name (when data has no natural root) |
| declaration | true | Include XML declaration header |
# Parse XML
parser = xml_parser()
doc = read("books.xml", parser)
# Write XML with custom formatting
formatter = xml_formatter(indent: 4, declaration: false)
write("output.xml", data, formatter)

HTML
html_parser is a forgiving parser that handles malformed HTML gracefully — missing closing tags, unquoted attributes, etc.
| Option | Default | Description |
|---|---|---|
| attr_prefix | "@" | Prefix for attribute keys |
| text_key | "#text" | Key for text content |
| collapse_text | true | Collapse single text children |
| trim_text | true | Trim whitespace from text |
parser = html_parser()
page = read("index.html", parser)
# Works even with messy, real-world HTML

Writing Your Own Parsers
For formats Weave doesn’t handle natively — log files, fixed-width data, proprietary formats — you can write your own parser function and pass it to read().
How It Works
The read() function accepts either a format symbol or a function:
# Built-in format
data = read("config.json", :json)
# Custom parser
data = read("data.log", my_parser)

When you pass a function, Weave calls it with the raw file contents and expects a Container back.
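As a rough mental model only (a Python sketch, not Weave's actual implementation), this dispatch can be pictured as follows: a format name is looked up in a table of builtin parsers, while a function is called directly with the text. The names `BUILTIN_PARSERS`, `read_text`, and `parse_key_values` are hypothetical, and the sketch takes the raw text directly rather than a file path so that it stays self-contained.

```python
import json
from typing import Any, Callable, Dict, Union

Parser = Callable[[str], Any]

# Hypothetical table standing in for the builtin format symbols.
BUILTIN_PARSERS: Dict[str, Parser] = {
    "json": json.loads,
}

def read_text(raw_text: str, fmt: Union[str, Parser]) -> Any:
    """Parse raw_text with a named builtin format or a caller-supplied parser."""
    parser = BUILTIN_PARSERS[fmt] if isinstance(fmt, str) else fmt
    return parser(raw_text)

def parse_key_values(raw_text: str) -> Dict[str, str]:
    """Custom parser: one key=value pair per line; lines without '=' are skipped."""
    result = {}
    for line in raw_text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)  # split only on the first '='
            result[key.strip()] = value.strip()
    return result

builtin_result = read_text('{"level": "INFO"}', "json")
custom_result = read_text("level=INFO\nmessage=Server started", parse_key_values)
```

The point of the sketch is that a custom parser needs no registration step: any function from text to a structured value can stand in wherever a format name is accepted.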
Your First Custom Parser
Parse a key-value log format separated by ---:
timestamp=2024-01-15T10:30:00
level=INFO
message=Server started
---
timestamp=2024-01-15T10:30:05
level=DEBUG
message=Connection accepted

fn parse_log_entries(raw_text) {
    entries = []
    # Entries are separated by --- lines
    blocks = raw_text.split("---")
    blocks *> ^(block) {
        entry = []
        split(trim(block), "\n") *> ^(line) {
            if line.len > 0 {
                # Each non-empty line is a key=value pair
                parts = split(line, "=") *> trim
                entry[parts[0]] = parts[1]
            }
        }
        # Skip blocks that produced no fields
        if entry.len > 0 { entries << entry }
    }
    entries
}

# Use it
logs = read("server.log", parse_log_entries)
logs *> ^(entry) {
    if entry[:level] == "ERROR" {
        puts("Error at " + entry[:timestamp] + ": " + entry[:message])
    }
}

Parsing Fixed-Width Data
Many legacy systems export fixed-width files:
John Smith       42  Engineer      75000
Alice Johnson    35  Manager       92000
Bob Williams     28  Developer     68000

fn parse_fixed_width(raw_text) {
    # Column layout: start offset and width of each field
    fields = [
        [start: 0, length: 17, name: :name],
        [start: 17, length: 4, name: :age],
        [start: 21, length: 14, name: :title],
        [start: 35, length: 6, name: :salary]
    ]
    records = []
    split(raw_text, "\n") *> ^(line) {
        if line.len > 0 {
            record = []
            fields *> ^(f) {
                value = line.substr(f[:start], f[:length]).trim()
                # Numeric columns are converted from text
                if f[:name] == :age || f[:name] == :salary {
                    value = value.to_num()
                }
                record[f[:name]] = value
            }
            records << record
        }
    }
    records
}
employees = read("employees.dat", parse_fixed_width)
fn sum(v, acc: 0) { acc + v }
high_earners = employees *> ^(e) { if e[:salary] > 80000 { e } }
total = high_earners *> ^(e) { e[:salary] } &> sum

Parser Factories
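In Weave, returning a closure is the usual way to build a configurable function for a pipeline, and the same pattern works for parsers. A factory function takes the configuration and returns a parser that remembers it: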
fn make_delimited_parser(delimiter) {
    # The returned closure captures delimiter
    ^(raw_text) {
        lines = split(raw_text, "\n")
        # The first line holds the column names
        headers = split(lines[0], delimiter) *> ^(h) { h.trim() }
        records = []
        i = 1
        while i < lines.len {
            if lines[i].len > 0 {
                values = split(lines[i], delimiter)
                record = []
                # Pair each header with the value in the same position;
                # assumes every row has one value per header
                j = 0
                while j < headers.len {
                    record[headers[j]] = values[j].trim()
                    j += 1
                }
                records << record
            }
            i += 1
        }
        records
    }
}
# Create parsers for different delimiters
parse_pipe = make_delimited_parser("|")
parse_tab = make_delimited_parser("\t")
parse_semicolon = make_delimited_parser(";")
pipe_data = read("data.psv", parse_pipe)
tab_data = read("data.tsv", parse_tab)
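
The factory pattern is not limited to delimiters. As an illustration only, here is a Python sketch (not Weave code) that applies the same idea to the fixed-width format from earlier, with the column layout as the configuration captured by the closure. The names `FieldSpec`, `make_fixed_width_parser`, and `parse_employees` are hypothetical, and unlike the Weave version above, the sketch leaves every value as a string.

```python
from typing import Callable, Dict, List, Tuple

# (name, start, length) for each column; a hypothetical layout type for this sketch.
FieldSpec = Tuple[str, int, int]
Record = Dict[str, str]

def make_fixed_width_parser(fields: List[FieldSpec]) -> Callable[[str], List[Record]]:
    """Return a parser that slices each non-empty line according to fields."""
    def parse(raw_text: str) -> List[Record]:
        records = []
        for line in raw_text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            records.append(
                {name: line[start:start + length].strip()
                 for name, start, length in fields}
            )
        return records
    return parse

# Same column layout as the fixed-width example earlier on this page.
parse_employees = make_fixed_width_parser(
    [("name", 0, 17), ("age", 17, 4), ("title", 21, 14), ("salary", 35, 6)]
)

# Build one padded sample line so the column offsets are exact.
sample = "John Smith".ljust(17) + "42".ljust(4) + "Engineer".ljust(14) + "75000\n"
employees = parse_employees(sample)
```

Keeping the layout outside the parser body means a new file format needs only a new list of field specs, not a new parsing function.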