Custom Parsers

Weave handles CSV, JSON, JSONL, YAML, TOML, INI, XML, and HTML out of the box with sensible defaults. This page covers two things: configuring the builtin parsers when the defaults do not fit, and writing your own parsers from scratch for custom formats.

Configuring Builtin Parsers

When you need different behavior from a builtin format — a pipe-separated CSV, headerless data, pretty-printed JSON — use builder functions. Each builder returns a configured parser or formatter function that works as a drop-in replacement for format symbols in read() and write().

# Default behavior — format symbol
data = read("data.csv", :csv)

# Configured behavior — builder function
my_parser = csv_parser(separator: "|", headers: false)
data = read("data.psv", my_parser)

CSV

CSV has the most configuration options. Use csv_parser for reading and csv_formatter for writing.

csv_parser options:

Option	Default	Description
`separator`	`","`	Field delimiter
`headers`	`true`	First row contains column names
`quote`	`"\""`	Quote character for fields
`escape`	`"\""`	Escape character within quoted fields
`comment`	none	Comment line prefix character
`trim`	`false`	Trim whitespace from fields

# Read a pipe-separated file with no headers
parser = csv_parser(separator: "|", headers: false)
rows = read("data.psv", parser)
# rows is a container of containers (list of lists — no headers means no keys)

# Write pipe-separated output
formatter = csv_formatter(separator: "|")
write("output.psv", data, formatter)

# Read a CSV with comment lines and trimmed fields
parser = csv_parser(comment: "#", trim: true)
data = read("messy.csv", parser)
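To make the effect of separator, headers, comment, and trim concrete, here is a rough stand-in written in Python rather than Weave, so that it can be run on its own. It is a sketch of the behavior described in the table, not Weave's implementation. The function name parse_delimited and the sample text are hypothetical, and the keyed-record shape for headers: true is an assumption based on the "no headers means no keys" note above.

```python
import csv

# Hypothetical file contents: a comment line, padded fields, "|" separators.
raw_text = "# exported by a legacy tool\nname | age\nAda | 36\nGrace | 45\n"

def parse_delimited(raw_text, separator=",", headers=True, comment=None, trim=False):
    """Stand-in for a configured CSV parser; parameter names mirror the options above."""
    lines = raw_text.splitlines()
    if comment is not None:
        # Drop lines that start with the comment prefix.
        lines = [line for line in lines if not line.startswith(comment)]
    rows = list(csv.reader(lines, delimiter=separator))
    if trim:
        rows = [[field.strip() for field in row] for row in rows]
    if not headers:
        # No header row: every row stays a plain list.
        return rows
    if not rows:
        return []
    # Header row: each remaining row becomes a record keyed by column name.
    names, body = rows[0], rows[1:]
    return [dict(zip(names, row)) for row in body]

keyed = parse_delimited(raw_text, separator="|", comment="#", trim=True)
unkeyed = parse_delimited(raw_text, separator="|", headers=False, comment="#", trim=True)
print(keyed)
print(unkeyed)
```

With headers enabled the first row supplies the keys; with headers disabled the same row is returned as ordinary data.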

JSON

json_parser() has no configuration options.

json_formatter controls output style:

Option	Default	Description
`pretty`	`false`	Pretty-print with indentation
`indent`	`2`	Spaces per indent level (when `pretty` is true)

# Pretty-print JSON output
formatter = json_formatter(pretty: true, indent: 4)
write("config.json", data, formatter)
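As an illustration of what pretty and indent change, the sketch below uses Python's standard json module as a stand-in. It is not Weave's formatter, and Weave's exact spacing in compact mode may differ. The data value is made up for the example.

```python
import json

# Hypothetical data, used only for illustration.
data = {"name": "weave", "tags": ["csv", "json"]}

# Single-line output, roughly what pretty: false implies.
compact = json.dumps(data)

# Indented output, comparable to pretty: true, indent: 4.
pretty = json.dumps(data, indent=4)

print(compact)
print(pretty)
```

Both strings decode to the same value; only the whitespace differs.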

YAML

yaml_parser() has no configuration options.

yaml_formatter controls indentation:

Option	Default	Description
`indent`	`2`	Spaces per indent level

formatter = yaml_formatter(indent: 4)
write("config.yaml", data, formatter)

TOML

toml_parser() and toml_formatter() have no configuration options. They exist for API consistency (the format symbols map to these functions under the hood), but there is nothing to configure.

INI

ini_parser handles files with different comment and delimiter conventions:

Option	Default	Description
`comment`	`";"` and `"#"`	Comment prefix characters
`delimiter`	`"="`	Key-value delimiter

ini_formatter controls output format:

Option	Default	Description
`delimiter`	`"="`	Key-value delimiter

# Parse an INI file that uses : instead of =
parser = ini_parser(delimiter: ":")
config = read("app.conf", parser)

# Write it back with the same convention
formatter = ini_formatter(delimiter: ":")
write("app.conf", config, formatter)
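For comparison, Python's standard configparser module exposes the same two ideas under the names delimiters and comment_prefixes. The sketch below is a Python stand-in rather than Weave code, and the file contents are hypothetical. It shows a ":"-delimited file being read with both default comment prefixes recognized.

```python
import configparser

# Hypothetical file contents using ":" as the key-value delimiter.
raw_text = """\
[server]
host: example.local
port: 8080
; a comment line
# another comment line
"""

# delimiters and comment_prefixes play the role of ini_parser's
# delimiter and comment options.
parser = configparser.ConfigParser(delimiters=(":",), comment_prefixes=(";", "#"))
parser.read_string(raw_text)

# Flatten the parsed sections into plain nested dictionaries.
config = {section: dict(parser[section]) for section in parser.sections()}
print(config)
```

Values come back as strings here; whether Weave converts them to numbers is not stated on this page.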

XML

xml_parser controls how XML attributes and text content are represented in the parsed result:

Option	Default	Description
`attr_prefix`	`"@"`	Prefix for attribute keys in the result
`text_key`	`"#text"`	Key for text content nodes
`collapse_text`	`true`	Collapse single text children to string values
`trim_text`	`true`	Trim whitespace from text nodes

xml_formatter controls XML output:

Option	Default	Description
`attr_prefix`	`"@"`	Prefix identifying attribute keys
`text_key`	`"#text"`	Key for text content
`pretty`	`true`	Pretty-print with indentation
`indent`	`2`	Spaces per indent level
`root_name`	`"root"`	Root element name (when data has no natural root)
`declaration`	`true`	Include XML declaration header

# Parse XML
parser = xml_parser()
doc = read("books.xml", parser)

# Write XML with custom formatting
formatter = xml_formatter(indent: 4, declaration: false)
write("output.xml", data, formatter)
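The parser options are easier to picture with a concrete mapping. The sketch below is written in Python with the standard xml.etree.ElementTree module, not in Weave, and the exact container shape Weave produces is not specified on this page. It follows the common convention (also used by Python's xmltodict library, whose defaults are likewise "@" and "#text") in which attributes become prefixed keys and text lands under a dedicated key. The function element_to_dict and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem, attr_prefix="@", text_key="#text",
                    collapse_text=True, trim_text=True):
    """Convert an element to nested dicts (assumed shape, for illustration only)."""
    node = {}
    # Attributes become keys carrying the attribute prefix.
    for name, value in elem.attrib.items():
        node[attr_prefix + name] = value
    # Child elements become nested values; repeated tags become lists.
    for child in elem:
        value = element_to_dict(child, attr_prefix, text_key,
                                collapse_text, trim_text)
        if child.tag in node:
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = elem.text or ""
    if trim_text:
        text = text.strip()
    if text:
        # An element holding only text collapses to a plain string.
        if collapse_text and not node:
            return text
        node[text_key] = text
    return node

raw_text = '<book id="42"><title> Dune </title><price currency="USD">9.99</price></book>'
doc = element_to_dict(ET.fromstring(raw_text))
print(doc)
```

In this sketch the title element collapses to the string "Dune" because it has no attributes or children, while price keeps a "#text" entry alongside its "@currency" attribute.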

HTML

html_parser is a forgiving parser: it tolerates malformed HTML such as missing closing tags and unquoted attributes. It accepts the same options as xml_parser:

Option	Default	Description
`attr_prefix`	`"@"`	Prefix for attribute keys
`text_key`	`"#text"`	Key for text content
`collapse_text`	`true`	Collapse single text children
`trim_text`	`true`	Trim whitespace from text

parser = html_parser()
page = read("index.html", parser)
# Works even with messy, real-world HTML

Writing Your Own Parsers

For formats Weave doesn’t handle natively — log files, fixed-width data, proprietary formats — you can write your own parser function and pass it to read().

How It Works

The read() function accepts either a format symbol or a function:

# Built-in format
data = read("config.json", :json)

# Custom parser
data = read("data.log", my_parser)

When you pass a function, Weave calls it with the raw file contents and expects a Container back.
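The contract can be modeled in a few lines. The sketch below is written in Python, not Weave, and it is only a model of the behavior described above: Weave's real read() is not shown on this page. BUILTIN_PARSERS and parse_lines are hypothetical names.

```python
from pathlib import Path
import json
import tempfile

# Hypothetical registry mapping format names to parser functions.
BUILTIN_PARSERS = {"json": json.loads}

def read(path, parser):
    """Read a file and parse it with a named format or a caller-supplied function."""
    raw_text = Path(path).read_text(encoding="utf-8")
    if callable(parser):
        # Custom parser: called with the raw contents; its return value is the result.
        return parser(raw_text)
    return BUILTIN_PARSERS[parser](raw_text)

def parse_lines(raw_text):
    """A trivial custom parser: one list item per non-empty line."""
    return [line for line in raw_text.splitlines() if line]

with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "data.log"
    log_path.write_text("first\n\nsecond\n", encoding="utf-8")
    lines = read(log_path, parse_lines)

print(lines)
```

The point to take away is that a custom parser is an ordinary function of one argument, the raw text, and whatever it returns becomes the result of read().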

Your First Custom Parser

Suppose a log file stores each entry as a block of key=value lines, with entries separated by a line containing ---:

timestamp=2024-01-15T10:30:00
level=INFO
message=Server started
---
timestamp=2024-01-15T10:30:05
level=DEBUG
message=Connection accepted

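fn parse_log_entries(raw_text) {
    entries = []
    # Each entry is a block of key=value lines; blocks are separated by ---
    blocks = raw_text.split("---")

    blocks *> ^(block) {
        entry = []
        split(trim(block), "\n") *> ^(line) {
            if line.len > 0 {
                # Note: a value containing "=" would be split further,
                # so only its first segment is kept
                parts = split(line, "=") *> trim
                entry[parts[0]] = parts[1]
            }
        }
        # Skip empty blocks, such as one after a trailing ---
        if entry.len > 0 { entries << entry }
    }

    entries
}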

# Use it
logs = read("server.log", parse_log_entries)

logs *> ^(entry) {
    if entry[:level] == "ERROR" {
        puts("Error at " + entry[:timestamp] + ": " + entry[:message])
    }
}

Parsing Fixed-Width Data

Many legacy systems export fixed-width files, in which each field occupies a fixed range of character columns:

John Smith       42  Engineer      75000
Alice Johnson    35  Manager       92000
Bob Williams     28  Developer     68000

fn parse_fixed_width(raw_text) {
    # Column layout: zero-based start offset, width, and key for each field
    fields = [
        [start: 0,  length: 17, name: :name],
        [start: 17, length: 4,  name: :age],
        [start: 21, length: 14, name: :title],
        [start: 35, length: 6,  name: :salary]
    ]

    records = []
    split(raw_text, "\n") *> ^(line) {
        if line.len > 0 {
            record = []
            fields *> ^(f) {
                # Slice the column range, then strip the padding
                value = line.substr(f[:start], f[:length]).trim()
                # Numeric columns are converted from text to numbers
                if f[:name] == :age || f[:name] == :salary {
                    value = value.to_num()
                }
                record[f[:name]] = value
            }
            records << record
        }
    }
    records
}

employees = read("employees.dat", parse_fixed_width)

fn sum(v, acc: 0) { acc + v }
high_earners = employees *> ^(e) { if e[:salary] > 80000 { e } }
total = high_earners *> ^(e) { e[:salary] } &> sum

Parser Factories

In Weave, returning a closure is the usual way to build a configurable function for a pipeline, and the same technique works for parsers:

fn make_delimited_parser(delimiter) {
    ^(raw_text) {
        lines = split(raw_text, "\n")
        headers = split(lines[0], delimiter) *> ^(h) { h.trim() }

        records = []
        i = 1
        while i < lines.len {
            if lines[i].len > 0 {
                values = split(lines[i], delimiter)
                record = []
                j = 0
                while j < headers.len {
                    record[headers[j]] = values[j].trim()
                    j += 1
                }
                records << record
            }
            i += 1
        }
        records
    }
}

# Create parsers for different delimiters
parse_pipe = make_delimited_parser("|")
parse_tab = make_delimited_parser("\t")
parse_semicolon = make_delimited_parser(";")

pipe_data = read("data.psv", parse_pipe)
tab_data = read("data.tsv", parse_tab)
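One design point is worth checking in a factory like this: the inner loop reads one value per header, so a row with fewer fields than the header line relies on how out-of-range indexing behaves, which this page does not specify. The sketch below shows the same closure pattern in Python, not Weave, with short rows padded explicitly. Padding with empty strings is an assumption made for the example, not Weave's documented behavior, and unlike the Weave version it drops blank lines before picking the header row.

```python
def make_delimited_parser(delimiter):
    """Return a parser closure bound to the given delimiter."""
    def parse(raw_text):
        lines = [line for line in raw_text.split("\n") if line]
        if not lines:
            return []
        headers = [h.strip() for h in lines[0].split(delimiter)]
        records = []
        for line in lines[1:]:
            values = [v.strip() for v in line.split(delimiter)]
            # Pad short rows so every record has every header key.
            values += [""] * (len(headers) - len(values))
            records.append(dict(zip(headers, values)))
        return records
    return parse

parse_pipe = make_delimited_parser("|")
records = parse_pipe("name | age\nAda | 36\nGrace\n")
print(records)
```

Each call to make_delimited_parser returns an independent function that remembers its own delimiter, which is what lets one factory serve pipe-, tab-, and semicolon-separated files.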
