URN Syntax
Format, parsing rules, and canonical form
Format
A capability URN is a string that starts with a scheme prefix followed by
semicolon-separated tags. Each tag is a key-value pair joined by =.
cap:key1=value1;key2=value2;key3=value3
The scheme prefix is cap: for capability URNs and media:
for media URNs. The prefix is case-insensitive when parsing (CAP: and
cap: are the same URN) but always lowercase in canonical form.
Tags are separated by semicolons. A trailing semicolon is allowed when parsing but omitted in canonical form:
# These parse to the same URN:
cap:op=extract;format=pdf
cap:op=extract;format=pdf;
# Canonical form (no trailing semicolon):
cap:format=pdf;op=extract
Keys
Tag keys use lowercase alphanumeric characters, hyphens, and underscores.
Keys are always normalized to lowercase during parsing. If you write
Format=pdf, the key becomes format.
Each key can appear at most once in a URN. Duplicate keys are a parse error.
# Valid keys:
op, format, target, model_name, content-type
# Invalid keys (parse error):
cap:op=extract;op=transform    (duplicate key "op")
Values
Unquoted values can contain alphanumeric characters, hyphens, underscores, dots, slashes, colons, and the plus sign. They are normalized to lowercase.
# These are equivalent after normalization:
cap:op=Extract → cap:op=extract
cap:format=PDF → cap:format=pdf
Values can also be the special pattern characters *, !,
and ?, which have specific meanings in matching (see
Matching).
Quoting
Values that contain special characters or uppercase letters that must be preserved
need double quotes. Inside quotes, any character is allowed. The only escape
sequences are \" (literal quote) and \\ (literal backslash).
# Quoting is required when:
# - The value contains semicolons, equals signs, or quotes
cap:query="SELECT * FROM docs";format=json
# - The value contains spaces
cap:label="my label";op=test
# - You need to preserve uppercase
cap:path="/usr/Local/Bin"
Quoted values preserve their case exactly. The value "PDF" stays
PDF, while the unquoted value PDF becomes pdf.
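The quoting rules above can be sketched as a small reader for one quoted value. This is a minimal sketch, not the reference parser: the function name `read_quoted` and the use of `ValueError` are assumptions made for illustration. It accepts any character inside the quotes and recognizes only the two escape sequences described above.

```python
from typing import Tuple


def read_quoted(text: str, start: int = 0) -> Tuple[str, int]:
    """Read a double-quoted value that begins at text[start].

    Returns the unescaped value and the index just past the closing quote.
    Hypothetical helper: name and error type are illustrative only.
    """
    if start >= len(text) or text[start] != '"':
        raise ValueError("expected opening quote")
    out = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # Closing quote: the value is complete and its case is preserved.
            return "".join(out), i + 1
        if ch == "\\":
            # Only \" and \\ are valid escapes.
            if i + 1 < len(text) and text[i + 1] in '"\\':
                out.append(text[i + 1])
                i += 2
                continue
            raise ValueError("invalid escape sequence")
        out.append(ch)
        i += 1
    raise ValueError("unterminated quote")
```

For example, `read_quoted('"my label";op=test')` yields the value `my label` and the position of the semicolon that follows the closing quote, so a caller can continue with the next tag.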
Smart quoting in canonical form
When a URN is serialized to canonical form, values are quoted only when necessary. A value that is entirely lowercase alphanumeric with hyphens, underscores, dots, slashes, and colons does not need quotes and is written without them.
# Input with unnecessary quotes:
cap:op="extract";format="pdf"
# Canonical form (quotes removed because values don't need them):
cap:format=pdf;op=extract
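One way to picture smart quoting is a serializer that checks each value against the unquoted character set before deciding whether to add quotes. The sketch below is an assumption-laden illustration: `serialize_value` is a hypothetical name, and the character set is taken literally from the sentence above (lowercase alphanumerics, hyphens, underscores, dots, slashes, colons). The plus sign is not in that list, so this sketch quotes it conservatively; an actual implementation may treat it differently.

```python
import re

# Characters that, per the description above, never need quoting.
_BARE_VALUE = re.compile(r"[a-z0-9_./:-]+")


def serialize_value(value: str) -> str:
    """Return the value as it might appear in canonical form (hypothetical helper)."""
    if _BARE_VALUE.fullmatch(value):
        return value  # safe to write without quotes
    # Escape backslashes first, then quotes, so escapes are not doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

With this rule, `pdf` is written bare, while `PDF`, `my label`, and `media:pdf;bytes` are all written in quotes because they contain uppercase letters, a space, or a semicolon.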
Canonical Form
Every URN has exactly one canonical form. Two URNs that represent the same capability produce identical canonical strings. Canonical form applies these transformations:
- Scheme prefix is lowercase (cap:)
- Tags are sorted alphabetically by key
- Keys are lowercase
- Unquoted values are lowercase
- Values are quoted only when necessary (smart quoting)
- No trailing semicolon
- Value-less tags are written without =*
# Before canonicalization:
CAP:Op=Extract;FORMAT=pdf;Target=text;
# After canonicalization:
cap:format=pdf;op=extract;target=text
Canonical form matters for storage and display. It does not affect matching — the matching algorithm works on parsed tag sets, not on strings.
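The transformations listed above can be sketched for the simple case where every tag is unquoted. This is a sketch under stated assumptions rather than the reference algorithm: `canonicalize_simple` is a hypothetical name, it handles only the cap: scheme, it does not handle quoted values, and "alphabetical" ordering is approximated with Python's default string sort.

```python
def canonicalize_simple(urn: str) -> str:
    """Canonicalize a cap: URN whose tags are all unquoted (hypothetical helper)."""
    scheme, sep, body = urn.partition(":")
    if not sep or scheme.lower() != "cap":
        raise ValueError("missing or wrong scheme prefix")
    if body.endswith(";"):
        body = body[:-1]  # a trailing semicolon is allowed on input
    tags = {}
    for raw in body.split(";"):
        key, eq, value = raw.partition("=")
        key = key.lower()  # keys are always lowercased
        if not key:
            raise ValueError("empty key")
        if eq and not value:
            raise ValueError("empty value")
        if key in tags:
            raise ValueError(f"duplicate key {key!r}")
        # A tag without "=" is stored as "*"; unquoted values are lowercased.
        tags[key] = value.lower() if eq else "*"
    # Sort by key and write "*" values in the value-less form.
    parts = [k if v == "*" else f"{k}={v}" for k, v in sorted(tags.items())]
    return "cap:" + ";".join(parts)
```

Because canonical form is only a string representation, a store could use the output of a function like this as a lookup key, while matching would still operate on the parsed tag set.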
Case Normalization
Case handling follows two rules:
- Keys and unquoted values are normalized to lowercase. Format=PDF becomes format=pdf.
- Quoted values preserve their case exactly. path="/Usr/Local" stays path="/Usr/Local".
This means case is only significant when you explicitly quote a value. If you don't quote, everything is case-insensitive.
Special Values
Four value types control how tags participate in matching:
| Syntax | Name | Meaning |
|---|---|---|
| K=v | Exact value | The tag must be present with exactly this value |
| K=* | Must-have-any | The tag must be present, but any value is accepted |
| K=! | Must-not-have | The tag must be absent entirely |
| K=? | Unspecified | No constraint. The tag may or may not be present |
When a tag key is missing from a URN entirely, it behaves like K=?
(no constraint). The two are equivalent.
Examples
# Exact: must have format=pdf
cap:format=pdf;op=extract
# Must-have-any: must have format, don't care what value
cap:format=*;op=extract
# Must-not-have: must NOT have a debug tag
cap:debug=!;op=extract
# Unspecified: no constraint on format (same as omitting it)
cap:format=?;op=extract
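The four meanings in the table can be read as a per-tag check of one pattern value against a concrete set of tags. The sketch below illustrates only that table. It is not the matching algorithm itself, which is described in Matching and treats in and out differently. The function name `constraint_holds` and the use of a plain dictionary for the concrete tags are assumptions.

```python
from typing import Mapping, Optional


def constraint_holds(key: str, pattern_value: Optional[str],
                     tags: Mapping[str, str]) -> bool:
    """Check one pattern tag against concrete tags (hypothetical helper).

    pattern_value is None when the key is missing from the pattern,
    which behaves like "?" (no constraint).
    """
    if pattern_value is None or pattern_value == "?":
        return True                      # unspecified: anything goes
    if pattern_value == "*":
        return key in tags               # must-have-any: present, any value
    if pattern_value == "!":
        return key not in tags           # must-not-have: absent entirely
    return tags.get(key) == pattern_value  # exact value


# Concrete tags of a capability, as a parsed tag set might look.
concrete = {"op": "extract", "format": "pdf"}
checks = {
    "format=pdf": constraint_holds("format", "pdf", concrete),
    "format=*": constraint_holds("format", "*", concrete),
    "debug=!": constraint_holds("debug", "!", concrete),
    "format=?": constraint_holds("format", "?", concrete),
}
```

In this example every entry in `checks` is true, since the concrete tags include format=pdf and have no debug tag.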
Value-less Tags
A tag written without = is shorthand for =*
(must-have-any). This is useful when you want to require a tag's presence
without constraining its value.
# These are equivalent:
cap:image;op=classify
cap:image=*;op=classify
In canonical form, K=* is always written as just K
(the value-less form).
Writing K= (key, equals sign, no value) is a parse error. Use
K or K=* instead.
Direction Specifiers
Capability URNs have two required tags: in and out.
Their values are Media URNs describing the input and output data formats.
Because Media URNs contain colons and semicolons, they must be quoted:
cap:in="media:pdf;bytes";op=extract;out="media:text;utf8"
The in value describes what data the capability accepts.
The out value describes what data it produces. These are
not matched by exact string comparison — they use semantic subtype
matching, explained in Matching.
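Since the quoted in and out values contain semicolons of their own, a parser cannot simply split a capability URN on every semicolon. A minimal sketch of quote-aware splitting follows, assuming a hypothetical helper `split_tags` that receives the text after the cap: prefix and returns the raw tag strings, leaving quotes and escapes untouched for a later step.

```python
from typing import List


def split_tags(body: str) -> List[str]:
    """Split on semicolons that are outside double quotes (hypothetical helper)."""
    tags: List[str] = []
    current: List[str] = []
    in_quotes = False
    i = 0
    while i < len(body):
        ch = body[i]
        if in_quotes and ch == "\\" and i + 1 < len(body):
            # Keep an escape pair together so \" does not end the quoted value.
            current.append(body[i:i + 2])
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            tags.append("".join(current))  # tag boundary
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quote")
    if current:
        tags.append("".join(current))  # last tag; a trailing semicolon adds nothing
    return tags
```

Applied to `in="media:pdf;bytes";op=extract;out="media:text;utf8"`, this produces three tags, with each Media URN kept whole inside its quotes.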
Media URNs
Media URNs follow the same tag syntax as capability URNs but use the
media: prefix. They describe data formats using value-less
tags as markers:
media:bytes                       # raw bytes
media:pdf;bytes                   # PDF file as bytes
media:image;png;bytes             # PNG image as bytes
media:text;utf8                   # UTF-8 text
media:image;png;bytes;thumbnail   # PNG thumbnail as bytes
Media URNs typically use value-less tags (markers) rather than key-value pairs, though key-value tags are allowed. The marker count determines specificity for direction matching.
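As a rough illustration of marker counting, the sketch below extracts the value-less tags from a Media URN and counts them. The names `media_markers` and `specificity` are hypothetical, the sketch ignores key-value tags and does not handle quoted values, and how the count is actually used in direction matching is defined in Matching, not here.

```python
from typing import List


def media_markers(media_urn: str) -> List[str]:
    """Return the lowercase value-less tags of a media: URN (hypothetical helper)."""
    scheme, sep, body = media_urn.partition(":")
    if not sep or scheme.lower() != "media":
        raise ValueError("not a media URN")
    if body.endswith(";"):
        body = body[:-1]  # tolerate a trailing semicolon
    # Assumption: only tags without "=" count as markers.
    return [tag.lower() for tag in body.split(";") if tag and "=" not in tag]


def specificity(media_urn: str) -> int:
    """Count markers; more markers means a more specific format in this sketch."""
    return len(media_markers(media_urn))
```

Under these assumptions, `media:bytes` has one marker and `media:image;png;bytes;thumbnail` has four, so the thumbnail URN is the more specific of the two.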
Parse Errors
The following conditions produce parse errors, with consistent error codes across all implementations:
- Invalid format: missing or wrong scheme prefix
- Duplicate key: same key appears more than once
- Empty key: a tag starts with =
- Empty value: key= with no value after the equals sign
- Invalid character: character not allowed in unquoted position
- Unterminated quote: opening " without closing "
- Invalid escape sequence: \ followed by something other than " or \
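To make the error conditions concrete, the sketch below classifies the first problem found in a URN that uses only unquoted values. Everything named here is hypothetical: the enum `ParseErrorKind`, its member names, and the function `first_error` are illustrative, and the actual error codes shared by the implementations are not given in this section. The two quote-related kinds are listed for completeness but are not detected by this unquoted-only check, and treating an empty tag as an empty key is an assumption.

```python
import re
from enum import Enum
from typing import Optional


class ParseErrorKind(Enum):
    # Hypothetical names mirroring the list above; not the real error codes.
    INVALID_FORMAT = "invalid format"
    DUPLICATE_KEY = "duplicate key"
    EMPTY_KEY = "empty key"
    EMPTY_VALUE = "empty value"
    INVALID_CHARACTER = "invalid character"
    UNTERMINATED_QUOTE = "unterminated quote"
    INVALID_ESCAPE = "invalid escape sequence"


_KEY = re.compile(r"[A-Za-z0-9_-]+")
# Unquoted value characters, or one of the special pattern values.
_VALUE = re.compile(r"[A-Za-z0-9_./:+-]+|[*!?]")


def first_error(urn: str) -> Optional[ParseErrorKind]:
    """Return the first error kind found, or None if the URN looks valid."""
    scheme, sep, body = urn.partition(":")
    if not sep or scheme.lower() not in ("cap", "media"):
        return ParseErrorKind.INVALID_FORMAT
    if body.endswith(";"):
        body = body[:-1]  # trailing semicolon is allowed
    seen = set()
    for raw in body.split(";"):
        key, eq, value = raw.partition("=")
        if not key:
            return ParseErrorKind.EMPTY_KEY
        if not _KEY.fullmatch(key):
            return ParseErrorKind.INVALID_CHARACTER
        if eq and not value:
            return ParseErrorKind.EMPTY_VALUE
        if eq and not _VALUE.fullmatch(value):
            return ParseErrorKind.INVALID_CHARACTER
        if key.lower() in seen:  # duplicates are checked after lowercasing
            return ParseErrorKind.DUPLICATE_KEY
        seen.add(key.lower())
    return None
```

A check like this could back a table-driven test suite, where each malformed input is paired with the error kind it is expected to produce.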
Complete Examples
# PDF text extraction capability
cap:in="media:pdf;bytes";op=extract;out="media:text;utf8"
# Spanish translation
cap:in="media:text;utf8";language=es;op=translate;out="media:text;utf8"
# Image classification with specific model
cap:in="media:image;bytes";model=resnet;op=classify;out="media:object;json"
# Thumbnail generation
cap:in="media:image;bytes";op=generate_thumbnail;out="media:image;png;bytes;thumbnail"
# A pattern that matches any extraction capability
cap:in;op=extract;out
# A pattern requiring format=pdf but no debug tag
cap:debug=!;format=pdf;op=extract