
URN Syntax

Format, parsing rules, and canonical form

Format

A capability URN is a string that starts with a scheme prefix followed by semicolon-separated tags. Each tag is a key-value pair joined by =.

cap:key1=value1;key2=value2;key3=value3

The scheme prefix is cap: for capability URNs and media: for media URNs. The prefix is case-insensitive when parsing (CAP: and cap: are the same URN) but always lowercase in canonical form.

Tags are separated by semicolons. A trailing semicolon is allowed when parsing but omitted in canonical form:

# These parse to the same URN:
cap:op=extract;format=pdf
cap:op=extract;format=pdf;

# Canonical form (no trailing semicolon):
cap:format=pdf;op=extract
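
As a rough illustration of the prefix and trailing-semicolon rules, the sketch below separates the scheme from the tag body. The name split_scheme is hypothetical and not part of any published API, and the function does not parse the tags themselves.

```python
def split_scheme(urn: str) -> tuple[str, str]:
    """Return (lowercase scheme, tag body without a trailing semicolon)."""
    scheme, colon, body = urn.partition(":")
    scheme = scheme.lower()  # CAP: and cap: are the same prefix
    if not colon or scheme not in ("cap", "media"):
        raise ValueError(f"invalid format: missing or wrong scheme prefix in {urn!r}")
    if body.endswith(";"):
        body = body[:-1]  # a trailing semicolon is accepted on input
    return scheme, body
```

With this sketch, cap:op=extract;format=pdf and CAP:op=extract;format=pdf; yield the same scheme and body. Sorting the tags into canonical order is a separate step.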

Keys

Tag keys consist of alphanumeric characters, hyphens, and underscores. Keys are case-insensitive on input and are always normalized to lowercase during parsing. If you write Format=pdf, the key becomes format.

Each key can appear at most once in a URN. Duplicate keys are a parse error.

# Valid keys:
op, format, target, model_name, content-type

# Invalid (parse error):
cap:op=extract;op=transform   (duplicate key "op")
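
The key rules can be sketched as follows. This is a minimal illustration, assuming that duplicates are detected after lowercasing (so Op and op would collide); collect_keys and KEY_RE are hypothetical names, and the error messages are not the real error codes.

```python
import re

# Allowed key characters after lowercasing: a-z, 0-9, hyphen, underscore.
KEY_RE = re.compile(r"[a-z0-9_-]+")

def collect_keys(raw_keys: list[str]) -> list[str]:
    """Lowercase each key, validate its characters, and reject duplicates."""
    seen: list[str] = []
    for raw in raw_keys:
        key = raw.lower()
        if not key:
            raise ValueError("empty key")
        if not KEY_RE.fullmatch(key):
            raise ValueError(f"invalid character in key {raw!r}")
        if key in seen:
            raise ValueError(f"duplicate key {key!r}")
        seen.append(key)
    return seen
```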

Values

Unquoted values can contain alphanumeric characters, hyphens, underscores, dots, slashes, colons, and the plus sign. They are normalized to lowercase.

# These are equivalent after normalization:
cap:op=Extract    →  cap:op=extract
cap:format=PDF    →  cap:format=pdf

Values can also be the special pattern characters *, !, and ?, which have specific meanings in matching (see Matching).
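
A minimal sketch of unquoted-value normalization, based only on the character set listed above. The names normalize_unquoted, UNQUOTED_RE, and PATTERN_VALUES are hypothetical, and the exact error reporting is an assumption.

```python
import re

# Characters allowed in an unquoted value: letters, digits, and _ . / : + -
UNQUOTED_RE = re.compile(r"[A-Za-z0-9_./:+-]+")
PATTERN_VALUES = {"*", "!", "?"}

def normalize_unquoted(value: str) -> str:
    """Lowercase an unquoted value; pass pattern characters through unchanged."""
    if value in PATTERN_VALUES:
        return value
    if not UNQUOTED_RE.fullmatch(value):
        raise ValueError(f"invalid character in unquoted value {value!r}")
    return value.lower()
```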

Quoting

A value needs double quotes if it contains characters that are not allowed in unquoted values, or if its uppercase letters must be preserved. Inside quotes, any character is allowed. The only escape sequences are \" (literal quote) and \\ (literal backslash).

# Quoting is required when:
# - The value contains semicolons, equals signs, or quotes
cap:query="SELECT * FROM docs";format=json

# - The value contains spaces
cap:label="my label";op=test

# - You need to preserve uppercase
cap:path="/usr/Local/Bin"

Quoted values preserve their case exactly. The value "PDF" stays PDF, while the unquoted value PDF becomes pdf.
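
The quoting and escape rules can be sketched as a small reader that consumes one quoted value. This is an illustration under stated assumptions, not the reference parser: read_quoted is a hypothetical name, and a backslash at the very end of the input is reported here as an unterminated quote.

```python
def read_quoted(text: str, start: int = 0) -> tuple[str, int]:
    """Read a double-quoted value at text[start].

    Returns the unescaped value and the index just past the closing quote.
    """
    if start >= len(text) or text[start] != '"':
        raise ValueError("expected opening quote")
    out: list[str] = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            return "".join(out), i + 1
        if ch == "\\":
            if i + 1 >= len(text):
                break  # nothing after the backslash: treat as unterminated
            nxt = text[i + 1]
            if nxt not in ('"', "\\"):
                raise ValueError("invalid escape sequence")
            out.append(nxt)
            i += 2
            continue
        out.append(ch)  # case is preserved exactly inside quotes
        i += 1
    raise ValueError("unterminated quote")
```

For example, reading from the start of "SELECT * FROM docs";format=json returns the value SELECT * FROM docs and the index of the semicolon that follows the closing quote.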

Smart quoting in canonical form

When a URN is serialized to canonical form, values are quoted only when necessary. A value that is entirely lowercase alphanumeric with hyphens, underscores, dots, slashes, and colons does not need quotes and is written without them.

# Input with unnecessary quotes:
cap:op="extract";format="pdf"

# Canonical form (quotes removed because values don't need them):
cap:format=pdf;op=extract
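
Smart quoting might be sketched like this for literal values. The character set follows the sentence above as written; serialize_value and BARE_RE are hypothetical names, and pattern values such as * are assumed to be handled elsewhere.

```python
import re

# Values made only of these characters are written without quotes.
BARE_RE = re.compile(r"[a-z0-9_./:-]+")

def serialize_value(value: str) -> str:
    """Quote a literal value only when it cannot be written bare."""
    if BARE_RE.fullmatch(value):
        return value
    # Escape backslashes first so the quote escapes are not doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

Under this sketch, pdf is written bare, while PDF, my label, and media:pdf;bytes are all written in quotes.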

Canonical Form

Every URN has exactly one canonical form. Two URNs that represent the same capability produce identical canonical strings. Canonical form applies these transformations:

  1. Scheme prefix is lowercase (cap:)
  2. Tags are sorted alphabetically by key
  3. Keys are lowercase
  4. Unquoted values are lowercase
  5. Values are quoted only when necessary (smart quoting)
  6. No trailing semicolon
  7. Value-less tags are written without =*

# Before canonicalization:
CAP:Op=Extract;FORMAT=pdf;Target=text;

# After canonicalization:
cap:format=pdf;op=extract;target=text

Canonical form matters for storage and display. It does not affect matching — the matching algorithm works on parsed tag sets, not on strings.
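
The transformations above can be sketched for the simple case where every value is unquoted. The function to_canonical is a hypothetical helper that takes tags that have already been split into key-value pairs. It does not handle quoted values, and it assumes "alphabetically" means ordinary string ordering of the lowercase keys.

```python
def to_canonical(scheme: str, tags: dict[str, str]) -> str:
    """Render parsed tags as a canonical string (unquoted values only)."""
    # Steps 3 and 4: lowercase keys and unquoted values.
    normalized = {key.lower(): value.lower() for key, value in tags.items()}
    if len(normalized) != len(tags):
        raise ValueError("duplicate key after lowercasing")
    parts = []
    for key in sorted(normalized):  # step 2: sort by key
        value = normalized[key]
        # Step 7: a must-have-any tag is written as the bare key.
        parts.append(key if value == "*" else f"{key}={value}")
    # Steps 1 and 6: lowercase scheme, no trailing semicolon.
    return f"{scheme.lower()}:" + ";".join(parts)
```

Calling to_canonical("CAP", {"Op": "Extract", "FORMAT": "pdf", "Target": "text"}) yields cap:format=pdf;op=extract;target=text, matching the example above.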

Case Normalization

Case handling follows two rules:

  • Keys and unquoted values are normalized to lowercase. Format=PDF becomes format=pdf.
  • Quoted values preserve their case exactly. path="/Usr/Local" stays path="/Usr/Local".

This means case is only significant when you explicitly quote a value. If you don't quote, everything is case-insensitive.

Special Values

Four value types control how tags participate in matching:

Syntax  Name           Meaning
K=v     Exact value    The tag must be present with exactly this value
K=*     Must-have-any  The tag must be present, but any value is accepted
K=!     Must-not-have  The tag must be absent entirely
K=?     Unspecified    No constraint; the tag may or may not be present

When a tag key is missing from a URN entirely, it behaves like K=? (no constraint). The two are equivalent.

Examples

# Exact: must have format=pdf
cap:format=pdf;op=extract

# Must-have-any: must have format, don't care what value
cap:format=*;op=extract

# Must-not-have: must NOT have a debug tag
cap:debug=!;op=extract

# Unspecified: no constraint on format (same as omitting it)
cap:format=?;op=extract
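
The table of special values can be read as a per-tag check. The sketch below encodes only that table; it is not the full matching algorithm (see Matching), and it assumes the candidate side carries concrete values. The name tag_satisfied is hypothetical.

```python
from typing import Optional

def tag_satisfied(pattern_value: Optional[str], actual_value: Optional[str]) -> bool:
    """Check one key.

    pattern_value is None when the pattern omits the key.
    actual_value is None when the candidate has no such tag.
    """
    if pattern_value is None or pattern_value == "?":
        return True  # unspecified: no constraint
    if pattern_value == "!":
        return actual_value is None  # must-not-have
    if pattern_value == "*":
        return actual_value is not None  # must-have-any
    return actual_value == pattern_value  # exact value
```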

Value-less Tags

A tag written without = is shorthand for =* (must-have-any). This is useful when you want to require a tag's presence without constraining its value.

# These are equivalent:
cap:image;op=classify
cap:image=*;op=classify

In canonical form, K=* is always written as just K (the value-less form).

Writing K= (key, equals sign, no value) is a parse error. Use K or K=* instead.
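
A minimal sketch of how a single tag might be read under these rules. The name parse_tag is hypothetical; the function leaves the value untouched (no lowercasing or unquoting) to keep the focus on the value-less shorthand.

```python
def parse_tag(tag: str) -> tuple[str, str]:
    """Split one tag into (key, value), expanding the value-less shorthand."""
    key, equals, value = tag.partition("=")
    if not key:
        raise ValueError("empty key")
    if not equals:
        return key.lower(), "*"  # bare key is shorthand for key=*
    if not value:
        raise ValueError("empty value")  # key= is a parse error
    return key.lower(), value
```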

Direction Specifiers

Capability URNs have two required tags: in and out. Their values are Media URNs describing the input and output data formats. Because Media URNs contain semicolons, which would otherwise be read as tag separators, they must be quoted:

cap:in="media:pdf;bytes";op=extract;out="media:text;utf8"

The in value describes what data the capability accepts. The out value describes what data it produces. These are not matched by exact string comparison — they use semantic subtype matching, explained in Matching.
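
Because quoted in and out values contain semicolons, a parser cannot simply split the tag body on every semicolon. The sketch below shows one way to split only on semicolons outside quotes; split_tags is a hypothetical name, and the function leaves quotes and escapes in place for a later step to decode.

```python
def split_tags(body: str) -> list[str]:
    """Split a tag body on semicolons that are outside double quotes."""
    tags: list[str] = []
    current: list[str] = []
    in_quotes = False
    i = 0
    while i < len(body):
        ch = body[i]
        if in_quotes and ch == "\\" and i + 1 < len(body):
            current.append(body[i:i + 2])  # keep the escape pair together
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            tags.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quote")
    if current:
        tags.append("".join(current))
    return tags
```

Applied to the body of the example above, this yields three tags: in="media:pdf;bytes", op=extract, and out="media:text;utf8".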

Media URNs

Media URNs follow the same tag syntax as capability URNs but use the media: prefix. They describe data formats using value-less tags as markers:

media:bytes                        # raw bytes
media:pdf;bytes                    # PDF file as bytes
media:image;png;bytes              # PNG image as bytes
media:text;utf8                    # UTF-8 text
media:image;png;bytes;thumbnail    # PNG thumbnail as bytes

Media URNs typically use value-less tags (markers) rather than key-value pairs, though key-value tags are allowed. The marker count determines specificity for direction matching.
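
To make the marker count concrete, the sketch below counts the value-less tags in a Media URN. It only counts; how the count feeds into direction matching is described in Matching. The name marker_count is hypothetical, and the sketch assumes the URN contains no quoted values.

```python
def marker_count(media_urn: str) -> int:
    """Count value-less tags (markers) in a Media URN without quoted values."""
    scheme, colon, body = media_urn.partition(":")
    if not colon or scheme.lower() != "media":
        raise ValueError("invalid format: expected media: prefix")
    # A marker is a non-empty tag with no equals sign.
    return sum(1 for tag in body.split(";") if tag and "=" not in tag)
```

Under this sketch, media:bytes has one marker and media:image;png;bytes;thumbnail has four.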

Parse Errors

The following conditions produce parse errors, with consistent error codes across all implementations:

  • Invalid format: missing or wrong scheme prefix
  • Duplicate key: same key appears more than once
  • Empty key: a tag starts with =
  • Empty value: key= with no value after the equals sign
  • Invalid character: character not allowed in unquoted position
  • Unterminated quote: opening " without closing "
  • Invalid escape sequence: \ followed by something other than " or \

Complete Examples

# PDF text extraction capability
cap:in="media:pdf;bytes";op=extract;out="media:text;utf8"

# Spanish translation
cap:in="media:text;utf8";language=es;op=translate;out="media:text;utf8"

# Image classification with specific model
cap:in="media:image;bytes";model=resnet;op=classify;out="media:object;json"

# Thumbnail generation
cap:in="media:image;bytes";op=generate_thumbnail;out="media:image;png;bytes;thumbnail"

# A pattern that matches any extraction capability
cap:in;op=extract;out

# A pattern requiring format=pdf but no debug tag
cap:debug=!;format=pdf;op=extract