Capability Definitions
JSON structure for describing what a capability does
What a Definition Is
A capability definition is a JSON document that describes a capability in enough detail for tools to work with it. It includes the capability's URN, a human-readable title and description, the arguments it accepts, and the output it produces.
Definitions serve two purposes: they let humans understand what a capability does, and they let programs generate UIs, validate inputs, or compose capabilities together without special-case code.
Structure
A definition has these top-level fields:
{
"urn": "cap:in=\"media:pdf;bytes\";op=extract;out=\"media:text;utf8\"",
"title": "PDF Text Extractor",
"description": "Extracts text content from PDF documents",
"arguments": {
"required": [ ... ],
"optional": [ ... ]
},
"output": {
"type": "text",
"media": "media:text;utf8",
"description": "Extracted text content"
}
}
urn
The capability's URN in canonical form. This is the unique identifier for the
capability. It includes in and out direction specs
that describe the input and output data formats.
title
A short, human-readable name. Keep it under 60 characters. It should describe what the capability does, not how it does it.
description
A longer explanation of what the capability does, any limitations, and how it behaves in edge cases. Plain text.
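As a rough illustration of the fields above, the following Python sketch checks that a definition carries the top-level fields and that the title respects the 60-character guideline. The function name `check_top_level` is hypothetical, and treating every top-level field as mandatory is an assumption made for the example; the text does not say which fields a real validator enforces.

```python
import json

# Top-level fields listed in the Structure section.
TOP_LEVEL_FIELDS = ("urn", "title", "description", "arguments", "output")
MAX_TITLE_LENGTH = 60  # "Keep it under 60 characters."


def check_top_level(definition: dict) -> list[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    for field in TOP_LEVEL_FIELDS:
        # Assumption: every top-level field is treated as mandatory here.
        if field not in definition:
            problems.append(f"missing field: {field}")
    title = definition.get("title", "")
    if isinstance(title, str) and len(title) >= MAX_TITLE_LENGTH:
        problems.append(
            f"title has {len(title)} characters; keep it under {MAX_TITLE_LENGTH}"
        )
    return problems


# The PDF extractor definition from the Structure section, with empty argument lists.
definition = json.loads(r'''
{
  "urn": "cap:in=\"media:pdf;bytes\";op=extract;out=\"media:text;utf8\"",
  "title": "PDF Text Extractor",
  "description": "Extracts text content from PDF documents",
  "arguments": {"required": [], "optional": []},
  "output": {
    "type": "text",
    "media": "media:text;utf8",
    "description": "Extracted text content"
  }
}
''')

print(check_top_level(definition))  # prints []
```

Returning a list of problems rather than raising on the first one lets a tool report everything wrong with a definition in a single pass.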
Arguments
The arguments object has two arrays: required and
optional. Each element describes one argument.
"arguments": {
"required": [
{
"name": "input",
"type": "file",
"description": "PDF file to process",
"media": "media:pdf;bytes"
}
],
"optional": [
{
"name": "pages",
"type": "string",
"description": "Page range (e.g., '1-5' or '1,3,5')"
},
{
"name": "preserve_layout",
"type": "boolean",
"description": "Whether to preserve spatial layout of text"
}
]
}
Argument fields
- name — identifier for the argument, used as the key when passing values
- type — the data type (see below)
- description — what this argument controls
- media (optional) — a Media URN further constraining the expected data format, used with
file and binary types
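To make the argument fields concrete, here is a minimal Python sketch of how a caller might check supplied values against an arguments object before invoking a capability. The names `TYPE_CHECKS` and `validate_call` are hypothetical, and the mapping from each type name (described under Argument Types) to a Python check is an assumption. In particular, it assumes a file value arrives as a path string or raw bytes.

```python
from numbers import Real

# One predicate per argument type name.
# Assumption: a "file" value is passed as a path string or raw bytes.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, Real) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "file": lambda v: isinstance(v, (str, bytes)),
    "json": lambda v: isinstance(v, (dict, list)),
}


def validate_call(arguments: dict, values: dict) -> list[str]:
    """Check caller-supplied values against an "arguments" object."""
    required = arguments.get("required", [])
    optional = arguments.get("optional", [])
    errors = []
    for spec in required:
        if spec["name"] not in values:
            errors.append(f"missing required argument: {spec['name']}")
    # The argument name is the key used when passing values.
    known = {spec["name"]: spec for spec in required + optional}
    for name, value in values.items():
        spec = known.get(name)
        if spec is None:
            errors.append(f"unknown argument: {name}")
            continue
        check = TYPE_CHECKS.get(spec["type"])
        if check is None:
            errors.append(f"{name}: unsupported type {spec['type']}")
        elif not check(value):
            errors.append(f"{name}: expected {spec['type']}")
    return errors


pdf_arguments = {
    "required": [{"name": "input", "type": "file", "media": "media:pdf;bytes"}],
    "optional": [
        {"name": "pages", "type": "string"},
        {"name": "preserve_layout", "type": "boolean"},
    ],
}

print(validate_call(pdf_arguments, {"input": "report.pdf", "pages": "1-5"}))
# prints []
print(validate_call(pdf_arguments, {"pages": 3}))
# prints ['missing required argument: input', 'pages: expected string']
```

This sketch checks only names and types. It does not inspect the media field, so a non-PDF file passed as `input` would still pass.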
Argument Types
Each argument has a type that tells callers what kind of data to provide.
| Type | Description | Example values |
|---|---|---|
| string | Text input. UTF-8 string. | "hello world", "1-5" |
| number | Numeric input. Integer or floating-point. | 42, 0.75 |
| boolean | True or false. | true, false |
| file | File data. Typically provided as a path or binary content. | File path or byte content |
| json | Structured data as a JSON object or array. | {"key": "value"} |
The file type often has an accompanying media field
specifying the expected format. A capability that processes PDFs would specify
"media": "media:pdf;bytes" on its file argument.
Output
The output object describes what the capability produces.
"output": {
"type": "text",
"media": "media:text;utf8",
"description": "Extracted text content"
}
Output types
| Type | Description |
|---|---|
| text | Plain UTF-8 text |
| json | Structured JSON data |
| binary | Binary data (images, audio, files) |
The media field on the output matches the out direction
spec in the capability URN. It describes the data format in detail using a Media URN.
Media Specs
The media field on arguments and output uses Media URNs to describe
data formats precisely. This connects the definition to the
matching system — the media specs in a
definition correspond to the in and out tags in the
capability URN.
For example, a capability that converts images to thumbnails:
{
"urn": "cap:in=\"media:image;bytes\";op=thumbnail;out=\"media:image;png;bytes;thumbnail\"",
"arguments": {
"required": [
{
"name": "input",
"type": "file",
"media": "media:image;bytes",
"description": "Image file to create thumbnail from"
}
],
"optional": [
{
"name": "size",
"type": "number",
"description": "Maximum dimension in pixels"
}
]
},
"output": {
"type": "binary",
"media": "media:image;png;bytes;thumbnail",
"description": "PNG thumbnail image"
}
}
The input media spec (media:image;bytes) matches the in
direction spec in the URN. The output media spec (media:image;png;bytes;thumbnail)
matches the out direction spec. This ensures consistency between the
URN and the definition.
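The consistency rule described here can be sketched as an automated check. The Python below is illustrative only. The helpers `parse_cap_urn` and `check_media_consistency` are hypothetical, and the URN grammar is inferred from the examples on this page rather than taken from a specification. The sketch also treats "matches" as exact string equality, which may be stricter than the real matching rules, and it assumes the argument named `input` is the one tied to the in spec.

```python
import re

# Assumption: a capability URN is "cap:" followed by ";"-separated key=value
# tags, and values that contain ";" are wrapped in double quotes.
TAG_PATTERN = re.compile(r'(\w+)=(?:"([^"]*)"|([^;]*))')


def parse_cap_urn(urn: str) -> dict[str, str]:
    """Split a capability URN into its tags (hypothetical helper)."""
    if not urn.startswith("cap:"):
        raise ValueError(f"not a capability URN: {urn!r}")
    tags = {}
    for match in TAG_PATTERN.finditer(urn[len("cap:"):]):
        key, quoted, bare = match.groups()
        tags[key] = quoted if quoted is not None else bare
    return tags


def check_media_consistency(definition: dict) -> list[str]:
    """Compare the media fields in a definition with its URN's in/out tags."""
    tags = parse_cap_urn(definition["urn"])
    problems = []
    out_media = definition.get("output", {}).get("media")
    if out_media != tags.get("out"):
        problems.append(
            f"output media {out_media!r} does not equal out spec {tags.get('out')!r}"
        )
    for spec in definition.get("arguments", {}).get("required", []):
        # Assumption: only the argument named "input" is tied to the in spec.
        if spec.get("name") == "input" and "media" in spec:
            if spec["media"] != tags.get("in"):
                problems.append(
                    f"input media {spec['media']!r} does not equal in spec {tags.get('in')!r}"
                )
    return problems


thumbnail = {
    "urn": 'cap:in="media:image;bytes";op=thumbnail;'
           'out="media:image;png;bytes;thumbnail"',
    "arguments": {
        "required": [
            {"name": "input", "type": "file", "media": "media:image;bytes"}
        ],
        "optional": [{"name": "size", "type": "number"}],
    },
    "output": {"type": "binary", "media": "media:image;png;bytes;thumbnail"},
}

print(parse_cap_urn(thumbnail["urn"])["op"])  # prints thumbnail
print(check_media_consistency(thumbnail))     # prints []
```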
Complete Example
{
"urn": "cap:in=\"media:text;utf8\";language=es;op=translate;out=\"media:text;utf8\"",
"title": "Spanish Translator",
"description": "Translates text from any language to Spanish",
"arguments": {
"required": [
{
"name": "input",
"type": "string",
"description": "Text to translate"
}
],
"optional": [
{
"name": "source_language",
"type": "string",
"description": "Source language code (auto-detected if omitted)"
},
{
"name": "formality",
"type": "string",
"description": "Formality level: 'formal' or 'informal'"
}
]
},
"output": {
"type": "text",
"media": "media:text;utf8",
"description": "Translated text in Spanish"
}
}
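Because a definition is plain data, a program can derive an interface from it without special-case code. The sketch below builds a command-line parser for the translator definition using Python's standard `argparse` module. The function `build_parser`, the `CONVERTERS` mapping, and the choice to expose required arguments as positionals and optional ones as `--flags` are all assumptions made for illustration, not part of the definition format.

```python
import argparse
import json

# Assumed mapping from definition types to command-line converters.
# A "file" argument is taken as a path string here.
CONVERTERS = {"string": str, "number": float, "file": str, "json": json.loads}


def build_parser(definition: dict) -> argparse.ArgumentParser:
    """Derive a command-line parser from a capability definition."""
    parser = argparse.ArgumentParser(description=definition["description"])
    arguments = definition["arguments"]
    groups = (
        (True, arguments.get("required", [])),
        (False, arguments.get("optional", [])),
    )
    for is_required, specs in groups:
        for spec in specs:
            help_text = spec.get("description", "")
            flag = "--" + spec["name"].replace("_", "-")
            if spec["type"] == "boolean":
                # Booleans become on/off switches regardless of group.
                parser.add_argument(flag, action="store_true", help=help_text)
            elif is_required:
                parser.add_argument(
                    spec["name"], type=CONVERTERS[spec["type"]], help=help_text
                )
            else:
                parser.add_argument(
                    flag, type=CONVERTERS[spec["type"]], help=help_text
                )
    return parser


translator = {
    "description": "Translates text from any language to Spanish",
    "arguments": {
        "required": [
            {"name": "input", "type": "string", "description": "Text to translate"}
        ],
        "optional": [
            {"name": "source_language", "type": "string",
             "description": "Source language code (auto-detected if omitted)"},
            {"name": "formality", "type": "string",
             "description": "Formality level: 'formal' or 'informal'"},
        ],
    },
}

args = build_parser(translator).parse_args(["Good morning", "--formality", "formal"])
print(args.input, args.formality)  # prints Good morning formal
```

The same loop could just as well emit form fields or a JSON Schema. The point is that the interface is derived from the definition rather than written by hand for each capability.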