Capability Definitions

JSON structure for describing what a capability does

What a Definition Is

A capability definition is a JSON document that describes a capability in enough detail for tools to work with it. It includes the capability's URN, a human-readable title and description, the arguments it accepts, and the output it produces.

Definitions serve two purposes: they let humans understand what a capability does, and they let programs generate UIs, validate inputs, or compose capabilities together without special-case code.

Structure

A definition has these top-level fields:

{
  "urn": "cap:in=\"media:pdf;bytes\";op=extract;out=\"media:text;utf8\"",
  "title": "PDF Text Extractor",
  "description": "Extracts text content from PDF documents",
  "arguments": {
    "required": [ ... ],
    "optional": [ ... ]
  },
  "output": {
    "type": "text",
    "media": "media:text;utf8",
    "description": "Extracted text content"
  }
}
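
As a rough illustration of how a program might consume this structure, the sketch below loads a definition and reports which top-level fields are absent. The helper name `missing_top_level_fields` is hypothetical, and treating all five fields as mandatory is an assumption of this sketch, not something the format is stated to require.

```python
import json

# Top-level fields shown in the structure above. Treating all five as
# mandatory is an assumption of this sketch.
TOP_LEVEL_FIELDS = ("urn", "title", "description", "arguments", "output")


def missing_top_level_fields(definition: dict) -> list:
    """Return the top-level field names that are absent from a definition."""
    return [name for name in TOP_LEVEL_FIELDS if name not in definition]


# Raw string so that the JSON escapes (\") reach json.loads unchanged.
raw = r'''
{
  "urn": "cap:in=\"media:pdf;bytes\";op=extract;out=\"media:text;utf8\"",
  "title": "PDF Text Extractor",
  "description": "Extracts text content from PDF documents",
  "arguments": {"required": [], "optional": []},
  "output": {
    "type": "text",
    "media": "media:text;utf8",
    "description": "Extracted text content"
  }
}
'''

definition = json.loads(raw)
print(missing_top_level_fields(definition))  # prints []
```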

urn

The capability's URN in canonical form. This is the unique identifier for the capability. It includes in and out direction specs that describe the input and output data formats.
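
To make the shape of the URN concrete, here is a minimal parsing sketch. It assumes, based only on the examples on this page, that tags are separated by `;`, written as `key=value`, and that values containing `;` are wrapped in double quotes. The function `parse_cap_urn` is hypothetical and is not part of any official library.

```python
def parse_cap_urn(urn: str) -> dict:
    """Split a capability URN into a tag -> value mapping.

    Assumed grammar (inferred from the examples, not from a spec):
    'cap:' followed by ';'-separated key=value tags, where a value that
    contains ';' is wrapped in double quotes.
    """
    prefix = "cap:"
    if not urn.startswith(prefix):
        raise ValueError("not a capability URN")

    parts = []
    current = []
    in_quotes = False
    for ch in urn[len(prefix):]:
        if ch == '"':
            in_quotes = not in_quotes  # quotes delimit a value; drop them
        elif ch == ";" and not in_quotes:
            parts.append("".join(current))  # tag boundary
            current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError("unbalanced quote in URN")
    parts.append("".join(current))

    tags = {}
    for part in parts:
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"tag without '=': {part!r}")
        tags[key] = value
    return tags


tags = parse_cap_urn('cap:in="media:pdf;bytes";op=extract;out="media:text;utf8"')
print(tags["in"], tags["op"], tags["out"])
```

The quoting matters because Media URNs themselves contain `;`, so a naive split on `;` would cut the in and out direction specs apart.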

title

A short, human-readable name. Keep it under 60 characters. It should describe what the capability does, not how it does it.

description

A longer explanation of what the capability does, any limitations, and how it behaves in edge cases. Plain text.

Arguments

The arguments object has two arrays: required and optional. Each element describes one argument.

"arguments": {
  "required": [
    {
      "name": "input",
      "type": "file",
      "description": "PDF file to process",
      "media": "media:pdf;bytes"
    }
  ],
  "optional": [
    {
      "name": "pages",
      "type": "string",
      "description": "Page range (e.g., '1-5' or '1,3,5')"
    },
    {
      "name": "preserve_layout",
      "type": "boolean",
      "description": "Whether to preserve spatial layout of text"
    }
  ]
}

Argument fields

  • name — identifier for the argument, used as the key when passing values
  • type — the data type (see below)
  • description — what this argument controls
  • media (optional) — a Media URN further constraining the expected data format, used with file and binary types
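
Because each argument's name is the key used when passing values, a caller can check a set of supplied values against the required and optional arrays before invoking anything. The following is a sketch under that reading; `check_argument_names` is a hypothetical helper, and rejecting unknown names is an assumption rather than documented behavior.

```python
def check_argument_names(arguments: dict, supplied: dict) -> list:
    """Compare supplied argument names with a definition's arguments object."""
    required = {arg["name"] for arg in arguments.get("required", [])}
    optional = {arg["name"] for arg in arguments.get("optional", [])}
    given = set(supplied)

    problems = [
        f"missing required argument: {name}"
        for name in sorted(required - given)
    ]
    # Assumption: names that appear in neither array are treated as errors.
    problems += [
        f"unknown argument: {name}"
        for name in sorted(given - required - optional)
    ]
    return problems


# The arguments object from the PDF example, trimmed to the fields used here.
pdf_arguments = {
    "required": [{"name": "input", "type": "file"}],
    "optional": [
        {"name": "pages", "type": "string"},
        {"name": "preserve_layout", "type": "boolean"},
    ],
}

print(check_argument_names(pdf_arguments, {"input": "report.pdf", "pages": "1-5"}))
print(check_argument_names(pdf_arguments, {"page": "1-5"}))
```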

Argument Types

Each argument has a type that tells callers what kind of data to provide.

Type      Description                                                    Example values
string    Text input. UTF-8 string.                                      "hello world", "1-5"
number    Numeric input. Integer or floating-point.                      42, 0.75
boolean   True or false.                                                 true, false
file      File data. Typically provided as a path or binary content.     File path or byte content
json      Structured data as a JSON object or array.                     {"key": "value"}

The file type often has an accompanying media field specifying the expected format. A capability that processes PDFs would specify "media": "media:pdf;bytes" on its file argument.
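
One way a caller might enforce these types before invoking a capability is sketched below. The mapping from definition types to Python types is an assumption of this example (for instance, accepting either `str` or `bytes` for file), and `matches_type` is a hypothetical helper.

```python
def matches_type(type_name: str, value: object) -> bool:
    """Return True if a Python value is acceptable for a definition type.

    The Python-side mapping is an assumption of this sketch.
    """
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "file":
        # A path (str) or the file's byte content.
        return isinstance(value, (str, bytes))
    if type_name == "json":
        return isinstance(value, (dict, list))
    raise ValueError(f"unknown argument type: {type_name!r}")


print(matches_type("number", 0.75))
print(matches_type("number", True))
print(matches_type("json", {"key": "value"}))
```

The explicit `bool` exclusion is the one non-obvious choice: without it, `True` would pass as a number.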

Output

The output object describes what the capability produces.

"output": {
  "type": "text",
  "media": "media:text;utf8",
  "description": "Extracted text content"
}

Output types

Type      Description
text      Plain UTF-8 text
json      Structured JSON data
binary    Binary data (images, audio, files)

The media field on the output matches the out direction spec in the capability URN. It describes the data format in detail using a Media URN.

Media Specs

The media field on arguments and output uses Media URNs to describe data formats precisely. This connects the definition to the matching system: the media specs in a definition correspond to the in and out direction specs in the capability URN.

A capability that converts images to thumbnails:

{
  "urn": "cap:in=\"media:image;bytes\";op=thumbnail;out=\"media:image;png;bytes;thumbnail\"",
  "arguments": {
    "required": [
      {
        "name": "input",
        "type": "file",
        "media": "media:image;bytes",
        "description": "Image file to create thumbnail from"
      }
    ],
    "optional": [
      {
        "name": "size",
        "type": "number",
        "description": "Maximum dimension in pixels"
      }
    ]
  },
  "output": {
    "type": "binary",
    "media": "media:image;png;bytes;thumbnail",
    "description": "PNG thumbnail image"
  }
}

The input media spec (media:image;bytes) matches the in direction spec in the URN. The output media spec (media:image;png;bytes;thumbnail) matches the out direction spec. This ensures consistency between the URN and the definition.
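
The consistency rule can be checked mechanically. The sketch below pulls the quoted in and out values from the URN with a regular expression and compares them with the media fields. It assumes that direction specs are always double-quoted and that the first required argument carrying a media field is the one the in spec describes. Both are assumptions of this example, and the function names are hypothetical.

```python
import re
from typing import Optional


def direction_spec(urn: str, direction: str) -> Optional[str]:
    """Return the quoted value of the `in` or `out` tag, if present."""
    pattern = rf'(?:^cap:|;){re.escape(direction)}="([^"]*)"'
    match = re.search(pattern, urn)
    return match.group(1) if match else None


def media_mismatches(definition: dict) -> list:
    """List disagreements between media fields and the URN's direction specs."""
    urn = definition["urn"]
    problems = []

    output_media = definition.get("output", {}).get("media")
    out_spec = direction_spec(urn, "out")
    if output_media is not None and output_media != out_spec:
        problems.append(f"output media {output_media!r} != out spec {out_spec!r}")

    # Assumption: the first required argument with a media field is the input.
    required = definition.get("arguments", {}).get("required", [])
    input_media = [arg["media"] for arg in required if "media" in arg]
    in_spec = direction_spec(urn, "in")
    if input_media and input_media[0] != in_spec:
        problems.append(f"input media {input_media[0]!r} != in spec {in_spec!r}")

    return problems


thumbnail = {
    "urn": 'cap:in="media:image;bytes";op=thumbnail;'
           'out="media:image;png;bytes;thumbnail"',
    "arguments": {
        "required": [
            {"name": "input", "type": "file", "media": "media:image;bytes"}
        ],
        "optional": [{"name": "size", "type": "number"}],
    },
    "output": {"type": "binary", "media": "media:image;png;bytes;thumbnail"},
}

print(media_mismatches(thumbnail))  # prints []
```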

Complete Example

{
  "urn": "cap:in=\"media:text;utf8\";language=es;op=translate;out=\"media:text;utf8\"",
  "title": "Spanish Translator",
  "description": "Translates text from any language to Spanish",
  "arguments": {
    "required": [
      {
        "name": "input",
        "type": "string",
        "description": "Text to translate"
      }
    ],
    "optional": [
      {
        "name": "source_language",
        "type": "string",
        "description": "Source language code (auto-detected if omitted)"
      },
      {
        "name": "formality",
        "type": "string",
        "description": "Formality level: 'formal' or 'informal'"
      }
    ]
  },
  "output": {
    "type": "text",
    "media": "media:text;utf8",
    "description": "Translated text in Spanish"
  }
}
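
To illustrate how a program could generate an interface from a definition without special-case code, the sketch below builds a command-line parser from the translator definition using Python's standard `argparse` module. The flag naming (`--source-language` for `source_language`) and the type mapping are choices made for this example, not part of the definition format, and `build_parser` is a hypothetical helper.

```python
import argparse
import json

# Converters for non-boolean definition types; an assumption of this sketch.
CONVERTERS = {"string": str, "number": float, "file": str, "json": json.loads}

# The translator definition above, trimmed to the fields this sketch reads.
definition = {
    "description": "Translates text from any language to Spanish",
    "arguments": {
        "required": [
            {"name": "input", "type": "string",
             "description": "Text to translate"}
        ],
        "optional": [
            {"name": "source_language", "type": "string",
             "description": "Source language code (auto-detected if omitted)"},
            {"name": "formality", "type": "string",
             "description": "Formality level: 'formal' or 'informal'"},
        ],
    },
}


def build_parser(definition: dict) -> argparse.ArgumentParser:
    """Create one command-line flag per argument in the definition."""
    parser = argparse.ArgumentParser(description=definition["description"])
    for group, is_required in (("required", True), ("optional", False)):
        for arg in definition["arguments"].get(group, []):
            flag = "--" + arg["name"].replace("_", "-")
            if arg["type"] == "boolean":
                # Booleans become on/off switches that default to False.
                parser.add_argument(flag, action="store_true",
                                    help=arg["description"])
            else:
                parser.add_argument(flag, required=is_required,
                                    type=CONVERTERS.get(arg["type"], str),
                                    help=arg["description"])
    return parser


parser = build_parser(definition)
args = parser.parse_args(["--input", "Good morning", "--formality", "formal"])
print(args.input, args.formality, args.source_language)
```

Nothing in `build_parser` is specific to translation; the same function would produce flags for the PDF extractor or the thumbnail capability from their definitions.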