{
  "openapi": "3.1.0",
  "info": {
    "title": "kiapi Embedding API",
    "description": "Text and multimodal embeddings for retrieval and similarity search.\n\nPOST one item to `/v1/embedding` with one field per modality (`text` and/or\n`image`). Returns a single L2-normalized, last-token-pooled vector. This is **not**\nOpenAI's array `input` shape — one item per request. Sync only (no async mode).\n\n## Upstream docs\n- [mlx-embeddings](https://github.com/Blaizzy/mlx-embeddings) — the MLX embedding engine kiapi runs\n- [mlx-community/Qwen3-Embedding-8B-mxfp8](https://huggingface.co/mlx-community/Qwen3-Embedding-8B-mxfp8) — `qwen3-embedding-8b` weights\n- [mlx-community/Qwen3-VL-Embedding-2B-mxfp8](https://huggingface.co/mlx-community/Qwen3-VL-Embedding-2B-mxfp8) — `qwen3-vl-embedding-2b` weights\n\n## Choosing A Model\n`model` selects a registered embedding model (full catalog and aliases:\n`GET /v1/embedding/models`). The served models differ by input modality, so\nchoose by what you send:\n- **qwen3-embedding-8b** (default) — text only. Larger, stronger text retrieval.\n- **qwen3-vl-embedding-2b** — text + image. Use it for image embeddings or a\n  shared text/image space; sending `image` to the text-only model returns 400.\n\nVectors from different models live in different spaces and may differ in\ndimensionality — embed both your queries and your corpus with the **same** model.\n\n## Notes\n- One input item per request. To embed N items, make N requests.\n- Provide at least one modality input; an empty request returns 400.\n- Vectors are already L2-normalized, so cosine similarity reduces to a dot\n  product.\n\n## Examples\n\n### Text (default model)\n```sh\ncurl -sS http://HOST:PORT/v1/embedding \\\n  -H 'Content-Type: application/json' \\\n  -d '{\"text\": \"今日の天気は晴れです\"}'\n```\n\n### Image (multimodal model)\n```sh\ncurl -sS http://HOST:PORT/v1/embedding \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n    \"model\": \"qwen3-vl-embedding-2b\",\n    \"image\": \"data:image/png;base64,iVBORw0...\"\n  }'\n```\n",
    "version": "0.1.0"
  },
  "paths": {
    "/v1/embedding": {
      "post": {
        "summary": "Embedding",
        "description": "Embed a single item (one field per modality).\n\nSend `text` and/or `image` in one request (one item, not OpenAI's array\n`input`). The resolved `model` (see GET /v1/embedding/models) determines which\nmodalities are accepted; provide at least one input it supports. Returns one\nL2-normalized, last-token-pooled vector with its `dimension`.\n\nEmbedding runs as a single-flight job, so it appears in /v1/jobs and is\nserialized with all other generation, but there is no async mode: the caller\nalways waits for the vector. A wait that exceeds the sync timeout returns 504\nwhile the job keeps running and can be polled at /v1/jobs/{id}.",
        "operationId": "embedding_v1_embedding_post",
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/EmbedRequest"
              }
            }
          },
          "required": true
        },
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/EmbeddingResponse"
                }
              }
            }
          },
          "400": {
            "description": "Unknown model, empty input, or unsupported modality."
          },
          "503": {
            "description": "Model not set up, or memory budget exceeded."
          },
          "504": {
            "description": "Sync timeout exceeded; the job keeps running."
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/v1/embedding/models": {
      "get": {
        "summary": "List Models",
        "description": "List the servable models for this capability.\n\nReturns the public catalog of every variant selectable via the ``model``\nfield on this capability's endpoints.",
        "operationId": "list_models_v1_embedding_models_get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {
                  "items": {
                    "$ref": "#/components/schemas/CapabilityModelSpec"
                  },
                  "type": "array",
                  "title": "Response List Models V1 Embedding Models Get"
                }
              }
            }
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "CapabilityModelSpec": {
        "properties": {
          "name": {
            "type": "string",
            "title": "Name",
            "description": "Model variant name to pass in the request model field.",
            "examples": [
              "turbo"
            ]
          },
          "family": {
            "type": "string",
            "title": "Family",
            "description": "Capability family that resolves this model variant.",
            "examples": [
              "zimage"
            ]
          },
          "domain": {
            "type": "string",
            "title": "Domain",
            "description": "Capability domain used for grouping model lists.",
            "examples": [
              "image"
            ]
          },
          "aliases": {
            "items": {
              "type": "string"
            },
            "type": "array",
            "title": "Aliases",
            "description": "Alternative names that also resolve to this model.",
            "examples": [
              [
                "omni",
                "qwen3-omni-30b"
              ]
            ]
          },
          "default": {
            "type": "boolean",
            "title": "Default",
            "description": "Whether this is the default model when the request omits model.",
            "default": false,
            "examples": [
              true
            ]
          },
          "features": {
            "items": {
              "type": "string"
            },
            "type": "array",
            "title": "Features",
            "description": "Handler-declared modalities and features supported by this model.",
            "examples": [
              [
                "text",
                "image"
              ]
            ]
          }
        },
        "type": "object",
        "required": [
          "name",
          "family",
          "domain"
        ],
        "title": "CapabilityModelSpec",
        "description": "Public model discovery entry for capability-specific model lists."
      },
      "EmbedRequest": {
        "properties": {
          "model": {
            "anyOf": [
              {
                "type": "string"
              },
              {
                "type": "null"
              }
            ],
            "title": "Model",
            "description": "Registry name, alias, or repo of the embedding model (see `GET /v1/embedding/models`). Omit to use the default model."
          },
          "text": {
            "anyOf": [
              {
                "type": "string"
              },
              {
                "type": "null"
              }
            ],
            "title": "Text",
            "description": "Text to embed. Supported by every embedding model."
          },
          "image": {
            "anyOf": [
              {
                "type": "string"
              },
              {
                "type": "null"
              }
            ],
            "title": "Image",
            "description": "Image to embed, as a base64 string, data URL, http(s) URL, or local path. Only multimodal models accept it; sending it to a text-only model returns HTTP 400."
          }
        },
        "additionalProperties": true,
        "type": "object",
        "title": "EmbedRequest"
      },
      "EmbeddingResponse": {
        "properties": {
          "model": {
            "type": "string",
            "title": "Model",
            "description": "Resolved model name that produced the vector."
          },
          "embedding": {
            "items": {
              "type": "number"
            },
            "type": "array",
            "title": "Embedding",
            "description": "L2-normalized, last-token-pooled embedding vector for the item."
          },
          "dimension": {
            "type": "integer",
            "title": "Dimension",
            "description": "Length of `embedding` (vector dimensionality)."
          },
          "usage": {
            "$ref": "#/components/schemas/_Usage",
            "description": "Token accounting for the request."
          },
          "timings": {
            "$ref": "#/components/schemas/_Timings",
            "description": "kiapi extension: server-side timing."
          }
        },
        "type": "object",
        "required": [
          "model",
          "embedding",
          "dimension",
          "usage",
          "timings"
        ],
        "title": "EmbeddingResponse"
      },
      "HTTPValidationError": {
        "properties": {
          "detail": {
            "items": {
              "$ref": "#/components/schemas/ValidationError"
            },
            "type": "array",
            "title": "Detail"
          }
        },
        "type": "object",
        "title": "HTTPValidationError"
      },
      "ValidationError": {
        "properties": {
          "loc": {
            "items": {
              "anyOf": [
                {
                  "type": "string"
                },
                {
                  "type": "integer"
                }
              ]
            },
            "type": "array",
            "title": "Location"
          },
          "msg": {
            "type": "string",
            "title": "Message"
          },
          "type": {
            "type": "string",
            "title": "Error Type"
          },
          "input": {
            "title": "Input"
          },
          "ctx": {
            "type": "object",
            "title": "Context"
          }
        },
        "type": "object",
        "required": [
          "loc",
          "msg",
          "type"
        ],
        "title": "ValidationError"
      },
      "_Timings": {
        "properties": {
          "total_s": {
            "type": "number",
            "title": "Total S",
            "description": "Wall-clock embedding time in seconds."
          }
        },
        "type": "object",
        "required": [
          "total_s"
        ],
        "title": "_Timings"
      },
      "_Usage": {
        "properties": {
          "prompt_tokens": {
            "type": "integer",
            "title": "Prompt Tokens",
            "description": "Tokens in the embedded input (0 when the model does not report it)."
          },
          "total_tokens": {
            "type": "integer",
            "title": "Total Tokens",
            "description": "Total tokens accounted for; equals `prompt_tokens` (no generation)."
          }
        },
        "type": "object",
        "required": [
          "prompt_tokens",
          "total_tokens"
        ],
        "title": "_Usage"
      }
    }
  },
  "x-kiapi-capability": "embedding",
  "x-kiapi-domain": "embedding",
  "x-kiapi-root-openapi": "/openapi.json"
}
