
createCompressionHandler

createCompressionHandler implements the /api/compression endpoint expected by the compression controller. It receives the current transcript (plus pinned messages, artifacts, usage metrics) and responds with a snapshot + artifact list that the UI merges back into the store.

Use it alongside the client-side compression config to keep long-running chats inside the model’s context window while giving users transparency over what was summarised.

Import

```ts
import { createCompressionHandler } from "ai-chat-bootstrap/server";
```

Signature

```ts
function createCompressionHandler(
  options: CreateCompressionHandlerOptions
): RequestHandler;
```

Options

```ts
interface CreateCompressionHandlerOptions {
  /** Language model (or resolver) used to generate summaries. */
  model?:
    | LanguageModel
    | ((
        ctx: ModelResolverContext
      ) =>
        | LanguageModel
        | Promise<LanguageModel | null | undefined>
        | null
        | undefined);
  /** Custom system prompt (defaults to a built-in compression prompt). */
  systemPrompt?: string;
  /** Legacy alias (deprecated) for `maxRecentMessages`. */
  minRecentMessages?: number;
  /** How many of the most recent turns should always survive. Defaults to 4. */
  maxRecentMessages?: number;
  /** Maximum number of artifacts to return per run. Defaults to 3. */
  maxArtifacts?: number;
  /** Static options passed into `generateObject`. */
  generateOptions?: GenerateObjectOptions;
  /** Build `generateObject` options dynamically per request. */
  buildGenerateOptions?: (ctx: {
    req: Request;
    body: CompressionServiceRequest;
  }) => Promise<GenerateObjectOptions> | GenerateObjectOptions;
  /** Optional logger invoked on handler errors. */
  onError?: (
    error: unknown,
    ctx: { req: Request; body?: CompressionServiceRequest }
  ) => void;
}
```

If you pass a function as `model`, it receives `{ req, body, requestedModel }` and should return a `LanguageModel` instance, or `null` to fall back to the default.
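For example, a resolver can allow-list client-requested model ids and fall back to the default otherwise. The sketch below uses simplified local types; `ResolverCtx`, `ALLOWED`, and `pickModelId` are illustrative names, not part of the library, and a real resolver would return a `LanguageModel` (e.g. `openai(id)`) rather than a bare id:

```typescript
// Simplified stand-in for the library's ModelResolverContext.
type ResolverCtx = { requestedModel?: string | null };

// Only these client-requested model ids are honoured.
const ALLOWED = new Set(["gpt-4o-mini", "gpt-4o"]);

function pickModelId(ctx: ResolverCtx): string | null {
  // Returning null tells the handler to use its default model.
  if (ctx.requestedModel && ALLOWED.has(ctx.requestedModel)) {
    return ctx.requestedModel;
  }
  return null;
}
```

In the real options object you would wrap this: `model: (ctx) => { const id = pickModelId(ctx); return id ? openai(id) : null; }`.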

Basic Usage

```ts
// app/api/compression/route.ts
import { createCompressionHandler } from "ai-chat-bootstrap/server";
import { openai } from "@ai-sdk/openai";

export const POST = createCompressionHandler({
  model: openai("gpt-4o-mini"),
  maxRecentMessages: 6,
  maxArtifacts: 4,
});
```

Dynamic model selection

```ts
const handler = createCompressionHandler({
  async model({ body }) {
    // Prefer the model requested by the client, otherwise default.
    if (body.config?.model) {
      return openai(body.config.model);
    }
    return openai("gpt-4o-mini");
  },
  buildGenerateOptions: async ({ body }) => ({
    response_format: {
      type: "json_schema",
    },
    temperature: body.reason === "manual" ? 0.1 : 0.25,
  }),
});

export const POST = handler;
```

Request Payload

The handler receives the same payload produced by the client-side compression controller:

```ts
interface CompressionServiceRequest {
  messages: UIMessage[];
  pinnedMessages: CompressionPinnedMessage[];
  artifacts: CompressionArtifact[];
  snapshot: CompressionSnapshot | null;
  usage: CompressionUsage;
  config: {
    maxTokenBudget: number | null;
    compressionThreshold: number;
    pinnedMessageLimit: number | null;
    model?: string | null;
  };
  metadata?: CompressionModelMetadata | null;
  reason: "manual" | "threshold" | "over-budget";
}
```
  • messages are the surviving transcript turns the client will continue sending to the chat model.
  • pinnedMessages must remain verbatim after compression.
  • artifacts includes previously generated summaries (can be edited or deleted by the user).
  • snapshot is the last successful compression run (if any).
  • usage describes current token counts, over-budget status, and timestamps.
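As an illustration of how a handler might use this payload, the sketch below splits the transcript so the newest `maxRecent` turns survive verbatim while older turns become summarisation input. The split rule and the `splitForCompression` helper are assumptions for illustration; the built-in handler's exact heuristics (e.g. how it treats pinned messages) may differ:

```typescript
// Minimal message shape; the real UIMessage type carries more fields.
type Msg = { id: string; role: string };

// Keep the newest `maxRecent` turns untouched (mirroring the
// maxRecentMessages option); everything older is summarised.
function splitForCompression(messages: Msg[], maxRecent = 4) {
  const cut = Math.max(0, messages.length - maxRecent);
  return {
    toSummarise: messages.slice(0, cut),
    survivors: messages.slice(cut),
  };
}
```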

Response Format

Return a JSON object describing the updated compression state:

```ts
interface CompressionServiceResponse {
  snapshot: CompressionSnapshot;
  artifacts: CompressionArtifact[];
  usage?: CompressionUsage;
  pinnedMessages?: CompressionPinnedMessage[];
}
```
  • snapshot summarises the run (timestamp, reason, survivor ids, tokens saved, artifact ids).
  • artifacts is the list of summaries to merge into the UI.
  • usage optionally overrides the computed token usage (if you perform your own counting on the server).
  • pinnedMessages lets the server enforce pin changes (for example auto-pinning system messages).

The helper automatically sets Content-Type: application/json and returns a 200 response. Throwing an error (or returning a Response) short-circuits the default behaviour.
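If you ever hand-roll the endpoint instead of using the helper, the JSON behaviour described above amounts to a few lines over the standard `Response` API. This is only a sketch of the response contract; the real handler also validates input and builds the snapshot via `generateObject`:

```typescript
// Serialise a CompressionServiceResponse-shaped body and set the
// JSON content type, as createCompressionHandler does on success.
function toJsonResponse(body: unknown, status = 200): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}
```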

Error Handling

  • Throwing inside the handler returns a 500 with { error: string }. onError is invoked with the error and request context, so you can log or report the failure before that response is sent.
  • Provider errors from generateObject (rate limits, auth failures) propagate through the promise chain; capture them in onError for telemetry.
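A minimal onError implementation, matching the signature shown in the options above, might just record a line per failure (here into an in-memory array; in production you would forward to your logging or telemetry stack):

```typescript
// Context shape follows the onError option; body is left untyped here.
type ErrorCtx = { req: Request; body?: unknown };

const seenErrors: string[] = [];

function logCompressionError(error: unknown, ctx: ErrorCtx): void {
  const message = error instanceof Error ? error.message : String(error);
  seenErrors.push(`${ctx.req.method} ${ctx.req.url}: ${message}`);
}
```

Pass it as `onError: logCompressionError` in the handler options.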

Frontend Integration

Enable compression on the client and point to the endpoint:

```tsx
<ChatContainer
  transport={{ api: "/api/chat" }}
  compression={{ enabled: true, api: "/api/compression" }}
/>
```

With compression enabled, the prompt toolbar displays token usage, a compression review sheet, and pinned message indicators. See the Compression guide for end-to-end configuration tips.
