The JSON layer between your LLM and production.

FORMA compiles your schema into a live decoding constraint that sits between any LLM and your application. Invalid tokens are dropped before they're emitted — no retries, no parser crashes.

Get early access to the private beta.

stream/gpt-5-mini/flight_booking.schema.json
12.4ms/tok

Live decode state

clean

Streaming valid tokens

The model is still inside the schema and every emitted token is safe to stream.

blocked: "return": "2026-05-19"  →  emitted: "return": null
100% schema-valid
§ 01[the problem]

Your model returns “valid” JSON. Your pipeline still breaks.

Native “Strict Mode” validates types after the model has already chosen them. By the time you see the error, you’ve already paid for the tokens — and your user is staring at a spinner.

+312%
avg cost on retry

Retry loops eat your budget

Validation-fail-retry cycles silently triple your token spend and add 800–2400 ms of latency per request.

1 / 47
calls fail to parse

Parsers break in production

Trailing commas, unescaped quotes, truncated arrays. Strict Mode catches types — not the syntax that actually breaks JSON.parse().
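Each of those failure modes throws at parse time. A quick illustration; the strings are hypothetical examples of malformed model output, not real traffic:

```typescript
// Three syntax errors that type-level validation doesn't prevent;
// every one of them makes JSON.parse() throw.
const broken: string[] = [
  '{"city": "Tokyo",}',          // trailing comma
  '{"note": "she said "hi""}',   // unescaped inner quotes
  '{"legs": ["NRT", "HND"',      // truncated array
];

const failures = broken.filter((s) => {
  try {
    JSON.parse(s);
    return false;
  } catch {
    return true;
  }
});
// all three fail to parse
```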

Silent
no error thrown

Semantic hallucinations slip through

The model returns a one-way ticket with a return date. Schema-valid. Logic-broken. Your downstream system corrupts silently.

§ 02[how it works]

Constraints at the decoding layer.

FORMA isn’t a validator. It’s a real-time mask over the model’s logit distribution. Invalid tokens are eliminated before they’re ever sampled — which means generation can never produce an unparseable, illogical, or off-schema output.

request path
your app → forma → openai · anthropic · groq
tokens stream back → validated mid-flight → emitted to your client
  1. Compile schema

    Your JSON schema is parsed into a deterministic finite automaton plus a cross-field constraint graph.

    >POST /v1/schema/compile
  2. Intercept the stream

    Every token from the LLM provider passes through FORMA before it is emitted to your client.

    >stream → forma.gate(tok)
  3. Mask the distribution

    Invalid tokens are dropped from the model's output distribution. The next valid token is selected — not retried.

    >logits[invalid] = -inf
  4. Emit guaranteed JSON

    Your application receives a stream that is provably schema-compliant. JSON.parse() never throws. Downstream never corrupts.

    >✓ schema_valid: true
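Step 3 is the core trick. A minimal sketch of logit masking; the vocabulary, logits, and `isValid` predicate here are illustrative stand-ins for FORMA's compiled DFA, not its real API:

```typescript
type Token = { id: number; text: string };

// Invalid tokens get -Infinity, so they can never be selected,
// no matter how strongly the model prefers them.
function maskAndPick(
  logits: number[],
  vocab: Token[],
  isValid: (t: Token) => boolean
): Token {
  const masked = logits.map((l, i) => (isValid(vocab[i]) ? l : -Infinity));
  let best = 0;
  for (let i = 1; i < masked.length; i++) {
    if (masked[i] > masked[best]) best = i;
  }
  return vocab[best];
}

// After `"type": "one_way", "return":` the schema only allows `null`.
const vocab: Token[] = [
  { id: 0, text: '"2026-05-19"' }, // higher logit, but schema-invalid
  { id: 1, text: "null" },
];
const picked = maskAndPick([2.1, 0.4], vocab, (t) => t.text === "null");
```

With real sampling you would renormalize and sample from the surviving logits; greedy argmax keeps the sketch short.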
§ 03[built for production]

Everything Strict Mode forgot.

Six things we built because every serious LLM team ends up rebuilding them badly on their own.

cross-field constraints

If A implies not B, FORMA will never let the model write B.

Define semantic rules in JSON Schema, JSONLogic, or plain TypeScript predicates. FORMA enforces them at the token level, across nested objects and arrays.

rules / flight_booking.ts ● compiled
// reject tokens that would violate this
forbid(trip => trip.type === "one_way" && trip.return)
require(trip => trip.depart < trip.return)
enum(trip.cabin, ["economy", "business"])
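The same rules can be read as plain TypeScript predicates over the partially decoded object. A sketch with illustrative types; `Trip` and `violates` are assumptions for this example, not FORMA's exported API:

```typescript
type Trip = {
  type?: "one_way" | "round_trip";
  depart?: string;
  return?: string | null;
  cabin?: string;
};

// Each rule returns true while the (partial) trip is still acceptable.
const rules: Array<(t: Trip) => boolean> = [
  // forbid: a one-way trip never carries a return date
  (t) => !(t.type === "one_way" && t.return),
  // require: depart strictly before return, once both exist
  (t) => !(t.depart && t.return) || t.depart! < t.return!,
  // enum: cabin limited to two values
  (t) => t.cabin === undefined || ["economy", "business"].includes(t.cabin),
];

const violates = (t: Trip) => rules.some((r) => !r(t));
```

During decoding, any token that would flip `violates` to true is masked, so a one-way booking can never acquire a return date in the first place.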
~12ms

Per-token overhead

Compiled DFAs, zero allocations on the hot path. Streaming-first.
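What "compiled DFA" means in miniature: validation reduces to one table lookup per step. This toy automaton (states invented for illustration) accepts only the JSON literals `true` and `null`:

```typescript
const ACCEPT = 99;

// Transition table: state -> character -> next state.
// A missing entry means rejection; scanning is pure lookups.
const next: Record<number, Record<string, number>> = {
  0: { t: 1, n: 5 },
  1: { r: 2 }, 2: { u: 3 }, 3: { e: ACCEPT },
  5: { u: 6 }, 6: { l: 7 }, 7: { l: ACCEPT },
};

function accepts(input: string): boolean {
  let state = 0;
  for (const ch of input) {
    const s = next[state]?.[ch];
    if (s === undefined) return false;
    state = s;
  }
  return state === ACCEPT;
}
```

A production version would run over tokenizer tokens rather than characters; the lookup structure is the same.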

Drop-in OpenAI compatible

Change one URL. Keep your SDK, prompts, and tools.

base_url = "https://api.example.com/v1"

Every major provider

OpenAI, Anthropic, Groq, Bedrock, Vertex, xAI, Together, Fireworks. One contract.

Self-host or cloud

Single binary under 30 MB. Your tokens never leave your VPC.

Zero data retention

Schemas and prompts are never stored. Audit logs stay yours.

§ 04[forma vs strict mode]

Type-checking isn’t the same as truth-checking.

Capability                                      OpenAI Strict Mode      FORMA
                                                (post-hoc validation)   (token-level decoding)

Validates basic JSON types                      ✓                       ✓
Catches errors before tokens are emitted        ✗                       ✓
Enforces cross-field logical constraints        ✗                       ✓
Works across OpenAI, Anthropic, Groq, Bedrock   ✗                       ✓
True streaming (no buffering to end)            partial                 ✓
Eliminates retry loops                          ✗                       ✓
Self-hostable inside your VPC                   ✗                       ✓
Zero downstream parser errors                   ✗                       ✓
§ 05[30-second integration]

One URL.
Zero refactor.

FORMA speaks the OpenAI wire protocol. Keep your SDK, your prompts, your tool calls. Point to a different host and pass your schema.

  • Works with the official OpenAI, Anthropic and Vercel AI SDKs
  • Streaming, tool calls, structured outputs — all preserved
  • Schema can live in your repo, in S3, or be passed inline
import OpenAI from "openai"
import { schema } from "./flight_booking.schema"

const openai = new OpenAI({
  // 1. swap the base URL — that's it.
  baseURL: "https://api.example.com/v1",
  apiKey: process.env.OPENAI_API_KEY,
})

const stream = await openai.chat.completions.create({
  model: "gpt-5-mini",
  stream: true,
  messages: [{ role: "user", content: "Book me a one-way to Tokyo" }],
  // 2. pass your schema. forma compiles it into a token-level mask.
  forma: { schema, mode: "strict+semantic" },
})

// 3. every token you receive is provably schema-valid.
for await (const chunk of stream) process.stdout.write(chunk.delta)
§ 07[book a demo]

See FORMA on your schema.

Bring a real schema and a real pain point. We’ll wire FORMA into your stack live and show you the broken outputs that would have hit production today.

30-min walkthrough

A real integration,
in a single call.

  • Working integration in your repo
    Under 30 minutes, on a shared screen
  • Demo on your worst-behaving prompt
    We’ll reproduce a failure, then fix it
  • Cost & latency model for your traffic
    Concrete numbers, not marketing math
  • Founding engineer, not a BDR
    The people who wrote the decoding kernel

Hosted by the engineers who built the decoding kernel.

We usually reply in < 4 hrs

§ 08[faq]

Questions,
answered.

Still curious? Reach out — a real engineer replies.