The JSON layer between your LLM and production.

FORMA compiles your schema into a live decoding constraint that sits between any LLM and your application. Invalid tokens are dropped before they're emitted — no retries, no parser crashes.

Get early access to the private beta.

stream/gpt-5-mini/flight_booking.schema.json
12.4ms/tok

Live decode state

clean

Streaming valid tokens

The model is still inside the schema and every emitted token is safe to stream.

blocked: "return": "2026-05-19"  →  emitted: "return": null
100% schema-valid
§ 01[the problem]

Your model returns “valid” JSON. Your pipeline still breaks.

Native “Strict Mode” validates types after the model has already chosen them. By the time you see the error, you’ve already paid for the tokens — and your user is staring at a spinner.

+312%
avg cost on retry

Retry loops eat your budget

Validation-fail-retry cycles silently triple your token spend and add 800–2400 ms of latency per request.

1 / 47
calls fail to parse

Parsers break in production

Trailing commas, unescaped quotes, truncated arrays. Strict Mode catches types — not the syntax that actually breaks JSON.parse().
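Each of those failure modes throws at parse time. A quick illustration; the strings are hypothetical examples of malformed model output, not real traffic:

```typescript
// Three syntax errors that type-level validation doesn't prevent;
// every one of them makes JSON.parse() throw.
const broken: string[] = [
  '{"city": "Tokyo",}',          // trailing comma
  '{"note": "she said "hi""}',   // unescaped inner quotes
  '{"legs": ["NRT", "HND"',      // truncated array
];

const failures = broken.filter((s) => {
  try {
    JSON.parse(s);
    return false;
  } catch {
    return true;
  }
});
// all three fail to parse
```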

Silent
no error thrown

Semantic hallucinations slip through

The model returns a one-way ticket with a return date. Schema-valid. Logic-broken. Your downstream system corrupts silently.

§ 02[how it works]

Constraints at the decoding layer.

FORMA isn’t a validator. It’s a real-time mask over the model’s logit distribution. Invalid tokens are eliminated before they’re ever sampled — which means generation can never produce an unparseable, illogical, or off-schema output.

request path
your app → forma → openai · anthropic · groq
tokens stream back → validated mid-flight → emitted to your client
  1. Compile schema

    Your JSON schema is parsed into a deterministic finite automaton plus a cross-field constraint graph.

    >POST /v1/schema/compile
  2. Intercept the stream

    Every token from the LLM provider passes through FORMA before it is emitted to your client.

    >stream → forma.gate(tok)
  3. Mask the distribution

    Invalid tokens are dropped from the model's output distribution. The next valid token is selected — not retried.

    >logits[invalid] = -inf
  4. Emit guaranteed JSON

    Your application receives a stream that is provably schema-compliant. JSON.parse() never throws. Downstream never corrupts.

    >✓ schema_valid: true
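Step 3 is the core trick. A minimal sketch of logit masking; the vocabulary, logits, and `isValid` predicate here are illustrative stand-ins for FORMA's compiled DFA, not its real API:

```typescript
type Token = { id: number; text: string };

// Invalid tokens get -Infinity, so they can never be selected,
// no matter how strongly the model prefers them.
function maskAndPick(
  logits: number[],
  vocab: Token[],
  isValid: (t: Token) => boolean
): Token {
  const masked = logits.map((l, i) => (isValid(vocab[i]) ? l : -Infinity));
  let best = 0;
  for (let i = 1; i < masked.length; i++) {
    if (masked[i] > masked[best]) best = i;
  }
  return vocab[best];
}

// After `"type": "one_way", "return":` the schema only allows `null`.
const vocab: Token[] = [
  { id: 0, text: '"2026-05-19"' }, // higher logit, but schema-invalid
  { id: 1, text: "null" },
];
const picked = maskAndPick([2.1, 0.4], vocab, (t) => t.text === "null");
```

With real sampling you would renormalize and sample from the surviving logits; greedy argmax keeps the sketch short.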
§ 03[built for production]

Everything Strict Mode forgot.

Six things we built because every serious LLM team ends up rebuilding them badly on their own.

cross-field constraints

If A implies not B, FORMA will never let the model write B.

Define semantic rules in JSON Schema, JSONLogic, or plain TypeScript predicates. FORMA enforces them at the token level, across nested objects and arrays.

rules / flight_booking.ts ● compiled
// reject tokens that would violate this
forbid(trip => trip.type === "one_way" && trip.return)
require(trip => trip.depart < trip.return)
enum(trip.cabin, ["economy", "business"])
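The same rules can be read as plain TypeScript predicates over the partially decoded object. A sketch with illustrative types; `Trip` and `violates` are assumptions for this example, not FORMA's exported API:

```typescript
type Trip = {
  type?: "one_way" | "round_trip";
  depart?: string;
  return?: string | null;
  cabin?: string;
};

// Each rule returns true while the (partial) trip is still acceptable.
const rules: Array<(t: Trip) => boolean> = [
  // forbid: a one-way trip never carries a return date
  (t) => !(t.type === "one_way" && t.return),
  // require: depart strictly before return, once both exist
  (t) => !(t.depart && t.return) || t.depart! < t.return!,
  // enum: cabin limited to two values
  (t) => t.cabin === undefined || ["economy", "business"].includes(t.cabin),
];

const violates = (t: Trip) => rules.some((r) => !r(t));
```

During decoding, any token that would flip `violates` to true is masked, so a one-way booking can never acquire a return date in the first place.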
~12ms

Per-token overhead

Compiled DFAs, zero allocations on the hot path. Streaming-first.
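What "compiled DFA" means in miniature: validation reduces to one table lookup per step. This toy automaton (states invented for illustration) accepts only the JSON literals `true` and `null`:

```typescript
const ACCEPT = 99;

// Transition table: state -> character -> next state.
// A missing entry means rejection; scanning is pure lookups.
const next: Record<number, Record<string, number>> = {
  0: { t: 1, n: 5 },
  1: { r: 2 }, 2: { u: 3 }, 3: { e: ACCEPT },
  5: { u: 6 }, 6: { l: 7 }, 7: { l: ACCEPT },
};

function accepts(input: string): boolean {
  let state = 0;
  for (const ch of input) {
    const s = next[state]?.[ch];
    if (s === undefined) return false;
    state = s;
  }
  return state === ACCEPT;
}
```

A production version would run over tokenizer tokens rather than characters; the lookup structure is the same.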

Drop-in OpenAI compatible

Change one URL. Keep your SDK, prompts, and tools.

base_url = "https://api.example.com/v1"

Every major provider

OpenAI, Anthropic, Groq, Bedrock, Vertex, xAI, Together, Fireworks. One contract.

Self-host or cloud

Single binary under 30 MB. Your tokens never leave your VPC.

Zero data retention

Schemas and prompts are never stored. Audit logs stay yours.

§ 04[forma vs strict mode]

Type-checking isn’t the same as truth-checking.

Capability                                      OpenAI Strict Mode      FORMA
                                                (post-hoc validation)   (token-level decoding)

Validates basic JSON types                      ✓                       ✓
Catches errors before tokens are emitted        ✗                       ✓
Enforces cross-field logical constraints        ✗                       ✓
Works across OpenAI, Anthropic, Groq, Bedrock   ✗                       ✓
True streaming (no buffering to end)            partial                 ✓
Eliminates retry loops                          ✗                       ✓
Self-hostable inside your VPC                   ✗                       ✓
Zero downstream parser errors                   ✗                       ✓
§ 05[30-second integration]

One URL.
Zero refactor.

FORMA speaks the OpenAI wire protocol. Keep your SDK, your prompts, your tool calls. Point to a different host and pass your schema.

  • Works with the official OpenAI, Anthropic and Vercel AI SDKs
  • Streaming, tool calls, structured outputs — all preserved
  • Schema can live in your repo, in S3, or be passed inline
import OpenAI from "openai"
import { schema } from "./flight_booking.schema"

const openai = new OpenAI({
  // 1. swap the base URL — that's it.
  baseURL: "https://api.example.com/v1",
  apiKey: process.env.OPENAI_API_KEY,
})

const stream = await openai.chat.completions.create({
  model: "gpt-5-mini",
  stream: true,
  messages: [{ role: "user", content: "Book me a one-way to Tokyo" }],
  // 2. pass your schema. forma compiles it into a token-level mask.
  forma: { schema, mode: "strict+semantic" },
})

// 3. every token you receive is provably schema-valid.
for await (const chunk of stream) process.stdout.write(chunk.delta)
§ 07[book a demo]

See FORMA on your schema.

Bring a real schema and a real pain point. We’ll wire FORMA into your stack live and show you the broken outputs that would have hit production today.

30-min walkthrough

A real integration,
in a single call.

  • Working integration in your repo
    Under 30 minutes, on a shared screen
  • Demo on your worst-behaving prompt
    We’ll reproduce a failure, then fix it
  • Cost & latency model for your traffic
    Concrete numbers, not marketing math
  • Founding engineer, not a BDR
    The people who wrote the decoding kernel

Hosted by the engineers who built the decoding kernel.

We usually reply in < 4 hrs

§ 08[faq]

Questions,
answered.

Still curious? Reach out — a real engineer replies.