Retry loops eat your budget
Validation-fail-retry cycles silently 3× your token spend and add 800–2400 ms of latency per request.
FORMA compiles your schema into a live decoding constraint that sits between any LLM and your application. Invalid tokens are dropped before they're emitted — no retries, no parser crashes.
Get early access to the private beta.
Live decode state
clean · Streaming valid tokens
The model is still inside the schema and every emitted token is safe to stream.
Native “Strict Mode” validates types after the model has already chosen them. By the time you see the error, you’ve already paid for the tokens — and your user is staring at a spinner.
Validation-fail-retry cycles silently 3× your token spend and add 800–2400 ms of latency per request.
Trailing commas, unescaped quotes, truncated arrays. Strict Mode catches types — not the syntax that actually breaks JSON.parse().
The model returns a one-way ticket with a return date. Schema-valid. Logic-broken. Your downstream system corrupts silently.
FORMA isn’t a validator. It’s a real-time mask over the model’s logit distribution. Invalid tokens are eliminated before they’re ever sampled — which means generation can never produce an unparseable, illogical, or off-schema output.
Your JSON schema is parsed into a deterministic finite automaton plus a cross-field constraint graph.
Every token from the LLM provider passes through FORMA before it is emitted to your client.
Invalid tokens are dropped from the model's output distribution. The next valid token is selected — not retried.
Your application receives a stream that is provably schema-compliant. JSON.parse() never throws. Downstream never corrupts.
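The four steps above can be sketched as a toy in TypeScript. A hand-written DFA over a tiny vocabulary stands in for the compiled schema automaton; all names here are illustrative assumptions, not the FORMA kernel or API:

```typescript
// Toy sketch of token-level masking (illustrative only, not the FORMA kernel).
// The DFA accepts exactly the token sequence: `{` `"done"` `:` (`true` | `false`) `}`.
type State = number;
const VOCAB = ["{", "}", '"done"', ":", "true", "false", "null"];

// transition[state][token] = next state; a missing entry means the token is invalid there.
const transition: Record<State, Record<string, State>> = {
  0: { "{": 1 },
  1: { '"done"': 2 },
  2: { ":": 3 },
  3: { true: 4, false: 4 },
  4: { "}": 5 },
};

// At each step, mask out logits for tokens the DFA forbids, then pick the
// best remaining token (argmax here; real decoders sample from the masked distribution).
function constrainedDecode(logitsPerStep: number[][]): string[] {
  let state: State = 0;
  const out: string[] = [];
  for (const logits of logitsPerStep) {
    let best = -1;
    for (let i = 0; i < VOCAB.length; i++) {
      if (transition[state]?.[VOCAB[i]] === undefined) continue; // masked: invalid here
      if (best === -1 || logits[i] > logits[best]) best = i;
    }
    if (best === -1) break; // no valid token (cannot happen with a well-formed automaton)
    out.push(VOCAB[best]);
    state = transition[state][VOCAB[best]];
  }
  return out;
}

// Even when the raw model strongly prefers "null", the mask forces a schema-valid token.
const steps = [
  [9, 0, 0, 0, 0, 0, 0],
  [0, 0, 1, 0, 0, 0, 9], // model "wants" null; the DFA only allows `"done"`
  [0, 0, 0, 1, 0, 0, 9],
  [0, 0, 0, 0, 2, 1, 9],
  [0, 9, 0, 0, 0, 0, 0],
];
console.log(constrainedDecode(steps).join("")); // → {"done":true}
```

The point of the sketch: the invalid token is never sampled in the first place, so there is nothing to retry and nothing for `JSON.parse()` to choke on.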
Six things we built because every LLM-serious team ends up rebuilding them badly themselves.
Define semantic rules in JSON Schema, JSONLogic, or plain TypeScript predicates. FORMA enforces them at the token level, across nested objects and arrays.
// reject tokens that would violate this
forbid(trip => trip.type === "one_way" && trip.return)
require(trip => trip.depart < trip.return)
enum(trip.cabin, ["economy", "business"])
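Here is a minimal sketch of what one such cross-field rule looks like as a plain TypeScript predicate, applied to the schema-valid-but-logic-broken booking described above. The `Trip` type and function names are assumptions for illustration, not the FORMA API:

```typescript
// Hedged sketch: a cross-field rule as a plain TypeScript predicate.
// `Trip` and `noReturnOnOneWay` are illustrative names, not part of FORMA.
interface Trip {
  type: "one_way" | "round_trip";
  depart: string;
  return?: string;
}

// A one-way trip must not carry a return date.
const noReturnOnOneWay = (trip: Trip): boolean =>
  !(trip.type === "one_way" && trip.return !== undefined);

// Passes JSON Schema type checks, fails the semantic predicate:
const broken: Trip = { type: "one_way", depart: "2025-06-01", return: "2025-06-09" };
console.log(noReturnOnOneWay(broken)); // → false
```

A post-hoc validator can only report `false` after the output exists; enforcing the same predicate at the token level means the tokens spelling out `return` on a one-way trip are never emitted.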
Compiled DFAs, zero allocations on the hot path. Streaming-first.
Change one URL. Keep your SDK, prompts, and tools.
OpenAI, Anthropic, Groq, Bedrock, Vertex, xAI, Together, Fireworks. One contract.
Single binary under 30 MB. Your tokens never leave your VPC.
Schemas and prompts are never stored. Audit logs stay yours.
FORMA speaks the OpenAI wire protocol. Keep your SDK, your prompts, your tool calls. Point to a different host and pass your schema.
import OpenAI from "openai"
import { schema } from "./flight_booking.schema"
const openai = new OpenAI({
// 1. swap the base URL — that's it.
baseURL: "https://api.example.com/v1",
apiKey: process.env.OPENAI_API_KEY,
})
const stream = await openai.chat.completions.create({
model: "gpt-5-mini",
stream: true,
messages: [{ role: "user", content: "Book me a one-way to Tokyo" }],
// 2. pass your schema. forma compiles it into a token-level mask.
forma: { schema, mode: "strict+semantic" },
})
// 3. every token you receive is provably schema-valid.
for await (const chunk of stream) process.stdout.write(chunk.delta)

Bring a real schema and a real pain point. We’ll wire FORMA into your stack live and show you the broken outputs that would have hit production today.
Hosted by the engineers who built the decoding kernel.