How it works

Index once. Reuse forever.

Convert a model once, index your documents once — then reuse and recombine them across chats and agents with no prefill to redo. Here is the whole pipeline, in buyer terms.

Convert & retrain
A model is converted to a sparse, sub-quadratic architecture and retrained, with the context-length ceiling removed. Attention can then be served efficiently, hardware needs drop, and context length is no longer capped. Today this is the SmolLM3-3B model.
For ML buyers — the credibility and mechanism story.
Ingest & index — once
Through the console, an API call (including from another application), or an integration, a user selects a document to be converted, indexed, and stored — supporting multiple versions and edits. This is a one-time cost per document.
The “set it up once” promise.
Reuse at inference
The same model uses any stored document — or any combination — as part of a chat or agent: the console for testing, the API for production. Because step 2 is persisted, there is no prefill to re-run between uses, even days apart.
Where cost and latency savings land.

What stays the same

Standard tokenizers and an OpenAI-compatible endpoint — point your existing stack at it and keep your tooling.

Today vs. roadmap

TodayOne converted model — SmolLM3-3B — via AWS Marketplace.
RoadmapConvert the specific model you want.
RoadmapConversion as a service for AI labs.

Claims are tied to what ships today.

Capability is live on SmolLM3-3B; magnitude figures are labeled illustrative until benchmarked.

Make context durable.

Index once. Reuse forever. Inside your own AWS account.

Request access

Index once. Reuse forever.

Convert & retrain

Ingest & index — once

Reuse at inference

What stays the same

Today vs. roadmap

Make context durable.