How it works
Index once. Reuse forever.
Convert a model once, index your documents once — then reuse and recombine them across chats and agents with no prefill to redo. Here is the whole pipeline, in buyer terms.
Convert & retrain
A model is converted to a sparse, sub-quadratic architecture and retrained, with the context-length ceiling removed. Attention can then be served efficiently, hardware needs drop, and context length is no longer capped. Today this is the SmolLM3-3B model.
For ML buyers — the credibility and mechanism story.
Ingest & index — once
Through the console, an API call (including from another application), or an integration, a user selects a document to be converted, indexed, and stored — supporting multiple versions and edits. This is a one-time cost per document.
The “set it up once” promise.
Reuse at inference
The same model uses any stored document — or any combination — as part of a chat or agent: the console for testing, the API for production. Because step 2 is persisted, there is no prefill to re-run between uses, even days apart.
Where cost and latency savings land.
What stays the same
Standard tokenizers and an OpenAI-compatible endpoint — point your existing stack at it and keep your tooling.
Today vs. roadmap
- TodayOne converted model — SmolLM3-3B — via AWS Marketplace.
- RoadmapConvert the specific model you want.
- RoadmapConversion as a service for AI labs.
Claims are tied to what ships today.
Capability is live on SmolLM3-3B; magnitude figures are labeled illustrative until benchmarked.