Skip to main content

How it works

Index once. Reuse forever.

Convert a model once, index your documents once — then reuse and recombine them across chats and agents with no prefill to redo. Here is the whole pipeline, in buyer terms.

  1. Convert & retrain

    A model is converted to a sparse, sub-quadratic architecture and retrained, with the context-length ceiling removed. Attention can then be served efficiently, hardware needs drop, and context length is no longer capped. Today this is the SmolLM3-3B model.

    For ML buyers — the credibility and mechanism story.

  2. Ingest & index — once

    Through the console, an API call (including from another application), or an integration, a user selects a document to be converted, indexed, and stored — supporting multiple versions and edits. This is a one-time cost per document.

    The “set it up once” promise.

  3. Reuse at inference

    The same model uses any stored document — or any combination — as part of a chat or agent: the console for testing, the API for production. Because step 2 is persisted, there is no prefill to re-run between uses, even days apart.

    Where cost and latency savings land.

What stays the same

Standard tokenizers and an OpenAI-compatible endpoint — point your existing stack at it and keep your tooling.

Today vs. roadmap

  • TodayOne converted model — SmolLM3-3B — via AWS Marketplace.
  • RoadmapConvert the specific model you want.
  • RoadmapConversion as a service for AI labs.

Claims are tied to what ships today.

Capability is live on SmolLM3-3B; magnitude figures are labeled illustrative until benchmarked.

Make context durable.

Index once. Reuse forever. Inside your own AWS account.

Request access