Spongram

Second memory for Claude Code & Claude Desktop

The persistent brain that never leaves your GPUs.

Spongram gives Claude a long-term, multi-tenant memory, 100% on-prem, plugged into your own inference. No cloud, no leak, no lock-in.

Your memory, your infra, your AI.

Why Spongram

100% on-prem, data and inference

Postgres + Neo4j + Graphiti + LLM all self-hosted. No episode token ever leaves your network. Runs on Docker Swarm or plain compose.

Multi-tenant by design

Strict per-tenant isolation via server-injected `group_id` with anti-tampering. One instance, N brains, zero cross-leak.

One-command Claude Code install

Per-tenant `.plugin` bundle generated on the fly: marketplace, SKILL.md, SessionStart hook. `claude plugin install spongram` and memory is online.

Bi-temporal by construction

Every fact carries `valid_at` / `invalid_at`. When you correct an entry, the old one is superseded — the history stays auditable.

Pluggable inference provider

Ollama, LM Studio, local vLLM, OpenAI, Anthropic, Mistral, Groq, or any OpenAI-compat endpoint. Hot-swap from the admin, zero restart, zero lock-in. One-click 3-layer validation (chat / tool_calls / strict json_schema).

Code map included

Deterministic AST extraction (tree-sitter) of your repos → a multi-tenant structural graph queryable through 5 MCP tools. Measured in a real bench: −68% cost on architecture questions, ~2× faster than grep/read.

The code map: fewer tokens, better answers

Spongram maps your repos via AST (tree-sitter) and exposes the graph to Claude Code through 5 MCP tools. Everything below is measured in real claude -p sessions, on the Spongram repo itself — protocol and runner ship with the product (docs/BENCH_CODEMAP_2026-06-10.md).

Code-navigation questionWithout Spongram (grep/read)With Spongram (code map)Δ cost
Most central files / symbols $0.329 — subagent + 29 greps, estimated ranking $0.105 — 1 god_nodes call, exact graph degree −68%
Package overview $0.291 $0.178 −39%
Who calls this function? $0.086 $0.106 — prod + test callers in 1 call +23%*
Module contents (classes, methods) $0.128 $0.181 +41%*
Dependency path A → B $0.107 $0.129 +20%*
Total (6 questions) $1.05 $0.83 −21%
Cumulative response time 293 s 151 s −48%

* For point lookups grep stays cheaper — and that is exactly what the shipped SKILL tells the agent: every tool where it belongs. A/B bench in real claude -p sessions (Sonnet), 1 question per session, read-only tools, 276-file repo; total between −12% and −21% across runs.

Deterministic AST extraction, zero LLM

Client-side tree-sitter (~30 languages): 276 files → 2,143 nodes / 4,509 edges in ~14 s. No indexing cost, reproducible to the commit.

5 dedicated MCP tools

code_map_query, neighbors, god_nodes, shortest_path, stats — including AST-inferred qualified module.attr() calls that grep often misses.

Mini-map at SessionStart

~110 tokens injected at session open: repo size, directories, central files. The agent is oriented before its first tool call.

Memory and code linked

A memory that mentions a file or symbol automatically surfaces its code nodes in search_nodes. The map stays fresh via the git post-commit hook.

In both editions

3D “Code map” tab; portable graph on Neo4j (cloud) / embedded FalkorDB (desktop), tenant-isolated.

What Spongram does today

Every capability listed here is verifiable in the shipped codebase.

On-prem self-host
docker-compose or Swarm + Portainer stack. Zero external cloud dependency.
Pluggable inference & hot-swap
Admin Settings page + 9-preset dropdown + 3-layer probe (chat / tool_calls / strict json_schema). Provider change takes effect on the next request, no container restart.
One-command Claude Code plugin
Per-tenant .plugin bundle: marketplace, SKILL.md, SessionStart hook.
Native Desktop edition (macOS)
Developer ID-signed Tauri app: frozen Python sidecar (PyInstaller), embedded FalkorDB, 100% offline. Memory, Code map, 3D Cortex, Wiki, Search, Pulse, Settings — no Docker, no external database. Windows and Linux in preparation.
Built-in Lemon Squeezy licensing (desktop)
In-app license activation, local validation, lifetime per-seat license.
Code map (structural code graph)
Client-side AST extraction via tree-sitter (~30 languages): files, classes, functions, imports, calls — including AST-inferred qualified module.attr() calls. 276 files → 2,143 nodes / 4,509 edges in ~14 s on the Spongram repo itself. Portable graph on Neo4j (cloud) and FalkorDB (desktop), tenant-isolated, kept fresh by a git post-commit hook.
5 code_map_* MCP tools + dedicated 3D tab
code_map_query / neighbors / god_nodes / shortest_path / stats. Memories that mention a file or symbol automatically surface their linked code nodes in search_nodes. A mini-map (~110 tokens) is injected at SessionStart: repo size, directories, central files — the agent is oriented before its first tool call.
Strict multi-tenant
Server-injected group_id, anti-tampering rejection. One instance, N brains, zero cross-leakage.
Bi-temporal facts
valid_at / invalid_at inherited from Graphiti — automatic supersession on contradiction.
Proactive memory capture
Agents (Claude Code, Claude Desktop) call add_memory on their own whenever a fact, preference or decision is stated — no need to ask.
Cross-project, cross-client "global" memory
A fact tagged project=global (stated in any project, or from Claude Desktop) is recalled across ALL your projects: one brain shared across every codebase and client. Project-specific facts stay scoped per project; search keeps the current project + global by default and drops other-project noise.
Authority validation at ingest (constitutional layer)
Optional provenance gate at the add_memory boundary: an episode must carry the expected tags (project=, and client= from a configured allow-list) before it enters the brain, otherwise it's rejected before even hitting the durable queue. Deterministic, dependency-free, flag-activated. Answers the classic critique « stores whatever it's given with no authority validation on ingested entities ».
Durable episode queue
Every add_memory is persisted to Postgres before reaching Graphiti. Zero loss on container restart, automatic retry on transient error (writes via the queue, reads via bounded retry/backoff on the Graphiti path). Oversized episodes (>32k tokens) are auto-chunked before extraction so they're never rejected by the LLM.
Sleep-time compute (Sleeper)
Background worker that synthesises Summary nodes and Wiki pages, with bounded concurrency (several entities in parallel, cap via SPONGRAM_SLEEPER_CONCURRENCY). Idempotent: only re-runs entities whose episode set actually changed (sha256 hash).
Dream cycle (periodic consolidation)
Duplicate-candidate detection via cosine similarity, time decay on edges (Ebbinghaus curve exp(-Δt/τ)), community propagation (LPA), boost on multi-source corroboration.
Non-destructive de-duplication + co-evolution (A-MEM)
The Dream links same-normalised-name entities via a :SAME_AS edge — no merge, zero data loss: recall and UI treat them as one. Aliases co-evolve (A-MEM pattern, NeurIPS 2025): when one carries a fresher summary, the other is re-synthesised on the next tick instead of drifting apart.
Importance score at ingest
Every entity gets a 1-10 score derived from source count and connection density (log-based formula). Acts as a 3rd retrieval axis (cosine + recency + importance) via reciprocal rank fusion.
Multi-hop search with Personalized PageRank
HippoRAG-style Personalized PageRank over the 2-hop subgraph of top-K cosine seeds, reciprocal-rank fused. Observable (Prometheus rerank / fallback-by-reason / latency metrics), subgraph timeout widened so the rerank actually runs instead of silently falling back. Flag-activated, ready for measured A/B.
Hot cache on the search path
LRU + TTL on top of search_nodes / search_memory_facts, scoped per tenant. Repeated recall served in ~2 ms (≈100× faster, measured locally), immediate write-invalidation (including via the ingestion queue), hit-rate exposed in Prometheus. Flag-activated (Memory³ pattern).
Procedural memory (skills)
Separate :Skill namespace, isolated from facts. MCP tools record_skill / recall_skill — recall scored name > intent > body, strict multi-tenant isolation.
Failure reflection
reflect_on_failure MCP tool: verbal capture of a failure (intent + lesson) transcoded into a failure_pattern-tagged episode with boosted importance. Reflexion pattern (Shinn et al. 2023), no weight updates.
Karpathy-style block-by-block Wiki
Deterministic Markdown page per entity: header, Sleeper summary, sources, [[wiki]] links. Only the summary comes from the LLM, the rest is rendered by pure Python — versioned, exportable as Obsidian ZIP.
Communities and themes
Label Propagation Algorithm in pure Cypher on the Neo4j graph (no GDS plugin), /v1/wiki/themes.json endpoint. Each community gets a deterministic label (key members by importance, no LLM so no hallucination), generated by the Dream pass after detection, then rolled up into a tenant-level digest (a hierarchical abstraction above communities). The Wiki and admin render a named per-tenant « Themes » section.
Bundled 3D Cortex UI
Graph explorer rendered with 3d-force-graph (no bundler pipeline). SSE live updates, per-project filters, point-and-click entity deletion, server-side pagination to explore large graphs in windows.
Confidence scoring on facts
Every relationship carries a confidence score that climbs on multi-source corroboration. Threshold-based filtering exposed via /v1/graph/data.
Granular forgetting (3 levels)
Entity (one-click in the 3D Cortex), specific fact (DELETE /forget/fact/{uuid}, GDPR-grade with audit log), whole instance (DELETE that automatically purges Neo4j and Postgres).
Full admin web
Inference settings, instances, per-tenant activity, 90-day usage ledger, bundle downloads, token and marketplace management. SPONGRAM_ADMIN_TOKEN-gated.
Production hardening
Multi-component /healthz (Postgres, Neo4j, Graphiti, 3 workers), /metrics Prometheus, circuit breakers on LLM and embedder, Postgres ↔ Neo4j audit, idempotency-checked migrations, /orphans/{gid}/purge recovery endpoint.
Reproducible recall-quality bench
pytest tests/bench/ suite that replays a LongMemEval subset (knowledge_update) with LLM-as-judge. The bench waits for the real queue drain AND the extraction plateau before measuring (no more truncated baseline). Reports accuracy / latency p50-p95 / tokens, JSON-frozen baseline, ON / OFF comparison of retrieval features.

Other products in this space — compare for yourself:

Choose your inference

Spongram speaks OpenAI-compat. Nine presets plus a Custom mode for any endpoint. Hot-swap from the admin, 3-layer validation built in.

hosted
SPT Models
Sponge Theory hosted stack (default)
local
Ollama
Local workstation, port 11434
local
LM Studio
Desktop Mac/Win, port 1234
local
vLLM
Local NVIDIA GPU server, port 8000
local
llama.cpp server
Lightweight, GGUF, port 8080
cloud
OpenAI
Cloud, gpt-4.1 family
cloud
Anthropic Claude
Cloud via LiteLLM adapter
cloud
Mistral
Cloud, mistral-large
cloud
Groq
Cloud, llama-3.3 sub-second

All driven by the same admin Settings page. “Test connection” button → 3-layer probe (chat / tool_calls / strict json_schema) → you know in 3 seconds whether your provider is Graphiti-compatible.

Spongram in 90 seconds

Walkthrough: create a brain, install the Claude Code plugin, push your first memory, do a bi-temporal lookup, open the 3D Cortex.

Architecture in one slide

A single FastAPI instance serves the MCP, the admin SPA, the 3D Cortex explorer and this landing. Everything else is Postgres + Neo4j, side by side.

Claude Code Claude Desktop MCP client Spongram auth + group_id /mcp /admin/api /v1/graph (cortex) SPT Models LLM (Qwen/Gemma) jina-embeddings-v5 Graphiti bi-temporal extraction Postgres instances episodes Neo4j nodes edges Sleeper background Summary + Wiki
  • Client (Claude Code, Claude Desktop) → MCP HTTP → Spongram (Bearer spt_brain_* auth)
  • Spongram injects `group_id`, routes to Graphiti, rejects any tampering attempt
  • Graphiti writes to Neo4j (graph) and Postgres (episodes/instances), calls SPT Models for LLM + embeddings
  • Sleeper runs a background loop that generates Summary nodes and deterministic Wiki pages

Two editions, one brain

Direct purchase via Lemon Squeezy. Same memory engine, same code map — you choose where it runs.

Spongram Cloud

Self-hosted multi-tenant, for teams and GPU infra

  • Full Docker stack (Postgres + Neo4j + Graphiti) on YOUR infra — plain compose or Swarm
  • Strict multi-tenant: one instance, N brains isolated by anti-tampering group_id
  • Complete web admin: instances, per-tenant Claude Code bundles, 90-day usage, Prometheus
  • Pluggable inference provider: 9 OpenAI-compat presets, hot-swap without restart
Buy the Cloud edition →

Self-host license, source included, deployment documentation.

Spongram Desktop

Native app, 100% offline, zero Docker

macOS Windows · coming soonLinux · coming soon
  • Signed native macOS app (Apple Silicon + Intel), frozen Python sidecar embedded
  • Embedded FalkorDB — no external database, your data stays in your user session
  • Memory, Code map, 3D Cortex, Wiki, Search, Pulse — 7 native tabs
  • Local inference of your choice: Ollama, LM Studio… your machine, your data, period
Buy the Desktop edition →

Lifetime per-seat license, Lemon Squeezy activation built into the app.

Install, short version

On purchase you receive the Docker images and the env template. Three steps, 10 minutes on any box with Docker.

  1. 01

    Configure the environment

    cp .env.example .env
    # set SPONGRAM_ADMIN_TOKEN + SPONGRAM_INFERENCE_API_KEY
    # point SPONGRAM_INFERENCE_BASE_URL to your OpenAI-compat endpoint
  2. 02

    Boot the stack

    docker compose up -d
    # Postgres + Neo4j + Spongram come up;
    # SQL migrations applied automatically
  3. 03

    Create a brain, then wire up Claude Code

    # Admin console: http://localhost:8091/admin/
    # create an instance, copy the spt_brain_ key (shown once)
    claude marketplace add https://your-host.tld/admin/api/spongram/marketplace.json
    claude plugin install spongram

For prod: Docker Swarm behind Traefik, deploy via Portainer. Full documentation shipped with purchase.

Frequently asked

What exactly is Spongram?

A multi-tenant long-term memory for Claude agents. Spongram packages Graphiti, its client bundles, its admin and its inference into one coherent product that you deploy yourself.

Why not just use an open-source alternative?

You can. If your primary constraint is strict multi-tenant + 100% on-prem + one-click Claude Code integration with a production-ready deliverable, Spongram ticks all three in V1. If not, pick the product that covers your actual need.

Which LLM do you depend on?

None specifically — Spongram speaks OpenAI-compat. Pick your provider in the admin (9-preset dropdown: SPT Models, Ollama, LM Studio, local vLLM, OpenAI, Anthropic via adapter, Mistral, Groq, or Custom). Hot-swap, no restart. A 3-layer probe validates that your provider supports tool_calls + strict json_schema before you save.

What does the code map change for my Claude Code token bill?

A lot on structural questions: "what are the most central files" costs −68% (one code_map_god_nodes call replaces a subagent + 29 greps, and is the only exact answer), a package overview −39%. Across a 6-question real bench: ≈ −17% total cost and ~2× faster. For a bare "where is X defined", grep stays cheaper — and that is what the SKILL tells the agent. Numbers and protocol ship with the product, reproducible.

How much does it cost?

Two editions, direct purchase via Lemon Squeezy: Spongram Desktop (lifetime per-seat license — macOS available, Windows/Linux in preparation) and Spongram Cloud (self-host license, source included). For assisted, multi-site or volume deployments: contact@sponge-theory.ai.