Second memory for Claude Code & Claude Desktop

The persistent brain
that never leaves your GPUs.

Every conversation feeds a living knowledge graph — extracted, consolidated and recalled by your own models, on your own infra. Exportable in one click. Zero lock-in, zero leak.

Buy Spongram Watch the demo — 90s

10s

extraction per imported memory

tokens to an external LLM

100%

brain exportable · spongram-brain/v1

editions · cloud & desktop

10s

Proven portability

Export cloud → import desktop: every memory is re-extracted and re-dated in ~10 seconds. spongram-brain/v1 format, portable across instances.

External leakage

Extraction, embeddings, consolidation: it all runs on your GPUs, on your own infra — never a third party.

Living Cortex

Every entity is a node in the 3D Cortex, updated live over SSE the moment a new memory lands.

FR·EN

Full coverage

Product, explorers and admin, entirely bilingual — translation parity verified key by key.

Portability

Your memory belongs to you.

The whole brain — episodes, entities, dated facts — exports and re-imports through the open spongram-brain/v1 format. No data is a prisoner of the instance that created it.

Export / import spongram-brain/v1

A single versioned JSON file holds the entire brain. Re-imported elsewhere, every episode is re-extracted and re-dated — nothing overwritten, nothing lost.

Cloud → desktop in 10 seconds

Export from the cloud admin, import into the desktop app (or the reverse): the format is identical on both sides, no migration script needed.

Built-in desktop backup

The Desktop edition ships a dedicated Backup card in settings — schedulable export, one-click restore, no external database required.

Granular forgetting, never forced

Entity, fact, or whole instance: you choose what stays. Zero forced retention, zero crippled export.

The command deck

The product as it actually runs, not a mockup.

Four real screens from the production cloud admin — dark-first, a living Cortex dome, full-screen explorers.

Spongram admin login screen, dark theme — Login — dark-first, one visual identity from desktop to cloud.

Spongram instance list in the admin — Instance fleet — status, per-tenant isolated brains, one-click bundles.

Live 3D Cortex dome in the admin — The bridge + the dome — live 3D Cortex, SSE, every thought becomes a node.

Full-screen graph explorer overlay — Full-screen explorer — graph, wiki and code map, the same visual tokens everywhere.

Spongram in 90 seconds

Create a brain, install the Claude Code plugin, add a first memory and recall it, open the live 3D Cortex, export cloud → import desktop cross-instance, full-screen explorers, and the desktop backup card — captured on the real product, not a mockup.

Why Spongram

100% on-prem, data and inference

Postgres + Neo4j + Graphiti + LLM all self-hosted. No episode token ever leaves your network. Runs on Docker Swarm or plain compose.

Multi-tenant by design

Strict per-tenant isolation via server-injected `group_id` with anti-tampering. One instance, N brains, zero cross-leak.

One-command Claude Code install

Per-tenant `.plugin` bundle generated on the fly: marketplace, SKILL.md, SessionStart hook. `claude plugin install spongram` and memory is online.

Bi-temporal by construction

Every fact carries `valid_at` / `invalid_at`. When you correct an entry, the old one is superseded — the history stays auditable.

Pluggable inference provider

Ollama, LM Studio, local vLLM, OpenAI, Anthropic, Mistral, Groq, or any OpenAI-compat endpoint. Hot-swap from the admin, zero restart, zero lock-in. One-click 3-layer validation (chat / tool_calls / strict json_schema).

Code map included

Deterministic AST extraction (tree-sitter) of your repos → a multi-tenant structural graph queryable through 5 MCP tools. Measured in a real bench: −68% cost on architecture questions, ~2× faster than grep/read.

What Spongram does today

Every capability listed here is verifiable in the shipped codebase.

Foundations

On-prem self-host

docker-compose or Swarm + Portainer stack. Zero external cloud dependency.

Pluggable inference & hot-swap

Admin Settings page + 9-preset dropdown + 3-layer probe (chat / tool_calls / strict json_schema). Provider change takes effect on the next request, no container restart.

One-command Claude Code plugin

Per-tenant .plugin bundle: marketplace, SKILL.md, SessionStart hook.

Native Desktop edition (macOS)

Developer ID-signed Tauri app: frozen Python sidecar (PyInstaller), embedded FalkorDB, 100% offline. Memory, Code map, 3D Cortex, Wiki, Search, Pulse, Settings — no Docker, no external database. Windows and Linux in preparation.

Built-in Lemon Squeezy licensing (desktop)

In-app license activation, local validation, lifetime per-seat license.

Memory & multi-tenant

Strict multi-tenant

Server-injected group_id, anti-tampering rejection. One instance, N brains, zero cross-leakage.

Bi-temporal facts

valid_at / invalid_at inherited from Graphiti — automatic supersession on contradiction.

Proactive memory capture

Agents (Claude Code, Claude Desktop) call add_memory on their own whenever a fact, preference or decision is stated — no need to ask.

Cross-project, cross-client "global" memory

A fact tagged project=global (stated in any project, or from Claude Desktop) is recalled across ALL your projects: one brain shared across every codebase and client. Project-specific facts stay scoped per project; search keeps the current project + global by default and drops other-project noise.

Authority validation at ingest (constitutional layer)

Optional provenance gate at the add_memory boundary: an episode must carry the expected tags (project=, and client= from a configured allow-list) before it enters the brain, otherwise it's rejected before even hitting the durable queue. Deterministic, dependency-free, flag-activated. Answers the classic critique « stores whatever it's given with no authority validation on ingested entities ».

Durable episode queue

Every add_memory is persisted to Postgres before reaching Graphiti. Zero loss on container restart, automatic retry on transient error (writes via the queue, reads via bounded retry/backoff on the Graphiti path). Oversized episodes (>32k tokens) are auto-chunked before extraction so they're never rejected by the LLM.

Confidence scoring on facts

Every relationship carries a confidence score that climbs on multi-source corroboration. Threshold-based filtering exposed via /v1/graph/data.

Granular forgetting (3 levels)

Entity (one-click in the 3D Cortex), specific fact (DELETE /forget/fact/{uuid}, GDPR-grade with audit log), whole instance (DELETE that automatically purges Neo4j and Postgres).

Code map

Code map (structural code graph)

Client-side AST extraction via tree-sitter (~30 languages): files, classes, functions, imports, calls — including AST-inferred qualified module.attr() calls. 276 files → 2,143 nodes / 4,509 edges in ~14 s on the Spongram repo itself. Portable graph on Neo4j (cloud) and FalkorDB (desktop), tenant-isolated, kept fresh by a git post-commit hook.

5 code_map_* MCP tools + dedicated 3D tab

code_map_query / neighbors / god_nodes / shortest_path / stats. Memories that mention a file or symbol automatically surface their linked code nodes in search_nodes. A mini-map (~110 tokens) is injected at SessionStart: repo size, directories, central files — the agent is oriented before its first tool call.

Background science

Sleep-time compute (Sleeper)

Background worker that synthesises Summary nodes and Wiki pages, with bounded concurrency (several entities in parallel, cap via SPONGRAM_SLEEPER_CONCURRENCY). Idempotent: only re-runs entities whose episode set actually changed (sha256 hash).

Dream cycle (periodic consolidation)

Duplicate-candidate detection via cosine similarity, time decay on edges (Ebbinghaus curve exp(-Δt/τ)), community propagation (LPA), boost on multi-source corroboration.

Non-destructive de-duplication + co-evolution (A-MEM)

The Dream links same-normalised-name entities via a :SAME_AS edge — no merge, zero data loss: recall and UI treat them as one. Aliases co-evolve (A-MEM pattern, NeurIPS 2025): when one carries a fresher summary, the other is re-synthesised on the next tick instead of drifting apart.

Importance score at ingest

Every entity gets a 1-10 score derived from source count and connection density (log-based formula). Acts as a 3rd retrieval axis (cosine + recency + importance) via reciprocal rank fusion.

Multi-hop search with Personalized PageRank

HippoRAG-style Personalized PageRank over the 2-hop subgraph of top-K cosine seeds, reciprocal-rank fused. Observable (Prometheus rerank / fallback-by-reason / latency metrics), subgraph timeout widened so the rerank actually runs instead of silently falling back. Flag-activated, ready for measured A/B.

Hot cache on the search path

LRU + TTL on top of search_nodes / search_memory_facts, scoped per tenant. Repeated recall served in ~2 ms (≈100× faster, measured locally), immediate write-invalidation (including via the ingestion queue), hit-rate exposed in Prometheus. Flag-activated (Memory³ pattern).

Procedural memory (skills)

Separate :Skill namespace, isolated from facts. MCP tools record_skill / recall_skill — recall scored name > intent > body, strict multi-tenant isolation.

Failure reflection

reflect_on_failure MCP tool: verbal capture of a failure (intent + lesson) transcoded into a failure_pattern-tagged episode with boosted importance. Reflexion pattern (Shinn et al. 2023), no weight updates.

Wiki & 3D Cortex

Karpathy-style block-by-block Wiki

Deterministic Markdown page per entity: header, Sleeper summary, sources, [[wiki]] links. Only the summary comes from the LLM, the rest is rendered by pure Python — versioned, exportable as Obsidian ZIP.

Communities and themes

Label Propagation Algorithm in pure Cypher on the Neo4j graph (no GDS plugin), /v1/wiki/themes.json endpoint. Each community gets a deterministic label (key members by importance, no LLM so no hallucination), generated by the Dream pass after detection, then rolled up into a tenant-level digest (a hierarchical abstraction above communities). The Wiki and admin render a named per-tenant « Themes » section.

Bundled 3D Cortex UI

Graph explorer rendered with 3d-force-graph (no bundler pipeline). SSE live updates, per-project filters, point-and-click entity deletion, server-side pagination to explore large graphs in windows.

Production hardening

Full admin web

Inference settings, instances, per-tenant activity, 90-day usage ledger, bundle downloads, token and marketplace management. SPONGRAM_ADMIN_TOKEN-gated.

Production hardening

Multi-component /healthz (Postgres, Neo4j, Graphiti, 3 workers), /metrics Prometheus, circuit breakers on LLM and embedder, Postgres ↔ Neo4j audit, idempotency-checked migrations, /orphans/{gid}/purge recovery endpoint.

Per-instance rate limiting

100 requests/minute per tenant, 429 beyond that (spec §8.4). Protects shared inference: one noisy tenant can't starve the others.

Reproducible recall-quality bench

pytest tests/bench/ suite that replays a LongMemEval subset (knowledge_update) with LLM-as-judge. The bench waits for the real queue drain AND the extraction plateau before measuring (no more truncated baseline). Reports accuracy / latency p50-p95 / tokens, JSON-frozen baseline, ON / OFF comparison of retrieval features.

Full i18n

Fully bilingual product

Admin (296 keys), desktop (127 keys), landing and server surfaces (/v1/graph, /v1/wiki, /v1/codemap via ?lang=) — FR/EN end to end, parity verified key by key every release.

Other products in this space — compare for yourself:

Mem0 ↗Letta ↗Zep ↗Cognee ↗

The code map: fewer tokens, better answers

Spongram maps your repos via AST (tree-sitter) and exposes the graph to Claude Code through 5 MCP tools. Everything below is measured in real claude -p sessions, on the Spongram repo itself — protocol and runner ship with the product (docs/BENCH_CODEMAP_2026-06-10.md).

Code-navigation question	Without Spongram (grep/read)	With Spongram (code map)	Δ cost
Most central files / symbols	$0.329 — subagent + 29 greps, estimated ranking	$0.105 — 1 god_nodes call, exact graph degree	−68%
Package overview	$0.291	$0.178	−39%
Who calls this function?	$0.086	$0.106 — prod + test callers in 1 call	+23%*
Module contents (classes, methods)	$0.128	$0.181	+41%*
Dependency path A → B	$0.107	$0.129	+20%*
Total (6 questions)	$1.05	$0.83	−21%
Cumulative response time	293 s	151 s	−48%

* For point lookups grep stays cheaper — and that is exactly what the shipped SKILL tells the agent: every tool where it belongs. A/B bench in real claude -p sessions (Sonnet), 1 question per session, read-only tools, 276-file repo; total between −12% and −21% across runs.

Deterministic AST extraction, zero LLM

Client-side tree-sitter (~30 languages): 276 files → 2,143 nodes / 4,509 edges in ~14 s. No indexing cost, reproducible to the commit.

5 dedicated MCP tools

code_map_query, neighbors, god_nodes, shortest_path, stats — including AST-inferred qualified module.attr() calls that grep often misses.

Mini-map at SessionStart

~110 tokens injected at session open: repo size, directories, central files. The agent is oriented before its first tool call.

Memory and code linked

A memory that mentions a file or symbol automatically surfaces its code nodes in search_nodes. The map stays fresh via the git post-commit hook.

In both editions

3D “Code map” tab; portable graph on Neo4j (cloud) / embedded FalkorDB (desktop), tenant-isolated.

Choose your inference

Spongram speaks OpenAI-compat. Nine presets plus a Custom mode for any endpoint. Hot-swap from the admin, 3-layer validation built in.

hosted

SPT Models

Sponge Theory hosted stack (default)

local

Ollama

Local workstation, port 11434

local

LM Studio

Desktop Mac/Win, port 1234

local

vLLM

Local NVIDIA GPU server, port 8000

local

llama.cpp server

Lightweight, GGUF, port 8080

cloud

OpenAI

Cloud, gpt-4.1 family

cloud

Anthropic Claude

Cloud via LiteLLM adapter

cloud

Mistral

Cloud, mistral-large

cloud

Groq

Cloud, llama-3.3 sub-second

All driven by the same admin Settings page. “Test connection” button → 3-layer probe (chat / tool_calls / strict json_schema) → you know in 3 seconds whether your provider is Graphiti-compatible.

Architecture in one slide

A single FastAPI instance serves the MCP, the admin SPA, the 3D Cortex explorer and this landing. Everything else is Postgres + Neo4j, side by side.

Client (Claude Code, Claude Desktop) → MCP HTTP → Spongram (Bearer spt_brain_* auth)
Spongram injects `group_id`, routes to Graphiti, rejects any tampering attempt
Graphiti writes to Neo4j (graph) and Postgres (episodes/instances), calls SPT Models for LLM + embeddings
Sleeper runs a background loop that generates Summary nodes and deterministic Wiki pages

Two editions, one brain

Direct purchase via Lemon Squeezy. Same memory engine, same code map — you choose where it runs.

Spongram Cloud

Self-hosted multi-tenant, for teams and GPU infra

Full Docker stack (Postgres + Neo4j + Graphiti) on YOUR infra — plain compose or Swarm
Strict multi-tenant: one instance, N brains isolated by anti-tampering group_id
Complete web admin: instances, per-tenant Claude Code bundles, 90-day usage, Prometheus
Pluggable inference provider: 9 OpenAI-compat presets, hot-swap without restart

Buy the Cloud edition →

Self-host license, source included, deployment documentation.

Spongram Desktop

Native app, 100% offline, zero Docker

macOS Windows · coming soonLinux · coming soon

Signed native macOS app (Apple Silicon + Intel), frozen Python sidecar embedded
Embedded FalkorDB — no external database, your data stays in your user session
Memory, Code map, 3D Cortex, Wiki, Search, Pulse — 7 native tabs
Backup built into settings: spongram-brain/v1 export/import, one-click restore
Local inference of your choice: Ollama, LM Studio… your machine, your data, period

Buy the Desktop edition →

Lifetime per-seat license, Lemon Squeezy activation built into the app.

Install, short version

On purchase you receive the Docker images and the env template. Three steps, 10 minutes on any box with Docker.

Configure the environment

cp .env.example .env
# set SPONGRAM_ADMIN_TOKEN + SPONGRAM_INFERENCE_API_KEY
# point SPONGRAM_INFERENCE_BASE_URL to your OpenAI-compat endpoint

Boot the stack

docker compose up -d
# Postgres + Neo4j + Spongram come up;
# SQL migrations applied automatically

Create a brain, then wire up Claude Code

# Admin console: http://localhost:8091/admin/
# create an instance, copy the spt_brain_ key (shown once)
claude marketplace add https://your-host.tld/admin/api/spongram/marketplace.json
claude plugin install spongram

For prod: Docker Swarm behind Traefik, deploy via Portainer. Full documentation shipped with purchase.

Frequently asked

What exactly is Spongram?

A multi-tenant long-term memory for Claude agents. Spongram packages Graphiti, its client bundles, its admin and its inference into one coherent product that you deploy yourself.

Why not just use an open-source alternative?

You can. If your primary constraint is strict multi-tenant + 100% on-prem + one-click Claude Code integration with a production-ready deliverable, Spongram ticks all three in V1. If not, pick the product that covers your actual need.

Which LLM do you depend on?

None specifically — Spongram speaks OpenAI-compat. Pick your provider in the admin (9-preset dropdown: SPT Models, Ollama, LM Studio, local vLLM, OpenAI, Anthropic via adapter, Mistral, Groq, or Custom). Hot-swap, no restart. A 3-layer probe validates that your provider supports tool_calls + strict json_schema before you save.

What does the code map change for my Claude Code token bill?

A lot on structural questions: "what are the most central files" costs −68% (one code_map_god_nodes call replaces a subagent + 29 greps, and is the only exact answer), a package overview −39%. Across a 6-question real bench: ≈ −17% total cost and ~2× faster. For a bare "where is X defined", grep stays cheaper — and that is what the SKILL tells the agent. Numbers and protocol ship with the product, reproducible.

How much does it cost?

Two editions, direct purchase via Lemon Squeezy: Spongram Desktop (lifetime per-seat license — macOS available, Windows/Linux in preparation) and Spongram Cloud (self-host license, source included). For assisted, multi-site or volume deployments: contact@sponge-theory.ai.

The persistent brainthat never leaves your GPUs.

Your memory belongs to you.

Export / import spongram-brain/v1

Cloud → desktop in 10 seconds

Built-in desktop backup

Granular forgetting, never forced

The product as it actually runs, not a mockup.

Spongram in 90 seconds

Why Spongram

100% on-prem, data and inference

Multi-tenant by design

One-command Claude Code install

Bi-temporal by construction

Pluggable inference provider

Code map included

What Spongram does today

Foundations

Memory & multi-tenant

Code map

Background science

Wiki & 3D Cortex

Production hardening

Full i18n

The code map: fewer tokens, better answers

Deterministic AST extraction, zero LLM

5 dedicated MCP tools

Mini-map at SessionStart

Memory and code linked

In both editions

Choose your inference

Architecture in one slide

Two editions, one brain

Spongram Cloud

Spongram Desktop

Install, short version

Configure the environment

Boot the stack

Create a brain, then wire up Claude Code

Frequently asked

The persistent brain
that never leaves your GPUs.