Second memory for Claude Code & Claude Desktop

The persistent brain that never leaves your GPUs.

Spongram gives Claude a long-term, multi-tenant memory, 100% on-prem, plugged into your own inference. No cloud, no leak, no lock-in.

Buy Spongram → Watch the demo

Your memory, your infra, your AI.

Why Spongram

100% on-prem, data and inference

Postgres + Neo4j + Graphiti + LLM all self-hosted. No episode token ever leaves your network. Runs on Docker Swarm or plain compose.

Multi-tenant by design

Strict per-tenant isolation via server-injected `group_id` with anti-tampering. One instance, N brains, zero cross-leak.

One-command Claude Code install

Per-tenant `.plugin` bundle generated on the fly: marketplace, SKILL.md, SessionStart hook. `claude plugin install spongram` and memory is online.

Bi-temporal by construction

Every fact carries `valid_at` / `invalid_at`. When you correct an entry, the old one is superseded — the history stays auditable.

Pluggable inference provider

Ollama, LM Studio, local vLLM, OpenAI, Anthropic, Mistral, Groq, or any OpenAI-compat endpoint. Hot-swap from the admin, zero restart, zero lock-in. One-click 3-layer validation (chat / tool_calls / strict json_schema).

Code map included

Deterministic AST extraction (tree-sitter) of your repos → a multi-tenant structural graph queryable through 5 MCP tools. Measured in a real bench: −68% cost on architecture questions, ~2× faster than grep/read.

The code map: fewer tokens, better answers

Spongram maps your repos via AST (tree-sitter) and exposes the graph to Claude Code through 5 MCP tools. Everything below is measured in real claude -p sessions, on the Spongram repo itself — protocol and runner ship with the product (docs/BENCH_CODEMAP_2026-06-10.md).

Code-navigation question	Without Spongram (grep/read)	With Spongram (code map)	Δ cost
Most central files / symbols	$0.329 — subagent + 29 greps, estimated ranking	$0.105 — 1 god_nodes call, exact graph degree	−68%
Package overview	$0.291	$0.178	−39%
Who calls this function?	$0.086	$0.106 — prod + test callers in 1 call	+23%*
Module contents (classes, methods)	$0.128	$0.181	+41%*
Dependency path A → B	$0.107	$0.129	+20%*
Total (6 questions)	$1.05	$0.83	−21%
Cumulative response time	293 s	151 s	−48%

* For point lookups grep stays cheaper — and that is exactly what the shipped SKILL tells the agent: every tool where it belongs. A/B bench in real claude -p sessions (Sonnet), 1 question per session, read-only tools, 276-file repo; total between −12% and −21% across runs.

Deterministic AST extraction, zero LLM

Client-side tree-sitter (~30 languages): 276 files → 2,143 nodes / 4,509 edges in ~14 s. No indexing cost, reproducible to the commit.

5 dedicated MCP tools

code_map_query, neighbors, god_nodes, shortest_path, stats — including AST-inferred qualified module.attr() calls that grep often misses.

Mini-map at SessionStart

~110 tokens injected at session open: repo size, directories, central files. The agent is oriented before its first tool call.

Memory and code linked

A memory that mentions a file or symbol automatically surfaces its code nodes in search_nodes. The map stays fresh via the git post-commit hook.

In both editions

3D “Code map” tab; portable graph on Neo4j (cloud) / embedded FalkorDB (desktop), tenant-isolated.

What Spongram does today

Every capability listed here is verifiable in the shipped codebase.

On-prem self-host

docker-compose or Swarm + Portainer stack. Zero external cloud dependency.

Pluggable inference & hot-swap

Admin Settings page + 9-preset dropdown + 3-layer probe (chat / tool_calls / strict json_schema). Provider change takes effect on the next request, no container restart.

One-command Claude Code plugin

Per-tenant .plugin bundle: marketplace, SKILL.md, SessionStart hook.

Native Desktop edition (macOS)

Developer ID-signed Tauri app: frozen Python sidecar (PyInstaller), embedded FalkorDB, 100% offline. Memory, Code map, 3D Cortex, Wiki, Search, Pulse, Settings — no Docker, no external database. Windows and Linux in preparation.

Built-in Lemon Squeezy licensing (desktop)

In-app license activation, local validation, lifetime per-seat license.

Code map (structural code graph)

Client-side AST extraction via tree-sitter (~30 languages): files, classes, functions, imports, calls — including AST-inferred qualified module.attr() calls. 276 files → 2,143 nodes / 4,509 edges in ~14 s on the Spongram repo itself. Portable graph on Neo4j (cloud) and FalkorDB (desktop), tenant-isolated, kept fresh by a git post-commit hook.

5 code_map_* MCP tools + dedicated 3D tab

code_map_query / neighbors / god_nodes / shortest_path / stats. Memories that mention a file or symbol automatically surface their linked code nodes in search_nodes. A mini-map (~110 tokens) is injected at SessionStart: repo size, directories, central files — the agent is oriented before its first tool call.

Strict multi-tenant

Server-injected group_id, anti-tampering rejection. One instance, N brains, zero cross-leakage.

Bi-temporal facts

valid_at / invalid_at inherited from Graphiti — automatic supersession on contradiction.

Proactive memory capture

Agents (Claude Code, Claude Desktop) call add_memory on their own whenever a fact, preference or decision is stated — no need to ask.

Cross-project, cross-client "global" memory

A fact tagged project=global (stated in any project, or from Claude Desktop) is recalled across ALL your projects: one brain shared across every codebase and client. Project-specific facts stay scoped per project; search keeps the current project + global by default and drops other-project noise.

Authority validation at ingest (constitutional layer)

Optional provenance gate at the add_memory boundary: an episode must carry the expected tags (project=, and client= from a configured allow-list) before it enters the brain, otherwise it's rejected before even hitting the durable queue. Deterministic, dependency-free, flag-activated. Answers the classic critique « stores whatever it's given with no authority validation on ingested entities ».

Durable episode queue

Every add_memory is persisted to Postgres before reaching Graphiti. Zero loss on container restart, automatic retry on transient error (writes via the queue, reads via bounded retry/backoff on the Graphiti path). Oversized episodes (>32k tokens) are auto-chunked before extraction so they're never rejected by the LLM.

Sleep-time compute (Sleeper)

Background worker that synthesises Summary nodes and Wiki pages, with bounded concurrency (several entities in parallel, cap via SPONGRAM_SLEEPER_CONCURRENCY). Idempotent: only re-runs entities whose episode set actually changed (sha256 hash).

Dream cycle (periodic consolidation)

Duplicate-candidate detection via cosine similarity, time decay on edges (Ebbinghaus curve exp(-Δt/τ)), community propagation (LPA), boost on multi-source corroboration.

Non-destructive de-duplication + co-evolution (A-MEM)

The Dream links same-normalised-name entities via a :SAME_AS edge — no merge, zero data loss: recall and UI treat them as one. Aliases co-evolve (A-MEM pattern, NeurIPS 2025): when one carries a fresher summary, the other is re-synthesised on the next tick instead of drifting apart.

Importance score at ingest

Every entity gets a 1-10 score derived from source count and connection density (log-based formula). Acts as a 3rd retrieval axis (cosine + recency + importance) via reciprocal rank fusion.

Multi-hop search with Personalized PageRank

HippoRAG-style Personalized PageRank over the 2-hop subgraph of top-K cosine seeds, reciprocal-rank fused. Observable (Prometheus rerank / fallback-by-reason / latency metrics), subgraph timeout widened so the rerank actually runs instead of silently falling back. Flag-activated, ready for measured A/B.

Hot cache on the search path

LRU + TTL on top of search_nodes / search_memory_facts, scoped per tenant. Repeated recall served in ~2 ms (≈100× faster, measured locally), immediate write-invalidation (including via the ingestion queue), hit-rate exposed in Prometheus. Flag-activated (Memory³ pattern).

Procedural memory (skills)

Separate :Skill namespace, isolated from facts. MCP tools record_skill / recall_skill — recall scored name > intent > body, strict multi-tenant isolation.

Failure reflection

reflect_on_failure MCP tool: verbal capture of a failure (intent + lesson) transcoded into a failure_pattern-tagged episode with boosted importance. Reflexion pattern (Shinn et al. 2023), no weight updates.

Karpathy-style block-by-block Wiki

Deterministic Markdown page per entity: header, Sleeper summary, sources, [[wiki]] links. Only the summary comes from the LLM, the rest is rendered by pure Python — versioned, exportable as Obsidian ZIP.

Communities and themes

Label Propagation Algorithm in pure Cypher on the Neo4j graph (no GDS plugin), /v1/wiki/themes.json endpoint. Each community gets a deterministic label (key members by importance, no LLM so no hallucination), generated by the Dream pass after detection, then rolled up into a tenant-level digest (a hierarchical abstraction above communities). The Wiki and admin render a named per-tenant « Themes » section.

Bundled 3D Cortex UI

Graph explorer rendered with 3d-force-graph (no bundler pipeline). SSE live updates, per-project filters, point-and-click entity deletion, server-side pagination to explore large graphs in windows.

Confidence scoring on facts

Every relationship carries a confidence score that climbs on multi-source corroboration. Threshold-based filtering exposed via /v1/graph/data.

Granular forgetting (3 levels)

Entity (one-click in the 3D Cortex), specific fact (DELETE /forget/fact/{uuid}, GDPR-grade with audit log), whole instance (DELETE that automatically purges Neo4j and Postgres).

Full admin web

Inference settings, instances, per-tenant activity, 90-day usage ledger, bundle downloads, token and marketplace management. SPONGRAM_ADMIN_TOKEN-gated.

Production hardening

Multi-component /healthz (Postgres, Neo4j, Graphiti, 3 workers), /metrics Prometheus, circuit breakers on LLM and embedder, Postgres ↔ Neo4j audit, idempotency-checked migrations, /orphans/{gid}/purge recovery endpoint.

Reproducible recall-quality bench

pytest tests/bench/ suite that replays a LongMemEval subset (knowledge_update) with LLM-as-judge. The bench waits for the real queue drain AND the extraction plateau before measuring (no more truncated baseline). Reports accuracy / latency p50-p95 / tokens, JSON-frozen baseline, ON / OFF comparison of retrieval features.

Other products in this space — compare for yourself:

Mem0 ↗Letta ↗Zep ↗Cognee ↗

Choose your inference

Spongram speaks OpenAI-compat. Nine presets plus a Custom mode for any endpoint. Hot-swap from the admin, 3-layer validation built in.

hosted

SPT Models

Sponge Theory hosted stack (default)

local

Ollama

Local workstation, port 11434

local

LM Studio

Desktop Mac/Win, port 1234

local

vLLM

Local NVIDIA GPU server, port 8000

local

llama.cpp server

Lightweight, GGUF, port 8080

cloud

OpenAI

Cloud, gpt-4.1 family

cloud

Anthropic Claude

Cloud via LiteLLM adapter

cloud

Mistral

Cloud, mistral-large

cloud

Groq

Cloud, llama-3.3 sub-second

All driven by the same admin Settings page. “Test connection” button → 3-layer probe (chat / tool_calls / strict json_schema) → you know in 3 seconds whether your provider is Graphiti-compatible.

Spongram in 90 seconds

Walkthrough: create a brain, install the Claude Code plugin, push your first memory, do a bi-temporal lookup, open the 3D Cortex.

Architecture in one slide

A single FastAPI instance serves the MCP, the admin SPA, the 3D Cortex explorer and this landing. Everything else is Postgres + Neo4j, side by side.

Client (Claude Code, Claude Desktop) → MCP HTTP → Spongram (Bearer spt_brain_* auth)
Spongram injects `group_id`, routes to Graphiti, rejects any tampering attempt
Graphiti writes to Neo4j (graph) and Postgres (episodes/instances), calls SPT Models for LLM + embeddings
Sleeper runs a background loop that generates Summary nodes and deterministic Wiki pages

Two editions, one brain

Direct purchase via Lemon Squeezy. Same memory engine, same code map — you choose where it runs.

Spongram Cloud

Self-hosted multi-tenant, for teams and GPU infra

Full Docker stack (Postgres + Neo4j + Graphiti) on YOUR infra — plain compose or Swarm
Strict multi-tenant: one instance, N brains isolated by anti-tampering group_id
Complete web admin: instances, per-tenant Claude Code bundles, 90-day usage, Prometheus
Pluggable inference provider: 9 OpenAI-compat presets, hot-swap without restart

Buy the Cloud edition →

Self-host license, source included, deployment documentation.

Spongram Desktop

Native app, 100% offline, zero Docker

macOS Windows · coming soonLinux · coming soon

Signed native macOS app (Apple Silicon + Intel), frozen Python sidecar embedded
Embedded FalkorDB — no external database, your data stays in your user session
Memory, Code map, 3D Cortex, Wiki, Search, Pulse — 7 native tabs
Local inference of your choice: Ollama, LM Studio… your machine, your data, period

Buy the Desktop edition →

Lifetime per-seat license, Lemon Squeezy activation built into the app.

Install, short version

On purchase you receive the Docker images and the env template. Three steps, 10 minutes on any box with Docker.

Configure the environment

cp .env.example .env
# set SPONGRAM_ADMIN_TOKEN + SPONGRAM_INFERENCE_API_KEY
# point SPONGRAM_INFERENCE_BASE_URL to your OpenAI-compat endpoint

Boot the stack

docker compose up -d
# Postgres + Neo4j + Spongram come up;
# SQL migrations applied automatically

Create a brain, then wire up Claude Code

# Admin console: http://localhost:8091/admin/
# create an instance, copy the spt_brain_ key (shown once)
claude marketplace add https://your-host.tld/admin/api/spongram/marketplace.json
claude plugin install spongram

For prod: Docker Swarm behind Traefik, deploy via Portainer. Full documentation shipped with purchase.

Frequently asked

What exactly is Spongram?

A multi-tenant long-term memory for Claude agents. Spongram packages Graphiti, its client bundles, its admin and its inference into one coherent product that you deploy yourself.

Why not just use an open-source alternative?

You can. If your primary constraint is strict multi-tenant + 100% on-prem + one-click Claude Code integration with a production-ready deliverable, Spongram ticks all three in V1. If not, pick the product that covers your actual need.

Which LLM do you depend on?

None specifically — Spongram speaks OpenAI-compat. Pick your provider in the admin (9-preset dropdown: SPT Models, Ollama, LM Studio, local vLLM, OpenAI, Anthropic via adapter, Mistral, Groq, or Custom). Hot-swap, no restart. A 3-layer probe validates that your provider supports tool_calls + strict json_schema before you save.

What does the code map change for my Claude Code token bill?

A lot on structural questions: "what are the most central files" costs −68% (one code_map_god_nodes call replaces a subagent + 29 greps, and is the only exact answer), a package overview −39%. Across a 6-question real bench: ≈ −17% total cost and ~2× faster. For a bare "where is X defined", grep stays cheaper — and that is what the SKILL tells the agent. Numbers and protocol ship with the product, reproducible.

How much does it cost?

Two editions, direct purchase via Lemon Squeezy: Spongram Desktop (lifetime per-seat license — macOS available, Windows/Linux in preparation) and Spongram Cloud (self-host license, source included). For assisted, multi-site or volume deployments: contact@sponge-theory.ai.