Part 1 ended on a diagnosis: the industry treated context as something that happens inside a session, inside an app, inside someone else's data centre. That was the category mistake. Every other failure — amnesia, the Librarian Tax, the firehose, privacy surrender, speed — is downstream of it.
Carabase is the category correction.
What it is
Carabase is a personal context engine. Not a notes app. Not a chatbot with memory features. Not a "second brain" with an AI tab bolted to the side. Not Rewind rebranded. Not another app that wants a new hotkey and twenty minutes of your setup time. An engine — a persistent, temporal, queryable model of everything you know, who you know it from, when you learned it, and how much you trust it. It runs on your hardware. It is controlled by you. Any AI tool can plug into it.
The structure has three pieces.
The Host is a small service that runs on a machine you own — a Mac Mini, a homelab box, a dedicated desktop, a closet server. It holds a Postgres database with pgvector, a knowledge graph, a set of ingestion workers that sync from your connected tools overnight, an orchestrator that routes queries to the right retrieval strategies, and a Model Context Protocol server that exposes the whole thing to any compliant AI tool. It is a boring stack. That is the feature.
The Desktop Client is the interface. Rust. Tauri. Native. Fast. Cmd+K for the palette, Cmd+Shift+Space to summon from anywhere in the OS, zero border-radius, warm neutrals, moss and ember. It is how you live in Carabase day to day.
The iOS Client is for capture on the move. Share extension, Siri Shortcuts, Lock Screen widget. Walking with a podcast playing, you swipe over and dictate. Standing in a meeting, you press a button and save the moment. The capture lands in your Host over Tailscale and is waiting for you on the Desktop by the time you sit down.
Three pieces. All yours. All talking to each other over a private, end-to-end encrypted mesh, with nothing exposed to the public internet.
The two taxes
Every existing personal-knowledge tool charges two taxes, and it is worth naming them before the rest makes sense.
The Librarian Tax you already know from Part 1 — the cost of getting context in. The filing, the tagging, the manual curation, the maintenance of an ontology that only you can see.
The Retrieval Tax is the one nobody names. It is the cost of getting context out. Every time you stop what you're doing to open the right app, dig through the right folder, remember the right keyword, copy the right passage, paste it into the right window — you are paying the Retrieval Tax. You are not doing knowledge work at that point. You are doing data entry on yourself. Billing by the minute to a project with no client.
Most tools reduce one tax and raise the other. Strict folder systems reduce the Retrieval Tax at the cost of a brutal Librarian Tax. Automated ingestion reduces the Librarian Tax at the cost of an explosive Retrieval Tax, because now everything is in there and you have to fight the firehose to find the thing you want.
The goal of Carabase is to zero both. You don't file things; the engine files them for you, with provenance, into a structure that is already indexed for retrieval. You don't dig for things; you ask, and the answer comes back in milliseconds with citations and a confidence score.
Consider a small example that has haunted a lot of people. You're walking, listening to a podcast. Someone says something that resonates. You do not want to stop walking. You do not want to open a notes app. You do not want to type out the insight, tag it, file it in the right folder, link it to the right project. You want to screenshot the podcast player at that moment, swipe over to Carabase, dictate "what they said about X was epic," and keep walking. Later — maybe the same day, maybe three months later when the thought becomes relevant — you ask "that thing from the podcast about X" and the engine brings back the screenshot, the transcript around it, your one-line reaction, and the project it belongs to. No Librarian Tax on the way in. No Retrieval Tax on the way out.
That is the product. Everything below is how we built it.
Speed is the first architectural choice
The engine has to feel like an extension of your working memory, not a petition to a distant bureaucracy. That means sub-hundred-millisecond latency. A cloud pipeline that puts a network round trip and an LLM call in front of every answer cannot deliver that, no matter how much you pay AWS.
Retrieval runs in under ten milliseconds. We got there by doing the unfashionable thing. We took the LLM out of the retrieval loop.
Every "AI" search product on the market ships the same pipeline: query → embed → vector search → feed the results to an LLM → LLM picks the best one → return. It is slow because the LLM step is slow. It is also a pyromaniac's approach to tokens — setting fire to money on every fucking query to answer questions a deterministic system could handle in one round trip.
Carabase does it differently. Your query hits a deterministic router that fires the right strategies across multiple layers of maps:
- A knowledge graph for relationships ("who introduced me to the CFO at Acme")
- A semantic index for meaning ("that thing about distribution strategy")
- A metadata index for time and source ("what did I say in yesterday's meeting")
- An entity resolver for names ("David" → which David, out of the three in your graph)
- A hypothesis verifier for claims ("did I actually commit to that deadline")
The router returns the union, ranked by signals from your actual data. You do not brute-force a lifetime of context. You map it, and then you navigate the maps. Postgres, pgvector, and a graph traversal each do their job in single-digit milliseconds. We let them.
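A minimal sketch of the routing idea, assuming simple regex rules decide which strategies fire and that each strategy returns an ordered list of document ids. The three strategy functions are hypothetical in-memory stand-ins for the graph, pgvector, and metadata layers. The merge step uses reciprocal rank fusion as a stand-in for "ranked by signals from your actual data"; it is one common way to rank a union of result lists, not necessarily the one Carabase uses. What it illustrates is that no model call sits between the query and the ranked answer.

```python
import re
from collections import defaultdict
from typing import Callable

# A strategy takes the query text and returns document ids, best first.
Strategy = Callable[[str], list[str]]


# Hypothetical stand-ins for the real layers (graph traversal, pgvector, metadata index).
def graph_strategy(query: str) -> list[str]:
    return ["note:acme-intro", "email:cfo-thread"] if "introduced" in query else []


def semantic_strategy(query: str) -> list[str]:
    return ["note:acme-intro", "doc:distribution"]


def metadata_strategy(query: str) -> list[str]:
    return ["meeting:yesterday"] if "yesterday" in query else []


# Deterministic routing rules: a pattern match decides which extra strategies fire.
RULES: list[tuple[re.Pattern[str], list[Strategy]]] = [
    (re.compile(r"\b(who|introduced)\b", re.IGNORECASE), [graph_strategy]),
    (re.compile(r"\b(yesterday|last week)\b", re.IGNORECASE), [metadata_strategy]),
]


def route(query: str) -> list[Strategy]:
    fired: list[Strategy] = [semantic_strategy]  # semantic search always runs
    for pattern, strategies in RULES:
        if pattern.search(query):
            for strategy in strategies:
                if strategy not in fired:
                    fired.append(strategy)
    return fired


def fuse(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: the union of all hits, ranked by summed 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score, breaking ties by id so the output is fully deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def search(query: str) -> list[str]:
    return fuse([strategy(query) for strategy in route(query)])
```

Because routing is a pure function of the query text, the same question fires the same strategies every time, which is what keeps latency predictable and results debuggable.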
The LLM is still there when you want it. It is summoned explicitly, as an act — for summarisation, for drafting, for reasoning across what was retrieved. It is not snuck into the hot path. You will notice the difference the first time you use it. You will stop tolerating the alternative within a week.
The interface is the second architectural choice
The cloud chatbot architecture forecloses something else no roadmap can add back: interfaces that aren't the chat window.
Most AI tools ship a chat window because a chat window is cheap to build and demos well. It is a box with a cursor. It is the laziest thing you can ship. Everybody shipped it. Superhuman won email — a commodity category if there ever was one — by refusing to accept that the inbox's UI was finished. Command palette. Keyboard shortcuts. Speed as identity. Everyone else was still shipping lists of checkboxes.
Knowledge tools are in the same place email was in 2015. Someone is going to win by refusing to accept that the chat window is the interface. We intend to be that someone.
Carabase ships:
- A daily log — your timeline of the day, mixed moss and ember, a block editor with rich types (log cards, calendar events, file artifacts, agent summaries) and slash commands. This is where you live.
- A command palette — Cmd+K, fuzzy-search your entire knowledge graph, actions, folios, entities, documents. Because the retrieval is under ten milliseconds, the palette is useful on every keystroke.
- An ambient graph — a live thumbnail of your knowledge graph in the sidebar, pulsing ember when the harvester is working in the background. You can see the engine breathing.
- An ingestion controller — the page where you decide which iMessage threads matter, which emails, which calendar events, which repositories. The intermediary between the firehose and your substrate.
- A hypothesis verifier — a surface where you ask "is X true" and the engine partitions the evidence into corroborated, contradicted, and inconclusive, with citations.
- A task triage — the tasks extracted from your daily log, cross-day, with their provenance.
- And yes, a summon panel for when the job is genuinely conversational — which is less often than you'd think.
Each surface is designed for the job it does. None of them is a chat window with a sidebar.
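The hypothesis verifier's partition step can be sketched as follows, assuming each piece of retrieved evidence already carries a stance toward the claim (supports, refutes, or unclear) and a confidence score. How that stance gets assigned is out of scope here, and the `Evidence` type and the 0.6 cutoff are hypothetical. The sketch only shows the bucketing: weak or ambiguous evidence lands in inconclusive rather than being forced to one side, and every bucket keeps its citations.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    SUPPORTS = "supports"
    REFUTES = "refutes"
    UNCLEAR = "unclear"


@dataclass(frozen=True)
class Evidence:
    citation: str      # the artifact the evidence comes from
    stance: Stance     # assumed to be assigned upstream
    confidence: float  # 0.0 to 1.0


def partition(evidence: list[Evidence], min_confidence: float = 0.6) -> dict[str, list[str]]:
    """Bucket evidence for a claim, keeping citations so every verdict is traceable."""
    buckets: dict[str, list[str]] = {"corroborated": [], "contradicted": [], "inconclusive": []}
    for item in evidence:
        if item.stance is Stance.UNCLEAR or item.confidence < min_confidence:
            buckets["inconclusive"].append(item.citation)
        elif item.stance is Stance.SUPPORTS:
            buckets["corroborated"].append(item.citation)
        else:
            buckets["contradicted"].append(item.citation)
    return buckets
```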
Seven properties. None suck.
The engine has seven properties the existing tools cannot match. Each was a specific architectural decision.
It is temporal. Every fact is timestamped with when it was observed, not just when it was typed. It knows when something became true and — crucially — when it stopped. "Who owned that account in March?" is a different query from "who owns it now," and both are answerable in milliseconds.
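A minimal sketch of the temporal idea, assuming each fact carries a half-open validity interval: `valid_from` is when it was observed to become true and `valid_to` is when it stopped (or `None` while it still holds). The `Fact` type, the account, and the names are hypothetical. In Postgres, one natural fit for the same idea is a range column such as `daterange` or `tstzrange`, queried with the containment operator `@>`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date                 # when the fact was observed to become true
    valid_to: Optional[date] = None  # when it stopped being true; None = still true


def as_of(facts: list[Fact], subject: str, predicate: str, when: date) -> list[str]:
    """Return the objects that held for (subject, predicate) on the given date."""
    return [
        f.obj
        for f in facts
        if f.subject == subject
        and f.predicate == predicate
        and f.valid_from <= when
        and (f.valid_to is None or when < f.valid_to)
    ]


FACTS = [
    Fact("account:acme", "owned_by", "Priya", date(2024, 1, 8), date(2024, 4, 15)),
    Fact("account:acme", "owned_by", "Tom", date(2024, 4, 15)),
]

# "Who owned that account in March?" versus "who owns it now?"
print(as_of(FACTS, "account:acme", "owned_by", date(2024, 3, 12)))  # ['Priya']
print(as_of(FACTS, "account:acme", "owned_by", date.today()))       # ['Tom']
```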
It is provenance-aware. Every edge in the knowledge graph carries a source, a confidence score, and a flag for whether it was directly observed, inferred, or merely suspected. When the engine answers, it tells you why — which artifacts corroborate, which contradict, which are plausible but unverified. A chatbot will confidently invent a citation. Carabase will show you the citation, in full, with a confidence score, from a document you actually wrote.
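One way to model a provenance-aware edge, assuming three observation bases and a confidence in [0, 1]. The `Edge` type, the `Basis` names, the source identifiers, and the ordering rule (observed before inferred before suspected, then by confidence) are hypothetical. The point is that the "why" behind an answer is data carried on the edge, so it can be printed rather than generated.

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    OBSERVED = 0   # seen directly in an artifact
    INFERRED = 1   # derived from other edges or text
    SUSPECTED = 2  # plausible, not yet verified


@dataclass(frozen=True)
class Edge:
    src: str
    relation: str
    dst: str
    source: str        # the artifact this edge came from
    confidence: float  # 0.0 to 1.0
    basis: Basis


def explain(edge: Edge) -> str:
    """Render the 'why' behind an edge straight from its provenance fields."""
    return (
        f"{edge.src} -[{edge.relation}]-> {edge.dst} "
        f"({edge.basis.name.lower()}, confidence {edge.confidence:.2f}, from {edge.source})"
    )


def rank_evidence(edges: list[Edge]) -> list[Edge]:
    """Observed before inferred before suspected; higher confidence first within each."""
    return sorted(edges, key=lambda e: (e.basis.value, -e.confidence))
```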
It materializes in layers. The mesh has a shape, and that shape is yours. Capture goes wide on purpose: calendars, mail, code, browser, photos, messages, health, the lot. What lands as a full body, what lands as a TL;DR + entity graph, what lands as metadata-only, and what gets a tombstone is a per-rule, per-account, per-source decision the engine respects. Retention caps stack (workspace default, account cap, rule cap), and the tightest wins. You can veto. You can scope. You can audit. You can purge, and by default Carabase will not silently re-fetch what you purged. The firehose doesn't get to decide what your context becomes. This is your life. Act like it.
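The "tightest wins" retention rule and the no-silent-re-fetch guarantee can be sketched in a few lines. This assumes caps are expressed in days and that an unset account or rule cap is `None`; the function names and the in-memory `purged` set are hypothetical, and a real engine would persist purge records rather than hold them in memory.

```python
from typing import Optional


def effective_retention_days(
    workspace_default: int,
    account_cap: Optional[int] = None,
    rule_cap: Optional[int] = None,
) -> int:
    """Caps stack: every configured cap applies, so the smallest one wins."""
    caps = [cap for cap in (workspace_default, account_cap, rule_cap) if cap is not None]
    return min(caps)


def should_ingest(item_id: str, purged: set[str], allow_refetch: bool = False) -> bool:
    """After a purge, skip the item on later syncs unless the user explicitly opts back in."""
    return allow_refetch or item_id not in purged


print(effective_retention_days(365, account_cap=90))               # 90
print(effective_retention_days(365, account_cap=90, rule_cap=30))  # 30
print(should_ingest("mail:8812", purged={"mail:8812"}))            # False
```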
It is agent-native and open. It speaks MCP — the Model Context Protocol — so any compliant agent can query it. Claude Desktop, OpenClaw, the desktop client we ship, custom agents, whatever ships next quarter. It also has an integration SDK so you can write your own connectors when your stack doesn't match ours. Stacks are heterogeneous. We're fine with that. Integrate away.
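For a sense of what "speaks MCP" means on the wire: MCP messages are JSON-RPC 2.0, and an agent invokes a server tool with a `tools/call` request carrying the tool's `name` and `arguments`. The sketch below only builds that message. The tool name `search_context` and its `query` argument are hypothetical, and a real client would first complete the `initialize` handshake and typically discover available tools with `tools/list` before calling one.

```python
import json
from itertools import count
from typing import Any

_request_ids = count(1)  # JSON-RPC requests need an id unique within the session


def tools_call(name: str, arguments: dict[str, Any]) -> str:
    """Serialise an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


print(tools_call("search_context", {"query": "who introduced me to the CFO at Acme"}))
```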
It is yours, materially. Zero public ports on the Host. AES-256-GCM for every stored credential. Row-level security at the database layer, so even a bug in application code cannot leak context across workspaces. Tailscale mesh for device-to-device traffic. If our company disappears tomorrow, your engine keeps running. The Host code is open source. The data is on your drive. The vendor risk is zero by construction. Go read the source.
It distinguishes what you made from what it harvested. Every entry in the daily log carries a source. Authored in moss green if you wrote it. Harvested in ember orange if the engine extracted it from your email, calendar, GitHub, meeting transcripts. You can always see the line between your hand and its hand. This is not a design flourish. It is an epistemic contract.
It tells you what it inferred before believing it. The engine extracts relationships from what it harvests: "Sarah → works_at → Mailchimp," "that meeting → relates_to → the term sheet thread." When confidence is high, the edge lands canonical. When confidence is low, it lands as a question on a Review queue, with provenance attached. You accept it (it becomes canonical), reject it (it gets soft-deleted), silence it (don't ask about this pair again), or dispatch an in-loop agent that runs a web search and stamps a verdict on the card before you see it. Chatbots invent citations and serve them with confidence. Carabase puts the inference in front of you with the citation already attached.
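The review flow can be sketched as a small state machine, assuming a single confidence threshold decides whether an inferred edge lands canonical or goes to the queue. The 0.85 cutoff, the type names, and treating "this pair" as an unordered pair of entities are all assumptions; the in-loop research agent is left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    CANONICAL = "canonical"
    PENDING = "pending"
    SOFT_DELETED = "soft_deleted"
    SILENCED = "silenced"


@dataclass
class InferredEdge:
    src: str
    relation: str
    dst: str
    confidence: float
    source: str  # provenance travels with the review card
    status: Status = Status.PENDING


@dataclass
class ReviewQueue:
    threshold: float = 0.85  # hypothetical cutoff for landing canonical
    silenced: set[frozenset[str]] = field(default_factory=set)
    pending: list[InferredEdge] = field(default_factory=list)

    def submit(self, edge: InferredEdge) -> Status:
        if edge.confidence >= self.threshold:
            edge.status = Status.CANONICAL
        elif frozenset((edge.src, edge.dst)) in self.silenced:
            edge.status = Status.SILENCED  # user asked not to be asked about this pair
        else:
            edge.status = Status.PENDING
            self.pending.append(edge)
        return edge.status

    def _resolve(self, edge: InferredEdge, status: Status) -> None:
        self.pending.remove(edge)
        edge.status = status

    def accept(self, edge: InferredEdge) -> None:
        self._resolve(edge, Status.CANONICAL)

    def reject(self, edge: InferredEdge) -> None:
        # Assumed soft-delete semantics: kept in storage, hidden from queries.
        self._resolve(edge, Status.SOFT_DELETED)

    def silence(self, edge: InferredEdge) -> None:
        self.silenced.add(frozenset((edge.src, edge.dst)))
        self._resolve(edge, Status.SILENCED)
```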
Why now
We stand on the shoulders of giants. You could not have built this a year ago. Six underlying shifts converged:
- MCP turned "plug any agent into any knowledge source" from a fantasy into a shipped standard
- pgvector got fast enough that a Postgres instance on a Mac Mini holds years of embeddings and answers in milliseconds
- Ollama and open-weight models made local LLM inference realistic
- Tailscale made self-hosting as reachable as the cloud, without the public-internet surface area
- Apple Silicon made a thousand-dollar desktop genuinely fast enough to run all of it at once
- Multimodal models got cheap enough that first-pass image analysis is rounding-error money (~$0.0001/image), and Apple Vision OCR runs on the local Mac for free. "Best photos from my Tokyo trip" is now a query, not a fantasy.
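To put the quoted ~$0.0001 per image in perspective, here is a back-of-envelope calculation for a hypothetical library of 50,000 photos: the entire first pass comes to roughly five dollars. The library size is illustrative, not a measurement; the per-image figure is the one quoted above.

```python
COST_PER_IMAGE_USD = 0.0001  # approximate first-pass figure quoted above


def first_pass_cost_usd(num_images: int) -> float:
    """Total cost of one first-pass analysis over a photo library, rounded to cents."""
    return round(num_images * COST_PER_IMAGE_USD, 2)


print(first_pass_cost_usd(50_000))  # 5.0
```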
None alone would have been enough. Stack them and the floor is here. The industry walked right past it and kept building chatbots. That is their prerogative. It is also their mistake.
The bet
We are willing to be wrong about a few things. That privacy matters to enough people. That self-hosting can feel effortless. That the two taxes can be zeroed rather than merely reduced. Those are empirical. The market will answer them.
We are not willing to be wrong about the structural claims: that speed is a category-definer, that interface is a category-definer, that your context belongs to you, and that an engine beats a chatbot over any timescale longer than a quarter.
Those are premises. Everything is built on them. If any one of them turns out to be wrong, we are building the wrong thing. They are not wrong.
The AI industry spent three years convincing itself that the product was the model. Models are a commodity. The product was never going to be the model. The product is the thing that knows you, runs at the speed of thought, refuses the chat window, belongs to you, and persists.
We are building the engine that remembers last Tuesday. Not in a session. Not in a context window. In a substrate you own, on hardware you control, that gets smarter while you sleep and answers before you finish typing.
If you've read this far, Part 3 is for you — how to actually get the most out of it once it's running.