April 11, 2026
What if Karpathy’s LLM Wiki Could Reason?
Knowledge decays. Not the knowledge itself, but the systems we build to hold it.
Last week, Andrej Karpathy published a GitHub Gist that crystallized something many of us have felt but few have articulated with such precision. He called it “LLM Wiki,” a system where an LLM compiles raw sources into a structured markdown wiki, maintained automatically. His core thesis was disarmingly simple:
“The tedious part of maintaining a knowledge base is not the reading or the thinking. It is the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims. Humans abandon wikis because the maintenance burden grows faster than the value.”
He is right. And this is the problem I have been building against for the past year.
Files Do Not Reason
Karpathy’s approach works beautifully at the individual level. One researcher, one topic, markdown files rendered in Obsidian. But there is a structural limitation embedded in the substrate itself: files do not reason. They do not know when one document contradicts another. They cannot trace a chain of decisions to surface which ones have been superseded. They do not learn which knowledge you access most frequently and adjust accordingly.
A file system gives you storage. What you need is structure. And structure, in the formal sense, is not organization; it is the capacity for inference.
The Filing Cabinet Problem
Consider what the word “memory” actually means in the context of AI agent systems today. Mem0, Zep, LangChain Memory: these are vector stores with search bars. You embed text, you retrieve by similarity. This works for recall, the way a filing cabinet works for retrieval. You know roughly which drawer to open, and you find something close to what you were looking for.
But recall is not reasoning. A filing cabinet cannot tell you that the document in drawer three contradicts the one in drawer seven. It cannot trace the chain from a decision made in January through its downstream effects in March. It does not know that the API rate limit decision from November was never implemented, or that the same subsystem has been the source of three incidents in ninety days.
Similarity search finds what looks alike. Reasoning finds what matters.
Knowledge Has Types
When I built Cortex, I started from a premise that the current market has overlooked entirely: knowledge has types, and relationships between knowledge objects have semantics.
A decision is not the same as a lesson. “Supersedes” is not the same as “contradicts.” And if A supersedes B and B supersedes C, then A supersedes C. That is not a similarity search. That is inference. Classical, deterministic, formal inference.
Cortex uses a formal OWL-RL ontology with eight knowledge object types: Decision, Lesson, Fix, Session, Research, Source, Synthesis, Idea. Eight relationship types: causedBy, contradicts, supports, supersedes, dependsOn, ledTo, implements, mentions. The relationships carry inference rules: transitive chains, symmetric pairs, inverse properties. The reasoning engine produces the same output every time. No LLM calls. No hallucination. No stochastic variance.
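To make concrete what deterministic inference means here, the transitive chain described above (A supersedes B, B supersedes C, therefore A supersedes C) can be computed with ordinary fixpoint iteration. This is an illustrative sketch, not Cortex's implementation, and the decision IDs are invented:

```python
def transitive_closure(edges):
    """Given (a, b) pairs meaning 'a supersedes b', derive every implied pair.

    Iterates to a fixed point: keep joining pairs until nothing new appears.
    Deterministic -- the same input always yields the same closure.
    """
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Invented example: A supersedes B, B supersedes C.
stated = {("decision-A", "decision-B"), ("decision-B", "decision-C")}
inferred = transitive_closure(stated)
# The chain implies A supersedes C -- no LLM call, no variance.
assert ("decision-A", "decision-C") in inferred
```

An OWL-RL reasoner does this (and more) over RDF triples, but the character of the computation is the same: classical inference, identical output on every run.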
The word “ontology” comes from the Greek ontos, meaning “being” or “that which is.” It is the study of what exists and how things relate to each other. In computer science, a formal ontology is a machine-readable specification of concepts and their relationships. It is, in a sense, the grammar of a knowledge base. Without it, you have words. With it, you have language.
Two Stores, One Truth
Under the hood, Cortex runs a dual-store architecture. Oxigraph, a SPARQL graph database, handles relationships: RDF triples and OWL-RL inference. SQLite with FTS5 handles content: full-text search with BM25 ranking, embeddings, and access patterns. Every knowledge object lives in both stores, consistent and queryable from either side.
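A toy sketch of the dual-write idea, using only stdlib sqlite3: a plain triples table stands in for Oxigraph's RDF side, and an FTS5 virtual table plays the content side. Table names, column names, and the sample object are all invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Graph side (stand-in for Oxigraph): subject / predicate / object triples.
con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
# Content side: FTS5 gives full-text search with BM25 ranking built in.
con.execute("CREATE VIRTUAL TABLE content USING fts5(id, title, body)")

def capture(obj_id, obj_type, title, body):
    """Write one knowledge object to both stores in a single transaction,
    so the two sides cannot drift apart."""
    with con:
        con.execute("INSERT INTO triples VALUES (?, 'rdf:type', ?)",
                    (obj_id, obj_type))
        con.execute("INSERT INTO content VALUES (?, ?, ?)",
                    (obj_id, title, body))

capture("decision-42", "Decision", "API rate limits",
        "Cap unauthenticated clients at 100 requests per minute.")

# Queryable from the graph side: what type is this object?
(obj_type,) = con.execute(
    "SELECT o FROM triples WHERE s = 'decision-42' AND p = 'rdf:type'"
).fetchone()

# Queryable from the content side: BM25-ranked full-text search.
hits = con.execute(
    "SELECT id FROM content WHERE content MATCH 'rate' ORDER BY bm25(content)"
).fetchall()
```

The single transaction around both inserts is the point: "one truth" means a knowledge object either exists in both stores or in neither.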
Retrieval combines four independent signals: keyword relevance at forty percent, semantic similarity at thirty, graph connectivity at twenty, and recency at ten. Each signal is normalized, weighted, and transparent. The system does not hide its reasoning behind a black box. You can see why a result ranked where it did.
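The blend is easy to sketch. This assumes each signal has already been normalized to [0, 1]; the weights come from the text, while the function names and sample document are invented:

```python
# Weights from the text: keyword 40%, semantic 30%, graph 20%, recency 10%.
WEIGHTS = {"keyword": 0.40, "semantic": 0.30, "graph": 0.20, "recency": 0.10}

def blended_score(signals):
    """Weighted sum of normalized signals; missing signals count as zero."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

def explain(signals):
    """Per-signal contribution, so you can see why a result ranked as it did."""
    return {name: round(WEIGHTS[name] * signals.get(name, 0.0), 3)
            for name in WEIGHTS}

# Invented example document: strong keyword match, very recent.
doc = {"keyword": 0.9, "semantic": 0.5, "graph": 0.2, "recency": 1.0}
score = blended_score(doc)
breakdown = explain(doc)
```

The `explain` helper is what "transparent" means operationally: the ranking is an auditable sum, not a black box.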
Everything runs locally from ~/.cortex/. No cloud service. No API keys required for core operations. Local embeddings mean semantic search without calls to third parties. The knowledge is yours, structurally and literally.
What This Looks Like in Practice
Scanning 147 knowledge objects...
[contradiction] TTL: config 24h, middleware 1h
[pattern] Auth: 3 incidents in 90 days
[stale] Rate limit decision (Nov) unresolved
[gap] No post-mortem for Mar 12 outage
4 findings. 1 contradiction. 1 systemic pattern.
This runs in sub-second time. The contradiction was found by comparing the supersedes and dependsOn relationships between two knowledge objects referencing the same entity. The pattern was detected by counting fixes tagged to the same subsystem. The staleness was calculated by checking whether a decision’s dependent actions were ever recorded.
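Two of these checks can be sketched directly; the records, field names, and thresholds below are invented for illustration, not Cortex's actual schema:

```python
from collections import Counter
from datetime import date, timedelta

# Invented records: fixes tagged to subsystems, decisions with any
# recorded implementing actions.
fixes = [
    {"subsystem": "auth", "date": date(2026, 1, 20)},
    {"subsystem": "auth", "date": date(2026, 2, 14)},
    {"subsystem": "auth", "date": date(2026, 3, 30)},
    {"subsystem": "billing", "date": date(2025, 6, 1)},
]
decisions = [
    {"id": "rate-limit", "date": date(2025, 11, 5), "implemented_by": []},
]

def systemic_patterns(fixes, today, window_days=90, threshold=3):
    """Flag subsystems with at least `threshold` fixes inside the window."""
    cutoff = today - timedelta(days=window_days)
    recent = Counter(f["subsystem"] for f in fixes if f["date"] >= cutoff)
    return [s for s, n in recent.items() if n >= threshold]

def stale_decisions(decisions):
    """A decision with no recorded implementing action is unresolved."""
    return [d["id"] for d in decisions if not d["implemented_by"]]

today = date(2026, 4, 11)
patterns = systemic_patterns(fixes, today)
stale = stale_decisions(decisions)
```

Both checks are counting and comparison over typed records, which is why the output is identical on every run.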
In baseline testing, the inference operations scored 100% accuracy. The reasoning is formal logic, not statistical prediction.
MCP as the Connective Tissue
Cortex runs as a Model Context Protocol server. Twenty-two tools covering capture, search, reasoning, graph operations, and diagnostics. Any MCP client, whether Claude, Cursor, or Windsurf, can connect to it. Your AI agent gains persistent, structured memory that compounds across sessions.
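MCP is JSON-RPC 2.0 underneath, and `tools/call` is the method a client uses to invoke a server-side tool. The tool name and arguments below are hypothetical, invented to show the shape of the exchange, not Cortex's actual tool surface:

```python
import json

# Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
# "cortex_search" and its arguments are hypothetical illustrations.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "cortex_search",
        "arguments": {"query": "rate limit decision", "limit": 5},
    },
}
wire = json.dumps(request)
```

Because every MCP client speaks this same protocol, the memory layer is decoupled from any one agent: swap the client, keep the knowledge.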
This is the piece that connects Karpathy’s vision to the broader tool ecosystem. His LLM Wiki is brilliant for individual research. But wire it through MCP, back it with a knowledge graph, give it formal reasoning, and you have something that works at every scale: a solo developer, a team of ten, an organization of thousands.
The Convergence
Karpathy’s insight is correct. LLMs should maintain knowledge, not just retrieve it. But the substrate matters. Markdown files are a beginning. A formal knowledge graph with inference rules, contradiction detection, and self-improving retrieval is where this work needs to go.
The question I keep returning to is this: what happens when the systems we build to hold knowledge become capable of reasoning about it? Not in the probabilistic, approximate way that language models reason, but in the formal, deterministic way that logicians have understood for centuries? What changes when your tools can tell you not just what you know, but where your knowledge is broken?
I do not have a complete answer. But I have a system that is beginning to explore it.
$ pip install abbacus-cortex
$ cortex init
GitHub: github.com/abbacusgroup/cortex
grayisnotacolor, Abbacus Group