LLM Wiki — Andrej Karpathy
Source: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
A pattern for building personal knowledge bases using LLMs.
Core Idea
Instead of RAG (re-deriving knowledge from raw documents on every query), the LLM incrementally builds and maintains a persistent wiki: structured, interlinked markdown files sitting between the user and the raw sources. Knowledge is compiled once and kept current.
The wiki is a persistent, compounding artifact. Cross-references are already in place, contradictions are already flagged, and the synthesis reflects everything read so far.
The human curates sources and asks questions. The LLM does summarizing, cross-referencing, filing, and bookkeeping.
Architecture — Three Layers
- Raw sources — Immutable curated documents. LLM reads but never modifies.
- The wiki — LLM-generated markdown files. LLM owns entirely — creates, updates, maintains cross-references.
- The schema — Configuration (e.g. CLAUDE.md) that tells the LLM how the wiki is structured, its conventions, and its workflows. Co-evolved with the user.
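The schema file can be as simple as a short markdown document the LLM reads before touching the wiki. A minimal sketch of what such a CLAUDE.md might contain (the specific conventions below are illustrative, not from the source):

```markdown
# Wiki conventions (read before any operation)

- All wiki pages live in wiki/ as lowercase-hyphenated .md files.
- index.md lists every page with a link and a one-line summary; update it on every ingest.
- log.md is append-only, one entry per action: "## [YYYY-MM-DD] action | Title."
- Link related pages with [[wikilinks]]; prefer linking over duplicating content.
- Never modify anything under sources/ — raw documents are immutable.
```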
Operations
- Ingest: New source → LLM reads, discusses takeaways, writes summary, updates index, updates entity/concept pages, appends to log. A single source may touch 10–15 pages.
- Query: Search relevant pages via index, synthesize answer with citations. Good answers filed back as new wiki pages — explorations compound.
- Lint: Health-check for contradictions, stale claims, orphan pages, missing cross-references, data gaps.
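The lint pass is easy to mechanize. A minimal sketch in Python, assuming pages are passed in as a `{filename: content}` mapping and that cross-references use `[[wikilink]]` syntax (both assumptions; the source does not specify a link format). It implements two of the listed checks, orphan pages and missing link targets:

```python
import re

# Matches the target of [[Page]], [[Page|alias]], and [[Page#section]] links.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def lint(pages: dict[str, str]) -> dict:
    """Health-check a wiki given {filename: markdown content}.

    Returns orphan pages (no inbound links, excluding index.md/log.md)
    and broken links (targets that do not exist as files).
    """
    # Outbound link targets per page, normalized to filenames.
    links = {name: {m.strip() + ".md" for m in WIKILINK.findall(text)}
             for name, text in pages.items()}
    linked_to = set().union(*links.values()) if links else set()
    orphans = [p for p in pages
               if p not in linked_to and p not in ("index.md", "log.md")]
    broken = sorted(t for targets in links.values()
                    for t in targets if t not in pages)
    return {"orphans": orphans, "broken_links": broken}
```

Contradiction and staleness checks are semantic and stay with the LLM; the mechanical checks above are cheap enough to run on every ingest.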
Special Files
- index.md — Content-oriented catalog. Each page with link, one-line summary. LLM reads index first to find relevant pages. Works well at ~100 sources / hundreds of pages.
- log.md — Chronological append-only record. Format:
## [YYYY-MM-DD] action | Title.
Use Cases
Personal tracking, research depth, book reading (companion wiki), team knowledge, competitive analysis, due diligence, trip planning, hobbies.
Why It Works
The tedious part of knowledge bases is bookkeeping, not reading or thinking. Humans abandon wikis because the maintenance burden outgrows the value. LLMs don't get bored and can touch 15 files in one pass, so the maintenance cost drops to near zero.
Related to Vannevar Bush’s Memex (1945) — personal curated knowledge with associative trails. Bush couldn’t solve who does the maintenance. The LLM handles that.
Tooling
- Obsidian Web Clipper for source collection
- Obsidian Graph View for visualization
- Dataview plugin for dynamic queries over frontmatter
- qmd for scaled search (BM25/vector/LLM reranking)
- Marp for slide decks from wiki content
- Git for version history