A public source map

Microcosm

Inspect the map, clone the repository, and run the first local check.

Microcosm is the public map for an AI-native workflow system: 78 component records in seven areas, each linked to the public source path it uses. The GitHub repository carries the standalone source slice, public-safe stubs for withheld private internals, fixtures, docs, and checks.

  1. quickstart
  2. public map
  3. evidence line
  4. source path
  5. scope limit

Short on time?

After the first loop, hand the map to your own AI.

Press the button, find the file in your downloads, and drag it into ChatGPT, Claude, Gemini, or Grok for review or analysis. Then ask whatever you like: the file itself tells the assistant to answer from the public evidence, cite exact pages and components, and stop where the public records stop.

Download reader digest

Default AI handoff: orientation, question routes, component and area digests, evidence ranks, source links, and scope limits without the deep page bodies. Your assistant can review the map first, then open the heavier packet only when it needs the full text.

Using a coding agent instead? Claude Code, Codex, or Cursor can clone the repository, read AGENTS.md, run the quickstart check, and then answer from the source and public records.

Questions worth asking it

Four sets for four reasons to be here; any question from any set is fair to ask cold.

A plain first pass
  • What is this, in plain terms?
  • Where should I start?
  • Walk me through one component, end to end.
  • What is genuinely new here, if anything?
  • Give me the two-sentence version I would tell a colleague.
A sceptical pass
  • Argue against it.
  • Which parts have the strongest evidence, and which the weakest?
  • Why do only some components run external tools?
  • What does it not prove?
  • Is this packet steering me? Audit its own framing.
  • Give the most accurate one-paragraph assessment you can: neither generous nor harsh.
One area at a time
  • Which area is closest to my field, and what is actually in it?
  • What is actually in Entry & orientation?
  • What is actually in Architecture & navigation?
  • What is actually in Formal math & proof?
  • What is actually in Agent reliability & safety?
  • What is actually in Research & science?
  • What is actually in Import & drift control?
  • What is actually in Work & continuity?
With a coding agent in the repository

The sets above are questions for an assistant reading the map. A coding agent (Claude Code, Codex, Cursor) works inside a repository itself: one prompt sets it up, and after that you assign work rather than asking about a file.

Clone https://github.com/wcook04/microcosm-substrate and open it. Read AGENTS.md and README.md to orient yourself, run the quickstart check, and show me what the result record says. Then take my questions.

Once it has a populated source tree in front of it, here is work worth assigning, one item at a time:

  • Run one component end to end and walk me through the evidence it leaves.
  • Take one claim from the website and verify it against the source here.
  • Which validators can you run right now, and what would each one prove?
  • Make one check fail on purpose and show me how it refuses.

Examples, not a script: ask anything, and ask follow-ups. The packet routes each question to evidence; the verdict stays the assistant's.

Advanced: the raw layers, separately
  1. A short orientation to paste in, for chats where a file upload is not possible.

  2. Download full review packet

    The advanced full-text layer: starter prompts, view descriptors, component records, and the public page bodies in one larger file.

  3. The raw public map without the packet’s instruction layers, for automation.

Direct files, if a button fails or JavaScript is off: reader digest · full review packet · llms.txt

This is a working local source slice, not a finished or hosted product.

runs locally · public source slice · you can report issues · no hosted service · evidence-scoped claims

Why this is here

A way in, and a way to check it.

Microcosm comes from a larger AI-native system I have been building with AI. I have not put all of it online: it includes live tools and non-public data, and releasing the whole thing at once did not seem sensible or responsible. This is a smaller public source slice instead, built around source-linked components, public-safe stubs, and small synthetic fixtures, so the shape of the work can be inspected without publishing the live working environment.

The idea behind it is that useful AI capability should build up in things you can inspect, like source links, evidence ranks, result records, and stated limits, rather than inside one-off model runs that leave nothing to check. The site is a small, checkable version of that.

This is the everyday version of derivation before assertion: where the site makes a claim, it tries to give you enough of the source trail to check it yourself, including the point where the public version stops.

First loop

What is here, and what is not, in one pass.

The fastest way to understand Microcosm today is not to read all 78 component records. Clone the repository, run the quickstart check, then follow one public record to its evidence line, source path, and scope limit.

  1. Clone the repository.
  2. Run the quickstart check and inspect the result record.
  3. Read the evidence kind, rank, real-tool marker, and the line that says what the pass does not prove.
  4. Inspect one component card, such as the cold-reader route map, and follow its source link.
  5. Read where the public record stops, and check the site never claims past it.
The loop is the useful unit: quickstart → public map → component card → evidence line → source path → scope limit.
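The loop above can be sketched as one record walked field by field. This is a minimal sketch under stated assumptions: `ComponentRecord`, its field names, and the sample values are hypothetical stand-ins for whatever schema the repository's records actually use, and the source path is a placeholder, not a real path.

```python
from dataclasses import dataclass


# Hypothetical shape of one public component record; field names are
# illustrative assumptions, not the repository's actual schema.
@dataclass(frozen=True)
class ComponentRecord:
    name: str
    evidence_kind: str     # e.g. "real-tool run" or "derived verdict"
    rank: int              # verdict independence, 1..5
    runs_real_tool: bool   # the real-tool (execution) marker
    source_path: str       # where the public source lives
    does_not_prove: str    # the scope limit stated on the card


def first_loop(record: ComponentRecord) -> list[str]:
    """Walk one record: component card -> evidence line -> source path -> scope limit."""
    if not 1 <= record.rank <= 5:
        raise ValueError(f"rank out of range: {record.rank}")
    return [
        f"component card: {record.name}",
        f"evidence line: {record.evidence_kind}, rank {record.rank}, "
        f"real tool: {'yes' if record.runs_real_tool else 'no'}",
        f"source path: {record.source_path}",
        f"scope limit: does not prove {record.does_not_prove}",
    ]


if __name__ == "__main__":
    card = ComponentRecord(
        name="cold-reader route map",     # named on the site
        evidence_kind="derived verdict",  # illustrative value
        rank=3,                           # illustrative value
        runs_real_tool=False,             # illustrative value
        source_path="path/to/route_map",  # placeholder path
        does_not_prove="whole-system correctness",
    )
    for step in first_loop(card):
        print(step)
```

The last line printed is the one worth stopping on: it is where the public record says it ends.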

How to read this site

The source first, then the map.

This website is the current public map over the source slice. The quickstart is the natural first step; the architecture, evidence, and source pages read better once you have seen the local result record.

The site and repository must agree: the map explains the public slice, and the repository is the source you can inspect and run.
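One way to make "must agree" concrete is a set comparison over component identifiers. A minimal sketch, assuming each side can be reduced to a set of component IDs; the function names and the IDs in the usage lines are hypothetical, and how the site and repository would actually be enumerated is left open.

```python
def agreement_report(site_ids: set[str], repo_ids: set[str]) -> dict[str, set[str]]:
    """Compare component IDs listed on the site with those present in the repository."""
    return {
        "only_on_site": site_ids - repo_ids,  # mapped publicly, but no source behind it
        "only_in_repo": repo_ids - site_ids,  # source present, but missing from the map
    }


def agrees(site_ids: set[str], repo_ids: set[str]) -> bool:
    """True only when neither side lists a component the other lacks."""
    report = agreement_report(site_ids, repo_ids)
    return not report["only_on_site"] and not report["only_in_repo"]


site = {"route_map", "walkthrough", "drift_checks"}  # hypothetical IDs
repo = {"route_map", "walkthrough"}
print(agrees(site, repo))            # False
print(agreement_report(site, repo))  # {'only_on_site': {'drift_checks'}, 'only_in_repo': set()}
```

Reporting both differences, rather than a bare yes or no, shows which side has drifted.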

What it is not. It is not a hosted agent service, not a production security product, and not a maturity score.
Reads as: 78 components
Means: 78 public component contracts, each with its own evidence line
Does not mean: product maturity, or whole-system correctness
To check it: open a component card, read its evidence class and rank, then the line on what it does not prove

What's real

What runs, and how it's checked.

Some components execute real tools or runtimes: the Lean prover and Lake, finance statistics code, git provenance checks, or copied modules run in process. Others derive an independent verdict over a public contract and small fixtures. These are separate signals: the rank tells you how independent the verdict is; the execution marker tells you whether the public witness invokes extra machinery.

How to read the rank. Rank measures verdict independence, not how much machinery a component runs or how mature it is. It describes how far a check reaches its own pass or fail rather than echoing an answer a fixture handed it, on a scale from 1, where the fixture supplies the answer, to 5, where the check derives the whole verdict with nothing fed to it. A component can run real tools and still sit at a bounded rank; a validator with no tool can rank higher when it checks more of its own contract unaided.
A class and a rank describe how a component checks its own contract and fixtures. They do not claim whole-system correctness, live freshness, or anything past the component's stated scope. Each component holds that line in its own “Does not prove” text.
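The difference between the two ends of the scale can be shown with a toy check. This is an illustration, not the project's code: the task (is a list sorted?), the fixture layout, and the function names are assumptions, and only ranks 1 and 5 are modelled. Neither function runs an external tool, which is why rank and the execution marker are kept as separate signals.

```python
# Toy illustration of verdict independence; not the project's code.
# Task: decide whether a list of numbers is sorted.

def rank1_check(fixture: dict) -> bool:
    # Low independence: the fixture hands over the verdict and the check echoes it.
    return fixture["expected_pass"]


def rank5_check(fixture: dict) -> bool:
    # High independence: the check derives the verdict from the raw data alone.
    data = fixture["data"]
    return all(a <= b for a, b in zip(data, data[1:]))


# A fixture whose stated answer is wrong: [3, 1, 2] is not sorted.
bad_fixture = {"data": [3, 1, 2], "expected_pass": True}
print(rank1_check(bad_fixture))  # True: echoes the mistake
print(rank5_check(bad_fixture))  # False: derives the correct verdict
```

The wrong fixture fools the first check but not the second.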

Claim audit

When an agent says it's done.

The Agent Claim Audit Casebook is a small evidence companion released with this site: redacted, replayable cases where an agent completion note or a monitor sign-off sounds finished, and the evidence records say something smaller. Each case runs a deliberately shallow reader next to the evidence check, so you can see exactly which claims survive and which are lowered or blocked.

The casebook also records what it refused: candidates rejected for publication safety, candidates deferred as redundant, and a selection index over recent agent sessions, so the cases are picked by criteria rather than chosen to flatter. It does not claim general agent honesty, deception detection, or production monitoring coverage; each case states what it does not prove. The casebook is available as a download alongside this site.
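The survive / lowered / blocked outcomes can be sketched as a comparison between what a note claims and what the evidence records support. A minimal sketch under loud assumptions: each claim and its evidence are reduced to a hypothetical integer strength level, and `shallow_reader`, `evidence_check`, and `Outcome` are illustrative names, not the casebook's format.

```python
from enum import Enum


class Outcome(Enum):
    SURVIVES = "survives"
    LOWERED = "lowered"
    BLOCKED = "blocked"


def shallow_reader(claims: dict[str, int]) -> dict[str, Outcome]:
    # Takes the completion note at its word.
    return {name: Outcome.SURVIVES for name in claims}


def evidence_check(claims: dict[str, int], evidence: dict[str, int]) -> dict[str, Outcome]:
    """Map each claim to an outcome; strength levels are a hypothetical integer scale."""
    result = {}
    for name, claimed in claims.items():
        supported = evidence.get(name)
        if supported is None:
            result[name] = Outcome.BLOCKED   # no evidence record at all
        elif supported < claimed:
            result[name] = Outcome.LOWERED   # evidence supports only a smaller claim
        else:
            result[name] = Outcome.SURVIVES
    return result


claims = {"tests pass": 2, "bug fixed": 2, "docs updated": 1}  # illustrative note
evidence = {"tests pass": 2, "bug fixed": 1}                   # illustrative records
for name, outcome in evidence_check(claims, evidence).items():
    print(f"{name}: shallow={shallow_reader(claims)[name].value}, evidence={outcome.value}")
```

Running both readers side by side is the point: the shallow reader lets everything through, so any difference is what the evidence check caught.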

Why this shape

Why a slice, and not the whole thing.

The fuller system is not all here, and that is on purpose. Putting its live parts and private data online at once did not seem sensible or responsible, so the public version shows the shape of the work in a form you can inspect on your own.

If you would find it useful to see more of the fuller system, and have the background to, the contact routes are below.

Explore by area

Seven areas, one map.

The 78 components are grouped into seven areas, and each one leads into the source behind it. The one that matches why you came is the natural way in. The strongest material is near the top; the lighter, fixture-bound pieces say so on their own card.

Formal math & proof

Pieces of a proof pipeline you can open and run: premise retrieval over a copied Lean Std index, tactic routing, verifier-trace repair, claim separation. Three components run the real Lean prover locally on bounded examples; the rest release the checking layers as contracts you can open.

Lean proof witness · Verifier trace repair · Premise retrieval
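For readers who have not used Lean, a bounded example looks something like the block below. It is illustrative and not taken from the repository: the theorem name is made up, and the premise it relies on, `Nat.add_comm` from Lean's core library, is the kind of lemma a premise-retrieval step would need to surface before a proof can close the goal.

```lean
-- Illustrative bounded goal: commutativity of addition on natural numbers.
-- The proof term cites a library premise, Nat.add_comm.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A closed arithmetic fact the kernel checks by evaluation.
example : 2 + 3 = 5 := rfl
```

If Lean accepts the file, the proof is checked; if the cited premise is wrong or missing, Lean reports an error instead.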

Architecture & navigation

The shared path every component binds to, and the pattern rules and routing that give the system its shape and let you move through it.

Pattern binding · Routing plane · Standards diagnostics

Import & drift control

The boundary that copies non-secret material into the public tree, checks it against its origin, and flags anything that has drifted.

Source projection import · Source-drift checks · Drift control room
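A common way to check a copy against its origin is to compare content hashes file by file. This is a generic sketch, not the repository's import boundary: it assumes the origin directory already holds only the non-secret material meant for copying, uses SHA-256 from Python's standard library, and looks in one direction only, so files that exist only in the copy are not flagged.

```python
import hashlib
import tempfile
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def drifted(origin: Path, copy: Path) -> list[str]:
    """Relative paths missing from the copy, or whose bytes differ from the origin."""
    flagged = []
    for src in sorted(p for p in origin.rglob("*") if p.is_file()):
        rel = src.relative_to(origin)
        dst = copy / rel
        if not dst.is_file() or digest(src) != digest(dst):
            flagged.append(rel.as_posix())
    return flagged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        origin, copy = Path(d, "origin"), Path(d, "copy")
        origin.mkdir()
        copy.mkdir()
        (origin / "a.txt").write_text("same")
        (copy / "a.txt").write_text("same")
        (origin / "b.txt").write_text("v1")
        (copy / "b.txt").write_text("v2")  # edited after copying
        print(drifted(origin, copy))       # ['b.txt']
```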

Work & continuity

How reversible work is recorded, how landing decisions are made, and how a detached run picks back up where it left off.

Work transactions · Landing replay · Continuity runtime
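Picking a run back up where it left off is often done with a checkpoint that records completed steps. A minimal sketch of that generic pattern, not the repository's continuity runtime: the step names, the JSON checkpoint file, and `run_steps` are all assumptions for illustration.

```python
import json
import tempfile
from collections.abc import Callable, Iterable
from pathlib import Path


def run_steps(steps: Iterable[tuple[str, Callable[[], None]]], checkpoint: Path) -> list[str]:
    """Run named steps in order, skipping any the checkpoint records as done."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    ran = []
    for name, action in steps:
        if name in done:
            continue                             # completed by an earlier run
        action()                                 # may raise; progress so far is kept
        done.append(name)
        checkpoint.write_text(json.dumps(done))  # record each step as it lands
        ran.append(name)
    return ran


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cp = Path(d) / "checkpoint.json"

        def interrupted() -> None:
            raise RuntimeError("run detached before step b finished")

        try:
            run_steps([("a", lambda: None), ("b", interrupted)], cp)
        except RuntimeError:
            pass
        print(run_steps([("a", lambda: None), ("b", lambda: None), ("c", lambda: None)], cp))  # ['b', 'c']
```

Writing the checkpoint after every step, rather than once at the end, is what lets an interrupted run resume without redoing finished work.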

Research & science

Replays of scientific and forecasting work, run over synthetic fixtures. One runs real finance statistics code; the rest check the shape of a result rather than producing it.

Finance forecast evaluation · Spatial world model · Replication rubric
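Checking the shape of a result, rather than producing it, can be as small as validating a record's fields and ranges. A hedged sketch: the field names, the assumption of a categorical forecast whose probabilities should sum to 1, and the tolerance are all illustrative, not the repository's replay format. Nothing here computes a forecast.

```python
def check_forecast_shape(record: dict, tol: float = 1e-9) -> list[str]:
    """Return shape problems in a forecast record; an empty list means the shape passes."""
    problems = []
    for key in ("horizon", "probabilities"):  # hypothetical required fields
        if key not in record:
            problems.append(f"missing field: {key}")
    probs = record.get("probabilities", [])
    if any(not 0.0 <= p <= 1.0 for p in probs):
        problems.append("probability outside [0, 1]")
    if probs and abs(sum(probs) - 1.0) > tol:
        problems.append("probabilities do not sum to 1")
    return problems


print(check_forecast_shape({"horizon": 5, "probabilities": [0.2, 0.3, 0.5]}))  # []
print(check_forecast_shape({"probabilities": [0.7, 0.6]}))
```

A record can pass this check and still be a poor forecast: the check says the result is well formed, not that it is right.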

Agent reliability & safety

Agent failure modes you can open and inspect, replayed over fixtures: sabotage, sandbox escape, prompt injection, memory poisoning. These are specimens, not live defences.

Sabotage monitor · Sandbox-escape replay · Prompt-injection policy

Entry & orientation

How a newcomer first meets the system and follows a short guided path through it.

Cold-reader route map · Guided walkthrough

Browse the docs

Open the documentation to browse every area, the quickstart, and how the system fits together.

Go to the docs

Optional context

Wider-system views, mapped back to evidence.

Microcosm is the public map of a wider system I have been building. These globally blurred stills are optional context from its frontend, the working console its backend serves. The public artifact is still the map above: each still only opens the part of the public map it points at. Additional orientation, code-map, component, and inspector views are prepared privately and added only when their source-capability receipts exist. The underlying source footage stays private.

Open a view to land on its node in the public map. There you can select any node to inspect it, then open it (double-click, or press Enter) to follow it into the component, its evidence record, and the source on GitHub. The blur is baked into the image, not a filter that can be switched off.

Review it

Built to be read sceptically.

It is worth reading sceptically, in both directions: a rank can overstate a component, but the plain wording can also undersell one. The system is built to be held to that standard, so it seems fair to do the same. A short route:

  1. Start from a component card and its evidence record here.
  2. Run the quickstart check before expecting broad component behaviour.
  3. Read how each component is checked: the 1-to-5 rank and the line on what it does not prove.
  4. Compare a strong component with a weak one. The weak ones say so on their own card; check that they are right to.
  5. Read where the public version stops, and confirm the site never claims past it.

Or hand it to your own assistant and ask it to argue against me:

Read this repository as a calibration reviewer. First, by area, name the strongest components and what each actually does, with its evidence class. Then judge both directions: where the wording claims more than the evidence shows, and where it undersells what a component demonstrates. Do not reduce the system to a set of projections; computed projection is one evidence class among several.

What would help

If you think this is worth it.

I'm not selling anything and there is nothing to sign up for. If Microcosm seems worth your time, here is what actually helps.

  1. Run the review route and tell me where a claim reaches past its evidence, or where the wording undersells what is there.
  2. Open an issue or start a discussion if a component claims more than it proves.
  3. Tell me what I have got wrong, or what you would do differently. I'm a student without industry experience, so a careful outside read of the design, the boundaries, or the writing helps more than anything.

The model time this runs on comes out of a student budget I cover myself, so it is the real limit on how fast the work moves. Feedback is still the most useful thing you can give; if you would rather help with the compute side, the contact routes below are the way to do it.

Source & contact

Open the source, or get in touch.

The public repository is up now with the standalone source slice, docs, fixtures, and checks. You can reach me directly below.

Public security contact: repository security policy · GitHub source

Every area names the public source path it maps and the evidence record that bounds it. The website, reader digest, and repository should agree on the public source slice and its scope limits. If you are pointing an automated agent at Microcosm today, give it the repository, public packets, and this site.