A public source map

Microcosm

Inspect the map, clone the repository, and run the first local check.

Microcosm is the public map for an AI-native workflow system: 78 component records in seven areas, each linked to the public source path it uses. The GitHub repository carries the standalone source slice, public-safe stubs for withheld private internals, fixtures, docs, and checks.

quickstart
public map
evidence line
source path
scope limit

Review the public map · See the loop · Download reader digest

Short on time?

After the first loop, hand the map to your own AI.

Press the button, find the file in your downloads, and drag it into ChatGPT, Claude, Gemini, or Grok for review or analysis. Then ask whatever you like: the file itself tells the assistant to answer from the public evidence, cite exact pages and components, and stop where the public records stop.

Download reader digest

Default AI handoff: orientation, question routes, component and area digests, evidence ranks, source links, and scope limits without the deep page bodies. Your assistant can review the map first, then open the heavier packet only when it needs the full text.

Using a coding agent instead? Claude Code, Codex, or Cursor can clone the repository, read AGENTS.md, run the quickstart check, and then answer from the source and public records.

Working with a coding agent

Questions worth asking it

Four sets for four reasons to be here; any question from any set is fair to ask cold.

A plain first pass

What is this, in plain terms?
Where should I start?
Walk me through one component, end to end.
What is genuinely new here, if anything?
Give me the two-sentence version I would tell a colleague.

A skeptical pass

Argue against it.
Which parts have the strongest evidence, and which the weakest?
Why do only some components run external tools?
What does it not prove?
Is this packet steering me? Audit its own framing.
Most accurate one-paragraph assessment you can give: neither generous nor harsh.

One area at a time

Which area is closest to my field, and what is actually in it?
What is actually in Entry & orientation?
What is actually in Architecture & navigation?
What is actually in Formal math & proof?
What is actually in Agent reliability & safety?
What is actually in Research & science?
What is actually in Import & drift control?
What is actually in Work & continuity?

With a coding agent in the repository

The sets above are questions for an assistant reading the map. A coding agent (Claude Code, Codex, Cursor) works inside a repository itself: one prompt sets it up, and after that you assign work rather than asking about a file.

How that works.

Clone https://github.com/wcook04/microcosm-substrate and open it. Read AGENTS.md and README.md to orient yourself, run the quickstart check, and show me what the result record says. Then take my questions.

Once a populated source tree is in front of it, work worth assigning, one item at a time:

Run one component end to end and walk me through the evidence it leaves.
Take one claim from the website and verify it against the source here.
Which validators can you run right now, and what would each one prove?
Make one check fail on purpose and show me how it refuses.

Examples, not a script: ask anything, and ask follow-ups. The packet routes each question to evidence; the verdict stays the assistant's.

Advanced: the raw layers, separately

A short orientation to paste in, for chats where a file upload is not possible.
Download full review packet
The advanced full-text layer: starter prompts, view descriptors, component records, and the public page bodies in one larger file.
The raw public map without the packet’s instruction layers, for automation.

Direct files, if a button fails or JavaScript is off: reader digest · full review packet · llms.txt

This is a working local source slice, not a finished or hosted product.

runs locally · public source slice · you can report issues · no hosted service · evidence-scoped claims

Why this is here

A way in, and a way to check it.

Microcosm comes from a larger AI-native system I have been building with AI. I have not put all of it online: it includes live tools and non-public data, and releasing the whole thing at once did not seem sensible or responsible. This is a smaller public source slice instead, built around source-linked components, public-safe stubs, and small synthetic fixtures, so the shape of the work can be inspected without publishing the live working environment.

The idea behind it is that useful AI capability should build up in things you can inspect, like source links, evidence ranks, result records, and stated limits, rather than inside one-off model runs that leave nothing to check. The site is a small, checkable version of that.

It is meant as a way in. You can start from the map, open a component, and read its evidence record, source reference, and the line on what it does not prove, so nothing here asks you to take a summary on trust. The public repository carries the standalone source slice and its local checks.
If something here is confusing, or claims more than it shows, I would rather hear it. Please let me know.

This is the everyday version of derivation before assertion: where the site makes a claim, it tries to give you enough of the source trail to check it yourself, including the point where the public version stops.

First loop

What is here, and what is not, in one pass.

The fastest way to understand Microcosm today is not to read all 78 component records. Clone the repository, run the quickstart check, then follow one public record to its evidence line, source path, and scope limit.

The loop is the useful unit: quickstart → public map → component card → evidence line → source path → scope limit.

How to read this site

The source first, then the map.

This website is the current public map over the source slice. The quickstart is the natural first step; the architecture, evidence, and source pages read better once you have seen the local result record.

The site and repository must agree: the map explains the public slice, and the repository is the source you can inspect and run.

What this is

Seventy-eight components across seven areas.

Each of the seven areas carries component records for the public slice. Formal math and proof has a Lean and Lake proof-witness pipeline, premise retrieval, tactic routing, and verifier-trace repair. Agent reliability and safety replays sabotage, sandbox-escape, prompt-injection, and memory-poisoning cases. Research and science covers finance forecasting, a spatial world model, and a replication rubric. The rest cover source import and drift control, architecture and navigation, work and continuity, and entry and orientation.
The populated source slice is designed to run locally against a folder and write readable state without external model calls. The public repository contains the standalone source slice; non-public internals stay out or appear only as explicit public-safe stubs.
Most components bind to one shared path that runs from a project on disk through to a result you can read.
Each component declares its own evidence: a class, a strength rank from 1 to 5 for how independently it is checked, and a line on what it does not prove. The card gives you that, so you can judge it on the evidence rather than the wording.
Some components run real tools: the Lean prover, Lake, finance statistics, git provenance checks. Others derive their verdict over copied source or a declared contract against small synthetic fixtures. Each card says which, in plain language.
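
As an illustration only, the evidence a card declares could be modelled as a small record. The field names, the class label, and the example values below are assumptions made for this sketch, not the repository's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical shape for the evidence a component card declares."""

    component: str        # component name as shown on its card
    evidence_class: str   # illustrative label; real class names live in the repository
    rank: int             # 1 (fixture supplies the answer) to 5 (check derives the verdict)
    runs_real_tool: bool  # execution marker, a separate signal from the rank
    does_not_prove: str   # the stated scope limit

    def __post_init__(self) -> None:
        # Reject records that fall outside the 1-to-5 scale or omit the scope limit.
        if not 1 <= self.rank <= 5:
            raise ValueError(f"rank must be between 1 and 5, got {self.rank}")
        if not self.does_not_prove.strip():
            raise ValueError("a record must state what it does not prove")


def card_summary(record: EvidenceRecord) -> str:
    """Render the one-line reading of a card: class, rank, marker, scope limit."""
    marker = "runs a real tool" if record.runs_real_tool else "no external tool"
    return (
        f"{record.component}: class={record.evidence_class}, "
        f"rank={record.rank}/5, {marker}; "
        f"does not prove: {record.does_not_prove}"
    )


if __name__ == "__main__":
    # Entirely made-up values, used only to show the shape.
    example = EvidenceRecord(
        component="example-component",
        evidence_class="fixture-shape",
        rank=1,
        runs_real_tool=False,
        does_not_prove="behaviour beyond fixture shape",
    )
    print(card_summary(example))
```

Keeping the scope limit as a required field mirrors the point above: a record without a "does not prove" line cannot be built at all.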

What it is not. It is not a hosted agent service, not a production security product, and not a maturity score.

Reads as: 78 components
Means: 78 public component contracts, each with its own evidence line
Does not mean: product maturity, or whole-system correctness
To check it: open a component card, read its evidence class and rank, then the line on what it does not prove

What's real

What runs, and how it's checked.

Some components execute real tools or runtimes: the Lean prover and Lake, finance statistics code, git provenance checks, or copied modules run in process. Others derive an independent verdict over a public contract and small fixtures. These are separate signals: the rank tells you how independent the verdict is; the execution marker tells you whether the public witness invokes extra machinery.

How to read the rank. Rank measures verdict independence: how far a check can reach its own pass or fail rather than echo an answer a fixture handed it. It does not measure how much machinery a component runs or how mature it is. The scale runs from 1, where the fixture supplies the answer, to 5, where the check derives the whole verdict with nothing fed to it. A component can run real tools and still sit at a bounded rank; a validator with no tool can rank higher when it checks more of its own contract unaided.

Running a real tool and earning a high rank are separate signals. A real-tool run is bounded to a small scope on purpose, so it is capped at 4: a bounded run proves less than a check that derives its own verdict. The evidence page ranks all 78 and says what each class checks.
A fixture-bound replay is not a product substitute. It is the public-safe form of a mechanism that should not be disclosed as live private-system authority: it exposes the mechanism, the boundary, and the refusal, not a running service.
The weakest class is honest about being weak: it confirms a fixture is well-formed. That is shape, not behaviour, and the components that carry it say so on their own card.

A class and a rank describe how a component checks its own contract and fixtures. They do not claim whole-system correctness, live freshness, or anything past the component's stated scope. Each component holds that line in its own “Does not prove” text.
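
The two rank rules stated in this section (a 1-to-5 scale, and a cap of 4 for real-tool runs) can be written down as a small consistency check. This is a sketch under assumptions: the function name and message wording are hypothetical, and the repository's own validators may work differently.

```python
RANK_MIN, RANK_MAX = 1, 5
REAL_TOOL_CAP = 4  # a bounded real-tool run is capped at 4, per the text above


def rank_problems(rank: int, runs_real_tool: bool) -> list[str]:
    """Return human-readable problems for a declared rank; an empty list means consistent."""
    problems: list[str] = []
    if not RANK_MIN <= rank <= RANK_MAX:
        problems.append(f"rank {rank} is outside the {RANK_MIN}-{RANK_MAX} scale")
    elif runs_real_tool and rank > REAL_TOOL_CAP:
        problems.append(
            f"rank {rank} exceeds the cap of {REAL_TOOL_CAP} for a real-tool run"
        )
    return problems


if __name__ == "__main__":
    # The execution marker and the rank are separate signals:
    print(rank_problems(4, runs_real_tool=True))   # consistent
    print(rank_problems(5, runs_real_tool=False))  # consistent: no tool, full derivation
    print(rank_problems(5, runs_real_tool=True))   # flagged: above the real-tool cap
```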

Claim audit

When an agent says it's done.

The Agent Claim Audit Casebook is a small evidence companion released with this site: redacted, replayable cases where an agent completion note or a monitor sign-off sounds finished, and the evidence records say something smaller. Each case runs a deliberately shallow reader next to the evidence check, so you can see exactly which claims survive and which are lowered or blocked.

One case is a real agent trace whose "committed, clean" completion note lacked a final delta and any terminal validation; the claim is lowered, not accepted.
One plants fabricated commit, ledger, and test-pass claims in a fixture completion note: real git and pytest witnesses block each one with a typed error code.
One takes a monitor's "coverage complete, cleared" sign-off and reduces it to what its probes actually backed. A verdict is not authority; its evidence is.
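
The pattern in these cases (a claim is accepted only when a witness backs it, lowered when no evidence exists, and blocked with a typed code when a witness contradicts it) can be sketched as follows. This is not the casebook's implementation: the claim kinds, the error-code names, and the stand-in witnesses are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    ACCEPTED = "accepted"
    LOWERED = "lowered"   # evidence missing: the claim is reduced, not accepted
    BLOCKED = "blocked"   # a witness contradicted the claim


@dataclass(frozen=True)
class Claim:
    kind: str  # hypothetical labels such as "commit" or "test_pass"
    text: str


@dataclass(frozen=True)
class Finding:
    claim: Claim
    verdict: Verdict
    error_code: Optional[str]  # hypothetical typed code; None when accepted


# A witness inspects real evidence and returns True only if it backs the claim.
Witness = Callable[[Claim], bool]


def audit(claims: list[Claim], witnesses: dict[str, Witness]) -> list[Finding]:
    """Check each claim in a completion note against the witness for its kind."""
    findings: list[Finding] = []
    for claim in claims:
        witness = witnesses.get(claim.kind)
        if witness is None:
            findings.append(Finding(claim, Verdict.LOWERED, "E_NO_WITNESS"))
        elif witness(claim):
            findings.append(Finding(claim, Verdict.ACCEPTED, None))
        else:
            code = f"E_{claim.kind.upper()}_UNBACKED"
            findings.append(Finding(claim, Verdict.BLOCKED, code))
    return findings


if __name__ == "__main__":
    note = [
        Claim("commit", "committed, clean"),
        Claim("test_pass", "all tests pass"),
        Claim("ledger", "ledger updated"),
    ]
    # Stand-in witnesses; a real check might query git or a pytest result here.
    stand_ins: dict[str, Witness] = {
        "commit": lambda claim: False,
        "test_pass": lambda claim: True,
    }
    for finding in audit(note, stand_ins):
        print(finding.claim.text, "->", finding.verdict.value, finding.error_code)
```

The design choice worth noting is that the completion note never decides its own verdict: only the witness result does, which is the sense in which a verdict is not authority but its evidence is.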

The casebook also records what it refused: candidates rejected for publication safety, candidates deferred as redundant, and a selection index over recent agent sessions, so the cases are picked by criteria rather than chosen to flatter. It does not claim general agent honesty, deception detection, or production monitoring coverage; each case states what it does not prove. The casebook is available as a download alongside this site.

Why this shape

Why a slice, and not the whole thing.

The fuller system is not all here, and that is on purpose. Putting its live parts and private data online at once did not seem sensible or responsible, so the public version shows the shape of the work in a form you can inspect on your own.

It is standalone code you can run, with non-secret source copied and checked against where it came from, and result records over small synthetic fixtures.
The fixtures are there for a reason: they are how a piece of the work can be shown without shipping the live system it normally runs against. Where a component only checks the shape of a fixture, its card says so.
You can inspect the public map with no special access: run the quickstart, read the ranks, open the source paths, and follow the records.

If seeing more of the fuller system would be useful to you, and you have the background for it, the contact routes are below.

Explore by area

Seven areas, one map.

The 78 components are grouped into seven areas, and each one leads into the source behind it. The one that matches why you came is the natural way in. The strongest material is near the top; the lighter, fixture-bound pieces say so on their own card.

Formal math & proof

Pieces of a proof pipeline you can open and run: premise retrieval over a copied Lean Std index, tactic routing, verifier-trace repair, claim separation. Three components run the real Lean prover locally on bounded examples; the rest release the checking layers as contracts you can open.

Lean proof witness · Verifier trace repair · Premise retrieval

Architecture & navigation

The shared path every component binds to, and the pattern rules and routing that give the system its shape and let you move through it.

Pattern binding · Routing plane · Standards diagnostics

Import & drift control

The boundary that copies non-secret material into the public tree, checks it against its origin, and flags anything that has drifted.

Source projection import · Source-drift checks · Drift control room
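
One common way to check a copy against its origin is to compare content hashes. The sketch below assumes that approach; the function names and directory layout are hypothetical, and the repository's own drift checks may use a different mechanism.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file's bytes so two copies can be compared without diffing them."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def drifted_files(origin_root: Path, public_root: Path, relative_paths: list[str]) -> list[str]:
    """Return the relative paths whose public copy is missing or differs from its origin."""
    drifted: list[str] = []
    for rel in relative_paths:
        origin = origin_root / rel
        copy = public_root / rel
        if not origin.is_file() or not copy.is_file():
            drifted.append(rel)  # one side is missing: treat as drift
        elif sha256_of(origin) != sha256_of(copy):
            drifted.append(rel)  # both exist but the content differs
    return drifted
```

For example, `drifted_files(Path("origin"), Path("public"), ["a.txt", "b.txt"])` would list only the files that no longer match, in the order they were given.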

Work & continuity

How reversible work is recorded, how landing decisions are made, and how a detached run picks back up where it left off.

Work transactions · Landing replay · Continuity runtime

Research & science

Replays of scientific and forecasting work, run over synthetic fixtures. One runs real finance statistics code; the rest check the shape of a result rather than producing it.

Finance forecast evaluation · Spatial world model · Replication rubric

Agent reliability & safety

Agent failure modes you can open and inspect, replayed over fixtures: sabotage, sandbox escape, prompt injection, memory poisoning. These are specimens, not live defences.

Sabotage monitor · Sandbox-escape replay · Prompt-injection policy

Entry & orientation

How a newcomer first meets the system and follows a short guided path through it.

Cold-reader route map · Guided walkthrough

Browse the docs

Open the documentation to browse every area, the quickstart, and how the system fits together.

Go to the docs

Optional context

Wider-system views, mapped back to evidence.

Microcosm is the public map of a wider system I have been building. These globally blurred stills are optional context from its frontend, the working console its backend serves. The public artifact is still the map above: each still only opens the part of the public map it points at. Additional orientation, code-map, component, and inspector views are held until their source-capability receipts exist. The underlying source footage stays private.

Blurred still of the system's frontend: an agent-trace workbench, a dark console with a live fleet of streaming panels under a top navigation bar. The detail is blurred out; only the layout is visible.

Work in motion

See how the system watches its agents run

An agent-trace workbench: a live fleet of runs streaming in. Opens the agent-observability runtime in the public map.

Open in the map

Blurred still of the system's frontend: a detail view, with a list of records on the left and an inspector panel on the right. The detail is blurred out; only the layout is visible.

Domain execution

See a research workload running

A market-intelligence view: records on the left, an inspector on the right. Opens the finance-forecast evaluation spine in the public map.

Open in the map

Open a view to land on its node in the public map. There you can select any node to inspect it, then open it (double-click, or press Enter) to follow it into the component, its evidence record, and the source on GitHub. The blur is baked into the image, not a filter that can be switched off. Additional code-map, orientation, inspection, and component views are prepared privately and added only when their source-capability receipts exist.

Review it

Built to be read sceptically.

It is worth reading sceptically, in both directions: a rank can overstate a component, but the plain wording can also undersell one. The system is built to be held to that standard, so it seems fair to do the same. A short route:

Or hand it to your own assistant and ask it to argue against me:

Read this repository as a calibration reviewer. First, by area, name the strongest components and what each actually does, with its evidence class. Then judge both directions: where the wording claims more than the evidence shows, and where it undersells what a component demonstrates. Do not reduce the system to a set of projections; computed projection is one evidence class among several.

What would help

If you think this is worth it.

I'm not selling anything and there is nothing to sign up for. If Microcosm seems worth your time, here is what actually helps.

The model time this runs on comes out of a student budget I cover myself, so it is the real limit on how fast the work moves. Feedback is still the most useful thing you can give; if you would rather help with the compute side, the email below is the way to do it.

Source & contact

Open the source, or get in touch.

The public repository is up now with the standalone source slice, docs, fixtures, and checks. You can reach me directly below.

GitHub repository: Standalone source slice, docs, fixtures, and checks. (Open repo)
Documentation: Quickstart, architecture, whole-system map, and the seven areas. (Read docs)
Whole-system map: The architecture map, drawn from the public source files behind it. (Open map)
Source, license & provenance: Apache-2.0 terms, authorship, the no-affiliation note, and where the public slice ends. (Open reference)
Security policy: Use the public repository policy for vulnerability reports or source-site issues. (Open policy)
Repository discussions: Use public repository issues or discussions for questions, critique, or collaboration. (Open GitHub)
Email: Reach me at williamwkcook+microcosm@gmail.com with questions, feedback, or anything you think I have got wrong. (Send email)
LinkedIn: Will Cook, who builds and maintains Microcosm. (Open profile)

Public security contact: repository security policy · GitHub source

Every area names the public source path it maps and the evidence record that bounds it. The website, reader digest, and repository should agree on the public source slice and its scope limits. If you are pointing an automated agent at Microcosm today, give it the repository, public packets, and this site.
