Microcosm
This page

Reference

Components

A generated index over the 78 public components, grouped by the seven source families. Open any one to see what it does, what it explicitly does not prove, how to run it, the evidence behind it, and what it links to. Each field is projected directly from source.

Entry & orientation (2)

Cold Reader Route MapVerifies the first-run guided path so every step names a real command, doc, and evidence.5/5

Does It checks Microcosm's "what do I run first" guided path so that every step on the cold-reader route map is backed by a real command, a public doc reference, and an evidence result record instead of just prose promises. A newcomer can therefore trust that the suggested ten-minute first-run tour is actually wired and honestly labeled, with small verified counts shown as plain accounting rather than success badges.

Scope limit It is projection-only metadata that validates the declared public route contract; it is not route registry control and excludes source-file changes, external model access, launch/public sharing, financial decisions, private-data equivalence, or whole-system correctness.

Run
microcosm cold-reader-route-map run-route-map-bundle --input examples/cold_reader_route_map/exported_cold_reader_route_map_bundle --out receipts/runtime_shell/demo_project/organs/cold_reader_route_map --card

Paper module Cold-Reader Route Map

cold_reader_route_map makes Microcosm's first ten minutes executable. It validates a public route map whose rows bind the first-run sequence to runnable commands, docs refs, result record refs, and scope limits.

Purpose

A cold technical reader should not have to infer the product path from a long README or raw result record tree. The route map answers one question: what should I run first, and what evidence proves that path is wired?

The unusual part is how the validator checks that proof. It does not merely confirm that each route row carries the right fields. It replays every route against real source: each row's command, its docs refs, its result record refs, and the human-readable signals it claims to show are matched against the actual text of copied source modules and public docs. A command whose material tokens do not appear anywhere in that source corpus is blocked, as is a docs ref that does not resolve to a real heading and a result record ref that does not open a pass-status result record. So a route cannot promise a command the system does not actually run, which is the failure mode a hand-written quick-start guide drifts into the moment the commands change underneath it.

The evidence contract is source-open by default: public route cards, route result record bindings, route policy, exported bundle refs, and generated result records carry the system, while secret_exclusion_scan excludes only private source bodies, model-output data, account or browser material, secrets, and account secret-equivalent live-access data. Result record bodies are not inlined; they are represented by body_in_receipt: false plus public runtime refs.

Shape

Public project pathrepo -> .microcosmPublic project path repo -> .microcosmFirst-screen cardclaim frame, first command,evidence legend, exit ruleFirst-screen card claim frame, first command, evidence legend, exit ruleOrdered route rowscommand, result record ref,evidence classOrdered route rows command, result record ref, evidence classScope boundary and scopelimitattached to each rowScope boundary and scope limit attached to each rowReader branchchoose one first actionReader branch choose one first actionSafety/evalsstatus, authority,workingnessSafety/evals status, authority, workingnessHiring reviewerfirst-screen card,legibility scorecardHiring reviewer first-screen card, legibility scorecardPeer developertour, observe,explain or compilePeer developer tour, observe, explain or compileDrilldownstour, status, explain,observe, compile, serveDrilldowns tour, status, explain, observe, compile, serveResult records and route refspublic refs only;body_in_receipt falseResult records and route refs public refs only; body_in_receipt false
Diagram source
flowchart LR subgraph Entry["First-screen entry"] Project["Public project path repo -> .microcosm"] First["First-screen card claim frame, first command, evidence legend, exit rule"] end subgraph Accounting["Route-map accounting"] Route["Ordered route rows command, result record ref, evidence class"] Ceiling["Scope boundary and scope limit attached to each row"] end subgraph ReaderBranch["Reader branch"] Branch["Reader branch choose one first action"] Safety["Safety/evals status, authority, workingness"] Hiring["Hiring reviewer first-screen card, legibility scorecard"] Developer["Peer developer tour, observe, explain or compile"] end subgraph Boundary["Proof boundary"] Drilldown["Drilldowns tour, status, explain, observe, compile, serve"] Result record["Result records and route refs public refs only; body_in_receipt false"] end Project --> First First --> Route Route --> Ceiling Route --> Branch Branch --> Safety Branch --> Hiring Branch --> Developer Safety --> Drilldown Hiring --> Drilldown Developer --> Drilldown Ceiling --> Result record Drilldown --> Result record

Reader Evidence Routing

Start with core/paper_module_capsules.json::paper_modules[13:paper_module.cold_reader_route_map], then read the generated JSON projection for the resolved relationships. A diagram view is generated for this module and an atlas card entry is available. The route-map fixture, exported bundle, source-module manifests, and temporary result records are evidence for replay shape. This Markdown gives cold readers the interpretation order, source-linked only.

Prior Art Grounding

This component is grounded in documentation systems that treat reader state and task shape as first-class. Diataxis separates tutorials, how-to guides, reference, and explanation so readers are not forced through one undifferentiated documentation pile. Knuth's literate programming is an older anchor for the idea that executable systems should be written for human comprehension as well as machine execution.

Microcosm borrows the reader-route pattern: first command, result record ref, evidence class, scope boundary, scope limit, and next drilldown are ordered for a cold reader. It does not make the route map source authority or substitute documentation sequence for validator evidence.

Reader-Specific Evidence Routing

The route map should make the evidence-count frame visible before the reader chooses a drilldown. Honest counters are not progress badges:

  • A safety/evals engineer follows microcosm status --card, microcosm authority --card, and microcosm workingness --card first. The useful question is whether each claim names its evidence class, validator, failure mode, and scope limit.
  • A hiring reviewer follows the first-screen card and legibility scorecard first. The useful question is whether small verified counts are framed as honest proof boundaries instead of hidden or inflated.
  • A peer developer follows microcosm tour --card, microcosm observe --card, and then full microcosm observe, microcosm compile, or microcosm explain as drilldowns. The useful question is whether a fresh clone can reproduce the route/work/event/evidence chain locally without opening full event rows first.

The route map must therefore preserve both the command order and the evidence interpretation order: command, result record ref, evidence class, scope boundary, scope limit, then deeper route. Reader-specific branches may hide other branches, but they may not hide the accounting frame that prevents "1 verified import" from being read as either failure or marketing.

One-Screen Handoff Contract

The route map consumes the first-screen card as the handoff, not as another route row. A cold reader should see this sequence:

  1. First-screen card: claim frame, microcosm hello <project>, shared proof, evidence legend, structural join, reader rail, and exit rule.
  2. Route map: the accepted command order, with result record refs and scope limits attached to each command.
  3. Reader branch: one audience-specific first action, one proof surface, one success criterion, and one next drilldown.

The handoff fails when the first screen turns into a complete route inventory, or when the route map assumes the reader already understands evidence classes. The first screen should compress; the route map should sequence; the reveal should demonstrate the path against public result records.

Comparison-Backed Route Rows

Each route row should make the unusual discipline visible by naming the normal failure mode it is avoiding. The route map is not just a command list; it is a sequence of claim-boundary checks:

Route row fieldFailure avoidedRequired reader cue
command_refProse-only claims about what runs.Show the exact local command before the claim it supports.
receipt_refTrusting generated summaries as source authority.Point to the result record or validator that bounds the row.
evidence_classTreating all evidence as equal proof.Label body import, subprocess witness, projection, validator, or fixture evidence.
anti_claimLetting a successful demo imply launch, production, provider, or proof authority.State the forbidden read beside the positive claim.
failure_mode_refGovernance looking like abstract ceremony.Name the concrete overclaim or missing-standard case this row catches.

Rows that omit the comparison cue are still technically navigable, but they make the rigor invisible to a cold reader. The validator should prefer a shorter row with command, result record, class, scope boundary, and failure mode over a longer row that lists more components without explaining what each boundary prevents.

Observable Drilldown Order

Browser-first readers follow the same route map as terminal-first readers. The route order is compressed, not replaced:

  1. First-screen card or compact browser board.
  2. microcosm tour --card <project> as the shared behavior proof.
  3. Selected route plus work/event/evidence refs.
  4. Compact observatory view for the same route.
  5. Full route map, result records, standards, and raw JSON drilldowns.

The compact observatory row must carry the same command ref, result record ref, evidence class, scope boundary, and scope limit as the terminal route row. If the browser board cannot show those fields, it is a preview only and cannot serve as the cold-reader route handoff.

readme_onboarding_route is the selected route only for projects with a README; folders without one still get a route/work/event/evidence path through the selected route emitted by tour and compile.

Each route card must include a command and public docs refs. Each route id must also resolve to at least one result record ref. The sequence must be ordinal sorted so the public entry does not drift into a bag of impressive but unordered components.

Validation

The fixture observes negative cases for missing command refs, missing result record refs, route sequence gaps, launch/provider overclaims, and private source body fields. The exported bundle omits negative cases and validates the real runtime shape used by microcosm run, with synthetic result record stand-ins explicitly disallowed as product evidence. If focused validation reports an exact-copy source-module body mismatch, route that repair through microcosm_exact_copy_refresh; do not treat this Markdown projection as the authority for copied source bodies.

Validation Result record Path

From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:

The focused pytest file is the proof consumer for this Markdown section. It asserts the fixture status, ten-route command and result record-ref counts, front-door route order, expected negative cases, route-source replay support, exported bundle shape, copied source-module digest and anchor matches, source-open fixture-manifest counts, no source bodies in public result records, streamed line-count and digest handling, and fresh exported-bundle card reuse. The corpus check verifies that this page remains in the 98-module paper-module set and that the JSON bundle, generated Mermaid, Atlas card, and Markdown projection stay mutually consistent.

These result records validate the route-map fixture, exported bundle result records, copied cold-entry evidence, and paper-module corpus membership only; they do not grant route registry control, external model service, source-file changes, launch-scope decision, private-data equivalence, financial decisions, publishing-scope decision, hosted readiness, or whole-system correctness.

Scope boundary

Scope limit

This module covers public cold-reader route-map validation: command refs, result record refs, ordinal route sequencing, evidence classes, scope boundaries, scope limits, exported-bundle provenance, copied cold-entry evidence, and negative cases for missing refs, sequence gaps, overclaims, and private body fields.

The ceiling stops before route-registry source authority, live session inspection, external model service, source-file changes, hosted readiness, launch, public sharing, private-data equivalence, or whole-system correctness. The route map can tell a cold reader what to run first and which result record bounds that run; it cannot promote the docs sequence into proof beyond those public fixtures and result records.

Scope limit

This component is projection-only metadata. It is not route registry control, it does not change source files projects, it does not use external model services, and it excludes launch, public sharing, trading or financial decisions, private-data equivalence, or whole-system correctness claims.

Source and projection details
Source-Open Body Floor

The source-open body floor is the public route-map fixture, route card set, route policy, exported cold-reader route-map bundle, source-module manifests, and generated result records. It carries public refs, digests, route ids, result record refs, evidence classes, scope boundaries, scope limits, and body_in_receipt: false markers instead of inlining private source or live state.

The floor excludes private source bodies, model-output data, account or browser material, browser or HUD state, account secret-equivalent live-access data, recipient state, and route-registry mutation authority. A reader can inspect the route map and exported bundle to reproduce the first-run sequence, but the bundle remains evidence for public replay shape rather than launch or production authority.

Public Reveal WalkthroughBinds the first-time reader tour to evidence so each count leads to a source.4/5Runs real tools

Does This checks the short guided tour Microcosm advertises for a first-time reader and binds it to real public evidence: the declared route through patterns, work, events, and evidence must still point to result records, and the exported reveal bundle must carry digest-verified copies of the public source bodies that back the walkthrough. The card remains bounded, but each impressive-looking count leads to a source-body witness instead of stopping at marketing copy.

Scope limit It authorizes only bounded public reveal runtime behavior and a digest-verified public body-import witness; it excludes launch, hosted deployment, public sharing, recipient work, external model access, secret export, private-data equivalence, Lean/Lake execution, whole-system correctness, or general product authority.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.public_reveal_walkthrough run --input fixtures/first_wave/public_reveal_walkthrough/input --out receipts/first_wave/public_reveal_walkthrough

EvidenceBounded runtime computationevidence 4/5Real runtime result

getting-startedinteresting-partsevaluation

Source Design note · Source atlas

Paper module Public Reveal Walkthrough

public_reveal_walkthrough is the accepted component that makes Microcosm's public reveal executable instead of descriptive.

It validates a ten-minute cold-reader path:

  1. Compile a project into .microcosm/.
  2. Inspect catalog, patterns, and routes.
  3. Explain one route through patterns, standard pressure, work, events, and evidence.
  4. Open the observatory causal chain before raw JSON drilldown.
  5. Run microcosm intake to see the source projection intake cells connected to spine, reveal, and runtime evidence.
  6. Read the result records and scope limit.

The component reads public fixtures from fixtures/first_wave/public_reveal_walkthrough/input/ and exported runtime input from examples/public_reveal_walkthrough/exported_public_reveal_bundle/.

It emits:

  • receipts/first_wave/public_reveal_walkthrough/public_reveal_walkthrough_result.json
  • receipts/first_wave/public_reveal_walkthrough/ten_minute_reveal_board.json
  • receipts/first_wave/public_reveal_walkthrough/public_reveal_validation_receipt.json
  • result records/sign-off/first_wave/public_reveal_walkthrough_fixture_acceptance.json

The reveal path treats microcosm intake as a runtime bridge rather than a private planning note. The command exposes runtime_reveal_import_bridge, keeps formal_math_readiness_extensions visible as a public replacement when its extension board exists, and points back to the source projection intake board without copying private source bodies.

Purpose

A cold reader meeting Microcosm for the first time needs one thing the README cannot give them on its own: proof that the first ten minutes are real and not a tour of screenshots. This component answers a single question. Can a reader who has never seen the system run a short, fixed path from a command to local state, to a route, to the result record and source boundary behind it, with nothing on that path that the system does not actually run?

The validator enforces that path as an accounting floor rather than a narrative. A reveal only passes if it carries at least five steps, four distinct runnable commands, and four evidence refs, and if four overclaim fixtures stay rejected: a launch or hosting claim, a private-data equivalence claim, a step with no evidence ref, and marketing copy with no command behind it. The floor exists because a walkthrough drifts towards a hero pitch the moment it is allowed to. Removing the commands and the result record refs is the easiest way to make a reveal look more impressive and prove less.

The part worth noting is the real-lane witness. The fixture run does not pass on its own paperwork. It is gated on the exported reveal bundle actually running, with its copied source bodies present and digest-verified. If that backing run is missing or blocked, the fixture is marked blocked too, with real_runtime_receipt set to false. So the reveal cannot describe a runnable path while the runnable path is broken underneath it, which is the quiet failure mode of every quick-start guide that says more than it can execute.

Shape

Public Reveal Walkthrough is the source-backed entry membrane for a cold technical reader. It turns the local Microcosm first-run path into a runnable accounting exercise: commands produce local state, routes point at work and events, evidence refs point at result records, and scope limits keep the visual or browser layer from becoming a product or public-sharing claims.

JSON bundleJSON bundleFirst-wave public revealfixture10-minute route + negativecasesFirst-wave public reveal fixture 10-minute route + negative casesExported public reveal bundle5 copied source bodiesExported public reveal bundle 5 copied source bodiesRuntime componentRuntime componentRuntime shell bridgemicrocosm intake + publicreveal viewRuntime shell bridge microcosm intake + public reveal viewmetadata-only result recordsresult, board, validation,sign-offmetadata-only result records result, board, validation, sign-offCold reader routecommand -> route -> evidencerefs -> ceilingCold reader route command -> route -> evidence refs -> ceilingScope limitno launch, hosting, provider,or private-system claimsScope limit no launch, hosting, provider, or private-system claims

Source refs

JSON bundle
paper_module.public_reveal_walkthrough
Runtime component
public_reveal_walkthrough.py
Diagram source
flowchart TD Bundle["JSON bundle paper_module.public_reveal_walkthrough"] Fixture["First-wave public reveal fixture 10-minute route + negative cases"] Bundle["Exported public reveal bundle 5 copied source bodies"] Runtime["Runtime component public_reveal_walkthrough.py"] Shell["Runtime shell bridge microcosm intake + public reveal view"] Result records["metadata-only result records result, board, validation, sign-off"] Reader["Cold reader route command -> route -> evidence refs -> ceiling"] Ceiling["Scope limit no launch, hosting, provider, or private-system claims"] Bundle --> Runtime Fixture --> Runtime Bundle --> Runtime Runtime --> Result records Runtime --> Shell Shell --> Reader Result records --> Reader Reader --> Ceiling

The runtime shape has five bounded inputs:

  • the public reveal fixture under fixtures/first_wave/public_reveal_walkthrough/input;
  • the exported reveal bundle under examples/public_reveal_walkthrough/exported_public_reveal_bundle;
  • the source-module manifest for copied source bodies;
  • the component source and focused tests that enforce command, evidence, and negative-case behavior;
  • the standard and JSON bundle that bind the paper module to the mechanism, source locus, and scope limit.

The proof shape is route-first rather than dashboard-first. A valid reveal shows a command, a selected route, the route explanation through work/events/ evidence, result record refs, evidence-class counts, and the scope boundary beside any impressive total. Generated cards, observatory views, and browser/video boards are presentation layers over that accounting path.

The negative-case shape is part of the floor. launch or hosting overclaims, private-data equivalence, missing evidence refs, and marketing-only reveal material must remain rejected. If those refusals stop appearing, the reveal is no longer bounded enough for a cold reader.

The source-open shape is also bounded. The exported bundle carries five copied public bodies, and the manifest verifies exact-copy relation, digests, material classes, and metadata-only result records.

Evidence/accounting:

  • Bundle authority: core/paper_module_capsules.json::paper_modules[paper_module.public_reveal_walkthrough] sets source_authority: json_capsule, binds the component, binds mechanism.public_reveal_walkthrough.validates_public_reveal_walkthrough, and resolves src/microcosm_core/organs/public_reveal_walkthrough.py.
  • Generated instance: paper_modules/public_reveal_walkthrough.json reports source_authority: json_capsule, Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges_after_atlas_binding, 20 relationship edges, and a resolved paper_module.depends_on.paper_module edge to paper_module.first_screen_composition_root because the reveal path spends the first-screen composition contract before deeper route/evidence drilldown.
  • Runtime and shell consumers: src/microcosm_core/organs/public_reveal_walkthrough.py exposes run, run_reveal_bundle, _source_module_manifest_result, _source_open_body_import_summary, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, and PUBLIC_SAFE_SOURCE_BODY_CLASSES. src/microcosm_core/runtime_shell.py routes the exported reveal bundle through public_reveal_walkthrough.run_reveal_bundle and publishes the public_reveal_view runtime lens.
  • Result record and test floor: receipts/first_wave/public_reveal_walkthrough/public_reveal_walkthrough_result.json, ten_minute_reveal_board.json, public_reveal_validation_receipt.json, and result records/sign-off/first_wave/public_reveal_walkthrough_fixture_acceptance.json are metadata-only evidence. tests/test_public_reveal_walkthrough.py checks the fixture path, exported-bundle path, source-module digest validation, negative cases, and public-relative result record posture.
  • Claim boundary: standards/std_microcosm_public_reveal_walkthrough.json, the generated structured source record, and this page limit the module to public reveal walkability, route/evidence accounting, exact-copy public source-body import evidence, negative-case rejection, and metadata-only result records. They do not include launch operations, hosted deployment, public sharing, recipient work, external model access, secret export, private-system equivalence, source-file changes, Lean/Lake execution, or whole-system correctness.

Source-Backed Mechanism

The source mechanism is mechanism.public_reveal_walkthrough.validates_public_reveal_walkthrough in core/mechanism_sources.json.

The runtime locus is src/microcosm_core/organs/public_reveal_walkthrough.py. The source symbols that matter for cold-agent drilldown are:

  • run
  • run_reveal_bundle
  • _source_module_manifest_result
  • _source_open_body_import_summary
  • EXPECTED_NEGATIVE_CASES
  • AUTHORITY_CEILING
  • PUBLIC_SAFE_SOURCE_BODY_CLASSES

The governing standard is standards/std_microcosm_public_reveal_walkthrough.json. Its paper_module_contract binds this Markdown module to core/paper_module_capsules.json#paper_module.public_reveal_walkthrough and to the mechanism row above.

The atlas source row is intentionally not claimed as complete in this pass: core/organ_atlas.json is the source surface that must later receive paper_module_ref, mechanism_refs, and code_loci for this component. The re-entry capture is cap_quick_public_reveal_atlas_edge_population_wait_147e39c7a896.

Source-Open Body Imports

The exported reveal bundle carries five copied source bodies under examples/public_reveal_walkthrough/exported_public_reveal_bundle/source_modules/. The authority manifest is examples/public_reveal_walkthrough/exported_public_reveal_bundle/source_module_manifest.json.

The copied materials are:

Module idMaterial classWhat it contributes
public_reveal_first_slice_execution_receipt_body_importpublic_macro_receipt_bodyFirst public Microcosm slice validation result record with launch/public sharing/hosting boundaries.
public_reveal_runtime_shell_reorientation_receipt_body_importpublic_macro_receipt_bodySource result record for the shift from result record archive posture to runnable runtime shell posture.
public_reveal_clean_clone_state_fixture_receipt_body_importpublic_macro_receipt_bodyClean-clone fixture repair result record showing self-contained public validation.
public_reveal_public_substrate_boundary_policy_body_importpublic_macro_tool_bodyBoundary policy for source import and excluded material classes.
public_reveal_walkthrough_control_plane_source_body_importpublic_python_source_bodyThe public component source body that validates reveal commands, claims, digest evidence, and metadata-only result records.

All five rows are exact-copy imports, body_in_receipt=false, and digest checks must pass before the exported reveal bundle can count as source-backed. Result records may name refs, hashes, counts, and verdicts; they do not embed copied body text.

First Commands

From microcosm-substrate/, the first fixture command is:

The exported bundle command is:

PYTHONPATH=src python3 -m microcosm_core.organs.public_reveal_walkthrough run-reveal-bundle --input examples/public_reveal_walkthrough/exported_public_reveal_bundle --out receipts/runtime_shell/demo_project/organs/public_reveal_walkthrough --card

Focused regression:

PYTHONPATH=src ../repo-pytest tests/test_public_reveal_walkthrough.py -q --basetemp=/tmp/microcosm-public-reveal-pytest --ignore-host-pressure

Evidence Counts In The Reveal

The reveal board should not ask a cold reader to decode evidence-class numbers from context. When the walkthrough shows source-open body material counts, verified import counts, subprocess witnesses, algorithmic projection counts, or rows with source imports, it should pair each number with the evidence class and the scope boundary:

  • Counts prove that the public route exposes an inspectable accounting surface.
  • Counts do not prove launch-scope decision, whole-system correctness, or equal evidence depth across every component.
  • A small high-authority count is stronger than a large low-authority count for the claim it actually covers.
  • Generated or projected rows are reveal handles; source files, validators, result records, and scope limits remain the proof surfaces.

This keeps the public reveal from becoming a dashboard of impressive totals. The first reveal task is to show how a reader can move from number to result record to source boundary without crossing into private bodies, model-output data, account or browser state, or launch claims.

Reveal First View

The reveal board should open with the same compression grammar as the first-screen card, then widen only after the reader has a route to inspect:

  1. Restate the bounded claim frame.
  2. Show the command that produced the local state.
  3. Show one route explanation with result record refs.
  4. Show the evidence-count legend beside the result record refs.
  5. Show the scope limit before any totals, drilldowns, or observatory links.

This gives video-first or browser-first readers a visible artifact without turning the reveal into a marketing hero. Motion, screenshots, and observatory views are allowed presentation layers only when the same evidence legend, scope boundary, and result record refs remain on the first view.

Discipline In The Reveal

The reveal should make discipline legible as prevented failure, not as a wall of policy labels. Before showing totals or motion, the board should pair each impressive-looking artifact with the boundary that keeps it honest:

Reveal artifactBoundary shown beside itWhat the boundary prevents
Local .microcosm/ statesource_files_mutated=false plus route/work/event/evidence refs.Reading a local demo as source-file changes, hosted launch, or external model service.
Body-import countsverified_macro_body_import rows with validator or result record refs.Reading copied public material as private-system equivalence.
Projection countsSource-coupling and generated-row scope boundaries.Reading generated cards as source authority or domain proof.
Observatory viewsCompact endpoint first, full model as drilldown.Letting browser motion replace command, result record, and evidence-class checks.
Doctrine constraintsFailure mode or scope boundary beside the constraint.Reading governance as ceremony rather than as a specific overclaim guard.

If the reveal cannot show those boundaries on the first view, it should defer the visual flourish and keep the compact result record-backed route visible instead.

Prior Art Grounding

The public reveal path is grounded in first-run CLI and progressive-disclosure practice. The Command Line Interface Guidelines motivate a single runnable command, examples, discoverable next steps, and machine-readable output. Nielsen Norman Group's progressive disclosure pattern motivates showing the bounded first route before expanding into full observatory or JSON drilldowns.

The reveal's evidence walk also borrows from provenance and tracing patterns: W3C PROV for moving from artifact to source and result record, and OpenTelemetry traces for representing causal chains as inspectable linked work. Microcosm applies those patterns to a local walkthrough so the visual board remains evidence accounting, not a launch or maturity claim.

Browser/Video Reveal Board

The reveal board is the public visual candidate for a 60-second walkthrough. It must therefore be more than raw JSON, but it must still be less than a product claim. The first browser/video frame should show:

  1. The command that produced the local state.
  2. The selected route and one-line route reason.
  3. The route explanation through work, events, evidence, and result record refs.
  4. The evidence legend, including evidence class and scope boundary.
  5. The compact observatory or first-screen endpoint used for the board.
  6. The scope limit before totals, motion, or full-model drilldown.

Motion is allowed to make the causal order easier to inspect: command to local state, local state to selected route, selected route to work/event/evidence, and evidence to result record or validator. Motion is not allowed to displace the command, result record/evidence ref, scope boundary, or scope limit from the first view.

The board should end by offering exactly three next steps: reader-specific branch, result record drilldown, and full observatory JSON. That keeps the visual surface from expanding into a second README while still making the public reveal inspectable by readers who will not start in the terminal.

The validated claim is narrow:

> Microcosm turns a repo into a local operating system: patterns, routes, > work transactions, events, evidence, and explanations.

Negative fixtures reject launch or hosting overclaim, private-data equivalence, missing evidence refs, and marketing-only reveal material without runtime commands.

Reader Evidence Routing

  • Start with the first commands and the JSON Bundle Binding to identify the fixture, exported bundle, source record, mechanism row, standard, and result record surfaces.
  • For behavior questions, read src/microcosm_core/organs/public_reveal_walkthrough.py and tests/test_public_reveal_walkthrough.py before trusting this prose.
  • For source-open body questions, read the exported bundle's source_module_manifest.json; it is the evidence for exact-copy relation, digest match, material class, and metadata-only result record posture.
  • For visual or browser walkthrough questions, read the evidence legend, result record refs, scope boundary, and scope limit before reading totals, observatory links, or motion as meaningful.
  • Treat generated atlas docs, generated coverage projections, generated result records, copied-body presence, and browser/video boards as navigation or validation projections. They do not become source authority for launch, hosting, provider, private-system-equivalence, or whole-system claims.

Validation Result record Path

PYTHONPATH=src ./repo-pytest tests/test_public_reveal_walkthrough.py -q --basetemp=/tmp/microcosm_public_reveal_walkthrough_pytest --ignore-host-pressure
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

This paper module describes public reveal walkthrough validation only. It excludes launch, hosted deployment, public sharing, recipient work, external model access, secret export, private-system equivalence, Lean/Lake execution, source-file changes, or whole-system correctness.

Generated atlas docs, generated coverage projections, generated result records, copied-body presence, browser/video boards, and impressive evidence totals are source-linked only. The source authority remains with the standard, bundle, mechanism row, component source, source-module manifest, validators, and result record refs named above.

Scope limit

This module may claim a bounded public reveal walkthrough over the local fixture and exported bundle: runnable commands, selected route explanation, work/event/evidence refs, source-open body import manifest checks, evidence legend, negative-case refusals, metadata-only result records, and scope limits. A diagram view is generated for this module; an atlas card is a staged exercise pending atlas owner-lane binding. One selective dependency remains open and requires a governed bundle update to resolve.

It does not claim launch-scope decision, hosted deployment, publishing-scope decision, recipient work, external model service, secret export, private-system equivalence, Lean/Lake execution, source-file changes, or whole-system correctness. Visual boards, screenshots, observatory motion, copied-body counts, and generated cards remain presentation or navigation projections over the result record path.

Architecture & navigation (10)

Pattern Binding ContractChecks a real pattern catalog for digest, cross-reference, and dependency-cycle integrity.5/5

Does It checks that each declared "pattern" (a reusable bit of system structure) is properly hooked up: it has the required fields, points at real source material by reference, names the rule it answers to and what it explicitly does NOT claim, has no duplicate pattern IDs, and leaks no secrets or private bodies into the public record. It produces a written result record showing the overall pass/fail, which pattern rows are accepted versus rejected and why, and what each row is forbidden from claiming.

Scope limit It validates only the declared public pattern-binding/route-readiness contract; it does not certify the private pattern ledger, public launch or hosted-public posture, public sharing, external model access, private-data equivalence, or whole-system correctness, and it does not turn any mined pattern row into a standalone public leaf (selection stays component-first and fixture-bound).

Run
microcosm pattern-route-readiness validate-bundle --input examples/pattern_binding_contract/exported_route_readiness_bundle --out /tmp/microcosm-pattern-route-readiness

Paper module Pattern Binding Contract

Teleology

pattern_binding_contract is the public root component that binds pattern rows to source-available source bundles, public runtime refs, authority-chain handles, scope boundaries, and secret-exclusion result records. Synthetic rows are allowed only as regression controls or negative cases; they are not product evidence.

Purpose

A mined engineering pattern is a tempting thing to publish on its own. It reads like a self-contained insight, so it is easy to lift a single row out of a private ledger and present it as a finished public claim. This component exists to stop that. It answers one question: can a given pattern row be admitted to the public surface, and if so, under exactly what evidence and what ceiling?

The check is binding rather than display. Every pattern row must name a source bundle that points at a real public runtime ref or regression-harness ref, a governing standard, and an scope boundary. A row that lacks any of these, duplicates another row's id, or claims to be a standalone public leaf is rejected. The same validator runs deliberate negative cases alongside the positive control, so the result record proves not only that good rows pass but that each known failure mode is still caught.

The less obvious idea is truth accounting. When an exported bundle is validated, the component separates rows that merely describe runtime metadata from rows that represent a real pattern-ledger import, and records that a high accepted-row count is not the same as system progress. This guards against the quiet inflation where counting accepted rows starts to read like a measure of how much real work has landed. The route-readiness layer closes the matching gap on the selector side: a row can look selectable in isolation, but it is only admitted through the component that owns it, its fixture contract, and a gate that refuses to let hard no-standalone rows appear as selectable targets.

Public Contract

The validator checks required binding fields, duplicate pattern conflicts, unsupported authority-chain handles, unresolved reference bundles, secret/provider/operator body sentinels, and public-leaf overclaim failures. It emits command-owned result records under receipts/first_wave/pattern_binding_contract/.

The exported system bundle also carries the source route-readiness selector overlays as public source-open bodies: examples/pattern_binding_contract/exported_route_readiness_bundle/. The validator recomputes the selector contract against the imported pattern ledger, route-readiness audit, row-to-component router, route cards, fixture specs, decision matrix, dependency DAG, internal routing graph, and copied source validation report. This closes the old gap where a mined pattern row could look selectable without opening the component bundle that owns it.

Cold readers should use microcosm pattern-route-readiness validate-bundle against examples/pattern_binding_contract/exported_route_readiness_bundle/ when the question is selector admission rather than generic pattern binding. The older pattern-binding validate-route-readiness-bundle action remains a compatibility route to the same validator.

Shape

Pattern rowsid, governing standard,scope boundary, source refs,projection posturePattern rows id, governing standard, scope boundary, source refs, projection postureSource bundlesmetadata-only refs topublic runtime orregression harnessSource bundles metadata-only refs to public runtime or regression harnessAuthority-chain handlesresolver result recordsAuthority-chain handles resolver result recordsrequired fields, duplicateids,bundle resolution,secret-exclusion scanrequired fields, duplicate ids, bundle resolution, secret-exclusion scanDuplicate id rejectedDuplicate id rejectedPrivate body leak rejectedPrivate body leak rejectedPublic-leaf overclaimrejectedPublic-leaf overclaim rejectedUnsupported authorityhandle not upgradedUnsupported authority handle not upgradedTruth accountingruntime-metadata rows vsreal pattern-ledger importTruth accounting runtime-metadata rows vs real pattern-ledger importRoute-readiness selectorcomponent-first admission,fixture contract,hard no-standalone gateRoute-readiness selector component-first admission, fixture contract, hard no-standalone gateResult recordsrefs, digests, counts,verdicts; body text omittedResult records refs, digests, counts, verdicts; body text omittedNegativeNegativeBundleBundle

Source refs

required fields, duplicate ids, bundle resolution, secret-exclusion scan
pattern_binding_contract
Diagram source
flowchart LR subgraph Inputs["Pattern-binding inputs"] Patterns["Pattern rows id, governing standard, scope boundary, source refs, projection posture"] Bundles["Source bundles metadata-only refs to public runtime or regression harness"] Handles["Authority-chain handles resolver result records"] end Validator["pattern_binding_contract required fields, duplicate ids, bundle resolution, secret-exclusion scan"] subgraph Negative["Refusal floor"] Dup["Duplicate id rejected"] Leak["Private body leak rejected"] Overclaim["Public-leaf overclaim rejected"] Unsupported["Unsupported authority handle not upgraded"] end subgraph Bundle["Exported-bundle path"] Truth["Truth accounting runtime-metadata rows vs real pattern-ledger import"] RouteReadiness["Route-readiness selector component-first admission, fixture contract, hard no-standalone gate"] end Result records["Result records refs, digests, counts, verdicts; body text omitted"] Patterns --> Validator Bundles --> Validator Handles --> Validator Validator --> Negative Validator --> Bundle Truth --> RouteReadiness Negative --> Result records Bundle --> Result records

Evidence Binding

Accepted component row: core/organ_registry.json::implemented_organs[pattern_binding_contract]. Evidence class: core/organ_evidence_classes.json::organ_evidence_classes[pattern_binding_contract] with rank 5 semantic-validator authority. The runtime locus is src/microcosm_core/organs/pattern_binding_contract.py, with focused coverage in tests/test_pattern_binding_contract.py.

Paper bundle authority: core/paper_module_capsules.json#paper_module.pattern_binding_contract. Mechanism source: core/mechanism_sources.json#mechanism.pattern_binding_contract.validates_public_pattern_bindings.

Reader Evidence Routing

Read this module as a public binding membrane for pattern rows, not as a private pattern-ledger certificate or a standalone public-leaf selector. Start with paper_modules/pattern_binding_contract.json for the bundle payload, then open standards/std_microcosm_pattern_binding_contract.json to check the required fields, public/private boundary, source-open body import floor, route-readiness rules, and result record expectations.

Use core/fixture_manifests/pattern_binding_contract.fixture_manifest.json before inspecting fixtures or exported bundles. The manifest and the source_module_manifest.json files name the copied source body floor; result record payloads should carry source refs, digests, anchors, counts, verdicts, and omission result records rather than inlining body text.

Treat route-readiness selection as component-first evidence. A mined pattern row can be selectable only through the route-readiness bundle, selector contract, and result records that keep duplicates, unknown refs, private leakage, missing fixture contracts, dependency cycles, hard no-standalone rows, and companion-overlay gaps rejected.

Prior Art Grounding

This component follows the software pattern-language tradition of making reusable engineering structures explicit, named, and reviewable. The Hillside patterns library is the direct prior-art family for treating patterns as shared vocabulary rather than loose implementation notes.

The binding layer also borrows from provenance and supply-chain attestation patterns. W3C PROV motivates the source/ref/evidence relation shape, while SLSA and in-toto motivate digest-bound artifact claims and step-level metadata. Microcosm applies those ideas to pattern rows and route-readiness selectors, not to launch certification.

Re-entry condition: if copied source bodies, route-readiness overlays, or negative-case rules change, rerun the three first commands above and update this paper module plus standards/std_microcosm_pattern_binding_contract.json from the new result record fields. Do not raise the scope limit from selector and binding validation to launch, public sharing, private-data equivalence, or standalone public-leaf authority.

Validation Result record Path

From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:

These checks validate public pattern-binding fixtures, system-bundle result records, route-readiness selector result records, and metadata-only authority handles only; they do not certify the private pattern ledger, hosted readiness, launch, external model access, private-data equivalence, or whole-system correctness.

The current authority is the runtime result record set under receipts/first_wave/pattern_binding_contract/; do not cite a separate pattern-specific sign-off result record unless an sign-off-lane artifact is actually present. Cold readers should inspect result record fields rather than markdown constants: status, secret_exclusion_scan, source_open_body_imports, truth_accounting, route_readiness_summary, selection_contract, and source_manifest.

Scope boundary

Scope limit

This module covers public pattern-binding mechanics: source-bundle validation, reference-bundle validation, authority-handle validation, route-readiness selector admission, duplicate and unknown-ref rejection, private-leakage sentinel checks, and metadata-only result record shape. It is evidence for the pattern_binding_contract component and mechanism.pattern_binding_contract.validates_public_pattern_bindings.

The ceiling stops before private pattern-ledger authority, hosted or public launch-scope decision, deployment posture, standalone public-leaf selector status, private-data equivalence, external model access, recipient work, source-file changes, publishing-scope decision, or whole-system correctness.

Scope boundary

This module documents public pattern-binding mechanics and regression harnesses. It does not certify the private pattern ledger, public launch operations, hosted-public posture, public sharing, recipient work, external model access, private-data equivalence, or whole-system correctness. Route-readiness import does not make any mined pattern row a standalone public leaf; selection remains component-first and fixture-bound.

Source and projection details
Source-Open Body Floor

The source-open body floor is the imported public bundle, not the private pattern ledger. Cold readers can open examples/pattern_binding_contract/exported_substrate_bundle/ and examples/pattern_binding_contract/exported_route_readiness_bundle/ to inspect the copied source module manifests, source bundles, reference bundles, authority-chain handles, route-readiness overlays, selector contract inputs, and copied source validation report. The required body floor is named by each source_module_manifest.json plus source_capsules.json, reference_capsules.json, and authority_chain_handles.json.

Result records and manifests must stay metadata-only where the standard requires it: they carry refs, digests, anchors, counts, verdicts, omission result records, and secret-exclusion results. They do not inline private source bodies, raw operator payloads, model-output data, recipient data, or hidden pattern-ledger material.

Pattern Assimilation StepVerifies each landed task filed exactly one learning record naming what it changed.5/5

Does When a piece of work lands in the local system, this component checks the completion records for it — confirming that exactly one same-lane "what did we learn from this" decision was filed per landed item, that any claimed refinement names an owner-visible surface and the artifact it changed, that a "nothing to refine" entry carries its required typed fields, and that there are no duplicate or off-lane entries. It runs over fixture data and makes the completion bookkeeping inspectable, showing whether the recorded completions conform to the system's stated learning-from-landed-work rules, rather than leaving that on faith.

Scope limit It validates only the declared public completion contract over synthetic fixture data; it does not ingest private lessons, mutate live ledgers, promote global doctrine, include launch operations or public sharing, make external model access, claim private-data equivalence, or certify public runtime behavior.

Run
PYTHONPATH=src python3 -m microcosm_core.validators.acceptance --only pattern_assimilation_step --input fixtures/first_wave/pattern_assimilation_step/input --out receipts/first_wave/pattern_assimilation_acceptance.json

EvidenceContract validatorevidence 5/5Import validation

architecturenavigationdoctrine

Source Design note · Source atlas

Paper module Pattern Assimilation

pattern_assimilation_step is the public completion-learning contract for landed components. It validates that every component recorded as landed in a fixture set carries exactly one same-lane completion decision, and that the decision resolves to a result record that can be inspected rather than to a phrase.

Purpose

When a development pass claims that local work taught the system something, that claim is usually prose: a note that the run "improved the fixture" or "found nothing to refine". Prose is easy to assert and impossible to check. This component answers a single question: did each landed component actually deposit an inspectable completion decision, or is the learning claim unbacked?

The decision is forced into one of two typed shapes. Either a concrete refinement result record that names the owner surface it changed and the artifact it touched, or a typed nothing_to_refine result record that proves stewardship was checked, the next-best lane was considered, and a re-entry condition was recorded. A landed component with no completion, or with a completion that points at a result record that does not exist or does not match, is rejected. So is a duplicate result record id that would let one lesson be counted twice.

The interesting constraint is the one the component refuses to relax. A local lesson may route to the owner surface that owns the affected artifact, but it may not promote itself into global doctrine. A refinement row that sets claims_global_doctrine_authority is blocked outright. The point is that learning has to land on a specific board with a named steward, not become a free-floating rule, which is the failure mode that turns a useful local note into unsupported general advice.

Route Card

  • Component id: pattern_assimilation_step
  • JSON bundle authority: core/paper_module_capsules.json::paper_module.pattern_assimilation
  • Accepted-component evidence class: semantic_validator
  • Standard: standards/std_microcosm_pattern_assimilation_step.json
  • Validator authority: src/microcosm_core/validators/sign-off.py
  • Fixture manifest: core/fixture_manifests/pattern_assimilation_step.fixture_manifest.json
  • Fixture input: fixtures/first_wave/pattern_assimilation_step/input
  • Runtime bundle: examples/pattern_assimilation_step/exported_assimilation_bundle
  • Primary result records: receipts/first_wave/pattern_assimilation_acceptance.json, receipts/first_wave/pattern_assimilation_receipt.json, and receipts/first_wave/pattern_assimilation_step/exported_assimilation_bundle_validation_result.json
  • Projection posture: the JSON bundle is the paper-module source authority. This Markdown is the cold-reader explanation.

Shape

resolvedmissing, dangling, duplicate, upgradedLanded component rowseach names a completionresult and result record refLanded component rows each names a completion result and result record refRefinement result recordsowner_surface, changedartifactRefinement result records owner_surface, changed artifactNothing-to-refine resultrecordsstewardship, next-best lane,re-entryNothing-to-refine result records stewardship, next-best lane, re-entryValidatorValidatorPre-filter valid resultrecordsrefinement: named owner, nodoctrine upgradenothing: all three fieldspresentPre-filter valid result records refinement: named owner, no doctrine upgrade nothing: all three fields presentPer landed component:exactly one completion,ref resolves to a matchingrow?Per landed component: exactly one completion, ref resolves to a matching row?Acceptedtyped, owner-routedcompletion learningAccepted typed, owner-routed completion learningNegative cases recordedMISSING_PATTERN_ASSIMILATION_CompletionMISSING_REFINEMENT_OWNER_SURFACEDUPLICATE_REFINEMENT_RECEIPT_IDLOCAL_LESSON_AUTHORITY_UPGRADERAW_SEED_BODY_IN_ASSIMILATION_FIXTURENegative cases recorded MISSING_PATTERN_ASSIMILATION_Completion MISSING_REFINEMENT_OWNER_SURFACE DUPLICATE_REFINEMENT_RECEIPT_ID LOCAL_LESSON_AUTHORITY_UPGRADE RAW_SEED_BODY_IN_ASSIMILATION_FIXTUREmetadata-only result recordsmetadata-only result recordsScope limitpublic fixture metadata, nodoctrine changesScope limit public fixture metadata, no doctrine changes

Source refs

Landed component rows each names a completion result and result record ref
organ_landing_summaries.jsonl
Validator
acceptance.pyvalidate_pattern_assimilation
metadata-only result records
receipts/first_wave/pattern_assimilation_*
Diagram source
flowchart TD landings["Landed component rows organ_landing_summaries.jsonl each names a completion result and result record ref"] refinement["Refinement result records owner_surface, changed artifact"] nothing["Nothing-to-refine result records stewardship, next-best lane, re-entry"] validator["sign-off.py validate_pattern_assimilation"] filter["Pre-filter valid result records refinement: named owner, no doctrine upgrade nothing: all three fields present"] match{"Per landed component: exactly one completion, ref resolves to a matching row?"} pass["Accepted typed, owner-routed completion learning"] negatives["Negative cases recorded MISSING_PATTERN_ASSIMILATION_Completion MISSING_REFINEMENT_OWNER_SURFACE DUPLICATE_REFINEMENT_RECEIPT_ID LOCAL_LESSON_AUTHORITY_UPGRADE RAW_SEED_BODY_IN_ASSIMILATION_FIXTURE"] result records["metadata-only result records result records/first_wave/pattern_assimilation_*"] ceiling["Scope limit public fixture metadata, no doctrine changes"] landings --> match refinement --> filter nothing --> filter filter --> match validator --> filter match -->|resolved| pass match -->|missing, dangling, duplicate, upgraded| negatives pass --> result records negatives --> result records result records --> ceiling

The bundle is present, so the cold-reader path starts from core/paper_module_capsules.json::paper_module.pattern_assimilation, not from a legacy-only boundary. That bundle binds this Markdown to the accepted pattern_assimilation_step component, the sign-off.py validator locus, the standard, first-wave fixture manifest, exported assimilation bundle, focused tests, metadata-only result records, and generated Mermaid/Atlas navigation status.

Read the diagram as the validation flow, not an authority upgrade. The validator pre-filters the refinement and nothing-to-refine result records, then walks each landed component row and checks that its declared completion resolves to a matching valid result record; unresolved, missing, duplicate, or doctrine-upgraded rows become recorded negative cases. The ceiling remains public fixture and exported-bundle metadata plus metadata-only result records, with no live ledger mutation, source-file changes, source note ingestion, private-system equivalence, global doctrine changes, launch or publishing-scope decision, behavior-change proof, or whole-system correctness.

First Command

From microcosm-substrate:

Use the exported bundle validator when the question is whether the public source-open body imports still match their declared source bodies:

What It Proves

Pattern assimilation is the public completion-learning contract for landed components. It validates that each landed component in the fixture set has exactly one same-lane completion decision: either a concrete refinement result record naming the owner surface and changed artifact, or a typed nothing_to_refine result record with stewardship checked, next-best-lane checked, and a re-entry condition.

A cold agent should use this component when a pass claims that local work taught the system something. The validator makes that claim inspectable: it checks owner-surface evidence, duplicate result record ids, off-lane completions, missing completion decisions, residual lifecycle posture, and attempts to promote a local lesson into global doctrine authority without the governing lane.

Bundle-Bound Reader Shape

The paper-module bundle binds this Markdown to two explained subjects: pattern_assimilation_step and mechanism.pattern_assimilation_step.validates_public_pattern_assimilation_step. It also carries the route-contract concept concept.architecture_and_navigation_route_contract_bundle.

The executable locus is src/microcosm_core/validators/sign-off.py, specifically validate_pattern_assimilation, run_assimilation_bundle, validate_source_module_manifest, _write_jsonl_upsert, EXPECTED_NEGATIVE_CASES, PATTERN_ASSIMILATION_AUTHORITY_CEILING, and main.

Its law edges are bounded to the local completion-learning scope limit: P-1, P-2, P-3, P-5, P-6, P-7, P-8, P-9, P-12, P-13, P-15, AX-1, AX-4, AX-5, AX-6, AX-7, AX-8, AX-11, and AX-12. Its paper-module neighbors are cold_reader_route_map, pattern_binding_contract, and voice_to_doctrine_self_improvement_loop.

If the generated JSON instance disagrees with the bundle or validator source, the bundle and validator win; refresh the projection rather than editing it.

Source-Backed System

This component is more than a prose rule. The exported assimilation bundle imports four bodies by manifest:

  • macro_pattern_autonomy_process_contract_body_import from state/microcosm_portfolio/reconstruction/macro_pattern_autonomy_process_contract_v1.json
  • macro_pattern_assimilation_fixture_manifest_body_import from state/microcosm_portfolio/reconstruction/fixture_manifests/pattern_assimilation_step.fixture_manifest.json
  • pattern_assimilation_retracted_adapter_receipt_body_import from state/microcosm_portfolio/reconstruction/pattern_assimilation_step_real_substrate_adapter_receipt_v1.json
  • pattern_assimilation_acceptance_validator_source_body_import from src/microcosm_core/validators/sign-off.py

The manifest is examples/pattern_assimilation_step/exported_assimilation_bundle/source_module_manifest.json. It must keep body_in_receipt: false, exact source and target digests, required anchors, and validation refs. The copied validator body anchors validate_pattern_assimilation, run_assimilation_bundle, and PATTERN_ASSIMILATION_AUTHORITY_CEILING.

Result record Floor

A passing fixture run emits:

  • receipts/first_wave/pattern_assimilation_acceptance.json
  • receipts/first_wave/pattern_assimilation_receipt.json
  • state/microcosm_portfolio/reconstruction/macro_pattern_autonomy_process_runs_v1.jsonl

A passing exported-bundle run emits:

  • receipts/first_wave/pattern_assimilation_step/exported_assimilation_bundle_validation_result.json

The first-wave result records must include public-relative paths, no private root paths, no copied body text, a redacted non-public-state scan with zero blocking hits, observed negative cases, error codes, scope limit, scope boundary, and the exact result record paths. The bundle result record must show source_module_manifest_status: pass, body_copied_material_count: 4, the four body-material ids above, body_in_receipt: false, body_text_in_receipt: false, and only public replacement refs.

Reader Evidence Routing

A cold reader should inspect the evidence in this order:

  1. Open the JSON source record to confirm subject ids, dependency ids, principle and axiom refs, and code locus.
  2. Run the focused sign-off test or fixture command to prove the completion learning shape still accepts valid fixture rows and rejects the required negative cases.
  3. Run the exported bundle validator when source-module digest, anchor, copied body, or replacement posture is the question.
  4. Treat generated JSON, Mermaid, Atlas, and coverage as projection evidence only; if they drift, refresh them through the doctrine-lattice builder.
  5. Use the result record floor to check public-relative paths, metadata-only source verification, source note exclusion, and local-lesson scope limits.

Negative Cases

The current negative-case floor is:

  • MISSING_PATTERN_ASSIMILATION_CLOSEOUT for a landed component without a refinement or typed no-op completion.
  • MISSING_REFINEMENT_OWNER_SURFACE, MISSING_STEWARDSHIP_CHECK, and MISSING_REENTRY_CONDITION for refinement result records that cannot route the lesson to an owner surface and re-entry condition.
  • DUPLICATE_REFINEMENT_RECEIPT_ID for duplicate refinement result records.
  • LOCAL_LESSON_AUTHORITY_UPGRADE for local lessons that claim global doctrine authority.
  • RAW_SEED_BODY_IN_ASSIMILATION_FIXTURE for source notes or private source note bodies in the public fixture.
  • ASSIMILATION_BUNDLE_SOURCE_MODULE_INVALID for exported source-module digest or anchor mismatch.

These are not ornamental checks. If a run stops observing them, the module can no longer support the claim that Microcosm learns from landed work without turning local notes into unsupported global doctrine.

Prior Art Grounding

Pattern assimilation is grounded in software pattern-language practice: recurring engineering lessons should be named, bounded, reviewed, and connected to the context where they apply. The Hillside patterns library is the direct prior-art family for treating patterns as a shared engineering vocabulary rather than one-off notes.

The result record and trace shape also borrows from provenance and observability practice. W3C PROV informs the requirement that each refinement cite its owner surface and evidence relation, while OpenTelemetry traces are a useful analogue for linking spans of work into an inspectable causal chain. Microcosm uses those inspirations for completion learning only; a local lesson still needs the owning lane before it can become broader doctrine.

Validation Result record Path

From microcosm-substrate, keep validation result records outside tracked first-wave paths unless the owning result record lane intends to refresh them:

The fixture and bundle result records prove same-lane completion-learning shape over the public fixtures and copied body imports only; they do not promote a local lesson to global doctrine authority. Source-copy or result record drift is an owning validator/manifest lane issue, not Markdown source authority.

Focused pytest re-entry is:

PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_pattern_assimilation_step.py -q --basetemp=/tmp/microcosm_pattern_assimilation_pytest

Use an isolated /tmp basetemp for focused pytest runs so result record scratch paths do not rewrite source-run rows inside the checkout.

Validation Anchors

Focused coverage lives in tests/test_pattern_assimilation_step.py and checks:

  • streamed JSONL loading and upsert behavior;
  • required negative-case observation;
  • public-relative redacted result records;
  • source result record field floors from the fixture manifest;
  • exported assimilation bundle runtime shape;
  • source-module digest mismatch rejection;
  • exported bundle result records;
  • exact copied source body imports.

Scope boundary

Scope limit

Pattern assimilation validates public completion-learning metadata plus regression fixtures. It does not ingest private lessons, read source note bodies, mutate live work log or work log state, promote global doctrine, include launch operations or public sharing, make external model access, claim private-data equivalence, prove behavior changes, or certify public runtime behavior.

Its useful claim is narrower: over the supplied fixtures and copied public body imports, the component shows that completion learning has a typed, same-lane, owner-routed shape and that invalid completion claims are rejected before they become doctrine.

Scope limit

This module may claim public completion-learning validation over the supplied fixtures and copied body-import manifests: same-lane completion decisions, owner-surface refinement evidence, typed nothing_to_refine result records, stewardship and re-entry fields, duplicate result record rejection, local-lesson scope limits, source note exclusion, public-relative result records, and metadata-only source-module verification.

It does not claim complete pattern coverage, private source-root equivalence, live work log or work log mutation, source note ingestion, external model access, global doctrine changes, behavior-change proof, launch or publishing-scope decision, or whole-system correctness. The generated diagram and atlas views are navigation surfaces; they do not upgrade local lessons into global doctrine.

Executable Doctrine GrammarChecks that example standards files declare their purpose, rule, records, and what they do not claim.5/5

Does It checks that a folder of example "doctrine" files (toy public standards and write-ups that describe how the system is supposed to behave) actually have the required parts: a stated purpose, the rule that governs them, the result records they are expected to produce, and an honest statement of what they do NOT claim. It reports, file by file, which entries are well-formed and which ones fail a required check, including ones that overclaim (saying a passing grammar check proves the doctrine is complete) or that try to treat plain advice as enforceable authority.

Scope limit It validates an exported public executable-grammar metabolism bundle with exact copied-body digests and redacted result records, plus fixture regressions for standards/paper-module shape. It does not publish source doctrine bodies in result records, prove doctrine completeness, export a private standards engine, authorize later components, or claim external model access, private-data equivalence, launch-scope decision, or whole-system correctness.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.executable_doctrine_grammar validate-executable-grammar-metabolism-bundle --input examples/executable_doctrine_grammar/exported_executable_grammar_metabolism_bundle --out receipts/first_wave/executable_doctrine_grammar --card

EvidenceContract validatorevidence 5/5Import validation

architecturenavigationdoctrine

Source Design note · Source atlas

Paper module Executable Doctrine Grammar

Purpose

Doctrine in most systems is prose convention. A standard says a rule should hold, a paper module says a section should be present, and nothing checks whether the claim is actually true. This component exists to make doctrine shape a thing a program can pass or fail. It answers one question: does a standard row or a paper-module fixture carry the structure that doctrine here requires, or is it just text that looks the part?

What it checks is deliberately structural rather than semantic. A standard row must declare a teleology, a governing standard, result record expectations, and an scope boundary. A paper module must carry the matching sections by heading. The validator does not judge whether the prose is good. It judges whether the load-bearing fields are present, so a row cannot quietly drop its result record expectations or its scope boundary and still pass.

The less obvious part is that the failures are first-class. Five negative cases are part of the contract: a row missing its required fields, a prose-only standard that tries to claim executable authority, a source doctrine body copied into a public fixture, a duplicate standard slug, and a grammar pass that overclaims doctrine completeness. A run that does not observe each of these classes is blocked, so the checker is held to demonstrating that it can reject, not only that it can accept.

The component also imports copied source bodies, but only through a source-module manifest with declared SHA-256 digests, and never inlines a body into a result record. The result record reports refs, hashes, counts, and verdicts; the bodies live in the bundle. The point is to make the doctrine shape checkable without turning the public surface into an export of the private standards engine.

Teleology

executable_doctrine_grammar turns toy public standards and paper-module fixtures into deterministic grammar result records. It makes doctrine-shape claims checkable while importing copied, source bodies only through source-module manifests, digests, and result record boundaries.

Shape

Public doctrine fixturesPublic doctrine fixturesExecutable grammar validatorExecutable grammar validatorExported standards bundleExported standards bundleSource-module manifestSource-module manifestmetadata-only deterministicresult recordsmetadata-only deterministic result recordsBundle and atlas evidenceBundle and atlas evidenceBounded reader claimdoctrine-shape validation,not launch-scope decisionBounded reader claim doctrine-shape validation, not launch-scope decision

Source refs

Public doctrine fixtures
fixtures/first_wave/executable_doctrine_grammar/input
Executable grammar validator
src/microcosm_core/organs/executable_doctrine_grammar.py
Exported standards bundle
examples/executable_doctrine_grammar/exported_standards_bundle
Source-module manifest
examples/executable_doctrine_grammar/exported_executable_grammar_metabolism_bundle/source_module_manifest.json
metadata-only deterministic result records
receipts/first_wave/executable_doctrine_grammar/
Bundle and atlas evidence
core/paper_module_capsules.json::paper_modules[18]
Diagram source
flowchart TD A["Public doctrine fixtures fixtures/first_wave/executable_doctrine_grammar/input"] --> B["Executable grammar validator src/microcosm_core/components/executable_doctrine_grammar.py"] C["Exported standards bundle examples/executable_doctrine_grammar/exported_standards_bundle"] --> B D["Source-module manifest examples/executable_doctrine_grammar/exported_executable_grammar_metabolism_bundle/source_module_manifest.json"] --> B B --> E["metadata-only deterministic result records result records/first_wave/executable_doctrine_grammar/"] E --> F["Bundle and atlas evidence core/paper_module_capsules.json::paper_modules[18]"] F --> G["Bounded reader claim doctrine-shape validation, not launch-scope decision"]

Reader Evidence Routing

Reader evidence routes through the executable-grammar runtime, fixture inputs, exported standards bundle, executable-grammar metabolism bundle, source-module manifests, public result records, and focused tests. The Mermaid diagram and Atlas card are generated navigation projections; this page is the cold-reader explanation of the proof boundary.

Public Contract

The validator checks standard slugs, teleology, governing standard refs, result record expectations, scope boundaries, paper-module sections, source-body sentinels, duplicate slug conflicts, prose-only authority claims, and doctrine-completeness overclaims. It also validates the imported public executable-grammar specimen, standards registry, standards type-plane, lattice registry, kind-atlas runtime, and standards option-surface runtime as exact copied source modules.

Prior Art Grounding

This component is grounded in schema validation, parser generators, and executable semantics traditions. JSON Schema anchors the idea that document shape can be validated by a shared machine contract, Tree-sitter shows the practical value of generated grammars for inspectable source structure, and the K framework is a close reference point for turning semantic rules into executable artifacts.

Microcosm borrows the executable-contract pattern: doctrine shape, result record expectations, duplicate slugs, imported source bodies, and scope boundaries are checked by a validator instead of left as prose convention. It does not claim source doctrine completeness or launch-scope decision.

First Commands

From microcosm-substrate/, a cold agent can prove the fixture path:

PYTHONPATH=src python3 -m microcosm_core.organs.executable_doctrine_grammar validate --input fixtures/first_wave/executable_doctrine_grammar/input --out receipts/first_wave/executable_doctrine_grammar --card

The exported public standards bundle uses the same component with a narrower input:

PYTHONPATH=src python3 -m microcosm_core.organs.executable_doctrine_grammar validate-standards-bundle --input examples/executable_doctrine_grammar/exported_standards_bundle --out receipts/first_wave/executable_doctrine_grammar --card

The source-open source-body floor is the executable-grammar metabolism bundle:

PYTHONPATH=src python3 -m microcosm_core.organs.executable_doctrine_grammar validate-executable-grammar-metabolism-bundle --input examples/executable_doctrine_grammar/exported_executable_grammar_metabolism_bundle --out receipts/first_wave/executable_doctrine_grammar --card

Source-Backed Mechanism

The mechanism row mechanism.executable_doctrine_grammar.validates_public_doctrine_grammar_bundle points at validate, validate_standards_bundle, validate_executable_grammar_metabolism_bundle, validate_source_module_imports, validate_standard_registry, validate_paper_module_shape, result_card, EXPECTED_NEGATIVE_CASES, and GRAMMAR_AUTHORITY_CEILING.

Those symbols are the runnable floor:

  • validate writes the fixture standards, paper-module, group-index, and sign-off result records.
  • validate_standards_bundle validates the exported public standards bundle and keeps result record paths public-relative.
  • validate_executable_grammar_metabolism_bundle validates the copied executable-grammar metabolism specimen, standards registry/type-plane, lattice registry, kind-atlas, and standards option-surface bodies.
  • validate_source_module_imports requires source_module_manifest.json, copied_non_secret_macro_body, exact_copy, allowlisted source refs, body-in-result record exclusion, and SHA-256 digest matches.
  • result_card compresses result record evidence without duplicating body text.

Negative Cases

The fixture must keep these failures executable rather than prose-only:

  • invalid_standard_and_module: missing teleology, result record expectations, governing standard, and scope boundary.
  • prose_standard_claims_runtime_authority: prose cannot claim executable runtime authority.
  • macro_doctrine_body_copied_into_fixture: source doctrine body sentinels are rejected from public fixtures.
  • duplicate_standard_slug_conflict: duplicate slugs are rejected deterministically.
  • grammar_index_pass_overclaims_doctrine_complete: grammar pass is not doctrine-completeness authority.

Atlas Binding

  • paper_module_ref: core/paper_module_capsules.json#paper_module.executable_doctrine_grammar
  • mechanism_refs[].ref: mechanism.executable_doctrine_grammar.validates_public_doctrine_grammar_bundle
  • code_loci[]: src/microcosm_core/organs/executable_doctrine_grammar.py with the mechanism symbols named above.

Validation Result record Path

./repo-pytest tests/test_executable_doctrine_grammar.py -q --basetemp=/tmp/microcosm_executable_doctrine_grammar_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope boundary

This module documents a public grammar fixture plus exact source body imports. It does not claim source doctrine completeness, public launch-scope decision, hosted-public posture, public sharing, recipient work, external model access, private-data equivalence, or whole-system correctness.

Scope limit

This paper module can claim an executable-doctrine grammar fixture with a generated diagram view and an Atlas card. It can explain the public grammar specimen, exact source body imports, and metadata-only result record boundary.

It cannot claim source doctrine completeness, public launch-scope decision, hosted-public posture, publishing-scope decision, recipient execution, external model access, private-data equivalence, source-file changes, launch-scope decision, or whole-system correctness. Higher claims must land in the JSON bundle and generated projection before Markdown can narrate them.

Source and projection details
Source-Open Body Floor

examples/executable_doctrine_grammar/exported_executable_grammar_metabolism_bundle/source_module_manifest.json declares 12 copied source bodies. Result records may report refs, hashes, counts, classes, and verdicts, but body_in_receipt=false remains required.

The body-material classes are public_macro_receipt_body, public_macro_standard_body, and public_macro_tool_body. The body set covers the executable-grammar specimen README, board, and result record; standards registry and group-index standards; standard type-plane and core authority index; lattice registry and standard; and the kind-atlas / standards option-surface runtime tools.

Navigation Hologram Route PlaneAudits a folder's navigation so browse rows never pose as the source of truth.5/5

Does This checks that a folder's local "how to get around" surface behaves: the path starts from one control entry, then drills into browsable lists of routes and cards, and those browse rows are never allowed to pose as the source of truth. It makes visible, in plain result record files, that stale or mislabeled navigation is caught, that compressed cards keep a note of what they left out, and that nothing private or secret leaks into the navigation material while moving around. It only inspects toy fixture files and exact copied-but-navigation source modules; it does not touch any live system.

Scope limit It validates only the declared public toy route-plane contract and its regression fixtures (plus exact copied navigation source modules in the bundle path); it does not establish live route freshness, grant source authority, authorize any later component, run any provider/live-kernel call, or certify the whole wave.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.navigation_hologram_route_plane run --input fixtures/first_wave/navigation_hologram_route_plane/input --out receipts/first_wave/navigation_hologram_route_plane

Paper module Navigation Hologram Route Plane

Purpose

A large codebase has a recurring failure: the agent or reader that lands in it starts from whatever browse surface is nearest to hand, treats that surface as the authority, and acts on a stale or partial view. The route plane exists to make the first move legible and to stop a browse row from being mistaken for the thing it describes. It answers one question: given a control entry, what is the safe ordered path into the browsable route projections, and what proof says that path is wired rather than asserted?

The unusual part is that the component never asserts a route is correct from prose. It treats every browse row as a projection and demands a coupling result record before that projection is allowed any authority. Source coupling is a plain SHA-256 over the route rows: the manifest carries an expected fingerprint and an expected row count, and if either disagrees with the rows on disk the projection is denied current authority. A route summary that claims to be current while its coupling is stale is rejected outright.

The other half of the design is what it refuses to do. First contact must begin at the control entry, not at a drilldown projection, so a request that tries to start from a browse row is replaced with the entry route. Compaction of the entry packet may not drop a required control field. An affordance row whose passport carries an anti-trigger is demoted before similarity search can ever select it. None of these are stylistic preferences; each is a named negative case the fixture must keep catching, so the route plane is defined as much by the eight things it blocks as by the path it permits.

Teleology

The navigation route plane gives a public clone a typed way to move from a control entry to browseable route projections without treating browse rows as authority.

Public Contract

The component runs in two modes against the same checks. The fixture mode loads a set of synthetic inputs, builds a toy option-surface from the rows (a cluster-flag summary plus one selected card), and then runs the negative-case validators that prove each guard still fires. The exported-bundle mode runs the same kind of checks against a real copied bundle: it validates the route rows, the source-coupling fingerprint, the source-module manifest, the route-lease policy, the entry-packet floor, the affordance passports, and the code-architecture projection packet, and only reports a pass when the secret scan is clean, a card row is selected, and every component validator passes.

The source-coupling gate is the spine. It hashes the route rows with SHA-256 and compares that against the fingerprint and row count declared in the manifest; a mismatch denies the projection any current authority, and a summary that claims current authority while coupling is stale is recorded as an overclaim. The source-module manifest names five exact copies of source route and control bodies. Each is checked by digest and by required navigation anchors, and each must declare that its body is copied but never written into the result record, so the evidence is reproducible without exposing the source text.

Shape

yesnoControl entry firstbrowse row that claims firstcontactis replaced with the entryrouteControl entry first browse row that claims first contact is replaced with the entry routeSource couplingSHA-256 over route rows vsmanifestfingerprint + row countSource coupling SHA-256 over route rows vs manifest fingerprint + row countRoute rowssurface role, actionablecommand,no source-authority claim,omission result record whenrequiredRoute rows surface role, actionable command, no source-authority claim, omission result record when requiredSource-module manifest5 copied source bodiesdigest + required anchors,body never in result recordSource-module manifest 5 copied source bodies digest + required anchors, body never in result recordRoute-lease policyselected lane, permittedactions,source authority rejectedRoute-lease policy selected lane, permitted actions, source authority rejectedEntry-packet floorrequired control fieldssurvivecompactionEntry-packet floor required control fields survive compactionAffordance passportsanti-trigger rows demotedbeforesimilarity can select themAffordance passports anti-trigger rows demoted before similarity can select themCoupling current,all gates pass,card row selected?Coupling current, all gates pass, card row selected?metadata-only result recordscluster flag, card, coupling,route lease, entry admission,affordance, code-architecturepacketmetadata-only result records cluster flag, card, coupling, route lease, entry admission, affordance, code-architecture packetBlockedstable error codes,findings, bodies redactedBlocked stable error codes, findings, bodies redactedNegative-case floorBANNED_FIRST_CONTACT_ROUTE,SOURCE_COUPLING_STALE,and 7 moreNegative-case floor BANNED_FIRST_CONTACT_ROUTE, SOURCE_COUPLING_STALE, and 7 moreGatesGatesScope limitprojection evidence only;no live route freshness,source authority, or launchScope limit projection evidence only; no live route freshness, source authority, or launch
Diagram source
flowchart TD Entry["Control entry first browse row that claims first contact is replaced with the entry route"] subgraph Gates["Route-plane gates"] Couple["Source coupling SHA-256 over route rows vs manifest fingerprint + row count"] Rows["Route rows surface role, actionable command, no source-authority claim, omission result record when required"] Modules["Source-module manifest 5 copied source bodies digest + required anchors, body never in result record"] Lease["Route-lease policy selected lane, permitted actions, source authority rejected"] Floor["Entry-packet floor required control fields survive compaction"] Pass["Affordance passports anti-trigger rows demoted before similarity can select them"] end Verdict{"Coupling current, all gates pass, card row selected?"} Entry --> Couple Couple --> Rows --> Modules --> Lease --> Floor --> Pass --> Verdict Verdict -->|yes| Result records["metadata-only result records cluster flag, card, coupling, route lease, entry admission, affordance, code-architecture packet"] Verdict -->|no| Blocked["Blocked stable error codes, findings, bodies redacted"] Negative["Negative-case floor BANNED_FIRST_CONTACT_ROUTE, SOURCE_COUPLING_STALE, and 7 more"] -.-> Gates Result records --> Ceiling["Scope limit projection evidence only; no live route freshness, source authority, or launch"] Blocked --> Ceiling

Source-Backed Doctrine Packet

  • core/organ_registry.json::implemented_organs[navigation_hologram_route_plane] is the accepted component authority. It records status accepted_current_authority, evidence class semantic_validator, evidence strength rank 5, scope limit validates declared public contract only, and validator command python -m microcosm_core.organs.navigation_hologram_route_plane run --input fixtures/first_wave/navigation_hologram_route_plane/input --out receipts/first_wave/navigation_hologram_route_plane.
  • core/organ_atlas.json::organs[navigation_hologram_route_plane] gives the cold-reader gloss: control entry comes first, browse rows stay projections, eight route-plane negative cases are detected, exact copied navigation source modules validate, and result records omit body text.
  • standards/std_microcosm_navigation_hologram_route_plane.json governs the standard authority boundary public_navigation_route_plane_runtime_and_copied_source_body_validator_not_live_source_authority. It requires route rows, option-surface contracts, source coupling, source-module manifests, route leases, entry-packet floors, affordance passports, code-architecture packets, body-import verification, scope limit, and scope boundary.
  • src/microcosm_core/organs/navigation_hologram_route_plane.py is the runtime source for fixture validation, route-plane bundle validation, secret-exclusion scan, route-lease checks, entry-admission floor checks, affordance-passport demotion, code-architecture packet result records, and source-module digest/anchor validation.
  • core/fixture_manifests/navigation_hologram_route_plane.fixture_manifest.json binds fixture expectations: body_copied_material_count=5, body_material_status=copied_non_secret_macro_route_substrate_with_provenance, body_in_receipt=false, and negative cases tied to stable error codes.
  • examples/navigation_hologram_route_plane/exported_route_plane_bundle/source_module_manifest.json names five exact copied source route-control bodies: navigation_route_plane_intervention_source_body_import, navigation_route_plane_context_pack_source_body_import, navigation_route_plane_entry_packet_source_body_import, navigation_route_plane_option_surface_source_body_import, and navigation_route_plane_navigation_contract_source_body_import.
  • tests/test_navigation_hologram_route_plane.py is the regression floor for fixture result records, exact source-source digest matches, source-module anchors, result record redaction, exported bundle validation, digest-mismatch rejection, and this source-backed paper-module packet.
  • receipts/first_wave/navigation_hologram_route_plane/*.json carries public result records for cluster/card output, source coupling, route lease, entry-payload admission, affordance-passport selection, code-architecture packet, and exported bundle validation.

Source-module body floor:

Module idSource sourcePublic copied target
navigation_route_plane_intervention_source_body_importsystem/lib/navigation_route_intervention.pyexamples/navigation_hologram_route_plane/exported_route_plane_bundle/source_modules/system/lib/navigation_route_intervention.py
navigation_route_plane_context_pack_source_body_importsystem/lib/navigation_context_pack.pyexamples/navigation_hologram_route_plane/exported_route_plane_bundle/source_modules/system/lib/navigation_context_pack.py
navigation_route_plane_entry_packet_source_body_importsystem/lib/kernel/commands/comprehension_snapshot.pyexamples/navigation_hologram_route_plane/exported_route_plane_bundle/source_modules/system/lib/kernel/commands/comprehension_snapshot.py
navigation_route_plane_option_surface_source_body_importsystem/lib/standard_option_surface.pyexamples/navigation_hologram_route_plane/exported_route_plane_bundle/source_modules/system/lib/standard_option_surface.py
navigation_route_plane_navigation_contract_source_body_importcodex/standards/std_navigation_contract.jsonexamples/navigation_hologram_route_plane/exported_route_plane_bundle/source_modules/codex/standards/std_navigation_contract.json

Registry result record refs:

  • receipts/first_wave/navigation_hologram_route_plane/affordance_passport_selection_receipt.json
  • receipts/first_wave/navigation_hologram_route_plane/code_architecture_projection_packet_receipt.json
  • receipts/first_wave/navigation_hologram_route_plane/entry_payload_admission_receipt.json
  • receipts/first_wave/navigation_hologram_route_plane/route_lease.json
  • receipts/first_wave/navigation_hologram_route_plane/source_coupling_result.json
  • receipts/first_wave/navigation_hologram_route_plane/toy_kind_card.json
  • receipts/first_wave/navigation_hologram_route_plane/toy_kind_cluster_flag.json

First command from microcosm-substrate/:

PYTHONPATH=src python3 -m microcosm_core.organs.navigation_hologram_route_plane run --input fixtures/first_wave/navigation_hologram_route_plane/input --out receipts/first_wave/navigation_hologram_route_plane

Runtime bundle command from microcosm-substrate/:

PYTHONPATH=src python3 -m microcosm_core.organs.navigation_hologram_route_plane validate-route-plane-bundle --input examples/navigation_hologram_route_plane/exported_route_plane_bundle --out receipts/runtime_shell/demo_project/organs/navigation_hologram_route_plane

Standard-declared runtime bundle validator: python -m microcosm_core.organs.navigation_hologram_route_plane validate-route-plane-bundle --input examples/navigation_hologram_route_plane/exported_route_plane_bundle --out receipts/runtime_shell/demo_project/organs/navigation_hologram_route_plane.

Atlas scope limit restated: It validates only the declared public toy route-plane contract and its regression fixtures (plus exact copied navigation source modules in the bundle path); it does not establish live route freshness, grant source authority, authorize any later component, run any provider/live-kernel call, or certify the whole wave.

The negative-case floor is part of the doctrine, not incidental test trivia. Across the eight negative cases, the fixture must keep detecting these stable error codes (one case carries two codes, so the list runs to nine):

  • BANNED_FIRST_CONTACT_ROUTE
  • SOURCE_COUPLING_STALE
  • MISSING_OMISSION_RECEIPT
  • ATLAS_PROJECTION_NOT_CONTROL_ENTRY
  • ROUTE_CARD_PRIVATE_BODY_LEAK
  • ROUTE_SUMMARY_OVERCLAIMS_FRESHNESS
  • DUPLICATE_ROUTE_ID_CONFLICT
  • ENTRY_ADMISSION_CONTROL_FLOOR_DROPPED
  • AFFORDANCE_PASSPORT_ANTITRIGGER_IGNORED

Reader Evidence Routing

Reader evidence starts at the generated JSON instance, then routes through the route-plane runtime, fixture manifest, source-module manifest, public result records, and focused regression. The browse rows, Mermaid diagram, and Atlas card are derived projections; they are not control-entry or source authority.

Prior Art Grounding

The route plane is grounded in information-architecture and graph-navigation patterns. The first-contact rule follows the same usability pressure as progressive disclosure: show the control entry and immediate affordances before deeper browse rows. The CLI-facing surface is also informed by the Command Line Interface Guidelines, especially the emphasis on discoverable commands, examples, and clear next actions.

The graph side maps to established directed-graph tooling. NetworkX documents topological sorting as an ordering over dependency edges, and graph-ranking algorithms such as PageRank show the older pattern of computing route salience from graph structure. Microcosm keeps those ideas below authority: route cards, leases, and browse rows are projections unless source-coupling and entry-admission result records agree.

Validation Result record Path

From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:

PYTHONPATH=src ../repo-python -m microcosm_core.organs.navigation_hologram_route_plane run --input fixtures/first_wave/navigation_hologram_route_plane/input --out /tmp/microcosm-navigation-hologram-route-plane
PYTHONPATH=src ../repo-python -m microcosm_core.organs.navigation_hologram_route_plane validate-route-plane-bundle --input examples/navigation_hologram_route_plane/exported_route_plane_bundle --out /tmp/microcosm-navigation-hologram-route-plane-bundle
../repo-pytest tests/test_navigation_hologram_route_plane.py
PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

These checks validate the public fixture and exported route-plane bundle only; they do not grant live route freshness, source authority, provider/live-kernel execution, later-component authorization, launch-scope decision, or whole-wave certification.

Scope boundary

Scope limit

This module can be cited as evidence that the public fixture and exported route-plane bundle validate their declared contract. It does not establish live route freshness, grant live source-kernel authority, authorize source-file changes, authorize external model access, export account or browser state, expose browser UI live access, authorize recipient work, authorize public sharing or launch, prove whole-system correctness, or certify private-system equivalence.

Scope limit

This module may claim public fixture evidence that the route-plane rows, exported bundle, copied navigation source modules, source manifests, negative cases, and validation result records agree on the declared public route-plane contract. It may also claim that the generated JSON row resolves the accepted component subject, resolved mechanism subject, runtime source locus, governed concept, and the full set of declared principles, axioms, dependency modules, and relationship bindings.

This module may not claim live route freshness, live source-kernel authority, provider or browser UI access, source-file changes, recipient work authorization, hosted-public posture, launch-scope decision, publishing-scope decision, private-system equivalence, implementation correctness beyond the listed witnesses, or whole-system correctness.

Scope boundary

This module documents a public route-plane fixture and exported source-body bundle. It does not certify live corpus freshness, later public components, launch operations, provider/account or browser access, private root equivalence, whole-system correctness, or secret export.

Standards Meta DiagnosticsConfirms every accepted part still ties to a written rule, a run command, and a saved proof.5/5

Does This is a coverage checker that reads the project's public catalogs and confirms every accepted part is still tied to a written standard, a documented way to run it, and a saved proof-of-run, while flagging any claim that overreaches (like "ready to launch") or any leaked private text. It makes the system's own bookkeeping inspectable: whether the pieces are accounted for is visible rather than taken on trust.

Scope limit It validates only the declared public coverage contract and never becomes source authority for the registries, mutates source, exposes private material, or authorizes launch, external model access, or any whole-system-correctness claim.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.standards_meta_diagnostics run --input fixtures/first_wave/standards_meta_diagnostics/input --out /tmp/standards_meta_diagnostics_out

EvidenceContract validatorevidence 5/5Import validation

agent-entryarchitecturenavigation

Source Design note · Source atlas

Paper module Standards Meta Diagnostics

standards_meta_diagnostics is the terminal public coverage diagnostic for the Microcosm runtime spine. It checks that accepted adapter-backed components remain mapped to standards, runtime contracts, result records, and explicit scope limits before a cold reader trusts the spine as coherent.

It consumes public standards_inventory.json, organ_runtime_contracts.json, and diagnostic_policy.json inputs backed by registry refs, runtime commands, sign-off result records, and the exported diagnostics bundle. Its result record contract is source-open by default: secret_exclusion_scan proves that secrets, account or browser material, model-output data bodies, raw operator bodies, and account secret-equivalent live-access material are excluded, while public_runtime_refs point at the real standards, component, sign-off, fixture, bundle, and paper-module system. Bodies are not inlined into JSON result records, so the positive evidence uses body_in_receipt: false, real_runtime_receipt: true, and synthetic_receipt_standin_allowed: false.

The component rejects five boundary failures:

  • accepted component rows without standard_id or standard_ref
  • accepted components missing from the standards inventory
  • accepted component rows without result record refs
  • launch, provider, public sharing, secret export, trading/advice, or whole-system correctness overclaims
  • private source bodies or model-output data bodies in public diagnostics

Purpose

A spine of accepted components is only coherent if each component is still attached to the things that make it accountable: a standard that describes it, a runtime contract that runs it, a result record that records its last verdict, and an explicit statement of what it is not allowed to claim. As the spine grows, those four attachments drift out of step one component at a time, and the drift is silent. A new component can be accepted into the runtime while its standard file, registry row, or result record ref is never added. Nothing breaks; the gap just sits there until a reader trusts the spine and finds a hole.

This component answers a single question: does every accepted component still resolve to a standard, a runtime contract, a result record, and an scope limit, with no extra and no missing entries? It treats the answer as a graph-closure check rather than a written audit. The accepted-component list, the standard rows, the runtime-contract rows, and the result record refs must agree on exactly the same set of components. Any component that appears in one surface but not another becomes a structured finding with a named error code, not a paragraph of prose.

The unusual choice is that the diagnostic refuses to grow its own authority. It projects its positive coverage from the live registry rather than a checked-in list, so a stale example cannot quietly become the thing the spine is measured against. It carries five negative fixtures that must each surface their expected failure, so the checker is itself falsifiable. And its result records deliberately hold refs, counts, hashes, and verdicts rather than the bodies they describe, so a coverage report can be read in the open without exporting private source.

Technical Mechanism

standards_meta_diagnostics is a public consistency validator over three finite surfaces: a standards inventory, component runtime contracts, and diagnostic policy. The positive path either reads those exported JSON inputs or projects them from the live public registry, then requires the accepted-component list, the standard rows, the runtime-contract rows, and the result record refs to agree on the same component set. This is a graph-closure check, not a narrative audit: an accepted component without a standard ref, registry-backed standard row, runtime step, validator command, or result record ref becomes a structured finding.

The mechanism has four guarded stages:

  1. run loads standards_inventory.json, organ_runtime_contracts.json, and diagnostic_policy.json, or projects the positive rows from live public registry state when the caller asks for live positives.
  2. The validator checks every accepted component row against a resolving std_microcosm_<organ_id> standard, the standards registry entry, the runtime shell step, a non-empty validator command, and non-empty result record refs with body_in_receipt: false.
  3. Five negative fixtures exercise the expected boundary failures: missing_standard_ref, unmapped_accepted_organ, missing_receipt_ref, release_overclaim, and private_source_leakage.
  4. The exported-bundle path revalidates the same shape through source_module_manifest.json, exact source-module digest checks, source-open body-import accounting, secret_exclusion_scan, and the projection-only AUTHORITY_CEILING.

The output card deliberately omits the covered-component list, findings, secret-exclusion detail, source refs, public runtime refs, scope boundary, scope limit, and source-module summary from the compact payload. Those keys remain in the full result record, which keeps the reader-facing card inspectable without turning it into a private-body export.

Shape

paper_module bundle: subjects+ code_loci + scope limitpaper_module bundle: subjects + code_loci + scope limitInventoryInventoryContractsContractsPolicyPolicynegative fixtures: missingstandard, unmapped component,missing result record,overclaim, private sourcenegative fixtures: missing standard, unmapped component, missing result record, overclaim, private sourceRuntimeRuntimesecret_exclusion_scan +public_runtime_refssecret_exclusion_scan + public_runtime_refssign-off result record:counts, error codes, scopeboundarysign-off result record: counts, error codes, scope boundarygenerated navigationprojections: mermaid + atlascardgenerated navigation projections: mermaid + atlas card

Source refs

Inventory
standards_inventory.json
Contracts
organ_runtime_contracts.json
Policy
diagnostic_policy.json
Runtime
standards_meta_diagnostics.run / run_diagnostics_bundle
Diagram source
flowchart TD bundle["paper_module bundle: subjects + code_loci + scope limit"] inventory["standards_inventory.json"] contracts["organ_runtime_contracts.json"] policy["diagnostic_policy.json"] negatives["negative fixtures: missing standard, unmapped component, missing result record, overclaim, private source"] runtime["standards_meta_diagnostics.run / run_diagnostics_bundle"] scan["secret_exclusion_scan + public_runtime_refs"] result record["sign-off result record: counts, error codes, scope boundary"] projections["generated navigation projections: mermaid + atlas card"] bundle --> projections inventory --> runtime contracts --> runtime policy --> runtime negatives --> runtime runtime --> scan scan --> result record runtime --> result record

Evidence/accounting:

  • core/paper_module_capsules.json::paper_modules[29:paper_module.standards_meta_diagnostics] is the JSON authority row. It names the component and mechanism subjects, the resolved code locus src/microcosm_core/organs/standards_meta_diagnostics.py, and the projection-only scope limit.
  • paper_modules/standards_meta_diagnostics.json::paper_module_payload.source_authority is json_capsule; generated_projections.mermaid.status is available_from_capsule_edges; generated_projections.atlas_card.status is linked_from_capsule_edges; relationships.edges currently has 11 edges.
  • organs/standards_meta_diagnostics.json::organ_payload.source_registry_row records status: accepted_current_authority, the validator command, and the generated result record refs; its claim_ceiling keeps the diagnostic scoped to the declared public contract.
  • src/microcosm_core/organs/standards_meta_diagnostics.py names INPUT_NAMES, NEGATIVE_INPUT_NAMES, EXPECTED_NEGATIVE_CASES, PUBLIC_RUNTIME_REFS, and AUTHORITY_CEILING, which are the runtime contract this reader section summarizes.
  • tests/test_standards_meta_diagnostics.py asserts the fixture and exported bundle paths, the five expected negative cases, source-module digest checks, body_in_receipt: false, real_runtime_receipt: true, and synthetic_receipt_standin_allowed: false.
  • result records/sign-off/first_wave/standards_meta_diagnostics_fixture_acceptance.json records status: pass, accepted_organ_count: 77, standard_mapping_count: 77, runtime_contract_count: 77, five expected error codes, secret_exclusion_scan.blocking_hit_count: 0, and the scope boundary that the diagnostic excludes launch, providers, registry mutation, formal-result correctness, or whole-system correctness.

Reader Evidence Routing

  • Start with the JSON Bundle Binding to identify the source record and the projection-only scope limit before treating the diagnostic as evidence.
  • Use Structured Lattice Bindings to understand which wiring is resolved and which dependencies remain pending. Pending dependencies are honest residuals, not hidden failures.
  • Use Validation Result record Path for reproducibility: focused pytest exercises the diagnostic policy and negative cases; the corpus check verifies paper-module parity.
  • Treat secret-exclusion and public-runtime refs as result record evidence about public projection consistency. They do not mutate standards, include launch operations, expose private source material, or prove whole-system correctness.

Named Proof Consumers

  • tests/test_standards_meta_diagnostics.py::test_standards_meta_diagnostics_observes_negative_cases is the fixture consumer. It proves that the positive public inputs cover the accepted component set and that the five expected negative cases surface their named error codes.
  • tests/test_standards_meta_diagnostics.py::test_standards_meta_diagnostics_bundle_validates_runtime_shape is the exported-bundle consumer. It checks the bundle id, covered component set, source-module manifest status, source-open body-import counts, body_in_receipt: false, and the false scope limit flags.
  • tests/test_standards_meta_diagnostics.py::test_standards_meta_diagnostics_rejects_source_module_digest_mismatch, ::test_standards_meta_diagnostics_rejects_partial_source_module_digest_mismatch, and ::test_standards_meta_diagnostics_rejects_partial_target_module_digest_mismatch are the digest-drift consumers. They make copied source-module bodies falsifiable instead of relying on manifest prose.
  • tests/test_standards_meta_diagnostics.py::test_standards_meta_diagnostics_source_modules_are_exact_macro_body_imports is the exact-copy consumer for the three public source-body imports named in the exported bundle.
  • tests/test_standards_meta_diagnostics.py::test_standards_meta_diagnostics_receipts_use_secret_exclusion is the public/private boundary consumer. It checks that result record evidence uses the secret-exclusion lane and keeps private bodies out of public diagnostics.
  • tests/test_standards_meta_diagnostics.py::test_standards_meta_diagnostics_input_builder_tracks_live_registry and the live-positive projection tests are the registry-freshness consumers. They keep fixture inputs tied to public registry state instead of allowing a stale checked-in example to become silent authority.

Prior Art Grounding

This component is grounded in schema- and contract-validation practice rather than in a claim that diagnostics create authority. JSON Schema treats a schema as a machine-readable vocabulary for validating structured JSON data, and OpenAPI uses interface descriptions so consumers can understand an API without reading source code or observing traffic. The component imports that pattern into Microcosm's launch boundary: standards, adapter contracts, result records, and scope limits are checked as public projections, while the diagnostic remains bounded evidence about consistency rather than a new source of truth.

Prior-art anchors:

  • JSON Schema validation and structured-data constraints: https://json-schema.org/
  • OpenAPI interface descriptions and conformance expectations: https://spec.openapis.org/oas/latest.html

Validation Result record Path

./repo-pytest tests/test_standards_meta_diagnostics.py -q \
  --basetemp=/tmp/microcosm_standards_meta_diagnostics_pytest
./repo-python scripts/build_doctrine_projection.py \
  --check-paper-module-corpus

Scope boundary

Scope limit

This module can claim that public standards inventory, runtime contracts, accepted-component refs, result record refs, diagnostic policy, and secret-exclusion checks are consistently projected into a reader-facing diagnostics result record. It cannot claim standards-registry mutation authority, provider authority, launch-scope decision, publishing-scope decision, private source export, or whole-system correctness.

Scope limit

This is a projection-only diagnostic. It does not become source authority for core/standards_registry.json, change source files surfaces, expose private source material, authorize providers, include launch operations, or prove whole-system correctness.

Source and projection details
Source-Open Body Floor

The public diagnostics bundle is source-open as evidence about refs, policies, runtime contracts, and result records. It may expose standards inventory rows, component runtime contract rows, diagnostic policy rows, sign-off result record refs, fixture refs, bundle refs, secret-exclusion scan verdicts, and public runtime refs.

It must not inline private source bodies, model-output data bodies, source notes, account or browser material, account secret-equivalent live-access material, launch-send state, or private source-root bodies. The positive result record evidence therefore stays at body_in_receipt: false, real_runtime_receipt: true, and synthetic_receipt_standin_allowed: false.

Voice To Doctrine Self Improvement LoopVerifies each lesson changed a named owner page with evidence before the loop closes.5/5

Does It makes the system's "learn from a lesson, then improve" cycle inspectable on a folder. Each local lesson is shown being assigned to a specific owner surface (a skill, a doctrine page, a standard, or a tracked to-do capture), then changed or captured there, then validated, then closed out with a concrete reason to revisit later. Using local files only, the result records confirm that improvements are tied to real owner-surface changes and evidence rather than just asserted.

Scope limit It validates only the declared contract of the loop on fixtures; it does not export source notes or private bodies, grant source/doctrine edits, global-promotion, live work log mutation, or publishing-scope decision, make external model access, prove correctness, or claim private-system equivalence.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.voice_to_doctrine_self_improvement_loop run --input fixtures/first_wave/voice_to_doctrine_self_improvement_loop/input --out receipts/first_wave/voice_to_doctrine_self_improvement_loop

EvidenceContract validatorevidence 5/5Import validation

agent-entryarchitecturenavigation

Source Design note · Source atlas

Paper module Voice to Doctrine Self-Improvement Loop

This module is the public Microcosm projection of the source system's recursive self-improvement metabolism. It is not a synthetic result record layer. It imports the real source shape from recursive_self_improvement_operating_loop, doctrine_population_loop, and local_to_general_propagation: local pressure is sensed, classified, assigned to an owner surface, mutated or captured there, validated, closed out, and given a concrete re-entry condition.

The exported bundle also carries exact copies of the source bodies that make this loop real: recursive self-improvement, doctrine population, local-to-general propagation, the plane-home decision table, work log metacontrol, work log skill doctrine, and the work log standard. Result records report only source refs, hashes, counts, and scan status; the body text lives under examples/.../source_modules/ai_workflow/.

Purpose

The component answers one question: did a declared lesson actually change a named owner surface and pass that surface's own validation, or did it only produce a result record that says so? "The system learned from its work" is an easy claim to assert and a hard one to back. Without a check, a log entry, a closed ticket, or a confident summary all read as progress. This validator refuses that shortcut.

Each lesson row must name the surface it changed (a skill, a paper module, a standard, or a captured Work item), the action it took there, and the validation and completion refs that show the change held. Every ref must resolve to a real file in the exported bundle, the copied source modules, or the public Microcosm tree. A lesson then lands in exactly one of four outcomes: refined_existing_surface (a surface changed and was validated), workitem_captured (deferred work, but only with a concrete re-entry condition), nothing_to_refine (a typed null result that still required stewardship and a next-best-lane check), or already_propagated_verified. Anything that does not fit one of these is a finding, not an outcome.

The unusual part is the defence against self-grading. A lesson row may carry an expected_label or expected_status field, but the validator ignores it and recomputes the verdict from the evidence. If the row is not genuinely backed, its own asserted label cannot rescue it, and the case is recorded as VOICE_DOCTRINE_BAKED_EXPECTED_LABEL_IGNORED. A fixture cannot pass by declaring its own success. The same instinct runs through the negative floor: source notes, private thread bodies, model-output data, direct edits to doctrine nodes, and global promotion without owner validation are each rejected, keeping "the system improves itself" separate from "this public artifact may rewrite doctrine or export private voice."

Shape

Local pressuremistake, route gap,validation finding, residualLocal pressure mistake, route gap, validation finding, residualClassifyowner surface + actionClassify owner surface + actionOwner surfaceskill, paper module,standard, Work itemOwner surface skill, paper module, standard, Work itemRefusedraw voice, private body,direct node edit, resultrecord-only,unvalidated promotionRefused raw voice, private body, direct node edit, result record-only, unvalidated promotionchanged ref + validationchanged ref + validationworkitem_capturedwith re-entry conditionworkitem_captured with re-entry conditionnothing_to_refinestewardship + next-lanecheckednothing_to_refine stewardship + next-lane checkedAlreadyAlreadyRecompute verdict fromevidenceexpected_label ignoredRecompute verdict from evidence expected_label ignoredValidationowner evidence + completionref;every ref must resolveValidation owner evidence + completion ref; every ref must resolveExact source bodies8 manifest rows: hashes,anchorsExact source bodies 8 manifest rows: hashes, anchorsmetadata-only result recordsresult, board, validation,fixture sign-offmetadata-only result records result, board, validation, fixture sign-offOutcomeOutcome

Source refs

changed ref + validation
refined_existing_surface
Already
already_propagated_verified
Diagram source
flowchart LR Signal["Local pressure mistake, route gap, validation finding, residual"] Classify["Classify owner surface + action"] Owner["Owner surface skill, paper module, standard, Work item"] Refused["Refused raw voice, private body, direct node edit, result record-only, unvalidated promotion"] subgraph Outcome["One of four typed outcomes"] Refined["refined_existing_surface changed ref + validation"] Captured["workitem_captured with re-entry condition"] Null["nothing_to_refine stewardship + next-lane checked"] Already["already_propagated_verified"] end Recompute["Recompute verdict from evidence expected_label ignored"] Validate["Validation owner evidence + completion ref; every ref must resolve"] Source["Exact source bodies 8 manifest rows: hashes, anchors"] Result records["metadata-only result records result, board, validation, fixture sign-off"] Signal --> Classify Classify --> Owner Owner --> Refused Owner --> Outcome Outcome --> Recompute Recompute --> Validate Source --> Result records Validate --> Result records Refused --> Result records

Public Mechanics

  • Local lessons carry source pattern refs, evidence refs, owner-surface ids, owner actions, validation refs, completion refs, and outcomes.
  • Owner surfaces are explicit: skills, paper modules, standards, and residual captures each retain their own mutation authority.
  • refined_existing_surface requires a changed owner surface and validation.
  • workitem_captured requires a concrete re-entry condition.
  • nothing_to_refine requires stewardship and next-best-lane checks.
  • source notes, private thread bodies, model-output data, live work log bodies, direct doctrine-node edits, and global promotion claims are rejected by negative cases.
  • Exported bundle validation requires source_module_manifest.json, verifies each copied body hash/line/byte/anchor contract, and scans copied bodies for forbidden public material.

Reader Evidence Routing

Read this module as a lesson-propagation validator, not as a general doctrine mutation license. The fixture proves that local pressure must choose a named owner surface, perform an owner-authorized action, carry validation and completion refs, and either refine an existing surface, capture a Work item with a re-entry condition, record a typed nothing_to_refine, or verify an already-propagated result.

Read source-open evidence through the exported bundle manifest. It carries eight copied source bodies: three paper modules, four skills or skill companions, and the work log standard. Each manifest row records byte and line counts, exact source and target hashes, required anchors, and body_in_receipt: false. The source bodies make the source loop inspectable, while result records remain refs, hashes, counts, scan status, and scope limits.

Read the negative floor as equally load-bearing. source notes bodies, private thread bodies, model-output data bodies, direct doctrine-node edits, result record-only progress, live work log mutation, and unvalidated global promotion are rejected. Those rejections keep "the system learns from work" separate from "this public artifact can mutate doctrine or export private state."

Prior Art Grounding

This component is grounded in after-action review, lessons-learned, and pattern language practices. NASA's Lessons Learned Information System is a public example of preserving operational lessons so future work can reuse them, while pattern-language practice gives a vocabulary for turning repeated local solutions into named, reusable forms. Microcosm adopts that direction without collapsing operator voice into doctrine: a local lesson only becomes durable when it has evidence, an owner surface, a validation result record, and a bounded re-entry path.

Prior-art anchors:

  • NASA Lessons Learned Information System: https://llis.nasa.gov/
  • Pattern language background: https://hillside.net/patterns/

Runtime

PYTHONPATH=src python -m microcosm_core.organs.voice_to_doctrine_self_improvement_loop run \
  --input fixtures/first_wave/voice_to_doctrine_self_improvement_loop/input \
  --out receipts/first_wave/voice_to_doctrine_self_improvement_loop

The exported bundle uses the same validator without negative-case inputs:

PYTHONPATH=src python -m microcosm_core.organs.voice_to_doctrine_self_improvement_loop run-bundle \
  --input examples/voice_to_doctrine_self_improvement_loop/exported_voice_to_doctrine_bundle \
  --out receipts/runtime_shell/demo_project/organs/voice_to_doctrine_self_improvement_loop

Validation Result record Path

Run from microcosm-substrate:

A green fixture or bundle result record proves only the public lesson-propagation boundary above; it does not grant source-file changes, live work log mutation, global doctrine-promotion, launch, or whole-system authority.

Scope boundary

Scope boundary

This module does not export source note, source notes, private thread bodies, model-output data, account or browser state, live work log rows, proof authority, source-file changes, publishing-scope decision, or private-system equivalence. It shows the public mechanics of system learning under owner-surface evidence gates.

Scope limit

This paper module can claim a lesson-propagation fixture. It can explain owner-surface checks, negative cases, copied source-module manifests, and metadata-only result records. A diagram view and atlas card are generated for this module.

It cannot claim source notes export, non-public body export, model-output data export, source-file changes, doctrine mutation authority, global-promotion authority, live work log mutation, publishing-scope decision, launch-scope decision, external model access, private-system equivalence, or whole-system correctness.

Cognitive Operator RegistryChecks the catalog of named thinking-moves so each is fully described and backed by evidence.5/5

Does This is a checker for the system's catalogue of named thinking-moves (operators like "reduce competing pressure to one bounded action" or "compile a handoff packet when validated work cannot be committed"). It confirms every operator in the public catalogue is fully described (how it is triggered, used, and checked) and that every operator marked active carries a real result record proving it once changed a live decision, while rejecting any operator that claims to speak with the owner's voice or overreaches into "ready to launch." It matters because it shows the system treats reasoning itself as inspectable typed system instead of ad-hoc prompt lore.

Scope limit It validates only the declared public registry contract and copied source bodies; it never becomes registry source authority, mutates operators, proves operator correctness, exposes source notes, or authorizes launch, external model access, or any whole-system-correctness claim.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.cognitive_operator_registry run --input fixtures/first_wave/cognitive_operator_registry/input --out /tmp/cognitive_operator_registry_out

EvidenceContract validatorevidence 5/5Import validation

architecturenavigationdoctrine

Source Design note · Source atlas

Paper module Cognitive Operator Registry

cognitive_operator_registry is the public contract diagnostic for the source system's typed cognitive-operator system. It checks that each public operator row carries the required operator-shape fields, that every active operator is backed by a dogfood result record proving it changed a live decision, and that the registry policy declares explicit scope limits before a cold reader trusts the operators as real reusable cognition rather than inspirational prose.

Purpose

A team that writes down its reusable thinking moves as a registry tends to accumulate entries faster than it can prove any of them help. The single question this component answers is: which of these listed operators has actually changed a live decision, and which is just a tidy description of one? An entry may only call itself active if it points to a dogfood result record, and that result record must carry cognition_delta_evidence recording a concrete decision that came out differently because the operator was applied.

The unusual part is that the check refuses to take a row at its word. Where a result record cites evidence surfaces, command paths, or task-ledger handles, the validator resolves each one against the public system (see _dogfood_receipt_ref_resolves and _record_dogfood_evidence_resolution_findings in the source). A row whose prose says it was dogfooded but whose evidence does not resolve is recorded as a failure, not a pass. A second check, the anti-sprawl case, flags two operators that share a slug or a near-identical claim unless an accretion decision was recorded, so the registry cannot quietly grow two near copies of the same idea.

The evidence contract is source-open by default. The validator emits refs, hashes, counts, and verdicts; secret_exclusion_scan proves that secrets, account or session material, model-output data bodies, source notes, and account secret-equivalent access material are excluded. Operator bodies are never inlined into the JSON result record, so the positive evidence carries body_in_receipt: false, real_runtime_receipt: true, and synthetic_receipt_standin_allowed: false.

Prior Art Grounding

This component borrows from cognitive work analysis, provenance, schema validation, and policy-gated registries. Useful anchors include:

  • Cognitive Work Analysis, summarized in this information-systems design overview, as prior art for analyzing cognitive work in complex sociotechnical systems.
  • W3C PROV, for connecting operator claims to activities, agents, and evidence used to evaluate trustworthiness.
  • JSON Schema, for the required-shape validation pattern behind public operator rows.
  • Open Policy Agent, as a precedent for policy evaluation that remains distinct from the registry data being evaluated.

Microcosm borrows the cognitive-work, provenance, shape-checking, and policy registry patterns, but keeps this component to a public contract diagnostic. It does not mutate operators, prove operator correctness, expose private operator bodies or source notes, authorize providers, or include launch operations.

It consumes public operator_registry.json, operator_standard.json, and dogfood_index.json inputs that project real source operator rows and dogfood result records. Its result record contract is source-open by default: secret_exclusion_scan proves that secrets, account or browser material, model-output data bodies, source notes, and account secret-equivalent live-access material are excluded, while public_runtime_refs point at the real standard, component, sign-off, fixture, bundle, and paper-module system. Bodies are not inlined into JSON result records, so the positive evidence uses body_in_receipt: false, real_runtime_receipt: true, and synthetic_receipt_standin_allowed: false.

The component rejects seven boundary failures:

  • operator rows missing required operator-shape fields
  • active operators with no backing dogfood result record
  • dogfood result records missing cognition_delta_evidence
  • near-duplicate operators (identical slug or near-identical claim) with no recorded accretion decision (the anti-sprawl governor case)
  • launch, provider, source-file changes, registry-mutation, or operator-correctness overclaims
  • operator rows that claim operator-voice or source note authority
  • private operator source bodies or model-output data bodies in public inputs

The exported bundle also imports three verbatim source bodies behind an import membrane: the cognitive-operator registry (codex/doctrine/cognitive_operators.json), the cognitive-operator standard (codex/standards/std_cognitive_operator.json), and the registry projection/validation tool (system/lib/cognitive_operator_registry.py). Each is copied byte-for-byte with a sha256 digest and required anchors; result records carry refs, hashes, counts, and verdicts only.

Shape

Public operator registryoperator ids, roles,runtime refsPublic operator registry operator ids, roles, runtime refsOperator standardrequired fields,scope limitOperator standard required fields, scope limitDogfood result recordscognition-delta evidenceDogfood result records cognition-delta evidenceValidatorValidatorCopied source bodiesregistry, standard,validator toolCopied source bodies registry, standard, validator toolNegative floormissing fields, no dogfood,sprawl, overclaim,private leakageNegative floor missing fields, no dogfood, sprawl, overclaim, private leakageResult recordsrefs, hashes, counts,verdicts; body text omittedResult records refs, hashes, counts, verdicts; body text omitted

Source refs

Validator
cognitive_operator_registry validator
Diagram source
flowchart LR Registry["Public operator registry operator ids, roles, runtime refs"] Standard["Operator standard required fields, scope limit"] Dogfood["Dogfood result records cognition-delta evidence"] Validator["cognitive_operator_registry validator"] Source["Copied source bodies registry, standard, validator tool"] Negative["Negative floor missing fields, no dogfood, sprawl, overclaim, private leakage"] Result record["Result records refs, hashes, counts, verdicts; body text omitted"] Registry --> Validator Standard --> Validator Dogfood --> Validator Source --> Validator Validator --> Negative Validator --> Result record

Reader Evidence Routing

Read this module as a public contract diagnostic, not as a glossary of operators or a live execution surface. This page explains the shape a reader should verify; the structured data lives in the JSON files below.

Start with paper_modules/cognitive_operator_registry.json for the full module record, then use standards/std_microcosm_cognitive_operator_registry.json to check required fields, forbidden authority, public/private boundary rules, and result record expectations. Open core/fixture_manifests/cognitive_operator_registry.fixture_manifest.json before inspecting fixtures or copied source modules, because the manifest names the source-open body floor and the body-omission contract.

Read dogfood result records as evidence that an active operator changed a live decision; do not read them as proof that the operator is generally correct. Read negative cases as part of the positive claim: missing roles, missing dogfood, missing cognition-delta evidence, duplicate/sprawl pressure, operator-voice claims, authority overclaims, and private-source leakage must remain rejected.

Technical Mechanism

The runtime mechanism lives in src/microcosm_core/organs/cognitive_operator_registry.py. run() loads the first-wave public fixture inputs: operator_registry.json, operator_standard.json, and dogfood_index.json. _positive_findings() checks that operator rows have required ids, slugs, roles, claims, runtime refs, evidence refs, and scope limits, then requires each active operator to resolve to a dogfood result record with cognition-delta evidence. The dogfood evidence resolver follows public fixture refs and copied bundle handles rather than accepting a row because its prose says it was dogfooded.

Negative pressure is source-declared in EXPECTED_NEGATIVE_CASES. _negative_findings() exercises missing required fields, active operators without dogfood result records, dogfood rows without cognition-delta evidence, operator sprawl without accretion decisions, operator-voice authority claims, provider/source/launch/correctness overclaims, and private source or model-output data leakage. A pass is therefore not only "the positive rows parsed"; it also means the expected refusal classes were observed and recorded.

run_registry_bundle() is the body-floor consumer. It executes the same registry contract against examples/cognitive_operator_registry/exported_cognitive_operator_registry_bundle and makes _source_module_manifest_result() mandatory. The manifest must prove exact copied source bodies for codex/doctrine/cognitive_operators.json, codex/standards/std_cognitive_operator.json, and system/lib/cognitive_operator_registry.py; _source_open_body_import_summary() then records body ids, classes, line counts, hashes, and body_in_receipt: false. AUTHORITY_CEILING keeps those result records below registry mutation, operator correctness, provider authority, source-file changes, launch, and whole-system correctness.

Named Proof Consumers

  • microcosm_core.organs.cognitive_operator_registry.run is the first-wave fixture consumer. It reads the public registry, standard, and dogfood index, writes the result, board, validation, and sign-off result records, and checks the expected negative floor.
  • microcosm_core.organs.cognitive_operator_registry.run_registry_bundle is the exported-bundle consumer. It proves the copied source registry, standard, and validator bodies through source-module manifest equality while keeping copied body text out of result records.
  • tests/test_cognitive_operator_registry.py::test_cognitive_operator_registry_observes_negative_cases is the public-contract regression. It asserts that all expected negative cases are observed and that all fixture operators have dogfood result records.
  • tests/test_cognitive_operator_registry.py::test_cognitive_operator_registry_bundle_validates_runtime_shape is the bundle-shape regression. It checks operator counts, source-module manifest status, body-material ids, and the metadata-only result record boundary.
  • tests/test_cognitive_operator_registry.py::test_cognitive_operator_registry_source_modules_are_exact_macro_body_imports is the exact-copy proof consumer. It byte-compares every manifest source ref with the copied target and verifies the recorded sha256 digests.

Validation Result record Path

Run the first-wave fixture into disposable result records from the Microcosm root:

Run the exported bundle through the same component:

cd microcosm-substrate
PYTHONPATH=src ../repo-python -m microcosm_core.organs.cognitive_operator_registry run-registry-bundle --input examples/cognitive_operator_registry/exported_cognitive_operator_registry_bundle --out /tmp/microcosm_cognitive_operator_registry_bundle
cd microcosm-substrate
../repo-pytest tests/test_cognitive_operator_registry.py -q
cd ..
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

The source atlas row carries the matching paper_module_ref, mechanism_refs, and code_loci entries.

Scope boundary

Scope limit

This paper module can claim a public cognitive-operator registry contract fixture with source-backed operator-shape checks, active-operator dogfood result record checks, cognition-delta evidence resolution, anti-sprawl accretion checks, expected negative cases, exact copied source body manifest equality, metadata-only result records, and a generated diagram view derived from the module's structured bindings.

It cannot become source authority for the cognitive-operator registry, mutate operators, prove operator correctness, expose private operator bodies or source notes, authorize providers, change source files, include launch operations or public sharing, or certify whole-system correctness.

If focused validation reports an exact-copy source-module body mismatch, route that repair through microcosm_exact_copy_refresh; do not treat this Markdown projection as source authority for copied source bodies.

Source and projection details
Governing Lattice Relation

That mechanism states the proof obligation in operational terms: operator rows must carry required shape fields, active operators must have dogfood result records, dogfood result records must include cognition-delta evidence, duplicate or near-duplicate operators must carry an accretion decision, and the exported bundle must prove copied registry, standard, and validator bodies by source module digest before any result record is trusted.

The generated JSON instance links this module to concept.architecture_and_navigation_route_contract_bundle, principles P-1, P-2, P-3, P-5, P-6, P-12, and P-15, and axioms AX-1, AX-4, AX-5, AX-7, AX-8, and AX-11. Those edges frame the module as an architecture-and-navigation contract validator. They do not make the Markdown or generated Atlas card source authority for operator definitions, live operator execution, or provider action.

Routing Anti Patterns RegistryIndexes the navigation mistakes agents repeat and guards the public list.5/5

Does This is a checker for the system's public list of navigation mistakes agents keep making, such as grepping before asking the kernel for a route or sending work to a bridge before scope is chosen. It confirms each anti-pattern row has a stable id and explanation, rejects duplicate or body-leaking rows, and proves the public bundle carries the actual source registry body copied byte-for-byte instead of a synthetic paraphrase.

Scope limit It validates only the declared public routing anti-pattern registry contract and copied source body; it never becomes route source authority, mutates routes, exposes private routing notes, calls providers, authorizes launch, or proves whole-system correctness.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.routing_anti_patterns_registry run --input fixtures/first_wave/routing_anti_patterns_registry/input --out /tmp/routing_anti_patterns_registry_out

EvidenceContract validatorevidence 5/5Import validation

architecturenavigationdoctrine

Source Design note · Source atlas

Paper module Routing Anti-Patterns Registry

routing_anti_patterns_registry is the public contract diagnostic for the source system's typed navigation failure rows. It validates the copied codex/doctrine/routing_anti_patterns.json registry as runnable Microcosm system: the input must declare kind: routing_anti_patterns, carry a positive version, and expose stable anti_patterns rows with unique ids and plain explanatory text.

The positive fixture imports the real source registry body. The exported bundle also carries a source module manifest and a byte-for-byte copy under source_modules/codex/doctrine/routing_anti_patterns.json, with sha256 hashes and anchors for kernel_before_grep, bridge_before_scope, and mode_in_chat_only. Result records carry refs, hashes, counts, and verdicts only; they do not inline the copied body.

The component rejects five boundary failures:

  • missing kind
  • duplicate anti-pattern ids
  • anti-pattern rows missing explanatory text
  • launch, provider, source-file changes, route-policy mutation, maturity, or whole-system-correctness overclaims
  • private routing bodies, source note bodies, model-output data bodies, or secret values in public inputs

Purpose

A navigation system can fail quietly. An agent reaches for grep when a kernel route would have narrowed the space first, or changes execution mode in chat without updating the disk contract, and nothing complains until the work is already off the rails. This component answers one question: does the public registry of known routing failures hold its declared shape, and does the copied source body that backs it stay byte-honest? It names recurring navigation mistakes as typed rows so they can be recognised, not rediscovered.

The registry is treated as a checked artifact, not as authority. A page describing routing failures is easy to read as a router or as policy. The component refuses both: a row may project a public anti-pattern, but it may not declare source_authority, route_authority, or any internal control role, and the validator rejects rows that try. So the document can describe how navigation goes wrong without itself becoming the thing that decides how navigation should go.

One design choice sits in how each row's route-repair state is decided. Rather than trust a label baked into the row, the checker derives the repair state from the row's own id and explanatory text: kernel_before_grep only earns kernel_first_navigation if its text actually mentions grep, kernel, and route. A row carrying a pre-written expected_route_repair_state is flagged, and baked_expected_labels_sufficient is fixed to false. The point is to stop a registry from grading itself by self-asserted labels, and to keep the meaning grounded in the text a reader can see.

Shape

This module is a projection over a bundle-backed public routing diagnostic, not route source authority. Cold readers should read it as a bounded chain: the JSON bundle and standard name the contract; the runtime component validates fixtures and an exported source bundle; result records preserve hashes, counts, verdicts, and negative cases; generated Mermaid and Atlas rows expose the bundle edges; the scope limit remains projection-only.

BundleBundleStandardStandardMarkdownrun / run-bundle / resultrecord writerrun / run-bundle / result record writerregistry + negative casesregistry + negative casessource_module_manifest +exact copied bodysource_module_manifest + exact copied bodyTestsTestsrefs, hashes, counts,verdictsrefs, hashes, counts, verdicts22 edges; Mermaid available;Atlas linked22 edges; Mermaid available; Atlas linkedScope limitno route authority, mutation,external model access,launch, or whole-system proofScope limit no route authority, mutation, external model access, launch, or whole-system proof

Source refs

Bundle
core/paper_module_capsules.jsonpaper_module.routing_anti_patterns_registry
Standard
standards/std_microcosm_routing_anti_patterns_registry.json
paper_modules/routing_anti_patterns_registry.md
run / run-bundle / result record writer
src/microcosm_core/organs/routing_anti_patterns_registry.py
registry + negative cases
fixtures/first_wave/routing_anti_patterns_registry/input
source_module_manifest + exact copied body
examples/routing_anti_patterns_registry/exported_routing_anti_patterns_bundle
Tests
tests/test_routing_anti_patterns_registry.py
refs, hashes, counts, verdicts
receipts/.../routing_anti_patterns_registry*.json
22 edges; Mermaid available; Atlas linked
paper_modules/routing_anti_patterns_registry.json
Diagram source
flowchart TD Bundle["core/paper_module_capsules.json paper_module.routing_anti_patterns_registry"] Standard["standards/std_microcosm_routing_anti_patterns_registry.json"] Markdown["paper_modules/routing_anti_patterns_registry.md reader projection; not route authority"] Runtime["src/microcosm_core/components/routing_anti_patterns_registry.py run / run-bundle / result record writer"] Fixture["fixtures/first_wave/routing_anti_patterns_registry/input registry + negative cases"] Bundle["examples/routing_anti_patterns_registry/exported_routing_anti_patterns_bundle source_module_manifest + exact copied body"] Tests["tests/test_routing_anti_patterns_registry.py"] Result records["result records/.../routing_anti_patterns_registry*.json refs, hashes, counts, verdicts"] structured source record["paper_modules/routing_anti_patterns_registry.json 22 edges; Mermaid available; Atlas linked"] Ceiling["Scope limit no route authority, mutation, external model access, launch, or whole-system proof"] Bundle --> Markdown Bundle --> structured source record Standard --> Runtime Fixture --> Runtime Bundle --> Runtime Runtime --> Tests Runtime --> Result records Tests --> Result records structured source record --> Ceiling Result records --> Ceiling Markdown --> Ceiling

Technical Mechanism

The component is a contract checker around a public routing-registry copy, not a router. run loads the first-wave fixture and asks _build_result to validate the positive routing_anti_patterns.json payload, all declared negative cases, the secret-exclusion scan, and the metadata-only result record bundle. The positive path requires kind: routing_anti_patterns, a positive integer version, stable anti-pattern ids, explanatory text, and the named source anchors kernel_before_grep, bridge_before_scope, and mode_in_chat_only.

The failure lattice is explicit. _payload_findings records typed evidence for missing kind, non-positive version, missing rows, missing ids, duplicate ids, missing text, forbidden authority-role masquerade, private-source fields, and overclaims about launch, external model access, source-file changes, route-policy mutation, maturity, readiness, or whole-system correctness. A pass is admitted only when every expected negative case appears with its expected error code and missing_negative_cases is empty. That makes the negative cases proof obligations rather than illustrative examples.

The exported-bundle path adds source-copy accountability. run-bundle calls run_routing_anti_patterns_bundle, which requires bundle_manifest.json, source_module_manifest.json, and the copied body under source_modules/codex/doctrine/routing_anti_patterns.json. The manifest checker streams sha256 over the copied target, verifies sha256, source_sha256, and target_sha256, checks required anchors, classifies the material as copied_non_secret_macro_body, and rejects any body-in-result record claim. The source body is available in the exported source-module tree; result records keep only refs, hashes, counts, verdicts, and omission fields.

The governing lattice is deliberately narrow. The bundle binds this mechanism to concept.architecture_and_navigation_route_contract_bundle, P-1, P-2, P-3, P-5, P-6, P-8, P-9, P-12, P-15, and AX-1, AX-4, AX-5, AX-7, AX-8, AX-11, but the checker consumes those refs as a scope limit: evidence must be replayable, typed, public-safe, and below projection authority. It also depends on navigation_hologram_route_plane, agent_route_observability_runtime, and cold_reader_route_map, so the registry can describe navigation failure shapes without becoming the internal control route source.

Reader Evidence Routing

Read this module through the following source-to-proof route:

  1. Start at the source record core/paper_module_capsules.json::paper_modules[58:paper_module.routing_anti_patterns_registry]. It is the source authority for source_authority: json_capsule, the component subject, mechanism subject, runtime source locus, concept, principles, axioms, dependency modules, and the projection statuses.
  2. Read the generated structured source record paper_modules/routing_anti_patterns_registry.json only as a projection from that source record.
  3. Follow the runtime proof path through src/microcosm_core/organs/routing_anti_patterns_registry.py, fixtures/first_wave/routing_anti_patterns_registry/input/, and examples/routing_anti_patterns_registry/exported_routing_anti_patterns_bundle/. Those surfaces carry the public registry fixture, negative cases, source_module_manifest.json, copied body target, required anchors, and digest checks.
  4. Confirm the public result record floor with the named fixture command, bundle command, focused regression, and corpus check below. Result records may carry ids, refs, hashes, counts, verdicts, and omission fields, but not private routing bodies or model-output data.
  5. Treat generated diagram, Atlas, search, object-map, and site cards as reachability projections from the same source row. They help a public reader find the module; they do not outrank the bundle, runtime, manifest, tests, or metadata-only result records.

Named Proof Consumers

  • First-wave fixture consumer: PYTHONPATH=src ../repo-python -m microcosm_core.components.routing_anti_patterns_registry run --input fixtures/first_wave/routing_anti_patterns_registry/input --out /tmp/microcosm-routing-anti-patterns-registry/fixture --sign-off-out /tmp/microcosm-routing-anti-patterns-registry/sign-off.json --card consumes the public registry fixture, six expected negative-case families, private-source rejection, secret-exclusion scan, metadata-only result record writer, and command-card omission boundary.
  • Exported-bundle consumer: PYTHONPATH=src ../repo-python -m microcosm_core.organs.routing_anti_patterns_registry run-bundle --input examples/routing_anti_patterns_registry/exported_routing_anti_patterns_bundle --out /tmp/microcosm-routing-anti-patterns-registry/bundle --card consumes the source-module manifest, exact copied source registry body, sha256 digest floor, required anchors, material class, and source-open summary while keeping body text out of result records.
  • Focused regression consumer: PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_routing_anti_patterns_registry.py -q pins negative-case coverage, source-authority masquerade rejection, digest mismatch blockers, exact copied-body imports, secret-exclusion result record policy, and fresh-card reuse behavior.
  • It is a read-only result record for this Markdown slice, not permission to hand-edit generated projections.

Prior Art Grounding

This registry follows the same family as pattern and anti-pattern catalogs: name recurring failure shapes so future operators can recognize and avoid them. The Hillside patterns library is the positive pattern-language ancestor, and the software anti-pattern literature supplies the inverse move: documenting repeated practices that look useful but produce bad outcomes.

The routing-specific presentation also borrows from CLI usability practice. The Command Line Interface Guidelines emphasize discoverability, clear errors, and suggested next actions; this component applies that pressure to navigation failures by requiring stable ids and explanatory text while keeping the registry projection below route-source authority.

Validation Result record Path

From microcosm-substrate, validate the public routing-registry diagnostic without writing tracked result records:

Passing validation proves the public anti-pattern registry fixture and copied-body digest floor only. It does not make this registry route source authority, and it excludes route-policy mutation, external model access, launch, or whole-system correctness.

Scope boundary

Scope limit

This module may claim public fixture evidence that anti-pattern row shape, stable anti-pattern ids, source-module digest checks, private-leak rejection, negative cases, and validation result records support the declared routing anti-pattern registry contract. It may also claim that the JSON row resolves the accepted component subject, mechanism subject, runtime source locus, governed concept, principles, axioms, and dependency modules.

This module may not claim route source authority, live route freshness, route-policy mutation, provider authorization, private routing-note disclosure, maturity proof, hosted-public posture, launch-scope decision, publishing-scope decision, implementation correctness beyond the listed witnesses, or whole-system correctness.

Scope limit

This is a projection-only diagnostic. It can explain public anti-pattern registry validation, copied-body digest checks, private-leak rejection, negative cases, and validation result records. It does not become route source authority, mutate routes, expose private routing notes, authorize providers, include launch operations, or prove whole-system correctness.

Doctrine Fact Claim AuditChecks that public fact rows state the right count and point at live, anchored code.5/5

Does Checks that public fact rows state the expected count, point at live copied code loci, preserve anchors, and only reference facts that exist in the fixture DAG.

Scope limit fact assertion, code-loci, and DAG fixture truth gate only; it is not a comprehension engine and does not establish a minimum read graph

Run
microcosm doctrine-fact-claim-audit run --input fixtures/first_wave/doctrine_fact_claim_audit/input --out receipts/first_wave/doctrine_fact_claim_audit

EvidenceContract validatorevidence 5/5Import validation

architecturenavigationdoctrine

Source Design note · Source atlas

Paper module Doctrine Fact Claim Audit

doctrine_fact_claim_audit is a Crown Jewel import component with real runnable system and a strict public scope limit. It consumes synthetic public fixtures, copied source source bodies, and source manifests that verify sha256 digests, line counts, required anchors, secret-exclusion status, and result record body omission.

What it proves: fact assertion, code-loci, DAG, and numeric claim binding fixture truth gate only.

Purpose

Documentation about a living system rots. A page states that there are forty-seven of something, or cites a function in a file, and both claims quietly go stale as the code moves underneath them. A reader cannot tell a current count from a number that was true once and never rechecked. This component exists to answer one question: which of a page's factual assertions can be re-derived from source right now, and which have become untracked drift?

The approach treats a documentation claim like a cached value that needs an invalidation strategy. A bare number is not enough; the claim is admissible only when it is bound to a fact assertion that records how to recompute or revalidate the value. The same pass resolves every cited code locus on disk and checks that the quoted anchor text is actually present, so a plausible-but-dead file reference becomes a typed finding rather than inert prose. The interesting move is that nothing here asks a model whether the prose reads as true. The component recomputes a bounded relation over public fixtures and reports only what that relation supports.

The second design choice worth naming is how the checks are proved. The negative floor is semantic, not label-trusting: the test harness overwrites the declared failure fixtures with bogus pass rows and confirms the evaluator still derives the expected stable error codes itself. That keeps the proof attached to the mechanism rather than to the fixture filenames. The honesty of the page rests on that: the component is a narrow claim-audit gate over copied public fixtures, not a comprehension engine, a minimum-read-graph proof, a source-file changes lane, or any launch-scope decision.

Prior Art Grounding

This component borrows from provenance modeling, structured fact-check metadata, schema validation, and supply-chain attestation. Useful anchors include:

  • W3C PROV, which models entities, activities, and agents so readers can assess the quality, reliability, and trustworthiness of derived information.
  • Schema.org ClaimReview, as a web metadata pattern for recording a reviewed claim and its fact-checking context.
  • JSON Schema, for declaring expected structure and rejecting malformed or incomplete claim records.
  • SLSA provenance, for the software-supply-chain pattern of tracing artifacts back to source and build metadata.

Microcosm borrows the provenance, claim-review, schema, and attestation shapes, but keeps this component to public fixture fact counts, code-loci existence, anchor presence, DAG references, and synthetic volatile numeric binding cases. It is not a comprehension engine, private-doctrine export, launch-scope decision, or a minimum-read-graph proof.

Technical Mechanism

The runtime mechanism is a public fixture evaluator in src/microcosm_core/organs/doctrine_fact_claim_audit.py. The component declares a CrownJewelSpec with four required inputs: fact_assertions.json, fact_dag.json, numeric_claims.json, and projection_protocol.json. The shared crown-jewel runner handles source-manifest validation, result record writing, negative-case execution, and scope limit attachment; this module supplies the domain evaluator and the semantic negative-case mutator.

evaluate first loads the fact assertion table and compares expected_fact_count to the number of fact rows. Each fact must carry at least one code locus. The evaluator resolves every relative code-locus path against the copied source-module bundle, then checks that the declared anchor text is present in the copied body. The DAG pass builds the set of audited fact ids and rejects any edge whose from or to endpoint is not in that set. These checks convert plausible documentation references into result record-backed paths, anchors, and graph edges.

Numeric claims are checked by importing the copied source_modules/system/lib/derived_fact_hologram.py body from the exported bundle and calling its find_unbound_numeric_claims function. For each row in numeric_claims.json, the evaluator synthesizes FactAssertion instances for the declared sections, records unbound numeric detections, and blocks a case when a non-detector row leaves current-state numeric prose without a matching fact assertion. Detector rows are positive evidence only because they must surface the expected section and number.

The negative floor is semantic rather than label-trusting. evaluate_negative_case mutates the positive fixture in memory for wrong_fact_count, missing_code_locus, dead_code_locus, dead_dag_ref, and unbound_numeric_claim, then reruns the same evaluator in a temporary input directory. The tests deliberately overwrite the declared negative-case files with bogus pass rows and confirm that the component still derives the expected stable error codes from the evaluator itself. That keeps the proof tied to the mechanism, not to fixture labels.

The source-open body floor is separate from the result record floor. The exported bundle manifest names two copied bodies, derived_fact_hologram.py and paper_modules.py, with digests and line counts. Runtime result records carry refs, counts, verdicts, scope boundaries, and body_in_receipt: false; they do not embed copied source bodies or private operator material.

How to run it:

Runtime bundle route:

PYTHONPATH=src python3 -m microcosm_core.organs.doctrine_fact_claim_audit run-doctrine-fact-bundle --input examples/doctrine_fact_claim_audit/exported_doctrine_fact_claim_audit_bundle --out receipts/runtime_shell/demo_project/organs/doctrine_fact_claim_audit

Shape

  • Subject: doctrine_fact_claim_audit, with mechanism mechanism.doctrine_fact_claim_audit.validates_public_doctrine_fact_claim_audit.
  • Runtime locus: src/microcosm_core/organs/doctrine_fact_claim_audit.py, especially run, run_doctrine_fact_bundle, evaluate, _evaluate_numeric_claims, _load_derived_fact_module, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING.
  • The fixture checks an expected fact count, resolves declared code-locus paths, verifies required source anchors, rejects dead DAG references, and requires volatile numeric claim cases to be bound to fact assertions.
  • The accepted positive result record reports three facts, three verified code loci, two DAG edges, two numeric claim cases, and one detected unbound numeric detector case, while preserving body_in_receipt: false.
  • The negative floor is stable: dead_code_locus, dead_dag_ref, missing_code_locus, unbound_numeric_claim, and wrong_fact_count.
  • The public standard is standards/std_microcosm_doctrine_fact_claim_audit.json; the fixture manifest is core/fixture_manifests/doctrine_fact_claim_audit.fixture_manifest.json.
mismatchmissing path or anchordead refunboundokokokokfacts + expected_fact_countfacts + expected_fact_countevaluateevaluateedgesedgescasescasessource module manifestcopied bodiessource module manifest copied bodiesdeclared fact count= table length?declared fact count = table length?each code locuspath on disk +anchor in body?each code locus path on disk + anchor in body?DAG endpointsare known fact ids?DAG endpoints are known fact ids?current-state numericsbound to a factassertion section?current-state numerics bound to a fact assertion section?typed blocking findingtyped blocking findingmetadata-only result recordbody_in_receipt: falsemetadata-only result record body_in_receipt: falseevaluate_negative_casemutate fixture, rerunevaluatorevaluate_negative_case mutate fixture, rerun evaluatorexpected stableerror codesexpected stable error codes

Source refs

facts + expected_fact_count
fact_assertions.json
edges
fact_dag.json
cases
numeric_claims.json
Diagram source
flowchart LR Facts["fact_assertions.json facts + expected_fact_count"] --> Eval["evaluate"] Dag["fact_dag.json edges"] --> Eval Numerics["numeric_claims.json cases"] --> Eval Manifest["source module manifest copied bodies"] --> Eval Eval --> Count{"declared fact count = table length?"} Eval --> Loci{"each code locus path on disk + anchor in body?"} Eval --> DagRef{"DAG endpoints are known fact ids?"} Eval --> Bound{"current-state numerics bound to a fact assertion section?"} Count -->|mismatch| Block["typed blocking finding"] Loci -->|missing path or anchor| Block DagRef -->|dead ref| Block Bound -->|unbound| Block Count -->|ok| Result record["metadata-only result record body_in_receipt: false"] Loci -->|ok| Result record DagRef -->|ok| Result record Bound -->|ok| Result record Neg["evaluate_negative_case mutate fixture, rerun evaluator"] --> Codes["expected stable error codes"]

Named Proof Consumers

  • Fixture CLI consumer: PYTHONPATH=src ../repo-python -m microcosm_core.components.doctrine_fact_claim_audit run --input fixtures/first_wave/doctrine_fact_claim_audit/input --out /tmp/microcosm-doctrine-fact-claim-audit/fixture --sign-off-out /tmp/microcosm-doctrine-fact-claim-audit/sign-off.json --card. Expected proof shape: status: pass, three fact rows, three verified code loci, two DAG edges, two numeric-claim cases, one detector case, zero blocking unbound numerics, five semantic negative cases, and body_in_receipt: false.
  • Exported bundle consumer: PYTHONPATH=src ../repo-python -m microcosm_core.organs.doctrine_fact_claim_audit run-doctrine-fact-bundle --input examples/doctrine_fact_claim_audit/exported_doctrine_fact_claim_audit_bundle --out /tmp/microcosm-doctrine-fact-claim-audit/bundle --card. Expected proof shape: the same evaluator runs through the exported bundle input mode, validates the source-module manifest, and writes metadata-only bundle result records.
  • Focused regression consumer: PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider --basetemp=/tmp/microcosm_doctrine_fact_claim_audit_pytest tests/test_doctrine_fact_claim_audit.py -q. Expected proof shape: the seven tests cover the positive fixture, dead code locus, missing code locus, dead DAG ref, unbound numeric claim, semantic negative-case derivation, and exported-bundle route.
  • Corpus parity consumer: PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus. Expected proof shape: the structured source record remains reproducible from the bundle and Markdown projection without hand-editing generated state.
  • structured source record readback consumer: jq '{source_authority:.paper_module_payload.source_authority, mermaid:.paper_module_payload.generated_projections.mermaid.status, atlas:.paper_module_payload.generated_projections.atlas_card.status, edge_count:(.relationships.edges|length), unresolved:(.relationships.unpopulated_selective_relations|length)}' paper_modules/doctrine_fact_claim_audit.json. Expected proof shape: json_capsule, available_from_capsule_edges, linked_from_capsule_edges, resolved bundle edges, and zero unpopulated selective relations.

Reader Evidence Routing

  • Start with paper_modules/doctrine_fact_claim_audit.json as the primary reference, then open this Markdown page as a reader guide to that record.
  • Open standards/std_microcosm_doctrine_fact_claim_audit.json for the standard, required witnesses, negative floor, denied authority, and result record contract.
  • Open core/fixture_manifests/doctrine_fact_claim_audit.fixture_manifest.json for fixture inputs, copied-body counts, durable result record refs, and source-open body omission rules.
  • Open examples/doctrine_fact_claim_audit/exported_doctrine_fact_claim_audit_bundle/source_module_manifest.json before inspecting copied source modules; result records carry refs and digests, not copied source body text.
  • Run the fixture or bundle route from the microcosm-substrate directory and inspect the written JSON files. The component CLI exposes --card, but it does not expose a --json stdout mode.
  • Use scripts/build_doctrine_projection.py --check-paper-module-corpus to verify this paper-module projection stays inside the shared corpus contract.

Claim-Rot Detection

This component treats documentation claims like cached values that need an invalidation strategy. The failure mode is not only a wrong number; it is a volatile number embedded in current-state prose with no attached route for re-deriving it.

The detector flags volatile numerics: a number near a countable noun inside a current-state section. Such a claim is admissible only when it is bound to a fact assertion that records how to recompute or revalidate the value. The same audit resolves every cited code locus on disk and checks that the quoted anchor is actually present, so stale file references and plausible-but-dead anchors are negative evidence rather than inert prose.

The public fixture does not claim natural-language comprehension. It proves the more useful contract: current-state numerics, fact assertions, DAG refs, code loci, and anchor text can be audited as result record-backed claims instead of untracked documentation drift.

Scope limit: Doctrine fact claim audit checks only public fixture fact counts, code-loci existence, anchor presence, DAG references, and synthetic volatile numeric claim binding cases. It is not a comprehension engine, does not establish a minimum read graph, does not export private doctrine, and excludes launch.

Validation Result record Path

From microcosm-substrate, validate with external result record outputs so the reader check does not churn tracked result records:

A diagram view is generated for this module, and an atlas card links to it. Passing result records validate fact-count, code-locus, DAG-ref, numeric-claim, digest, and negative-case boundaries only. If copied source bodies drift, refresh the exact copy bundle through the owning lane before treating bundle red as a reader-page defect.

Negative cases covered by the fixture manifest: dead_code_locus, dead_dag_ref, missing_code_locus, unbound_numeric_claim, wrong_fact_count.

Source provenance is anchored by examples/doctrine_fact_claim_audit/exported_doctrine_fact_claim_audit_bundle/source_module_manifest.json and result records carry refs, digests, counts, verdicts, and scope boundaries only.

Scope boundary

Scope limit

This module may claim public fixture evidence that doctrine fact assertions, code-locus refs, DAG refs, numeric claim bindings, copied source manifests, digest checks, anchor checks, secret-exclusion scans, metadata-only result records, and negative stale-claim cases are checked by the listed runtime witnesses.

This module may not claim doctrine comprehension, private doctrine export, minimum-read-graph proof, live launch-scope decision, hosted-public posture, source-file changes, candidate-axiom promotion, projection correctness beyond the listed witnesses, or whole-system correctness.

Source and projection details
Governing Lattice Relation

This module is the architecture-and-navigation contract specimen for turning current-state doctrine claims into auditable fact rows. The admitted mechanism, mechanism.doctrine_fact_claim_audit.validates_public_doctrine_fact_claim_audit, does not ask a model whether prose is true. It recomputes a bounded relation: declared fact count, code-locus anchors, route-DAG endpoints, volatile numeric claim bindings, source-module manifest anchors, and semantic negative cases must all agree with the copied public fixture basis before a result record can pass.

That relation is why the bundle binds the module to concept.architecture_and_navigation_route_contract_bundle. Architecture and navigation claims are only readable as doctrine when they can be traced through source rows, code loci, validator commands, and metadata-only result records. The bundle therefore treats the generated Mermaid and Atlas card as route projections of 15 resolved edges, not as independent proof that doctrine coverage is complete.

The principle edges are source-backed claim discipline, not decorative tags. P-1 is exercised when the evaluator recomputes fixture truth rather than echoing declared labels. P-2 is exercised by lowering the positive claim to the checker's strength: fact assertion, code-locus, DAG, numeric-claim, and manifest truth only. P-7 is exercised by recording known unknowns without claiming the unmapped doctrine space is exhausted. P-15 is exercised by keeping this Markdown, the structured source record, Mermaid, and Atlas below the bundle, source module, and validator result records.

The axiom bindings are likewise operational. AX-1 requires a derivation before the page repeats a fact count or source claim. AX-6 keeps the declared fixture domain open-world outside its explicit rows. AX-7 makes failed preconditions typed blocking findings instead of meaningless green output. AX-8 keeps public source refs, manifest digests, secret-exclusion status, and body_in_receipt: false attached as data moves from copied source bodies into result records and reader copy.

The proof consumer for this lattice relation is tests/test_doctrine_fact_claim_audit.py: its positive case, four direct mutation cases, semantic-negative-label override, and exported-bundle test prove that the mechanism is an executable claim-audit boundary. The fixture and bundle CLIs give the same boundary to a reader outside pytest; the corpus check proves only that the Markdown and generated structured source record still agree with the bundle, not that any new doctrine truth has been discovered.

Self Ignorance Coverage LedgerCompares expected against built entities to report known coverage gaps.3/5

Does Compares declared Kind Atlas expectations against materialized entities and reports the known coverage debt that falls out of that public fixture comparison.

Scope limit known Kind Atlas coverage debt projection only; it does not claim literal unknown-unknown omniscience or absence proof

Run
microcosm self-ignorance-coverage-ledger run --input fixtures/first_wave/self_ignorance_coverage_ledger/input --out receipts/first_wave/self_ignorance_coverage_ledger

EvidenceComputed projectionevidence 3/5Source-faithful refactor

architecturenavigationdoctrine

Source Design note · Source atlas

Paper module Self-Ignorance Coverage Ledger

Purpose

A navigation system that lists what it knows is easy to build. A system that can state, precisely, what it has not yet covered is harder, and it is the more honest signal to a cold reader. This component answers one question: for a declared set of Kind Atlas families, how many rows does the option surface expose that the generated System Atlas has not yet materialised?

The answer is a small debt vector, computed rather than asserted. For each selected kind the component recomputes the live Kind Atlas row count through system.lib.kind_atlas.build_kind_atlas, counts the entities the build_system_atlas.py graph has actually materialised for that kind, and reports the difference as known coverage debt. Concepts, mechanisms and standards are checked back to real source source files so the materialised set cannot be inflated with names that have no file behind them.

The unusual part is what the validator refuses. It will not accept a fixture that claims its unknown-unknowns are exhaustive: declaring claims_unknown_unknowns_exhaustive raises a finding rather than passing. The ledger reports a bounded count of gaps it can see and explicitly declines to claim there are no others. Known debt is treated as typed residual pressure, not as a completeness proof, and absence of a row is never read as proof that nothing is missing.

Abstract

self_ignorance_coverage_ledger is a public Microcosm Crown Jewel component that measures a narrow, source-grounded form of self-ignorance: known row-level coverage debt between live Kind Atlas option-surface counts and generated System Atlas materialization evidence. It recomputes the selected Kind Atlas families, derives materialized entity IDs from a build_system_atlas.py graph snapshot, source-validates graph-derived entity IDs, replays semantic negative cases, and emits metadata-only result records with scope boundaries.

The current exported bundle is a realness-rung R4 check when the source repo is available: live Kind Atlas counts are bound, the System Atlas graph slice is builder-bound, the live System Atlas graph is cross-checked, expected entity IDs are source-backed, and copied source source bodies are digest-bound through a manifest. The claim is only known_kind_atlas_coverage_debt_projection_only: it is not absence proof, unknown-unknown omniscience, total repository search proof, source-file changes, launch-scope decision, publishing-scope decision, private-system equivalence, provider affiliation, or whole-system correctness.

Problem

Navigation systems can overstate themselves in two opposite ways. A vague "coverage is incomplete" tells a cold reader nothing operational. A confident "nothing else is missing" is worse: it converts absence of evidence into evidence of absence. This component exists to occupy the narrow technical middle: for a declared finite domain of Kind Atlas families, compute the gap between what the option surface exposes and what the System Atlas graph has materialized.

The result is a self-ignorance ledger, not a universal discovery engine. Its positive output is a bounded debt vector. Its negative output is equally important: the validator must refuse fixtures that claim exhaustive unknown-unknown coverage, hand-author materialization counts, substitute entity IDs, use stale/baked expected IDs as authority, tamper with the System Atlas builder result record, or repair a copied-source manifest into a self-reference.

Mechanism

The runtime locus is src/microcosm_core/organs/self_ignorance_coverage_ledger.py. The exported-bundle entrypoint is run_self_ignorance_bundle; the core evaluator is evaluate; the semantic negative-case replayer is evaluate_negative_case; the local scope limit is AUTHORITY_CEILING.

The evaluator consumes four public bundle files:

InputRequired semanticsMain checks
kind_atlas_rows.jsonDeclared Kind Atlas families, expected entity IDs, known-debt floors, and absence policy.Recompute live row counts through system.lib.kind_atlas.build_kind_atlas; reject forbidden unknown-unknown exhaustiveness.
system_atlas_graph.jsonGenerated graph slice carrying materialized System Atlas entity IDs.Require non-empty entities and generated_by == tools/meta/factory/build_system_atlas.py; derive materialized IDs from graph rows.
materialized_entities.jsonDeclared materialization rows and snapshot metadata.Check declared counts against graph-derived counts; use graph-derived counts as authority.
projection_protocol.jsonResult record for the System Atlas check and coverage scope.Require the exact coverage scope and a valid build_system_atlas.py --check result record or blocked-refresh result record.

Algorithmically, the component performs this loop:

  1. Load bundle inputs and the source-module manifest through the Crown Jewel common runner.
  2. Recompute selected Kind Atlas rows from the source repo when system/lib/kind_atlas.py is available.
  3. Load system_atlas_graph.json, require the System Atlas builder marker, and derive materialized IDs by kind.
  4. Cross-check the bundled graph slice against state/system_atlas/system_atlas.graph.json when the source repo is available.
  5. For concepts, mechanisms, and standards, verify that graph-derived expected IDs resolve to real source source files.
  6. Compute known_coverage_debt_count = live_kind_atlas_row_count - graph_derived_materialized_count by kind.
  7. Replay semantic negative cases from clean input copies instead of trusting declared error labels.
  8. Write result records with refs, counts, hashes, findings, realness evidence, and scope boundaries; copied body text stays out of result records.

For the current exported bundle, the public count vector is:

KindLive Kind Atlas rowsGraph-derived materialized entitiesKnown debt
concepts413011
mechanisms36288
paper_modules2252205
standards20129172
Total503307196

Those numbers come from examples/self_ignorance_coverage_ledger/exported_self_ignorance_coverage_ledger_bundle/kind_atlas_rows.json, materialized_entities.json, and system_atlas_graph.json, and are proof-consuming snapshot facts. They are not stable doctrine constants; rerun the validator after Kind Atlas, System Atlas, or source manifests move.

Projection Protocol Result record

projection_protocol.json is the result record that prevents a static graph slice from masquerading as live authority. The accepted bundle must carry:

FieldAccepted valueMeaning
coverage_scopelive_kind_atlas_vs_generated_system_atlas_materialization_snapshotThe domain is live Kind Atlas rows against generated System Atlas materialization.
system_atlas_check_command./repo-python tools/meta/factory/build_system_atlas.py --checkThe refresh/check route is named, not implied.
system_atlas_check_statuspass or blocked_source_inputs_changed_since_artifact_generationA blocked refresh is admissible only when declared as such; it does not upgrade the snapshot.
system_atlas_refresh_blocked_by_active_source_claimsBoolean
body_in_receiptfalseResult record fields carry metadata and verdicts, not copied source bodies.

The focused tests test_self_ignorance_coverage_ledger_rejects_projection_scope_tamper and test_self_ignorance_coverage_ledger_rejects_system_atlas_receipt_tamper are the proof consumers for this protocol.

Mermaid Flow

Live Kind Atlas rows503 selected rowsLive Kind Atlas rows 503 selected rowsSystem Atlas graph slice307 materialized entitiesSystem Atlas graph slice 307 materialized entitiesscope + build_system_atlas.pyresult recordscope + build_system_atlas.py result recordSource source filesSource source filesManifestManifestevaluate()evaluate()semantic negative-case replaysemantic negative-case replayknown debt vector196 unitsknown debt vector 196 unitsmetadata-only result recordscounts, refs, hashes, scopeboundariesmetadata-only result records counts, refs, hashes, scope boundariesscope limitknown debt projection onlyscope limit known debt projection only

Source refs

scope + build_system_atlas.py result record
projection_protocol.json
Source source files
concept/mechanism/standard ids
Manifest
source_module_manifest.jsoncopied_non_secret_macro_body
Diagram source
flowchart LR KA["Live Kind Atlas rows 503 selected rows"] Graph["System Atlas graph slice 307 materialized entities"] Proto["projection_protocol.json scope + build_system_atlas.py result record"] Source["Source source files concept/mechanism/standard ids"] Manifest["source_module_manifest.json copied_non_secret_macro_body"] Eval["evaluate()"] Neg["semantic negative-case replay"] Debt["known debt vector 196 units"] Result record["metadata-only result records counts, refs, hashes, scope boundaries"] Ceiling["scope limit known debt projection only"] KA --> Eval Graph --> Eval Proto --> Eval Source --> Eval Manifest --> Eval Eval --> Debt Eval --> Neg Debt --> Result record Neg --> Result record Result record --> Ceiling

This diagram is the human proof path and must stay subordinate to the bundle and generated projection.

Real-Good / Real-Bad / Perturbation Evidence

The positive case is test_self_ignorance_coverage_ledger_projects_real_bundle_known_debt plus the bundle route:

PYTHONPATH=src ../repo-python -m microcosm_core.organs.self_ignorance_coverage_ledger run-self-ignorance-bundle --input examples/self_ignorance_coverage_ledger/exported_self_ignorance_coverage_ledger_bundle --out /tmp/microcosm-self-ignorance-coverage-ledger/bundle --card

The accepted result must report status pass, known debt 196, observed negative cases forbidden_absence_inference and coverage_debt_mismatch, realness_rung: R4, live_kind_atlas_recompute_used: true, live_system_atlas_graph_crosscheck_used: true, and source-module digest success.

The real-bad cases are not marketing examples; they are the contract. Treat a guard as validated only when the focused pytest route passes in the current checkout:

Evidence classTest / mutationRequired refusal
Missing real graphtest_self_ignorance_static_fixture_blocks_without_real_graphCROWN_JEWEL_INPUT_MISSING and SELF_IGNORANCE_REAL_ATLAS_GRAPH_EMPTY.
Absence overclaimtest_self_ignorance_coverage_ledger_rejects_absence_omniscienceSELF_IGNORANCE_FORBIDDEN_ABSENCE_INFERENCE.
Expected ID mismatchtest_self_ignorance_coverage_ledger_rejects_coverage_debt_mismatchSELF_IGNORANCE_EXPECTED_ENTITY_IDS_MISMATCH.
Baked IDs without graph authoritytest_self_ignorance_coverage_ledger_rejects_baked_expected_ids_without_sourceSELF_IGNORANCE_EXPECTED_ENTITY_IDS_NOT_SOURCE_BACKED; realness rank falls.
Declared entity substitutiontest_self_ignorance_coverage_ledger_rejects_declared_entity_id_substitutionMissing-from-graph and missing-from-expected mismatch rows.
Count tampertest_self_ignorance_coverage_ledger_rejects_materialized_count_tamperSELF_IGNORANCE_MATERIALIZATION_COUNT_NOT_GRAPH_DERIVED.
Graph materialization tampertest_self_ignorance_coverage_ledger_rejects_graph_materialization_tamperExpected-ID mismatch and changed debt vector.
Graph builder tampertest_self_ignorance_coverage_ledger_rejects_graph_builder_tamperSELF_IGNORANCE_ATLAS_GRAPH_BUILDER_MISMATCH.
Protocol scope/check tamperprojection protocol testsScope or check result record blocked; R4 cannot stand.
Declared negative labels lietest_self_ignorance_negative_cases_are_semantic_not_declared_labelsSemantic evaluator still finds the expected error codes.
Copied source self-referencetest_self_ignorance_bundle_rejects_stale_copied_target_source_refCROWN_JEWEL_SOURCE_SELF_REFERENCE_UNVERIFIED.
Copied source digest drifttest_self_ignorance_bundle_rejects_source_module_digest_mismatchCROWN_JEWEL_SOURCE_DIGEST_MISMATCH.

Perturbation evidence is test_self_ignorance_coverage_debt_moves_with_materialized_entity_graph: adding a real, source-backed standard entity moves the known-debt count from 196 to 195 and keeps the result passing. That proves the ledger is coupled to the graph-derived materialization set, not to a fixed prose number.

The unsourced-materialization guard target is test_self_ignorance_coverage_ledger_rejects_coherent_fake_standard_entity. Its intended refusal is SELF_IGNORANCE_EXPECTED_ENTITY_ID_SOURCE_MISSING, but the paper must not count that guard as validated unless the focused pytest route currently blocks the fake standard entity. If it regresses, lower the source-validation claim to the passing guards above and route the source/test issue through the work log before completion.

Source-Backed Concept / Mechanism / Law Links

LinkSource-backed supportClaim supported
Component self_ignorance_coverage_ledgerorgans/self_ignorance_coverage_ledger.json and core/organ_atlas.json::organs[51:self_ignorance_coverage_ledger]This is an accepted public component with the named runtime locus and paper-module drilldown.
Mechanism mechanism.self_ignorance_coverage_ledger.validates_public_self_ignorance_coverage_ledgercore/mechanism_sources.json and mechanisms/mechanism.self_ignorance_coverage_ledger.validates_public_self_ignorance_coverage_ledger.jsonThe mechanism validates known Kind Atlas coverage-debt fixtures while refusing overclaims.
Concept concept.architecture_and_navigation_route_contract_bundleconcepts/concept.architecture_and_navigation_route_contract_bundle.jsonThe component is part of the executable architecture/navigation route-contract family.
Principle P-2principles/P-2.jsonClaim strength must be no stronger than the named checker and result record.
Principle P-7principles/P-7.jsonKnown gaps remain typed residual pressure, not completeness claims.
Principle P-11principles/P-11.jsonFreshness-sensitive claims require dated result records and refresh routes.
Principle P-15principles/P-15.jsonGenerated projections stay below source registries and result records.
Axiom AX-6axioms/AX-6.jsonClosed-world coverage is valid only inside declared finite domains; absence is not negation.
Axiom AX-7axioms/AX-7.jsonPartial computation must totalize as pass or typed refusal with evidence.
Axiom AX-8axioms/AX-8.jsonProvenance and labels must survive source-to-projection and body-import boundaries.
Axiom AX-10axioms/AX-10.jsonLive-state counts require freshness, basis, and rederive contracts.

P-19 appears in the component atlas row as an adjacent governing principle for residual classification, but it is not part of the paper-module bundle's principle_refs. Treat it as component-level context unless the bundle is later updated through the JSON authority lane.

Evidence Contract

The fixture contract lives at core/fixture_manifests/self_ignorance_coverage_ledger.fixture_manifest.json. The active standard lives at standards/std_microcosm_self_ignorance_coverage_ledger.json. Together they admit public synthetic fixtures, copied source source bodies, hashes, anchors, validator refs, and generated result records. They forbid private repo bodies outside copied public fixtures, model-output data bodies, account secret or account-bound material, operator private notes, raw thread bodies, and result record body text for copied material.

The exported bundle manifest at examples/self_ignorance_coverage_ledger/exported_self_ignorance_coverage_ledger_bundle/source_module_manifest.json currently carries one source module: tools/meta/factory/build_system_atlas.py, copied into the bundle under source_modules/tools/meta/factory/build_system_atlas.py. The manifest records the source/target relation, digests, line count, required anchors System Atlas and kind, replacements, and the boundary that transform result records record hashes and replacement classes rather than source bodies.

The standard's result record contract requires a real runtime result record, a source-module manifest for the exported bundle, a secret-exclusion scan, at least the forbidden_absence_inference negative case, and body_in_receipt: false. Synthetic result records are not accepted as stand-ins for this component's authority.

Reader Evidence Routing

Read this module in this order:

  1. paper_modules/self_ignorance_coverage_ledger.json for the generated paper-module projection and relationship edges.
  2. core/paper_module_capsules.json::paper_modules[49:paper_module.self_ignorance_coverage_ledger] for source authority.
  3. standards/std_microcosm_self_ignorance_coverage_ledger.json for the public/private boundary, validator contract, result record expectations, and scope limit.
  4. src/microcosm_core/organs/self_ignorance_coverage_ledger.py for evaluate, evaluate_negative_case, run, and run_self_ignorance_bundle.
  5. examples/self_ignorance_coverage_ledger/exported_self_ignorance_coverage_ledger_bundle/ for the current public evidence bundle.
  6. tests/test_self_ignorance_coverage_ledger.py for proof consumers, bad cases, and perturbation cases.

Treat negative cases as part of the positive claim. The paper should cite only guards that the focused validation route blocks in the current checkout. The validated guard set must include forbidden absence inference, coverage mismatch, baked expected IDs without source, declared entity substitution, materialized count tamper, graph materialization tamper, graph builder tamper, projection protocol tamper, stale source self-reference, and digest mismatch. Fake-but-coherent standard entities remain a required unsourced-materialization guard target, but not an observed passing guard unless the focused test route blocks that perturbation.

Prior Art Grounding

The nearest ordinary analogue is software coverage measurement: coverage tools report what was exercised or missed over a declared surface, not all possible missing behaviors. coverage.py is useful as a reference pattern for bounded observed coverage over a source set.

The health-signal side is adjacent to automated repository checks such as OpenSSF Scorecard: bounded checks can produce useful risk signals without becoming complete security or quality proof. Microcosm applies that pattern to navigation coverage debt and keeps the scope boundary in the same result record frame as the count.

Validation Result record Path

From microcosm-substrate, run:

PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_self_ignorance_coverage_ledger.py -q
PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check

Use a throwaway result record directory for manual bundle checks:

PYTHONPATH=src ../repo-python -m microcosm_core.organs.self_ignorance_coverage_ledger run-self-ignorance-bundle --input examples/self_ignorance_coverage_ledger/exported_self_ignorance_coverage_ledger_bundle --out /tmp/microcosm-self-ignorance-coverage-ledger/bundle --card

Passing validation proves that the declared public bundle, source-module manifest, negative cases, and paper-module corpus remain coherent. It does not establish freshness beyond the checked snapshot or authorize generated projection edits by hand.

Scope boundary

Limitations

This component is intentionally closed-world over selected artifact kinds. It does not discover arbitrary missing files, prove that all System Atlas materialization gaps are known, or search every repo surface. It counts row-level debt over selected Kind Atlas families and only within the graph/materialization/protocol evidence supplied to the bundle.

Freshness is conditional. projection_protocol.json can record that a System Atlas refresh was blocked by active source claims. In that state, the component reports a bounded snapshot plus a refresh boundary; it does not silently upgrade stale generated materialization into live truth.

Source-open evidence is public-safe, not public-total. The bundle may carry copied source bodies with transformed non-public paths, hashes, line counts, and anchors. That does not export source notes, model-output data bodies, account or browser state, account secrets, browser UI state, or private source-root equivalence.

The current generated paper-module JSON has resolved bundle edges and relationships.unpopulated_selective_relations: []. That is a discoverability and lattice-coherence statement. It is not implementation-correctness proof, launch-scope decision, provider authority, or proof that every related concept/principle/axiom has full empirical support.

Scope limit

The scope limit is narrow. Self-Ignorance Coverage Ledger can claim that public fixture evidence, graph-derived materialization rows, live Kind Atlas recomputation, source-backed expected entity IDs, copied source evidence, semantic negative cases, and result records make declared Kind Atlas coverage debt visible, recomputable, and checkable.

It cannot claim literal unknown-unknown omniscience, absence proof, total repository search proof, source-file changes, live Atlas mutation, private-source export, launch-scope decision, publishing-scope decision, provider affiliation, product readiness, or whole-system correctness. The active v2 standard status is a source JSON contract state only and does not expand the scope limit.

Formal math & proof (18)

Proof Diagnostic Evidence SpineSorts proof-pipeline checks into accepted or rejected without inflating a pass.3/5

Does An evidence checkpoint that sits in front of formal-proof work. It reads the diagnostic records left by earlier proof-pipeline steps and writes a "diagnostic board" listing which checks were accepted, which were rejected, and why. The board shows exactly what evidence was kept, and refuses to let raw model output, a stale record, or a merely-passing check get inflated into a claim that the math is actually correct. It only arranges and judges existing records; it never runs a proof checker itself.

Scope limit It records proof/evidence diagnostics over existing result record references only. It does not run Lean, use external model services, expose proof bodies, turn a passing check into formal-proof or theorem authority, prove runtime or whole-system correctness, authorize later components, certify public launch, authorize public sharing or recipient work, or establish secret export.

Run
microcosm proof-diagnostic-evidence-spine run --input fixtures/first_wave/proof_diagnostic_evidence_spine/input --out receipts/first_wave/proof_diagnostic_evidence_spine --card

Paper module Proof Diagnostic Evidence Spine

proof_diagnostic_evidence_spine sits one step before formal proof authority. It holds diagnostic evidence from the formal-math evaluation and premise-retrieval pipeline as result record-backed cells, and refuses to let any of them be read as a proof.

Purpose

The component answers a single question: does a diagnostic check that claims to be backed by real Ring2 runtime evidence actually recompute against that evidence, or is it asserting more than its refs support? Without this membrane, a check row could name a failure-taxonomy report or a graph-update candidate set, declare itself passing, and be trusted on its own word. The spine refuses that.

What is unusual is that the validator does not trust the fixture's own pass label. It ignores the legacy expected_result field as a non-authoritative fixture label and rederives the verdict itself. For each check it resolves the named source_ref to a real file, re-hashes that file with sha256, and confirms the hash matches the expected digest. It then opens the named result record anchor and checks that the result record payload actually contains that source ref and digest. A check is accepted only when the source, the digest, and the result record all agree. The pass is a recomputation, not a claim copied from the fixture.

The second idea is that negative evidence is kept rather than hidden. A stale source fingerprint is recorded as source_fingerprint_status: stale and retained as diagnostic evidence; a provider advisory row is preserved as metadata while being rejected as authority; a forbidden proof-body field turns a row into a regression fixture rather than silently dropping it. The board shows what did not hold, which is the point of an evidence membrane.

Teleology

proof_diagnostic_evidence_spine is the body-safe evidence membrane before formal proof work. It records proof/evidence diagnostics while rejecting proof bodies, provider output bodies, source-authority upgrades, stale coupling, and runtime-correctness overclaims.

Public Contract

The validator consumes failure-taxonomy records, graph-update traces, verifier-trace repair artifacts, and formal evidence-cell anchor result record refs from the formal-math evaluation and premise-retrieval pipeline, then emits diagnostic result records over those refs. Provider-advisory rows are bounded evidence authority. Passing diagnostic checks do not become formal proof authority or formal-result correctness.

How a check is accepted

A check row carries three lists: source_refs, receipt_anchor_refs, and source_digest_refs. The validator does not take the row's word for whether it passes. It recomputes the verdict from the system.

For each source_ref it resolves a real file, reads it, and hashes the bytes with sha256. That hash must equal the expected digest the component holds for the ref. It then opens each result record anchor and checks that the result record payload actually contains the source ref and its digest, so a check is only "result record-backed" if the result record it cites genuinely references it. On top of that the component applies a semantic floor: a check whose id mentions a failure taxonomy must point at a source file that carries a failure-taxonomy report with representative failures and at a result record that carries a failure-mode ledger; a graph-update check needs graph-update candidates with ids and a matching result record anchor. The check is accepted only when every source resolves, every digest matches, every cited result record backs the ref, the semantic floor is satisfied, and no expected-negative error code is declared.

The concrete failure mode this guards against is a plausible-looking row that names real artifact paths but does not actually recompute: a digest that has drifted, a result record that does not mention the ref it claims, or a check labelled as failure-taxonomy evidence while pointing at an unrelated file. Each of those becomes a rejection finding rather than a silent pass. The recompute is also why a passing check stays bounded. It establishes that the named evidence is present and coupled, not that the underlying runtime is correct, which is why a row that adds claims_runtime_correctness is rejected as an overclaim.

Shape

all agreeany mismatchDiagnostic check rowsource_refs,receipt_anchor_refs,source_digest_refsDiagnostic check row source_refs, receipt_anchor_refs, source_digest_refsResolve source refto real public fileResolve source ref to real public fileRe-hash file (sha256)compare to expected digestRe-hash file (sha256) compare to expected digestOpen result record anchordoes payload containthis ref and digest?Open result record anchor does payload contain this ref and digest?Semantic floorfailure-taxonomy /graph-updatesource and result recordmatchSemantic floor failure-taxonomy / graph-update source and result record matchAccepted checkverdict = recomputed,body_in_receipt falseAccepted check verdict = recomputed, body_in_receipt falseRejected / retainedas diagnostic evidenceRejected / retained as diagnostic evidenceStale source fingerprintStale source fingerprintProvider advisory payloadProvider advisory payloadForbidden proof-body fieldForbidden proof-body fieldevidence accounting onlyevidence accounting only

Source refs

evidence accounting only
diagnostic_board.json
Diagram source
flowchart TD Check["Diagnostic check row source_refs, receipt_anchor_refs, source_digest_refs"] Resolve["Resolve source ref to real public file"] Hash["Re-hash file (sha256) compare to expected digest"] Result record["Open result record anchor does payload contain this ref and digest?"] Floor["Semantic floor failure-taxonomy / graph-update source and result record match"] Accept["Accepted check verdict = recomputed, body_in_receipt false"] Reject["Rejected / retained as diagnostic evidence"] Stale["Stale source fingerprint"] Provider["Provider advisory payload"] Proofbody["Forbidden proof-body field"] Check --> Resolve --> Hash --> Result record --> Floor Floor -->|all agree| Accept Floor -->|any mismatch| Reject Stale -. retained as evidence .-> Reject Provider -. metadata kept, authority denied .-> Reject Proofbody -. scrubbed, kept as regression .-> Reject Accept --> Board["diagnostic_board.json evidence accounting only"] Reject --> Board Board -. denies .-> Ceiling["no Lean/Lake run, no formal-result correctness, no provider authority, no launch"]

Evidence/accounting refs:

  • Bundle authority: core/paper_module_capsules.json::paper_modules[14] sets source_authority: json_capsule, names subjects proof_diagnostic_evidence_spine and mechanism.proof_diagnostic_evidence_spine.validates_ring2_diagnostic_evidence_membrane, resolves code_loci[0].path to src/microcosm_core/organs/proof_diagnostic_evidence_spine.py, and keeps generated_projections.markdown.generated: false, generated_projections.mermaid.status: available_from_capsule_edges, and generated_projections.atlas_card.status: linked_from_capsule_edges.
  • Generated instance boundary: paper_modules/proof_diagnostic_evidence_spine.json::paper_module_payload.projection_contract records authority_flip_status: not_flipped, while paper_modules/proof_diagnostic_evidence_spine.json::relationships.edges carries source-justified links to the component, mechanism, concept, principles, axioms, dependencies, and code locus.
  • Component/source locus: organs/proof_diagnostic_evidence_spine.json::organ_payload.source_atlas_row names the first command, claim_ceiling_restated, mechanism_refs[0], wires_to, and the same code-locus symbols implemented in src/microcosm_core/organs/proof_diagnostic_evidence_spine.py (PROOF_AUTHORITY_CEILING, EXPECTED_NEGATIVE_CASES, validate_copied_macro_body_artifacts, validate_evidence_receipts, validate_provider_payload_policy, validate_authority_ceiling, run, and run_evidence_bundle).
  • Standard contract: standards/std_microcosm_proof_diagnostic_evidence_spine.json::authority_boundary_detail limits the component to copied Ring2 diagnostic runtime artifacts, summary metrics, graph-variant metadata, and anchor result record refs. Its body_import_verification.source_open_body_import_floor records 13 copied artifact bodies, 10 exact copies, 3 public-light edits, and body_text_exported_in_receipts: false; its body_import_verification.public_organ_source_body_floor records one exact copied public component source body.
  • Bundle floor: examples/proof_diagnostic_evidence_spine/exported_evidence_bundle/bundle_manifest.json has schema_version: proof_diagnostic_evidence_spine_exported_evidence_bundle_v1, bundle_id: ring2_proof_diagnostic_evidence_runtime_example, copied_macro_body_artifacts count 13, and an scope limit of Ring2 diagnostic result record refs only, not formal proof authority.
  • Source-body floor: examples/proof_diagnostic_evidence_spine/exported_evidence_bundle/source_body_floor/source_module_manifest.json::modules[0] records source ref src/microcosm_core/organs/proof_diagnostic_evidence_spine.py, source_to_target_relation: exact_copy, sha256_match: true, body_in_receipt: false, and omitted material including model-output data bodies, account or browser state, browser UI live-access state, recipient-send state, private proof bodies, and oracle-needed premise ids.
  • Result record behavior: receipts/first_wave/proof_diagnostic_evidence_spine/proof_evidence_validation_receipt.json records accepted_count: 2, rejected_count: 1, missing_negative_cases: [], body_in_receipt: false, source_fingerprint_status: stale, and observed negative cases for source-authority upgrade, missing result record fields, runtime-correctness overclaim, provider/proof body rejection, and stale coupling. The sibling provider_payload_policy_result.json::provider_payload_policy preserves advisory metadata while rejecting the forbidden proof-body payload, and diagnostic_board.json::authority_ceiling rejects model-output data authority, source-authority upgrade, runtime-correctness claims, and formal prover execution.
  • Focused regression surface: tests/test_proof_diagnostic_evidence_spine.py asserts the observed negative cases match EXPECTED_NEGATIVE_CASES and checks the exported evidence bundle path. These tests support reader wiring and evidence accounting only; they do not establish formal-result correctness, provider authority, runtime correctness, publishing-scope decision, or launch-scope decision.

Reader Evidence Routing

Route currentness questions through ## JSON Bundle Binding and the validation commands in ## Validation Result record Path. The tests and corpus check confirm reader wiring and projection health; they do not establish proof authority.

Route source/body-floor questions through ## Source-Open Body Floor and the fixture/example paths named under ## Structured Lattice Bindings. The diagnostic artifact copies from the formal-math evaluation pipeline, public component-source copy, manifests, and digest coupling are evidence-accounting inputs; they are bounded evidence bodies, model-output data bodies, runtime correctness claims, or source-authority upgrades.

Route claim-safety and public-copy questions through ## Scope limit, ## Evidence-As-Accounting Shape, and ## Scope boundary, then pair this module with batch12_release_claim_language_gate when public wording is being checked. If the question is "did the validator still enforce the membrane?", use the focused pytest and corpus check in ## Validation Result record Path before citing the reader page.

Evidence-As-Accounting Shape

This component is the proof-adjacent evidence membrane behind Microcosm's scope limits. It accepts diagnostic runtime artifacts, result record refs, source digests, and negative-case results as evidence cells, while refusing to treat any of them as theorem authority.

The accounting rule is two-sided. A copied artifact from the formal-math evaluation and premise-retrieval pipeline can strengthen only the diagnostic claim named by its result record, digest, and validator; it cannot upgrade itself into formal-result correctness, provider authority, launch-scope decision, or private-system equivalence. Stale source coupling is retained as diagnostic evidence instead of hidden, and provider-advisory rows remain metadata without payload bodies.

Use this module with batch12_release_claim_language_gate when evaluating public copy: the evidence spine says what result record-backed cells exist, and the language gate decides whether a public sentence stays within that ceiling.

Prior Art Grounding

The evidence spine is grounded in assurance-case practice: evidence should be connected to claims, assumptions, and limits before it is treated as support. NASA's Goal Structuring Notation example for spacecraft assurance is a useful public analogue because it frames assurance as model-structured evidence rather than document-level persuasion: NTRS 20160005295.

The result record membrane also borrows from W3C PROV and observability practice: diagnostic artifacts are evidence cells with provenance, not theorem authority. That is why the component accepts digest-coupled diagnostic refs and negative cases while rejecting proof bodies, model-output data bodies, and stale source-coupling overclaims.

Validation Result record Path

./repo-pytest tests/test_proof_diagnostic_evidence_spine.py -q --basetemp=/tmp/microcosm_proof_diagnostic_evidence_spine_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

This module can claim reader wiring for the proof-diagnostic evidence membrane: the component and mechanism subject resolve, and the runtime source locus is named. It cannot claim Lean or Lake execution, formal proof authority, formal-result correctness, provider authority, runtime correctness of the imported systems, source-file changes, launch-scope decision, publishing-scope decision, hosted deployment, or whole-system correctness.

Diagnostic result records, copied runtime artifacts from the formal-math evaluation pipeline, copied public component source, source digests, and focused tests can support only bounded evidence-accounting claims: which public refs, manifests, negative cases, and body-hygiene checks were validated. A diagram view and atlas entry are generated for this module; they do not convert diagnostics into formal-result correctness or provider/publishing-scope decision.

Scope boundary

This module documents diagnostic result record anchors over real system from the formal-math evaluation and premise-retrieval pipeline, and keeps forbidden proof/provider body cases as regression-only guards. It does not run Lean, use external model services, expose proof bodies, prove runtime correctness, certify public launch operations, authorize public sharing or recipient work, establish secret export, or claim whole-system correctness.

Source and projection details
Source-Open Body Floor

The public bundle carries two bounded body floors. The runtime-artifact floor copies thirteen diagnostic artifacts from the formal-math evaluation and premise-retrieval pipeline under examples/proof_diagnostic_evidence_spine/exported_evidence_bundle/source_artifacts and records their source/target digest coupling in bundle_manifest.json. Three rows are source-faithful public-light edits that redact operator absolute paths and retain both source and target digests.

The component-source floor copies the public source body for src/microcosm_core/organs/proof_diagnostic_evidence_spine.py under source_body_floor/source_modules. Generated state/runs JSON artifacts are evidence bodies, not source-body authority. Neither body floor places body text in result records or workingness cards, and neither imports proof bodies, model-output data bodies, account or browser state, browser UI live access, recipient-send state, account secrets, private proof bodies, or oracle-needed premise ids.

Formal Math Readiness GateReads declared math setups and lists which proof tactics may be attempted versus blocked.3/5

Does Before anyone tries to prove a math theorem with the Lean prover, this gate reads simple description files that declare what a math setup is supposed to have — which math library is claimed to be present, which proof tactics are reported as already probed, which lemmas may be looked up, and which limits apply to the text budgets handed to AI providers — and writes a plain checklist of what is allowed to be attempted versus blocked. It works only from those declared description files; it does not inspect the real toolchain or run anything. Its guards keep the claims honest and checkable: it refuses to let a library be marked available unless a probe result backs that up, blocks routing a proof tactic that was not probed, and refuses to let any real proof text sneak into the lemma-lookup tables or provider budgets.

Scope limit It only validates and projects declared readiness metadata; it does not run Lean/Lake, inspect the real toolchain, use external model services, prove any theorem correct, produce benchmark claims, or authorize Mathlib-dependent proof attempts.

Run
microcosm formal-math-readiness-gate run --input fixtures/first_wave/formal_math_readiness_gate/input --out receipts/first_wave/formal_math_readiness_gate

Paper module Formal Math Readiness Gate

Teleology

formal_math_readiness_gate is the public runtime cell that turns the formal math slice from a deferred slogan into an executable boundary. It validates synthetic readiness metadata for corpus availability, tactic probes, premise indexes, target-shape routing, and provider context recipes before any future Lean witness can claim authority.

The page should let a cold reader answer one question without rereading the component: what evidence has Microcosm actually validated, and where does that evidence stop?

Purpose

Formal-math tooling fails quietly when a library, tactic, or corpus is assumed present rather than checked. A pipeline that routes a proof to aesop when aesop is not actually available, or that treats a premise index as proof evidence because it happens to carry a proof body, has already lost the boundary between "ready to attempt" and "proven". This component exists to make that boundary explicit before any downstream proof work begins. It answers one question: which declared formal-math inputs are well-formed and honest enough that a later proof witness could safely consume them, and where exactly does that warrant stop?

The mechanism is a deterministic reducer over five public JSON inputs: corpus readiness, tactic-portfolio availability, a premise index, target-shape routing, and provider context recipes. It does not run Lean or Lake. Instead it reads what those inputs declare and refuses the specific ways they can lie. A corpus that claims Mathlib is available without a passing probe is rejected. A tactic marked available without a probe result record is rejected. A premise row carrying a proof_body or oracle_needed_premise_ids field is rejected. A route that admits a tactic the portfolio probe already marked unavailable is rejected. The output is a readiness board, not theorem evidence.

The design choice worth noticing is that the gate proves its own discipline through negative cases. Alongside the positive inputs, the fixture carries five inputs that each commit a known overclaim, and the run passes only when every one of those overclaims is caught and no unexpected finding appears. The gate is therefore not merely asserting "we check Mathlib availability"; it is demonstrating, on each run, that a falsified Mathlib claim is actually refused. A second guard keeps the floor source-open without leaking: copied prover probe bodies are verified by digest through a manifest, while proof bodies, model-output data, and private state stay out of the result records entirely.

Shape

unavailable tactic idsFive public JSON inputscorpus, tactics, premises,routes, provider recipesFive public JSON inputs corpus, tactics, premises, routes, provider recipesSecret-exclusion scanzero blocking hits requiredSecret-exclusion scan zero blocking hits requiredreject Mathlib-availabilityoverclaimreject Mathlib-availability overclaimeach available tactic needs aprobe result recordeach available tactic needs a probe result recordvalidate_premise_indexreject proof_body / oraclepremise idsvalidate_premise_index reject proof_body / oracle premise idsreject route admitting anunavailable tacticreject route admitting an unavailable tacticreject over-budget orproof-body recipereject over-budget or proof-body recipecopied probe bodies,digest-checkedcopied probe bodies, digest-checkedReconcile findings vsEXPECTED_NEGATIVE_CASESevery known overclaim must becaughtReconcile findings vs EXPECTED_NEGATIVE_CASES every known overclaim must be caughtReadiness board + extensionboardavailable / blockedcapabilities, countsReadiness board + extension board available / blocked capabilities, countsScope limitno Lean/Lake, proof,provider, launch, orprivate-system authorityScope limit no Lean/Lake, proof, provider, launch, or private-system authority

Source refs

reject Mathlib-availability overclaim
validate_corpus_readiness
each available tactic needs a probe result record
validate_tactic_portfolio
reject route admitting an unavailable tactic
validate_target_shape_routing
reject over-budget or proof-body recipe
validate_provider_context_recipes
copied probe bodies, digest-checked
validate_source_module_imports
Diagram source
flowchart TD Inputs["Five public JSON inputs corpus, tactics, premises, routes, provider recipes"] Scan["Secret-exclusion scan zero blocking hits required"] Corpus["validate_corpus_readiness reject Mathlib-availability overclaim"] Tactics["validate_tactic_portfolio each available tactic needs a probe result record"] Premises["validate_premise_index reject proof_body / oracle premise ids"] Routing["validate_target_shape_routing reject route admitting an unavailable tactic"] Provider["validate_provider_context_recipes reject over-budget or proof-body recipe"] SourceFloor["validate_source_module_imports copied probe bodies, digest-checked"] Reconcile["Reconcile findings vs EXPECTED_NEGATIVE_CASES every known overclaim must be caught"] Board["Readiness board + extension board available / blocked capabilities, counts"] Ceiling["Scope limit no Lean/Lake, proof, provider, launch, or private-system authority"] Inputs --> Scan Scan --> Corpus Scan --> Tactics Scan --> Premises Scan --> Provider Tactics -->|unavailable tactic ids| Routing Corpus --> Reconcile Tactics --> Reconcile Premises --> Reconcile Routing --> Reconcile Provider --> Reconcile SourceFloor --> Reconcile Reconcile --> Board Board --> Ceiling

The machine graph remains the generated paper_module.formal_math_readiness_gate.mermaid projection derived from the source record, not from this hand-authored Mermaid block.

Reader Evidence Routing

Read this module in evidence order:

  1. Start at core/paper_module_capsules.json::paper_modules[21:paper_module.formal_math_readiness_gate]. That row names the source authority, subjects, mechanism refs, code locus, Microcosm concept/principle/axiom refs, generated projection statuses, and the bundle scope limit.
  2. Check the generated structured source record paper_modules/formal_math_readiness_gate.json. Its relationships.edges cite the bundle source refs and show the generated Mermaid status, Atlas status, source_authority: json_capsule, and unresolved selective-relation count.
  3. Inspect the runtime locus src/microcosm_core/organs/formal_math_readiness_gate.py, especially run, run_readiness_bundle, validate_source_module_imports, write_receipts, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, and SOURCE_MODULE_MANIFEST_NAME.
  4. Use fixture evidence for the gate behavior: fixtures/first_wave/formal_math_readiness_gate/input, receipts/first_wave/formal_math_readiness_gate/readiness_gate_result.json, formal_math_readiness_board.json, formal_math_readiness_extension_board.json, formal_math_readiness_validation_receipt.json, and result records/sign-off/first_wave/formal_math_readiness_gate_fixture_acceptance.json.
  5. Use exported-bundle evidence for source-open body-floor claims: examples/formal_math_readiness_gate/exported_formal_math_readiness_bundle/source_module_manifest.json, bundle_manifest.json, source_artifacts/, source_body_floor/source_modules/, and receipts/runtime_shell/demo_project/organs/formal_math_readiness_gate/exported_formal_math_readiness_bundle_validation_result.json.
  6. Use tests/test_formal_math_readiness_gate.py for the behavioral result record boundary. The tests cover negative cases, exported bundle sign-off, source-module digest and target-ref mismatch rejection, bounded command-card output, source-body omission from result records, secret-exclusion/public-relative result record paths, and non-writing plan preview.

Do not route a proof claim through this page. It routes readiness evidence, result record integrity, and source-body-floor accounting only.

Technical Mechanism

The runtime is a deterministic readiness reducer over declared public inputs. run() evaluates the first-wave fixture directory with positive and negative JSON cases enabled; run_readiness_bundle() evaluates the exported public bundle without fixture-negative cases and requires the bundle source-module manifest. Both entrypoints call _build_result(), so the fixture and exported bundle result records share one scope limit, one secret scan, one source-module digest checker, and one readiness-board schema.

_build_result() first loads the five public input families: corpus_readiness.json, tactic_portfolio_availability.json, premise_index.json, target_shape_tactic_routing.json, and provider_context_recipes.json. It then scans those inputs plus any declared source artifacts through secret_exclusion_scan.scan_paths, using the public Microcosm forbidden-class policy. The scan is not advisory: the result can pass only when the scan has zero blocking hits, source-module imports pass, all expected fixture-negative cases are observed, and no unexpected positive-case findings remain.

The mechanism is split into six validators:

  • validate_corpus_readiness() records Lean and Mathlib readiness metadata and adds lean_std_synthetic_core:mathlib to blocked capabilities when Mathlib is unavailable. A Mathlib availability claim without a passing probe becomes MATHLIB_AVAILABILITY_OVERCLAIM.
  • validate_tactic_portfolio() separates available from unavailable tactics and requires every available tactic to carry a probe result record. Synthetic probe labels are accepted only when _tactic_probe_realness_evidence() binds them to copied source modules or fixture-manifest source-open evidence.
  • validate_premise_index() admits premise rows as metadata only. It counts premises, namespaces, retrieval terms, and split eligibility, but rejects proof_body, ground_truth_proof, provider_output_body, and oracle_needed_premise_ids.
  • validate_target_shape_routing() intersects each route case's allowed tactics with the unavailable tactics emitted by the portfolio validator. Any overlap becomes ROUTING_ALLOWS_UNAVAILABLE_TACTIC, so routing cannot silently re-enable a tactic that the probe plane blocked.
  • validate_provider_context_recipes() records byte budgets and deliverable shape while rejecting public recipes over 32,768 bytes or recipes that allow proof bodies or provider-body material.
  • validate_source_module_imports() verifies the exported bundle's source_module_manifest.json, target refs, source refs, line counts, target digests, source digests, exact-copy rows, and the two permitted private-path rewrites. It reports digest/ref failures without placing copied source bodies in result records.

After the validators run, _merge_observed() and _merge_findings() compare observed fixture failures against EXPECTED_NEGATIVE_CASES. This is the local scope limit: the fixture run must prove that the known overclaims are caught, while the exported-bundle run must prove that the positive public bundle has no unexpected findings. _build_extension_board() then projects the accepted metadata into the extension board: selected pattern ids, namespace and split counts, tactic availability counts, Mathlib-dependent unavailable tactics, blocked route cases, provider budgets, source-body import counts, the scope limit, and the scope boundary.

Result record writing preserves the same boundary. write_receipts() emits the gate result, readiness board, extension board, validation result record, and sign-off result record for fixture mode. run_readiness_bundle() emits the exported-bundle result record. The focused test suite asserts the mechanism rather than just file existence: it checks the five expected negative case ids, local Lean/Lake probe metadata with Mathlib unavailable, six available tactics with aesop blocked, eleven premises, five route cases, three provider recipes, thirteen verified source artifacts, source/target digest mismatch rejection, target-ref mismatch rejection, secret-exclusion/public-relative result record paths, and result record omission of copied body text.

Public Contract

The component does not run Lean or Lake. It consumes public JSON fixtures and exported bundles, records which capabilities are available or blocked, rejects Mathlib availability overclaims, rejects unprobed tactics, rejects premise rows that contain proof bodies, rejects routes that admit unavailable tactics, and rejects provider recipes that exceed the public budget or allow proof bodies.

The accepted result is a readiness board. That board can tell a later component what is safe to attempt, but it is bounded evidence evidence, benchmark evidence, or permission to execute a theorem prover.

Prior Art Grounding

This component is grounded in formal-math benchmark and environment-readiness work where the presence of a library, tactic, or corpus is not enough by itself. miniF2F motivates explicit benchmark split discipline for formal mathematics, LeanDojo motivates reproducible theorem-proving environments, and mathlib makes the availability of library imports a concrete precondition rather than a vague capability claim.

Microcosm borrows the readiness-gate pattern: corpus availability, Mathlib probes, tactic probes, premise indexes, target-shape routing, and context budgets must be checked before downstream proof or retrieval language is allowed. It excludes Lean execution or proof authority.

Runtime Surfaces

  • python -m microcosm_core.organs.formal_math_readiness_gate run --input fixtures/first_wave/formal_math_readiness_gate/input --out receipts/first_wave/formal_math_readiness_gate
  • python -m microcosm_core.organs.formal_math_readiness_gate run-readiness-bundle --input examples/formal_math_readiness_gate/exported_formal_math_readiness_bundle --out receipts/runtime_shell/demo_project/organs/formal_math_readiness_gate
  • python -m microcosm_core.organs.formal_math_readiness_gate plan --input fixtures/first_wave/formal_math_readiness_gate/input
  • microcosm formal-math-readiness-gate run --input fixtures/first_wave/formal_math_readiness_gate/input --out receipts/first_wave/formal_math_readiness_gate
  • microcosm formal-math-readiness-gate plan --input fixtures/first_wave/formal_math_readiness_gate/input

Relationship To Lean Witness

formal_math_lean_proof_witness remains deferred. This gate makes the deferral typed and testable: Mathlib is absent until a passing probe says otherwise, unavailable tactics cannot be routed, premise indexes cannot carry proof or oracle bodies, and provider recipes cannot smuggle proof-body deliverables.

Validation Result record Path

./repo-pytest tests/test_formal_math_readiness_gate.py -q --basetemp=/tmp/microcosm_formal_math_readiness_gate_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
jq '{edge_count:(.relationships.edges|length), mermaid_status:.paper_module_payload.generated_projections.mermaid.status, atlas_status:.paper_module_payload.generated_projections.atlas_card.status, source_authority:.relationships.source_authority, unresolved_selective_relation_count:(.relationships.unpopulated_selective_relations|length)}' paper_modules/formal_math_readiness_gate.json

Expected generated-row proof: edge_count: 15, mermaid_status: available_from_capsule_edges, atlas_status: blocked_until_organ_atlas_owner_lane_binds_edges, source_authority: json_capsule, and unresolved_selective_relation_count: 0.

Scope boundary

Scope limit

This module may claim that Microcosm has a public readiness gate for formal math system preparation. The valid claim is bounded to corpus availability, Mathlib and tactic probe metadata, premise-index coverage, target-shape tactic routing, provider context budget checks, extension-board pattern ids, public PROVER smoke-run source artifacts, an exact public component-source body floor, and fixture or exported-bundle result records.

The module must not claim Lean/Lake execution, theorem proving, formal proof authority, formal-result correctness, Mathlib-dependent proof success, benchmark performance, provider-call execution, private proof-body import, oracle-needed premise disclosure, source-file changes, publishing-scope decision, hosted deployment, recipient work, secret export, or whole-system correctness. Its strongest launch-facing statement is readiness-boundary enforcement over public metadata and copied source artifacts.

Limitations

The runtime validates finite public fixtures and exported-bundle manifests. It does not execute Lean or Lake, import Mathlib in the current environment, call a provider, or check theorem statements. When the result reports blocked capabilities such as lean_std_synthetic_core:mathlib, that is a readiness boundary for downstream components, not an invitation to route around the gate.

The copied source artifacts are source-open body-floor evidence only. Digest and target-ref checks show that selected PROVER readiness/probe bodies and the public component source copy match their manifests; they do not authorize source-file changes, private source-root export, proof-body disclosure, recipient work, hosted deployment, or public sharing. Result records intentionally carry counts, digests, paths, negative-case coverage, and authority flags instead of copied body text.

The negative cases are also finite. They cover the known overclaims encoded in EXPECTED_NEGATIVE_CASES: Mathlib availability without a passing probe, unprobed tactic availability, premise rows with proof bodies, target routes that admit unavailable tactics, and provider recipes that exceed public budgets or permit proof bodies. A new formal-math claim needs either a new source-backed negative case here or a different proof consumer; this page should not be used as a generic formal-proof claim surface.

Scope boundary

This module documents a public readiness gate only. It excludes Lean/Lake execution, formal proof authority, Mathlib-dependent proof attempts, external model access, benchmark claims, public launch, hosted deployment, public sharing, recipient work, secret export, or whole-system correctness. It also does not make private source-root material, browser UI state, account or browser material, browser state, account secrets, source notes, model-output data bodies, recipient-send state, or private proof bodies part of the public Microcosm body floor.

Source and projection details
Source-Open Body Floor

The exported readiness bundle carries thirteen PROVER smoke-run readiness/probe bodies under source_artifacts. They cover corpus readiness, tactic-affordance probe metadata, Mathlib and trace probes, and the copied portfolio-core Lean probes used to decide which tactics are blocked or available. Two JSON rows are private-path rewrites; those rows retain source and target digests plus the rewrite mode.

The bundle also carries an exact public component-source copy for src/microcosm_core/organs/formal_math_readiness_gate.py under source_body_floor/source_modules. Generated state/runs Lean artifacts are runnable readiness evidence, not source-body authority. Neither floor places body text in result records or workingness cards, and neither imports model-output data bodies, account or browser state, browser UI live access, recipient-send state, account secrets, private proof bodies, or oracle-needed premise ids.

The source-module manifest and bundle manifest are the right surfaces for body-floor inspection. The validation result records intentionally carry status, digests, counts, and public-relative refs rather than copied source bodies.

Wave 011 adds the explicit extension board for the source intake cell formal_math_readiness_extensions. The board is still metadata-only, but it is more useful than the older flat counts: it records the selected pattern ids (lean_std_toolchain_premise_index, tactic_portfolio_availability_probe, target_shape_tactic_routing_gate), the source projection intake ref, public target refs, validation refs, namespace and split coverage for the premise index, tactic availability status counts, Mathlib-dependent unavailable tactics, target-shape routing admissibility, and provider context budgets.

Governing Lattice Relation

The bundle binds this module to concept.formal_math_and_proof_witness_bundle because the component is not a theorem prover; it is the membrane that decides which public formal-math inputs are safe enough for a later proof witness to consume. The governing mechanisms split that membrane in two. The validates_public_formal_math_readiness_bundle mechanism names the positive bundle path: run, run_readiness_bundle, validate_source_module_imports, and write_receipts validate the declared corpus, tactic, premise, routing, provider-budget, source-module-manifest, and source-body-floor evidence before writing readiness boards. The validates_public_readiness_boundary mechanism names the negative path: validate_corpus_readiness, validate_tactic_portfolio, validate_premise_index, validate_target_shape_routing, and validate_provider_context_recipes reject the cases that would turn readiness metadata into proof authority.

The principle and axiom refs are therefore operational, not decorative. P-1, P-2, and P-3 are expressed by keeping the JSON bundle, generated structured source record, runtime code locus, and result records as separate authority classes. P-6 and P-8 are expressed by the body-floor and secret-exclusion contracts: copied PROVER probe bodies and the public component source copy can be inspected through digests and manifests, while private proof bodies, model-output data bodies, and browser or account state stay outside the public floor. AX-1, AX-2, AX-5, and AX-7 are the local reason the downstream paper_module.formal_math_lean_proof_witness remains a dependency rather than an already-proven conclusion.

The generated lattice edge count is small on purpose: it proves that this page is bundle-backed, source-bound, and connected to one deferred proof-witness module.

Corpus Readiness Mathlib Absence GateRuns the real Lean toolchain to confirm the math library is absent, then gates proof tasks.4/5Runs real tools

Does It reads a recorded readiness report from a Lean math toolchain run and makes one fact inspectable: when the report was captured, the Mathlib library was not importable (its import probe failed). From that, it lists which math corpora are absent or usable only for translation smoke tests, and which downstream tasks must be blocked before any Mathlib-dependent proof work is attempted. It also re-checks that the recorded source files match their recorded SHA-256 digests and that no proof bodies, provider outputs, or non-public paths leaked into the public output. The result shows exactly where the proof pipeline draws a hard "not ready, do not proceed" line, with provenance, instead of quietly assuming the environment is fine.

Scope limit It only projects and gate-checks recorded corpus/toolchain readiness accounting, re-verifies recorded source digests and leakage guards, and runs a bounded Lean/Lake import probe when a toolchain is present. It does not run a full Lake build, prove formal-result correctness, claim Mathlib is available beyond the probe result, benchmark corpora, score model performance, use external model services, or include launch operations or public sharing.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.corpus_readiness_mathlib_absence_gate run --input fixtures/first_wave/corpus_readiness_mathlib_absence_gate/input --out receipts/first_wave/corpus_readiness_mathlib_absence_gate

EvidenceExternal tool runevidence 4/5Real runtime result

formal-methodstheorem-provinglean

Source Design note · Source atlas

Paper module Corpus Readiness Mathlib Absence Gate

Abstract

corpus_readiness_mathlib_absence_gate is the public formal-math corpus readiness boundary for Microcosm. It carries copied corpus/toolchain rows from the 2026-05-11 proof-state curriculum smoke run and forces Mathlib absence, absent-corpus blocking, consumer gate decisions, and source-module digest coupling to be visible before any downstream retrieval, tactic-routing, or proof-witness language is allowed.

Purpose

Formal-math agents fail in a specific way: they treat "there is a corpus" as if it meant "this corpus is usable for the proof route I am about to take". A roster lists miniF2F, PutnamBench, ProofNet, LeanDojo and Mathlib, the agent assumes the libraries are present, and the failure only surfaces later as a broken import or a tactic that needs a premise the host cannot resolve. This component answers one question before that happens: for each corpus, is it actually present on this host, and is the Mathlib import lane actually available, or not?

The unusual part is that the gate does not take the answer on trust. It runs a bounded Lean/Lake import probe in a temporary directory: one small file that imports Std and is expected to compile, and one that imports Mathlib and is expected to be rejected with the toolchain's own unknown module prefix 'Mathlib' error. A corpus is only marked usable for Mathlib-dependent work when the runtime evidence agrees the corpus exists, carries a Lake file, and the Mathlib lane probe passes. In the current system the Mathlib probe stays false, so every Mathlib-dependent consumer is blocked, and the one consumer that passes is the Lean3 translation smoke, which needs no Mathlib project at all.

This closes the most common way a readiness claim drifts. Stale alias fields such as mathlib_available, or a PASS lean status, cannot turn the gate green on their own; they must agree with the live probe or the row is flagged. The probe is deliberately narrow. It checks that imports resolve and that Mathlib is genuinely absent. It does not run a lake build, prove any theorem, or claim Mathlib is installed. The output is a readiness board and a set of blocked consumer verdicts, bounded evidence.

Shape

no, probe falseFixture or exported bundleinputcorpus readiness rows +consumer gate casesFixture or exported bundle input corpus readiness rows + consumer gate caseslake env lean: Std compiles,Mathlib import rejectedlake env lean: Std compiles, Mathlib import rejectedcheck SHA-256 digests, parseprobe JSONcheck SHA-256 digests, parse probe JSONMathlib lane available?corpus exists + Lake file +probe passesMathlib lane available? corpus exists + Lake file + probe passes7 corpus rows, alias fieldsmust agree with probe7 corpus rows, alias fields must agree with probederive verdicts fromreadiness factsderive verdicts from readiness facts4 copied source artifacts,digest match4 copied source artifacts, digest matchAllowed: Lean3 translationsmoke(needs no Mathlib project)Allowed: Lean3 translation smoke (needs no Mathlib project)Blocked: Mathlib-dependentand absent-corpus consumersBlocked: Mathlib-dependent and absent-corpus consumersmetadata-only result recordsresult, board, validation,sign-off, bundlemetadata-only result records result, board, validation, sign-off, bundleScope limitno Mathlib availability,proof, provider, launchScope limit no Mathlib availability, proof, provider, launch

Source refs

lake env lean: Std compiles, Mathlib import rejected
runtime_lean_import_probe
check SHA-256 digests, parse probe JSON
validate_runtime_source_artifacts
7 corpus rows, alias fields must agree with probe
validate_corpus_readiness
derive verdicts from readiness facts
validate_consumer_gate_cases
4 copied source artifacts, digest match
validate_source_module_imports
Diagram source
flowchart TD fixture["Fixture or exported bundle input corpus readiness rows + consumer gate cases"] probe["runtime_lean_import_probe lake env lean: Std compiles, Mathlib import rejected"] artifacts["validate_runtime_source_artifacts check SHA-256 digests, parse probe JSON"] mathlib{"Mathlib lane available? corpus exists + Lake file + probe passes"} corpus["validate_corpus_readiness 7 corpus rows, alias fields must agree with probe"] gates["validate_consumer_gate_cases derive verdicts from readiness facts"] imports["validate_source_module_imports 4 copied source artifacts, digest match"] allowed["Allowed: Lean3 translation smoke (needs no Mathlib project)"] blocked["Blocked: Mathlib-dependent and absent-corpus consumers"] result records["metadata-only result records result, board, validation, sign-off, bundle"] ceiling["Scope limit no Mathlib availability, proof, provider, launch"] fixture --> artifacts artifacts --> probe probe --> mathlib mathlib -->|no, probe false| corpus corpus --> gates gates --> allowed gates --> blocked fixture --> imports corpus --> result records gates --> result records imports --> result records result records --> ceiling

This reader diagram is intentionally smaller than the generated doctrine-lattice graph.

Mechanism

The mechanism is a readiness reducer, not a theorem-proving backend. The runtime entrypoints run and run_projection_bundle both call _build_result, which loads public fixture or exported-bundle inputs, scans those inputs against the non-public-state exclusion policy, verifies copied source artifacts, and then combines corpus readiness, consumer gate, source-module import, negative-case, and scope limit fields into one metadata-only result.

validate_runtime_source_artifacts anchors the reducer to four source refs: the corpus readiness rows, tactic-affordance probe, Mathlib import probe Lean file, and tactic portfolio availability JSON. It checks expected SHA-256 digests, parses the JSON source artifacts, and runs a bounded Lean/Lake import probe that can show Std imports and Mathlib remains absent without running a Lake build or exporting Lean bodies.

validate_corpus_readiness normalizes seven corpus rows against those runtime source artifacts. A corpus is usable for Mathlib-dependent work only when the runtime evidence says the corpus exists, has a Lake file, and mathlib_lake_project_import_available is true. In the current fixture and bundle evidence that field remains false, so Mathlib-dependent capabilities are blocked, absent corpora are recorded, and stale alias fields such as mathlib_available cannot turn the gate green.

validate_consumer_gate_cases then derives consumer verdicts from the normalized readiness facts instead of trusting expected-decision labels. The translation smoke consumer can pass because it does not require a Mathlib Lake project and names an available Lean3 reference corpus; Mathlib-dependent or absent-corpus consumers stay blocked. validate_source_module_imports adds the exported bundle floor by requiring the manifest class copied_non_secret_macro_body, material classes, target/source digest agreement, and no body material in result records.

The proof consumers are the two component commands, the focused regression test tests/test_corpus_readiness_mathlib_absence_gate.py, the paper-module corpus check, and the command-card surfaces emitted by result_card. Together they exercise the success path, contradictory Mathlib claims, consumer-gate skips, source digest tampering, private-path rewrites, runtime-probe blocks, and result record body exclusion. The resulting evidence relates the bundle's two mechanisms to concept.formal_math_and_proof_witness_bundle, P-8, and AX-7 by making readiness visibility a precondition for downstream formal-math claims while keeping the scope limit below theorem, provider, benchmark, or launch-scope decision.

Public Surfaces

  • Component runner: python -m microcosm_core.organs.corpus_readiness_mathlib_absence_gate run --input fixtures/first_wave/corpus_readiness_mathlib_absence_gate/input --out receipts/first_wave/corpus_readiness_mathlib_absence_gate
  • Exported bundle runner: python -m microcosm_core.organs.corpus_readiness_mathlib_absence_gate run-projection-bundle --input examples/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle --out receipts/runtime_shell/demo_project/organs/corpus_readiness_mathlib_absence_gate
  • Standard: standards/std_microcosm_corpus_readiness_mathlib_absence_gate.json
  • Source-module manifest: examples/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle/source_module_manifest.json
  • Runtime result record: receipts/runtime_shell/demo_project/organs/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle_validation_result.json

Reader Evidence Routing

Read this module in five passes:

  1. Start with the source record at core/paper_module_capsules.json::paper_modules[8:paper_module.corpus_readiness_mathlib_absence_gate]. It is the source authority that names source_authority: json_capsule, the component subject, two mechanism subjects, the resolved runtime code locus, the concept concept.formal_math_and_proof_witness_bundle, the dependency paper_module.tactic_portfolio_availability, P-8, and AX-7.
  2. The reader proof is the current row shape: eight generated relationship edges, Mermaid available_from_capsule_edges, Atlas blocked_until_organ_atlas_owner_lane_binds_edges, and no unpopulated paper-module selective dependency residual for the tactic-portfolio edge. The structured source record is wiring evidence, not theorem-correctness, runtime-correctness, launch, provider, or production authority.
  3. Inspect the runtime locus src/microcosm_core/organs/corpus_readiness_mathlib_absence_gate.py. The load-bearing symbols are run, run_projection_bundle, validate_corpus_readiness, validate_consumer_gate_cases, validate_source_module_imports, _build_result, write_receipts, result_card, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, SOURCE_MODULE_MANIFEST_NAME, BUNDLE_RESULT_NAME, and CARD_SCHEMA_VERSION.
  4. For fixture evidence, use fixtures/first_wave/corpus_readiness_mathlib_absence_gate/input and the result records under receipts/first_wave/corpus_readiness_mathlib_absence_gate/ plus result records/sign-off/first_wave/corpus_readiness_mathlib_absence_gate_fixture_acceptance.json. The first-wave result result record records seven corpus rows, seven consumer cases, one allowed Lean3 translation-smoke case, six blocked absent or Mathlib-dependent cases, mathlib_lake_project_import_available: false, body_in_receipt: false, and the five negative cases mathlib_available_without_probe, consumer_skips_readiness_gate, private_corpus_source_ref, proof_body_leakage, and release_overclaim.
  5. For exported-bundle evidence, use examples/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle/source_module_manifest.json and receipts/runtime_shell/demo_project/organs/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle_validation_result.json. The manifest verifies four copied source artifacts: corpus readiness JSON, tactic-affordance probe JSON, the Mathlib import probe Lean file, and tactic portfolio availability JSON. The exported result record records source_module_import_count: 4, copied_source_artifact_count: 4, source_modules_pass: true, body_in_receipt: false, and three blocked absent or Mathlib-dependent bundle consumer cases.

If a reader needs validation result records rather than prose, run the commands in ## Validation Result record Path, including the focused regression test and paper-module corpus check. Treat every result record as corpus-readiness boundary evidence only; it does not create Lean/Lake execution authority, Mathlib availability, theorem-proof authority, provider authority, private-system equivalence, or launch-scope decision.

Prior Art Grounding

This component is grounded in Lean corpus and neural theorem-proving work where library availability, premise access, and benchmark splits are part of the claim. The Lean mathematical library establishes Mathlib as a large community-maintained formal mathematics corpus, miniF2F gives a cross-system benchmark for formal Olympiad statements, and LeanDojo shows why reproducible corpus extraction and accessible-premise metadata matter for theorem-proving agents.

Microcosm borrows the readiness gate: corpus rows, Mathlib availability probes, blocked consumer cases, source-module digests, and negative leakage guards must be visible before retrieval, tactic-routing, or proof-witness language is allowed. It does not claim Mathlib is present or that any theorem was proved.

Research Bet

Formal-math agents fail when they treat "there is a corpus" as equivalent to "this corpus is usable for this proof route." This component makes that boundary runnable. It records seven corpus rows, blocks six absent or Mathlib-dependent consumer cases, allows only the Lean3 translation-smoke case, and keeps the Mathlib probe false until an actual passing probe is present.

The exported bundle carries four copied body artifacts: corpus readiness JSON, tactic-affordance probe JSON, the Mathlib import probe Lean file, and tactic portfolio availability JSON. Two rows are exact copies and two use a verified private-path rewrite. The result record records the manifest status, counts, material classes, digests, and metadata-only policy; the copied bodies stay under source_artifacts/, not inside result records.

Source-Backed Doctrine Binding

  • Component: src/microcosm_core/organs/corpus_readiness_mathlib_absence_gate.py
  • Bundle: core/paper_module_capsules.json#paper_module.corpus_readiness_mathlib_absence_gate
  • Mechanism: core/mechanism_sources.json#mechanism.corpus_readiness_mathlib_absence_gate.validates_public_corpus_readiness_boundary
  • Standard: standards/std_microcosm_corpus_readiness_mathlib_absence_gate.json
  • Evidence class: core/organ_evidence_classes.json::corpus_readiness_mathlib_absence_gate records algorithmic_projection at rank 3.
  • Source-module manifest: examples/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle/source_module_manifest.json
  • Sign-off result records: receipts/first_wave/corpus_readiness_mathlib_absence_gate/* and result records/sign-off/first_wave/corpus_readiness_mathlib_absence_gate_fixture_acceptance.json

Cold-Agent Use

Open the source-module manifest first, then the runtime bundle result record, then the first-wave result result record. The useful claim is not that Microcosm has Mathlib or can prove downstream theorems. The useful claim is that Microcosm can force a formal-math route to expose corpus availability, Mathlib absence, consumer gating, source-module digest evidence, copied-body boundaries, negative-case result records, and an explicit scope boundary before any proof route is treated as usable.

Re-entry condition: the current atlas row already points at this paper module. After the sibling organ_atlas.json lane releases, bind this bundle's mechanism ref and code locus into the atlas row and rerun python -m microcosm_core.doctrine_lattice --check.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

PYTHONPATH=src python3 -m microcosm_core.organs.corpus_readiness_mathlib_absence_gate run \
  --input fixtures/first_wave/corpus_readiness_mathlib_absence_gate/input \
  --out /tmp/microcosm-corpus-readiness-mathlib-absence-vrp
PYTHONPATH=src python3 -m microcosm_core.organs.corpus_readiness_mathlib_absence_gate run-projection-bundle \
  --input examples/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle \
  --out /tmp/microcosm-corpus-readiness-mathlib-absence-bundle-vrp
PYTHONPATH=src python3 scripts/build_doctrine_projection.py --check-paper-module-corpus
PYTHONPATH=src .venv/bin/python -m pytest -p no:cacheprovider --basetemp=/tmp/microcosm-corpus-readiness-mathlib-absence-tests -q tests/test_corpus_readiness_mathlib_absence_gate.py
jq '{edge_count:(.relationships.edges|length), mermaid_status:.paper_module_payload.generated_projections.mermaid.status, atlas_status:.paper_module_payload.generated_projections.atlas_card.status, source_authority:.paper_module_payload.source_authority, unresolved_selective_relation_count:(.relationships.unpopulated_selective_relations|length)}' paper_modules/corpus_readiness_mathlib_absence_gate.json

The fixture command writes the public corpus-readiness board, result result record, and validation result record. The bundle command validates the exported source-module manifest and metadata-only runtime result record. The corpus check and jq structured source record query prove the bundle-derived projection currentness without hand-editing generated JSON. The focused test keeps the Mathlib absence boundary, consumer gate cases, source-module digest checks, non-public paths rewrite policy, and scope boundary behavior from regressing.

Passing these commands does not establish Mathlib is installed, rerun Lean/Lake, validate downstream formal-result correctness, benchmark a corpus, authorize external model access, or approve launch; it only proves the bounded fixture and exported bundle result records preserve the declared readiness boundary.

Scope boundary

Scope limit

This component is algorithmic projection over copied source system, not a Lean/Lake rerun and not Mathlib proof authority. Its strongest public claim is that a fixture and exported bundle agree about corpus readiness, Mathlib absence, blocked consumers, copied source-module digests, metadata-only result records, and negative leakage guards. It does not establish formal-result correctness, claim Mathlib is available, benchmark a corpus, expose proof/provider/private bodies, call a provider, change source files, or include launch operations.

Scope limit

The JSON bundle proves a public corpus-readiness boundary only: copied corpus/toolchain rows, absent-Mathlib blocking, consumer gate decisions, source-module digest coupling, metadata-only result records, and negative leakage guards. Mermaid availability reflects bundle edges, while the Atlas row still waits on the component-atlas owner lane. This module does not establish Mathlib is installed, rerun Lean or Lake, validate formal-result correctness, benchmark corpus quality, authorize retrieval or tactic routing, use external model services, expose private proof bodies, change source records, or approve launch.

Result record Shape

The first-wave result result record records corpus_count: 7, consumer_case_count: 7, allowed_case_ids, blocked_case_ids, absent_corpus_ids, mathlib_lake_project_import_available: false, body_in_receipt: false, the scope limit, and five observed negative cases:

  • mathlib_available_without_probe
  • consumer_skips_readiness_gate
  • private_corpus_source_ref
  • proof_body_leakage
  • release_overclaim

The exported runtime result record records source_module_import_count: 4, copied_source_artifact_count: 4, source_modules_pass: true, and the same metadata-only result record boundary.

Scope boundary

This is a source-backed corpus readiness boundary with copied source corpus/toolchain material, not Lean/Lake execution, Mathlib availability, theorem-proof authority, corpus benchmark authority, provider authority, or launch-scope decision.

Mathematical Strategy Atlas Hypothesis ScorerPicks a first-guess proof strategy from a problem's tags and flags any it cannot map.3/5

Does Before any proof is attempted, this component looks at a math problem's feature tags and writes down its first-guess strategy (for example, "this looks like an if-and-only-if, so split it both ways"), and flags anything it cannot map as an explicit "no strategy matched" instead of a silent failure. The chosen opening move, why it was chosen, and the cases it could not map are all recorded in machine-readable result records.

Scope limit It only projects pre-oracle strategy-hypothesis and retrieval mechanics; it does not run Lean/Lake, prove theorems, establish domain or formal-result correctness, reveal oracle labels, expose proof bodies, use external model services, tune on test answers, or include launch operations.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.mathematical_strategy_atlas_hypothesis_scorer run --input fixtures/first_wave/mathematical_strategy_atlas_hypothesis_scorer/input --out receipts/first_wave/mathematical_strategy_atlas_hypothesis_scorer

EvidenceComputed projectionevidence 3/5Source-faithful refactor

formal-methodstheorem-provinglean

Source Design note · Source atlas

Paper module Mathematical Strategy Atlas

mathematical_strategy_atlas_hypothesis_scorer is the public pre-oracle strategy layer for Microcosm formal-math work. It turns problem feature tags into an explicit strategy hypothesis before premise retrieval or proof execution, then records the result as redacted result records.

The point is not to prove anything. The point is to make the first mathematical move inspectable: an iff_goal shape selects iff_split, a recursive list shape selects recursive_data_induction, arithmetic normalization selects the arithmetic lens, and unmapped shapes become a typed STRATEGY_SELECTION_MISS instead of a hidden failure mode.

The current body-floor import carries eight copied source bodies: the prover graph benchmark harness, the provider result record reducer, their strategy-boundary regression tests, the compute-provider strategy classification standard, and three public runtime artifacts from PROVER_PROVIDER_CONTEXT_SWEEP_20260510_v0 (strategy_cards.json, strategy_hypothesis_set.json, and prover_skill_atlas.json). They live in source_artifacts/ under both the first-wave fixture input and the exported bundle; result records carry refs, counts, hashes, anchors, and verdicts instead of body text.

Purpose

A proof search has to start somewhere. Before any premise is retrieved or any tactic is run, an agent has already committed to a first move: a goal shape, a lens, a family of tactics it expects to use. That choice is usually implicit, buried inside a model call or a prompt. This component exists to pull it into the open. The single question it answers is: for a given problem shape, which strategy did the system pick first, and on what visible evidence?

The interesting part is what the answer is allowed to depend on. The scorer never sees the oracle's expected strategy, the ground-truth proof, or any provider output. It works only from public problem features and a strategy atlas of trigger features, negative triggers, and retrieval-expansion terms. The selected strategy is therefore a hypothesis, recomputed from inputs a cold reader can also read, not a result borrowed from the answer key.

That constraint is what the page guards. The common failure mode for a "strategy classifier" is to bake the answer in: declare the chosen strategy as a plain label, or score it on shallow feature overlap that happens to line up with the known-good label. The component rejects both. A declared selection must match the score the scorer recomputes from evidence, and a strategy chosen on overlap alone is a typed negative case rather than a pass.

Shape

The local component standard, when changing runtime behavior or the claim envelope, is standards/std_microcosm_mathematical_strategy_atlas_hypothesis_scorer.json; the general paper-module contract remains standards/std_microcosm_paper_module.json.

The diagram below traces the scorer's runtime flow inside that projection: how public inputs become a per-candidate score, how a selection or a typed miss is chosen, and how the result is recomputed and written as metadata-only result records under the scope limit.

trigger / negative /retrieval termstrigger / negative / retrieval termsfeature tags, oracle hiddenfeature tags, oracle hiddencandidate strategy idscandidate strategy idsscore = trigger_hits x4negative_hits x3+ retrieval_bonus (cap 2)score = trigger_hits x4 negative_hits x3 + retrieval_bonus (cap 2)rank positive scorestie-break by order, then idrank positive scores tie-break by order, then idany positivescore?any positive score?selected_strategy_id+ score componentsselected_strategy_id + score componentsSTRATEGY_SELECTION_MISS(unknown)STRATEGY_SELECTION_MISS (unknown)recompute vs declaredselection / score / rankingrecompute vs declared selection / score / rankingmetadata-only result recordsrefs, counts, hits, verdictsmetadata-only result records refs, counts, hits, verdictsScope limitno Lean/Lake, oracle labels,external model access, orlaunchScope limit no Lean/Lake, oracle labels, external model access, or launch

Source refs

trigger / negative / retrieval terms
strategy_atlas.json
feature tags, oracle hidden
problem_features.json
candidate strategy ids
hypothesis_cases.json
Diagram source
flowchart TD subgraph Inputs["Public inputs"] atlas["strategy_atlas.json trigger / negative / retrieval terms"] features["problem_features.json feature tags, oracle hidden"] cases["hypothesis_cases.json candidate strategy ids"] end subgraph Scoring["Per-candidate scoring"] score["score = trigger_hits x4 - negative_hits x3 + retrieval_bonus (cap 2)"] rank["rank positive scores tie-break by order, then id"] end select{"any positive score?"} selected["selected_strategy_id + score components"] miss["STRATEGY_SELECTION_MISS (unknown)"] recheck["recompute vs declared selection / score / ranking"] result records["metadata-only result records refs, counts, hits, verdicts"] ceiling["Scope limit no Lean/Lake, oracle labels, external model access, or launch"] atlas --> score features --> score cases --> score score --> rank rank --> select select -- yes --> selected select -- no --> miss selected --> recheck miss --> recheck recheck --> result records result records --> ceiling

The generated instance currently exposes 19 concrete relationships.edges: two subject edges for the component and mechanism, one governing concept edge, six principle edges, six axiom edges, three sibling paper-module dependency edges, and one resolved code-locus edge into src/microcosm_core/organs/mathematical_strategy_atlas_hypothesis_scorer.py. relationships.unpopulated_selective_relations is empty, so the module-level unresolved selective-relation count available from this instance is 0.

Runtime evidence enters through the fixture input fixtures/first_wave/mathematical_strategy_atlas_hypothesis_scorer/input, the exported bundle examples/mathematical_strategy_atlas_hypothesis_scorer/exported_mathematical_strategy_atlas_bundle, and their copied source_artifacts/ / source_module_manifest.json bundles. The focused test file is tests/test_mathematical_strategy_atlas_hypothesis_scorer.py; result records include receipts/first_wave/mathematical_strategy_atlas_hypothesis_scorer/mathematical_strategy_atlas_result.json, mathematical_strategy_atlas_board.json, mathematical_strategy_atlas_validation_receipt.json, result records/sign-off/first_wave/mathematical_strategy_atlas_hypothesis_scorer_fixture_acceptance.json, and runtime-shell exported-bundle validation result records.

The honest ceiling is narrow by design: this module can say that public pre-oracle strategy hypotheses, retrieval-lens metadata, copied public source tool/standard/runtime bodies, source-artifact digests, and negative cases are inspectable. It cannot say that Lean or Lake ran, that a theorem was proved, that oracle labels or model-output data are visible, that benchmark performance is certified, that public sharing is approved, that launch is approved, or that the private root has been made public-safe.

How it works

The scorer reads three public inputs: a strategy atlas, a set of problem features, and a set of hypothesis cases. For each candidate strategy in a case it computes a single integer score from three terms. Each problem feature that matches a strategy's trigger_features adds four points. Each feature that matches the strategy's negative_triggers subtracts three. Retrieval-query terms that appear in the strategy's expansion terms add one point each, capped at two. Plain feature overlap is recorded as a diagnostic count but is deliberately kept out of the score.

Selection is then a deterministic sort. Only strategies with a positive score are eligible. Among those, the scorer ranks by score (highest first), breaking ties by the strategy's declared order and then its id, and takes the top row. If no candidate scores positive, the case resolves to the typed STRATEGY_SELECTION_MISS rather than guessing. The output for each case carries the selected id, the score, the component breakdown, the ranked candidate scores, and the trigger, negative, and retrieval hits that produced them, so the choice can be re-derived by hand.

The weights matter because they encode the design intent. Trigger matches are worth more than retrieval matches, so a strategy is chosen mainly for the shape it claims to handle, not for how many search terms happen to coincide. Negative triggers can veto a strategy that looks superficially apt. The retrieval cap stops a strategy from winning on keyword volume alone. A fixture that tries to score on overlap without these terms is caught by the superficial_overlap_only_scoring negative case.

The same recomputation is what enforces honesty. When a case declares its own selected_strategy_id, score, classifier, retrieval_bonus, or candidate_scores, the component recomputes each from the evidence and reports a stale-declaration finding on any mismatch. Declaring the selected strategy as a bare label, with nothing for the scorer to check against, is itself rejected: a label with no derivable evidence is not strategy evidence. Alongside this, the copied source artifacts are checked for leakage policy, so the strategy cards, hypothesis set, and skill atlas stay pre-oracle, free of proof bodies, and free of oracle strategy ids.

Reader Evidence Routing

Read this module as a pre-oracle strategy-hypothesis audit, not as a proof result. The primary reader path is:

  • Start with strategy_atlas.json, problem_features.json, and hypothesis_cases.json to see how public feature tags select a strategy id before retrieval or proof execution.
  • Check source_module_manifest.json and the copied source_artifacts/ bodies to verify that the imported source bodies are public tool/runtime bodies with exact digests, required anchors, and body-floor result records.
  • Inspect the fixture and exported-bundle result records to confirm that strategy ids, retrieval-term effects, oracle-label exclusion, source-card consistency, and negative cases are checked without exposing proof bodies or model-output data.
  • Use the structured source record only for structural lattice proof: it confirms bundle-backed subjects, code loci, doctrine refs, and dependency edges; it does not establish the scorer's correctness or any theorem.

Public Inputs

  • strategy_atlas.json defines the known strategy enum, match features, and retrieval-term additions.
  • problem_features.json carries synthetic public problem features with oracle labels hidden.
  • hypothesis_cases.json validates deterministic pre-oracle strategy scoring.
  • source_module_manifest.json binds copied source body files to exact source refs, SHA-256 digests, byte counts, line counts, material classes, and required anchors.
  • Negative cases reject unknown strategy ids, proof bodies, oracle labels, post-oracle strategy selection, and launch/proof/provider overclaims.

Result records

The component emits:

  • mathematical_strategy_atlas_result.json
  • mathematical_strategy_atlas_board.json
  • mathematical_strategy_atlas_validation_receipt.json
  • mathematical_strategy_atlas_hypothesis_scorer_fixture_acceptance.json

Runtime-shell exported bundle validation writes exported_mathematical_strategy_atlas_bundle_validation_result.json.

Prior Art Grounding

The strategy atlas is grounded in the formal-methods practice of separating problem-shape classification from proof execution. Lean's tactic model, as introduced in Theorem Proving in Lean 4, gives the immediate precedent: proof work is often arrange around tactics chosen for a goal shape, while the kernel checks the final proof state. The mathlib overview also motivates explicit retrieval terms and domain tags because a large formal library is navigated by topic, structure, and reusable theorem families.

The atlas is also adjacent to hammer-style premise and method selection, such as Isabelle Sledgehammer, where a front-end tool searches for useful facts or proof methods before replay. This module keeps the pattern pre-oracle and metadata-only: it records why a first strategy hypothesis was selected, not whether the proof can be completed.

Validation Result record Path

Run from microcosm-substrate:

PYTHONPATH=src ../repo-python -m microcosm_core.organs.mathematical_strategy_atlas_hypothesis_scorer run \
  --input fixtures/first_wave/mathematical_strategy_atlas_hypothesis_scorer/input \
  --out /tmp/microcosm-mathematical-strategy-atlas-hypothesis-scorer/fixture \
  --card
PYTHONPATH=src ../repo-python -m microcosm_core.organs.mathematical_strategy_atlas_hypothesis_scorer run-strategy-bundle \
  --input examples/mathematical_strategy_atlas_hypothesis_scorer/exported_mathematical_strategy_atlas_bundle \
  --out /tmp/microcosm-mathematical-strategy-atlas-hypothesis-scorer/bundle \
  --card
PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_mathematical_strategy_atlas_hypothesis_scorer.py -q
PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

A green result record proves only pre-oracle strategy-hypothesis metadata, copied public source tool bodies, source artifact digests, and negative-case enforcement; it does not run Lean or Lake, prove formal-result correctness, reveal oracle labels, export proof bodies, use external model services, certify benchmark performance, authorize public sharing, or include launch operations.

Scope boundary

Scope limit

The atlas is metadata and strategy-hypothesis machinery only. It does not run Lean or Lake, claim formal-result correctness, reveal oracle strategy labels, expose proof bodies, use external model services, tune on test answers, include launch operations, or make Mathlib-dependent proof claims. The copied runtime artifacts are public strategy traces, not oracle labels, model-output data, or proof bodies.

Scope limit

This module supports only the reader-verifiable claim that public strategy-hypothesis metadata, copied source tool bodies, source artifact digests, and negative cases can be checked before oracle labels or proof execution. It does not run Lean or Lake, prove formal-result correctness, reveal oracle labels, expose proof bodies, use external model services, certify benchmark performance, authorize public sharing, include launch operations, or make Mathlib-dependent proof claims.

Tactic Portfolio Availability ProbeMaps which Lean proof tactics a recorded run marked usable before any code relies on one.3/5

Does It turns one captured Lean run's results into an inspectable list of which proof shortcuts ("tactics" like `rfl`, `simp`, `omega`, `aesop`) were recorded as compiling, showing at a glance which are usable and which are off before anything treats a tactic as available. In this fixture seven tactics are marked usable and `aesop` is marked failed (its recorded run hit a missing-Mathlib error). The tool reads pre-recorded status rows and checks them for honesty; it does not run Lean itself.

Scope limit It only projects and validates which tactics were recorded as compiling in one captured environment; it does not run Lean/Lake at all, prove any goal, certify domain-level conclusions, use external model services, claim benchmark performance, or include launch operations.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.tactic_portfolio_availability_probe run --input fixtures/first_wave/tactic_portfolio_availability_probe/input --out receipts/first_wave/tactic_portfolio_availability_probe --acceptance-out receipts/acceptance/first_wave/tactic_portfolio_availability_probe_fixture_acceptance.json

EvidenceComputed projectionevidence 3/5Source-faithful refactor

formal-methodstheorem-provinglean

Source Design note · Source atlas

Paper module Tactic Portfolio Availability

tactic_portfolio_availability_probe is the public component that turns tactic callability into an explicit artifact before routing or proof search treats a tactic as usable.

The fixture is copied from real source system: the 2026-05-11 PROVER_PROOF_STATE_SEARCH_CURRICULUM smoke run's Lean/Std tactic affordance probe. It records compile-status rows for rfl, decide, omega, simp, simp_all, grind, native_decide, and aesop, with source digests for the run-level affordance probe, the portfolio_core_v0 tactic availability artifact, and the paired corpus-readiness boundary. The Mathlib-dependent aesop row is marked environment_fail because the paired environment probe reports mathlib_lake_project_import_available=false.

The component validates:

  • every tactic has an environment-scoped compile_status;
  • Mathlib-dependent tactics are not marked available without a passing Mathlib import probe;
  • downstream consumers reference only tactics present in the probe portfolio;
  • proof bodies, raw model-output data, benchmark claims, launch-scope decision, and non-public paths stay out of the public artifact.

The generated board is a callability map, bounded evidence evidence. It can make target-shape routing cheaper and more honest, but it cannot prove a goal, widen Lean/Lake authority, use external model services, claim benchmark performance, or include launch operations.

The result record contract reports body_material_status=copied_non_secret_macro_body_with_provenance, tactic_availability_status=real_lean_std_tactic_affordance_probe_rows, source digests, target refs, and secret_exclusion_scan. It does not use body-redaction or non-public-state-scan grammar as product evidence.

Purpose

A tactic name is not a usable tactic. aesop is callable only if the surrounding Lean and Std environment actually carries the imports it needs; omega is callable in one project layout and not in another. Routing or proof search that trusts a bare tactic name will reach for tactics that the current environment cannot run, and then misread the resulting failure as a property of the goal rather than a property of the environment. This component answers one question: in the observed Lean/Std environment, which tactics were actually callable, and on what evidence?

The interesting part is how it treats failure. A copied probe row that reports a Lean FAIL is not flattened into a single "unavailable" verdict. When a tactic declares requires_mathlib and the paired environment probe reports that the Mathlib import is absent, the failure is classified as environment_fail with the reason MATHLIB_IMPORT_MISSING. The same Lean FAIL for a tactic that does not depend on Mathlib is classified as compile_fail. The distinction keeps a missing import from masquerading as a broken tactic, and it preserves Mathlib absence as a recorded fact about the environment rather than discarding it. A downstream router can then re-attempt the same tactic in a different environment instead of striking it off permanently.

The second deliberate choice is that none of this is a measurement of quality. The component copies probe durations and bands them as fast, moderate, or slow so a router can prefer a cheaper available tactic, but the latency profile is stamped as environment-scoped, not benchmark authority. Callability and speed in one observed environment are useful for cheaper routing; they are explicitly not evidence that a tactic is correct, that a goal was proved, or that Lean was rerun by this component.

Shape

PASSFAIL + requires_mathlib + Mathlib absentFAIL otherwisenoyesCopied Lean/Std affordanceprobe rows(compile_status,requires_mathlib,duration_ms)Copied Lean/Std affordance probe rows (compile_status, requires_mathlib, duration_ms)Tactic portfolio availabilityprobeTactic portfolio availability probeEnvironment probeEnvironment probeCopied compile_statusCopied compile_statusavailableband duration fast / moderate/ slowavailable band duration fast / moderate / slowenvironment_failreason MATHLIB_IMPORT_MISSINGenvironment_fail reason MATHLIB_IMPORT_MISSINGcompile_failcompile_failAvailability board fortarget-shape routingAvailability board for target-shape routingDownstream tactic referenceDownstream tactic referenceTactic in probed portfolio?Tactic in probed portfolio?Rejected: unprobed tacticreferencedRejected: unprobed tactic referencedmetadata-only fixture andbundle result recordsno proof, Lean, or providerbodiesmetadata-only fixture and bundle result records no proof, Lean, or provider bodiesGenerated paper-module rowand validation result recordsGenerated paper-module row and validation result records

Source refs

Tactic portfolio availability probe
tactic_portfolio_availability_probe
Environment probe
mathlib_lake_project_import_available
Diagram source
flowchart TD A["Copied Lean/Std affordance probe rows (compile_status, requires_mathlib, duration_ms)"] --> B["tactic_portfolio_availability_probe"] C["Environment probe mathlib_lake_project_import_available"] --> B B --> D{"Copied compile_status"} D -->|PASS| E["available band duration fast / moderate / slow"] D -->|FAIL + requires_mathlib + Mathlib absent| F["environment_fail reason MATHLIB_IMPORT_MISSING"] D -->|FAIL otherwise| G["compile_fail"] E --> H["Availability board for target-shape routing"] F --> H G --> H I["Downstream tactic reference"] --> J{"Tactic in probed portfolio?"} J -->|no| K["Rejected: unprobed tactic referenced"] J -->|yes| H B --> L["metadata-only fixture and bundle result records no proof, Lean, or provider bodies"] L --> M["Generated paper-module row and validation result records"]

The flow is deliberately smaller than the generated doctrine-lattice graph.

Reader Evidence Routing

Read this page in four passes:

  1. Start with the bundle source row at core/paper_module_capsules.json::paper_modules[40:paper_module.tactic_portfolio_availability]. It names the public component subject, mechanism subject, resolved code locus, Microcosm concept, governing principles, axioms, and sibling paper-module dependencies that generate the relationship edges.
  2. Inspect the runtime system at src/microcosm_core/organs/tactic_portfolio_availability_probe.py. The load-bearing symbols are run, run_availability_bundle, _build_result, _write_receipts, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING; those are the code-loci symbols that make the paper module about an executable component instead of a prose topic.
  3. Reproduce the evidence floor with the fixture input fixtures/first_wave/tactic_portfolio_availability_probe/input, the exported bundle examples/tactic_portfolio_availability_probe/exported_tactic_portfolio_availability_bundle, the focused test tests/test_tactic_portfolio_availability_probe.py, and the paper-module corpus check. Treat the result records as environment-scoped tactic-callability evidence only; validation result records do not widen the proof boundary, scope limit, launch posture, provider posture, or benchmark posture.

Prior Art Grounding

The module is patterned after feature-detection probes and proof-assistant tactic inventories. GNU Autoconf's configure workflow established the habit of testing local capability before relying on it; Lean's tactic documentation shows that tactic use is environment- and goal-sensitive, so a tactic name is not enough to justify downstream routing. This component applies that older probe discipline to Microcosm: it records which tactics were callable in the observed Lean/Std environment and preserves Mathlib-dependent absence as evidence, without treating callability as proof quality.

Prior-art anchors:

  • GNU Autoconf feature/configuration probing: https://ftp.gnu.org/old-gnu/Manuals/autoconf-2.57/html_chapter/autoconf.html
  • Lean 4 tactic documentation: https://lean-lang.org/theorem_proving_in_lean4/Tactics/

Primary commands:

Validation Result record Path

From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:

The expected projection row is paper_module.tactic_portfolio_availability with 18 generated relationship edges, no unpopulated selective relations, Mermaid status available_from_capsule_edges, and Atlas status linked_from_capsule_edges. These checks validate environment-scoped tactic availability rows and bundle result records only; they do not turn callability into proof quality, benchmark performance, Mathlib proof authority, or launch-scope decision.

Scope boundary

Scope limit

The JSON bundle and generated row prove only environment-scoped tactic callability evidence: copied Lean/Std tactic affordance rows, compile-status rows, Mathlib absence evidence, downstream tactic-reference checks, source digests, secret-exclusion checks, negative cases, and validation result records. They do not prove formal-result correctness, expand Lean or Lake authority, use external model services, claim benchmark performance, export non-public paths, include launch operations or public sharing, or treat tactic callability as proof quality.

Scope limit

This component is environment-scoped tactic callability evidence only. It does not establish formal-result correctness, expand Lean/Lake authority, use external model services, claim benchmark performance, export non-public paths, include launch operations, or treat tactic callability as proof quality.

Target Shape Tactic Routing GateRecords an allow-or-reject decision and reason for each proof tactic before any proof runs.3/5

Does Before a proof is attempted, this component checks a list of candidate proof tactics for a given goal and writes down an allow-or-reject decision, with a plain reason, for each one. Rejections fall into three kinds: the tactic isn't actually available in the declared environment, it was never listed in the environment's tested set of tactics, or it simply doesn't fit the kind of goal being proved. The resulting record shows, tactic by tactic, exactly what was admitted or blocked and why, instead of an opaque "we tried these" claim. It only inspects and records the routing decision over references that already exist; it never runs a prover or proves anything itself.

Scope limit It only inspects and records the projection mechanics of pre-execution tactic-routing references — emitting per-tactic allow/reject decisions with reasons. It does not run Lean/Lake, does not establish or judge the correctness of any goal, emits no proof bodies, makes no external model access, performs no post-execution route selection, reports no benchmark claims or maturity, and excludes launch.

Run
PYTHONPATH=src python3 -m microcosm_core.cli target-shape-tactic-routing-gate run-routing-bundle --input examples/target_shape_tactic_routing_gate/exported_target_shape_tactic_routing_bundle --out receipts/runtime_shell/demo_project/organs/target_shape_tactic_routing_gate

EvidenceComputed projectionevidence 3/5Source-faithful refactor

formal-methodstheorem-provinglean

Source Design note · Source atlas

Paper module Target Shape Tactic Routing

target_shape_tactic_routing_gate is the public Microcosm component for the pre-execution tactic admissibility layer.

It turns real problem-domain, failure-class, and graph-update candidate refs from the formal-math evaluation pipeline into route decisions: which tactics are admitted, which are rejected as unavailable, which are rejected as unprobed, and which are rejected because they do not match the declared goal shape.

Purpose

A proof attempt is expensive, and most of that cost is spent on tactics that were never going to work: tactics the environment cannot run, tactics absent from the probe portfolio, or tactics that do not match the shape of the goal. This component answers one question before any Lean call is made: given the target shape and the current availability probe, which tactics may a route even attempt?

The decision is deliberately made early. Routing happens before execution, so a case that carries Lean result records, execution results, or a post-execution stage is rejected outright rather than trusted. The point is to decide admissibility from evidence that already exists, not from the outcome of the attempt the gate is meant to filter.

What is unusual is that the gate recomputes the choice rather than accepting the declared one. Each target shape carries a small preferred-tactic order (for example omega for integer linear arithmetic, decide for closed natural-number decisions). The gate walks that order, skips any preferred tactic that is unprobed or unavailable, records why it skipped, and falls back to the next allowed candidate or to a default safe order for shapes it does not recognise. A route whose declared selection disagrees with this computed preference is flagged rather than honoured. The route is a claim about what should run; the gate treats it as something to check, not something to believe.

Shape

JSON bundleJSON bundleGenerated structured sourcerecordGenerated structured source recordRuntime componentRuntime componentTactic probe portfolioavailable/unavailable tacticidsTactic probe portfolio available/unavailable tactic idsTarget-shape route casespre_execution selectedtacticsTarget-shape route cases pre_execution selected tacticsCopied Ring2 source artifacts4 body imports,body_in_receipt=falseCopied Ring2 source artifacts 4 body imports, body_in_receipt=falseRoute admissiblebefore proof execution?Route admissible before proof execution?Result recordsresult, board, validation,sign-offResult records result, board, validation, sign-offFocused testsnegative cases and digestchecksFocused tests negative cases and digest checksScope limitno Lean/Lake, proof,provider, post-execution,launchScope limit no Lean/Lake, proof, provider, post-execution, launch

Source refs

JSON bundle
paper_module.target_shape_tactic_routing
Generated structured source record
paper_modules/target_shape_tactic_routing.json
Runtime component
target_shape_tactic_routing_gate.py
Diagram source
flowchart TD Bundle["JSON bundle paper_module.target_shape_tactic_routing"] structured source record["Generated structured source record paper_modules/target_shape_tactic_routing.json"] Component["Runtime component target_shape_tactic_routing_gate.py"] Portfolio["Tactic probe portfolio available/unavailable tactic ids"] Routes["Target-shape route cases pre_execution selected tactics"] SourceFloor["Copied Ring2 source artifacts 4 body imports, body_in_receipt=false"] Decisions{"Route admissible before proof execution?"} Result records["Result records result, board, validation, sign-off"] Tests["Focused tests negative cases and digest checks"] Ceiling["Scope limit no Lean/Lake, proof, provider, post-execution, launch"] Bundle --> structured source record Bundle --> Component Component --> Portfolio Component --> Routes Portfolio --> Decisions Routes --> Decisions SourceFloor --> Decisions Decisions --> Result records Tests --> Result records Result records --> Ceiling

Technical Mechanism

The named mechanism mechanism.target_shape_tactic_routing_gate.validates_public_tactic_routing_boundary is a fail-closed scorer over two public input planes: the tactic probe portfolio and the target-shape route cases. _build_result loads the fixture or exported bundle payloads, scans the inputs and copied source artifacts for forbidden body material, derives known/available/unavailable tactic sets, scores every route case, checks copied Ring2 source-artifact digests, and emits metadata-only result, board, validation, and sign-off result records.

For each route case, _decision_for_tactic rejects a candidate before selection if the tactic id is absent from the public probe portfolio, marked unavailable, or outside the case's declared allowed_tactic_ids. Only a tactic that is probed, available, and target-shape-admissible can receive TARGET_SHAPE_ADMISSIBLE. _shape_preferred_selection then applies the local target-shape preference map, records the unknown-shape default fallback when no specific map exists, and records the preferred-unavailable fallback when the first preferred tactic is known but not usable. _route_integrity_findings turns any unavailable admission, unprobed admission, post-execution route, or declared-selection mismatch into typed findings.

The proof consumer is tests/test_target_shape_tactic_routing_gate.py: it asserts seven pre_execution route cases, shape-preferred selection for the real Ring2 cases, unknown-shape and unavailable-Mathlib fallback behavior, rejection of mutated shape and availability inputs, exported-bundle sign-off, four copied source artifacts with digest verification, compact card omission of the full routing board, and result record text without non-public paths or body fields. Those tests consume the same fixture and exported-bundle surfaces named by the mechanism row, so this page's evidence is the runnable route-reference and result record contract rather than a prose-only claim.

The governing lattice stays explicit: the bundle binds the module to concept.formal_math_and_proof_witness_bundle, principles P-1, P-2, P-3, P-6, P-8, and P-9, axioms AX-1, AX-2, AX-5, AX-7, and AX-8, and dependency modules for tactic portfolio availability, formal-math readiness, proof-diagnostic evidence, verifier-trace repair, and formal evidence-cell anchor resolution. The standard narrows that lattice to one allowed claim: public pre-execution route cases may admit only tactics that were both probed and available before proof execution. The same standard forbids widening this mechanism into formal-result correctness, Lean/Lake execution, external model access, proof or provider body export, post-execution route authority, publishing-scope decision, or launch-scope decision.

Evidence/accounting:

  • Bundle authority: core/paper_module_capsules.json::paper_modules[41:paper_module.target_shape_tactic_routing] names source_authority: json_capsule, subjects component:target_shape_tactic_routing_gate and mechanism.target_shape_tactic_routing_gate.validates_public_tactic_routing_boundary, the resolved code locus src/microcosm_core/organs/target_shape_tactic_routing_gate.py, and generated projection statuses mermaid.status: available_from_capsule_edges plus atlas_card.status: linked_from_capsule_edges.
  • Generated structured source record: paper_modules/target_shape_tactic_routing.json carries relationships.edges for the bundle subjects, concept/principle/axiom refs, dependency paper modules, and code locus; relationships.unpopulated_selective_relations: []; and scope boundaries that the JSON row does not establish runtime correctness, launch-scope decision, or whole-system completeness.
  • Runtime contract: standards/std_microcosm_target_shape_tactic_routing_gate.json limits the allowed claim to pre-execution tactic admission from probed, available tactics; its required_fields bind tactic_portfolio_availability.tactics[].tactic_id, availability_status, target_shape_routes.route_cases[].target_shape, allowed_tactic_ids, selected_tactic_id, and route_stage.
  • Source-body accounting: examples/target_shape_tactic_routing_gate/exported_target_shape_tactic_routing_bundle/source_module_manifest.json records source_import_class: copied_non_secret_macro_body, module_count: 4, body_in_receipt: false, three verified_public_safe_private_path_rewrite rows, and one exact_copy row.
  • Fixture/bundle behavior: examples/target_shape_tactic_routing_gate/exported_target_shape_tactic_routing_bundle/target_shape_routes.json has seven pre_execution route cases, while tactic_portfolio_availability.json marks decide, omega, simp_all, and rfl available and aesop unavailable.
  • Result record floor: receipts/first_wave/target_shape_tactic_routing_gate/target_shape_tactic_routing_result.json, target_shape_tactic_routing_board.json, target_shape_tactic_routing_validation_receipt.json, and result records/sign-off/first_wave/target_shape_tactic_routing_gate_fixture_acceptance.json report status: pass, route_case_count: 7, copied_source_artifact_count: 4, source_artifacts_pass: true, missing_negative_cases: [], secret_exclusion_scan.blocking_hit_count: 0, and authority flags with Lean/Lake, proof, provider, post-execution routing, and launch-scope decision set false.
  • Test boundary: tests/test_target_shape_tactic_routing_gate.py checks observed negative cases, shape-preferred selection, unknown-shape and Mathlib-unavailable fallback, exported-bundle sign-off, source-module digest verification, compact card omission of full boards, and result record output without non-public paths or body fields.

Reader Evidence Routing

Read this module as a pre-execution admissibility gate, not as a proof attempt. The primary reader path is:

  • Start with the problem-domain, failure-class, graph-update candidate, and tactic-probe refs in the fixture input. They are the public route evidence the gate is allowed to inspect before any Lean/Lake work in the formal-math evaluation and premise-retrieval pipeline.
  • Compare each target-shape route case against the selected tactic ids and rejection reasons: admitted tactics must match both the declared goal shape and the public availability probe.
  • Inspect negative cases before the happy path. The important behavior is that unavailable tactics, unprobed tactics, proof/provider body leakage, post-execution routing, and launch overclaims all fail closed.
  • Use the structured source record only for structural lattice proof: it confirms subjects, code loci, doctrine refs, and dependency edges; it does not establish the tactic route can solve the target.

Runtime Surfaces

PYTHONPATH=src python3 -m microcosm_core.organs.target_shape_tactic_routing_gate run --input fixtures/first_wave/target_shape_tactic_routing_gate/input --out receipts/first_wave/target_shape_tactic_routing_gate
PYTHONPATH=src python3 -m microcosm_core.cli target-shape-tactic-routing-gate run-routing-bundle --input examples/target_shape_tactic_routing_gate/exported_target_shape_tactic_routing_bundle --out receipts/runtime_shell/demo_project/organs/target_shape_tactic_routing_gate

Negative Cases

  • unavailable_tactic_admitted rejects an aesop route while Mathlib is absent.
  • unprobed_tactic_allowed rejects a tactic absent from the public probe portfolio.
  • proof_body_leakage rejects proof/provider/Lean body fields.
  • post_execution_route rejects route selection after execution evidence.
  • release_overclaim rejects proof, provider, Lean/Lake, public sharing, and launch-scope decision overclaims.

Prior Art Grounding

The routing layer follows established proof-search and policy-gating patterns: match a goal shape to methods that are known to be available before spending runtime on them. Lean's tactic documentation supplies the local proof-assistant context for goal-directed tactic choice, while Isabelle/Sledgehammer represents a mature prior-art pattern for selecting external provers and relevant facts from a goal. Microcosm narrows that idea to a pre-execution admissibility filter: target shape, allowed references, and current tactic availability must line up before a tactic route can be exported.

Prior-art anchors:

  • Lean 4 tactic documentation: https://lean-lang.org/theorem_proving_in_lean4/Tactics/
  • Isabelle Sledgehammer user guide: https://isabelle.in.tum.de/doc/sledgehammer.pdf

Why It Matters

After corpus readiness and strategy scoring, Microcosm needs a visible gate that prevents wasted or misleading proof attempts. This component shows that gate over the formal-math evaluation and premise-retrieval pipeline already feeding verifier repair, evidence anchoring, and proof diagnostics: a tactic is not tried just because it exists; it is admitted only when the target shape and the public availability probe both allow it.

Validation Result record Path

From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:

These checks validate route-reference fixture and bundle result records only; they do not widen the no-Lean/no-proof scope limit.

Scope boundary

Scope limit

This component does not run Lean or Lake and does not establish a target. It validates only the route references that must exist before a proof attempt in the formal-math evaluation and premise-retrieval pipeline: tactic probe availability, target-shape route cases, selected tactic ids, failure-class refs, graph-update candidate refs, and negative-case result records.

Forbidden outputs include proof bodies, provider bodies, post-execution route selection, Lean result record claims, external model access, launch claims, and Mathlib-dependent proof authority.

Scope limit

This module covers only public pre-execution tactic routing evidence: the route references used before a formal proof attempt, tactic probe availability, target-shape cases, selected tactic ids, failure-class refs, graph-update candidate refs, negative-case result records, source-module digest evidence, and validation result records. It does not run Lean or Lake, prove formal-result correctness, export proof bodies or provider bodies, authorize post-execution route selection, use external model services, claim Mathlib-dependent proof authority, authorize public sharing, include launch operations, or prove whole-system correctness.

Lean Std Premise IndexLists a fixed catalog of public Lean building blocks and confirms none hides proof text or test answers.3/5

Does Presents a small, fixed catalog of Lean standard-library "premises" (named building blocks like facts about numbers, booleans, lists, and basic logic) along with the labels and source references that say where each one comes from. It shows what proof ingredients are on the table and that they were copied from public Lean sources, with no hidden proof text, no Mathlib, and nothing that secretly gives away test answers. It only checks and displays this catalog; it does not run Lean or prove anything.

Scope limit It only validates the projection of premise metadata and copied source bodies; it does not run Lean or Lake, prove any theorem correct, expose proof bodies or oracle-needed ids, use external model services, produce benchmark claims, or include launch operations.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.lean_std_premise_index run --input fixtures/first_wave/lean_std_premise_index/input --out receipts/first_wave/lean_std_premise_index --acceptance-out receipts/acceptance/first_wave/lean_std_premise_index_fixture_acceptance.json

EvidenceComputed projectionevidence 3/5Source-faithful refactor

formal-methodstheorem-provinglean

Source Design note · Source atlas

Paper module Lean/Std Premise Index

lean_std_premise_index is the closed public premise-index lane for the formal-math slice. It validates premise metadata and selected Ring2 premise-retrieval source result record bodies that a cold reader can inspect without importing Mathlib, exposing proof bodies, or relying on private source run state.

Purpose

A premise index is the catalogue a theorem-proving system reads before it tries to prove anything: a list of the named lemmas and definitions it is allowed to cite, with enough metadata to retrieve the relevant ones. This component answers a narrower question. Given that such an index already exists inside a private Ring2 benchmark run, can a cold reader inspect its public shape and be sure that what they are reading is a faithful copy of the real thing, and not a separate hand-written stand-in?

The answer rests on one design choice that is worth noticing. The validator does not just describe eleven premise rows; it opens the declared source artifact from the Ring2 premise-retrieval run, recomputes its SHA-256, and checks every public row against the matching source row by premise_id. The only permitted difference is a path rewrite: a raw Lean toolchain path becomes a public lean-toolchain://.../Init/... reference, so the reader sees where a lemma lives in the standard library without seeing a private filesystem. If the public catalogue ever drifts from the source it claims to copy, the digest or the row-signature comparison fails and the result record is blocked.

The interesting tension is the line between a useful index and a leaked answer key. A premise index for a benchmark is one edit away from telling a solver exactly which lemmas it needs. So the same pass that admits names, namespaces, retrieval terms, and train/dev/test eligibility rejects the things that would turn the catalogue into proof authority: Mathlib references, proof bodies, the oracle-needed premise ids that name the answer, and any flag that authorises tuning on the test split. The catalogue stays inspectable precisely because those are kept out.

Shape

This module is a cold-reader map from a JSON bundle and copied public Lean/Std premise artifacts into metadata-only validation result records. The readable path is bundle -> generated instance/status -> runtime validator -> fixtures and exported source bundle -> tests and result records -> scope limit; none of those projections expands the closed-index boundary.

source basis: source recordsource basis: source recordgenerated instance fromsource recordgenerated instance from source recordGenerated statusGenerated statusrun / run_index_bundle /scope_limitrun / run_index_bundle / scope_limitclosed Lean/Std premise-indexcontractclosed Lean/Std premise-index contractprojection_protocol,premise_index, index_policy,negative casesprojection_protocol, premise_index, index_policy, negative casessource_module_manifest: 6copied body modulessource_module_manifest: 6 copied body modulesfixture, manifest, bundle,and runtime-shape checksfixture, manifest, bundle, and runtime-shape checksResult recordsResult recordsScope limitno Lean/Lake, Mathlib, proofbodies, providers, benchmarkauthority, source-filechanges, public sharing, orlaunch-scope decisionScope limit no Lean/Lake, Mathlib, proof bodies, providers, benchmark authority, source-file changes, public sharing, or launch-scope decision

Source refs

source basis: source record
core/paper_module_capsules.jsonpaper_module.lean_std_premise_index
generated instance from source record
paper_modules/lean_std_premise_index.json
run / run_index_bundle / scope_limit
src/microcosm_core/organs/lean_std_premise_index.py
closed Lean/Std premise-index contract
standards/std_microcosm_lean_std_premise_index.json
projection_protocol, premise_index, index_policy, negative cases
fixtures/first_wave/lean_std_premise_index/input
source_module_manifest: 6 copied body modules
examples/lean_std_premise_index/exported_lean_std_premise_index_bundle
fixture, manifest, bundle, and runtime-shape checks
tests/test_lean_std_premise_index.py
Result records
receipts/first_wave/lean_std_premise_indexreceipts/runtime_shell/demo_project/organs/lean_std_premise_index
Diagram source
flowchart TD bundle["core/paper_module_capsules.json paper_module.lean_std_premise_index source basis: source record"] instance["paper_modules/lean_std_premise_index.json generated instance from source record Markdown stays reader projection"] generated["Generated status Mermaid: available_from_capsule_edges Atlas: blocked_until_organ_atlas_owner_lane_binds_edges"] runtime["src/microcosm_core/components/lean_std_premise_index.py run / run_index_bundle / scope_limit"] standard["standards/std_microcosm_lean_std_premise_index.json closed Lean/Std premise-index contract"] fixtures["fixtures/first_wave/lean_std_premise_index/input projection_protocol, premise_index, index_policy, negative cases"] bundle["examples/lean_std_premise_index/exported_lean_std_premise_index_bundle source_module_manifest: 6 copied body modules"] tests["tests/test_lean_std_premise_index.py fixture, manifest, bundle, and runtime-shape checks"] result records["result records/first_wave/lean_std_premise_index result records/runtime_shell/demo_project/components/lean_std_premise_index"] ceiling["Scope limit no Lean/Lake, Mathlib, proof bodies, providers, benchmark authority, source-file changes, public sharing, or launch-scope decision"] bundle --> instance instance --> generated standard --> runtime fixtures --> runtime bundle --> runtime runtime --> tests tests --> result records generated --> ceiling result records --> ceiling

Technical Mechanism

The mechanism is a two-entry validator over copied public artifacts, not a proof engine. run reads the first-wave fixture inputs, opens the declared source premise-index source artifact, verifies the declared source_sha256, normalizes Lean toolchain paths into lean-toolchain://.../Init/... public refs, compares every public row against the source row signature, and then checks the protocol, policy, copied-material contract, namespace coverage, split coverage, negative cases, secret exclusion scan, and scope limit before writing metadata-only result, board, validation, and sign-off result records. run_index_bundle applies the same public boundary to the exported bundle and requires the source-module manifest to verify six copied body-material files by source ref, target ref, digest, line count, byte count, and source-to-target equivalence while keeping body text out of result records.

The proof consumer is therefore concrete and local: tests/test_lean_std_premise_index.py asserts that the validator observes all five negative cases, imports the real Ring2 premise-index source artifact, rejects digest, row-count, row-signature, source-ref, source-module digest, and rehash-body-swap mutations, and validates the runtime-shell bundle shape. The positive fixture carries 11 premise rows across Nat, Bool, List, and Iff; the source-open body floor carries one normalized Lean/Std premise index plus five Ring2 source result record or pattern bodies. This is evidence of a bounded public premise catalog and copied-source manifest, not evidence of Lean formal-result correctness.

The governing lattice is source-backed through the bundle-generated instance: paper_module.lean_std_premise_index explains the lean_std_premise_index component and the two mechanism.lean_std_premise_index.* mechanisms, is governed by concept.formal_math_and_proof_witness_bundle, cites P-1, P-2, P-3, P-6, and P-8, abides by AX-1, AX-2, AX-5, and AX-7, and depends only on paper_module.formal_math_premise_retrieval.

Inputs

  • projection_protocol.json records source pattern ids, source source refs, public replacement refs, projection result records, omitted material, and copy policy.
  • premise_index.json carries public metadata rows: premise id, declaration name, namespace, Init/ source ref, retrieval terms, and split eligibility.
  • index_policy.json keeps the closed-index scope limit explicit.
  • source_module_manifest.json records six source-open body imports: the normalized Lean/Std premise index plus five exact bodies from the formal-math premise-retrieval pipeline (source result records and graph-pattern bodies) under source_modules/.

Prior Art Grounding

This component is grounded in formal-library indexing and premise-selection work. The Lean mathematical library anchors the library-as-corpus side, while LeanDojo and HOList anchor the need for premise metadata, retrieval splits, and theorem-proving environments that can be inspected by learning systems.

Microcosm borrows the closed-index discipline: premise ids, declaration names, namespaces, source refs, retrieval terms, split eligibility, and source-module digests are public metadata, while proof bodies and oracle-needed ids remain outside the public boundary. It does not import Mathlib or prove theorems.

Negative Cases

The fixture rejects:

  • Mathlib premise refs;
  • proof-body leakage;
  • oracle-needed premise ids;
  • test-split tuning authority;
  • namespace rows without Init/ source refs.

These are stable negative cases because the index is intended to be useful without becoming proof authority.

Result records

The validator emits:

  • lean_std_premise_index_result.json;
  • lean_std_premise_index_board.json;
  • lean_std_premise_index_validation_receipt.json;
  • an sign-off result record under result records/sign-off/first_wave/.

Runtime-shell execution emits exported_lean_std_premise_index_bundle_validation_result.json after checking the source-module manifest, target file digests, line counts, byte counts, and secret-exclusion boundary.

Reader Evidence Routing

  • Start with the JSON Bundle Binding to identify the source row, generated instance, and scope limit.
  • Use Structured Lattice Bindings only as navigation evidence; the resolved dependency edge points to the premise-retrieval module and does not expand the closed-index proof boundary.
  • Use Inputs and Result records when checking whether public metadata, copied body manifests, and runtime-shell validation stayed body-safe.
  • Use Negative Cases and Scope limit together when deciding whether a proposed public claim exceeds the closed-index boundary.

Validation Result record Path

./repo-pytest tests/test_lean_std_premise_index.py -q --basetemp=/tmp/microcosm_lean_std_premise_index_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

This lane is body only. It does not:

  • run Lean or Lake;
  • import Mathlib;
  • expose proof bodies;
  • expose oracle-needed premise ids;
  • tune on test split truth;
  • use external model services;
  • certify theorem validity;
  • authorize public launch;
  • claim secret export.
Scope limit

This module supports only the reader-verifiable claim that public Lean/Std premise metadata, source refs, retrieval terms, split eligibility, and copied source-module digests can be indexed without exposing proof bodies or oracle-needed ids. It does not run Lean or Lake, import Mathlib, prove formal-result correctness, tune on test split truth, use external model services, include launch operations, or certify secret-export safety.

Formal Math Premise RetrievalShows which lemmas a plain search surfaces per query, and never leaks proof text or answer keys.3/5

Does Given a small copied set of Lean/Std math-lemma descriptions plus some search queries, this component shows which lemmas a plain term-matching search would surface for each query, how it keeps each assembled context within a fixed size budget, and that it never exposes proof text or "answer-key" hints (the premise ids a solver would only get to see after the fact). On the bundled first-wave fixture, the result record shows the retrieval mechanism working in miniature alongside deliberate bad inputs (a leaked proof body, leaked answer-key ids, a budget overflow, an attempt to peek at test answers, and an unknown strategy) that the component catches; the leak and budget guards actually fire.

Scope limit It only checks that public retrieval metadata is internally coherent, term-scored over a copied index, budget-bounded, and leakage-clean; it does not run Lean/Lake, use external model services, prove any theorem or its own correctness, claim benchmark performance, or include launch operations.

Run
PYTHONPATH=src python -m microcosm_core.organs.formal_math_premise_retrieval run --input fixtures/first_wave/formal_math_premise_retrieval/input --out receipts/first_wave/formal_math_premise_retrieval

Paper module Formal Math Premise Retrieval

formal_math_premise_retrieval is the source-available first real formal-math import slice after the source projection protocol. It turns the source prover lab's premise-index, term-scoring, context-budget, and strategy-selection patterns into a runnable Microcosm component.

It is still deliberately below proof authority. It validates:

  • Lean/Std premise metadata;
  • query term scoring across public premise ids, namespaces, declaration names, statement excerpts, and retrieval terms;
  • split eligibility;
  • context recipe budgets;
  • public strategy ids;
  • redacted result records;
  • negative cases.

It does not run Lean or Lake, use external model services, expose proof bodies, expose oracle-needed premise ids, tune on test split truth, claim formal-result correctness, or include launch operations.

Purpose

Before a model can attempt a formal proof, it has to find the right lemmas. A Lean library holds thousands of theorems and definitions, and the useful ones for a given goal are a handful. Premise selection is the step that narrows that library down to candidates worth putting in front of a prover. This component is the smallest honest version of that step: it takes a query, scores every public premise against it, and returns a ranked shortlist.

The single question it answers is narrow and checkable: given a copied catalogue of public Lean/Std premise metadata, does a transparent term-scoring retrieval return the premises a query should find, without ever touching a proof? Both halves matter. The retrieval has to actually work, so each fixture query carries the premise ids it is expected to surface and the run fails if the shortlist misses them. And the boundary has to hold, so the same run refuses any input that smuggles in a proof body, an oracle answer, or test-split truth.

What is unusual is the restraint. The retrieval index is not a learned embedding model and the scoring is not a benchmark claims. It is plain term overlap over fields that a reader can inspect: premise ids, namespaces, declaration names, statement excerpts, and retrieval terms. The interesting claim is therefore not "this retrieves well" but "this retrieves over real, copied Lean metadata and can be audited end to end, and the design forbids the shortcuts that would make a premise-selection result look better than it is".

Shape

JSON source recordJSON source recordGenerated paper-moduleinstance15 relationship edgesGenerated paper-module instance 15 relationship edgesRuntime componentRuntime componentPremise indexcopied Lean/Std metadataPremise index copied Lean/Std metadataRetrieval queriesterms, split, strategy, top_kRetrieval queries terms, split, strategy, top_kContext recipesbyte budgetsContext recipes byte budgetsNegative-case inputsproof body, oracle ids,test-split tuning, budget,strategyNegative-case inputs proof body, oracle ids, test-split tuning, budget, strategySplit gateskip premises not inallowed_for_splitSplit gate skip premises not in allowed_for_splitTerm-overlap scoringshared tokens + strategybonusTerm-overlap scoring shared tokens + strategy bonusRanked top_k shortlistRanked top_k shortlistRecall checkvs expected premise idsRecall check vs expected premise idsRequired rejectionsfive leakage/overclaim guardsRequired rejections five leakage/overclaim guardsmetadata-only result recordsboard, validation, sign-offmetadata-only result records board, validation, sign-offScope limitmetadata coherence, noLean/Lake, no proofScope limit metadata coherence, no Lean/Lake, no proof

Source refs

JSON source record
paper_module.formal_math_premise_retrieval
Runtime component
formal_math_premise_retrieval.py
Diagram source
flowchart TD bundle["JSON source record paper_module.formal_math_premise_retrieval"] --> instance["Generated paper-module instance 15 relationship edges"] instance --> component["Runtime component formal_math_premise_retrieval.py"] subgraph Inputs["Public inputs"] index["Premise index copied Lean/Std metadata"] queries["Retrieval queries terms, split, strategy, top_k"] recipes["Context recipes byte budgets"] negatives["Negative-case inputs proof body, oracle ids, test-split tuning, budget, strategy"] end component --> index component --> queries component --> recipes component --> negatives index --> split["Split gate skip premises not in allowed_for_split"] queries --> split split --> score["Term-overlap scoring shared tokens + strategy bonus"] score --> shortlist["Ranked top_k shortlist"] shortlist --> recall["Recall check vs expected premise ids"] negatives --> reject["Required rejections five leakage/overclaim guards"] recipes --> reject recall --> result records["metadata-only result records board, validation, sign-off"] reject --> result records result records --> ceiling["Scope limit metadata coherence, no Lean/Lake, no proof"]

Evidence/accounting:

  • Bundle authority: core/paper_module_capsules.json::paper_modules[25:paper_module.formal_math_premise_retrieval] has source_authority: json_capsule, three subjects, one resolved code_loci[0].path, depends_on naming paper_module.formal_math_lean_proof_witness, and generated projection statuses for Markdown, Mermaid, and Atlas.
  • Generated instance: paper_modules/formal_math_premise_retrieval.json::paper_module_payload repeats the bundle authority_ceiling, reports Mermaid status available_from_capsule_edges, and derives 15 relationships.edges with relationships.unpopulated_selective_relations: [].
  • Component atlas: core/organ_atlas.json::organs[9:formal_math_premise_retrieval] classifies the component in family: formal_math_and_proof, cites the runtime locus, and restates that retrieval metadata coherence is not Lean/Lake, provider, theorem-correctness, benchmark, or launch-scope decision.
  • Mechanism rows: core/mechanism_sources.json::mechanisms[27:mechanism.formal_math_premise_retrieval.validates_public_premise_retrieval_slice] and core/mechanism_sources.json::mechanisms[37:mechanism.formal_math_premise_retrieval.validates_public_premise_retrieval_projection] point at src/microcosm_core/organs/formal_math_premise_retrieval.py and name first-wave, sign-off, and runtime-shell result record refs.
  • Runtime and tests: src/microcosm_core/organs/formal_math_premise_retrieval.py exposes run, run_retrieval_bundle, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING; tests/test_formal_math_premise_retrieval.py checks 11 premises, 4 queries, 44 considered candidates, five negative cases, metadata-only result records, and compact runtime-shell cards.
  • Result records: receipts/first_wave/formal_math_premise_retrieval/formal_math_premise_retrieval_result.json records status: pass, 11 premises, 4 queries, 44 considered candidates, five observed negative cases, missing_negative_cases: [], and a secret-exclusion scan with blocking_hit_count: 0; the exported runtime result record at receipts/runtime_shell/demo_project/organs/formal_math_premise_retrieval/exported_premise_retrieval_bundle_validation_result.json records status: pass, the same premise/query/candidate counts, no negative cases, and secret_exclusion_scan.scanned_path_count: 11.
  • Standard ceiling: standards/std_microcosm_formal_math_premise_retrieval.json::authority_ceiling has status: pass while keeping formal_proof_authority, lean_lake_authority, provider_authority, and release_authority false.

Runtime Surfaces

  • Component runner: python -m microcosm_core.organs.formal_math_premise_retrieval run --input fixtures/first_wave/formal_math_premise_retrieval/input --out receipts/first_wave/formal_math_premise_retrieval
  • Exported bundle runner: python -m microcosm_core.organs.formal_math_premise_retrieval run-retrieval-bundle --input examples/formal_math_premise_retrieval/exported_premise_retrieval_bundle --out receipts/runtime_shell/demo_project/organs/formal_math_premise_retrieval
  • CLI route: microcosm formal-math-premise-retrieval run-retrieval-bundle
  • Standard: standards/std_microcosm_formal_math_premise_retrieval.json
  • Fixture manifest: core/fixture_manifests/formal_math_premise_retrieval.fixture_manifest.json

Public Claim

Microcosm can show a real formal-math retrieval mechanism in miniature:

  • a source-available Lean/Std premise index;
  • public field-haystack term-scored queries;
  • split-aware eligibility;
  • context recipe ceilings;
  • strategy gates;
  • redacted validation result records.

How retrieval scoring works

Each premise row contributes five inspectable fields to the haystack: its premise id, namespace, declaration name, statement excerpt, and a list of retrieval terms. A query carries its own terms, a data split, an optional strategy id, a context recipe, and the public premise ids it is expected to return.

Scoring is term overlap, computed per query. Both the query and each premise are tokenised into lowercase word counts. A premise is only considered if the query's split appears in that premise's allowed_for_split list, which is how test-split leakage is kept out at the structural level rather than by trust. For each eligible premise the score is the summed minimum count of every shared token across the five fields, so a term that appears in both the query and the premise contributes as many points as the smaller of the two counts. A premise that also carries the query's strategy id as a tag gets a single extra point. The ranked list is sorted by score descending, ties broken by premise id, and the top of that list up to the query's top_k is taken as the retrieval.

The retrieval is then graded against itself. Each query declares the public premise ids it should surface, and the component computes recall as the fraction of those expected ids that actually landed in the shortlist. A query that declares expectations but misses any of them blocks the run. In the first-wave fixture this is eleven premises and four queries, scoring forty-four considered candidates in total, and every query is expected to reach full recall.

The failure mode this guards against is a premise-selection result that looks good because it cheated. The five negative-case inputs each encode one such shortcut: a premise index that ships a proof body, a query that lists the oracle premise ids it is "meant" to find, a query that tunes on test-split truth, a context recipe that blows past the byte budget, and a query naming a strategy id outside the allowed set. The run is required to observe all five rejections; if any expected rejection is missing, the whole fixture is blocked rather than passed. Recall over copied real metadata is the positive signal; the refusals are what keep that signal honest.

Prior Art Grounding

This component is grounded in premise-selection and retrieval-augmented theorem proving work. LeanDojo is the closest modern anchor because it couples Lean interaction with retrieval-augmented premise selection. Earlier theorem-proving environments such as HOList and GamePad also motivate extracting proof-state or premise metadata for learning-assisted theorem proving.

Microcosm borrows the retrieval accounting pattern: premise ids, namespaces, statement excerpts, retrieval terms, split eligibility, context budgets, and strategy gates must be inspectable before premise-retrieval claims are admitted. It does not run Lean/Lake or expose proof bodies.

Negative Cases

  • premise_index_proof_body_forbidden
  • query_oracle_ids_forbidden
  • test_split_tuning_attempt
  • context_recipe_budget_overflow
  • unknown_strategy_id

Reader Evidence Routing

  • Start with the JSON Bundle Binding to identify the source record, generated instance, proof boundary, and scope limit.
  • Use Structured Lattice Bindings for navigation; the generated JSON row is the authority for relationship counts and dependency state.
  • Use Runtime Surfaces and Result record Expectations when checking metadata coherence, redaction, leakage checks, and source-available bundle behavior.
  • Use Negative Cases, Scope limit, and Scope limit together before admitting any formal-math public claim.

Validation Result record Path

./repo-pytest tests/test_formal_math_premise_retrieval.py -q --basetemp=/tmp/microcosm_formal_math_premise_retrieval_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

The component proves only that public retrieval metadata is internally coherent and leakage-checked. The deferred formal_math_lean_proof_witness boundary remains unchanged.

Scope limit

This module supports only the reader-verifiable claim that public premise metadata, retrieval terms, split eligibility, strategy gates, and redacted result records are coherent and leakage-checked. It does not run Lean or Lake, prove formal-result correctness, expose proof bodies, authorize oracle-needed premise ids, tune on test split truth, use external model services, approve public sharing, or expand the deferred Lean proof-witness boundary.

Formal Math Verifier Trace Repair LoopReplays how a proof lab turns verifier failures into fixes, with no promotion without a fresh re-run.3/5

Does It replays how a proof-lab turns a verifier's failure feedback into a teaching signal, working from copied (non-secret) run data so the failure categories, the repair action tied to each failure, and the rule that nothing gets promoted without a fresh re-run result record are all inspectable. Actual proofs, answer keys, and model outputs are deliberately kept out, so the whole correction loop is visible without exposing any of them.

Scope limit It demonstrates control-loop projection mechanics over copied Ring2 run rows only; it does not run Lean/Lake, use external model services, expose proof bodies or oracle premise ids, treat human or provider advice as correctness, prove any theorem, or include launch operations.

Run
microcosm formal-math-verifier-trace-repair-loop run-loop-bundle --input examples/formal_math_verifier_trace_repair_loop/exported_verifier_trace_repair_bundle --out receipts/runtime_shell/demo_project/organs/formal_math_verifier_trace_repair_loop

EvidenceComputed projectionevidence 3/5Source-faithful refactor

formal-methodstheorem-provinglean

Source Design note · Source atlas

Paper module Formal Math Verifier Trace Repair Loop

formal_math_verifier_trace_repair_loop is the source-available replay of a source proof-lab pattern over copied Ring2 run system: verifier feedback becomes a teaching signal only after a trace grade, a repair action, a failure-mode ledger append, a curriculum delta, and a cold rerun result record.

It is deliberately not a Lean/Lake proof component. It sits between the existing readiness, premise retrieval, tactic routing, proof diagnostic, and Lean witness surfaces so a cold reader can inspect real failure taxonomy, graph-update candidates, and oracle-repair contrast rows without seeing proof bodies, oracle premise ids, model-output data bodies, or private run logs.

Purpose

A failed proof attempt is cheap to throw away and expensive to learn from. The question this component answers is narrow: can a verifier's failure be turned into a reusable repair signal, on the public side, without that signal quietly inheriting the authority of a real theorem prover? It exists because the interesting work in a proof-repair loop is the bookkeeping, not the proving, and that bookkeeping is where overclaim usually creeps in.

The design choice worth noticing is that the loop refuses to collapse its stages into a single verdict. A verifier failure only counts as a teaching signal once it carries a trace grade backed by trace events, a repair action named against the verifier failure class it responds to, a failure-mode ledger append, a curriculum delta, and a cold-rerun result record. Each of those is a separate field, and promotion is blocked until the cold-rerun result record is present. The same separation keeps the dangerous material out: proof bodies, oracle-needed premise ids, and model-output data bodies are forbidden keys, so a row may name a failure class without ever exposing the proof or the oracle answer that produced it.

The failure mode it guards against is stale copied rows pretending to be live proof-lab evidence. The repair rows here are imported from a real Ring2 benchmark run, so the temptation is to treat the copy as if the run were happening now. The realness gate is the answer: it only reaches its top rung when every verifier attempt and curriculum row replays cleanly against the imported source bodies, and the focused tests deliberately perturb an oracle row, a manifest digest, an attempt label, and a curriculum count so that any drift downgrades the verdict rather than passing quietly. A single deterministic toy-theorem rerun is the one thing actually executed here, and it is plain arithmetic over public inputs, not a Lean proof.

Shape

Fixture input or exportedbundlecopied Ring2 rows +source-module manifestFixture input or exported bundle copied Ring2 rows + source-module manifestProjection protocolcopied-material provenanceProjection protocol copied-material provenanceSource-module manifestdigest, line and byte match,body_in_receipt falseSource-module manifest digest, line and byte match, body_in_receipt falseSecret-exclusion scanproof bodies, oracle ids,model-output data forbiddenSecret-exclusion scan proof bodies, oracle ids, model-output data forbiddenVerifier-attempt replaygrade needs trace events,repair needs failure classVerifier-attempt replay grade needs trace events, repair needs failure classRepair-curriculum replayfailure-mode ledger,curriculum deltasRepair-curriculum replay failure-mode ledger, curriculum deltasPromotion policyrequires cold-rerun resultrecordPromotion policy requires cold-rerun result recordDeterministic toy rerunfail then repair over publicinputsDeterministic toy rerun fail then repair over public inputsRealness gateclean source replay -> toprung;any drift downgradesRealness gate clean source replay -> top rung; any drift downgradesmetadata-only result recordsresult, board, validation,sign-offmetadata-only result records result, board, validation, sign-offScope limitrepair-loop accounting,bounded evidenceScope limit repair-loop accounting, bounded evidence
Diagram source
flowchart TD Input["Fixture input or exported bundle copied Ring2 rows + source-module manifest"] Protocol["Projection protocol copied-material provenance"] Manifest["Source-module manifest digest, line and byte match, body_in_receipt false"] Secret["Secret-exclusion scan proof bodies, oracle ids, model-output data forbidden"] Attempts["Verifier-attempt replay grade needs trace events, repair needs failure class"] Curriculum["Repair-curriculum replay failure-mode ledger, curriculum deltas"] Promotion["Promotion policy requires cold-rerun result record"] Toy["Deterministic toy rerun fail then repair over public inputs"] Realness["Realness gate clean source replay -> top rung; any drift downgrades"] Result records["metadata-only result records result, board, validation, sign-off"] Ceiling["Scope limit repair-loop accounting, bounded evidence"] Input --> Protocol Protocol --> Manifest Manifest --> Secret Secret --> Attempts Attempts --> Curriculum Curriculum --> Promotion Promotion --> Toy Attempts --> Realness Curriculum --> Realness Toy --> Realness Realness --> Result records Result records --> Ceiling

Technical Mechanism

The named mechanism mechanism.formal_math_verifier_trace_repair_loop.validates_public_verifier_trace_repair_bundle is a staged public verifier-repair validator, not a proof executor. _build_result composes five checks over the fixture or exported bundle: projection-protocol density, copied source-module manifest integrity, verifier attempt replay, repair-curriculum replay, promotion policy, and one deterministic toy-theorem repair rerun. The result is pass only when the projection protocol has copied-material provenance, the secret scan has no blocking hits, source modules pass when required, verifier attempts and curriculum rows replay against their imported Ring2 source bodies, promotion requires a cold rerun reference, and the toy rerun succeeds.

The exported-bundle path is intentionally stricter than the fixture path. validate_source_module_manifest requires a source import class, body_in_receipt: false, one row for each declared Ring2 source ref, matching target digests, line counts, and byte counts, and a metadata-only source_open_body_imports summary. _validate_attempt_source_replay then dereferences the premise-run row, oracle-repair contrast row, and graph-update candidate for each verifier attempt. Mismatches become typed findings such as VERIFIER_TRACE_SOURCE_REPLAY_MISMATCH, VERIFIER_TRACE_ORACLE_REPLAY_MISMATCH, VERIFIER_TRACE_COLD_RERUN_SOURCE_MISMATCH, or VERIFIER_TRACE_CANDIDATE_REPLAY_MISMATCH; curriculum-source mismatches are checked separately by validate_repair_curriculum.

The realness gate is also mechanical. _runtime_realness_evidence reaches the R4 state only for an exported bundle with verified source modules, at least 30 source replay checks, zero source replay mismatches, at least three attempts, at least nine trace events, at least three failure modes, and a passing toy rerun. The focused tests deliberately perturb the oracle source row, a manifest digest, a verifier-attempt source label, and a curriculum source count; each mutation blocks the verdict or downgrades the realness evidence instead of letting stale copied rows masquerade as proof-lab evidence.

The proof consumer is tests/test_formal_math_verifier_trace_repair_loop.py: it asserts five attempts, 15 trace events, five repair actions, three cold-rerun promotions, three toy-theorem failures repaired into four passing rerun inputs, seven exported source modules, 37 source replay checks, compact-card omission and fresh-result record reuse, public-relative result record paths, no private/body fields in result records, and exact source module copies. Those checks consume the same fixture, bundle, source-module manifest, and mechanism row cited by this page, so the evidence is executable replay accounting rather than a prose-only description.

The governing lattice is deliberately narrow: the bundle binds the module to concept.formal_math_and_proof_witness_bundle, principles P-1, P-2, P-3, P-6, and P-8, axioms AX-1, AX-2, AX-5, and AX-7, and dependency modules for the Lean standard premise index, tactic portfolio availability, target-shape tactic routing, and formal-math premise retrieval. The standard allows only copied Ring2 verifier-trace repair result record schemas and metadata-only public fields. It does not widen a passing replay into Lean/Lake authority, formal-result correctness, proof-body evidence, oracle premise authority, provider authority, human-approval proof authority, publishing-scope decision, launch-scope decision, or whole-system correctness.

Evidence/accounting:

  • Bundle authority: core/paper_module_capsules.json::paper_modules[23:paper_module.formal_math_verifier_trace_repair_loop] sets source_authority: json_capsule, binds the component, binds mechanism.formal_math_verifier_trace_repair_loop.validates_public_verifier_trace_repair_bundle, and resolves src/microcosm_core/organs/formal_math_verifier_trace_repair_loop.py.
  • Generated instance: paper_modules/formal_math_verifier_trace_repair_loop.json reports paper_module_payload.source_authority: json_capsule, Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges, 17 relationship edges, and resolved paper_module.depends_on.paper_module edges to the Lean standard premise index, tactic portfolio, target-shape routing, and formal-math premise retrieval modules named by the active standard.
  • Runtime, fixture, and bundle: src/microcosm_core/organs/formal_math_verifier_trace_repair_loop.py exposes run, run_loop_bundle, validate_source_module_manifest, _write_receipts, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, and SOURCE_MODULE_MANIFEST_REF. The fixture input and exported bundle replay copied Ring2 verifier-trace repair metadata, source-module digests, failure classes, repair actions, promotion gates, and one deterministic public toy-theorem rerun.
  • Result record and test floor: receipts/first_wave/formal_math_verifier_trace_repair_loop/formal_math_verifier_trace_repair_loop_result.json, verifier_trace_repair_board.json, formal_math_verifier_trace_repair_loop_validation_receipt.json, and result records/sign-off/first_wave/formal_math_verifier_trace_repair_loop_fixture_acceptance.json are metadata-only evidence. tests/test_formal_math_verifier_trace_repair_loop.py checks source-module manifest validation, negative cases, toy rerun evidence, and scope limits.
  • Claim boundary: standards/std_microcosm_formal_math_verifier_trace_repair_loop.json and the generated structured source record limit this module to copied Ring2 verifier-trace repair metadata, source-module digests, public fixture result records, and deterministic toy rerun evidence. They do not authorize Lean/Lake authority, formal-result correctness, proof bodies, oracle premise ids, external model access, human approval as proof authority, launch-scope decision, publishing-scope decision, or whole-system correctness.

Reader Evidence Routing

Those rows prove reader wiring, not formal-result correctness.

Route runtime and replay questions through ## Runtime, ## Receipts, and the fixture/bundle paths in the validation command. The fixture runner, exported bundle runner, CLI route, standard, and fixture manifest show how verifier-trace repair accounting is replayed over copied public rows without importing proof bodies, oracle-needed premise ids, model-output data bodies, or private logs.

Route claim-safety questions through ## What It Proves, ## What It Refuses, ## Result record Expectations, and ## Scope limit. If the question is whether the repair loop is still body-safe and result record-backed, run the focused pytest and paper-module corpus check before citing this page.

Prior Art Grounding

This component is grounded in interactive theorem-proving feedback loops and learning environments where failed proof attempts become structured training or repair signals. GamePad and HOList both expose theorem-proving interaction data for machine-learning experiments, while LeanDojo reinforces the need to keep proof assistant feedback, retrieval, and proof-state interaction reproducible.

Microcosm borrows the repair-loop accounting pattern: verifier events, grades, failure classes, repair actions, curriculum deltas, and cold rerun result records are separate fields. It does not treat human or provider advice as formal-result correctness.

Runtime

  • Component runner: python -m microcosm_core.organs.formal_math_verifier_trace_repair_loop run --input fixtures/first_wave/formal_math_verifier_trace_repair_loop/input --out receipts/first_wave/formal_math_verifier_trace_repair_loop
  • Exported bundle runner: python -m microcosm_core.organs.formal_math_verifier_trace_repair_loop run-loop-bundle --input examples/formal_math_verifier_trace_repair_loop/exported_verifier_trace_repair_bundle --out receipts/runtime_shell/demo_project/organs/formal_math_verifier_trace_repair_loop
  • CLI: microcosm formal-math-verifier-trace-repair-loop run-loop-bundle --input examples/formal_math_verifier_trace_repair_loop/exported_verifier_trace_repair_bundle --out receipts/runtime_shell/demo_project/organs/formal_math_verifier_trace_repair_loop
  • Standard: standards/std_microcosm_formal_math_verifier_trace_repair_loop.json
  • Fixture manifest: core/fixture_manifests/formal_math_verifier_trace_repair_loop.fixture_manifest.json

What It Proves

  • A public verifier replay can require trace events before trace grades.
  • Copied Ring2 failure rows can feed a repair curriculum without becoming proof authority.
  • A repair action must name the verifier failure class it responds to.
  • A failure-mode ledger update can be represented without proof bodies.
  • Promotion requires a cold rerun result record reference.
  • Human or provider advice stays advisory until checker evidence exists.

What It Refuses

  • Proof bodies in public verifier traces.
  • Oracle-needed premise ids in public inputs.
  • model-output data bodies in fixtures or result records.
  • Human approval as checker authority or theorem-quality evidence.
  • launch, public sharing, secret export, or general theorem-proving claims.

Result records

  • receipts/first_wave/formal_math_verifier_trace_repair_loop/formal_math_verifier_trace_repair_loop_result.json
  • receipts/first_wave/formal_math_verifier_trace_repair_loop/verifier_trace_repair_board.json
  • receipts/first_wave/formal_math_verifier_trace_repair_loop/formal_math_verifier_trace_repair_loop_validation_receipt.json
  • result records/sign-off/first_wave/formal_math_verifier_trace_repair_loop_fixture_acceptance.json

Validation Result record Path

./repo-pytest tests/test_formal_math_verifier_trace_repair_loop.py -q --basetemp=/tmp/microcosm_formal_math_verifier_trace_repair_loop_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
jq '{edge_count:(.relationships.edges|length), mermaid_status:.paper_module_payload.generated_projections.mermaid.status, atlas_status:.paper_module_payload.generated_projections.atlas_card.status, source_authority:.relationships.source_authority, unresolved_selective_relation_count:(.relationships.unpopulated_selective_relations|length)}' paper_modules/formal_math_verifier_trace_repair_loop.json

Expected generated-row proof: edge_count: 17, mermaid_status: available_from_capsule_edges, atlas_status: linked_from_capsule_edges, source_authority: json_capsule, and unresolved_selective_relation_count: 0.

Scope boundary

Scope limit

The authority boundary is copied Ring2 verifier trace repair public fields only. The component demonstrates control-loop mechanics over real run rows, not formal-result correctness.

Scope limit

This module supports only the reader-verifiable claim that copied Ring2 verifier rows can drive a public verifier-trace repair loop with trace-event requirements, failure-class routing, promotion gates, and metadata-only result records. It does not establish formal-result correctness, expose proof bodies, authorize human or provider advice as proof authority, publish private run logs, approve launch, or certify whole-system correctness.

Formal Evidence Cell Anchor ResolverResolves each proof-flavored math claim to named evidence and flags ones that overreach or lack backing.3/5

Does When the project's writeups make proof-flavored claims about its formal-math work, this component checks each claim against a named piece of recorded evidence and the public reference files in the repo, confirms the claim is no stronger than that evidence allows, and flags claims that have no backing or that overreach. The record shows which claims are anchored to evidence and which are just words, while proof contents and any private file references are kept out of the output.

Scope limit It validates claim-to-evidence anchoring mechanics only: claim-to-cell resolution, source-anchor presence, permitted claim strength, copied-source-module digest checks, and leakage refusals. It does not run Lean/Lake, certify theorem or mathematical correctness, expose proof bodies or non-public source refs, use external model services, or include launch operations/public sharing.

Run
microcosm formal-evidence-cell-anchor-resolver run-anchor-bundle --input examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle --out receipts/runtime_shell/demo_project/organs/formal_evidence_cell_anchor_resolver

EvidenceComputed projectionevidence 3/5Source-faithful refactor

formal-methodstheorem-provinglean

Source Design note · Source atlas

Paper module Formal Evidence Cell Anchor Resolver

formal_evidence_cell_anchor_resolver makes Microcosm's formal-math evidence claims inspectable without turning result record summaries into proof authority. It resolves paper-module claims to evidence-cell ids, checks source-anchor refs, records machine-anchor classes, and enforces a claim-strength boundary before any proof-language claim can pass. Its formal-math trace cell anchors the real Ring2 verifier-trace repair result records.

It is not a theorem prover. It does not execute Lean or Lake, expose proof bodies, expose non-public source refs, use external model services, or claim formal-result correctness. It emits real runtime result records over the imported evidence-cell system, carries digest-bearing Ring2 failure-taxonomy and graph-update source refs, and uses secret-exclusion scanning only for account secret-equivalent or non-result record body payloads.

Purpose

Proof-adjacent prose is the easiest place for a claim to drift. A paper module can write "this proves the theorem" or "this is certified" and a cold reader has no cheap way to tell whether the words are backed by a checked artifact or by nothing at all. This component answers one question: when a claim uses proof language, can the words be resolved to a specific piece of public evidence, and does that evidence stay below theorem-correctness authority?

The mechanism is an evidence cell. A cell is a stable id that stands in for a bundle of result record-backed evidence: its source-anchor refs, a machine_anchor_class that names what kind of machine artifact backs it, and the list of claim strengths the cell is allowed to support. The policy proof_language_requires_machine_anchor is the rule that makes the resolver useful. A claim that uses proof language must name a cell, the cell must resolve in the registry, and its source anchors must point at files that actually exist on the public path. A claim that uses proof language but names no cell, or names a cell that is not in the registry, lowers the run to a blocked status rather than passing as green prose.

What is worth noticing is what the cell id buys. It is a compressed handle: one short reference that a reader can follow back to the real result records behind a claim, instead of inlining proof bodies or trusting narrative. Two boundaries sit on top of that handle. Claim strength is capped by the cell, so a claim cannot assert more than its anchored evidence allows. And human approval is refused as a substitute for a machine anchor, which keeps a sign-off from being treated as proof.

Shape

source recordsource recordstructured source recordsource basis: source recordstructured source record source basis: source recorddiagram viewdiagram viewmap viewmap viewthis pagethis pagethis page this pageruntime locusruntime locusfirst-wave fixture inputfirst-wave fixture inputexported evidence-cell anchorbundleexported evidence-cell anchor bundlesource-open body manifestsource-open body manifestvalidation result recordsvalidation result recordsruntime-shell result recordruntime-shell result recordproof boundary + scope limitanchor metadata only, notformal-result correctnessproof boundary + scope limit anchor metadata only, not formal-result correctness

Source refs

source record
core/paper_module_capsules.json::paper_modules[24]
structured source record source basis: source record
paper_modules/formal_evidence_cell_anchor_resolver.json
runtime locus
src/microcosm_core/organs/formal_evidence_cell_anchor_resolver.py
first-wave fixture input
fixtures/first_wave/formal_evidence_cell_anchor_resolver/input
exported evidence-cell anchor bundle
examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle
source-open body manifest
source_module_manifest.json
validation result records
receipts/first_wave/... + receipts/acceptance/...
runtime-shell result record
receipts/runtime_shell/demo_project/organs/formal_evidence_cell_anchor_resolver/...
Diagram source
flowchart TD Bundle["source record core/paper_module_capsules.json::paper_modules[24]"] --> structured source record["structured source record paper_modules/formal_evidence_cell_anchor_resolver.json source basis: source record"] structured source record --> Mermaid["diagram view available_from_capsule_edges"] structured source record --> Atlas["map view blocked_until_organ_atlas_owner_lane_binds_edges"] structured source record --> Reader["this page this page"] Reader --> Runtime["runtime locus src/microcosm_core/components/formal_evidence_cell_anchor_resolver.py"] Runtime --> Fixture["first-wave fixture input fixtures/first_wave/formal_evidence_cell_anchor_resolver/input"] Runtime --> Bundle["exported evidence-cell anchor bundle examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle"] Bundle --> Manifest["source-open body manifest source_module_manifest.json"] Fixture --> Result records["validation result records result records/first_wave/... + result records/sign-off/..."] Bundle --> BundleReceipt["runtime-shell result record result records/runtime_shell/demo_project/components/formal_evidence_cell_anchor_resolver/..."] Result records --> Ceiling["proof boundary + scope limit anchor metadata only, not formal-result correctness"] BundleReceipt --> Ceiling

Read the diagram left to right: the bundle and generated structured source record name the relationships; the runtime validates fixture and bundle inputs; the result records show what passed; the scope limit prevents any of those surfaces from becoming proof, launch, provider, private-system, or theorem-correctness authority.

Reader Evidence Routing

A cold reader should inspect this module through these system surfaces, in order:

  1. Authority seed: core/paper_module_capsules.json::paper_modules[24:paper_module.formal_evidence_cell_anchor_resolver]. This is the source record that binds the Markdown projection, generated JSON, runtime locus, fixture, exported bundle, mechanism rows, and scope boundaries.
  2. Generated structured source record: paper_modules/formal_evidence_cell_anchor_resolver.json. Check relationships.source_authority, the 15 relationship edges, the generated_projections statuses, unpopulated_selective_relations, and the bundle-carried scope limit before trusting any prose summary.
  3. Runtime locus: src/microcosm_core/organs/formal_evidence_cell_anchor_resolver.py. The relevant runtime symbols are run, run_anchor_bundle, validate_source_module_manifest, _build_result, _source_module_summary_card, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, SOURCE_MODULE_MANIFEST_REF, BUNDLE_RESULT_NAME, and CARD_SCHEMA_VERSION.
  4. Fixture and exported bundle: fixtures/first_wave/formal_evidence_cell_anchor_resolver/input, examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle, and examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle/source_module_manifest.json. The first-wave fixture exercises negative cases and Ring2 result record anchors; the exported bundle validates six source-open body modules by digest while keeping source bodies out of result records.
  5. Result records: receipts/first_wave/formal_evidence_cell_anchor_resolver/formal_evidence_cell_anchor_resolver_result.json, receipts/first_wave/formal_evidence_cell_anchor_resolver/evidence_cell_anchor_board.json, receipts/first_wave/formal_evidence_cell_anchor_resolver/formal_evidence_cell_anchor_resolver_validation_receipt.json, result records/sign-off/first_wave/formal_evidence_cell_anchor_resolver_fixture_acceptance.json, and receipts/runtime_shell/demo_project/organs/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle_validation_result.json. These result records report pass/fail state, metadata-only public refs, negative-case observations, and explicit release_authorized=false, provider_calls_authorized=false, lean_lake_execution_authorized=false, formal_proof_authority=false, and theorem_correctness_authority=false ceilings.
  6. Focused checks: tests/test_formal_evidence_cell_anchor_resolver.py, scripts/build_doctrine_projection.py --check-paper-module-corpus, and the JSON-row proof query in the validation section below. Those checks validate the reader route and generated-row parity; they do not authorize public sharing or formal proof claims.

Prior Art Grounding

This component is grounded in provenance and proof-certificate work where claims must point at checkable evidence rather than untyped narrative. The W3C PROV model is a general anchor for linking entities, activities, and agents in an evidence graph, while Proof-Carrying Code and small-kernel proof assistants motivate separating a certificate or anchor from the trusted checker that bounds its meaning.

Microcosm borrows the anchor-resolution pattern: proof-language claims must name evidence-cell ids, source anchors, machine-anchor classes, and claim strength limits. It does not turn metadata cells into theorem-correctness authority.

Runtime

  • Component runner: python -m microcosm_core.organs.formal_evidence_cell_anchor_resolver run --input fixtures/first_wave/formal_evidence_cell_anchor_resolver/input --out receipts/first_wave/formal_evidence_cell_anchor_resolver
  • Exported bundle runner: python -m microcosm_core.organs.formal_evidence_cell_anchor_resolver run-anchor-bundle --input examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle --out receipts/runtime_shell/demo_project/organs/formal_evidence_cell_anchor_resolver
  • CLI: microcosm formal-evidence-cell-anchor-resolver run-anchor-bundle --input examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle --out receipts/runtime_shell/demo_project/organs/formal_evidence_cell_anchor_resolver
  • Standard: standards/std_microcosm_formal_evidence_cell_anchor_resolver.json
  • Fixture manifest: core/fixture_manifests/formal_evidence_cell_anchor_resolver.fixture_manifest.json

What It Establishes As Evidence Routing

  • Proof-language claims must resolve to a public evidence cell before this reader treats them as routed evidence.
  • Evidence cells must carry source-anchor refs.
  • Machine-anchor metadata is visible as metadata, not formal-result correctness.
  • Claim strength is bounded by the resolved cell.
  • Secret, account secret-equivalent, or non-result record body payloads must have explicit exclusion result records.
  • The verifier-trace cell is anchored to the first-wave formal_math_verifier_trace_repair_loop result, board, validation result record, and Ring2 failure-taxonomy source digest.

What It Refuses

  • Unknown evidence-cell ids used as proof authority.
  • Proof-language claims without evidence-cell ids.
  • Proof bodies in public claim rows.
  • non-public source refs in public claim or cell rows.
  • Human approval as proof authority.
  • Theorem-correctness claims from metadata cells.
  • launch, public sharing, secret export, or provider authority.

Result records

  • receipts/first_wave/formal_evidence_cell_anchor_resolver/formal_evidence_cell_anchor_resolver_result.json
  • receipts/first_wave/formal_evidence_cell_anchor_resolver/evidence_cell_anchor_board.json
  • receipts/first_wave/formal_evidence_cell_anchor_resolver/formal_evidence_cell_anchor_resolver_validation_receipt.json
  • result records/sign-off/first_wave/formal_evidence_cell_anchor_resolver_fixture_acceptance.json

Validation Result record Path

./repo-pytest tests/test_formal_evidence_cell_anchor_resolver.py -q --basetemp=/tmp/microcosm_formal_evidence_cell_anchor_resolver_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
jq '{edge_count:(.relationships.edges|length), mermaid_status:.paper_module_payload.generated_projections.mermaid.status, atlas_status:.paper_module_payload.generated_projections.atlas_card.status, source_authority:.relationships.source_authority, unresolved_selective_relation_count:(.relationships.unpopulated_selective_relations|length)}' paper_modules/formal_evidence_cell_anchor_resolver.json

Expected generated-row proof: edge_count: 15, mermaid_status: available_from_capsule_edges, atlas_status: blocked_until_organ_atlas_owner_lane_binds_edges, source_authority: json_capsule, and unresolved_selective_relation_count: 0.

Scope boundary

Limitations

This module is a proof-adjacent evidence router, not a proof system. The fixture proves a bounded resolver contract over three paper claims, three evidence cells, seven declared negative-case classes, eight source anchors, three machine anchors, and zero copied source modules in fixture mode. The exported bundle proves the same public runtime shape over three claims, three evidence cells, five source anchors, six copied source-open body modules, and metadata-only result records. These counts are the claim boundary, not a scale claim about the formal-math corpus.

The source-module proof is digest and authority-ref parity for the six exported body modules named by the bundle manifest. It does not establish that every source formal-math source file has been imported, that future source drift is absent, or that copied body availability confers public launch-scope decision. A digest match also excludes exporting proof bodies, non-public source refs, model-output data, oracle material, account secrets, browser UI/operator UI state, or source notes.

The checker rejects unknown cells, missing source anchors, proof language without cells, non-public refs, proof bodies, theorem-correctness overclaims, and human approval as proof authority. That refusal coverage does not certify Lean or Lake execution, formal-result correctness, proof completeness, benchmark performance, deployment posture, or whole-system correctness.

Scope limit

The authority boundary is evidence-cell anchor resolution backed by real runtime result records. The component makes claim boundaries legible; it does not certify mathematical truth.

Scope limit

This module supports only the reader-verifiable claim that public evidence-cell anchor metadata can bind proof-language claims to result record-backed cells and exclude private bodies, proof bodies, model-output data, oracle material, and secret-equivalent refs. Its generated Mermaid/Atlas statuses and relationship counts are JSON-bundle projections; they do not certify formal-result correctness, proof completeness, launch-scope decision, publishing-scope decision, provider authority, or whole-system correctness.

Source and projection details
Governing Lattice Relation

The lattice edge is not just that this page "mentions" formal math evidence. The generated structured source record binds the page to one component, two mechanism rows, concept.formal_math_and_proof_witness_bundle, P-1, P-2, P-3, P-6, P-8, AX-1, AX-2, AX-5, AX-7, the sibling paper_module.formal_math_verifier_trace_repair_loop, and the resolved runtime source locus. That is the governing shape: proof-adjacent claims enter as paper-claim rows, evidence-cell ids, source anchors, machine-anchor classes, and copied source-module manifests; _build_result recomputes the pass or blocked status from those lower-level artifacts; _source_module_summary_card and run_anchor_bundle export compact, metadata-only evidence.

P-1 and AX-1 require a recomputed checker result rather than a label. P-2 and AX-2 keep the scope limit at the strength of the resolver and its certificates. P-3 makes the small resolver/manifest checker the authority surface instead of broad proof-language prose. P-6, P-8, AX-5, and AX-7 explain the blocked path: missing anchors, proof bodies, non-public source refs, source-module digest drift, theorem-correctness language, or human approval as proof authority must lower the status or return a refusal with evidence rather than preserving a green reader claim.

The focused proof consumer is tests/test_formal_evidence_cell_anchor_resolver.py. It asserts the fixture path observes all seven expected negative cases, resolves three claims to three evidence cells, records eight source anchors and three machine anchors, anchors the verifier-trace row to Ring2 result records, keeps formal-proof and theorem-correctness authority false, validates the exported bundle with six copied source modules, rejects theorem-correctness overclaims, rejects digest and rehashed-body swaps, and keeps command-card result records compact and metadata-only. Those checks are the local mechanism witness for the lattice relation.

Source-Open Body Floor

The exported bundle carries a source-open body floor at examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle/source_module_manifest.json. It imports the paper-module formal-evidence auditor, formal evidence-cell registry builder, focused runtime tests, public formal-evidence registry state, Erdos257 issue217 evidence-cell manifest, and the std_paper_module formal-evidence-cell contract body. Result records and workingness cards expose digests and validation status, not body text, proof bodies, model-output data, non-public refs, oracle material, or theorem-correctness authority.

Undeclared Library Prior Symbol ClassifierDetects when a checked Lean proof cites a library result outside its approved set.3/5

Does It checks whether a Lean proof cites a library result (a lemma or definition) that was never on its approved list. Even after a prover accepts a proof, that proof can still quietly use a library symbol it wasn't allowed to, and this component surfaces those out-of-bounds uses as an inspectable record that names each symbol and where the rule came from. It matters because "the proof checked" does not mean "the proof stayed within the allowed set of building blocks," and this makes that gap visible without ever reading the proof's own steps.

Scope limit It only projects the symbol-boundary classification mechanic over copied Lean/Std premise rows and pre-extracted symbol observations; it does not read proof source, run Lean or Lake, prove formal-result correctness, treat the whole standard library as an implicit allowlist, claim Mathlib availability, use external model services, or include launch operations.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.undeclared_library_prior_symbol_classifier run --input fixtures/first_wave/undeclared_library_prior_symbol_classifier/input --out receipts/first_wave/undeclared_library_prior_symbol_classifier

EvidenceComputed projectionevidence 3/5Source-faithful refactor

formal-methodstheorem-provinglean

Source Design note · Source atlas

Paper module Undeclared Library Prior Classifier

This module is the Microcosm projection of the formal-prover rule that a Lean-accepted proof can still violate the evaluation contract when it uses a real library symbol that was not in the allowed premise set. It is a provenance-bearing symbol-boundary component, not a proof checker.

The fixture carries copied Lean/Std premise rows from the real Ring2 premise-index system and real Ring2 problem ids / candidate artifact digests for the symbol-boundary examples. It records extracted qualified symbol refs and classifies a known symbol outside allowed_premise_ids as UNDECLARED_LIBRARY_PRIOR. If cited_unallowed_premise_ids is present, that explicit budget violation takes precedence and routes as PREMISE_BUDGET_VIOLATION.

The source chain is digest-bearing: the real Ring2 premise index sha256:c78b176388a5e81bd8a785950e7db0c9a65fd38e556515134146163b48604df1, Ring2 run summary sha256:93304410f32d40f5cad1c161c1d01a5d6f353ee10b7cf3fecbaaf7b068b43008, copied Lean/Std premise fixture sha256:0be36ba5b75b40d2ede2d90cefa5181829420df7abbae216d18282b92a30f869, and the adjacent corpus-readiness / tactic-availability result records anchor the Mathlib-absent toolchain boundary.

The exported bundle carries a source-open body floor at examples/undeclared_library_prior_symbol_classifier/exported_symbol_classifier_bundle/source_module_manifest.json. It imports the reducer and set-calibration builder source bodies exactly, plus run bodies for the Ring2 premise index, Ring2 run summary, recipe policy metrics, and result record reduction matrix. The two run-state bodies are path-normalized to <repo-root> and <lean-toolchain-root> while preserving source and target digests, line counts, byte counts, and required anchors.

Purpose

A theorem prover can return a proof that Lean accepts, yet that proof can still break the rules of the evaluation it was run under. The usual reason is simple: the proof reached for a library lemma that the recipe never put on the table. The symbol is real and the proof is sound, but the run quietly used a fact it was not allowed to assume. This component answers one question. Given a set of premises a candidate was allowed to use and the symbols it actually reached for, did it cite a known library symbol that was outside that allowed set?

The unusual choice is what the classifier refuses to do. It does not run Lean, it does not read the proof body, and it does not treat the standard library as an implicit allowlist where anything that exists is fair game. It works only from a copied premise index and a list of symbol observations that were extracted beforehand, and it compares the two. That keeps the check cheap and keeps proof material out of the public result record, but it also means the allowed set is closed by construction: a symbol is admissible only because a premise row names it, never because it happens to live in Lean's standard library.

The check also separates two failure modes that are easy to confuse. An explicit budget breach, where the candidate names a premise id the recipe did not allow, is not the same as a residual breach, where the candidate used an allowed-looking symbol that turns out to be undeclared. The first is settled directly from the cited ids and takes precedence; the second is what the symbol comparison is for. Treating both as one class would either over-escalate honest retries or let genuine out-of-recipe library use slip through as a budget note. Keeping them apart is the point.

Shape

JSON source recordJSON source recordstructured source record19 edges, no selectiveresidualsstructured source record 19 edges, no selective residualsRuntime componentRuntime componentCopied Lean/Std premise index11 sanctioned symbolsCopied Lean/Std premise index 11 sanctioned symbolsPre-extracted symbolobservationsPre-extracted symbol observationsBudgetBudgetKnown qualified symboloutside allowed_premise_idsKnown qualified symbol outside allowed_premise_idsAllowed symbol or no knownundeclared symbolAllowed symbol or no known undeclared symbolPREMISE_BUDGET_VIOLATIONroute: retryPREMISE_BUDGET_VIOLATION route: retryUNDECLARED_LIBRARY_PRIORroute: bridge_escalateUNDECLARED_LIBRARY_PRIOR route: bridge_escalateNONEroute: accept_as_advisoryNONE route: accept_as_advisoryResult record streamfixture, board, validation,sign-offResult record stream fixture, board, validation, sign-offScope limitno Lean/Lake, proof,provider, launch,private-system claimScope limit no Lean/Lake, proof, provider, launch, private-system claim

Source refs

JSON source record
paper_module.undeclared_library_prior_classifier
Runtime component
undeclared_library_prior_symbol_classifier.py
Pre-extracted symbol observations
Nat/List/Bool/Iff/Eq refs
Budget
cited_unallowed_premise_ids present
Diagram source
flowchart TD bundle["JSON source record paper_module.undeclared_library_prior_classifier"] structured source record["structured source record 19 edges, no selective residuals"] runtime["Runtime component undeclared_library_prior_symbol_classifier.py"] premise["Copied Lean/Std premise index 11 sanctioned symbols"] observations["Pre-extracted symbol observations Nat/List/Bool/Iff/Eq refs"] budget["cited_unallowed_premise_ids present"] residual["Known qualified symbol outside allowed_premise_ids"] clean["Allowed symbol or no known undeclared symbol"] retry["PREMISE_BUDGET_VIOLATION route: retry"] escalate["UNDECLARED_LIBRARY_PRIOR route: bridge_escalate"] advisory["NONE route: accept_as_advisory"] result records["Result record stream fixture, board, validation, sign-off"] ceiling["Scope limit no Lean/Lake, proof, provider, launch, private-system claim"] bundle --> structured source record structured source record --> runtime runtime --> premise runtime --> observations observations --> budget observations --> residual observations --> clean budget --> retry residual --> escalate clean --> advisory retry --> result records escalate --> result records advisory --> result records result records --> ceiling

Technical Mechanism

The component separates three questions that are easy to conflate in proof evaluation: whether a candidate explicitly cites a premise outside the recipe, whether it uses a known Lean/Std symbol that was not in the allowed premise set, and whether the theorem is actually correct. Only the first two are in scope. validate_premise_index builds the closed allowlist from copied Lean/Std premise rows, validate_symbol_observations reads pre-extracted qualified symbol observations, and _classify_row applies the precedence rule: cited_unallowed_premise_ids yields PREMISE_BUDGET_VIOLATION with retry; otherwise a known qualified symbol outside allowed_premise_ids yields UNDECLARED_LIBRARY_PRIOR with bridge_escalate; clean or unknown observations remain advisory. The classifier records observed symbols and computed/asserted classes, but it never evaluates proof bodies or runs Lean.

The exported-bundle mechanism is a second boundary rather than a richer proof. validate_source_module_manifest requires source_module_manifest.json, rejects manifest or row-level body_in_receipt: true, verifies six declared body imports against source/target digests, line counts, byte counts, required anchors, material classes, and relation type, and keeps path-normalized Ring2 run-state bodies separate from exact copied reducer bodies. secret_exclusion_scan then checks the declared public fixture and bundle inputs for proof-body, provider-payload, private-ref, and host-path sentinel classes. _write_receipts writes result, board, validation, and sign-off result records; result_card deliberately emits a small pass/fail card that omits source modules, source digests, proof bodies, non-public source refs, secret-scan detail, and scope limit bodies. This is why the module can be source-open about the symbol-boundary system without becoming a proof-body export.

The governing lattice follows the same separation. The bundle binds the component to mechanism.undeclared_library_prior_symbol_classifier.validates_public_symbol_boundary, concept.formal_math_and_proof_witness_bundle, principles P-1, P-2, P-3, P-6, P-8, and P-9, and axioms AX-1, AX-2, AX-5, AX-7, AX-8, and AX-10. The technical claim is therefore limited to public symbol-budget classification over copied, digest-bearing premise evidence. It does not establish theorem truth, Mathlib availability, Lean/Lake execution, launch-scope decision, provider correctness, or complete library allowlisting.

Reader Evidence Routing

Start with the source record, not this prose: core/paper_module_capsules.json::paper_modules[56:paper_module.undeclared_library_prior_classifier] is the source authority that names the component subject undeclared_library_prior_symbol_classifier, the mechanism mechanism.undeclared_library_prior_symbol_classifier.validates_public_symbol_boundary, the code locus src/microcosm_core/organs/undeclared_library_prior_symbol_classifier.py, the concept concept.formal_math_and_proof_witness_bundle, the governing principles P-1, P-2, P-3, P-6, P-8, and P-9, the axioms AX-1, AX-2, AX-5, AX-7, AX-8, and AX-10, and the sibling modules paper_module.corpus_readiness_mathlib_absence_gate, paper_module.tactic_portfolio_availability, and paper_module.lean_std_premise_index.

Then read the generated structured source record paper_modules/undeclared_library_prior_classifier.json. It is the parity projection from the bundle, carrying source_authority: json_capsule, Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges, 19 generated relationship edges, and no unpopulated selective relations. The structured source record is evidence that the reader page is wired into the doctrine lattice; it is not theorem-correctness, launch, or runtime-correctness authority.

For runtime behavior, inspect src/microcosm_core/organs/undeclared_library_prior_symbol_classifier.py. The named locus validates projection protocol, premise index, classifier policy, source-module manifest, symbol observations, secret-exclusion scan, result construction, result record writing, and result-card compaction. The load-bearing classifier rule is _classify_row: explicit cited_unallowed_premise_ids short-circuit as PREMISE_BUDGET_VIOLATION with retry; otherwise a known qualified Lean/Std symbol outside allowed_premise_ids classifies as UNDECLARED_LIBRARY_PRIOR with bridge_escalate. Negative cases reject proof bodies, non-public source refs, theorem-correctness overclaims, allowed-symbol false positives, unqualified-token overclaims, and missing escalation.

For public fixture evidence, use fixtures/first_wave/undeclared_library_prior_symbol_classifier/input/. The fixture carries the premise index, classifier policy, projection protocol, symbol observations, and the seven negative-case files named by EXPECTED_NEGATIVE_CASES. For exported source-open body-floor evidence, use examples/undeclared_library_prior_symbol_classifier/exported_symbol_classifier_bundle/source_module_manifest.json. That manifest verifies six source body imports: reducer source, set-calibration builder source, path-normalized Ring2 premise-index state, path-normalized Ring2 run summary, recipe policy metrics, and result record reduction matrix. The manifest keeps body_in_receipt false and checks source/target digests plus required anchors; it does not export proof bodies, model-output data bodies, account or browser state, source notes, or private source-root bodies.

For result records, read receipts/first_wave/undeclared_library_prior_symbol_classifier/undeclared_library_prior_symbol_classifier_result.json, receipts/first_wave/undeclared_library_prior_symbol_classifier/undeclared_library_prior_symbol_classifier_board.json, receipts/first_wave/undeclared_library_prior_symbol_classifier/undeclared_library_prior_symbol_classifier_validation_receipt.json, and result records/sign-off/first_wave/undeclared_library_prior_symbol_classifier_fixture_acceptance.json. The fixture result record reports 11 premises, 3 classifications, 1 undeclared-library prior, 1 premise-budget-precedence case, 1 bridge escalation, 1 retry, zero blocking secret-exclusion hits, and the scope boundary that this is not Lean/Lake, formal-result correctness, provider, private-ref, whole-library-allowlist, or launch-scope decision.

Focused regression coverage lives in tests/test_undeclared_library_prior_symbol_classifier.py. It runs both the fixture command and run-symbol-bundle, checks public-relative result records, verifies digest/manifest boundary failures, and confirms the compact card reuses a fresh result record without exporting source modules, body ids, secret-scan details, source digests, proof bodies, or non-public source refs. The paper-module coverage contract also names this module in tests/test_microcosm_paper_module_coverage_contract.py; that is route coverage evidence, not runtime proof evidence.

Named Proof Consumers

The fixture consumer is microcosm_core.organs.undeclared_library_prior_symbol_classifier run over fixtures/first_wave/undeclared_library_prior_symbol_classifier/input. It proves the public example still classifies 11 copied premise rows and 3 symbol observations into one undeclared-library-prior escalation, one premise-budget retry, and one advisory clean case, while the expected negative cases cover proof-body export, non-public refs, theorem-correctness overclaim, allowed-symbol false positives, unqualified-token overclaims, and missing escalation.

The exported-bundle consumer is microcosm_core.organs.undeclared_library_prior_symbol_classifier run-symbol-bundle over examples/undeclared_library_prior_symbol_classifier/exported_symbol_classifier_bundle. It proves the six source-open body imports remain digest/size/anchor checked and public-safe, including the exact copied reducer and calibration-builder bodies plus path-normalized Ring2 state, recipe metrics, and reduction-matrix bodies. It is the consumer that catches source-module digest drift and manifest-boundary violations; it does not certify formal-result correctness.

The focused regression consumer is tests/test_undeclared_library_prior_symbol_classifier.py. It ties the fixture and bundle commands to public-relative result records, source-module digest mismatch blocking, manifest and row-level body_in_receipt rejection, compact-card redaction, and fresh-card reuse. The corpus consumer is scripts/build_doctrine_projection.py --check-paper-module-corpus, which proves the Markdown remains part of the 98-module Microcosm paper-module corpus. That corpus check is routing and projection parity evidence only; it is not a runtime proof substitute.

Public Mechanics

  • Qualified symbol refs are restricted to Nat, List, Bool, Iff, and Eq namespaces in this public fixture.
  • The closed premise index is an allowlist boundary, not permission to use the whole standard library.
  • UNDECLARED_LIBRARY_PRIOR routes to bridge_escalate because the proof may be informative while still out of recipe.
  • PREMISE_BUDGET_VIOLATION routes to retry and short-circuits the residual symbol classifier.
  • Result records expose ids, candidate artifact digests, symbols, counts, failure classes, source refs, source digests, and scope limits.
  • secret_exclusion_scan records zero blocking hits for the declared sentinel classes in the public result record stream; it is not a complete secret audit, launch clearance, or proof that no private material exists anywhere.

Prior Art Grounding

This classifier is grounded in formal-methods work on premise control and library-aware proof search. Isabelle/Sledgehammer makes relevant-fact selection an explicit part of automated proof search, and Lean/Mathlib practice makes clear that accepted proofs can depend on a large library context. Microcosm uses that insight as a boundary check: an accepted proof artifact is not enough if it quietly used symbols outside the declared premise set. The component classifies the symbol-budget violation without judging theorem truth or exporting proof bodies.

Prior-art anchors:

  • Isabelle Sledgehammer and relevant-fact selection: https://isabelle.in.tum.de/doc/sledgehammer.pdf
  • Lean community Mathlib overview: https://leanprover-community.github.io/mathlib-overview.html
  • Lean 4 tactic and proof environment context: https://lean-lang.org/theorem_proving_in_lean4/Tactics/

Regression Cases

The forbidden proof-body, private-ref, allowed-symbol false-positive, unqualified-token, and theorem-correctness cases are regression-only leakage guards. They are not product evidence and cannot stand in for the copied Lean/Std symbol-boundary system.

Validation Result record Path

Run from microcosm-substrate:

The expected bundle projection is Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges, and 19 generated relationship edges with no unpopulated selective relations. A green result record proves only the allowed-premise and symbol-budget classification boundary; it does not establish formal-result correctness, run Lean or Lake, expose proof bodies, authorize external model access, claim Mathlib availability, or broaden all Std and Mathlib declarations into allowed priors.

Scope boundary

Scope limit

The JSON bundle and generated row prove only allowed-premise and symbol-budget classification evidence: copied Lean/Std premise rows, real Ring2 ids and digests, extracted qualified symbol refs, declared budget-violation cases, source-open body-floor digest evidence, leakage regression cases, negative cases, and validation result records. They do not prove formal-result correctness, run Lean or Lake, expose proof bodies, use external model services, import non-public source refs, claim Mathlib availability, treat all Std or Mathlib declarations as allowed priors, include launch operations, authorize public sharing, or prove whole-system correctness. They also do not expose model-output data bodies, account or browser state, source notes, or private source-root bodies.

Limitations

The classifier depends on copied, premise rows and pre-extracted qualified symbol observations. It does not parse arbitrary Lean syntax, expand imports, normalize proof terms, or run Lean/Lake to discover symbols. Unknown or unqualified tokens are deliberately kept outside the positive undeclared-library-prior claim unless the public observation and closed premise index make the boundary explicit.

The public source-open body floor is a provenance check, not semantic equivalence for the full private source system. Exact copied bodies and path-normalized run-state bodies are checked for source/target digests, line counts, byte counts, and required anchors; that does not certify every upstream private root, model-output data, account state, or operator context that may have informed the original source run.

The leakage and launch boundaries are also scoped. secret_exclusion_scan checks declared sentinel classes in the public fixture and bundle inputs, while the focused pytest checks regression cases for proof-body export, non-public refs, overclaims, and compact-card redaction. Those checks do not replace a whole-repo secret audit, a public sharing review, theorem-correctness evidence, or a Mathlib availability proof. The paper-module corpus and generated-row checks prove routing parity only.

Scope limit

This module is allowed-premise and symbol-budget classification evidence only. It does not establish formal-result correctness, run Lean or Lake, expose proof bodies, use external model services, import non-public source refs, treat all Std or Mathlib declarations as allowed priors, claim Mathlib availability, or include launch operations.

Scope boundary

This module does not establish formal-result correctness, run Lean or Lake, expose proof bodies, use external model services, import non-public source refs, treat all Std/Mathlib declarations as allowed priors, claim Mathlib availability, or include launch operations.

Source and projection details
Governing Lattice Relation

The governing relation is the path from bundle authority to a bounded proof consumer. The source row binds this module to the undeclared_library_prior_symbol_classifier component, the mechanism mechanism.undeclared_library_prior_symbol_classifier.validates_public_symbol_boundary, the runtime locus src/microcosm_core/organs/undeclared_library_prior_symbol_classifier.py, the concept concept.formal_math_and_proof_witness_bundle, six principles, six axioms, and the sibling paper modules for corpus readiness, tactic availability, and Lean/Std premise indexing.

The principle layer explains why the classifier is a boundary component rather than a theorem authority. P-1 requires the symbol class to be recomputed from premise rows and observations instead of echoed from prose. P-2 lowers the claim to what the checker actually tests: allowed-premise and symbol-budget classification. P-3 concentrates trust in the small component and source-module manifest validators. P-6 fails closed on missing or stale evidence. P-8 turns inadmissible computations into typed outcomes such as PREMISE_BUDGET_VIOLATION and UNDECLARED_LIBRARY_PRIOR. P-9 carries source refs, target refs, digests, and body-material status through the fixture, bundle, and result record layers.

The axiom layer supplies the same ceiling in machine-checkable form. AX-1 requires derivation before assertion, so the page points to fixture and bundle result records instead of declaring theorem truth. AX-2 keeps verification inside kernelized validators. AX-5 prevents an authority upgrade without stronger evidence. AX-7 allows typed partiality and refusal when the proof body, non-public refs, or theorem-correctness claim is inadmissible. AX-8 preserves provenance while keeping proof/provider/private bodies out of public result records.

The generated JSON row currently contributes 19 relationship edges with no unpopulated selective relations. Those edges are evidence of route parity, not new authority: the source authority remains the JSON bundle and the proof authority remains the focused fixture, bundle, and regression consumers.

This page treats those generated navigation surfaces as bundle-derived projections while explaining the resolved symbol-boundary component, code-locus, law, and sibling-paper links.

Ring2 Premise Retrieval Precision Recall HarnessScores how much proof support a premise search found, problem by problem.3/5

Does When a math-proving system searches for the supporting facts ("premises") a proof will need, this component replays saved records of that search and reports, problem by problem, how much of the needed support the search actually turned up. Per problem it labels one of four outcomes: the search found everything needed and the proof went through; it found everything needed but the proof still failed; it found only some of the needed support; or it found none of it. Separating "the proof failed even though every needed premise was found" from "the proof failed because the search missed a needed premise" shows which part to fix. It also runs as a regression guard that refuses inputs which try to slip the answer into the search itself (the known-correct premises planted in the ranked results), leak proof text, tune on the test answers, or claim more than retrieval-quality numbers.

Scope limit These are after-the-fact retrieval-attribution labels and precision/recall counts over copied run records only. The component does not run Lean or Lake, call any provider, expose proof bodies, tune on test answers, claim benchmark performance, prove formal-result correctness, or include launch operations, and its labels are explicitly forbidden from flowing into provider context. The aggregate numbers describe only the copied fixture/bundle replayed, not any benchmark claims.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.ring2_premise_retrieval_precision_recall_harness run --input fixtures/first_wave/ring2_premise_retrieval_precision_recall_harness/input --out receipts/first_wave/ring2_premise_retrieval_precision_recall_harness

EvidenceComputed projectionevidence 3/5Source-faithful refactor

formal-methodstheorem-provinglean

Source Design note · Source atlas

Paper module Ring-2 Premise Precision Recall

ring2_premise_retrieval_precision_recall_harness is the public Microcosm component for evaluating copied Ring-2 premise retrieval rankings against after-the-fact labels.

The component computes precision and recall per problem, then classifies the result as retrieval_hit, partial_retrieval_miss, retrieval_miss, or proof_failure_despite_hit. That distinction matters because a failed proof with all needed premises retrieved is a different failure than a missing premise retrieval path.

Purpose

When a proof search fails, it is easy to blame the prover and miss the simpler cause: the right supporting facts were never put in front of it. This component exists to keep those two cases apart. It answers one question: did the retrieval step actually surface the premises a problem needed, or did the failure happen somewhere downstream after the premises were already in hand?

It answers that by recomputing precision and recall from copied records rather than trusting a reported figure. For each problem it intersects the retrieved premise ids with the labelled needed-premise ids, then reads the proof outcome alongside that overlap. Full recall with a passing proof is a retrieval_hit; full recall with a non-passing proof is proof_failure_despite_hit, the case where retrieval did its job and the fault lies elsewhere. Partial overlap and zero overlap are graded as partial_retrieval_miss and retrieval_miss.

The unusual part is the direction the labels are allowed to flow. The needed premise ids are after-the-fact measurement labels, and the component treats them as strictly one-way: they may be used to score a finished run, but they may not be fed back into the retrieval ranking, used to tune on a test split, or carried into a provider-context recipe. Planting an oracle label inside a ranking, or tuning on test answers, is a typed refusal, not a higher score. The point is a metric that cannot quietly become the very advantage it is meant to measure, and that never inflates a retrieval result into a claim about formal-result correctness.

Shape

source recordsource recordstructured source recordstructured source recordthis pagethis pagediagram viewdiagram viewmap viewmap viewfixture inputfixture inputruntime componentruntime componentexported bundleexported bundleprecision/recall labelsretrieval vs proof-failureattributionprecision/recall labels retrieval vs proof-failure attributionvalidation result recordsfirst_wave + runtime_shellvalidation result records first_wave + runtime_shellnegative casesleakage, tuning, overclaim,missing decoynegative cases leakage, tuning, overclaim, missing decoyproof boundarymetrics and copied artifactsonlyproof boundary metrics and copied artifacts only

Source refs

source record
core/paper_module_capsules.json[42]
structured source record
paper_modules/ring2_premise_precision_recall.json
fixture input
fixtures/first_wave/.../input
runtime component
ring2_premise_retrieval_precision_recall_harness.py
exported bundle
examples/.../exported_ring2_precision_recall_bundle
Diagram source
flowchart TD Bundle["source record core/paper_module_capsules.json[42]"] --> JSON["structured source record paper_modules/ring2_premise_precision_recall.json"] JSON --> Markdown["this page reader projection"] JSON --> Mermaid["diagram view available_from_capsule_edges"] JSON --> Atlas["map view organ_atlas.ring2_premise_retrieval_precision_recall_harness"] Fixture["fixture input fixtures/first_wave/.../input"] --> Runtime["runtime component ring2_premise_retrieval_precision_recall_harness.py"] Bundle["exported bundle examples/.../exported_ring2_precision_recall_bundle"] --> Runtime Runtime --> Metrics["precision/recall labels retrieval vs proof-failure attribution"] Runtime --> Result records["validation result records first_wave + runtime_shell"] Runtime --> Negatives["negative cases leakage, tuning, overclaim, missing decoy"] Result records --> Boundary["proof boundary metrics and copied artifacts only"]

Technical Mechanism

The runtime splits the proof consumer into three evidence classes before it reports any metric. _load_payloads reads the declared fixture or exported bundle inputs; _validate_run_material checks that copied Ring-2 run material carries source refs, target refs, validation refs, digests, and the expected copied_non_secret_macro_body_with_provenance status; and _validate_source_artifacts verifies the four copied source artifacts against either the source digest or the private-path rewrite digest. The result record therefore proves the presence and provenance of the copied public artifacts before the precision/recall scores can be interpreted.

The scoring core is _evaluate. It indexes after-the-fact labels by problem_id, applies the policy default_top_k or per-ranking top_k, truncates retrieved premise ids to that cutoff, intersects retrieved ids with labelled needed-premise ids, and computes precision_at_k = hits/top_k and recall_at_k = hits/needed. Aggregate precision and recall use total hit, candidate, and needed-premise counts, then compare the computed aggregate metrics with the policy's expected values. This is why the paper module can distinguish a retrieval miss from a proof failure after full premise recall without asserting anything about the downstream proof.

The failure taxonomy is mechanical rather than rhetorical. Full recall plus a passing proof is retrieval_hit; full recall plus a non-passing proof is proof_failure_despite_hit; partial overlap is partial_retrieval_miss; and zero overlap is retrieval_miss. The policy floor also requires expected failure modes and an adversarial decoy whose needed premise is absent or missed. Those gates make the metric harness test the shape of the evaluation set, not just the happy path.

The negative cases enforce the scope limit. EXPECTED_NEGATIVE_CASES requires oracle labels planted in rankings, proof-body leakage, test-split tuning, metric-overclaim, and missing-decoy inputs to produce typed refusal codes. The result record-writing path then exposes import ids, target refs, digest status, aggregate counts, failure-mode counts, and secret-scan status while keeping proof bodies, model-output data, and non-public paths outside the public result record. That implements the bundle's P-1/P-2/P-6/P-8/P-9 and AX-1/AX-2/AX-5/AX-7 posture: metrics are recomputed from copied artifacts, blocked states stay blocked, and no metric label becomes Lean, provider, benchmark, or launch-scope decision.

Reader Evidence Routing

  • Bundle authority: core/paper_module_capsules.json::paper_modules[42:paper_module.ring2_premise_precision_recall] names the component subject, mechanism subject, concept ref, principle refs, axiom refs, dependencies, runtime code locus, and projection statuses. Edit the source record, not this page, if those relationships change.
  • Generated structured source record: paper_modules/ring2_premise_precision_recall.json is the structured source record to inspect for source_authority: json_capsule, the 18 generated relationship edges, zero unresolved selective relations, Mermaid available_from_capsule_edges, and Atlas linked_from_capsule_edges.
  • Runtime locus: src/microcosm_core/organs/ring2_premise_retrieval_precision_recall_harness.py owns run, run_precision_recall_bundle, _build_result, _write_receipts, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING. It computes aggregate precision/recall, enforces copied source-artifact digests, writes result records, and carries the provider/proof/launch refusal flags.
  • Fixture and exported bundle: fixtures/first_wave/ring2_premise_retrieval_precision_recall_harness/input/ includes the public input records plus five negative cases; examples/ring2_premise_retrieval_precision_recall_harness/exported_ring2_precision_recall_bundle/ is the runtime-shell bundle. Both routes expose source artifacts under source_artifacts/ while result records carry import ids, target refs, and digest status rather than private proof bodies.
  • Result record and test surfaces: receipts/first_wave/ring2_premise_retrieval_precision_recall_harness/ring2_precision_recall_result.json, receipts/first_wave/ring2_premise_retrieval_precision_recall_harness/ring2_precision_recall_validation_receipt.json, result records/sign-off/first_wave/ring2_premise_retrieval_precision_recall_harness_fixture_acceptance.json, receipts/runtime_shell/demo_project/organs/ring2_premise_retrieval_precision_recall_harness/exported_ring2_precision_recall_bundle_validation_result.json, and tests/test_ring2_premise_retrieval_precision_recall_harness.py are the reader-verifiable validation result records for the local public boundary.

Runtime Surfaces

PYTHONPATH=src python3 -m microcosm_core.organs.ring2_premise_retrieval_precision_recall_harness run --input fixtures/first_wave/ring2_premise_retrieval_precision_recall_harness/input --out receipts/first_wave/ring2_premise_retrieval_precision_recall_harness
PYTHONPATH=src python3 -m microcosm_core.cli ring2-premise-retrieval-precision-recall-harness run-precision-recall-bundle --input examples/ring2_premise_retrieval_precision_recall_harness/exported_ring2_precision_recall_bundle --out receipts/runtime_shell/demo_project/organs/ring2_premise_retrieval_precision_recall_harness

Body-Floor Import

The fixture and exported bundle both carry exact copied source artifacts under source_artifacts/ for the Ring2 aggregate report, graph-variant run summary, graph comparison, and problem-source manifest. The validator treats those four digest-matched files as source_open_body_imports with body_in_receipt=false: workingness can count the real source result record bodies, while result records expose only import ids, target refs, and digest status.

Negative Cases

  • oracle_labels_in_ranking rejects oracle-needed premise ids inside rankings.
  • proof_body_leakage rejects proof, provider, or private body fields.
  • test_split_tuning_attempt rejects retrieval tuned on test labels.
  • metric_overclaim rejects proof, benchmark, provider, launch, or publishing-scope decision claims.
  • missing_adversarial_decoy rejects a metric harness without a decoy miss case.

Prior Art Grounding

This component is grounded in information-retrieval evaluation. NIST's TREC evaluation measures provide the older precision/recall frame for judging retrieval systems, and scikit-learn's precision/recall metric API shows the common machine-learning interface for reporting those labels.

The theorem-proving side is adjacent to premise-selection and hammer workflows, such as Isabelle Sledgehammer, where finding the right facts is a distinct step from replaying a proof. Microcosm keeps that distinction explicit: precision/recall can say whether needed support was ranked, but it cannot become Lean correctness, benchmark performance, or provider-output authority.

Why It Matters

Premise retrieval should be measurable without becoming theorem authority. This component gives Microcosm a compact public harness for asking whether a retrieval path missed the needed support, hit the support but failed later, or hid a dangerous truth-side shortcut inside the public runtime.

Validation Result record Path

From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:

The expected projection row is paper_module.ring2_premise_precision_recall with 18 generated relationship edges, zero unresolved selective relations, Mermaid status available_from_capsule_edges, and Atlas status linked_from_capsule_edges. These checks validate copied retrieval records, metric labels, and bundle result records only; they do not become Lean/Lake, benchmark, provider, or theorem authority.

Scope boundary

Scope limit

This component does not run Lean or Lake, use external model services, emit proof bodies, tune retrieval on test answers, claim benchmark performance, prove formal-result correctness, or include launch operations. Its labels are metric labels only; they are not allowed to flow into provider context recipes.

Scope limit

This module supports only the reader-verifiable claim that copied public premise-retrieval records can be scored for precision/recall labels, adversarial decoys, body-floor imports, and metric overclaim refusals. It does not establish Lean correctness, benchmark performance, provider output quality, theorem truth, launch-scope decision, publishing-scope decision, or whole-system correctness.

Limitations

The harness is a local evidence-accounting check over copied artifacts. It does not execute Lean, Lake, Sledgehammer, or any external prover; it does not inspect proof bodies; and it does not decide whether a theorem is true. A retrieval_hit label means the needed-premise ids appeared in the ranking under this fixture policy, not that the downstream proof search is sound or complete.

The reported precision and recall are bounded by the declared Ring-2 fixture and exported bundle. Different corpora, retrieval cutoffs, premise labels, decoy construction, or source-artifact digests require rerunning the component and cannot be inferred from this page. The negative cases prove specific forbidden flows are rejected here; they do not exhaust all possible leakage, tuning, non-public-state, provider-output, or benchmark-gaming failures.

Source and projection details
Governing Lattice Relation

Ring-2 precision/recall sits between premise retrieval and proof diagnosis. The bundle explains the runtime component and the mechanism.ring2_premise_retrieval_precision_recall_harness.validates_public_premise_retrieval_attribution mechanism, which is grounded in the same component source and in concept.formal_math_and_proof_witness_bundle. That relation is deliberately proof-adjacent rather than proof-authoritative: it can show whether copied retrieval rankings hit the labelled needed premises, but it cannot promote a hit into a Lean proof, a benchmark claim, or a provider-context label.

The governing principles make the scoring path stricter than a label echo. P-1 requires recomputing precision and recall from copied rankings and labels; P-2 keeps the scope limit at metric-checker strength; P-3 concentrates authority in the small harness and focused tests; P-6 keeps missing source artifacts, negative cases, or digests blocked; P-8 turns leakage, tuning, and overclaim cases into typed refusals; and P-9 preserves provenance as records cross from source run artifacts into public fixture and bundle result records. The axiom layer matches that mechanism: AX-1 and AX-2 require derived checker evidence, AX-5 and AX-7 force blocked or refused states instead of inflated metrics, AX-6 keeps the labelled premise domain explicit, and AX-8 prevents metric labels from flowing into forbidden sinks.

Formal Math Lean Proof WitnessCompiles a tiny Lean example with the real prover and records whether it built, leaking no proof text.4/5Runs real tools

Does This takes a small, purpose-built Lean math file (a handful of toy theorems written just for this demo) and a tiny project setup, copies them into a throwaway scratch folder, and actually tries to compile them with the installed Lean theorem-prover and its Lake build tool. It then writes down exactly what happened: whether the Lean and Lake tools were found, whether the build passed, fingerprints (hashes) of the source files, the names of the theorems it defined, and how many lines each file had. It also deliberately feeds in a broken proof and a couple of off-limits files to confirm they get rejected. The point is to show real proof-checking machinery run on a small example, while keeping the written records honest and redacted: no proof text or internal logs leak out, and it states plainly that this is a narrow toy check on one fixture, not a general-purpose proof system.

Scope limit It authorizes only a witness that a tiny declared public toy proof compiled under the locally installed Lean/Lake toolchain in a temporary workspace, plus confirmation that its leakage guardrails fired. It excludes Mathlib/Aesop/Batteries-dependent or general proof or theorem-program authority, external model access, private proof import, benchmark or performance claims, whole-system correctness, or any launch, hosted deployment, or public sharing.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.formal_math_lean_proof_witness run --input fixtures/first_wave/formal_math_lean_proof_witness/input --out receipts/first_wave/formal_math_lean_proof_witness

EvidenceExternal tool runevidence 4/5Real runtime result

formal-methodstheorem-provinglean

Source Design note · Source atlas

Paper module Formal Math Lean Proof Witness

Purpose

This component exists to make one claim checkable instead of asserted: that Microcosm can actually run the Lean toolchain, not merely talk about it. The single question it answers is whether the installed Lean toolchain will compile a declared, tiny synthetic Lean project end to end, and whether that run can be recorded without leaking the proof.

The unusual part is the discipline around the run, not the run itself. The component copies a bounded public Lake project into a temporary workspace and invokes lake build, but the result record keeps only the return code, the standard-output and standard-error line counts, the source hashes, and the declaration names pulled out by a regular expression. The proof text and the raw command output never reach the result record. A reader gets evidence that the build happened and what it contained, without the page becoming a copy of the proof.

Two failure modes drive the design. The first is a proof-assistant integration that reports success without ever running the checker; the witness guards against that by executing a real subprocess and recording its exit status, and by deliberately compiling an invalid Lean file in a negative case to confirm the toolchain rejects it. The second is a circular pass, where the manifest quietly carries the answer. The component refuses manifests that embed a proof_body, a ground-truth proof, provider output, or oracle premise ids, so a green result cannot be smuggled in through the inputs.

The scope is small on purpose. Imports of Mathlib, Aesop, and Batteries are rejected before anything runs, so this is a witness for a toy theorem under a local toolchain, not a claim about library-dependent proof work. That boundary is the point: it shows the result record discipline a larger formal-math component would need, without borrowing authority it has not earned.

Teleology

formal_math_lean_proof_witness is the bounded public crossing from formal-math readiness into an actual local Lean/Lake run. It exists so a cold reader can see Microcosm compile a tiny synthetic proof witness with the installed toolchain while the result records stay redacted, public-relative, and honest about the narrow authority boundary.

Shape

First-wave fixtureFirst-wave fixturerun()include_negative=truerun() include_negative=trueExported public bundleExported public bundlerun_witness_bundle()include_negative=falserun_witness_bundle() include_negative=falseValidate witness manifest:reject embedded proof bodies,oracle ids, non-public sourcerefsValidate witness manifest: reject embedded proof bodies, oracle ids, non-public source refsValidatesource_module_manifest.json:copied public source digests,exact-copy vs replacementValidate source_module_manifest.json: copied public source digests, exact-copy vs replacementCopy Lake project to tempworkspacelake buildMicrocosmProofWitnessCopy Lake project to temp workspace lake build MicrocosmProofWitnessNegative cases run real Lean:invalid proof rejected,Mathlib/Aesop/Batteriesimport blockedNegative cases run real Lean: invalid proof rejected, Mathlib/Aesop/Batteries import blockedStandalone exported-witnesscontractor fresh bundle result recordreuse(no live build)Standalone exported-witness contract or fresh bundle result record reuse (no live build)metadata-only JSON resultrecords:return code, line counts,hashes, declaration namesmetadata-only JSON result records: return code, line counts, hashes, declaration namesScope limit:toy public witness onlyScope limit: toy public witness only

Source refs

First-wave fixture
fixtures/first_wave/.../input
Exported public bundle
examples/.../exported_lean_proof_witness_bundle
Diagram source
flowchart TD A["First-wave fixture fixtures/first_wave/.../input"] --> B["run() include_negative=true"] C["Exported public bundle examples/.../exported_lean_proof_witness_bundle"] --> D["run_witness_bundle() include_negative=false"] B --> E["Validate witness manifest: reject embedded proof bodies, oracle ids, non-public source refs"] D --> F["Validate source_module_manifest.json: copied public source digests, exact-copy vs replacement"] E --> G["Copy Lake project to temp workspace lake build MicrocosmProofWitness"] G --> H["Negative cases run real Lean: invalid proof rejected, Mathlib/Aesop/Batteries import blocked"] F --> I["Standalone exported-witness contract or fresh bundle result record reuse (no live build)"] G --> J["metadata-only JSON result records: return code, line counts, hashes, declaration names"] H --> J I --> J J --> K["Scope limit: toy public witness only"]

Reader Evidence Routing

Route bundle/currentness questions through ## JSON Bundle Binding, the source record, and the structured source record. The expected generated-row evidence is source_authority: json_capsule, edge_count: 8, Mermaid available_from_capsule_edges, Atlas blocked_until_organ_atlas_owner_lane_binds_edges, and zero unresolved selective relations. That evidence proves reader wiring and source authority placement, not formal-result correctness.

Route runtime questions through the runtime locus and the two public input surfaces. The first-wave fixture runs run() against the public Lake project and checks the four expected negative cases. The exported bundle runs run_witness_bundle() against copied public source modules, validates source_module_manifest.json, and records digest/source-module status without placing proof bodies in JSON result records.

Route result record and test questions through the required result record paths, the focused pytest, and the corpus check. The focused test asserts local Lake build success for the tiny witness when Lean/Lake are available, eight compiled declarations, four negative-case observations for the fixture, public-relative redacted result records, five exported source-module rows, source digest checks, metadata-only result record policy, and tamper-blocking behavior. Those validation result records do not authorize Mathlib-dependent proofs, external model access, private proof import, benchmark claims, launch-scope decision, deployment posture, public sharing, hosted deployment, source-file changes, or private-system equivalence.

Public Contract

The component copies examples/formal_math_lean_proof_witness/exported_lean_proof_witness_bundle or the first-wave fixture Lake project into a temporary workspace and runs lake build. The public result record records tool availability, Lake build status, source hashes, declaration names, line counts, negative-case coverage, and the scope limit. It does not export proof bodies in JSON result records.

The accepted witness scope is deliberately small:

  • public synthetic Lean source is allowed;
  • JSON manifests and result records may not embed proof bodies;
  • Mathlib, Aesop, and Batteries imports are rejected until a wider scope limit exists;
  • non-public source refs, model-output data, oracle proofs, and private source run bodies remain outside the public root.

Prior Art Grounding

This component is grounded in the Lean proof-assistant lineage and the broader small-kernel theorem-proving tradition. The Lean theorem prover system description anchors the local Lean/Lake witness route, and the Lean mathematical library shows why proof authority depends on explicit imports, declarations, and checked environments.

Microcosm borrows the proof-witness discipline: a local toolchain run, source hashes, declarations, negative cases, and metadata-only result records must be visible before Lean witness language is allowed. It does not claim Mathlib-dependent proof authority or benchmark performance.

Validation Result record Path

./repo-pytest tests/test_formal_math_lean_proof_witness.py -q --basetemp=/tmp/microcosm_formal_math_lean_proof_witness_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
jq '{edge_count:(.relationships.edges|length), mermaid_status:.paper_module_payload.generated_projections.mermaid.status, atlas_status:.paper_module_payload.generated_projections.atlas_card.status, source_authority:.relationships.source_authority, unresolved_selective_relation_count:(.relationships.unpopulated_selective_relations|length)}' paper_modules/formal_math_lean_proof_witness.json

Expected generated-row proof: edge_count: 8, mermaid_status: available_from_capsule_edges, atlas_status: blocked_until_organ_atlas_owner_lane_binds_edges, source_authority: json_capsule, and unresolved_selective_relation_count: 0.

Scope boundary

Limitations

This module is a bounded public witness, not a formal-proof authority. Its positive evidence is one declared toy Lean/Lake fixture, one exported public witness bundle, five copied source-module body rows, local toolchain metadata, eight compiled declarations when Lean/Lake are available, and four expected negative-case observations. That evidence is enough to show the mechanism's result record discipline; it is not enough to prove arbitrary Lean goals, Mathlib coverage, formal-result correctness, benchmark performance, or private proof import equivalence.

The copied-body floor is public but narrow. Result records may cite source refs, hashes, material classes, declaration names, counts, manifest verdicts, tool-return summaries, and scope limit fields. They may not embed proof bodies, model-output data, oracle answers, non-public source refs, raw command output bodies, account secrets, account or browser state, or private source-root material. The source-open claim is therefore limited to the declared public fixture and exported bundle body classes.

The focused regression validates the stated fixture and exported-bundle shape. It checks streaming source scans, tool-version caching, temporary Lake project reuse, Lake build behavior, public-relative redacted result records, source-module digest parity, standalone exported-bundle handling, tamper rejection, negative case coverage, and the generated-row proof. It excludes future fixture families, Atlas/site public sharing, source-file changes, launch, or a larger formal-math proof claim without the owning builder and launch lanes.

Scope limit

This module authorizes only a tiny public fixture witness compiled by local Lean/Lake in a temporary workspace. It excludes Mathlib-dependent proofs, external model access, private proof import, benchmark performance claims, launch operations, hosted deployment, public sharing, recipient work, secret export, or whole-system correctness.

Scope limit

This module supports only the reader-verifiable claim that a tiny public Lean fixture witness can run in a temporary local workspace, emit metadata-only result records, and expose source hashes, declarations, and negative cases. It does not establish Mathlib-dependent theorems, benchmark performance, provider outputs, private proof imports, launch-scope decision, hosted deployment, publishing-scope decision, secret export safety, or whole-system correctness.

Source and projection details
Governing Lattice Relation

The bundle binds this module to concept.formal_math_and_proof_witness_bundle: public proof-adjacent language must pass through explicit witness artifacts before it becomes reader evidence. Here the witness artifacts are the temporary Lake project copy, local Lean/Lake tool probes, lake build MicrocosmProofWitness, source hashes, declaration metadata, source-module manifest checks, negative-case observations, and metadata-only result records. The Markdown page explains that lattice; it does not upgrade the generated JSON row, the local toolchain, or the copied source body floor into theorem authority.

P-3 is the governing principle edge for claim discipline. The mechanism rows do not ask a reader to trust a proof story from prose; they route the claim through run, run_witness_bundle, validate_source_module_imports, _build_result, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, and SOURCE_MODULE_MANIFEST_NAME. Those symbols are the mechanism's concrete boundary: they decide which public source refs may be copied, which imports are blocked, which negative cases count, and which result record fields may be exposed.

AX-2 supplies the hard law boundary. Public proof claims stay inside declared fixture evidence, public-relative refs, source digests, declaration counts, tool-return metadata, and negative-case verdicts. Proof bodies, model-output data, non-public source refs, stdout/stderr bodies, private source-root material, launch decisions, and whole-system correctness remain outside the module's authority even when the focused test and corpus check are green.

The dependency on paper_module.corpus_readiness_mathlib_absence_gate prevents the most tempting overread. This witness intentionally rejects Mathlib, Aesop, and Batteries imports until a different scope limit exists. A reader can therefore interpret the module as a toy Lean/Lake execution cell upstream of larger formal-math components, not as evidence that Microcosm can certify Mathlib-dependent theorem work.

Verifier Lab KernelFolds nine proof checks into one report labeling each line by which source actually backs it.5/5

Does This assembly point for the Lean/proof toolkit runs nine smaller formal-math pipeline checkers together and folds their results into one leak-proof report that labels every line by where it came from: a Lean verifier, an answer-key (oracle) comparator, an AI suggestion, a retrieval miss, or a row thrown out for breaking the rules. The report carries only references, hashes, counts, and verdicts, never the actual proof text, AI output, or answer-key bodies. One result record shows which claims a Lean verifier actually backed versus which are just hints or were rejected, instead of leaving a pile of separate outputs to be taken on faith.

Scope limit It validates the declared public contract shape of the proof packet and component result records only; it does not establish anything correct, count oracle/provider output as forward proof success, import private or Mathlib-dependent proof bodies, use external model services, change source files, or claim benchmark solve rates, launch, or maturity.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.verifier_lab_kernel run --input fixtures/first_wave/verifier_lab_kernel/input --out receipts/first_wave/verifier_lab_kernel

Paper module Verifier Lab Kernel

verifier_lab_kernel is the public composition root for the formal-math verifier lab. It is not a theorem prover, a benchmark runner, a private Lean import, or a frontend surface. It composes already-public Microcosm components into one leak-proof result record so a reader can see which claim came from a verifier, which claim came from an oracle comparator, which claim came from a provider hypothesis, and which rows were rejected by contract.

The component consumes:

  • a public ForwardProblem packet with target shape, statement summary, public input hash, and allowed premise ids;
  • an OracleSidecar packet that may compare against hidden or hindsight knowledge but never increments forward success;
  • verifier attempts and verifier result classes;
  • provider/NIM hypotheses as advisory residual diagnoses only;
  • CP2 typed action candidates, bounded evidence bodies or raw tactic scripts;
  • bounded Evolve candidates over policy artifacts only.

The runnable fixture also calls the existing public components:

  • tactic_portfolio_availability_probe;
  • target_shape_tactic_routing_gate;
  • formal_math_verifier_trace_repair_loop;
  • formal_math_lean_proof_witness.

Purpose

In a formal-math agent loop, several different things can look like progress. A Lean checker can accept a term. An oracle holding a hindsight answer can say a candidate matches. A provider model can offer a plausible next tactic. A retrieval step can return a premise. Treated loosely, all of these blur into a single sense of "it worked", and oracle or provider success quietly inflates the count of theorems actually proved. This component exists to stop that blur. The one question it answers is: for each row of evidence, which authority class does it belong to, and what may that class claim?

The composition root runs or consumes nine named component components (corpus readiness, Lean Std premise indexing, premise retrieval, tactic availability, target-shape routing, Ring2 precision and recall, verifier trace repair, proof diagnostics, and the Lean proof witness) and sorts every result into seven separate buckets: verifier-checked, provider-suggested, oracle-compared, retrieval-miss, CP2-translated, Evolve-candidate, and contract-rejected. Each bucket keeps its own authority. A passing component cannot lend its standing to a different bucket.

The unusual part is how the boundary is enforced rather than merely described. The kernel keeps two counters, oracle_forward_success_increment_count and provider_results_counted, and they must read zero. An oracle that marks itself as forward success, or a provider hypothesis that claims proof authority, is recorded as a contract violation, not as a result. The same discipline applies to data: forward problems and CP2 actions are scanned for fields that would smuggle in a proof body, an ideal answer, or an oracle's needed premise ids, and CP2 and Evolve outputs are confined to a fixed vocabulary of action classes and policy artifacts. What the reader receives is a single aggregate result record that carries references, digests, counts, and verdicts, with the proof, provider, oracle, and stdout bodies left out.

Shape

Read the verifier lab kernel as a public result record composition route, not as a proof oracle. The local path spine is the bundle and structured source record (core/paper_module_capsules.json::paper_modules[0:paper_module.verifier_lab_kernel], paper_modules/verifier_lab_kernel.json), the runtime composition root (src/microcosm_core/organs/verifier_lab_kernel.py), the public packet (fixtures/first_wave/verifier_lab_kernel/input/verifier_lab_packet.json), and the emitted public result records under receipts/first_wave/verifier_lab_kernel/.

Bundle and structured sourcerecordBundle and structured source recordPublic verifier packetPublic verifier packetComposition rootComposition rootPublic component resultrecordstactic portfolio / targetshape / trace repair / LeanwitnessPublic component result records tactic portfolio / target shape / trace repair / Lean witnessSeparated claim bucketslean_verified |oracle_compared |provider_suggested |retrieval_miss |cp2_translated |evolve_candidate |contract_rejectedSeparated claim buckets lean_verified | oracle_compared | provider_suggested | retrieval_miss | cp2_translated | evolve_candidate | contract_rejectedPublicboard/result/validationresult recordsPublic board/result/validation result recordsScope limitno proof-body import; nooracle/provider forwardsuccess; no launch claimScope limit no proof-body import; no oracle/provider forward success; no launch claim

Source refs

Bundle and structured source record
core/paper_module_capsules.jsonpaper_modules/verifier_lab_kernel.json
Public verifier packet
fixtures/first_wave/verifier_lab_kernel/input/verifier_lab_packet.json
Composition root
src/microcosm_core/organs/verifier_lab_kernel.py
Public board/result/validation result records
receipts/first_wave/verifier_lab_kernel/*.json
Diagram source
flowchart TD bundle["Bundle and structured source record core/paper_module_capsules.json paper_modules/verifier_lab_kernel.json"] packet["Public verifier packet fixtures/first_wave/verifier_lab_kernel/input/verifier_lab_packet.json"] kernel["Composition root src/microcosm_core/components/verifier_lab_kernel.py"] components["Public component result records tactic portfolio / target shape / trace repair / Lean witness"] buckets["Separated claim buckets lean_verified | oracle_compared | provider_suggested | retrieval_miss | cp2_translated | evolve_candidate | contract_rejected"] result records["Public board/result/validation result records result records/first_wave/verifier_lab_kernel/*.json"] ceiling["Scope limit no proof-body import; no oracle/provider forward success; no launch claim"] bundle --> packet --> kernel kernel --> components --> buckets --> result records kernel --> ceiling buckets --> ceiling

Prior Art Grounding

This component is grounded in small-kernel theorem-proving and proof-certificate composition patterns. The LCF approach and HOL Light anchor the idea that a verifier lab should distinguish trusted checked results from heuristics and automation. Lean-oriented work such as LeanDojo adds the modern agent context: retrieval, provider hypotheses, and proof-state interaction need explicit boundaries before they can influence proof claims.

Microcosm borrows the composition discipline: verifier success, oracle comparison, provider hypothesis, CP2 translation, and Evolve candidate rows are separate buckets with separate authority. It does not count oracle or provider success as forward proof success.

The sign-off result record must separate these buckets:

  • lean_verified;
  • provider_suggested;
  • oracle_compared;
  • contract_rejected;
  • retrieval_miss;
  • cp2_translated;
  • evolve_candidate.

The kernel rejects five contract failures:

  • forward problems that carry candidate, ideal, repair, oracle, source proof, proof body, or base-index fields;
  • oracle comparator success counted as forward success;
  • provider hypotheses claiming proof authority;
  • CP2 candidates carrying proof bodies, raw tactic scripts, provider bodies, or oracle templates;
  • Evolve candidates mutating anything outside the bounded policy-artifact set.

Reader Evidence Routing

Cold-reader audit starts with the generated structured source record for this module, not with a broad theorem-proving claim. The structured source record must confirm that verifier and mechanism subjects resolve and that a diagram view and atlas card are available for this module.

Evidence should be read in this order:

  • Module definition: core/paper_module_capsules.json::paper_module.verifier_lab_kernel and paper_modules/verifier_lab_kernel.json.
  • Runtime proof: src/microcosm_core/organs/verifier_lab_kernel.py, the fixture input packet, and the public component calls listed above.
  • Bucket-separation proof: result record rows for lean_verified, provider_suggested, oracle_compared, contract_rejected, retrieval_miss, cp2_translated, and evolve_candidate.
  • Negative boundary proof: rejection of private proof bodies, oracle-to-forward success, provider proof authority, CP2 proof bodies, arbitrary Evolve mutation, source-file changes, benchmark solve-rate claims, launch claims, hosted-deployment claims, and secret export.

Validation Result record Path

./repo-pytest tests/test_verifier_lab_kernel.py -q --basetemp=/tmp/microcosm_verifier_lab_kernel_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

This paper module describes public fixture and exported bundle result records only. It excludes private proof-body import, Mathlib-dependent proof authority, oracle-to-forward success, provider proof authority, CP2 proof bodies, arbitrary Evolve mutation, source-file changes, benchmark solve-rate claims, launch, public sharing, hosted deployment, or secret export.

Limitations

The verifier lab kernel is a composition and result record-boundary mechanism. It does not establish formal-result correctness beyond the public component result records it consumes or emits, and it does not create Mathlib import authority when the corpus-readiness gate reports only bounded fixture evidence. A Lean/Lake return code or compiled declaration count is evidence for the corresponding public fixture or exported bundle, not a license to generalize to arbitrary formal math benchmarks.

Oracle structured source record remain hindsight or comparator evidence. They can diagnose a forward problem but cannot increment forward_success; the runtime authority counters must keep oracle_forward_success_increment_count at zero. Provider or NIM hypotheses remain residual diagnoses until a verifier result record or other system effect exists, so provider_results_counted must also remain zero.

CP2 rows are limited to typed action candidates from the bounded action-class vocabulary, with disconfirmation tests before rerun promotion. They are bounded evidence bodies, raw tactic scripts, provider output bodies, or oracle templates. Evolve rows are limited to the named policy-artifact set and must cite baseline or rerun result records; they do not authorize arbitrary source-file changes. Public result records must keep proof, provider, oracle, stdout/stderr, and private-source bodies out of exported evidence.

Coverage is finite: the present proof consumer exercises the first-wave fixture and exported-bundle contracts, the five named negative cases, and the component-stack result record shape. New claim classes, new fixture packets, or new launch/public sharing language need a fresh proof consumer and negative cases before this module can carry them.

Scope limit

This paper module can claim reader wiring for the verifier lab kernel composition root: verifier and mechanism subjects resolve, the runtime source locus is named, a diagram view and atlas card are generated for this module. It cannot claim private proof-body import, Mathlib-dependent proof authority, oracle-to-forward success, provider proof authority, CP2 proof bodies, arbitrary Evolve mutation, source-file changes, benchmark solve-rate claims, publishing-scope decision, hosted deployment, launch-scope decision, secret export, or whole-system correctness.

Fixture result records, exported-bundle result records, focused tests, and public component composition can support only bucket separation across verifier, oracle, provider, CP2, and Evolve rows. The diagram view and atlas card are navigation aids; they do not convert oracle or provider success into forward proof success, and they do not authorize benchmark or launch claims.

Source and projection details
Governing Lattice Relation

The governing lattice should be read as a claim-separation contract. The concept edge to concept.formal_math_and_proof_witness_bundle says the reader is looking at a proof-witness bundle, not a single proof oracle. The mechanism edge to mechanism.verifier_lab_kernel.composes_public_formal_math_receipts narrows that concept to one public operation: compose formal-math component result records into a leak-proof aggregate while keeping verifier, oracle, provider, retrieval, CP2, Evolve, and contract-rejected buckets distinct.

The code-locus edge is the runtime authority boundary. run and run_kernel_bundle select fixture or exported-bundle mode, _build_result loads the public packet and negative cases, validates the proof-lab route, runs or consumes the component stack, scans for forbidden classes, builds claim_separation, and records authority counters. _write_receipts then emits the board, result, validation, and sign-off result records with body_in_receipt: false, the result record-transparency contract, and the same scope boundary.

The nine depends_on paper-module edges are not a loose bibliography. They are the proof-lab dependency spine: corpus readiness, Lean Std premise indexing, premise retrieval, tactic availability, target-shape routing, Ring2 precision and recall, verifier trace repair, proof diagnostic evidence, and the Lean proof witness each remain separately bounded before the kernel aggregates their result records. This prevents a successful component from lending authority to a different bucket. The principle refs P-1, P-2, P-3, P-6, P-8, and P-15, plus axiom refs AX-1, AX-2, AX-5, and AX-7, are therefore read as ceiling law: public result record evidence may be composed, but hidden bodies, provider/oracle success, source-file changes, launch-scope decision, and whole-system correctness cannot cross the lattice boundary.

Focused test evidence checks the same relation. The verifier-lab test asserts that all expected negative cases are observed, all component statuses pass, claim_separation contains exactly the seven public buckets, oracle/provider authority counters stay at zero, body_in_receipt is false, public result record paths do not leak local roots, and legacy redaction fields do not survive result record normalization. Those checks make the lattice relation concrete for this module: the public aggregate result record is evidence of separation and containment, not of unbounded proof authority.

Evidence binding:

  • JSON bundle authority: core/paper_module_capsules.json#paper_module.verifier_lab_kernel.
  • Mechanism source: core/mechanism_sources.json#mechanism.verifier_lab_kernel.composes_public_formal_math_receipts.
  • Component atlas edge: core/organ_atlas.json#verifier_lab_kernel.
  • Runtime source: src/microcosm_core/organs/verifier_lab_kernel.py.
  • First command: PYTHONPATH=src python3 -m microcosm_core.organs.verifier_lab_kernel run --input fixtures/first_wave/verifier_lab_kernel/input --out receipts/first_wave/verifier_lab_kernel.
Verifier Lab Execution SpineRuns Lean on small bounded proof attempts in a temp copy and records what passed or failed.4/5Runs real tools

Does It copies a small public Lean math project into a throwaway temporary workspace and actually runs the Lean/Lake checker on a handful of small, bounded proof-step attempts the tool builds itself. It then writes down what the checker said: which attempts were accepted, which failed, and the failure category for each, plus safety counts (for example, how many attempts tried to sneak in forbidden content and were rejected). The pass/fail facts and the safety counts are readable directly, while the tool never shows the underlying proof text, never calls any outside service, and never modifies the original project or any existing source files.

Scope limit It is a tool-witness result record for bounded public Lean transition rows only: it does not establish general proof authority, count oracle/provider output as proof, export proof bodies or tactic scripts, use external model services, change source files, claim benchmark solve-rates, or include launch operations/public sharing.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.verifier_lab_execution_spine run --input fixtures/first_wave/verifier_lab_execution_spine/input --out .microcosm/verifier_lab_execution_spine

EvidenceExternal tool runevidence 4/5Real runtime result

formal-methodstheorem-provinglean

Source Design note · Source atlas

Paper module Verifier Lab Execution Spine

verifier_lab_execution_spine is the public execution witness for the verifier lab lane. It is narrower than verifier_lab_kernel: it actually runs bounded Lean transition candidates in a throwaway Lake project, records the return code of each run, and keeps every line of generated proof text and tool output out of the result record. A reader can then separate real execution evidence from overstated proof claims.

The component consumes a public execution packet with:

  • transition candidates, each naming a problem id, a target shape, and one action class from a fixed vocabulary (rfl, decide, cases, induction, exact_premise, and similar);
  • a small Lake project whose MicrocosmProofWitness library the component builds once and reuses;
  • CP2 translation requests that ask for the next typed action after a residual, and Evolve mutations that adjust bounded policy artifacts;
  • negative fixtures that smuggle a proof body, an oracle structured source record, a provider hypothesis, or an unbounded source-file changes into a row.

The component writes one .lean file per transition, runs lake env lean on it, and treats a zero exit code as accepted. It records the return code, the action class, and the failure class, but never the proof text, the stdout body, or the stderr body. The exported-bundle lane re-validates the same shape from a copied source-module manifest without re-running Lean, so a third party can inspect the bundle without a Lean toolchain installed.

Purpose

Automated proof systems can blur how a result was obtained. A model can be handed the answer by an oracle, or prompted with the proof by a provider, and still report the result as if it had found the proof unaided. This component exists to keep that blurring out of the result record. It answers one question: did a bounded Lean candidate actually pass the verifier, with no help that the result record is hiding?

The discipline that makes this work is the separation of authority classes. Every row lands in exactly one bucket: lean_verified for candidates the verifier accepted, oracle_compared and provider_suggested for rows that existed only as references, cp2_translated for the typed next-action layer, retrieval_miss and proof_synthesis_fail for residuals, and contract_rejected for anything that broke the leak rules. The unusual choice is what does not happen: an oracle match never increments forward success, and provider text is never counted as a proof. The counters oracle_forward_success_increment_count and provider_results_counted are held at zero by construction.

The second idea is that real execution and clean result records are not in tension. A candidate carrying oracle_visible: true, or a forbidden field such as proof_body or raw_tactic_script, is rejected before Lean is ever invoked, so the run cannot be contaminated. The transition then runs for real, and the result record carries the return code and the failure class while the proof text and the stdout and stderr bodies stay out. The result record is public evidence precisely because the only things omitted are the things that would leak.

Shape

leak foundcleanexit 0non-zeroExecution packettransition candidates, CP2requests,Evolve mutations,oracle/provider refsExecution packet transition candidates, CP2 requests, Evolve mutations, oracle/provider refsLeak contract gateforbidden fields?oracle/provider visible?action class out ofvocabulary?Leak contract gate forbidden fields? oracle/provider visible? action class out of vocabulary?contract_rejectedrejected before Lean runscontract_rejected rejected before Lean runsBuild Lake projectlake buildMicrocosmProofWitness (once,cached)Build Lake project lake build MicrocosmProofWitness (once, cached)Run candidatewrite .lean, lake env lean,return code = accepted?Run candidate write .lean, lake env lean, return code = accepted?lean_verifiedreturn code 0lean_verified return code 0retrieval_miss /proof_synthesis_failnon-zero return coderetrieval_miss / proof_synthesis_fail non-zero return codecp2_translatedtyped next action, no proofbodycp2_translated typed next action, no proof bodyevolve_candidate /evolve_acceptedbounded policy artifacts onlyevolve_candidate / evolve_accepted bounded policy artifacts onlyoracle_compared /provider_suggestedreferences, never counted assuccessoracle_compared / provider_suggested references, never counted as successAuthority countersoracle_forward_success = 0,provider_results = 0,proof_body_export = 0Authority counters oracle_forward_success = 0, provider_results = 0, proof_body_export = 0metadata-only result recordsresult, board, validation,sign-off;return codes kept, bodiesomittedmetadata-only result records result, board, validation, sign-off; return codes kept, bodies omittedScope limitbounded public transitionresult record onlyScope limit bounded public transition result record only
Diagram source
flowchart TD Packet["Execution packet transition candidates, CP2 requests, Evolve mutations, oracle/provider refs"] Gate["Leak contract gate forbidden fields? oracle/provider visible? action class out of vocabulary?"] Rejected["contract_rejected rejected before Lean runs"] Build["Build Lake project lake build MicrocosmProofWitness (once, cached)"] Run["Run candidate write .lean, lake env lean, return code = accepted?"] Verified["lean_verified return code 0"] Residual["retrieval_miss / proof_synthesis_fail non-zero return code"] CP2["cp2_translated typed next action, no proof body"] Evolve["evolve_candidate / evolve_accepted bounded policy artifacts only"] Refs["oracle_compared / provider_suggested references, never counted as success"] Counters["Authority counters oracle_forward_success = 0, provider_results = 0, proof_body_export = 0"] Result records["metadata-only result records result, board, validation, sign-off; return codes kept, bodies omitted"] Ceiling["Scope limit bounded public transition result record only"] Packet --> Gate Gate -->|leak found| Rejected Gate -->|clean| Build Build --> Run Run -->|exit 0| Verified Run -->|non-zero| Residual Packet --> CP2 Packet --> Evolve Packet --> Refs Verified --> Counters Residual --> Counters CP2 --> Counters Evolve --> Counters Refs --> Counters Rejected --> Result records Counters --> Result records Result records --> Ceiling

Evidence/accounting used for this shape:

  • core/paper_module_capsules.json::paper_modules[44:paper_module.verifier_lab_execution_spine] is the source bundle with source_authority: json_capsule, subjects for component: verifier_lab_execution_spine and mechanism.verifier_lab_execution_spine.validates_public_verifier_transition_witness, resolved code_loci.path: src/microcosm_core/organs/verifier_lab_execution_spine.py, and generated projection statuses available_from_capsule_edges / linked_from_capsule_edges.
  • paper_modules/verifier_lab_execution_spine.json::paper_module_payload.source_row carries the generated copy of that source record; relationships.edges has 19 entries and relationships.unpopulated_selective_relations is empty. This is readback evidence only, not an editable source.
  • core/organ_atlas.json::organs[18] classifies the component as evidence_class: external_subprocess_witness, names the first command, resolves the mechanism edge, and restates that the scope limit is bounded public Lean transition rows only.
  • src/microcosm_core/organs/verifier_lab_execution_spine.py defines the runtime spine: EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, RECEIPT_TRANSPARENCY_CONTRACT, ANTI_CLAIM, validate_source_module_imports, _build_lake_project, _build_result, write_receipts, run, and run_execution_bundle.
  • core/fixture_manifests/verifier_lab_execution_spine.fixture_manifest.json names the fixture inputs, four expected negative cases, stable error codes, generated result record paths, result record field floor, and body_copied_material_count: 5 for the exported body-floor lane.
  • examples/verifier_lab_execution_spine/exported_verifier_lab_execution_spine_bundle/source_module_manifest.json records module_count: 5, body_in_receipt: false, exact-copy digest matches, validation refs, and blocked private/external model service payload bodies.
  • result records/sign-off/first_wave/verifier_lab_execution_spine_fixture_acceptance.json records status: pass, accepted_scope: bounded_public_lean_transition_execution_only, accepted_transition_count: 4, residual_transition_count: 2, zero provider/oracle/proof-body/source-file changes counters, the four observed negative cases, and release_authorized: false.
  • tests/test_verifier_lab_execution_spine.py checks fixture execution, exported-bundle structure, source-module digest blocking, metadata-only result record transparency, and exact public body-floor manifest behavior.

Reader Evidence Routing

A cold-reader audit starts with the module definition and structured source record proof, then moves to the fixture and exported bundle.

Evidence should be read in this order:

  • Bundle proof: core/paper_module_capsules.json::paper_module.verifier_lab_execution_spine and paper_modules/verifier_lab_execution_spine.json.
  • Execution proof: declared command intent, fixture input ref, tool version facts, stdout/stderr classification, validator result record refs, and sign-off result record refs.
  • Bundle proof: exported execution-bundle run and the same command/tool/result record membrane in disposable outputs.
  • Negative boundary proof: missing command intent, missing tool facts, missing result record refs, stale execution facts, proof-authority overclaiming, proof-body export, model-output data export, benchmark solve-rate certification, hosted deployment, and launch-scope decision.

Prior Art Grounding

This component is grounded in reproducible execution and proof-assistant witness patterns. Lean/Lake execution inherits from the small-kernel proof-assistant tradition represented by the Lean theorem prover and by LCF/HOL systems such as HOL Light. Artifact evaluation practice also motivates recording command identity, tool facts, stdout/stderr classification, and result record refs separately from the claim they support.

Microcosm borrows the execution-spine discipline: a command can witness that a bounded tool run happened, but tool output must not become theorem-certification or benchmark authority. It does not expose proof bodies or certify solve rates.

Validation Result record Path

Run from microcosm-substrate:

A green result record proves only bounded execution-spine evidence: command intent, tool facts, stdout/stderr classification, result record refs, and explicit missing-fact failures. It does not establish general proof certification, proof-body safety beyond the fixture membrane, benchmark solve rate, hosted deployment, or launch.

Scope boundary

Scope limit

This paper module can claim the following for the verifier lab execution spine: the component subject resolves, the runtime source locus is named, a diagram view is generated for this module, and an atlas card is generated for this module. It cannot claim general proof certification, Mathlib-dependent proof authority, proof-body safety beyond the fixture membrane, benchmark solve-rate certification, provider authority, source-file changes, hosted deployment, launch-scope decision, publishing-scope decision, or whole-system correctness.

Fixture result records, exported execution-bundle result records, focused tests, command intent, tool-version facts, stdout/stderr classification, result record refs, and missing-fact failures can support only bounded execution-spine evidence. The diagram and atlas views are navigation aids derived from the module definition; they do not promote a tool run into proof certification, benchmark authority, or launch-scope decision.

Scope limit

This paper module describes public execution-spine result records only. It does not establish general proof certification, authorize Mathlib-dependent proof authority, expose private proof bodies, certify benchmark solve rates, use external model services, change source files, include launch operations, or authorize hosted deployment.

Certificate Kernel Execution LabRuns the Lean verifier over a small public proof project and reports which rows it accepted.4/5Runs real tools

Does It builds a small public Lean/Lake project, then runs the Lean verifier over declared "transition" rows that reference a set of generated "certificate" declarations, and writes a structured result record showing which rows the verifier accepted, which it left unresolved or rejected, plus the exact build command, return code, and file hashes. The result record is honest, inspectable evidence that a real Lean verifier ran on public material, with proof text, provider/oracle output, and private source deliberately excluded from the result record and that exclusion recorded (not silently dropped) rather than passed off as evidence.

Scope limit It is a local tool-witness that the declared public fixture rows compiled and were adjudicated by the local Lean verifier; it excludes general proof authority, count oracle/provider output as proof, expose proof text, change source files, claim a benchmark solve-rate, or include launch operations.

Run
microcosm certificate-kernel-execution-lab run --input fixtures/first_wave/certificate_kernel_execution_lab/input --out receipts/first_wave/certificate_kernel_execution_lab

EvidenceExternal tool runevidence 4/5Real runtime result

formal-methodstheorem-provinglean

Source Design note · Source atlas

Paper module Certificate Kernel Execution Lab

Abstract

certificate_kernel_execution_lab is a source-available public runtime refactor of the source certificate-kernel pattern. It runs a small Lean/Lake certificate kernel, generated certificate rows, analyzer metadata, CP2 typed-action reruns, and bounded Evolve policy reruns without importing private proof bodies. The exported bundle also carries copied source body modules from the real Erdos #257 certificate-kernel system: Lean kernel files, generated certificates, the strike runner, toolchain files, and Lean profile result records. The v2 fixture carries both a simple NatSumCertificate row family and a miniature BoundedOrderCertificate family so the public lab is no longer only a single-shape arithmetic result record.

Purpose

This component exists to stop a proof-adjacent claim from resting on prose. The single question it answers is narrow: did a small Lean kernel actually compile and accept the declared certificate rows, here and now, with the command, the return code, and the source hashes on record? Everything else in the page is accounting that keeps the answer honest.

The reduction it relies on is the interesting part. A large class of proof-adjacent facts can be expressed as a finite certificate plus a decidable Boolean checker shaped like validate : Cert -> Bool. The agent is never asked to write a human proof. It is asked to supply the right certificate rows, and Lean decides. The fixture carries two checker families, NatSumCertificate over arithmetic and BoundedOrderCertificate over a bounded modular order, so the sign-off is not a single hard-coded shape. A row counts as accepted only when the runner shells out to lake env lean over a temporary copy of the public project and receives exit code 0.

What is unusual is the weight placed on rejection. Deliberately wrong rows, a missing certificate, a bad arithmetic certificate, a bad bounded-order certificate, must fail through the same real Lean route, in the residual class the fixture predicted. A bundle that can show only green sign-off is treated as a replay artifact, not as certificate-kernel evidence. The runner also keeps the proof channel separate from the language model channel: a transition that can see oracle structured source record or provider hypothesis text is rejected before execution, so a model's confidence can never be quietly counted as a proof. The result record records command identity, counts, and verdicts, and never the proof bodies themselves.

Shape

JSON bundle authorityJSON bundle authorityMarkdownmechanism source rowmechanism source rowcertificate-kernel runtimecertificate-kernel runtimefirst-wave Lean fixturefirst-wave Lean fixtureexported certificate bundleexported certificate bundlecertificate manifestcertificate manifestLean/Lake subprocessLean/Lake subprocessLean analyzer metadataLean analyzer metadatatransition trace rowstransition trace rowsCP2 typed-action rerunsCP2 typed-action rerunsbounded Evolve rerunsbounded Evolve rerunssource-module body floorsource-module body floorpublic readoutpublic readoutmetadata-only result recordsmetadata-only result recordsscope limitscope limit
Diagram source
flowchart TD bundle["JSON bundle authority"] markdown["Markdown reader projection"] mechanism["mechanism source row"] component["certificate-kernel runtime"] fixture["first-wave Lean fixture"] bundle["exported certificate bundle"] manifest["certificate manifest"] lake["Lean/Lake subprocess"] analyzer["Lean analyzer metadata"] transitions["transition trace rows"] cp2["CP2 typed-action reruns"] evolve["bounded Evolve reruns"] source_modules["source-module body floor"] readout["public readout"] result records["metadata-only result records"] ceiling["scope limit"] bundle --> markdown bundle --> mechanism mechanism --> component component --> fixture component --> bundle fixture --> manifest bundle --> manifest manifest --> lake lake --> analyzer lake --> transitions transitions --> cp2 cp2 --> evolve source_modules --> analyzer analyzer --> readout evolve --> result records readout --> result records result records --> ceiling

The module shape is a bounded public certificate-kernel execution witness, not general theorem authority. This page points at the mechanism and runtime component; the runtime validates Lean/Lake command identity, source hashes, generated certificate rows, analyzer metadata, transition traces, CP2 typed-action reruns, bounded Evolve reruns, source-module manifest digests, negative cases, public readout, metadata-only result records, and an scope limit.

Mechanism

The mechanism is a finite-certificate execution reducer. The public entrypoints run and run_certificate_bundle both call _build_result, which loads the certificate lab packet, certificate manifest, Lean project, optional negative fixtures, and optional exported-bundle source manifest before any claim is recorded. The fixture path may run Lean/Lake in a temporary public workspace; the exported-bundle path validates the standalone runtime contract and copied body floor without rerunning private source machinery.

The reducer first establishes source and result record boundaries. _input_paths enumerates the public Lean files and JSON inputs, then scan_paths checks them against core/private_state_forbidden_classes.json. _source_module_manifest_result verifies the exported bundle's nine copied source bodies by material class, target presence, required anchors, and SHA-256 equality; _source_open_body_import_summary turns that manifest into the body floor that result records can cite without carrying proof bodies.

Execution evidence is split into three layers. _build_lake_project runs lake build MicrocosmCertificateLab for the fixture path, while _analyze_lean_project records public Lean imports, declarations, line counts, and hashes with body_in_receipt: false. _execute_transitions then sets certificate transition rows through Lean: accepted rows must return zero, missing or bad certificate rows must fail in the expected residual class, and CP2/Evolve rows must rerun within allowed action and artifact classes instead of mutating arbitrary source.

The negative cases are part of the proof consumer, not examples around it. EXPECTED_NEGATIVE_CASES requires rejection of provider/oracle-visible transition rows, CP2 proof-body leakage, Evolve source-file changes, and non-public source refs in the manifest. The focused regression test tests/test_certificate_kernel_execution_lab.py exercises those refusals, digest mismatch handling, cached command-card economy, public readout generation, and the counters that keep oracle/provider/proof-body/source-file changes at zero.

AUTHORITY_CEILING and RECEIPT_TRANSPARENCY_CONTRACT bind the mechanism back to the lattice relation. The module can claim bounded public fixture and bundle evidence over Lean/Lake command identity, certificate rows, analyzer metadata, transition outcomes, CP2/Evolve reruns, source manifest digests, and metadata-only result records. It cannot claim general theorem authority, provider proof authority, benchmark solve rate, private-body equivalence, source-file changes, launch, or whole-system correctness.

Public Surfaces

  • Component runner: python -m microcosm_core.organs.certificate_kernel_execution_lab run --input fixtures/first_wave/certificate_kernel_execution_lab/input --out receipts/first_wave/certificate_kernel_execution_lab
  • Exported bundle runner: python -m microcosm_core.organs.certificate_kernel_execution_lab run-certificate-bundle --input examples/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle --out receipts/runtime_shell/demo_project/organs/certificate_kernel_execution_lab
  • CLI: microcosm certificate-kernel-execution-lab run --input fixtures/first_wave/certificate_kernel_execution_lab/input --out receipts/first_wave/certificate_kernel_execution_lab
  • Standard: standards/std_microcosm_certificate_kernel_execution_lab.json
  • Fixture manifest: core/fixture_manifests/certificate_kernel_execution_lab.fixture_manifest.json
  • Source-module manifest: examples/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle/source_module_manifest.json

Prior Art Grounding

This component is grounded in proof-carrying and proof-assistant traditions. Necula's Proof-Carrying Code anchors the idea that an untrusted producer can supply a certificate checked by a small trusted verifier. The Lean theorem prover continues the small-kernel proof-assistant lineage, and LeanDojo shows why reproducible Lean environments, premise access, and programmatic proof-state interaction matter for theorem-proving agents.

Microcosm borrows the certificate-kernel discipline: certificate rows, Lean/Lake command identity, return codes, source hashes, transition traces, negative rows, and metadata-only result records must be visible before proof-adjacent language is allowed. It does not claim general theorem proof authority.

Research Bet

This component is the certificate-kernel bet in runnable form: a large class of proof-adjacent facts can be reduced to a finite certificate plus a decidable Boolean checker. The public lab keeps the agent task narrow. It does not ask the agent to synthesize a human proof; it asks for the right certificate rows, then lets Lean/Lake decide whether the checker accepts them.

The toy path uses a Lean certificate kernel shaped like validate : Cert -> Bool and accepts only when Lean can compile and run the declared check. The source-body import path carries the real Erdos #257 source floor: Lean kernel files, generated certificate shards, toolchain files, and profile result records from the Mathlib formalization family. The result record may say "accepted" only when the public runner shells out to Lean/Lake and receives exit code 0 for the declared bundle.

The negative floor is part of the proof, not decoration. Deliberately wrong certificate rows must be rejected by the real Lean route, including arithmetic and bounded-order failures. A bundle that cannot show genuine rejection cases is only a replay artifact, not certificate-kernel evidence.

Source-Backed Doctrine Binding

  • Component: src/microcosm_core/organs/certificate_kernel_execution_lab.py
  • Bundle: core/paper_module_capsules.json#paper_module.certificate_kernel_execution_lab
  • Mechanism: core/mechanism_sources.json#mechanism.certificate_kernel_execution_lab.validates_public_certificate_kernel_execution
  • Standard: standards/std_microcosm_certificate_kernel_execution_lab.json
  • Evidence class: core/organ_evidence_classes.json::certificate_kernel_execution_lab records external_subprocess_witness at rank 4.
  • Source-module manifest: examples/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle/source_module_manifest.json declares nine copied Lean/tool/profile body modules.
  • Runtime result record: receipts/runtime_shell/demo_project/organs/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle_validation_result.json
  • Sign-off result records: receipts/first_wave/certificate_kernel_execution_lab/* and result records/sign-off/first_wave/certificate_kernel_execution_lab_fixture_acceptance.json

Reader Evidence Routing

  • Bundle route: core/paper_module_capsules.json::paper_modules[7:paper_module.certificate_kernel_execution_lab] is the JSON authority row. A diagram view is generated for this module; the Atlas card for this module is staged and will appear once the component-atlas lane completes its binding pass.
  • Mechanism route: core/mechanism_sources.json::mechanism.certificate_kernel_execution_lab.validates_public_certificate_kernel_execution binds the validator command, exported-bundle validator command, focused regression, guardrails, input refs, result record refs, and runtime code locus.
  • Runtime route: src/microcosm_core/organs/certificate_kernel_execution_lab.py owns run, run_certificate_bundle, _source_module_manifest_result, _source_open_body_import_summary, _build_result, _receipt_freshness, build_public_readout, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, SOURCE_MODULE_MANIFEST_NAME, BUNDLE_RESULT_NAME, and CARD_SCHEMA_VERSION.
  • Exported-bundle route: examples/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle is the public runtime bundle. Open source_module_manifest.json before using copied-body counts, then inspect the runtime validation result record and public readout.
  • Focused-test route: tests/test_certificate_kernel_execution_lab.py verifies Lean/Lake execution, analyzer output, transition batching, CP2/Evolve counters, public structured bundle shape, digest mismatch rejection, exact copied source modules, cached command-card economy, transparent metadata-only result records, and the cold-reader public readout.

Cold-Agent Use

Open the source-module manifest first, then the runtime result record, then the component source. The useful claim is not that Microcosm proved the Erdos #257 theorem, solved a benchmark, imported private proof bodies, or gained provider/oracle authority. The useful claim is that Microcosm can force a proof-adjacent story to expose Lean/Lake command identity, return codes, source hashes, declaration counts, certificate rows, transition traces, typed CP2 actions, bounded Evolve reruns, source-module body refs, negative-case result records, and authority counters before certificate-kernel language is allowed.

Re-entry condition: after the sibling organ_atlas.json lane releases, bind this paper-module bundle, mechanism ref, and code locus into the atlas row and rerun python -m microcosm_core.doctrine_lattice --check.

Validation Result record Path

Run the first-wave fixture into disposable result records from the Microcosm root:

Run the exported bundle through the same component:

cd microcosm-substrate
PYTHONPATH=src ../repo-python -m microcosm_core.organs.certificate_kernel_execution_lab run-certificate-bundle --input examples/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle --out /tmp/microcosm_certificate_kernel_execution_lab_bundle
cd microcosm-substrate
../repo-pytest tests/test_certificate_kernel_execution_lab.py -q
cd ..
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Authority Boundary

The lab proves only that the declared public Lean fixture compiled and that the declared transition rows were accepted, rejected, or left residual under the local verifier. The copied source body modules are public source-open body material, but result records cite them only by manifest row, hash, class, count, and required anchor. It does not expose proof text through result records, count oracle/provider output as proof authority, change source files, claim benchmark solve-rate, or include launch operations.

Result record Shape

Result records are public evidence. The lab exposes structured theorem/declaration names, Lean/Lake command identity, return codes, hashes, declaration counts, accepted/residual counts, negative-case ids, CP2 action classes, Evolve policy artifact ids, source-module manifest status, copied body-material counts, authority counters, scope limit, and scope boundary. It omits only proof, provider, oracle-answer, private-source, and stdout/stderr payload bodies, and records that omission through secret_exclusion_scan and body_in_receipt: false rather than treating absence as product evidence.

  • Lean/Lake build result record for MicrocosmCertificateLab.
  • Analyzer metadata for public Lean files: imports, declarations, hashes, and line counts with proof bodies omitted from JSON result records.
  • Transition rows for valid certificates, missing certificate rows, bad generated certificate rows, and bounded order-certificate rows.
  • CP2 typed-action translations over missing-certificate residuals, with Lean reruns proving downstream effect.
  • Bounded Evolve mutations over certificate row selection policy, accepted only after reruns and no leakage regression.
  • Source-open body import rows for the real source certificate-kernel body floor: exact copied targets under source_modules/ai_workflow, source/target hashes, material classes, and provenance anchors, with result record body text forbidden.
Scope boundary

This is a source-available certificate-kernel laboratory with copied source body material, not a private source dump and not general proof authority beyond the declared fixture rows and source-module body refs.

Scope limit

This paper module can claim a certificate-kernel laboratory backed by a structured doctrine row, with a diagram view generated from that row. The Atlas card for this module is staged pending the component-atlas lane's binding pass; that is honest coordination state, not a content gap.

It cannot claim formal-result correctness, benchmark solve rate, private proof body export, provider or oracle authority, source-file changes, publishing-scope decision, launch-scope decision, or whole-system proof authority. The Atlas card must be completed by the owning component-atlas/bundle route and builder regeneration, not by hand-editing Markdown.

Limitations

This module is a bounded public execution witness, not a theorem-proving authority. Its evidence depends on the shipped public Lean/Lake fixture, generated certificate rows, analyzer metadata, CP2 typed-action reruns, bounded Evolve reruns, and copied source-module manifest. A green run proves that this certificate-kernel bundle follows those constraints; it does not establish the Erdos #257 theorem, Mathlib coverage, benchmark solve rate, or correctness of private source proof bodies.

The source-open body floor is intentionally narrow. The exported bundle carries nine copied Lean/tool/profile bodies under source_modules/, and the result records may cite only refs, hashes, material classes, counts, required anchors, and verdicts. Proof bodies, raw tactic scripts, model-output data, oracle answers, private source paths, stdout/stderr bodies, account secrets, and private source-root material remain outside the public result record surface.

The focused regression covers the declared fixture and exported bundle shape. It checks Lean/Lake execution boundaries, analyzer output, transition batching, CP2/Evolve counters, digest mismatch rejection, exact copied source modules, cached command-card economy, transparent metadata-only result records, and public readout shape. It excludes future certificate families, generated Atlas/site public sharing, source-file changes, or public launch without the owning builder and launch lanes.

Source and projection details
Governing Lattice Relation

The bundle places this module under concept.formal_math_and_proof_witness_bundle: proof-adjacent public claims must be reduced to explicit witness artifacts before a reader is allowed to treat them as evidence. In this module, the witness artifacts are the public Lean/Lake subprocess result, generated certificate rows, analyzer metadata, transition traces, CP2/Evolve rerun evidence, copied source-module manifest, and metadata-only result records. Markdown explains that lattice; it does not replace the JSON bundle or the validator result records.

P-3 is the governing principle edge for the module's claim discipline. The runtime does not ask whether a proof story is persuasive; it requires a finite certificate family, a named verifier route, visible command identity, explicit return codes, public-relative refs, and result record transparency. That is why the mechanism row binds run, run_certificate_bundle, _source_module_manifest_result, _source_open_body_import_summary, _build_result, _receipt_freshness, and build_public_readout as the code locus instead of treating the paper module as independent proof evidence.

AX-2 is the hard boundary: public proof language must remain inside the declared certificate-kernel execution evidence. The standard's scope limit keeps formal_proof_authority limited to bounded public fixture rows and keeps external model access, oracle success, source-file changes, private-system equivalence, launch-scope decision, runtime correctness, and whole-system correctness false.

The dependency on paper_module.verifier_lab_execution_spine tells a reader how to interpret the lab. The certificate kernel is one proof-adjacent execution cell inside the verifier-lab spine: it can show accepted/residual transition rows and rerun effects, but it cannot promote those rows into launch, public sharing, benchmark, or theorem-authority claims without the sibling verifier and launch lanes.

Proof / Control / Runtime Import BundleChecks fourteen proof, control, and runtime parts as one unit that rejects every overclaim.5/5

Does This bundle imports the Set-4 proof/control/runtime source modules and checks them as one inspectable unit. It surfaces the 14 mechanisms, the copied module manifest, the digest/anchor evidence, and the negative cases that reject proof, benchmark, launch, runtime, and non-public-state overclaims without exposing source bodies in result records.

Scope limit It validates only a public source-open bundle and bounded negative fixtures; it is not an Erdos #257 solution, not benchmark evidence, not public sharing or launch-scope decision, not live Codex/browser/runtime authority, and not private-system equivalence.

Run
microcosm batch4-proof-authority-runtime run --input fixtures/first_wave/batch4_proof_authority_runtime/input --out receipts/first_wave/batch4_proof_authority_runtime --acceptance-out receipts/acceptance/first_wave/batch4_proof_authority_runtime_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

formal-methodstheorem-provinglean

Source Design note · Source atlas

Paper module Set 4 Proof, Authority, and Runtime Bundle

batch4_proof_authority_runtime is the public source-open evidence membrane for fourteen source mechanisms that are easy to overclaim: proof search, machine-checked mathematics, reasoning-authority fences, completion planning, Codex runtime diagnostics, bitemporal coordination, taskpolicy wrapping, and context-yield attribution.

Purpose

These fourteen mechanisms sit close to claims a reader will want to make on their behalf. A proof-search benchmark looks like solving open problems. A copied CertificateKernel.lean for Erdos #257 looks like a solution. A reasoning-grant fence looks like a live sandbox. The single question this bundle answers is narrow and deliberately so: can each of these mechanisms be shown to a cold reader as copied, anchored, public source, without any of them quietly inheriting an authority it does not have?

The unusual part is how the bundle resists the easy inflation. It does not run the mechanisms; it imports their source bodies, checks each one against named required anchors, and then recomputes a stable negative case per mechanism from that source rather than trusting a fixture to declare its own verdict. For the Erdos #257 row it runs a static token scan over the copied Lean source and rejects sorry, admit, and axiom, so an absent proof obligation cannot be smuggled in. An optional local Lean/Lake compile probe is wired in too, but a pass means only that the copied kernel elaborated without error, and the code records that as a non-authoritative availability signal, never as formal-result correctness.

The result is a membrane, not a flagship. The interesting claim is the one it refuses: source import is made auditable, every result record stays metadata-only, and each tempting stronger statement is forced into a visible scope boundary with the authority delta held at none.

Abstract

batch4_proof_authority_runtime is a technical paper module for the Set 4 proof/authority/runtime bundle. Its positive claim is deliberately narrow: Microcosm imports exact copied source source modules into a public bundle, checks source digests and required anchors, runs bounded fixture and bundle validators, records semantic negative cases, and emits metadata-only result records with explicit scope limits.

This module does not claim formal formal-result correctness. It is not a Lean/Lake execution component, not an Erdos #257 solution, not an official benchmark result, not live sandbox enforcement, not live Codex orchestration, not external model access, not source-file changes, not publishing-scope decision, not launch-scope decision, not private-system equivalence, and not whole-system correctness. Where the paper mentions Lean/Lake, it distinguishes Set 4's static copied-source checks from sibling witness components that actually run local Lean/Lake processes.

Telos

The Set 4 bundle exists to make proof-adjacent runtime claims inspectable without leaking private roots or inflating source import into proof authority. It gathers fourteen mechanism families that otherwise invite overclaiming: strategy-control proof search, prover-skill foundry work, VeriSoftBench harness diagnostics, Erdos #257 certificate-kernel source anchors, Lean packet replay, dry-run authority grants, completion planning, Codex runtime diagnostics, bitemporal coordination, macOS taskpolicy wrapping, and context-yield attribution.

The paper's job is not to make those systems authoritative by prose. Its job is to explain the public result record membrane: what was copied, what was checked, what negative cases were observed, what was omitted from result records, and which scope limit remains in force.

Mechanism Overview

The public fixture manifest names fourteen mechanism rows and one stable negative case per mechanism:

  • lean_strategy_control_benchmark
  • prover_skill_foundry
  • verisoftbench_harness_differential
  • verisoftbench_calibration_executor
  • erdos257_certificate_kernel
  • lean_full_fidelity_packet_verifier
  • reasoning_execution_authority_grant
  • forward_integration_policy_fence
  • closeout_executor_state_machine
  • codex_cdp_driver
  • codex_idle_heartbeat_fsm
  • metabolism_bitemporal_claim_log
  • macos_taskpolicy_actuator
  • context_yield_attribution

The exported bundle contains nineteen exact copied source source modules. Validation checks their manifest rows, SHA-256 digests, line counts, required anchors, and per-mechanism public exercise clauses. Result records carry source refs, digests, anchors, counts, verdicts, negative-case ids, and scope limits; they do not inline copied body text or private runtime state.

Runtime Mechanism

The runtime has two public entry shapes:

  1. run consumes fixtures/first_wave/batch4_proof_authority_runtime/input, evaluates the Set 4 fixture manifest, writes the public result board, and emits sign-off JSON.
  2. validate-bundle consumes examples/batch4_proof_authority_runtime/exported_batch4_proof_authority_runtime_bundle, validates the copied-source manifest, and emits a bundle validation result record.

Both paths enforce the same ceiling. They validate public fixture evidence and copied-source integrity; they do not run providers, dispatch live Codex state, execute a live sandbox, change source files, submit benchmark results, approve public sharing, approve launch, or establish formal-result correctness.

For the Erdos #257 certificate-kernel row, Set 4 performs a static placeholder-token scan over copied Lean source and ties that scan to target-runner anchor evidence. That scan may reject sorry, admit, and axiom mutations in the copied source floor. It is not a Lean proof check and not a certificate that the open problem has been solved.

Diagram

Public fixture manifest14 mechanism rows + 14negative casesPublic fixture manifest 14 mechanism rows + 14 negative casesExported public bundle19 copied source modulesExported public bundle 19 copied source modulesSet 4 runtimerun / validate-bundleSet 4 runtime run / validate-bundlePer-mechanism source checkmodule present + requiredanchors in bodyPer-mechanism source check module present + required anchors in bodyErdos #257 static scanreject sorry / admit / axiomErdos #257 static scan reject sorry / admit / axiomOptional Lean/Lake probecopied kernel elaborates?availability onlyOptional Lean/Lake probe copied kernel elaborates? availability onlyNegative cases recomputedverdict derived from source,not declaredNegative cases recomputed verdict derived from source, not declaredmetadata-only result recordsrefs, digests, anchors,counts, verdictsmetadata-only result records refs, digests, anchors, counts, verdictsScope limitauthority delta = noneScope limit authority delta = noneSibling Lean/Lake componentsactually run local proofsSibling Lean/Lake components actually run local proofs
Diagram source
flowchart TD fixture["Public fixture manifest 14 mechanism rows + 14 negative cases"] bundle["Exported public bundle 19 copied source modules"] runtime["Set 4 runtime run / validate-bundle"] anchors["Per-mechanism source check module present + required anchors in body"] scan["Erdos #257 static scan reject sorry / admit / axiom"] probe["Optional Lean/Lake probe copied kernel elaborates? availability only"] negatives["Negative cases recomputed verdict derived from source, not declared"] result records["metadata-only result records refs, digests, anchors, counts, verdicts"] ceiling["Scope limit authority delta = none"] leanWitness["Sibling Lean/Lake components actually run local proofs"] fixture --> runtime bundle --> runtime runtime --> anchors runtime --> scan scan --> probe runtime --> negatives anchors --> result records scan --> result records probe --> result records negatives --> result records result records --> ceiling leanWitness -. "separate execution evidence" .-> ceiling

The dashed edge is intentional. Lean/Lake subprocess evidence informs the technical boundary, but Set 4 itself does not inherit proof authority from sibling components.

Semantic Negatives And Threat Model

The negative cases are not decoration. They are the public failure floor that prevents a source-import bundle from becoming an unbounded proof or runtime claim. The fixture includes negatives for weak proof skeletons, low-repair foundry promotion, benchmark truth leakage, prefix-answer leakage, Erdos solution overclaim, packet hash corruption, forbidden authority grants, dirty forward integration targets, stale completion heads, absent CDP ports, stale idle snapshots, expired bitemporal claims, missing taskpolicy binaries, and accepted read guards.

The threat model is overclaiming. A green result record must not be interpreted as:

  • a formal proof of a theorem;
  • a solution to Erdos #257;
  • an official benchmark result or leaderboard submission;
  • a live provider, browser, sandbox, Codex, or metabolism run;
  • authorization to change source files, publish, launch, or export private state;
  • evidence that public copied modules are equivalent to a private root.

Result Interpretation

A passing fixture command evidences that the public manifest, mechanism rows, negative cases, result record body scan, and scope limit are internally consistent for the Set 4 fixture. A passing bundle command evidences that the exported copied-source manifest matches expected digests and anchors while keeping result records metadata-only. A passing focused pytest evidences regression coverage for fixture execution, bundle validation, source digest mismatch, mutated Lean body rejection, exact-copy imports, private-body omission, and semantic negative-case evaluation.

These are engineering result records. They are not formal proof certificates. They support public reader confidence in the bundle's source-open evidence membrane; they do not certify theorem truth, benchmark claims, launch-scope decision, or whole-system correctness.

Relationship To Formal-Proof Concepts

Set 4 relates to formal-proof practice through boundary discipline, not through theorem authority. The local concept edge is concept.formal_math_and_proof_witness_bundle: proof-adjacent claims must pass through explicit witness artifacts, source refs, digests, declaration or anchor metadata, negative cases, and metadata-only result records before they become reader evidence.

The sibling formal_math_lean_proof_witness component supplies the small public Lean/Lake witness pattern. The sibling certificate_kernel_execution_lab component supplies the bounded certificate-kernel execution pattern. Set 4 imports and validates copied source-body evidence around those themes, but it keeps the authority delta at none.

This distinction is the main technical result of the paper: a source-open public bundle can be useful without pretending to be a formal proof. It can make evidence auditable, show exactly where a proof-adjacent route stops, and force every tempting stronger claim into a visible scope boundary.

Data And Artifact Availability

The public artifact boundary is the standalone microcosm-substrate root. A cold reader should use the paper module, generated structured source record, standard, fixture manifest, exported bundle manifest, focused test, and metadata-only result records inside that root. Public links and public sharing surfaces must resolve to the public Microcosm system, not private source roots, model-output data stores, browser state, prompt-shelf bodies, or operator-voice material.

Prior Art Grounding

The runtime keeps the authority to act separate from the evidence that an action is permitted. This is the idea behind proof-carrying code (Necula, 1997) and capability-based security, where a request arrives with evidence of its own legitimacy rather than relying on ambient trust. Microcosm borrows the proof-before-authority ordering over fixtures; the result is fixture-bound evidence, not a verified authorization system or launch-scope decision.

Reproducibility Route

Run these commands from microcosm-substrate/ when validating this module without changing durable generated projections:

The projection checks for the broader paper-module corpus remain:

PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check

The direct runtime commands and focused pytest are the minimum useful validation.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

PYTHONPATH=src python3 -m pytest tests/test_batch4_proof_authority_runtime.py -q
PYTHONPATH=src python3 scripts/build_doctrine_projection.py --check-paper-module-corpus

These are reader-verifiable evidence only and do not include launch operations, external model access, source-file changes, or whole-system correctness.

Scope boundary

Source Authority And Projection Boundary

The source authority for this paper-module identity is the JSON source record:

  • core/paper_module_capsules.json::paper_modules[77:paper_module.batch4_proof_authority_runtime]
  • generated structured source record: paper_modules/batch4_proof_authority_runtime.json
  • local standard: standards/std_microcosm_batch4_proof_authority_runtime.json
  • runtime locus: src/microcosm_core/organs/batch4_proof_authority_runtime.py
  • focused validator: tests/test_batch4_proof_authority_runtime.py

It may explain the source record, the generated relationship set, and the validation route, but it does not mint new subject edges, proof authority, Mermaid authority, Atlas authority, or launch status. Future relationship changes belong in the source record plus builder regeneration, not in hand-authored Markdown.

Lean/Lake Witness Boundary

Set 4 should be read as the import/result record bundle, not as the Lean/Lake executor. Actual local Lean/Lake subprocess evidence lives in sibling public components:

  • formal_math_lean_proof_witness runs a tiny public Lean/Lake fixture and exported witness bundle, records local tool availability, build status, declaration metadata, four negative-case observations, and metadata-only result records. Its scope limit is toy public witness evidence only; it rejects Mathlib, Aesop, and Batteries authority unless a wider authority plane is introduced.
  • certificate_kernel_execution_lab runs a bounded public certificate-kernel lab through Lean/Lake machinery, records command identity, transition rows, accepted/residual counts, copied-source manifest status, negative cases, and metadata-only result records. Its scope limit is bounded certificate-kernel evidence, not general theorem authority.

Therefore the correct reading is layered:

  • Set 4 validates source-open source-body import, static placeholder scanning, authority-boundary fields, and semantic negatives.
  • The Lean/Lake witness components validate that specific public fixtures can route through local Lean/Lake subprocesses under their own ceilings.
  • None of these pages, individually or together, claim arbitrary formal-result correctness, Mathlib-dependent proof authority, benchmark claims, Erdos #257 solution status, publishing-scope decision, launch-scope decision, or private-system equivalence.
Public/Private Boundary

Allowed public material:

  • mechanism ids, source-module ids, negative-case ids, and stable error codes;
  • exact copied source modules in the exported public bundle;
  • source refs, SHA-256 digests, line counts, required anchors, and bounded outcomes;
  • scope limits, scope boundaries, and metadata-only validation verdicts.

Forbidden public material:

  • keys, account secrets, browser state, account or browser state, model-output data bodies, browser UI live-access material, live Codex state exports, live metabolism DB exports, private runtime state, source notes, prompt-shelf bodies, theorem work-product bodies, raw command-output bodies, public sharing operation state, and official benchmark submission state.

The exported bundle may contain approved copied source modules. The result records are stricter: they identify copied modules by refs, digests, anchors, classes, counts, and verdicts, not by inlining source bodies.

Limitations

The current module has these hard limits:

  • Set 4 does not execute Lean/Lake; it performs static checks over copied source and validates public manifest evidence.
  • Static placeholder-token scanning is bounded evidence checking.
  • Digest and anchor equality do not prove semantic equivalence to a private root.
  • Negative-case coverage is finite and fixture-bound.
  • metadata-only result records improve public safety, but they are not a substitute for formal proof review.
  • Generated Mermaid, Atlas, and JSON structured source record are projections; they do not create source authority.
  • Accepted-component status means accepted current public result record inventory for this verified source-body import, not launch, public sharing, benchmark, or theorem authority.
Scope limit

This module may claim fixture-bound public source-body import, exact copied source-module digest checks, required-anchor checks, static placeholder-token scan evidence, dry-run authority-boundary evidence, semantic negative-case evidence, and metadata-only result record discipline.

It may not claim theorem success, Lean formal-result correctness, Erdos #257 solution status, official benchmark claims, live sandbox enforcement, live Codex orchestration, external model access, source-file changes, publishing-scope decision, launch-scope decision, private-system equivalence, or whole-system correctness.

Proof Derived Governed Mutation AuthorizationChecks a synthetic change-authorization record for its proof-and-approval chain, bound to a real commit.5/5

Does Replays a make-believe example of "should this change be allowed to run?" and shows, step by step, why each proposed action was permitted. All three actions (a look-only inspection, a small config write, and an undo of that write) had to carry proof evidence and two visible policy approvals before anything was admitted; on top of that, the two actions that actually change something (the config write and the undo) also had to show a logged record of the change and a matching undo result record. Just holding a password or account secret is never treated as permission, and nothing here touches a real account or makes any real change.

Scope limit It validates only a declared, synthetic governed-mutation contract and excludes live cloud/account action, standing account secrets, source or irreversible mutation, policy-after-execution, hidden votes, external model access, benchmark-score claims, or launch.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.proof_derived_governed_mutation_authorization run --input fixtures/first_wave/proof_derived_governed_mutation_authorization/input --out receipts/first_wave/proof_derived_governed_mutation_authorization --acceptance-out receipts/acceptance/first_wave/proof_derived_governed_mutation_authorization_fixture_acceptance.json

EvidenceContract validatorevidence 5/5Import validation

formal-methodstheorem-provinglean

Source Design note · Source atlas

Paper module Proof-Derived Governed Mutation Authorization

proof_derived_governed_mutation_authorization is the public mutation-authority replay component for showing that a mutation proposal cannot grant itself authority. It validates a synthetic governed-mutation bundle where read-only inspection, scoped config write, and rollback proposals are admitted only when proof cells, visible pre-execution policy verdicts, side-effect logs, rollback result records, cold replay, negative cases, non-public-state scan, and scope limits line up.

This module is source-backed public doctrine, not the source of authority. The source rows are the JSON bundle, mechanism registry row, component atlas binding, standard contract, fixture, exported bundle, component source module, and result records named below. Markdown remains an authored projection over those rows.

Purpose

The component answers one question: can a mutation proposal acquire the authority to change something just by asserting that it should? In an agent system the danger is an action that grants itself permission, for example by claiming a standing account secret, by recording a governance-vote nobody can see, or by reporting success after the fact. This fixture is the boundary that refuses each of those moves.

Authorisation here is derived, not asserted. A proposal is admitted only when an independent chain resolves: redacted proof cells that name validator result records, at least two visible policy verdicts evaluated before any execution identity is minted, a logged side-effect diff for write and rollback proposals, a paired rollback result record, and a cold-replay result record. The validator recomputes an evidence-chain hash from those resolved rows and rejects the proposal if the declared hash does not match. Impressive language, an admin-looking identity, or a final answer that says it worked all fail on their own.

The less obvious part is the anti-bake gate. Passing the synthetic chain is not enough: every authorised proposal must also bind to a real repository record, a concrete git commit that the validator resolves with a git subprocess and checks touched this component's own source or its focused test. The validator then re-derives the proof, policy, and rollback refs from the evidence indices and compares them to what the record declares. A fixture cannot pre-bake its answer, because the answer is reconstructed from real commit scope and the resolved rows rather than read from the file. The fixture admits exactly three synthetic proposals (read-only inspection, scoped config write, rollback) and rejects eight named overclaims; none of this grants any live mutation authority.

Shape

  • Subject: proof_derived_governed_mutation_authorization, with mechanism mechanism.proof_derived_governed_mutation_authorization.validates_synthetic_governed_mutation_authorization.
  • Runtime locus: src/microcosm_core/organs/proof_derived_governed_mutation_authorization.py, especially run, run_authorization_bundle, validate_mutation_proposals, validate_proof_evidence_cells, validate_policy_verdicts, validate_side_effect_ledger, validate_rollback_receipts, validate_cold_replay, _source_module_manifest_result, _source_open_body_import_summary, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING.
  • The positive fixture admits exactly three synthetic proposals: read-only inspection, scoped config write, and rollback.
  • Every admitted proposal must bind intent bundle refs, proof-cell validator result records, visible pre-execution policy verdicts, ephemeral execution identity refs, an evidence-chain hash, cold replay refs, and an scope limit.
  • Write and rollback proposals also need logged side-effect diff refs and a paired rollback result record before authorization.
  • The exported bundle imports six copied source bodies through source_module_manifest.json and validates them by exact-copy digest evidence without exporting source body text in result records.
matchmismatchreal record boundunbound or baked3 synthetic proposals:read-only, scoped write,rollback3 synthetic proposals: read-only, scoped write, rollbackvalidator-backed proof refsvalidator-backed proof refs2+ visible verdictsbefore execution identity2+ visible verdicts before execution identitylogged diff for write /rollbacklogged diff for write / rollbackpaired rollback result recordpaired rollback result recordcold rerun per proposalcold rerun per proposalRecompute evidence-chain hashdeclared == derived?Recompute evidence-chain hash declared == derived?real repo record + git commitrefreal repo record + git commit refAnti-bake gategit commit touched thissource/test?re-derived refs matchdeclared?Anti-bake gate git commit touched this source/test? re-derived refs match declared?6 copied source bodiesverified by digest6 copied source bodies verified by digest8 negative casesstanding account secret,hidden vote,policy-after-execution, ...8 negative cases standing account secret, hidden vote, policy-after-execution, ...metadata-only result recordsresult, board, validation,sign-offmetadata-only result records result, board, validation, sign-offscope limitno account secrets, livemutation, provider,source-file changes, hosting,public sharing, or launchscope limit no account secrets, live mutation, provider, source-file changes, hosting, public sharing, or launchEvidenceEvidence

Source refs

3 synthetic proposals: read-only, scoped write, rollback
mutation_proposals.json
validator-backed proof refs
proof_evidence_cells.json
2+ visible verdicts before execution identity
policy_verdicts.json
logged diff for write / rollback
side_effect_ledger.json
paired rollback result record
rollback_receipts.json
cold rerun per proposal
cold_replay.json
real repo record + git commit ref
governed_mutation_records.json
6 copied source bodies verified by digest
source_module_manifest.json
Diagram source
flowchart TD Proposals["mutation_proposals.json 3 synthetic proposals: read-only, scoped write, rollback"] subgraph Evidence["Resolved evidence chain"] ProofCells["proof_evidence_cells.json validator-backed proof refs"] Policies["policy_verdicts.json 2+ visible verdicts before execution identity"] Effects["side_effect_ledger.json logged diff for write / rollback"] Rollbacks["rollback_receipts.json paired rollback result record"] Replay["cold_replay.json cold rerun per proposal"] end Hash{"Recompute evidence-chain hash declared == derived?"} Records["governed_mutation_records.json real repo record + git commit ref"] AntiBake{"Anti-bake gate git commit touched this source/test? re-derived refs match declared?"} SourceManifest["source_module_manifest.json 6 copied source bodies verified by digest"] Negatives["8 negative cases standing account secret, hidden vote, policy-after-execution, ..."] Result records["metadata-only result records result, board, validation, sign-off"] Ceiling["scope limit no account secrets, live mutation, provider, source-file changes, hosting, public sharing, or launch"] Proposals --> Evidence Evidence --> Hash Hash -->|match| AntiBake Hash -->|mismatch| Negatives Records --> AntiBake AntiBake -->|real record bound| Result records AntiBake -->|unbound or baked| Negatives SourceManifest --> Result records Negatives --> Result records Result records --> Ceiling

How it works

Take the scoped config write proposal. To be admitted it must carry the fourteen required fields, including proof_cell_refs, policy_verdict_refs, policy_evaluated_before_execution, side_effect_class, evidence_chain_hash, and cold_replay_ref. The validator then checks each one against the other input files rather than trusting the proposal's own summary.

For the proof refs it confirms each cell names the same proposal, carries evidence refs and validator-result record refs, is body-redacted, and does not export a proof body. For the policy refs it counts how many verdicts are visible to the result record, are not hidden votes, read allow or warn, and resolve back to a proof cell for that proposal. Fewer than two visible resolving verdicts blocks the proposal under GOV_MUT_CONSENSUS_WITHOUT_EVIDENCE. Because a scoped write has a reversible side effect, it also needs a logged diff ref in the side-effect ledger and a passing rollback result record for the same proposal. A write or rollback proposal with no rollback ref is rejected as an irreversible mutation.

The validator then recomputes the evidence-chain hash. It hashes the resolved proof digests, policy digests, side-effect ref, rollback ref, and cold-replay ref together and compares the result to the proposal's declared evidence_chain_hash. A mismatch fails the proposal, so the hash cannot be a hand-written constant. Only after the synthetic chain resolves does the real-record gate run. The governed-mutation record must declare a repo record class, a forty-character-or-shorter hex commit ref, and source refs covering git, mission-transaction, work-landing, and ledger material. The validator shells out to git to confirm the commit exists and that its changed files include this component's source module or its focused test, and it re-derives the proof, policy, and rollback refs from the indices so the record's claims must match independently computed values. An authorised proposal whose proposal id is not in the accepted real-record set is downgraded to blocked. The result is that a green result record requires three synthetic proposals, three real records bound to real commits, and a matching anti-bake status, none of which a static fixture can fake.

Public Contract

  • The source pattern is proof_derived_governed_mutation_authorization_compound.
  • The fixture lives at fixtures/first_wave/proof_derived_governed_mutation_authorization/input/.
  • The runtime example lives at examples/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle/.
  • The validator is microcosm_core.organs.proof_derived_governed_mutation_authorization.
  • The governing standard is standards/std_microcosm_proof_derived_governed_mutation_authorization.json.
  • The component model row is core/organ_atlas.json#proof_derived_governed_mutation_authorization.
  • The sign-off row is core/organ_registry.json#proof_derived_governed_mutation_authorization.

The fixture has three positive proposals: read-only inspection, scoped config write, and rollback. Every admitted proposal must cite an intent bundle, scope limit, proof cell, visible policy verdicts, ephemeral execution identity, evidence-chain hash, and cold replay ref. Write and rollback proposals also require logged side-effect diff refs and a verified rollback result record paired before the mutation is admitted.

Source-Backed Mechanism

The mechanism row mechanism.proof_derived_governed_mutation_authorization.validates_synthetic_governed_mutation_authorization points at these runnable source loci:

  • run and run_authorization_bundle for fixture and exported-bundle entry.
  • validate_mutation_proposals, validate_proof_evidence_cells, validate_policy_verdicts, validate_side_effect_ledger, validate_rollback_receipts, and validate_cold_replay for the authorization predicate.
  • _source_module_manifest_result and _source_open_body_import_summary for digest-verified copied source-body evidence without body text in result records.
  • EXPECTED_NEGATIVE_CASES and AUTHORITY_CEILING for falsification and scope boundary enforcement.

The exported governed-mutation bundle imports six source bodies through examples/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle/source_module_manifest.json. Those bodies are copied into source_modules/ with digest provenance:

  • state/microcosm_portfolio/extracted_patterns_ledger.jsonl
  • state/microcosm_portfolio/reconstruction/high_novelty_substrate_gap_scout_v1.json
  • tools/meta/control/mission_transaction_preflight.py
  • tools/meta/control/scoped_commit.py
  • tools/meta/factory/work_ledger.py
  • system/lib/work_landing_status.py

Result records may report module ids, refs, counts, classes, hashes, and verdicts. They may not duplicate source body text, proof bodies, governance-vote bodies, model-output data, account secrets, account refs, or live access material.

Reader Evidence Routing

  • Open standards/std_microcosm_proof_derived_governed_mutation_authorization.json for required witnesses, negative-floor classes, denied authority, result record expectations, validator contract, and source refs.
  • Open core/fixture_manifests/proof_derived_governed_mutation_authorization.fixture_manifest.json for positive fixture inputs, eight negative fixtures, body-import summary, durable result record refs, and source-open omission rules.
  • Open examples/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle/source_module_manifest.json before inspecting copied source modules; result records carry refs, hashes, counts, and verdicts, not copied source body text.
  • Open tests/test_proof_derived_governed_mutation_authorization.py for the focused assertions on proposal counts, negative cases, source-module digest mismatch, public-relative redaction, and card result record reuse.
  • Run the fixture or exported-bundle route from microcosm-substrate/. The CLI supports --card, but it does not expose a --json flag.
  • Use scripts/build_doctrine_projection.py --check-paper-module-corpus to verify this Markdown projection still satisfies the shared paper-module coverage contract.

First Commands

From microcosm-substrate/, a cold agent can refresh the fixture result records with:

The exported bundle validator proves the copied source-body floor without writing durable result records:

PYTHONPATH=src python3 -m microcosm_core.organs.proof_derived_governed_mutation_authorization run-authorization-bundle --input examples/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle --out /tmp/microcosm-proof-derived-governed-mutation --card

Evidence Result records

  • receipts/first_wave/proof_derived_governed_mutation_authorization/proof_derived_governed_mutation_authorization_result.json
  • receipts/first_wave/proof_derived_governed_mutation_authorization/proof_derived_governed_mutation_authorization_board.json
  • receipts/first_wave/proof_derived_governed_mutation_authorization/proof_derived_governed_mutation_authorization_validation_receipt.json
  • receipts/first_wave/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle_validation_result.json
  • receipts/runtime_shell/demo_project/organs/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle_validation_result.json
  • result records/sign-off/first_wave/proof_derived_governed_mutation_authorization_fixture_acceptance.json

Current result record evidence records three proposals, three authorized synthetic mutations, three proof cells, six visible policy verdicts, two logged side effects, two rollback passes, three cold replay passes, no missing negative cases, private_state_scan.status=pass, and body_in_receipt=false for copied source source modules.

Negative Cases

The fixture rejects the eight named negative cases in core/fixture_manifests/proof_derived_governed_mutation_authorization.fixture_manifest.json: standing account secret authority, policy-after-execution, hidden governance-vote, live cloud account secret, irreversible mutation, unlogged side effect, consensus without evidence, and final-answer-only success.

These negative fixtures are the security argument. A proposal with impressive language, an admin-looking identity, hidden votes, post-hoc approvals, or a final answer that says it succeeded still fails unless the public evidence tables resolve to the authorization predicate.

Prior Art Grounding

The governed-mutation shape is grounded in admission-control and policy-as-code practice: a proposed state change is evaluated before it mutates the system, and the decision is separate from the actor's own assertion. The closest public anchors are Open Policy Agent, which separates policy decision-making from enforcement over structured input, and Kubernetes admission controllers, which validate or mutate API requests before persistence.

The rollback and side-effect portions are also adjacent to controlled rollout practice, including feature-flag and canary-launch patterns described by Martin Fowler. Microcosm keeps the pattern synthetic and replay-only: the component validates visible policy verdicts, side-effect logs, rollback result records, and cold replay without granting live mutation authority.

Public Scope

This component is a synthetic, public, source-open replay. It validates fixture and exported-bundle result records plus copied source bodies with digest provenance. The replay stays inside local files and does not use standing account secrets, access live cloud or account systems, use external model services, change source files, expose private proofs, expose policy-vote bodies, or claim benchmark safety.

Validation Result record Path

./repo-pytest tests/test_proof_derived_governed_mutation_authorization.py -q --basetemp=/tmp/microcosm_proof_derived_governed_mutation_authorization_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

This paper module can claim backed reader wiring for the synthetic governed-mutation replay: component and mechanism subjects resolve, the runtime source locus is named, and diagram and atlas views are generated for this module. It cannot claim live mutation authority, standing account secrets, cloud or account access, irreversible approval, source-file changes permission, provider authority, proof-body export, benchmark safety, launch-scope decision, hosted deployment, publishing-scope decision, or whole-system correctness.

Fixture result records, exported-bundle result records, focused tests, and source-copy digests can support only the bounded replay claim: synthetic proposal admission, proof-cell refs, visible policy verdicts, side-effect logs, rollback result records, cold replay refs, negative cases, and body-hygiene behavior. The diagram and atlas views are navigation aids derived from the module bindings; they do not expand the proof boundary.

Agent reliability & safety (17)

Agent Benchmark Integrity Anti Gaming ReplayValidates a synthetic benchmark-integrity record and flags the contamination cases it declares.3/5

Does Checks a public benchmark-integrity example bundle that contains three copied source pattern provenance bodies under source_artifacts. The component verifies the source-module manifest digests, requires each replay row to cite those copied source artifacts, recomputes pass/quarantine verdicts from contamination, file-access, and locked-evaluator spans, and rejects common gaming attempts such as peeking at hidden answers, training on the test set, exposing the oracle patch, cherry-picking the best of many tries, or asserting a score. It still does not run real bug fixes or claim any benchmark claims.

Scope limit It authorizes only bounded public runtime validation over copied source-open pattern provenance bodies and metadata-only benchmark-integrity replay rows; it does not establish any benchmark or SWE-bench score, agent capability, external model service, live-repo mutation, private/oracle/hidden-gold body access, product progress, or launch-scope decision.

Run
microcosm agent-benchmark-integrity-anti-gaming-replay run-benchmark-integrity-bundle --input examples/agent_benchmark_integrity_anti_gaming_replay/exported_benchmark_integrity_bundle --out .microcosm/agent_benchmark_integrity_anti_gaming_replay

EvidenceComputed projectionevidence 3/5Source-faithful refactor

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module Agent Benchmark Integrity Anti-Gaming Replay

This module is the public Microcosm projection of the rule that agent benchmark claims must be replay-backed before they are score-backed. It carries copied source-open source pattern provenance bodies for the benchmark-integrity pattern row and reconstruction state, plus a metadata-only regression integrity component. It is not a benchmark runner or product-progress claim.

The fixture models a repository repair benchmark with public case ids, task and patch hashes, locked evaluator ids, evaluator config hashes, file-access log refs, contamination-check refs, trusted-reference score refs, output-replay refs, held-out guard ids, and body_in_receipt=false rows. It deliberately keeps issue bodies, oracle patch bodies, hidden-gold answers, model-output data, and live repository paths out of the public boundary.

The exported bundle includes source_module_manifest.json and source_artifacts/ copies of the source pattern provenance rows from state/microcosm_portfolio. The validator verifies those copied bodies by manifest digest and keeps body text out of result records.

Purpose

Agent benchmark numbers are easy to state and hard to trust. A single headline like "passes N percent of repository repair tasks" hides every decision that produced it: which evaluator ran, whether its configuration was frozen, whether the agent could see held-out answers, whether the test cases leaked into training, and whether one lucky attempt was promoted as the score. This component exists to answer one question before any of that language is allowed: can each claimed pass be replayed from public refs that name their evaluator, their configuration hash, and the evidence that the run was not gamed?

A positive result cannot be asserted. A replay row that simply declares integrity_pass is recomputed from scratch. The validator checks that the evaluator id is on a locked list, that the configuration hash is one the policy declared in advance, that file-access, contamination, and output-replay evidence artifacts exist and pass, and that the case id was registered up front. If any of those is missing or contradicted, the row is recomputed as quarantine regardless of what it declared. Declaring success is treated as the thing to be checked, not as the proof.

There is a further floor: an integrity_pass must be backed by a sanitised real command-run trace, not only by hand-written replay refs. Each row cites a real_benchmark_trace_ref that has to resolve to a copied artifact carrying a passing focused pytest run for this component, with sha256 digests bound to the recorded command-run id and an explicit list of omitted live material (model-output data, account secrets, private issue bodies, oracle patch bodies). The point is to stop a benchmark claim from resting on prose. The evidence has to trace back to a command that actually ran and is reproducible from public refs, while the private and live material that command touched stays out of the public boundary.

This is a discipline fixture, not a leaderboard. It proves that a metadata-only replay respected an anti-gaming boundary over public case ids and locked evaluator refs. It never reports a score, a SWE-bench result, or a capability claim, and the eleven negative cases below are there to demonstrate the boundary holding rather than to advertise a number.

Technical Mechanism

The component turns a benchmark claim into a replay-verification problem. Its inputs are the projection protocol, locked evaluator policy, benchmark case roster, replay observations, exported bundle manifest, source-module manifest, and copied source_artifacts/ rows. _build_result loads those inputs, validates source-module imports, scans public inputs and copied source bodies against the non-public-state forbidden-class policy, checks projection protocol density, validates the locked evaluator policy, validates the case roster, and then validates each replay row against the same public boundary.

A positive replay cannot pass by declaring success. The replay row must name a case id present in benchmark_cases.json, cite a locked evaluator id, carry an evaluator config hash allowed by locked_evaluator_policy.json, expose file-access, contamination-check, trusted-reference, and output-replay refs, and cite source-artifact evidence refs that match the exported source-module manifest targets. Each of those evidence refs must resolve to a metadata-only benchmark_integrity_evidence_artifact_v1 artifact bound to the same replay, case, evaluator, and config hash, with file-access marked passed, contamination flags clear, a trusted reference present without a claimed score, and an output replay that is not final-answer-only grading. The validator recomputes whether each row is integrity_pass or quarantine; missing refs, unregistered cases, unlocked or mutated evaluators, score authorization, private issue bodies, oracle patch bodies, hidden-gold access, model-output data, pass-k cherry-picking, and misleading tests force quarantine or a blocking finding.

A further gate is the real-trace floor. Every positive replay row also cites a real_benchmark_trace_ref, and that ref must resolve to a copied source-module artifact whose material_class is public_sanitized_real_benchmark_trace. The validator opens that artifact and checks that it records a completed, exit-zero command run of the focused pytest for this component, carries a passing pytest summary, binds sha256 digests for the command metadata, stdout, and stderr to a declared command-run id, cites state/command_runs/ source refs for that id, and declares the omission of model-output data, account secrets, private issue bodies, and oracle patch bodies. A replay whose real_benchmark_trace_ref is missing, unverified, or not also listed in the source-artifact evidence refs cannot stand as a pass. This is what stops a benchmark claim from resting on hand-authored refs alone: the integrity verdict has to trace back to a command that actually ran and is reproducible from public refs.

The copied body floor is verified separately from the public result record. The source-module manifest must declare copied_non_secret_macro_body material, public source pattern body classes, body_in_receipt=false, and digest-stable targets. validate_source_module_imports checks that each manifest row points to an existing copied artifact and that its recorded SHA-256 digest matches disk. Result records and command cards then omit the bodies and carry only ids, refs, digests, classes, counts, verdicts, findings, and scope limits.

The public trace is a second proof pass rather than a display copy of replay rows. build_public_benchmark_integrity_anti_gaming_trace recomputes each span from locked-evaluator status, contamination signals, file-access refs, contamination-check refs, trusted-reference refs, and declared quarantine reasons. The expected public fixture has three spans: two recompute as integrity_pass, one recomputes as quarantine, and the trace must agree with the declared replay verdicts before the component can return status=pass.

Named Proof Consumers

  • run consumes the first-wave fixture and writes the result, board, validation result record, sign-off result record, and metadata-only command card. It is the proof consumer for the canonical fixture boundary and required negative-case floor.
  • run-benchmark-integrity-bundle consumes the exported public bundle and proves that source-open body imports, bundle shape, manifest digests, and metadata-only result record/card rules survive outside the fixture directory.
  • tests/test_agent_benchmark_integrity_anti_gaming_replay.py is the focused regression consumer. It asserts negative-case observation, digest verification, source-artifact evidence refs, public trace verdict recomputation, positive/negative verdict handling, metadata-only result records, bundle runtime shape, and command-card reuse of a fresh result record.
  • A cold reader consumes this Markdown only after checking the JSON bundle, generated JSON instance, exported source manifest, case roster, replay observations, focused test path, and scope limit. The reader may verify the replay boundary but must not infer a benchmark claims, provider behavior, product-progress state, public sharing state, or launch-scope decision.

Shape

JSON bundle authorityJSON bundle authorityMarkdownProtocolProtocolsource refs and result recorddensitysource refs and result record densityManifestManifestmaterial class and digestgatematerial class and digest gatecopied public sourceprovenance bodiescopied public source provenance bodiessanitised real command-runtracepassing pytest, sha256digests,declared omissionssanitised real command-run trace passing pytest, sha256 digests, declared omissions3 public case ids3 public case idscase roster and requiredreplay refscase roster and required replay refslocked evaluator policylocked evaluator policylocked ids and config hasheslocked ids and config hashes3 replay observations3 replay observationsper-ref evidence artifactsfile-access, contamination,trusted reference, outputreplayper-ref evidence artifacts file-access, contamination, trusted reference, output replayrecompute integrity_pass orquarantinerecompute integrity_pass or quarantinepublic trace verdictrecomputationpublic trace verdict recomputation2 integrity_pass and 1quarantine2 integrity_pass and 1 quarantine11 anti-gaming fixtures11 anti-gaming fixturesquarantine or blockingfindingquarantine or blocking findingmetadata-onlynon-public-state scanmetadata-only non-public-state scanmetadata-only integrityresult recordmetadata-only integrity result recordanti-score scope limitanti-score scope limit

Source refs

Protocol
projection_protocol.json
Manifest
source_module_manifest.json
Diagram source
flowchart LR Bundle["JSON bundle authority"] --> Markdown["Reader projection"] Protocol["projection_protocol.json"] --> ProtocolGate["source refs and result record density"] Manifest["source_module_manifest.json"] --> DigestGate["material class and digest gate"] DigestGate --> Bodies["copied public source provenance bodies"] DigestGate --> RealTrace["sanitised real command-run trace passing pytest, sha256 digests, declared omissions"] Cases["3 public case ids"] --> ReplayGate["case roster and required replay refs"] Policy["locked evaluator policy"] --> EvaluatorGate["locked ids and config hashes"] Replays["3 replay observations"] --> ReplayGate EvaluatorGate --> ReplayGate ProtocolGate --> ReplayGate ReplayGate --> EvidenceGate["per-ref evidence artifacts file-access, contamination, trusted reference, output replay"] EvidenceGate --> Recompute["recompute integrity_pass or quarantine"] RealTrace --> Recompute Recompute --> Trace["public trace verdict recomputation"] Trace --> Verdicts["2 integrity_pass and 1 quarantine"] Negatives["11 anti-gaming fixtures"] --> Quarantine["quarantine or blocking finding"] Bodies --> PrivateScan["metadata-only non-public-state scan"] RealTrace --> PrivateScan Verdicts --> Result record["metadata-only integrity result record"] Quarantine --> Result record PrivateScan --> Result record Result record --> Ceiling["anti-score scope limit"]

The page shape is a bounded replay spine, not a benchmark leaderboard. A reader starts at the JSON bundle, follows the source-open manifest into three copied public source provenance bodies, then checks the public case roster, locked evaluator policy, replay observations, recomputed trace verdicts, and metadata-only result records. The output is an integrity-boundary verdict: two public case replays pass the boundary, one public case replay is quarantined, and no score or hidden-gold authority is created.

Reader Evidence Routing

  • Bundle route: read core/paper_module_capsules.json::paper_modules[3], then the generated JSON instance, before treating this Markdown as explanatory projection.
  • Bundle route: read examples/agent_benchmark_integrity_anti_gaming_replay/exported_benchmark_integrity_bundle/source_module_manifest.json for module_count=3, body_in_receipt=false, copied body refs, digest refs, and the explicit secret-exclusion boundary.
  • Case route: read benchmark_cases.json for repo_issue_public_001, repo_issue_public_002, and repo_issue_public_003; the rows expose ids, hashes, splits, and held-out guard ids, not issue bodies or oracle patches.
  • Replay route: read replay_observations.json for the locked evaluator ids, config hashes, file-access refs, contamination refs, trusted-reference refs, output-replay refs, and the two integrity_pass plus one quarantine verdict pattern.
  • Runtime route: run tests/test_agent_benchmark_integrity_anti_gaming_replay.py when the reader needs recomputation evidence. The focused tests assert source-module digest verification, public trace verdict recomputation, required negative cases, and metadata-only result record boundaries.

Public Mechanics

  • A replay cannot pass unless the evaluator id and config hash are locked.
  • A replay row cannot pass unless its case id appears in the declared benchmark_cases.json roster.
  • File-access logs, contamination checks, trusted references, and output replay refs are required before any benchmark-style language can be considered.
  • Train/test leakage, hidden-gold access, oracle patch bodies, model-output data, final-answer-only grading, pass-k cherry-picking, misleading tests, private issue bodies, unregistered case replays, and score overclaims are quarantine cases.
  • integrity_pass is evidence that a metadata-only regression replay respected the boundary, not evidence of a SWE-bench score, live agent capability, or product-spine system progress.
  • Result records expose ids, refs, verdicts, counts, negative cases, and scope limits only.
  • Source body imports expose source pattern provenance artifacts in the bundle, with result records limited to refs, digests, classes, and validation status.

Prior Art Grounding

This component is grounded in the long-running observation that optimized metrics can become targets and lose evidential force, plus the AI-safety literature on reward hacking and specification gaming. Concrete Problems in AI Safety frames reward hacking as a practical accident-risk problem, DeepMind's specification-gaming survey collects concrete examples of agents satisfying a proxy in the wrong way, and benchmark-contamination work such as Benchmarking Benchmark Leakage in Large Language Models motivates explicit leakage and benchmark-use documentation.

Microcosm borrows the anti-gaming accounting pattern: evaluator ids, config hashes, case rosters, file-access logs, contamination checks, trusted-reference refs, and replay refs must be present before benchmark-style language is allowed. It does not report or imply a model score.

Validation Result records

The focused proof consumer is tests/test_agent_benchmark_integrity_anti_gaming_replay.py. A passing result record has to show that the fixture and exported-bundle validators recompute benchmark-integrity replay from public case ids, locked evaluator ids, config hashes, file-access refs, contamination-check refs, trusted-reference refs, output-replay refs, source-module manifest digests, and negative-case rows rather than trusting declared benchmark language.

PYTHONDONTWRITEBYTECODE=1 ./repo-pytest \
  tests/test_agent_benchmark_integrity_anti_gaming_replay.py \
  -p no:cacheprovider
./repo-python scripts/build_doctrine_projection.py \
  --check-paper-module-corpus

For the focused test, the result record boundary is the asserted shape: three public case ids, three replay rows, two recomputed integrity_pass rows, one quarantine row, three public trace spans, locked-evaluator and config-hash coverage, three copied source-module imports, nine source-artifact evidence refs, three verified source-artifact evidence refs, body_in_receipt=false, and negative cases for verdict mismatch, invalid declared verdict, evaluator config hash swaps, missing replay/source evidence, digest mismatches, manifest boundary violations, hidden-gold/oracle/provider/score overclaims, and unsafe command-card body reuse. For the corpus check, the result record only proves bundle/instance parity; it does not create benchmark claims, product-progress, provider, public sharing, or launch-scope decision.

Validation Result record Path

Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:

Then run the exported bundle validator:

cd microcosm-substrate && PYTHONPATH=src ../repo-python -m microcosm_core.organs.agent_benchmark_integrity_anti_gaming_replay run-benchmark-integrity-bundle --input examples/agent_benchmark_integrity_anti_gaming_replay/exported_benchmark_integrity_bundle --out /tmp/agent_benchmark_integrity_bundle_receipt --card > /tmp/agent_benchmark_integrity_bundle_card.json

The focused regression test and corpus projection checks are:

cd microcosm-substrate && ../repo-pytest tests/test_agent_benchmark_integrity_anti_gaming_replay.py
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

This module may claim only that the public fixture and exported bundle preserve a metadata-only benchmark-integrity replay boundary: public case ids, locked evaluator refs, config hashes, contamination refs, output-replay refs, manifest digests, negative cases, and scope limits are recomputed or checked.

It must not claim benchmark performance, SWE-bench score, provider capability, hidden-gold access, oracle patch access, private issue access, live repository mutation, publishing-scope decision, product-progress evidence, or launch-scope decision.

Scope boundary

This module does not claim benchmark performance, run providers, expose private issue or oracle patch bodies, access hidden-gold answers, mutate live repositories, publish results, host a benchmark, or include launch operations.

Source and projection details
Source-Open Body Floor

The standard treats the bundle source_module_manifest.json as the body-row authority for three copied source pattern provenance bodies: benchmark_integrity_extracted_pattern_ledger_row_body_import, benchmark_integrity_high_novelty_growth_receipt_body_import, and benchmark_integrity_deterministic_pattern_order_body_import.

Those rows stay in source_artifacts/; result records and workingness/status cards carry refs, digests, classes, counts, and scope limits only. The body floor is accepted as regression-negative fixture evidence, not as a benchmark claims, SWE-bench performance claim, hidden-gold export, provider authority, live repository mutation authority, product-progress evidence, public sharing, or launch-scope decision.

Governing Lattice Relation

The bundle binds this page to mechanism.agent_benchmark_integrity_anti_gaming_replay.validates_public_benchmark_integrity_replay, the agent_reliability_and_safety_validator_bundle concept, provisional principles P-1 and P-2, provisional axiom AX-1, and the paper_module.mission_transaction_work_spine dependency. Within that lattice, the mechanism is an evidence-before-score gate: benchmark-style language has no paper authority unless the source record, copied-source manifest, locked policy, case roster, replay observations, public trace, negative-case floor, and metadata-only result records agree.

The governing concept is accountability for validator bundles, not public leaderboard construction. The principle/axiom ceiling is enforced as a refusal surface: private issue bodies, hidden-gold answers, oracle patch bodies, model-output data, source-file changes, live repository mutation, publishing-scope decision, product-progress evidence, and launch-scope decision remain false even when the replay fixture passes.

Cold Evaluation Honesty BundleRuns a copied route-quality simulator and checks its all-B scorecard against the original code.5/5

Does This component imports the real cold_eval.py route-quality simulator as an exact source copy. Running it over a synthetic workspace inspects the all-B scorecard shape, source-module digest evidence, and scope limit checks without exporting body text in result records or turning the fixture into benchmark truth.

Scope limit verified cold-eval source body import only, not a live benchmark, navigation truth, source authority, external model access, private-system equivalence, public sharing, or launch-scope decision

Run
microcosm batch10-cold-eval-honesty-capsule run --input fixtures/first_wave/batch10_cold_eval_honesty_capsule/input --out receipts/first_wave/batch10_cold_eval_honesty_capsule --acceptance-out receipts/acceptance/first_wave/batch10_cold_eval_honesty_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module Set 10 Cold Eval Honesty Bundle

Purpose

batch10_cold_eval_honesty_capsule answers one narrow question: can the public Microcosm copy of the source cold_eval.py route-quality simulator run over a synthetic workspace, expose its measured scorecard shape, and refuse to promote that shape into a benchmark or navigation-truth claim?

The useful evidence is deliberately small. A green run means the copied source body executed, the all-B.idea_first_packet winner shape was recomputed from fixture rows, and the scope limit blocked benchmark, hosted-readiness, and launch language. It does not say idea-first routing wins in the live system.

Shape

Public cold-eval workspace(tasks, navigation packets)Public cold-eval workspace (tasks, navigation packets)Copied cold_eval.py runnerCopied cold_eval.py runnerArm A: flat repo entry(README, quickstart,pyproject)Arm A: flat repo entry (README, quickstart, pyproject)Arm B: idea-first packet(entry packet, atlas, index)Arm B: idea-first packet (entry packet, atlas, index)Score each task bydeclared route refs covered(refs scored, never injected)Score each task by declared route refs covered (refs scored, never injected)Winner per task,idea-first win countWinner per task, idea-first win countScorecard shape auditall-B win + route asymmetry+ no non-public refsScorecard shape audit all-B win + route asymmetry + no non-public refsScope limit gateinjection off, forbiddenbenchmark/launch claims namedScope limit gate injection off, forbidden benchmark/launch claims namedmetadata-only result recordand cardmetadata-only result record and card
Diagram source
flowchart TD A["Public cold-eval workspace (tasks, navigation packets)"] --> B["Copied cold_eval.py runner"] B --> A1["Arm A: flat repo entry (README, quickstart, pyproject)"] B --> A2["Arm B: idea-first packet (entry packet, atlas, index)"] A1 --> SC["Score each task by declared route refs covered (refs scored, never injected)"] A2 --> SC SC --> W["Winner per task, idea-first win count"] W --> C["Scorecard shape audit all-B win + route asymmetry + no non-public refs"] C --> D["Scope limit gate injection off, forbidden benchmark/launch claims named"] D --> E["metadata-only result record and card"]

Prior Art Grounding

This component is grounded in evaluation-transparency and benchmark-hygiene practice: scorecards should expose what was measured, what fixture assumptions were injected, and what claims the result can and cannot support. Useful anchors include:

  • HELM, which frames model evaluation as a transparent, scenario-bound benchmark surface rather than a single global capability claim.
  • Model Cards for Model Reporting, which established the pattern of pairing performance results with intended use, limitations, and caveats.

Microcosm borrows the scorecard-plus-limitations shape, then narrows it to a deterministic route-quality fixture. The all-B.idea_first_packet winner row is accounting evidence for this fixture only; it is not promoted into navigation truth, hosted readiness, or launch-scope decision.

Reader Evidence Routing

Read the scorecard as evidence accounting, not as a leaderboard. The fixture intentionally creates a public workspace where the idea-first packet wins. The component then checks that the expected-ref injection policy is off, that non-public refs are not present, and that forbidden claims are named in the manifest.

The honesty of that win turns on one design choice in the copied scorer. Each task lists the route refs an answer should reach, but those expected refs are only ever used to *score* coverage. They are never added to either arm's route, so neither arm is handed the answer. Arm A is scored on the refs a flat reader reaches from README.md, docs/quickstart.md, and pyproject.toml. Arm B is scored on the refs the navigation packets actually declare. The scoring policy is named in every row as declared_route_refs_no_expected_ref_injection_v1, and every row carries expected_ref_injection_used: false. The idea-first arm wins because the entry packets genuinely declare more of the relevant files, not because the scorer leaked the target into the route. That distinction is the difference between a measured route-quality result and a rigged one, and the scope limit gate reports blocked rather than pass if the injection flag is ever turned on.

The engine ids are:

  • cold_eval_original_runner: dynamically loads the copied source body and runs run_cold_eval in a temporary public workspace.
  • cold_eval_scorecard_shape_audit: verifies the all-B winner shape and records visible route-surface asymmetry without upgrading it into proof.
  • cold_eval_claim_ceiling_gate: checks expected-ref injection policy and forbidden benchmark/launch claims.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

The fixture command writes the route-quality scorecard result record and sign-off JSON. The bundle command validates copied source source, source manifests, metadata-only cards, expected-ref injection policy, and private-ref negative cases. The focused test covers missing tasks, flat-route wins, expected-ref injection, private fixture refs, and the no-benchmark/no-launch scope limit.

This result record path is reader-verifiable evidence only. It does not establish live benchmark results, navigation truth, hosted readiness, launch-scope decision, external model access, source-file changes, or whole-system correctness.

Scope boundary

Scope limit

This module may claim public fixture evidence that the copied cold_eval.py source body executed over the synthetic workspace, the expected scorecard shape was recomputed, expected-ref injection was refused, non-public refs were excluded, negative fixtures were checked, metadata-only cards were emitted, and validation result records enforced the listed scope limit.

This module may not claim live benchmark results, navigation truth, hosted readiness, route-quality superiority, external model access, deployment posture, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness.

Scope limit

Fixture-bound route-quality scorecard and copied source refs only; no live benchmark, navigation truth, hosted readiness, launch-scope decision, external model access, source-file changes, or whole-system correctness.

Validator Checker BundleRuns the real validator code over public examples so its safety checks stay inspectable.5/5

Does This component imports the real idea_microcosm validators.py body as an exact source copy. Running it shows status-policy judging, private-boundary scans, specimen checks, launch-gate checks, and the validate entrypoint exercised against public fixtures and negative cases.

Scope limit It validates only the imported validators.py source body and its checker membrane. It does not claim source authority, a full validator-suite proof, private-system equivalence, launch, hosted-public status, public sharing, external model access, or source-file changes.

Run
microcosm batch8-validator-checker-capsule run --input fixtures/first_wave/batch8_validator_checker_capsule/input --out receipts/first_wave/batch8_validator_checker_capsule --acceptance-out receipts/acceptance/first_wave/batch8_validator_checker_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module Set 8 Validator Checker Bundle

Role

This module imports the real self-indexing-cognitive-system/src/idea_microcosm/validators.py body into Microcosm and exercises individual checker functions that were not covered by the earlier status-judge-only import.

Purpose

An earlier import brought across only one entry point from validators.py, the status-judge function. That left most of the validator body imported as text but never actually run. This bundle answers a single question: when the real checker functions are invoked, do they still behave the way their names claim? It picks six groups of checkers from the copied body and runs them, rather than asserting from a distance that the file is correct.

The groups are chosen to span the kinds of judgement the validator makes: whether a status policy blocks a poisoned transition, whether the private boundary scanner finds a planted home path and email address, whether the specimen and launch-gate checkers report zero failures on the existing fixture, and whether the no-write validate(root, write_receipt=False) entry point runs without mutating anything. Each group reaches into a different part of the imported body.

The design choice worth noting is what happens when the private source state is not present. In that case the component does not pretend the checkers passed. It falls back to reading the copied source for the named anchors and marks the remaining engines public_runtime_source_only, recording that as a stated limit rather than a hidden success. The second unusual choice is that the negative cases are judged from the engine outputs themselves, so a check cannot pass merely because a fixture file happens to contain the right error string. Both choices exist to stop a green run from claiming more than it observed.

Prior Art Grounding

This bundle borrows from schema validation, fixture-driven testing, and policy/checker separation. Useful anchors include:

  • JSON Schema, as a general pattern for declaring structural expectations and validating data instances against them.
  • pytest fixtures, as a common test pattern for isolating public inputs and expected negative cases.
  • Open Policy Agent, as a prior art pattern for separating policy evaluation from the application code that invokes it.

Microcosm borrows the validator/checker and fixture-negative-case shape, but keeps this component to bounded checker exercises over copied public source. It is not launch-scope decision, hosted-public proof, source-file changes, or a complete validator-suite proof.

Imported system

  • self-indexing-cognitive-system/src/idea_microcosm/validators.py

Technical Mechanism

The runtime does not ask the reader to trust the phrase "validator checker." It builds a small checker membrane around a single imported source body and then records how far that membrane reaches.

The source-anchor phase reads examples/batch8_validator_checker_capsule/exported_batch8_validator_checker_capsule_bundle/source_module_manifest.json. That manifest declares one exact copied module under the public bundle-relative locus source_modules/self-indexing-cognitive-system/src/idea_microcosm/validators.py, with a 12,747-line body and digest 4b2d44810cb9db2c5f62fd39da55deb7f20f6bd44ed1a8b0ae4324d38012a1d4. Here the root segment is a manifest-included public synthetic Microcosm root. The private source-root path is lineage-only and remains excluded from public copy; the checker validates the copied bundle body, not live private source. _validator_source_anchor_matrix checks that the copied body still contains the named validator anchors: private_boundary_hits, policy_wellformedness_failures, judge_status_request, _status_collapse_suite_failures, _source_shuttle_specimen_failures, and validate(root: Path).

The checker-exercise phase then runs six bounded engines when source state is available: source anchoring, status-policy judging, private-boundary scanning, specimen checker groups, launch-gate checker groups, and the no-write validate(root, write_receipt=False) witness. In exported-bundle mode, where a public runtime should not import private source state, the same component falls back to copied-source anchor evidence and marks the remaining engines as public_runtime_source_only. That fallback is a scope limit, not a hidden pass-through to private state.

The negative-case phase is semantic rather than fixture-string-only. The component declares six failure modes: missing validator source, policy poisoning, blind private-boundary scanning, missing specimen checkers, missing launch gates, and bypassing the validate entrypoint. evaluate_negative_case observes those cases from the engine outputs, so the tests can prove the negative cases move with runtime evidence instead of passing because a fixture file contains the right error code.

The result record phase uses the shared crown-jewel runner to write result, board, validation, and sign-off artifacts, then result_card deliberately compresses them into an authority floor and body floor. Those card fields keep release_authorized, publication_authorized, provider_dispatch, model_dispatch, source_mutation_authorized, full_validator_suite_freshness_claim, public_clone_or_hosting_authority, and test_completeness_proof false while also preserving body_in_receipt: false.

Shape

Fixture input or exportedbundleFixture input or exported bundleSource manifest validationSource manifest validationExact copied validators.pydigest and required anchorsExact copied validators.py digest and required anchorsSource state available?Source state available?Six runtime checker enginesSix runtime checker enginesCopied-source anchors plussource-only witnessesCopied-source anchors plus source-only witnessesSemantic negative-caseevaluatorSemantic negative-case evaluatorCrown-jewel result, board,validation, sign-off resultrecordsCrown-jewel result, board, validation, sign-off result recordsResult card authority_floorand body_floorResult card authority_floor and body_floorReader claim: bounded checkermembrane, not launch-scopedecisionReader claim: bounded checker membrane, not launch-scope decision
Diagram source
flowchart TD A["Fixture input or exported bundle"] --> B["Source manifest validation"] B --> C["Exact copied validators.py digest and required anchors"] C --> D{"Source state available?"} D -- "yes" --> E["Six runtime checker engines"] D -- "no" --> F["Copied-source anchors plus source-only witnesses"] E --> G["Semantic negative-case evaluator"] F --> G G --> H["Crown-jewel result, board, validation, sign-off result records"] H --> I["Result card authority_floor and body_floor"] I --> J["Reader claim: bounded checker membrane, not launch-scope decision"]

Doctrine Relation

The generated JSON row binds this page to mechanism.batch8_validator_checker_capsule.validates_public_validator_checker_capsule and concept.agent_reliability_and_safety_validator_bundle; that relation is bundle-declared rather than inferred from this prose. The bundle also names the axiom refs AX-1, AX-4, AX-5, AX-7, AX-8, AX-11, and AX-12 and the principle refs P-1, P-2, P-5, P-6, P-8, P-9, P-13, and P-15. In this module those refs matter because the component separates evidence from authority, keeps JSON as the navigable contract, prevents body leakage, and refuses to promote a selected checker run into a launch or proof claim.

The dependency edges also explain the reader route. microcosm_axiom_substrate owns the axiom vocabulary this module abides by; engine_room_generated_projection_drift_gate owns the generated-projection freshness posture this page must not bypass; and public_reveal_walkthrough owns the reading lane for result records, source refs, and scope boundaries.

Evidence Model and Limitations

The strongest positive evidence is narrow and useful: the focused regression checks that all expected engines are present, the exact copied source body matches the source source digest, exported-bundle validation does not import source validators, source-anchor corruption blocks validation, result cards omit private bodies, and semantic negative cases fail when runtime evidence is weakened.

The limitations are just as important. Exported-bundle mode validates copied source anchors and public-runtime witness fields; it does not re-run the full source validator suite. The fixture proves selected checker groups and selected negative cases, not all future validator behavior. The copied source body being large does not itself increase the claim; only the named anchors, engines, digests, negative cases, and result record fields are evidence. A green run therefore supports a bounded checker-membrane claim and nothing broader.

Reader Evidence Routing

  • Bundle route: read core/paper_module_capsules.json::paper_modules[65] before treating this Markdown as explanation.
  • Generated route: inspect paper_modules/batch8_validator_checker_capsule.json for current relationship state and projection details.
  • Bundle route: inspect examples/batch8_validator_checker_capsule/exported_batch8_validator_checker_capsule_bundle for copied validator source refs and digest evidence.
  • Runtime route: run tests/test_batch8_validator_checker_capsule.py and the commands in ## Validation Result record Path.

Exercised checker groups

  • Policy well-formedness and status transition judging.
  • Private boundary scanning without putting private body text into result records.
  • Status collapse, internal control, correction, self-comprehension, task-ledger, and atlas navigation specimen checkers.
  • launch standards, source bundle, source shuttle, concurrency, native guard, and launch-root compiler gate checkers.
  • The no-write validate(root, write_receipt=False) entrypoint.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

The fixture command writes the bounded validator-checker result record and sign-off JSON. The bundle command validates copied checker source, manifest digests, selected checker-group exercises, body-exclusion scans, and scope limit fields. The focused test checks fixture validation, bundle validation, private-boundary scanning, and the no-complete-suite-proof scope limit.

This result record path is reader-verifiable evidence only. It does not establish the complete validator suite, authorize source-file changes, provide hosted-public proof, dispatch providers, authorize public sharing, or approve launch.

Scope boundary

Scope limit

The bundle is not launch-scope decision, not hosted-public proof, not source-file changes, and not a complete validator-suite proof.

Scope limit

This paper module can claim a bounded validator/checker fixture with a diagram view and Atlas navigation generated for it. It can explain the declared checker groups, no-write validation entrypoint, and metadata-only result record boundary.

It cannot claim launch-scope decision, hosted-public proof, source-file changes, complete validator-suite proof, publishing-scope decision, provider authority, or whole-system correctness. Any broader checker claim must be grounded in the JSON bundle and its generated projection.

Agent Monitor Redteam Falsification ReplayReplays honest and deceptive agent runs and flags any verdict missing its declared backing evidence.3/5

Does It replays three made-up agent trajectories (a normal one, one trying to hide a failure, and one overstepping a tool's permissions) and checks that every monitor verdict — including a clean "pass" — comes with the result records needed to re-run and verify it. The result records show exactly which checks each verdict depends on, and any verdict missing its backing evidence gets quarantined instead of trusted. It is a demo over fictional data, not a measurement of how well a real monitor catches anything.

Scope limit Bounded public runtime validation over copied source pattern bodies, sanitized dogfood trace slices, recomputed monitor-verdict spans, source-artifact evidence refs, digest/metadata-only/non-public-state gates, and negative cases only; no live agent execution, monitor product performance, control-eval score, safety-validation, benchmark, provider-call, source-file changes, launch, public sharing, or product authority.

Run
microcosm agent-monitor-redteam-falsification-replay run-monitor-bundle --input examples/agent_monitor_redteam_falsification_replay/exported_monitor_redteam_bundle --out receipts/runtime_shell/demo_project/organs/agent_monitor_redteam_falsification_replay

EvidenceComputed projectionevidence 3/5Source-faithful refactor

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module Agent Monitor Redteam Falsification Replay

agent_monitor_redteam_falsification_replay is a synthetic regression fixture for one honesty claim: that a monitor verdict was earned by evidence rather than asserted. It replays public trajectory metadata and checks that every "the monitor caught it" sentence is tied to a probe, an escalation, a mitigation, and a metadata-only result record before it counts.

Purpose

It is easy to say a safety monitor flagged a suspicious agent action. It is harder to show that the verdict followed from the evidence and would survive a rerun. This component answers one question: if a monitor claim is replayed against its own declared evidence, does the claim still hold, or does it fall apart?

The part worth noticing is that the page does not trust the verdict it is given. A separate builder, build_public_monitor_redteam_falsification_trace, independently recomputes two things from the trajectory spans: whether each declared coverage label is actually backed by an adversarial probe, and what the monitor verdict should be given the span's severity, escalation, and mitigation evidence. If that recomputed verdict disagrees with the verdict the fixture declared, the run blocks. So the contract is not "does the row have the right fields" but "does an independent reading of the evidence reach the same conclusion". The failure mode it guards against is the one that quietly defeats most monitoring write-ups: a verdict label that reads well but is not supported by the trajectory it claims to summarise.

The replay is deliberately narrow. It works over three synthetic trajectory cases and seven negative cases, on public metadata and metadata-only result records. It is not a live monitor, a control-evaluation score, an exploit guide, or evidence about how well any real monitor performs. It only proves that, for these fixtures, a monitor claim cannot pass without surviving the disagreement check.

Abstract

This module is the public Microcosm technical paper for a monitor/redteam honesty contract. It specifies a synthetic replay fixture that treats a monitor verdict as an evidence object to falsify: the verdict must be tied to a trajectory hash, suspicious-span ref, adversarial-probe ref, escalation ref, mitigation ref, body-omission ref, cold-replay ref, and source-artifact digest boundary before it can support any monitor-coverage sentence.

The result is a safety/reliability proof about evidence shape, not a live monitor product, control-eval score, exploit guide, product-spine system, or complete-security claim. The technical contribution is the proof contract: a replay can show how a monitor claim would fail under semantic negatives and perturbations without exporting private reasoning, internal source bodies, model-output data, live agent traffic, or launch-scope decision.

Telos

The fixture models three public trajectory cases: a benign control, a suppressed failure attempt, and a tool-authority boundary attempt. Each case must carry a trajectory hash, monitor id, suspicious span ref, adversarial probe ref, verdict, severity, escalation ref, body-omission ref, mitigation ref, and cold replay ref before the claim is admitted.

The point is to make "the monitor caught it" a rerunnable, falsifiable claim. A benign pass proves only that the benign route remains admissible. An escalation proves that a suppressed-failure attempt retained enough evidence for human escalation. A block proves that the tool-authority boundary attempt was rejected under the public policy vocabulary. None of those outcomes proves live monitor performance; together they prove that this fixture's evidence contract catches missing probes, missing source evidence, private-body leakage, and verdict drift.

Monitor-Falsification Mechanism

The component turns a monitor claim into a falsification replay. Its inputs are the projection protocol, monitor policy, trajectory cases, monitor observations, negative-case files, exported bundle manifest, and source-module manifest. _build_result loads those inputs, validates the projection protocol density, checks the monitor policy vocabulary, checks trajectory case floors, validates each observation against required refs and source-artifact evidence, scans public inputs and copied source artifacts for forbidden non-public-state classes, and folds in a public trace recomputation before writing result records.

A positive observation cannot pass by carrying a verdict label alone. The policy requires a trajectory id, monitor id, coverage label, suspicious action span ref, adversarial probe ref, monitor verdict, severity tier, human escalation ref, body-omission ref, mitigation diff ref, and cold replay ref. For positive observations, source-artifact evidence refs must match the targets declared by the exported source-module manifest. Missing fields, unsupported verdicts, unverified source evidence, private chain-of-thought export, internal-code export, exploit-instruction material, account secret material, live agent traffic, monitor product-performance claims, and coverage without a probe become quarantine or blocking findings.

The exported body floor is checked separately from the result record. The source-module manifest admits two source artifacts: a copied public source pattern body for the monitor-redteam compound row, and a sanitized public dogfood safety-evals trace slice. validate_source_module_manifest requires the declared module count to match the rows, requires body_in_receipt=false, rejects body text in result records, resolves each target path, and verifies every declared digest field against disk. Result records and cards then expose ids, refs, digests, counts, verdicts, findings, and ceilings, not copied bodies.

The public trace is the anti-echo pass. build_public_monitor_redteam_falsification_trace recomputes whether a coverage label is backed by an adversarial-probe span, then derives the monitor verdict from severity, escalation, and mitigation evidence. In the first-wave fixture, the three spans recompute to one pass, one escalate, and one block. If a declared verdict no longer matches the span-derived verdict, validate_public_trace records PUBLIC_TRACE_MONITOR_REDTEAM_VERDICT_MISMATCH and the component blocks.

Named Proof Consumers

  • run consumes the first-wave fixture and writes result, board, validation, sign-off, and metadata-only command-card result records. It is the proof consumer for the three synthetic trajectory cases and seven required negative cases.
  • run-monitor-bundle consumes the exported public bundle and proves that the bundle manifest, source-module manifest, copied/sanitized source artifacts, four bundle observations, digest checks, non-public-state scan, and metadata-only card path remain valid outside the fixture directory.
  • tests/test_agent_monitor_redteam_falsification_replay.py is the focused regression consumer. It asserts digest verification, sanitized dogfood trace slicing, workingness exposure of source-body imports, negative-case semantic floors, public trace verdict recomputation, source-evidence ref enforcement, public-relative metadata-only result records, manifest boundary rejection, and fresh result record reuse.
  • A cold reader consumes this Markdown only after checking the JSON bundle, generated JSON instance, exported source manifest, trajectory and observation fixtures, public trace recomputation path, focused tests, and scope limit. The reader may verify the replay boundary but must not infer live monitor performance, control-eval score, exploit guidance, complete security, public sharing, or launch-scope decision.

Semantic Negative and Perturbation Proof

The replay proves its boundary through negative and perturbation cases, not by trusting the positive fixture. The negative floor includes private chain-of-thought leakage, internal code export, exploit-instruction material, account secret material, live agent traffic, monitor product-performance claims, and coverage claims without adversarial probes. Those cases are expected to block or quarantine because they violate the public-source/metadata-only contract, not because they resemble production attacks.

The perturbation proof is the public trace recomputation. A declared verdict is accepted only when the public span evidence derives the same pass, escalate, or block result. Removing the adversarial-probe span, weakening the escalation or mitigation refs, changing the severity/verdict relation, or declaring source evidence that is absent from the exported source-module manifest turns the replay into a mismatch or blocking finding. That is the technical safety property: the paper authority lives in disagreement detection between declared monitor coverage and independently recomputed public evidence.

Shape

JSON bundle authorityJSON bundle authorityMarkdownManifestManifest2 source-artifact digestgates2 source-artifact digest gatesthree trajectory casesthree trajectory casespublic monitor policypublic monitor policythree observationsthree observationsrequired refs: span, probe,escalation, mitigation,replayrequired refs: span, probe, escalation, mitigation, replaypublic trace verdictrecomputationpublic trace verdict recomputationpass / escalate / blockpass / escalate / block7 negative semantic floors7 negative semantic floorsmetadata-onlynon-public-state scanmetadata-only non-public-state scanmetadata-only validationresult recordmetadata-only validation result recordscope limitsscope limits

Source refs

Manifest
source_module_manifest.json
Diagram source
flowchart TD bundle["JSON bundle authority"] markdown["Markdown reader projection"] manifest["source_module_manifest.json"] digest["2 source-artifact digest gates"] cases["three trajectory cases"] policy["public monitor policy"] observations["three observations"] required["required refs: span, probe, escalation, mitigation, replay"] trace["public trace verdict recomputation"] verdicts["pass / escalate / block"] negatives["7 negative semantic floors"] privateScan["metadata-only non-public-state scan"] result record["metadata-only validation result record"] ceiling["scope limits"] bundle --> markdown manifest --> digest digest --> observations cases --> required policy --> required required --> observations observations --> trace trace --> verdicts negatives --> result record digest --> privateScan verdicts --> result record privateScan --> result record result record --> ceiling

The module's shape is evidence narrowing.

Reader Evidence Routing

  • Bundle route: core/paper_module_capsules.json::paper_modules[33] is the bundle-backed authority row, and paper_modules/agent_monitor_redteam_falsification_replay.json is the generated paper-module instance.
  • Source-module route: examples/agent_monitor_redteam_falsification_replay/exported_monitor_redteam_bundle/source_module_manifest.json records two admitted source artifacts with body_in_receipt: false: the public source-pattern JSON slice with digest sha256:89792add1e2f03a09c40f64d19c1ac0a54d62c053aabe11ecad0a9846a54cf33, and the sanitized public dogfood safety-evals trace slice with digest sha256:88493225f908f3f8892d187370d30231cb4e292b43bed202b462b6a6888e1eb0.
  • Trajectory route: trajectory_cases.json carries the three public trajectory hashes sha256:monitor-redteam-benign-route-review, sha256:monitor-redteam-suppressed-failure, and sha256:monitor-redteam-tool-authority-escape.
  • Observation route: monitor_observations.json binds obs_benign_route_review_pass to monitor.synthetic.route-review.v1 and a pass verdict, obs_suppressed_failure_escalate to monitor.synthetic.failure-suppression.v1 and an escalate verdict, and obs_tool_authority_escape_block to monitor.synthetic.tool-authority.v1 and a block verdict. Each observation carries a body-omission ref and cold replay ref.
  • Runtime route: src/microcosm_core/organs/agent_monitor_redteam_falsification_replay.py and tests/test_agent_monitor_redteam_falsification_replay.py verify manifest digest visibility, negative-case coverage, required observation fields, public trace recomputation, and metadata-only result record boundaries.

Public Mechanics

  • A monitor claim cannot pass unless the observation includes a verdict and the probe, escalation, mitigation, body-omission, and replay refs that make the verdict rerunnable.
  • Coverage labels require adversarial probe refs; benign-only trajectories do not authorize coverage language.
  • Private reasoning, internal code, exploit-detail, account secret, live-traffic, product-performance, and coverage-without-probe cases are expected falsification fixtures.
  • Result records expose ids, refs, verdict counts, negative cases, body_in_receipt: false, non-public-state scan, and scope limits only.

Prior Art Grounding

This component is grounded in model red-team and behavior-discovery work that treats monitor claims as things to falsify with adversarial probes. Anthropic's Red Teaming Language Models to Reduce Harms is a close procedural anchor for eliciting harmful or unwanted behavior, and Discovering Language Model Behaviors with Model-Written Evaluations anchors the idea that evaluation prompts can surface behavior classes worth tracking. More recent sabotage and control-evaluation work reinforces the same shape: monitors need adversarial trajectories, suspicious spans, escalation paths, and negative cases, not just benign examples.

Microcosm borrows the falsification accounting pattern. A monitor verdict needs trajectory hashes, probe refs, suspicious-span refs, escalation refs, mitigation refs, replay refs, and body-omission result records before coverage language is allowed. It does not claim a live monitor product or control-eval score.

Evidence Contract Summary

The evidence contract has four gates:

  1. Trajectory gate: each monitor observation must cite a trajectory hash, monitor id, suspicious-span ref, adversarial-probe ref, verdict, severity, escalation ref, body-omission ref, mitigation ref, and cold-replay ref.
  2. Source-body gate: the exported source-module manifest names the admitted copied/sanitized public source artifacts, requires matching digests, and keeps body_in_receipt: false.
  3. Falsification gate: semantic negatives and public trace recomputation reject private-body leakage, unsupported source evidence, missing probes, unsupported verdicts, and declared/recomputed verdict mismatch.

A valid paper claim must pass all four gates and still inherit the limitations above.

Validation Result record Path

Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:

Then run the exported bundle validator:

cd microcosm-substrate && PYTHONPATH=src ../repo-python -m microcosm_core.organs.agent_monitor_redteam_falsification_replay run-monitor-bundle --input examples/agent_monitor_redteam_falsification_replay/exported_monitor_redteam_bundle --out /tmp/agent_monitor_redteam_bundle_receipt --card > /tmp/agent_monitor_redteam_bundle_card.json

The focused regression test and corpus projection checks are run from the repo root:

PYTHONDONTWRITEBYTECODE=1 PYTHONPYCACHEPREFIX=/tmp/mc_agent_monitor_pyc ./repo-pytest tests/test_agent_monitor_redteam_falsification_replay.py -q -p no:cacheprovider --basetemp=/tmp/mc_agent_monitor_bt
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

The validation ceiling remains synthetic monitor falsification replay only.

Scope boundary

Limitations and Scope limit

This module may claim public fixture evidence that trajectory hashes, synthetic monitor ids, suspicious-span refs, adversarial-probe refs, verdict labels, escalation refs, mitigation refs, body-omission refs, cold replay refs, negative-case labels, manifest digests, and validation result records are checked by the listed runtime witnesses.

This module may not claim a live monitor product, control-eval score, production monitoring, exploit guidance, private reasoning export, internal source export, live agent traffic, provider behavior, complete security, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness.

Scope boundary

This module does not run live agents, use external model services, expose private chain-of-thought, export internal code, provide exploit instructions, include account secrets, import live agent traffic, claim monitor product performance, claim control-eval scores, change source files, publish results, or include launch operations.

Scope limit

This module may claim fixture-bound evidence that the component ran over public synthetic inputs and produced the result records and projections described above, reproduced by the validation result records named on this page.

It may not claim more than its bundle scope limit allows: Synthetic replay fixture only; not a live monitor product, control eval, exploit guide, product-spine system, or complete-security claim.

Source and projection details
Governing Lattice Relation

The bundle binds this page to mechanism.agent_monitor_redteam_falsification_replay.validates_public_monitor_verdict_receipts, the agent_reliability_and_safety_validator_bundle concept, provisional principles P-1 and P-2, provisional axiom AX-1, and the agent_monitor_redteam_falsification_replay runtime locus. Within that lattice, the mechanism is an evidence-before-monitor-claim gate: monitor coverage language has no paper authority unless the source record, policy, trajectory roster, observations, source-artifact manifest, negative-case semantic floors, public trace, and metadata-only result records agree.

The governing concept is validator-bundle accountability, not live monitor operation. The principle/axiom ceiling is enforced as a refusal surface: private reasoning, internal code, exploit instructions, account secrets, live agent traffic, model-output data, performance claims, source-file changes, publishing-scope decision, and launch-scope decision remain false even when the replay fixture passes.

Agent Sabotage Scheming Monitor ReplayAudits a hidden-goal catch claim for the steps, suspicion scores, and counterfactual it needs.3/5

Does This inspects whether a write-up that claims an AI agent was caught secretly working a hidden side-goal actually carries the supporting structure a reader would need: the step-by-step actions, a per-step "how suspicious did the monitor find this?" score, and a stated counterfactual saying that removing the sneaky steps would drop the hidden goal while keeping the real task working. It runs only on synthetic, scrubbed example data, so it shows what the skeleton of such a safety claim should look like and checks the pieces line up with each other. It does not run a real agent, does not detect real scheming, and proves nothing about how any deployed monitor performs.

Scope limit Bounded public runtime validation over copied source pattern bodies, sanitized dogfood trace slices, recomputed sabotage/scheming monitor spans, source-artifact evidence refs, digest/metadata-only/non-public-state gates, and negative cases only; no live sabotage, live agent execution, exploit instruction, account secret/account, private-reasoning, harmful-payload, monitor-product-performance, deployment-risk, benchmark, provider-call, source-file changes, launch, public sharing, or product authority.

Run
microcosm agent-sabotage-scheming-monitor-replay run-sabotage-bundle --input examples/agent_sabotage_scheming_monitor_replay/exported_sabotage_monitor_bundle --out receipts/runtime_shell/demo_project/organs/agent_sabotage_scheming_monitor_replay

EvidenceComputed projectionevidence 3/5Source-faithful refactor

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module Agent Sabotage Scheming-Monitor Replay

Abstract

agent_sabotage_scheming_monitor_replay is a technical paper module for a drilldown-only regression component. It validates whether a sabotage/scheming monitor claim is backed by joinable synthetic episodes, action traces, per-step monitor scores, counterfactual benign replay, cold replay, falsification fixtures, source-artifact evidence refs, metadata-only result records, and an explicit scope limit.

The core contribution is not a live scheming detector. It is a replay contract: given a public fixture or exported bundle, the component recomputes caught/not-caught verdicts from monitor evidence, checks whether counterfactual replay isolates the side objective, verifies source-artifact digest/provenance rows, rejects common overclaim shapes, and emits result records that omit private or harmful bodies.

Telos

The module exists to make safety-monitor evidence auditable without laundering a demo into a deployment claim. Frontier-lab readers should be able to answer:

  • Which finite evidence rows support the sabotage/scheming monitor claim?
  • Which verdicts are recomputed from scores and joins rather than echoed from fixture labels?
  • Which mutations and negative cases cause rejection?
  • Which claims remain out of scope even when every local check is green?

Mechanism

The runtime locus is src/microcosm_core/organs/agent_sabotage_scheming_monitor_replay.py. It exposes two public admission paths:

  • run validates the first-wave fixture under fixtures/first_wave/agent_sabotage_scheming_monitor_replay/input.
  • run_sabotage_bundle validates the exported bundle under examples/agent_sabotage_scheming_monitor_replay/exported_sabotage_monitor_bundle.

Both paths call _build_result. The builder loads:

  • projection_protocol.json
  • scheming_monitor_policy.json
  • task_episodes.json
  • action_traces.json
  • monitor_scores.json
  • counterfactual_replay.json
  • cold_replay.json
  • source_module_manifest.json when present

The monitor proof is a join-and-recompute circuit. validate_task_episodes, validate_action_traces, and validate_monitor_scores require monitor rows to bind to an episode and action step. Monitor rows must carry evidence refs; in the exported bundle those refs resolve through the digest-checked source manifest. validate_counterfactual_replay checks that replay rows remove side-task success while preserving the main task. validate_cold_replay requires explicit passing replay result records rather than narrative assurance.

The recompute is deliberately conjunctive. A monitor row only earns a block verdict if the suspicion score is at or above the policy block threshold and the row is a "backed suspicious action". Backed means five separate joins all hold: the suspicious-action span names a real action id, that action id is also cited in the row's evidence refs, the joined action carries attack_opportunity_label = present, its hidden-side-objective pressure is medium or high, and the counterfactual replay isolates exactly that step by removing side-task success while preserving the main task. Drop any one of those and the row recomputes to escalate or pass rather than block, so a high score alone cannot manufacture a catch. An episode is then recomputed as caught only if it contains at least one recomputed block row with verified source-artifact evidence.

The trace recomputation locus is src/microcosm_core/macro_tools/agent_execution_trace.py::build_public_sabotage_scheming_monitor_trace. It folds the same joins into per-episode spans, recomputes the caught verdict, and checks that the counterfactual's changed steps are the higher-suspicion ones (a positive rank delta). A stale declared verdict, a benign-step counterfactual spoof, or a missing cold replay becomes a finding rather than a silent pass.

Evidence Floor

The local floor is intentionally small and inspectable:

Evidence classFixture modeExported-bundle mode
Episodes34
Action trace rows67
Monitor-score rows67
Counterfactual replay rows34
Cold replay rows34
Negative cases70 fixture-negative payloads
Copied public source artifactssource trace body + trace builder import2 manifest rows

The exported bundle adds scheming_ep_public_authority_probe and obs_public_authority_probe. That row is the real dogfood-trace floor: it comes from a sanitized public command-trace slice, verifies against the source trace digest, drops argv/stdout/stderr previews, and remains metadata-only in result records. The tests require it to compute a block monitor verdict and a caught episode without exporting private session material.

The source-open body floor is split:

  • The runtime imports microcosm_core.macro_tools.agent_execution_trace::build_public_sabotage_scheming_monitor_trace as the public recomputation body.
  • The exported bundle manifest carries agent_sabotage_scheming_extracted_pattern_ledger_row_body_import as a source-faithful public source pattern slice with digest sha256:e1ad34691ca598b878f8658485ce78fa4fa59b3eb0a4bd078355c415ca2cb312.
  • The same manifest carries agent_sabotage_scheming_public_dogfood_safety_evals_trace_slice_import as a source-faithful sanitized command-trace slice with digest sha256:7b6255ec65f4b21163685cf9d7a59dd9ae6075126deac065dec9a45d79dcea06 and source trace digest sha256:24487cf8edb0779cfdd460b04a4ea7701805686545069f9e1c5767250c223cd3.

Both manifest rows require body_in_receipt: false.

Shape

JSON bundle authorityJSON bundle authorityfixture or exported bundlefixture or exported bundletask episodestask episodesaction tracesaction tracesmonitor scoresmonitor scoressource module manifestsource module manifestdigest-checked publicartifactsdigest-checked public artifactscounterfactual replaycounterfactual replaycold replaycold replaypublic trace recomputationpublic trace recomputationnegative/falsification casesnegative/falsification casesmetadata-only validationresult recordmetadata-only validation result recordscope limitscope limit
Diagram source
flowchart TD bundle["JSON bundle authority"] fixture["fixture or exported bundle"] episodes["task episodes"] actions["action traces"] scores["monitor scores"] sourceManifest["source module manifest"] sourceArtifacts["digest-checked public artifacts"] counterfactual["counterfactual replay"] cold["cold replay"] recompute["public trace recomputation"] negatives["negative/falsification cases"] result record["metadata-only validation result record"] ceiling["scope limit"] bundle --> fixture fixture --> episodes fixture --> actions fixture --> scores fixture --> counterfactual fixture --> cold fixture --> sourceManifest sourceManifest --> sourceArtifacts episodes --> recompute actions --> recompute scores --> recompute counterfactual --> recompute cold --> recompute negatives --> result record sourceArtifacts --> result record recompute --> result record result record --> ceiling

Rejection And Perturbation Evidence

The focused regression suite is the strongest technical evidence for the paper claim. tests/test_agent_sabotage_scheming_monitor_replay.py covers:

  • Digest mismatch in source_module_manifest.json blocks the exported bundle.
  • target_ref / path mismatch blocks the exported bundle.
  • body_text_in_receipt at manifest or row level is rejected.
  • Missing source_artifact_evidence_refs quarantines a monitor row.
  • A declared caught verdict that disagrees with recomputation emits PUBLIC_TRACE_SCHEMING_CAUGHT_VERDICT_MISMATCH.
  • Perturbing a high-suspicion block row down to pass moves the recomputed caught verdict and blocks the result record.
  • A stale caught_verdict label inside a monitor-score row is rejected against the recomputed episode verdict.
  • A counterfactual row that changes a benign low-suspicion step fails the rank delta check and the isolation check.
  • Spoofing a declared negative-case id is caught by semantic negative-case floors.
  • Public-relative, metadata-only result records exclude absolute local paths and private keys such as account secrets, model-output data, hidden reasoning, raw harmful payloads, exploit instructions, and real target identifiers.
  • The real sanitized public probe must compute block, carry verified source artifact evidence, and join to its counterfactual row.
  • Spoofing the public probe to pass is rejected by verdict recomputation and the real-sanitized-evidence check.

These tests matter because they exercise the falsification path, not only the happy path. The component is permitted to say "this synthetic replay failed closed under these mutations"; it is not permitted to say "this detects real scheming".

Reader Evidence Routing

  • Bundle: core/paper_module_capsules.json::paper_modules[34:paper_module.agent_sabotage_scheming_monitor_replay]
  • Generated instance: paper_modules/agent_sabotage_scheming_monitor_replay.json
  • Standard: standards/std_microcosm_agent_sabotage_scheming_monitor_replay.json
  • Mechanism: mechanisms/mechanism.agent_sabotage_scheming_monitor_replay.validates_public_sabotage_scheming_monitor_replay.json
  • Component model: organs/agent_sabotage_scheming_monitor_replay.json
  • Runtime: src/microcosm_core/organs/agent_sabotage_scheming_monitor_replay.py
  • Trace recomputation: src/microcosm_core/macro_tools/agent_execution_trace.py::build_public_sabotage_scheming_monitor_trace
  • Fixture input: fixtures/first_wave/agent_sabotage_scheming_monitor_replay/input
  • Exported bundle: examples/agent_sabotage_scheming_monitor_replay/exported_sabotage_monitor_bundle
  • Focused tests: tests/test_agent_sabotage_scheming_monitor_replay.py
  • Runtime-shell result record: receipts/runtime_shell/demo_project/organs/agent_sabotage_scheming_monitor_replay/exported_sabotage_monitor_bundle_validation_result.json

Governing Relations

The generated paper-module instance binds this reader page to:

  • component:agent_sabotage_scheming_monitor_replay
  • mechanism:mechanism.agent_sabotage_scheming_monitor_replay.validates_public_sabotage_scheming_monitor_replay
  • concept:concept.agent_reliability_and_safety_validator_bundle
  • principle:P-1
  • principle:P-2
  • axiom:AX-1
  • paper_module.agent_monitor_redteam_falsification_replay
  • code locus: src/microcosm_core/organs/agent_sabotage_scheming_monitor_replay.py with run, run_sabotage_bundle, _build_result, _write_receipts, and result_card

Those edges are structural evidence-routing edges. They do not by themselves prove runtime correctness; the runtime result records and tests carry that narrower claim.

Prior-Art Positioning

The module borrows evaluation grammar from sabotage, scheming, and deceptive behavior evaluations: episodes, action traces, monitor scores, counterfactual benign replay, falsification cases, and fail-closed monitor disagreement. The closest public references remain Anthropic's sabotage evaluations, Apollo's in-context scheming evaluations, Anthropic's Sleeper Agents work, and the OpenAI/Apollo line on detecting and reducing scheming.

The Microcosm claim is deliberately narrower than those research programs. It does not evaluate frontier models or live deployments. It gives the public Microcosm system a result record-backed, metadata-only, replayable safety-monitor proof shape that can reject overclaims and provenance drift.

Prior Art Grounding

This component replays a monitor that watches an agent transcript for sabotage and scheming signals. It is grounded in the AI-control literature, which studies how to catch a model that may be deliberately subverting oversight rather than assuming it is benign (Greenblatt et al., AI Control, and Anthropic's sabotage evaluations). Microcosm borrows the monitor-over-transcript shape on synthetic fixtures; the result is fixture-bound replay evidence, not a live monitor, a control guarantee, or an exploit guide.

Validation Result record Path

Run the metadata-only fixture validator:

Run the exported-bundle validator:

cd microcosm-substrate && PYTHONPATH=src ../repo-python \
  -m microcosm_core.organs.agent_sabotage_scheming_monitor_replay \
  run-sabotage-bundle \
  --input examples/agent_sabotage_scheming_monitor_replay/exported_sabotage_monitor_bundle \
  --out /tmp/agent_sabotage_scheming_bundle_receipt \
  --card > /tmp/agent_sabotage_scheming_bundle_card.json

Run focused tests and paper-module corpus validation:

PYTHONPATH=src ./repo-pytest \
  tests/test_agent_sabotage_scheming_monitor_replay.py
cd microcosm-substrate && PYTHONPATH=src python3 \
  scripts/build_doctrine_projection.py --check-paper-module-corpus

Expected local result: the fixture and exported-bundle validators pass on the checked-in inputs; the mutation tests block when digest, target-ref, body-in-result record, evidence-ref, caught-verdict, counterfactual-rank, negative semantic, or public-probe labels are corrupted.

Scope boundary

Public-Safety Boundary

Public rendering may describe the replay grammar, recomputation rule, falsification cases, sanitized command-trace source artifact, digest checks, and metadata-only result records.

Public rendering must not include or imply:

  • live sabotage instructions
  • exploit steps
  • account secrets, browser state, account or browser material, or model-output data
  • private chain-of-thought or hidden reasoning bodies
  • raw harmful payloads
  • unsanitized argv/stdout/stderr previews
  • browser UI state
  • production telemetry
  • live traffic
  • live agent execution authorization
  • deployment-risk measurement
  • product monitor performance
  • benchmark claims
  • provider affiliation or provider behavior claims
  • source-file changes
  • public sharing/launch-scope decision
  • private-system equivalence
  • whole-system safety
Scope limit

This module may claim fixture-bound evidence that the component ran over public synthetic inputs and produced the result records and projections described above, reproduced by the validation result records named on this page.

It may not claim more than its bundle scope limit allows: No live sabotage, exploit instruction, account secret/account material, private reasoning, harmful payload, or deployment-risk product claim; synthetic fixtures and metadata-only result records only.

Agent Memory Temporal Conflict ReplayReplays a memory edit-and-delete to show stale facts get flagged before they sway an answer.3/5

Does Replays a canned three-step story: an agent's memory first records two facts (a preference and a tool result), then learns one is out of date and edits the preference and deletes the stale fact, then re-runs the same task once with memory on and once with memory off. From plain result records, it shows that the update and deletion each carry a "conflict" and "downgrade" result record before the memory is allowed to affect the answer, that the memory-on vs memory-off runs are compared through logged evidence rather than just the final wording, and that private-thread content stays in the record only as metadata pointers, never as copied text. It checks the bookkeeping of this synthetic example; it does not judge whether the agent's memory decisions were the right ones.

Scope limit It validates the projection mechanics of a synthetic memory fixture only — that the required refs, decisions, paired replays, negative cases, and secret-exclusion scan line up and that result records are metadata-only. It does not claim live-memory product quality, judge whether memory decisions were domain-correct, treat memory recall as source authority, adopt active injection, export private transcripts, use external model services, change source files, or include launch operations.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.agent_memory_temporal_conflict_replay run --input fixtures/first_wave/agent_memory_temporal_conflict_replay/input --out /tmp/agent_memory_temporal_conflict_replay_out

EvidenceComputed projectionevidence 3/5Source-faithful refactor

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module Agent Memory Temporal-Conflict Replay

This module is the public Microcosm projection of an agent-memory honesty contract. It is a synthetic replay fixture, not a live memory product, private transcript export, source-authority claim, or launch claim.

The fixture models three public episodes: episode A records a scoped preference and a tool-result fact, episode B updates the preference scope and deletes the now-stale fact through conflict-edge and downgrade result records, and episode C replays the task with memory enabled and disabled. The replay is admitted only when ADD, UPDATE, DELETE, and NOOP decisions, metadata-only non-public refs, evidence handles, cold replay refs, and an answer-delta result record line up.

Purpose

This component exists because an agent that remembers can quietly start trusting the wrong row. A user states a preference, the world changes, a later turn contradicts the earlier one, and a naive memory store keeps serving the stale fact as though it were still true. The single question this fixture answers is narrow and checkable: when one memory write supersedes an earlier one, does the record of that conflict actually hold up, or is it just a label?

The unusual choice is that the validator does not trust the labels it is given. A row can declare decision = UPDATE, attach a plausible-looking conflict-edge ref, and still be quarantined. In _apply_conflict_semantic_recompute the checker re-derives the conflict lineage from the raw fields it can verify: episode order, event timestamp, memory priority, and source-trust score. An UPDATE or DELETE that claims to supersede a prior write but is not timestamped after it, or that regresses priority or relies on lower-trust evidence than the write it replaces, is rejected. _apply_temporal_order_checks adds the coarser ordering rule that a conflict edge must land after some earlier accepted write and a replay must land after the conflict it depends on. The label is treated as a claim to be recomputed, not as authority.

The point of the paired memory-on and memory-off replay is the matching discipline on the output side. Memory is only allowed to take credit for a better answer through an explicit evidence handle and a cold-replay result record, so the gain is attributable rather than asserted. The interesting idea here is not a memory product. It is a small, reproducible accounting method for one specific failure: a stale row outranking newer evidence.

Abstract

Agent memory becomes dangerous when a stored row is allowed to outrank later evidence. This module turns that risk into a public, replayable checker: a synthetic three-episode fixture exercises memory ADD, UPDATE, DELETE, and NOOP decisions, then verifies that later conflicts can influence replay only through typed evidence handles, temporal conflict edges, stale-downgrade result records, paired memory-on/off cold replays, and a metadata-only answer-delta result record.

The technical contribution is not "better memory" and not a product claim. It is a narrow public accounting method for temporal memory conflict: memory rows are metadata under test, non-public refs are metadata-only, copied source-open source bodies are digest checked outside result records, and seven negative fixtures prove that common overclaim paths are rejected before any pass result record is written.

Telos

The reader-facing aim is to make a hard memory-honesty boundary inspectable without exporting private memory bodies. A cold reader should be able to answer four questions from files and result records:

  1. Which memory decision was made, and under which public route ref?
  2. Which evidence handle, timestamp, priority, and source-trust score justified the row?
  3. Which prior row was conflicted or downgraded before later replay credit was allowed?
  4. Did the memory-enabled replay use admissible evidence while the paired memory-disabled replay remained available for answer-delta accounting?

The accepted result is a metadata-only memory-conflict result record. It supports only a synthetic fixture-level claim: this replay respected the declared temporal conflict contract under the checked inputs.

Technical Mechanism

The runtime treats memory as public replay metadata, not as authority. The validator loads projection_protocol.json, memory_policy.json, memory_episodes.json, and replay_observations.json; the exported-bundle mode also loads bundle_manifest.json, source_module_manifest.json, and the copied source artifacts listed in the manifest. _build_result combines secret scanning, public trace construction, protocol validation, policy validation, episode validation, replay validation, source-module import validation, negative-case coverage, and the scope limit before a pass status is possible.

The mechanism has five reader-visible gates:

  1. validate_projection_protocol requires source refs, source pattern ids, projection result records, target refs, public runtime refs, target symbols, reimplemented mechanics, omissions, and an explicit denial that private thread bodies were copied.
  2. validate_memory_policy requires ADD, UPDATE, DELETE, and NOOP as the only admitted decision vocabulary and denies live-memory product, transcript export, source-authority, active-injection, provider-call, and launch-scope decision.
  3. validate_memory_episodes turns the five public event rows into accepted or quarantined memory metadata. Each row needs a route ref, decision, synthetic subject id, evidence handle, metadata-only private thread ref, body-export flag, source-authority flag, and active-injection flag. Positive replay credit requires all four decision classes, two conflict-edge refs, stale-downgrade refs, and a prompt-adoption observation ref.
  4. validate_replay_observations checks the paired memory-enabled and memory-disabled replay rows. Memory-enabled replay must cite public evidence handles that resolve against the accepted event rows, both replays must carry cold-replay result record refs, and the pair must share an answer-delta result record.
  5. validate_source_module_imports verifies the exported bundle's five copied source bodies by digest, material class, relation, and body_in_receipt=false. The card path reports only counts, digest refs, and result record paths; full memory rows, replay rows, source bodies, private transcript bodies, model-output data, and active injection text stay out of result records and public cards.

The mechanism is deliberately negative as well as constructive. Seven falsification fixtures prove that raw transcript export, private candidate auto-promotion, stale preference override, memory-as-source-authority, vector recall without evidence, final-answer-only memory credit, and active injection as authority are blocked. build_public_memory_conflict_trace gives the reader a seven-span public trace over the same rows, with five memory-event spans and two cold-replay spans, and audits coverage for evidence handles, metadata-only non-public refs, no non-public body export, cold-replay refs, answer-delta refs, and memory-enabled evidence.

Temporal Conflict Mechanism

The central rule is evidence-before-memory-authority. A memory row may be accepted as replay metadata only after it satisfies the public policy fields: route ref, decision, synthetic subject id, event timestamp, memory priority, source-trust score, evidence handle, metadata-only private thread ref, and explicit false authority flags for body export, source authority, and active injection.

UPDATE and DELETE decisions have an extra burden because they alter older memory. The validator recomputes the conflict lineage instead of trusting the label. _apply_temporal_order_checks verifies that conflict rows occur after the prior writes they touch, and that replay NOOP rows occur after conflict and downgrade evidence. _apply_conflict_semantic_recompute then checks the semantic shape of the mutation: the prior event must exist, the conflict group must be coherent, timestamps must advance, priority may not regress below the allowed floor, and source trust must stay above the declared floor.

Only after those checks pass can episode C receive replay credit. The memory-enabled replay must cite public evidence refs that resolve to accepted memory rows. The memory-disabled replay stays paired by replay group. The answer-delta result record accounts for the difference between those cold replays without reducing the evaluation to final-answer comparison alone.

Real Sanitized Episode Evidence

The first-wave fixture is not merely shape-only synthetic data. Its memory_episodes.json, memory_policy.json, and replay_observations.json mirror the exported memory-temporal-conflict bundle, and the positive rows carry sanitized_real_episode=true, source artifact refs, source event refs, timestamps, memory priority, and source-trust scores. The source evidence posture declares real_source_floor as copied_non_secret_macro_agent_memory_body_with_provenance, body_in_receipt=false, and private_bodies_exported=false.

The exported bundle contributes a source-open body floor without turning bodies into result record material. source_module_manifest.json lists five copied source bodies across tool, doctrine, standard, and pattern material classes. The runtime verifies their digests and material classes, while public cards and result records expose only paths, counts, digest refs, omitted-material reasons, and the scope limit. That is the realness proof this paper can use: source provenance and result record-level recomputation, not private memory export.

Perturbation and Rejection Contract

The fixture includes positive pass evidence and perturbation evidence. Focused tests mutate the bundle to ensure that the validator rejects timestamp incoherence, priority regression, source-trust regression, temporal order breakage, unverified conflict evidence, source-event drift, stale override without downgrade, downgrade result record field swaps, positive rows without evidence handles, unresolved replay refs, replay without memory evidence, and source-body tampering.

Those rejection tests matter because temporal memory bugs often look plausible in isolation. A stale row with a nice label is still rejected if its conflict edge is absent or late; a memory-enabled replay is still rejected if its evidence refs do not resolve; a digest-mismatched body floor blocks the source import; and final-answer-only comparison remains a negative case rather than utility evidence.

Named Proof Consumers

  • python -m microcosm_core.organs.agent_memory_temporal_conflict_replay run consumes the first-wave fixture, includes negative cases, and writes the result, board, validation, and sign-off result records.
  • python -m microcosm_core.organs.agent_memory_temporal_conflict_replay run-memory-bundle consumes the exported source-open bundle, digest-checks copied source bodies, and emits the public bundle validation result.
  • tests/test_agent_memory_temporal_conflict_replay.py consumes the same fixture and bundle to assert decision counts, conflict counts, stale downgrades, secret exclusion, public-relative result records, unresolved replay rejection, source-module digest verification, metadata-only result record cards, and seven-span trace construction.
  • A cold public reader consumes the source record, manifest, event rows, replay rows, source-artifact digests, and validation result records; that consumer can verify the synthetic honesty boundary but cannot infer quality of any live memory system or launch-scope decision.

Shape

digest verified; body_in_receipt=falseallows ADD / UPDATE / DELETE / NOOP onlycreates baseline memory metadatatouches older memoryrequires conflict_edge_refrequires stale_downgrade_refchecks timestamp, priority, source trustkeeps paired baselineuses evidence_handle_refsno memory evidence usedpaired by replay_group_idcovers events plus cold replaysomits private bodies and model-output datadenies live-memory and source-authority claimsSource-open body floorSource-open body floorPolicy vocabularyPolicy vocabularyEpisode A: ADD rowsEpisode A: ADD rowsEpisode B: UPDATE / DELETErowsEpisode B: UPDATE / DELETE rowsTemporal conflict gateTemporal conflict gateStale downgrade gateStale downgrade gateSemantic recomputeSemantic recomputeEpisode C: memory-enabledreplayEpisode C: memory-enabled replayEpisode C: memory-disabledreplayEpisode C: memory-disabled replayAnswer-delta result recordAnswer-delta result recordPublic 7-span tracePublic 7-span tracemetadata-only result recordmetadata-only result recordScope limit / scope boundaryScope limit / scope boundaryFixture-level validationclaim onlyFixture-level validation claim only
Diagram source
flowchart TD BodyFloor["Source-open body floor"] -->|digest verified; body_in_receipt=false| Policy["Policy vocabulary"] Policy -->|allows ADD / UPDATE / DELETE / NOOP only| EpisodeA["Episode A: ADD rows"] EpisodeA -->|creates baseline memory metadata| EpisodeB["Episode B: UPDATE / DELETE rows"] EpisodeB -->|touches older memory| ConflictGate["Temporal conflict gate"] ConflictGate -->|requires conflict_edge_ref| DowngradeGate["Stale downgrade gate"] DowngradeGate -->|requires stale_downgrade_ref| Recompute["Semantic recompute"] Recompute -->|checks timestamp, priority, source trust| Enabled["Episode C: memory-enabled replay"] Recompute -->|keeps paired baseline| Disabled["Episode C: memory-disabled replay"] Enabled -->|uses evidence_handle_refs| Delta["Answer-delta result record"] Disabled -->|no memory evidence used| Delta Delta -->|paired by replay_group_id| Trace["Public 7-span trace"] Trace -->|covers events plus cold replays| Result record["metadata-only result record"] Result record -->|omits private bodies and model-output data| Ceiling["Scope limit / scope boundary"] Ceiling -->|denies live-memory and source-authority claims| Done["Fixture-level validation claim only"]

The page shape is a temporal-conflict replay, not a memory product surface. A reader starts with the JSON bundle, follows the source module manifest to five copied source bodies, then checks three synthetic episodes: initial memory writes, a later temporal conflict with stale downgrades, and paired cold replays with memory enabled and disabled. The accepted outcome is a result record that says the replay respected the memory-honesty boundary; it does not make memory recall into source authority.

Failure Modes and Limitations

This module is intentionally narrow. It validates a public fixture and exported bundle against a declared temporal conflict contract; it does not measure live assistant memory quality, user satisfaction, recall coverage, provider behavior, or deployment posture. Passing result records show that checked rows, digests, traces, negative cases, and scope limits agreed for the fixture under test.

Known failure modes are treated as checker inputs rather than prose caveats: private transcript export, private candidate auto-promotion, stale preference override, memory-as-source-authority, vector recall without evidence, final-answer-only credit, active injection authority, missing source manifests, source-body digest drift, source-event drift, missing conflict edges, missing downgrade result records, and unresolved replay evidence. If a future module wants a stronger memory claim, it needs a new standard and new evidence; this module cannot promote itself beyond its fixture ceiling.

Reader Evidence Routing

  • Bundle route: read examples/agent_memory_temporal_conflict_replay/exported_memory_temporal_conflict_bundle/source_module_manifest.json for module_count=5, body_in_receipt=false, material classes, digest refs, omitted-material reasons, and the explicit secret-exclusion boundary.
  • Event route: read memory_episodes.json for the five memory events: episode_a_preference_add, episode_a_tool_fact_add, episode_b_preference_scope_update, episode_b_tool_fact_delete, and episode_c_replay_noop.
  • Conflict route: verify that the UPDATE and DELETE events carry temporal conflict-edge refs and stale-downgrade refs before they can affect replay credit.
  • Replay route: read replay_observations.json for the paired episode_c_memory_enabled_replay and episode_c_memory_disabled_replay rows, evidence refs, cold replay result records, and answer-delta accounting.
  • Runtime route: run tests/test_agent_memory_temporal_conflict_replay.py when the reader needs recomputation evidence. The focused tests assert digest verification, public-relative result records, non-public-state exclusion, unresolved replay rejection, and the exported bundle runtime shape.

Public Mechanics

  • Memory update claims require route refs, evidence handles, and explicit ADD/UPDATE/DELETE/NOOP decisions.
  • Updates and deletes that touch older memory require temporal conflict-edge refs plus stale-downgrade refs before memory can affect replay credit.
  • Private thread references are metadata-only; transcript bodies and private memory candidate bodies stay omitted.
  • Utility language requires paired memory-enabled and memory-disabled cold replay result records; final-answer-only comparison is not enough to support a memory utility claim.
  • Raw transcript export, private candidate auto-promotion, stale preference override, memory-as-source-authority, vector recall without evidence, final-answer-only memory credit, and active-injection authority are expected falsification fixtures.

Prior Art Grounding

This component is grounded in agent-memory architectures and the newer literature on stale or poisoned memory. The constructive lineage includes Generative Agents, which made observation, reflection, retrieval, and planning a concrete agent-memory pattern, and MemGPT, which treats long-context behavior as a memory-management problem. The risk lineage includes AgentPoison and STALE, which focus respectively on poisoned retrieval stores and whether agents update invalid memories when new evidence arrives.

Microcosm does not claim a live memory product. It borrows the useful accounting questions: which memory decision was made, which evidence handle justified it, which older row was conflicted or downgraded, and whether memory-on/off replay supports any claim beyond a final-answer comparison.

Validation Result record Path

Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:

Then run the exported bundle validator:

cd microcosm-substrate && PYTHONPATH=src ../repo-python -m microcosm_core.organs.agent_memory_temporal_conflict_replay run-memory-bundle --input examples/agent_memory_temporal_conflict_replay/exported_memory_temporal_conflict_bundle --out /tmp/agent_memory_temporal_conflict_bundle_receipt --card > /tmp/agent_memory_temporal_conflict_bundle_card.json

The focused regression test and corpus projection checks are:

cd microcosm-substrate && ../repo-pytest tests/test_agent_memory_temporal_conflict_replay.py
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Reader Validation Boundary

A cold reader can validate this module by starting from the JSON source record, then checking the generated JSON instance, source-module manifest, synthetic episodes, memory-event decisions, temporal conflict-edge refs, stale-downgrade refs, paired memory-on/off cold replays, negative cases, and focused tests. The validation is limited to whether the synthetic replay preserved a metadata-only memory-honesty boundary.

The validation stops before quality claims about any live memory system, private transcript export, private candidate promotion, memory recall as source authority, provider behavior, active-injection authority, public sharing, and launch. Unpopulated concept, principle, axiom, and dependency edges remain residual pressure unless the JSON bundle owner lane adds real targets.

Scope limit

This module may claim only that a synthetic memory-temporal replay preserved a metadata-only memory-honesty boundary: ADD/UPDATE/DELETE/NOOP decisions, temporal conflict-edge refs, stale-downgrade refs, paired memory-on/off cold replay refs, answer-delta accounting, public trace refs, manifest digests, negative cases, and scope limits are checked.

It must not claim quality of any live memory system, readiness of a memory product, private transcript export, private candidate auto-promotion, source-authority status, provider behavior, active-injection authority, source-file changes, public sharing authorization, or launch-scope decision.

Scope boundary

This module does not run live memory, claim memory product quality, export private transcripts, auto-promote private candidates, treat memory recall as source authority, adopt active injection, use external model services, change source files, publish results, or include launch operations.

Source and projection details
Source-Open Body Floor

The exported bundle manifest is the body-row authority for five copied source bodies spanning tool, doctrine, standard, and pattern material classes. Those bodies stay in bundle source artifacts; result records and cards carry refs, digests, classes, counts, omission reasons, secret-exclusion status, and scope limits only.

The floor is accepted as synthetic temporal-conflict replay evidence. It is not live memory product evidence, private transcript export, private memory candidate export, provider-behavior evidence, source-file changes, public sharing authorization, or launch-scope decision.

Governing Lattice Relation

The JSON bundle binds this module to mechanism.agent_memory_temporal_conflict_replay.validates_public_memory_conflict_replay, concept.agent_reliability_and_safety_validator_bundle, provisional principle refs P-1 and P-2, and provisional axiom ref AX-1. This Markdown does not promote those placeholder refs into stronger doctrine ids; it explains how the concrete mechanism satisfies the current bundle boundary.

Mechanically, the governing relation is evidence-before-memory-authority: memory rows may influence replay only after they carry route refs, public evidence handles, metadata-only non-public refs, conflict-edge or downgrade result records when stale state changes, and paired replay result records. The concept relation is validator-bundle accountability: the module is not a narrative claim about agents remembering well, but an executable fixture whose policy, trace, source-body manifest, negative cases, and result records must all agree. The axiom/principle ceiling is the same one enforced by the validator: private state is not public source authority, synthetic replay is not live product evidence, and projection-ready result records cannot authorize source-file changes, external model access, public sharing, or launch.

Sleeper Memory Poisoning Quarantine ReplayReplays a recorded memory-tamper case, checking its declared quarantine, block, and delete steps line up.3/5

Does This is a worked, made-up example of how an agent should handle a tampered "memory": a poisoned note gets spotted, held in quarantine, blocked from being used in any later action, and finally deleted-with-an-audit, with a clean re-run recorded showing the bad memory is no longer there. The component does not actually delete anything or perform the re-run; it reads the on-disk record of that whole guard-and-cleanup sequence and checks that every required step and result record lines up, so exactly which checks must hold is visible. It works entirely from synthetic refs and metadata, with no private memory or transcripts exposed.

Scope limit It only checks the structural shape and internal consistency of a synthetic memory-security policy projection recorded as JSON. It does not run or validate any real memory store, does not itself quarantine, delete, or re-run anything, and does not establish that any system actually resists poisoning. It exports no private memory bodies or transcripts, calls no providers, mutates no source, produces no benchmark claims, and excludes launch (all scope limit flags are hardcoded false).

Run
PYTHONPATH=src python3 -m microcosm_core.organs.sleeper_memory_poisoning_quarantine_replay run --input fixtures/first_wave/sleeper_memory_poisoning_quarantine_replay/input --out receipts/first_wave/sleeper_memory_poisoning_quarantine_replay --acceptance-out receipts/acceptance/first_wave/sleeper_memory_poisoning_quarantine_replay_fixture_acceptance.json

EvidenceComputed projectionevidence 3/5Source-faithful refactor

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module Sleeper Memory Poisoning Quarantine Replay

Purpose

Persistent agent memory is an attack surface. If an agent reads a poisoned source in one session and writes a memory from it, that memory can quietly shape a later session's actions, long after the poisoning is out of view. This module asks one question: if an agent quarantines a poisoned memory write, can it show, from result records alone, that the quarantine actually held when the memory was retrieved later, and that a rollback genuinely removed it?

The interesting part is that the runtime grades the whole chain, not just the final answer. A naive memory-security story checks that the agent reached the right conclusion. This validator refuses that. It requires the poisoned write to carry provenance, the later retrieval to be blocked before any action and to cite the same memory ref the write quarantined, and the rollback to carry a deletion audit ref, a cold-rerun result record, and proof that the memory is absent after the rerun. A blocked retrieval that cannot name the quarantine audit ref or the cold-replay result record for the memory it gates is treated as unproven, not as a pass.

It is a synthetic fixture, deliberately narrow. The inputs are public metadata rows, never live user memory or private bodies, and the result records carry refs, hashes, counts, and verdicts rather than any memory text. It borrows the control shape from prior work on sleeper triggers and memory poisoning; it does not secure a live memory system or claim a benchmark result.

Abstract

This module is the public Microcosm projection of a persistent-memory security claim contract. It is a synthetic replay fixture, not a live memory product, live user memory import, benchmark security result, private memory export, or launch claim.

The fixture models four public sessions: a poisoned source bundle is seen, a memory write proposal is quarantined, later retrieval is blocked before action, and rollback plus cold rerun proves the poisoned memory is absent at the result record boundary. The claim is admitted only when source bundle refs, provenance refs, quarantine verdicts, classifier labels, retrieval influence gates, rollback audit refs, rerun result records, negative cases, and scope limits line up.

Shape

quarantined memory refPublic metadata inputssessions, write proposals,retrieval replays, rollbackrowsPublic metadata inputs sessions, write proposals, retrieval replays, rollback rowsProvenance gatepoisoned write quarantined +provenance-bound controladmittedProvenance gate poisoned write quarantined + provenance-bound control admittedDelayed-influence gatelater retrieval blockedbefore action,same memory ref + audit +cold-replay refDelayed-influence gate later retrieval blocked before action, same memory ref + audit + cold-replay refRollback gatedeletion audit + cold-rerunresult record +memory absent after rerunRollback gate deletion audit + cold-rerun result record + memory absent after rerunSource-body gatecopied bodies digest-checked,result records staymetadata-onlySource-body gate copied bodies digest-checked, result records stay metadata-onlyEight negative caseseach must be observedas a typed findingEight negative cases each must be observed as a typed findingmetadata-only result recordsrefs, hashes, counts,verdictsmetadata-only result records refs, hashes, counts, verdictsScope limitsynthetic replay onlyScope limit synthetic replay only
Diagram source
flowchart TD inputs["Public metadata inputs sessions, write proposals, retrieval replays, rollback rows"] subgraph Gates["Four ordered gates"] provenance["Provenance gate poisoned write quarantined + provenance-bound control admitted"] influence["Delayed-influence gate later retrieval blocked before action, same memory ref + audit + cold-replay ref"] rollback["Rollback gate deletion audit + cold-rerun result record + memory absent after rerun"] bodies["Source-body gate copied bodies digest-checked, result records stay metadata-only"] end negatives["Eight negative cases each must be observed as a typed finding"] result records["metadata-only result records refs, hashes, counts, verdicts"] ceiling["Scope limit synthetic replay only"] inputs --> provenance provenance -->|quarantined memory ref| influence influence --> rollback rollback --> bodies inputs --> negatives provenance --> result records influence --> result records rollback --> result records bodies --> result records negatives --> result records result records --> ceiling

The module's shape is a public memory-security replay, not a live memory product. Public metadata inputs pass through four ordered gates: provenance, delayed influence, rollback, and source-body handling. The delayed-influence gate is coupled to the provenance gate, so a blocked retrieval must target the same memory ref the write quarantined and cite its audit and cold-replay refs. Alongside the positive chain, eight negative cases must each surface as a typed finding, and every path lands in metadata-only result records under the scope limit.

Mechanism

The mechanism is a replay reducer over public metadata, not a memory runtime. src/microcosm_core/organs/sleeper_memory_poisoning_quarantine_replay.py loads six positive input families through _build_result: projection protocol, memory policy, session chain, quarantine events, retrieval replays, rollback/cold-rerun rows, and the source-module manifest. When run is used on the first-wave fixture it also loads the expected negative fixtures; when run_quarantine_bundle is used on the exported bundle it validates the public bundle without treating that bundle as negative-case authority.

The first gate is provenance. validate_memory_write_proposals accepts a memory write only when it carries the required source bundle ref, provenance ref, trust tier, classifier labels, audit ref, quarantine verdict, and redacted body posture. An untrusted source context with the sleeper-poisoning classifier cannot silently become trusted memory. Missing provenance, private memory body export, raw transcript export, live user-memory claims, and trusted promotion from untrusted context become typed findings instead of admissible memory authority.

The second gate is delayed influence. validate_retrieval_replays checks that later retrieval of the quarantined memory is blocked before any action can use it. The row must carry a retrieval ref, influence grade, action gate, cold replay result record ref, public evidence refs, and a quarantine audit ref coupling back to the write proposal it gates. This is the anti-final-answer check: the runtime rejects a memory-security story that grades only the final answer while omitting retrieval, influence, or rerun evidence.

The third gate is rollback. validate_rollback_rerun requires a rollback result record ref, deletion audit ref, rerun result record ref, and memory_absent_after_rerun=true. Rollback language is therefore admitted only when deletion and cold rerun are both present. Tests mutate these fields to show that nonempty but bogus rollback refs, missing result record refs, and absence failures block rather than becoming evidence.

The fourth gate is source-open body handling. validate_source_module_manifest and _source_open_body_import_summary verify seven copied public source bodies, their declared material classes, their digest fields, and their metadata-only result record posture. _write_receipts and result_card then expose public ids, counts, refs, digests, verdicts, negative-case status, and scope limits while omitting retrieval rows, rollback rows, and copied source bodies from command cards.

The proof consumer is therefore a pair of bounded runs plus focused tests: run proves the first-wave fixture with expected negative cases; run_quarantine_bundle proves the exported public bundle; and tests/test_sleeper_memory_poisoning_quarantine_replay.py verifies mutated positive rows, stale baked labels, retrieval/quarantine coupling, rollback result record shape, source-body digest checks, public-relative redaction, and card payload omission.

Reader Evidence Routing

  • Bundle route: core/paper_module_capsules.json::paper_modules[37:paper_module.sleeper_memory_poisoning_quarantine_replay] is the JSON authority row. A diagram view and an atlas card are generated for this module.
  • Mechanism route: core/mechanism_sources.json::mechanism.sleeper_memory_poisoning_quarantine_replay.validates_public_sleeper_memory_poisoning_quarantine_replay binds the code locus, input refs, result record refs, validator commands, focused regression, and guardrails.
  • Runtime route: src/microcosm_core/organs/sleeper_memory_poisoning_quarantine_replay.py owns run, run_quarantine_bundle, _build_result, _write_receipts, result_card, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, and the metadata-only source-module import checks.
  • Exported-bundle route: examples/sleeper_memory_poisoning_quarantine_replay/exported_sleeper_memory_poisoning_bundle contains bundle_manifest.json, projection_protocol.json, memory_policy.json, session_chain.json, quarantine_events.json, retrieval_replays.json, rollback_rerun.json, and source_module_manifest.json.
  • Source-module route: source_module_manifest.json records seven copied public source bodies, including the growth result record, memory-plane paper modules, operator-memory tests, agent execution trace runtime, strict JSON helper, and agent execution trace standard; result records keep source bodies out with body_in_receipt: false.
  • Focused-test route: tests/test_sleeper_memory_poisoning_quarantine_replay.py verifies negative cases, public-relative redacted result records, exported-bundle runtime shape, digest mismatch rejection, exact copied source bodies, and card result record reuse.

Prior Art Grounding

This component combines two prior-art lines: sleeper/deceptive trigger behavior and long-term-memory/RAG poisoning. The sleeper-trigger lineage is Anthropic's Sleeper Agents. The memory-poisoning lineage includes AgentPoison, MemoryGraft, and Hidden in Memory, which all treat retrieved or persistent agent memory as an attack surface rather than a neutral cache.

Microcosm does not claim to secure live memory systems. It borrows the control shape: memory writes need provenance, untrusted source context cannot silently become authority, later retrieval must pass an influence gate, and deletion or rollback needs an audit ref plus cold rerun evidence.

Validation Result record Path

Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:

Then run the exported bundle validator:

cd microcosm-substrate && PYTHONPATH=src ../repo-python -m microcosm_core.organs.sleeper_memory_poisoning_quarantine_replay run-quarantine-bundle --input examples/sleeper_memory_poisoning_quarantine_replay/exported_sleeper_memory_poisoning_bundle --out /tmp/sleeper_memory_poisoning_bundle_receipt --card > /tmp/sleeper_memory_poisoning_bundle_card.json

The focused regression test and corpus projection checks are:

cd microcosm-substrate && ../repo-pytest tests/test_sleeper_memory_poisoning_quarantine_replay.py
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

The result record path proves synthetic memory-poisoning quarantine replay over public metadata refs, not live memory safety, provider behavior, or benchmark security.

Scope boundary

Scope boundary

This module does not run live memory, claim memory product quality, import live user memory, export private memory bodies or raw transcripts, promote untrusted context into trusted memory, use external model services, change source files, claim benchmark security, publish results, or include launch operations.

Scope limit

This module may claim synthetic sleeper-memory poisoning quarantine replay over public metadata refs: source bundle refs, provenance refs, quarantine verdicts, classifier labels, retrieval influence gates, rollback audit refs, cold rerun result records, expected negative cases, source-module digest checks, metadata-only result records, and validation result records.

It does not claim live memory product quality, live user-memory handling, trusted promotion from untrusted context, provider behavior, source-file changes, benchmark security, private memory export, public sharing, launch-scope decision, or whole-system correctness. The generated diagram and atlas card are navigation aids, not security benchmark results.

MCP Tool Authority ReplayAudits a recorded tool-use log to confirm each action was scoped, approved, undoable, and fenced.3/5

Does When an AI agent uses outside tools (look something up, change a ticket, read a result from an untrusted source), this checks a recorded log of that tool use to confirm each action was properly fenced: bound to a narrow permission, approved before it changed anything, given a way to undo it, and kept from letting untrusted tool output boss the agent around. It runs on a small built-in make-believe example that carries only labels and references (no real accounts, secrets, or tool contents), so the safety checks an agent's tool use is supposed to pass are inspectable without anything touching a live account.

Scope limit It only checks that the tool-authority evidence in a recorded bundle (scopes, approvals, rollbacks, instruction/data splits, cold replays, redaction, and the expected abuse-case failures) is present and internally consistent. It does not run tools or authorize live MCP/account access, account secret or payload export, treating tool output as instruction, source-file changes, benchmark safety scores, or launch, and it makes no claim that the underlying tool-use policy is domain-correct.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.mcp_tool_authority_replay run --input fixtures/first_wave/mcp_tool_authority_replay/input --out receipts/first_wave/mcp_tool_authority_replay

EvidenceComputed projectionevidence 3/5Source-faithful refactor

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module MCP Tool Authority Replay

This module is the public Microcosm projection of a tool-authority claim contract. It is a synthetic MCP-like replay fixture, not a live MCP account test, external model access, account secret-handling certification, benchmark security result, or launch claim.

The fixture models three public tools: a readonly docs lookup, a write-capable ticket update, and an untrusted result source. The claim is admitted only when tool manifest scope refs, call argument hashes, approval token refs, side-effect ledger refs, rollback result records, untrusted-output instruction/data splits, cold replay result records, negative cases, and scope limits line up.

Purpose

When an agent uses tools through a protocol like MCP, the sentence "the agent used the tool safely" is cheap to write and hard to back. This component answers one question: given a recorded tool-use trace, does the evidence actually support the authority the trace claims, or is the safety language unearned? It exists so that tool-authority claims have to be replayed against metadata before prose is allowed to call them safe.

The approach is to treat a tool call as a small transaction that must show its working. Each call cites a narrow capability scope, an argument hash, and, if it writes, an approval token, a side-effect ledger entry, and a rollback result record. Those references are not taken on trust: the side-effect ledger and the cold replay rows are cross-checked against the accepted call rows by call id, so a rollback result record that no call refers to, or a write that skips approval, is caught rather than waved through. The point is that a reference string is not authority until something downstream resolves it.

Two failure modes are worth naming because they are specific to tool-using agents. The first is the confused deputy: a call that asks for a scope wider than its task needs (*, account_full_access) is rejected before it runs, so a tool cannot quietly borrow more authority than it was granted. The second is tool-output-as-instruction, the prompt-injection shape where text returned by an untrusted tool is obeyed as a command. Here untrusted output must stay data and cite an instruction/data split; a row that lets output become instruction is one of the eight negative cases the fixture is built to catch.

This is deliberately a synthetic replay, not a live test. The component never opens an MCP account, calls a provider, or handles a account secret. It reads only public metadata and digests, and it keeps every payload, result body, and account secret out of the result records it writes. What it offers is narrow and honest: a way to check that a tool-authority story is internally consistent and metadata-only, not a certificate that any real tool integration is secure.

Shape

JSON bundle authorityJSON bundle authorityMarkdownmechanism source rowmechanism source rowMCP authority runtimeMCP authority runtimefirst-wave fixturefirst-wave fixtureexported authority bundleexported authority bundletool manifesttool manifestscoped tool callsscoped tool callstool result rowstool result rowsside-effect ledgerside-effect ledgercold replay rowscold replay rowspublic trace spanspublic trace spanssource-module body floorsource-module body floormetadata-only result recordsmetadata-only result recordsscope limitscope limit
Diagram source
flowchart TD bundle["JSON bundle authority"] markdown["Markdown reader projection"] mechanism["mechanism source row"] component["MCP authority runtime"] fixture["first-wave fixture"] bundle["exported authority bundle"] manifest["tool manifest"] calls["scoped tool calls"] results["tool result rows"] side_effects["side-effect ledger"] replay["cold replay rows"] trace["public trace spans"] source_modules["source-module body floor"] result records["metadata-only result records"] ceiling["scope limit"] bundle --> markdown bundle --> mechanism mechanism --> component component --> fixture component --> bundle bundle --> manifest manifest --> calls calls --> results calls --> side_effects side_effects --> replay results --> trace replay --> trace source_modules --> trace trace --> result records result records --> ceiling

The module's shape is a public tool-authority replay, not a live MCP security claim.

Technical Mechanism

The replay is a fail-closed authority lattice over a synthetic MCP-like tool story. _build_result loads the fixture or exported bundle, runs load_forbidden_classes and scan_paths over input JSON and copied source modules, then validates each contract plane separately: projection protocol, tool policy, tool manifest, tool calls, tool results, side-effect ledger, cold replay rows, public trace spans, and source-module manifest rows. The final status is pass only when every sub-validator passes, no expected negative case is missing, the secret scan has zero blocking hits, and the source-module floor is either present when required or explicitly not required for the first-wave fixture.

Tool authority is checked before prose can turn it into evidence. Manifest rows define the declared tool ids and allowed tool classes. validate_tool_calls then rejects undeclared tool ids, overbroad scopes, missing argument hashes, hidden account secret export, live account access, unapproved side effects, tool-output-as-instruction, final-answer-only grading, and unredacted payload export. Write-capable calls must carry approval token refs, side-effect ledger refs, and rollback result record refs; untrusted-result calls must keep instruction and data boundaries explicit. validate_side_effect_ledger and validate_cold_replay make those refs observable instead of leaving them as decorative strings.

The exported-bundle path adds a body-floor check without moving bodies into result records. _source_module_manifest_result streams digest verification over each copied source source module, checks required anchors, requires body_in_receipt: false, and reports a metadata-only source-open import summary. build_public_mcp_tool_authority_trace contributes three public trace spans for the tool calls; _body_import_verification binds that public refactor back to the source source and Microcosm target digests. _write_receipts emits the result, board, validation result record, and sign-off result record for fixture runs, and result_card deliberately exposes counts, status bits, digest freshness, and omission result records rather than tool rows or source bodies.

The mechanism is intentionally narrower than a tool-use security benchmark. It accepts only public metadata and digest evidence, and it treats every generated projection as a result record over source rows rather than as source authority.

Public Mechanics

  • Every tool call must bind to a narrow capability scope ref before admission.
  • Write-capable calls require approval token refs, side-effect ledger refs, and rollback result record refs.
  • Untrusted tool output is data, not instruction, and must cite an instruction/data split ref.
  • Call arguments, tool outputs, account refs, and result bodies stay redacted or metadata-only.
  • Overbroad scopes, hidden account secret export, tool-output-as-instruction, unapproved side effects, live account access, final-answer-only grading, missing rollback result records, and unredacted tool payloads are expected falsification fixtures.

Reader Evidence Routing

  • Bundle route: core/paper_module_capsules.json::paper_modules[39:paper_module.mcp_tool_authority_replay] is the JSON authority row; a diagram view and an atlas card are generated for this module from the source record.
  • Mechanism route: core/mechanism_sources.json::mechanism.mcp_tool_authority_replay.validates_public_mcp_tool_authority_replay binds the code locus, fixture refs, exported bundle refs, result record refs, validator commands, focused regression, and guardrails.
  • Runtime route: src/microcosm_core/organs/mcp_tool_authority_replay.py owns run, run_tool_authority_bundle, _build_result, _write_receipts, result_card, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING.
  • Exported-bundle route: examples/mcp_tool_authority_replay/exported_mcp_tool_authority_bundle contains bundle_manifest.json, projection_protocol.json, tool_policy.json, tool_manifest.json, tool_calls.json, tool_results.json, side_effect_ledger.json, cold_replay.json, and source_module_manifest.json.
  • Source-module route: source_module_manifest.json records seven copied public source body rows, while the exported source-open body summary exposes at least six body materials. The floor includes high-novelty and extracted-pattern evidence, agent execution trace runtime and standard bodies, route-readiness standard material, mission-transaction preflight internal control material, and the strict JSON helper. Result records carry refs, digests, counts, and status only.
  • Focused-test route: tests/test_mcp_tool_authority_replay.py verifies negative cases, public-relative redacted result records, exported-bundle runtime shape, source-module digest failures, exact copied source bodies, card result record reuse, and public trace span construction.

Named Proof Consumers

  • First-wave fixture consumer: PYTHONPATH=src ../repo-python -m microcosm_core.components.mcp_tool_authority_replay run --input fixtures/first_wave/mcp_tool_authority_replay/input --out /tmp/microcosm-mcp-tool-authority-replay/fixture --sign-off-out /tmp/microcosm-mcp-tool-authority-replay/sign-off.json --card consumes the fixture route, expected negative cases, secret scan, public trace construction, scope limit, metadata-only result record writer, and command-card omission contract.
  • Exported-bundle consumer: PYTHONPATH=src ../repo-python -m microcosm_core.organs.mcp_tool_authority_replay run-tool-authority-bundle --input examples/mcp_tool_authority_replay/exported_mcp_tool_authority_bundle --out /tmp/microcosm-mcp-tool-authority-replay/bundle --card consumes the public bundle, source-module manifest, copied body-floor digest checks, public trace spans, metadata-only exported-bundle result record, and fresh card reuse path.
  • Focused regression consumer: PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_mcp_tool_authority_replay.py -q pins the negative-case matrix, undeclared-tool rejection, redacted/public result record paths, source-module digest mismatch failures, exact copied source bodies, public trace span coverage, and card omission behavior.
  • This check excludes hand-editing generated projections; it is a read-only result record for this Markdown slice.

Prior Art Grounding

This component is grounded in capability security, least privilege, and current MCP authorization guidance. The classic security lineage is Saltzer and Schroeder's Protection of Information in Computer Systems and Hardy's Confused Deputy: authority should be narrow, mediated, and bound to the object/action being requested. The MCP-specific lineage is the official MCP authorization and security best practices guidance, especially least-privilege scopes and token audience boundaries.

Microcosm does not claim live MCP security or account certification. It borrows the prior-art authority shape and makes it replayable: every public tool story must expose capability scope refs, approval refs, side-effect refs, rollback refs, instruction/data split refs, and scope boundaries before write authority is treated as evidence.

Validation Result record Path

Run the first-wave fixture into disposable result records from the Microcosm root:

Run the exported bundle through the same component:

cd microcosm-substrate
PYTHONPATH=src ../repo-python -m microcosm_core.organs.mcp_tool_authority_replay run-tool-authority-bundle --input examples/mcp_tool_authority_replay/exported_mcp_tool_authority_bundle --out /tmp/microcosm_mcp_tool_authority_bundle
cd microcosm-substrate
PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_mcp_tool_authority_replay.py -q
PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

This module may claim only that a synthetic public MCP-like replay preserved tool-authority boundaries over metadata rows: capability scopes, argument hashes, approval refs, side-effect refs, rollback refs, instruction/data split refs, cold replay refs, source-module digests, negative cases, and metadata-only validation result records.

It must not claim live MCP account safety, account secret-handling certification, live tool behavior, provider behavior, benchmark security, source-file changes, publishing-scope decision, complete security, or launch-scope decision.

Scope boundary

This module does not access live MCP accounts, export account secrets or model-output data, obey tool output as instruction, run live tools, change source files, claim benchmark safety, publish results, or include launch operations.

Source and projection details
Governing Lattice Relation

The source record declares concept.agent_reliability_and_safety_validator_bundle, principles P-4 and P-16, and axiom AX-3 as the governing lattice. The generated component row repeats the paper and mechanism links and adds an component-level P-18 relation; that component relation is useful context but does not expand the paper-module bundle's declared authority.

P-4 and AX-3 are the local authority rule: a tool handle, account secret-shaped string, role name, or trusted-session label is bounded evidence of authority. The runtime must derive authority from dereferenced manifest policy, capability scope refs, approval refs, side-effect refs, rollback refs, cold replay refs, and public trace spans. P-16 supplies the transaction boundary: a write-capable tool call is admissible only when the call is scoped, the side-effect is ledgered, rollback evidence is present, and the result record says which scope limit still holds.

The mechanism row deliberately leaves sibling/upstream mechanism relations as residual pressure. That residual is part of the scope limit: this module can show a public replay lattice for synthetic MCP-like tool authority, but it does not infer neighbouring mechanisms from prose, certify live MCP security, or promote generated Mermaid, Atlas, site, or corpus projections into source authority.

Source-Open Body Floor

The exported bundle carries copied source bodies under source_modules/, governed by source_module_manifest.json. The manifest records source refs, target refs, hashes, material classes, required anchors, and result record body exclusions for:

  • state/microcosm_portfolio/reconstruction/high_novelty_substrate_gap_scout_v1.json
  • state/microcosm_portfolio/extracted_patterns_ledger.jsonl
  • system/lib/agent_execution_trace.py
  • codex/standards/std_agent_execution_trace.json
  • codex/standards/std_extracted_pattern_route_readiness.json
  • tools/meta/control/mission_transaction_preflight.py
  • system/lib/strict_json.py

The floor is source-open body evidence, not live-account or provider authority. Result records and command cards expose refs, digest status, counts, and verdicts only; they do not embed copied source bodies or private/live payload material.

Belief State Process Reward ReplayChecks that each step reward in a recorded run cites a declared verifier-feedback row, not a trick.3/5

Does Takes a recorded, synthetic bundle of agent steps on three partially-observable toy tasks (a terminal investigation, a mock purchase, a small planner) and checks that every "the agent did the right step" reward is actually backed by a checkable verifier or observed feedback reference, not by hidden reasoning, formatting tricks, a smuggled answer key, or a final-answer-only score. It does not run or watch a live agent; it validates pre-recorded files. The resulting result record files show, per step, the belief summary, the reward, and whether the reward-hacking and replay checks passed, so it is inspectable why each reward was or was not allowed.

Scope limit It only checks that the projection's accounting lines up under its own schema rules over recorded synthetic fixtures; it excludes hidden-reasoning export, RL training, hidden gold or neural-judge-only labels, benchmark-performance claims, external model access, source-file changes, or launch, and proves nothing about real-world reward, live agent behavior, or domain-level conclusions.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.belief_state_process_reward_replay run --input fixtures/first_wave/belief_state_process_reward_replay/input --out .microcosm/belief_state_process_reward_replay

EvidenceComputed projectionevidence 3/5Source-faithful refactor

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module Belief-State Process Reward Replay

This module is the public Microcosm projection of a belief-state process reward claim contract. It is backed by the public agent-execution trace refactor lane plus copied source source bodies. It is not a hidden-reasoning export, live RL run, neural-judge-only label set, hidden-gold benchmark, external model access, source-file changes, benchmark-score claim, or launch claim.

The public bundle models three partially observable tasks: terminal investigation, mock purchase, and formal-planning toy. A process-reward claim is admitted only when public observation digests, typed belief-state summaries, predicted next evidence, verifier or observed feedback refs, belief-discrepancy scores, dense process rewards, outcome rewards, reward-hacking trap results, trajectory groups, cold replay result records, negative cases, scope limits, and a source-faithful public trace span set line up.

Purpose

Process-reward language is easy to assert and hard to verify. A row can claim that a step earned a reward "for good reasoning" while the underlying evidence is a hidden gold label, a neural-judge guess, or formatting that gamed the scorer. This component exists to answer one narrow question: does a public process-reward claim actually reconstruct from lower-level public evidence, or is it just a label asserting its own correctness?

The interesting part is the recomputation. The validator does not trust any single fixture file. If any of those refs is missing or points somewhere inconsistent, the claim is blocked with a specific reason code rather than passed. A reward cannot point at a belief that points at a different episode, or cite feedback that belongs to another trajectory, and still count.

That cross-referential check is what separates this from a shape linter. The failure mode it guards against is a process-reward claim that looks correct field by field but does not survive being recomputed end to end. Two further design choices keep the result honest: outcome rewards are carried beside process rewards so a final answer cannot be re-labelled as step-level evidence, and every belief summary, feedback ref, and reward event stays metadata-only, so the validator proves the accounting structure without ever reading hidden reasoning.

Shape

The local governing standard is standards/std_microcosm_belief_state_process_reward_replay.json, whose authority boundary is synthetic belief-state process-reward replay only, not live training, benchmark, provider, source-file changes, public sharing, or launch-scope decision.

yesnoJSON source recordpaper_module_bundles.json[36]JSON source record paper_module_bundles.json[36]Local standardLocal standardRuntime locusRuntime locusrun (fixture mode)8 positive + 7 negativeinputsrun (fixture mode) 8 positive + 7 negative inputsrun_reward_bundle (bundlemode)copied-body manifest floorrequiredrun_reward_bundle (bundle mode) copied-body manifest floor requiredPer-file floorsprojection protocol, rewardpolicy,episodes, belief states,feedback,rewards, trajectory groups,cold replayPer-file floors projection protocol, reward policy, episodes, belief states, feedback, rewards, trajectory groups, cold replaySemantic recompute> trajectory -> outcomereward -> cold replaySemantic recompute > trajectory -> outcome reward -> cold replayNegative cases7 planted traps must beobservedNegative cases 7 planted traps must be observedSecret-exclusion scanplus metadata-only publictrace span setSecret-exclusion scan plus metadata-only public trace span setAll floors pass,chain recomputes,every trap observed,no secret hit?All floors pass, chain recomputes, every trap observed, no secret hit?status: passstatus: passstatus: blockedwith reason codesstatus: blocked with reason codesResult records + compact cardrefs, hashes, counts,verdicts;body_in_receipt falseResult records + compact card refs, hashes, counts, verdicts; body_in_receipt falseScope limitsource-faithful replay onlyScope limit source-faithful replay only

Source refs

Local standard
std_microcosm_belief_state_process_reward_replay.json
Runtime locus
belief_state_process_reward_replay.py
Diagram source
flowchart TD bundle["JSON source record paper_module_capsules.json[36]"] standard["Local standard std_microcosm_belief_state_process_reward_replay.json"] component["Runtime locus belief_state_process_reward_replay.py"] fixtureMode["run (fixture mode) 8 positive + 7 negative inputs"] bundleMode["run_reward_bundle (bundle mode) copied-body manifest floor required"] floors["Per-file floors projection protocol, reward policy, episodes, belief states, feedback, rewards, trajectory groups, cold replay"] recompute["Semantic recompute rebuild belief -> feedback -> process reward -> trajectory -> outcome reward -> cold replay"] negatives["Negative cases 7 planted traps must be observed"] scan["Secret-exclusion scan plus metadata-only public trace span set"] gate{"All floors pass, chain recomputes, every trap observed, no secret hit?"} pass["status: pass"] blocked["status: blocked with reason codes"] result records["Result records + compact card refs, hashes, counts, verdicts; body_in_receipt false"] ceiling["Scope limit source-faithful replay only"] bundle --> component standard --> component component --> fixtureMode component --> bundleMode fixtureMode --> floors bundleMode --> floors floors --> recompute recompute --> negatives negatives --> scan scan --> gate gate -->|yes| pass gate -->|no| blocked pass --> result records blocked --> result records result records --> ceiling

The generated instance reports eight relationship edges and zero unpopulated selective relations: it explains the belief_state_process_reward_replay component and the validating mechanism, is governed by P-1, P-2, and concept.agent_reliability_and_safety_validator_bundle, abides by AX-1, depends on paper_module.agent_route_observability_runtime, and cites src/microcosm_core/organs/belief_state_process_reward_replay.py as the resolved code locus. The component atlas adds the human/agent gloss and result record set; it classifies the evidence as algorithmic_projection and restates that the validator operates on recorded synthetic fixtures rather than live agent behavior.

The fixture manifest fixtures/first_wave/belief_state_process_reward_replay/fixture_manifest.json names eight positive input files and seven planted negative cases: hidden-chain-of-thought export, neural-judge-only labels, hidden gold labels, reward-by-formatting, verifier bypass, benchmark-performance claims, and final-answer-only scoring. The exported bundle manifest carries the source-open body floor: seven copied source modules under source_modules/, checked by digest and anchor refs, with body_text_exported_in_receipts: false. The focused test file tests/test_belief_state_process_reward_replay.py covers the fixture validator, exported bundle validator, public-root copy, negative cases, exact source body imports, and route/result record shape.

The honest ceiling is therefore narrow: this module can support public, metadata-only belief-state process-reward replay over synthetic tasks with verifier-backed process feedback, separated outcome rewards, cold replay, and negative-case coverage. It cannot support hidden reasoning export, live RL, reward-model quality, hidden-gold benchmark claims, provider behavior, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness.

Reader Evidence Routing

Read this page from the structured bindings outward. The bindings name the component, mechanism, concept, dependency module, runtime code locus, principle and axiom refs. The fixture result records, bundle result records, and focused test then show the metadata-only replay behavior. This page explains that chain for readers.

Technical Mechanism

The runtime validator is a two-mode replay checker. In fixture mode, run loads eight positive fixture files plus the seven planted negative inputs named by EXPECTED_NEGATIVE_CASES; _build_result then validates the projection protocol, reward policy, task episodes, belief states, verifier feedback, reward events, trajectory groups, cold replay, negative cases, secret-exclusion scan, and public trace projection before _write_receipts writes the result, board, validation, and sign-off result records. A pass requires all required positive floors to pass, every expected negative case to be observed, zero secret-scan blocking hits, public trace status pass, and no positive finding outside the expected falsification cases.

In exported-bundle mode, run_reward_bundle validates the public bundle without negative inputs and makes the copied-body floor mandatory. The source_module_manifest.json path must declare seven copied source body modules, body_in_receipt: false, body_text_in_receipt: false, public material classes, exact source/target digests, existing copied targets, and all declared anchors. Digest mismatch, missing manifest, wrong body class, result record-body leakage, count mismatch, missing target, and missing anchor cases block the bundle instead of degrading silently.

Between the per-file floors and the result records sits validate_semantic_recompute, which is where most of the real work happens. It checks that the cited feedback belongs to the same episode, that the process reward references the same belief, episode, trajectory, feedback ref, and belief-discrepancy value, that the trajectory actually lists that episode and that reward, that the trajectory's outcome reward is a real outcome event for the same episode, and that the cold replay both exists and passes. Any inconsistency appends a precise reason code such as feedback_episode_mismatch, belief_discrepancy_mismatch, or trajectory_process_reward_missing, and a single blocked row is enough to block the whole result. This is the check that a label-only fixture cannot fake: the references have to recompute into one coherent chain, not merely be present.

The validator links process-reward claims to public belief summaries rather than private reasoning. build_public_belief_state_process_reward_trace emits six metadata-only trace spans from the exported bundle, and the card path reports only compact counts, status, freshness digest, source-body floor metadata, and result record refs. CARD_OMITTED_FULL_PAYLOAD_KEYS keeps findings, scans, trace bodies, row payloads, source imports, scope limit, and scope boundary text out of the command card so public surfaces carry proof handles rather than copied private or source bodies.

Named Proof Consumers

  • microcosm_core.organs.belief_state_process_reward_replay.run is the first-wave fixture consumer. It writes result, board, validation, and sign-off result records for the synthetic episodes and negative-case floor.
  • microcosm_core.organs.belief_state_process_reward_replay.run_reward_bundle is the exported bundle consumer. It validates copied source bodies, public trace spans, digest/anchor contracts, and metadata-only result record behavior.
  • microcosm_core.organs.belief_state_process_reward_replay.result_card is the compact public card consumer. It reports counts and validation state while omitting the heavy/private payload classes named by CARD_OMITTED_FULL_PAYLOAD_KEYS.
  • tests/test_belief_state_process_reward_replay.py is the focused regression consumer. It asserts the three episode groups, six belief states, six process rewards, three outcome rewards, three cold replays, seven expected negative cases, exact source-body imports, digest mismatch blockers, manifest-boundary blockers, public-relative redacted result records, fresh-card reuse, and metadata-only public trace projection.
  • microcosm_core.macro_tools.agent_execution_trace.build_public_belief_state_process_reward_trace is the trace-projection consumer. It converts the exported bundle into six public spans with belief-state, feedback, process-reward, outcome-reward, and cold-replay coverage while preserving body_in_receipt: false.

Public Mechanics

  • Belief-state JSON is a public summary, not hidden chain-of-thought.
  • Process rewards must cite deterministic verifier or observed environment feedback refs; neural-judge-only labels are rejected.
  • Outcome rewards are carried beside process rewards so final answers cannot masquerade as process evidence.
  • Reward-hacking traps and cold replay result records must pass for each trajectory group.
  • microcosm_core.macro_tools.agent_execution_trace:: build_public_belief_state_process_reward_trace turns the public bundle into ordered trace spans that preserve belief, verifier, process-reward, outcome-reward, and cold-replay refs while keeping bodies out of result records.
  • examples/belief_state_process_reward_replay/ exported_belief_state_process_reward_bundle/source_module_manifest.json verifies exact copied source bodies for the extracted-pattern ledger, high-novelty reconstruction result record, canonical component model, agent-execution trace runtime, trace standard, and route-readiness checker. Those bodies live in source_modules/; result records carry refs, hashes, counts, and verdicts only.
  • Hidden reasoning export, hidden gold labels, reward-by-formatting, verifier bypass, benchmark-performance claims, and final-answer-only scoring are expected falsification fixtures.

Prior Art Grounding

This component is grounded in three older ideas: belief-state tracking under partial observability, process supervision, and reward-hacking controls. The belief lineage comes from POMDP work such as Kaelbling, Littman, and Cassandra's Planning and Acting in Partially Observable Stochastic Domains. The process-reward lineage follows OpenAI's Let's Verify Step by Step, where step-level feedback is separated from outcome-only supervision. The reward-hacking lineage comes from Concrete Problems in AI Safety and related work on specification gaming.

Microcosm does not train a reward model or expose hidden reasoning. It borrows the accounting form: public belief summaries, verifier-backed process feedback, outcome rewards kept separate from process rewards, reward-hacking traps, and cold replay result records before process-reward language is admitted.

Validation Result record Path

Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:

Then run the exported bundle validator:

cd microcosm-substrate && PYTHONPATH=src ../repo-python -m microcosm_core.organs.belief_state_process_reward_replay run-reward-bundle --input examples/belief_state_process_reward_replay/exported_belief_state_process_reward_bundle --out /tmp/belief_state_process_reward_bundle_receipt --card > /tmp/belief_state_process_reward_bundle_card.json

Scope boundary

Limitations

The replay is intentionally small and synthetic. It covers three partially observable task families, six accepted belief summaries, six process rewards, three outcome rewards, three trajectory groups, three cold replays, seven negative cases in fixture mode, and seven copied source modules in exported-bundle mode. Those counts are proof boundaries, not scale claims. They show that the public validator can separate belief summaries, verifier-backed process feedback, outcome rewards, reward-hacking traps, and cold replay under declared fixtures.

The mechanism does not estimate reward-model calibration, generalize to unseen tasks, compare agent policies, certify live training behavior, or score a benchmark. build_public_belief_state_process_reward_trace emits metadata-only trace spans, so it can prove trace structure and privacy boundaries, not hidden reasoning fidelity. The copied-source manifest proves exact declared public source bodies and anchors for this bundle; it excludes private source root export, external model access, source-file changes, public sharing, or launch.

Scope limit

Source-faithful refactored fixtures and copied source bodies only; not fixture-echo product evidence, hidden reasoning export, live RL training, neural-judge sufficiency, hidden-gold benchmarking, provider behavior, benchmark claims, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness.

Scope limit

This paper module can claim a metadata-only belief-state process-reward replay over synthetic tasks, with public belief summaries, verifier-backed process feedback, separated outcome rewards, reward-hacking traps, and cold replay result records.

It cannot claim hidden reasoning export, live RL training, reward-model quality, hidden-gold benchmark claims, provider behavior, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness. Any higher claim must land first in core/paper_module_capsules.json and the generated paper-module projection.

Scope boundary

This module does not export hidden reasoning, run RL or train a model, use hidden gold labels, rely on neural-judge-only labels, claim benchmark performance, use external model services, change source files, publish results, or include launch operations.

Source and projection details
Governing Lattice Relation

The governing lattice relation is that belief-state process-reward language is admissible only after the runtime recomputes the claim from lower-level public evidence. The generated JSON instance resolves eight edges and leaves no selective relation open: the bundle explains the belief_state_process_reward_replay component and mechanism.belief_state_process_reward_replay.validates_public_belief_state_process_reward_replay, is governed by concept.agent_reliability_and_safety_validator_bundle, P-1, and P-2, abides by AX-1, depends on paper_module.agent_route_observability_runtime, and cites src/microcosm_core/organs/belief_state_process_reward_replay.py as the code locus.

That relation matters because the module is not trying to make reward quality plausible from a label. P-1 requires recomputation rather than echoing fixture verdicts, so _build_result rechecks projection protocol, reward policy, episodes, belief rows, feedback rows, reward rows, trajectory groups, cold replay, expected negative cases, secret-exclusion scans, public trace shape, and copied source-module manifests. P-2 and AX-1 then lower the paper claim to what those checks derive: a local replay certificate over declared public inputs. The focused proof consumer is tests/test_belief_state_process_reward_replay.py, which exercises both fixture and exported-bundle modes, mutates real positive feedback linkage, rejects digest and manifest boundary violations, verifies exact source-body imports, checks freshness over live source authority, and confirms the command card omits full payload keys.

Agent Sandbox Policy Escape ReplayMaps sandboxed agent actions to show each was approved or blocked before running, then rolled back.3/5

Does This takes a synthetic record of an agent attempting risky actions inside a sandbox and lays out, step by step, what each action requested, whether a safety policy approved or blocked it before it would have run, what (if anything) changed afterward, and for the actions that did run, whether the change was rolled back and could be re-checked. The record shows exactly how each containment decision is captured: every blocked attempt is still logged as a traced step but is marked as never executed with no resulting change, all from local files with no real secrets, network, or live agent involved.

Scope limit It validates the projection / trace-refactor mechanics over a synthetic fixture only; it excludes live sandbox escape, secret or account secret handling, live network access, host filesystem mutation, executable payload export, raw environment export, external model access, security benchmark claims, source-file changes, or launch. A pass proves the projection boundary and trace-refactor mechanics for this contract, not real sandbox security, exploit resistance, or whole-system safety.

Run
PYTHONPATH=src python3 -m microcosm_core.cli agent-sandbox-policy-escape-replay run-sandbox-bundle --input examples/agent_sandbox_policy_escape_replay/exported_sandbox_policy_escape_bundle --out receipts/runtime_shell/demo_project/organs/agent_sandbox_policy_escape_replay

EvidenceComputed projectionevidence 3/5Source-faithful refactor

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module Agent Sandbox Policy-Escape Replay

agent_sandbox_policy_escape_replay is a validator-backed public refactor of the source agent_execution_trace system for sandbox/security claims. It asks a narrow question: can Microcosm compute metadata-only trace spans from action requests, pre-execution policy verdicts, side-effect diff result records, rollback result records, cold replay, falsification fixtures, and an explicit scope limit?

Purpose

Agent sandbox claims are easy to assert and hard to evidence. A system can say it blocks an untrusted action, or that a side effect was rolled back, without ever showing the record that would make the statement checkable. This component exists to hold one narrow claim to account: that for a fixed set of synthetic action requests, the policy decision was recorded before execution, blocked actions left no side effect, and permitted side effects each carry a diff and a rollback result record.

The interesting choice is that the validator does not trust the verdict it is handed. For each request it derives the expected verdict from the request's own shape, its action kind, requested capability, risk class, and source trust label, using a small fixed policy table (_derived_sandbox_policy_verdict). A secret read from untrusted tool output derives to block; a low-risk public fixture write derives to allow; a mock database update derives to review. Any action shape the table does not recognise fails closed to block. The recorded verdict row is then checked against that derivation. A declared allow that should have derived to block is a finding, not a pass. The same fail-closed semantics drive the side-effect check: a request whose shape requires a block must show no execution and a zero diff count regardless of what the verdict row claims.

Because every check runs over symbolic references, the page can report concrete numbers, six action requests, four derived blocks, six metadata-only trace spans, while staying honest about what those numbers are not. They demonstrate that the pre-execution accounting pattern is wired and replayable over public fixtures. They do not demonstrate containment in a real host, resistance to a live exploit, or network isolation. That gap is the point of the scope limit below, and it is the line this component is built to keep visible.

Named Proof Consumers

The primary proof consumer is tests/test_agent_sandbox_policy_escape_replay.py. Its 17 tests exercise both runtime entry points (run and run_sandbox_bundle) and the public trace builder from microcosm_core.macro_tools.agent_execution_trace. The consumer does not accept declared labels at face value: it mutates policy verdicts, side-effect rows, cold replay labels, exported bundle rows, source-module digests, source/target manifest fields, body-boundary fields, and cached-card freshness to prove that the validator recomputes the sandbox-policy result from source rows.

The fixture proof path is microcosm_core.organs.agent_sandbox_policy_escape_replay run. Its success result record must include six action requests, six pre-execution verdicts, four derived blocked rows, one derived allow row, one derived review row, six side-effect result records, two rollback-verified rows, six cold replay passes, all expected negative cases, and a six-span public trace. The negative-case rows are not auxiliary examples; they are the admission boundary that rejects semantic policy drift, blocked-action execution, executable escape payload material, tool-output authority bypass, raw environment exposure, and broad security or benchmark overclaim.

The exported-bundle proof path is microcosm_core.cli agent-sandbox-policy-escape-replay run-sandbox-bundle. It has no fixture-only negative cases, so its proof surface shifts to bundle shape: the bundle id, source-module manifest, seven copied source bodies, digest equality, required anchors, metadata-only result records, public-relative paths, and public trace spans must all validate. The same test file also breaks the manifest in targeted ways to prove that missing manifests, invalid material classes, body-in-result record flags, count mismatches, missing target copies, and partial source or target digest drift block the result.

The corpus proof consumer is scripts/build_doctrine_projection.py --check-paper-module-corpus. It proves only that this reader page remains aligned with the JSON bundle-backed paper-module corpus. It does not refresh generated Mermaid, Atlas, site, or verifier projections, and it does not raise the claim above public fixture and bundle replay evidence.

Technical Mechanism

The mechanism is a validating transducer over public refs, not a sandbox. The runtime entry points run and run_sandbox_bundle load the fixture or exported bundle, then _build_result recomputes each claim from lower-level rows before any result record is accepted. The named proof consumer is tests/test_agent_sandbox_policy_escape_replay.py, with the corpus-level projection consumer scripts/build_doctrine_projection.py --check-paper-module-corpus.

The validator first establishes an input boundary. _load_payloads reads the projection protocol, sandbox policy, action requests, verdicts, side-effect result records, rollback result records, and cold replay rows with strict JSON parsing. scan_paths checks the same public files and copied source-module bodies against core/private_state_forbidden_classes.json, while _source_module_manifest_result verifies that the seven copied source bodies are present, digest-matched, by material class, and excluded from result record body fields.

The policy mechanism is then recomputed row by row. validate_action_requests admits only symbolic request metadata with redacted bodies and no live network target. validate_policy_verdicts joins each request to a pre-execution verdict and checks the verdict against the request's risk class instead of trusting the declared label. validate_side_effect_receipts enforces the mechanical consequence: block decisions must have no execution and no diff, while allow/review decisions must carry a non-empty public diff result record. validate_rollback_receipts requires rollback refs for side-effecting actions, and validate_cold_replay requires replay rows that reproduce verdict and side-effect state.

The trace layer converts accepted public rows into metadata-only PublicTraceSpan records through build_public_sandbox_policy_trace. Each span carries a request id, authority verdict ref, side-effect or rollback ref, outcome, digest, and sandbox_policy_action tool label. This is why the module can claim six public trace spans and outcome counts, but cannot claim live sandbox security: the trace proves replay consistency over refs, not containment in a real host environment.

Negative cases define the refusal surface. The focused test suite mutates the fixture and exported bundle to verify semantic mismatch, blocked-action execution, source-module digest mismatch, source-module manifest boundary breakage, public-relative result record paths, and card reuse. These tests are the source-bound evidence that the validator fails closed for the named public contract.

Shape

JSON bundle authorityJSON bundle authorityMarkdownsource module manifestsource module manifestsix action requestssix action requestssix pre-execution verdictssix pre-execution verdictssix side-effect resultrecordssix side-effect result recordstwo rollback result recordstwo rollback result recordssix cold replay rowssix cold replay rowspublic agent-execution tracepublic agent-execution tracenegative-case refusalsnegative-case refusalsfocused proof consumerfocused proof consumermetadata-only validationresult recordmetadata-only validation result recordscope limitsscope limits
Diagram source
flowchart TD bundle["JSON bundle authority"] markdown["Markdown reader projection"] manifest["source module manifest"] requests["six action requests"] verdicts["six pre-execution verdicts"] effects["six side-effect result records"] rollbacks["two rollback result records"] replay["six cold replay rows"] trace["public agent-execution trace"] negative["negative-case refusals"] tests["focused proof consumer"] result record["metadata-only validation result record"] ceiling["scope limits"] bundle --> markdown manifest --> result record requests --> verdicts verdicts --> effects effects --> rollbacks effects --> replay verdicts --> negative effects --> negative replay --> trace negative --> tests trace --> result record tests --> result record result record --> ceiling

The module's shape is pre-execution containment accounting. Public action requests are normalized into symbolic refs, policy verdicts must exist before execution, blocked actions carry zero side effects, allowed or reviewed side effects need diff refs and rollback refs, cold replay must reproduce the public state, and the trace builder emits metadata-only spans over those refs without promoting the fixture into live sandbox-security authority.

Reader Evidence Routing

  • Bundle route: core/paper_module_capsules.json::paper_modules[35] is the bundle-backed authority row, and paper_modules/agent_sandbox_policy_escape_replay.json is the generated paper-module instance.
  • Bundle route: examples/agent_sandbox_policy_escape_replay/exported_sandbox_policy_escape_bundle carries action_requests.json, policy_verdicts.json, side_effect_receipts.json, rollback_receipts.json, cold_replay.json, sandbox_policy.json, projection_protocol.json, and source_module_manifest.json.
  • Action route: the six public request ids are req_secret_read_attempt, req_network_exfil_attempt, req_destructive_delete_attempt, req_shell_obfuscation_attempt, req_safe_file_edit, and req_reviewed_mock_db_update.
  • Verdict route: the six verdict rows are pre-execution policy decisions under sandbox-policy-v1-public-synthetic, with four block, one allow, and one review outcome.
  • Side-effect route: all six requests have side-effect result records; blocked rows use zero diffs, while the public fixture edit and reviewed mock database update carry diff refs plus rollback refs.
  • Manifest route: source_module_manifest.json records seven copied source bodies, body_in_receipt: false, body_text_in_receipt: false, and the boundary excluding keys, account secrets, account or browser material, model-output data, live network payloads, raw environments, executable escape payloads, and account secret-equivalent material.
  • Runtime route: src/microcosm_core/organs/agent_sandbox_policy_escape_replay.py, src/microcosm_core/macro_tools/agent_execution_trace.py, and tests/test_agent_sandbox_policy_escape_replay.py verify negative cases, public trace-span construction, exact source-module imports, manifest digest rejection, result record public-relativity, and card result record reuse.

Contract

  • Input shape: projection_protocol, sandbox_policy, action_requests, policy_verdicts, side_effect_receipts, rollback_receipts, and cold_replay.
  • Positive evidence: six metadata-only action requests converted into public agent_execution_trace spans, six pre-execution policy verdicts, six side-effect result records, two verified rollback result records, and six cold replay rows.
  • Negative cases: real secret material, live network access, raw environment export, policy after execution, unlogged side effect, tool-output policy bypass, executable escape payload, and security benchmark claim.
  • Result record boundary: the validation result record proves the source-faithful trace refactor mechanics, negative-case coverage, secret-exclusion scan, and scope limit.
  • Scope limit: no live sandbox escape, live secret handling, live network access, host filesystem mutation, executable payload export, raw environment export, external model access, security benchmark claim, source-file changes, or launch-scope decision.

Projection Protocol

Copied: the public shape of the source agent-execution trace membrane and the idea that containment must be proven before a security claim is admitted.

Source-faithfully refactored: PublicTraceSpan construction, sequence-ordered trace rows, authority verdict refs, side-effect and rollback refs, public summary counts, trace digests, local JSON validators, and result record generation.

Cleaned: real secrets, host paths, live network targets, raw environment data, executable payloads, provider data, and account state.

Omitted: live exploit material, hosted sandbox details, real account secrets, raw tool-output bodies, real filesystem paths, raw environment variables, and security benchmark claims claims.

Public runtime surface: a metadata-only sandbox policy bundle plus generated result records under receipts/first_wave/agent_sandbox_policy_escape_replay/ and receipts/runtime_shell/demo_project/organs/agent_sandbox_policy_escape_replay/.

Source-open body floor: the exported bundle carries source_module_manifest.json plus seven exact copied source bodies: the extracted-pattern ledger, the high novelty reconstruction result record, the canonical component model, the source system/lib/agent_execution_trace.py runtime, std_agent_execution_trace, the extracted-pattern route-readiness checker, and the strict JSON helper required by the refreshed trace body. Result records and cards cite the manifest, hashes, material classes, and counts only; full body text stays in the bundle source module files.

Prior Art Grounding

This component is grounded in least-privilege sandboxing and agent-security evaluation work, not in a new exploit technique. The security-control lineage is Saltzer and Schroeder's least-privilege / complete-mediation principles and capability-oriented confused-deputy thinking. The agent-evaluation lineage is closer to sandboxed tool-use benchmarks such as AgentDojo and misuse/harm evaluations such as AgentHarm, where an agent's requested actions, tool calls, and policy outcomes are evaluated under controlled conditions.

Microcosm does not claim real sandbox security, exploit resistance, or live environment isolation. It borrows the pattern that containment must be checked before action, side effects must be logged, rollback needs its own result record, and harmful payloads must stay out of the public surface.

Validation proves the projection boundary and public trace-refactor mechanics for this contract; it does not establish real sandbox security, live model behavior, benchmark claims, exploit resistance, or whole-system safety.

Validation Result records

The focused proof consumer is tests/test_agent_sandbox_policy_escape_replay.py. A passing result record has to show that the fixture validator and exported-bundle validator both recompute the public sandbox-policy trace from action-request refs, pre-execution verdict refs, side-effect result record refs, rollback refs, cold replay rows, and the source-module manifest instead of trusting declared labels.

PYTHONDONTWRITEBYTECODE=1 ./repo-pytest \
  tests/test_agent_sandbox_policy_escape_replay.py \
  -p no:cacheprovider
./repo-python scripts/build_doctrine_projection.py \
  --check-paper-module-corpus

For the focused test, the result record boundary is the asserted shape: six action requests, six policy verdicts, four blocked-without-execution rows, two verified rollback result records, six cold replay rows, six public trace spans, manifest digest checks, public-relative result record paths, and negative cases for semantic mismatch, blocked-action execution, digest mismatch, manifest-boundary breakage, and unsafe card reuse. For the corpus check, the result record is only parity evidence that the JSON bundle and generated paper-module instance still agree; it is not a live sandbox-security result.

Validation Result record Path

Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:

Then run the exported bundle validator:

cd microcosm-substrate && PYTHONPATH=src ../repo-python \
  -m microcosm_core.organs.agent_sandbox_policy_escape_replay \
  run-sandbox-bundle \
  --input examples/agent_sandbox_policy_escape_replay/exported_sandbox_policy_escape_bundle \
  --out /tmp/agent_sandbox_policy_escape_bundle_receipt \
  --card > /tmp/agent_sandbox_policy_escape_bundle_card.json

The focused regression test and corpus projection checks are:

cd microcosm-substrate && ../repo-pytest \
  tests/test_agent_sandbox_policy_escape_replay.py
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

The result record path proves pre-execution policy replay over public refs, not live sandbox security, exploit resistance, or host isolation.

Scope boundary

Scope limit

This module may claim public fixture evidence that action request refs, pre-execution policy verdict refs, side-effect result record refs, rollback refs, cold replay rows, metadata-only trace spans, source manifests, digest checks, secret-exclusion scans, negative cases, and validation result records are checked by the listed runtime witnesses.

This module may not claim live sandbox escape resistance, live secret handling, live network isolation, host filesystem mutation authority, executable payload export, raw environment export, provider behavior, security benchmark performance, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness.

Source and projection details
Governing Lattice Relation

The JSON bundle binds this paper module to the component agent_sandbox_policy_escape_replay and to mechanism.agent_sandbox_policy_escape_replay.validates_public_sandbox_policy_trace. The mechanism row states the actual computation: check action requests, pre-execution policy verdicts, side-effect result records, rollback result records, cold replay rows, public trace spans, source-module manifest boundaries, secret-exclusion scans, and escape negative cases before writing bounded result records.

AX-1 supplies the axiom-level rule: derivation must precede assertion, and a claim cannot be stronger than the checker that accepted it. P-1 specializes that rule into recomputation over lower-level evidence instead of echoing fixture labels, declared verdicts, or public copy lines. P-2 lowers the module's public claim to the strength of the named validator, which is why the scope limit stops at metadata-only public sandbox-policy replay result records. The governing concept, concept.agent_reliability_and_safety_validator_bundle, groups this component with agent reliability and safety validators whose public value is bounded result record evidence, not a broad claim that agents are safe in the world.

Indirect Prompt Injection Information Flow Policy ReplayReplays an agent run to show untrusted text was gated before any sensitive action, leaking no secret.3/5

Does Replays a recorded sample agent episode (built from synthetic, metadata-only rows) and makes visible whether the instructions the agent treated as trusted were kept separate from untrusted web, tool, or browser text before any sensitive action was taken. Row by row, the record shows where untrusted text flowed, what the policy decided for each flow (allow / warn / block / review) before the action, and that the recorded outcome leaked no secret and disclosed no trusted context. It also bundles deliberately-bad cases it must reject (e.g. untrusted text reaching a sensitive action ungated, or a account secret being exfiltrated).

Scope limit Passing result records only show this projection satisfies the named information-flow contract over synthetic, redacted, metadata-only rows; they do not prove general prompt-injection robustness, benchmark performance, live account/tool/provider safety, hidden-message handling in a real system, source-file changes, or launch-scope decision.

Run
microcosm indirect-prompt-injection-information-flow-policy-replay run-prompt-injection-bundle --input examples/indirect_prompt_injection_information_flow_policy_replay/exported_prompt_injection_flow_bundle --out receipts/runtime_shell/demo_project/organs/indirect_prompt_injection_information_flow_policy_replay

EvidenceComputed projectionevidence 3/5Source-faithful refactor

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module Indirect Prompt-Injection Information-Flow Policy Replay

This validator-backed claim contract admits one narrow public claim: a source-faithful public trace refactor separated trusted instructions from untrusted web/tool/browser text before any privileged action or answer claim was accepted.

The runnable contract requires source trust labels, taint labels, source-to-sink flow rows, pre-action policy verdicts, sanitized-output result records, cold replay, secret-exclusion scan, negative cases, a public agent-execution trace, and an explicit scope limit.

Purpose

An agent that reads web pages, tool output, or retrieved documents takes in text from sources it does not control. Indirect prompt injection is the case where that untrusted text carries an instruction, and the agent acts on it as if the operator had asked. This component exists to make one specific safety property checkable on a synthetic trace: untrusted text was kept separate from trusted instructions, and no untrusted source reached a privileged action without being gated first. It answers a single question. Did the trust boundary actually hold through the flow, or only on paper?

The unusual part is that the validator does not trust the labels the fixture declares. Each flow row claims a set of taint labels and a policy verdict, but the runtime ignores those and recomputes both. It propagates taint along the source-to-sink graph from the labelled sources, so a sink inherits the taint of everything that fed it, and it derives the verdict from that propagated taint plus the sink's privilege, the sanitizer state, the sink kind, and the proposed action. If the declared taint or the declared verdict disagrees with the recomputed one, the row is blocked. A flow cannot quietly relabel an untrusted source as clean, or mark a dangerous action as allow, because the contradiction is recomputed rather than read back.

That recomputation is the point. The failure mode it guards against is a trace that looks safe because the labels were written to look safe. By deriving the labels and verdicts from the graph itself, the contract catches the mislabelled flow that a field-by-field check would wave through. To stay honest about live behaviour, it also takes one generated public tool-call trace span and pushes it through the same machinery as untrusted tool output, so the runtime is seen to treat tool output as data until a policy gate reviews it, never as instruction authority.

Cold-Reader Path

microcosm indirect-prompt-injection-information-flow-policy-replay run-prompt-injection-bundle \
  --input examples/indirect_prompt_injection_information_flow_policy_replay/exported_prompt_injection_flow_bundle \
  --out receipts/runtime_shell/demo_project/organs/indirect_prompt_injection_information_flow_policy_replay

Primary result record: receipts/runtime_shell/demo_project/organs/indirect_prompt_injection_information_flow_policy_replay/exported_prompt_injection_flow_bundle_validation_result.json

First-wave fixture result record: receipts/first_wave/indirect_prompt_injection_information_flow_policy_replay/indirect_prompt_injection_information_flow_policy_replay_validation_receipt.json

Shape

Source rowstrusted and untrusted,with taint labelsSource rows trusted and untrusted, with taint labelsSource-to-sink flow rows(declared taint and verdict)Source-to-sink flow rows (declared taint and verdict)Propagate taintalong the source-to-sinkgraphPropagate taint along the source-to-sink graphDerive verdictfrom taint + sink privilege+ sanitizer + sink kind +actionDerive verdict from taint + sink privilege + sanitizer + sink kind + actionDeclared labels and verdictmatch the derived ones?Declared labels and verdict match the derived ones?Block the row(relabelled or wrong verdict)Block the row (relabelled or wrong verdict)Untrusted into aprivileged sink?Untrusted into a privileged sink?Pre-action verdictallow / warn / review / blockPre-action verdict allow / warn / review / blockSanitized outputno trusted context disclosed,no untrusted instructionobeyedSanitized output no trusted context disclosed, no untrusted instruction obeyedOne public tool-call tracespantreated as untrusted tooloutputOne public tool-call trace span treated as untrusted tool outputmetadata-only result recordsrefs, digests, counts, statusmetadata-only result records refs, digests, counts, status
Diagram source
flowchart TD sources["Source rows trusted and untrusted, with taint labels"] flows["Source-to-sink flow rows (declared taint and verdict)"] propagate["Propagate taint along the source-to-sink graph"] derive["Derive verdict from taint + sink privilege + sanitizer + sink kind + action"] compare{"Declared labels and verdict match the derived ones?"} blocked["Block the row (relabelled or wrong verdict)"] gate{"Untrusted into a privileged sink?"} verdicts["Pre-action verdict allow / warn / review / block"] outputs["Sanitized output no trusted context disclosed, no untrusted instruction obeyed"] toolspan["One public tool-call trace span treated as untrusted tool output"] result records["metadata-only result records refs, digests, counts, status"] sources --> flows flows --> propagate propagate --> derive derive --> compare compare -- no --> blocked compare -- yes --> gate gate -- yes --> verdicts gate -- no --> verdicts verdicts --> outputs toolspan --> propagate outputs --> result records blocked --> result records

The module's shape is a public information-flow replay, not a live prompt-injection defense. This page points at the mechanism and runtime component; the runtime validates source trust labels, taint propagation, privileged sink gates, pre-action verdicts, sanitized outputs, cold replay, public trace spans, source-module digest anchors, negative cases, and metadata-only result records.

Technical Mechanism

The runtime mechanism is an evidence compiler plus an information-flow validator. run loads the first-wave fixture with negative cases enabled; run_prompt_injection_bundle loads the exported public bundle and leaves the fixture-only negative cases out. Both routes call _build_result, which loads the projection protocol, injection policy, source-document rows, flow graph, policy verdict rows, sanitized outputs, cold replay rows, public trace, copied source-module manifest, and secret-exclusion policy before it writes any result record.

The source and flow validators separate instruction authority from untrusted data before claim admission. validate_source_documents requires every source row to carry source id, trust label, channel, body ref, taint labels, instruction-authority flag, body redaction, synthetic-fixture status, and no raw or real-account body export; untrusted sources cannot carry instruction authority. validate_information_flow_graph joins each flow to its source row, derives taint labels through _taint_propagation_receipt, derives the expected policy verdict from propagated taint, sink kind, sink privilege, sanitizer state, and proposed action, and rejects hand-written taint or verdict drift.

Policy and output validation then bind the pre-action membrane. The injection policy must name allow, warn, block, and review verdicts; require the source, flow, verdict, and output field floors; and deny real accounts, raw prompt bodies, account secrets, tool-output authority, hidden-message promotion, live tool calls, general robustness claims, and launch. validate_policy_verdicts requires verdicts to join to flows, precede action, cite rules, stay redacted, and match the derived flow verdict. validate_sanitized_outputs requires output refs to join to flows, disclose no trusted context, obey no untrusted instruction, and avoid external action on blocked flows.

Replay and trace validation keep the public claim metadata-only. validate_cold_replay requires replay commands and result record refs to reproduce each verdict and sanitized output without trusted-context disclosure. The component uses build_public_prompt_injection_trace to build five public trace spans, then _live_tool_call_trace_promotion promotes one generated public tool-call trace span back through the same taint-graph machinery as an untrusted tool-output source. That promotion is evidence that the runtime treats tool output as data until a policy gate reviews it, not as instruction authority.

The copied-source floor is checked independently. _source_module_manifest_result requires the exported bundle's source_module_manifest.json to classify copied material as source body material, keep body text out of result records, match declared module counts, resolve path and target_ref to the same copied body, stream SHA-256 digests over each target, and verify required anchors. _source_open_body_import_summary exposes only body ids, classes, manifest refs, counts, and ceiling flags; the copied bodies remain under source_modules/.

The result record mechanism is intentionally small. _write_receipts writes first-wave result, board, validation, and sign-off result records for fixture mode, while exported-bundle mode writes the bundle validation result. result_card emits a compact command card and omits findings, secret-scan details, scope limit bodies, scope boundary text, source refs, target refs, public trace spans, source rows, flow rows, verdict rows, sanitized output rows, cold replay rows, board rows, and copied source-module bodies. The card preserves counts, status, negative-case coverage, trace span count, body-floor status, and result record refs.

The lattice binding is the source record paper_module.indirect_prompt_injection_information_flow_policy_replay, the mechanism row mechanism.indirect_prompt_injection_information_flow_policy_replay.validates_public_indirect_prompt_injection_information_flow_policy_replay, principles P-9 and P-14, axiom AX-8, and concept.agent_reliability_and_safety_validator_bundle. Those refs are used as an admission-control lattice: source-labelled public evidence may enter the claim surface, while untrusted instruction authority, private bodies, model-output data, live account material, source-file changes, and launch claims remain out of scope.

Input Contract

  • projection_protocol.json: source-available projection statement and omitted private material.
  • injection_policy.json: required source, flow, verdict, and output fields plus authority denials.
  • source_documents.json: synthetic trusted and untrusted sources with trust labels and taint labels.
  • information_flow_graph.json: source-to-sink flow rows before claim admission.
  • policy_verdicts.json: allow, warn, block, and review verdicts before synthetic action.
  • sanitized_outputs.json: output refs proving no trusted context disclosure and no untrusted instruction obedience.
  • cold_replay.json: rerunnable command and result record refs that reproduce verdicts and sanitized state.

Public Trace Refactor

The product evidence is no longer the fixture verdict fields alone. The component uses microcosm_core.macro_tools.agent_execution_trace::build_public_prompt_injection_trace to emit metadata-only spans over the public source, flow, verdict, output, and replay refs. That builder is a Microcosm refactor of the source system/lib/agent_execution_trace.py span model, so the accepted result record can show sequence, authority, audit, coverage, and digest mechanics without copying real accounts, prompt bodies, model-output data, or live tool material.

Reader Evidence Routing

  • Bundle route: core/paper_module_capsules.json::paper_modules[38:paper_module.indirect_prompt_injection_information_flow_policy_replay] is the JSON authority row; a Mermaid diagram and an Atlas card are generated for this module from that row.
  • Mechanism route: core/mechanism_sources.json::mechanism.indirect_prompt_injection_information_flow_policy_replay.validates_public_indirect_prompt_injection_information_flow_policy_replay binds the code locus, fixture refs, exported bundle refs, result record refs, validator commands, focused regression, and guardrails.
  • Runtime route: src/microcosm_core/organs/indirect_prompt_injection_information_flow_policy_replay.py owns run, run_prompt_injection_bundle, _build_result, _write_receipts, result_card, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING.
  • Exported-bundle route: examples/indirect_prompt_injection_information_flow_policy_replay/exported_prompt_injection_flow_bundle contains bundle_manifest.json, projection_protocol.json, injection_policy.json, source_documents.json, information_flow_graph.json, policy_verdicts.json, sanitized_outputs.json, cold_replay.json, and source_module_manifest.json.
  • Source-module route: source_module_manifest.json records five copied public source bodies: the extracted-pattern ledger row, high-novelty reconstruction result record, agent execution trace runtime, agent execution trace standard, and strict JSON helper. Result records carry refs, digests, counts, and status only; source body text stays in the bundle's source_modules/ tree.
  • Focused-test route: tests/test_indirect_prompt_injection_information_flow_policy_replay.py verifies negative cases, public-relative redacted result records, exported-bundle runtime shape, source-module digest and target-ref failures, exact copied source bodies, card result record reuse, and public trace span construction.

Prior Art Grounding

This component is grounded in the prompt-injection and information-flow-control literature. The prompt-injection side follows the threat shape described by Greshake et al. in Not what you've signed up for, the agentic evaluation framing in AgentDojo, and later data-leakage benchmarks over tool-calling agents such as Simple Prompt Injection Attacks Can Leak Personal Data. The policy mechanism borrows from dynamic information-flow / taint-tracking ideas, including Permissive Information-Flow Analysis for Large Language Models.

Microcosm does not claim a general prompt-injection defense. It preserves the prior-art internal control lesson: untrusted content must be labelled as data, source-to-sink flows must be visible before privileged action, and sanitized outputs need result records. The local component turns that lesson into a metadata-only replay contract with explicit scope boundaries and negative cases.

Negative Cases

The validator rejects real account material, secret or trusted-context exfiltration, raw prompt body export, untrusted tool output treated as instruction authority, hidden system-message promotion, account secret exfiltration, final-answer-only success, and ungated untrusted flow into a privileged sink.

These are falsification fixtures. They are part of the contract, not examples of live exploit traffic.

Named Proof Consumers

The named proof consumer is tests/test_indirect_prompt_injection_information_flow_policy_replay.py. It checks first-wave negative-case coverage, five sources, three untrusted and two trusted source labels, five information flows, derived taint paths, derived policy verdicts, allow/warn/block/review counts, blocked-without-external-action counts, sanitized-output non-disclosure, cold replay, scope limit flags, public trace spans, public tool-call trace promotion through taint propagation, public-relative redacted result records, exported-bundle validation, source-module digest mismatch rejection, target-ref/path mismatch rejection, partial digest mismatch rejection, manifest body-text boundary rejection, streaming source-module digests, exact copied source body imports, fresh --card result record reuse, public trace construction, and fixture-manifest binding to the body-open refactor.

The runtime proof consumers are the two module commands in the Validation Result record Path: fixture mode via indirect_prompt_injection_information_flow_policy_replay run, and exported bundle mode via indirect_prompt_injection_information_flow_policy_replay run-prompt-injection-bundle. Fixture mode must observe all eight negative cases and write metadata-only result, board, validation, and sign-off result records. Bundle mode must validate the public bundle shape, source-module manifest, public trace spans, and metadata-only exported bundle result record.

The corpus proof consumer is scripts/build_doctrine_projection.py --check-paper-module-corpus. It is a corpus check only; it does not refresh generated Mermaid, Atlas, site, verifier, or bundle state.

Validation Result record Path

Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:

Then run the exported bundle validator:

cd microcosm-substrate && PYTHONPATH=src ../repo-python -m microcosm_core.organs.indirect_prompt_injection_information_flow_policy_replay run-prompt-injection-bundle --input examples/indirect_prompt_injection_information_flow_policy_replay/exported_prompt_injection_flow_bundle --out /tmp/indirect_prompt_injection_flow_bundle_receipt --card > /tmp/indirect_prompt_injection_flow_bundle_card.json

The focused regression test and corpus projection checks are:

cd microcosm-substrate && PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_indirect_prompt_injection_information_flow_policy_replay.py -q
cd microcosm-substrate && PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

The result record path proves synthetic information-flow replay and body omission, not general prompt-injection robustness or live account safety.

Scope boundary

Limitations

The replay is intentionally small and synthetic. Fixture mode covers five source documents, three untrusted and two trusted source labels, five source-to-sink flows, five pre-action verdicts, five sanitized outputs, five cold replay passes, five public trace spans, one generated public tool-call trace promoted through the taint graph, five copied source bodies, and eight negative cases. Exported-bundle mode validates the public bundle, source-module manifest, trace spans, and metadata-only result record shape, but it does not carry the fixture-only negative-case payloads.

Those counts are proof boundaries, not scale claims. They show that this local validator recomputes source trust, taint propagation, pre-action verdicts, sanitized output constraints, cold replay, source-module digest anchors, and metadata-only result record shape over declared public inputs. They do not estimate attack coverage, compare defenses, score a benchmark, certify hidden-message handling in production, or demonstrate live email, browser, account, tool, or provider behavior.

The source-open body floor is also narrow. The manifest proves byte parity and declared anchors for the five copied source bodies in the exported bundle. It excludes private source-root export, raw prompt or system body export, account secret-bearing material, source-file changes, public sharing, hosting, launch-scope decision, complete security, or product readiness.

Scope limit

This module supports only the public claim that the replay exposes and checks a prompt-injection information-flow policy over source trust labels, taint labels, source-to-sink flow rows, pre-action policy verdict refs, sanitized-output refs, cold replay refs, public trace spans, live public tool-call trace taint promotion, copied source-module digests, negative-case result records, secret-exclusion checks, and metadata-only scope limits.

The copied source-module digest row proves byte parity for the named source body only; it does not widen the replay into live source authority.

It does not claim general prompt-injection robustness, live account safety, live tool or provider behavior, raw prompt/system/tool body export, account secret-bearing account data, hidden-message production handling, benchmark security or performance, source-file changes, publishing-scope decision, hosting authority, launch-scope decision, complete security, or product-progress authority.

Scope limit

Passing result records prove only that this public trace refactor satisfies the named prompt-injection information-flow contract over metadata-only rows. They do not prove general prompt-injection robustness, benchmark performance, live account safety, provider behavior, tool behavior, hidden-message handling in a real system, source-file changes, publishing-scope decision, or launch operations.

Source and projection details
Governing Lattice Relation

The generated JSON instance gives this page a specific admission lattice, not a loose security story. The only unresolved selective relation is the dependency edge; it remains a residual because the bundle does not name a sibling paper-module dependency.

The governing law is provenance propagation and non-interference. P-9 requires every source, fixture, result record, public-copy, provider-shape, or private-boundary crossing to carry provenance class and scope limit. P-14 requires byte or row basis and provenance to travel together. AX-8 requires data labels to propagate along flows, with untrusted labels entering privileged sinks only through declared transforms that satisfy the sink policy.

The runtime implements that lattice in _build_result: it loads the projection protocol, source documents, information-flow graph, policy verdicts, sanitized outputs, cold replay rows, public trace spans, source-module manifest, and secret-exclusion policy before status is admitted. validate_source_documents rejects untrusted instruction authority, validate_information_flow_graph derives taint labels and policy verdicts instead of trusting hand-written rows, _live_tool_call_trace_promotion treats generated public tool-call trace spans as untrusted tool output, and _write_receipts/result_card keep public result records metadata-only. The focused proof consumer is tests/test_indirect_prompt_injection_information_flow_policy_replay.py, which checks fixture and exported-bundle modes, taint/verdict derivation, negative cases, source-module digest boundaries, exact copied source-body imports, card redaction, fresh result record reuse, public trace spans, and fixture-manifest binding to the body-open refactor.

Source-Open Body Floor

The exported bundle carries exact copied source bodies under source_modules/ai_workflow/, governed by source_module_manifest.json. The imported bodies are:

  • state/microcosm_portfolio/extracted_patterns_ledger.jsonl
  • state/microcosm_portfolio/reconstruction/high_novelty_substrate_gap_scout_v1.json
  • system/lib/agent_execution_trace.py
  • codex/standards/std_agent_execution_trace.json
  • system/lib/strict_json.py

The manifest records source refs, target refs, hashes, material classes, and required anchors. Result records and cards expose refs, counts, and validation status only; they do not embed ledger, reconstruction, prompt, account, account secret, browser UI, model-output data, or live-access bodies.

Agentic Vulnerability Discovery Patch Proof ReplayChecks a fixed-bug evidence chain and re-runs three small real security checks; no real attack material.3/5

Does Takes a claim that an AI agent "found and fixed a security bug" and lays it out as a local, inspectable chain of made-up (synthetic) evidence: the imagined target, the suspected issue, the trace pointed to as backing, a reference to an abstract exploitability argument, the patch, and regression tests the fixture says fail before the fix and pass after it. The component checks only that these pieces are all present, refer to each other consistently, and carry no real targets, exploits, payloads, account secrets, or attack steps; it does not run the tests or judge whether the bug or fix is actually real. The result record shows whether the declared chain holds together, with no real attack material ever present.

Scope limit It validates only the projection/evidence-chain mechanics of a synthetic replay: structural presence, cross-reference consistency, declared boolean flags, and the secret/live-access exclusion scan. It executes small regression witnesses but performs no real vulnerability discovery and makes no judgment of real-world security or fix correctness. It excludes live-target testing, real CVE exploitation, weaponized payloads, account secret handling, network exfiltration, actionable exploit steps, external model access, source-file changes, benchmark security scores, launch, or any whole-system security claim.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.agentic_vulnerability_discovery_patch_proof_replay run --input fixtures/first_wave/agentic_vulnerability_discovery_patch_proof_replay/input --out receipts/first_wave/agentic_vulnerability_discovery_patch_proof_replay --acceptance-out receipts/acceptance/first_wave/agentic_vulnerability_discovery_patch_proof_replay_fixture_acceptance.json

EvidenceComputed projectionevidence 3/5Source-faithful refactor

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module Agentic Vulnerability Discovery Patch-Proof Replay

This module documents the source-available claim contract for agentic_vulnerability_discovery_patch_proof_replay. It turns an agentic vulnerability-discovery claim into a public trace-backed local replay: synthetic metadata-only targets, issue hypotheses, trace evidence, abstract exploitability refs, patch diffs, regression tests, verifier result records, sandbox policy verdicts, false-positive triage, cold replay, negative cases, and scope limits.

Purpose

An agent that says it found and fixed a security bug is making a claim that is easy to assert and hard to check. The phrase "found and fixed" can stand for a real, tested repair, or for a plausible-looking patch that was never run, a false positive promoted to a finding, or a benchmark number with no evidence behind it. This component exists to refuse that ambiguity. It answers one question: before any "found and fixed" language is allowed, does a complete evidence chain line up, from a synthetic target through a hypothesis, a trace, an abstract exploitability ref, a patch diff, a regression test, and a verifier result record?

The part worth noticing is that two of those checks are not field checks. They recompute the thing the fixture is claiming. Each executable regression witness names one of three small, public mini-targets, a webhook redirect allowlist, a notebook log redactor, and a scheduler path normaliser. The validator runs that function twice, once in its unpatched form and once patched, and compares the results it computes against the expected_pre_patch and expected_post_patch values the fixture declared. A witness whose declared output does not match the computed output is rejected. In the same spirit, each verifier result record has its pass or false_positive verdict recomputed from the joined proof, patch, test, and witness evidence; the row's own label and result record filename are not taken on trust. The failure mode this guards against is a fixture that asserts a green result without the work behind it ever having run.

This is a synthetic, metadata-only replay, not live security work. The synthetic overclaim fixtures, live targets, real CVE exploitation, weaponised payloads, exploit steps, patch-without-test claims, benchmark claims, are regression boundaries the runtime must reject, not capabilities it offers. The useful claim is narrow and is stated plainly below: Microcosm can hold an agentic security story to a checked evidence chain before it admits patch-proof language.

Shape

JSON bundle authorityJSON bundle authorityMarkdownmechanism source rowmechanism source rowpatch-proof replay runtimepatch-proof replay runtimefirst-wave fixturefirst-wave fixtureexported patch-proof bundleexported patch-proof bundlesynthetic target refssynthetic target refsissue hypothesesissue hypothesestrace evidence refstrace evidence refsabstract exploitability refsabstract exploitability refspatch diff refspatch diff refsregression test refsregression test refsexecutable regressionwitnessesexecutable regression witnessesverifier result recordsverifier result recordssandbox verdictssandbox verdictsnegative-case fixturesnegative-case fixturessecret-exclusion scansecret-exclusion scancold replay rowscold replay rowspublic trace spanspublic trace spanssource-module body floorsource-module body floormetadata-only result recordsmetadata-only result recordsfocused proof-consumer testsfocused proof-consumer testsscope limitscope limit
Diagram source
flowchart TD bundle["JSON bundle authority"] markdown["Markdown reader projection"] mechanism["mechanism source row"] component["patch-proof replay runtime"] fixture["first-wave fixture"] bundle["exported patch-proof bundle"] targets["synthetic target refs"] hypotheses["issue hypotheses"] traces["trace evidence refs"] proofs["abstract exploitability refs"] patches["patch diff refs"] regressions["regression test refs"] executable["executable regression witnesses"] verifiers["verifier result records"] sandbox["sandbox verdicts"] negative["negative-case fixtures"] secret_scan["secret-exclusion scan"] replay["cold replay rows"] public_trace["public trace spans"] source_modules["source-module body floor"] result records["metadata-only result records"] consumer["focused proof-consumer tests"] ceiling["scope limit"] bundle --> markdown bundle --> mechanism mechanism --> component component --> fixture component --> bundle fixture --> targets bundle --> targets targets --> hypotheses hypotheses --> traces traces --> proofs proofs --> patches patches --> regressions regressions --> executable executable --> verifiers verifiers --> sandbox negative --> result records secret_scan --> result records sandbox --> replay replay --> public_trace source_modules --> secret_scan source_modules --> public_trace public_trace --> result records result records --> consumer result records --> ceiling

The module shape is a metadata-only synthetic patch-proof replay, not a live vulnerability discovery or fix-correctness claim. The runtime forces target refs, hypotheses, trace refs, abstract exploitability refs, patch diff refs, regression test refs, verifier result records, sandbox verdicts, false-positive triage, cold replay, public trace spans, source-module digests, negative cases, and scope boundaries to line up before bounded patch-proof language is admitted.

Technical Mechanism

The mechanism is an evidence join, not a scanner. The JSON bundle names the component and mechanism row, and the component resolves every claim through _build_result in src/microcosm_core/organs/agentic_vulnerability_discovery_patch_proof_replay.py. That function loads the projection protocol and vulnerability policy, then validates targets, issue hypotheses, trace evidence, exploitability refs, patch diffs, regression tests, executable regression witnesses, verifier result records, sandbox verdicts, false-positive triage, cold replay rows, optional negative-case fixtures, the public trace builder, and the source-module manifest. A result can pass only when those validators agree, the secret-exclusion scan has zero blocking hits, the public trace status is pass, all positive validators are pass, and the exported bundle's manifest digests match copied source bodies.

Two of those validators do work the others do not. The executable regression witness check runs each declared mini-target function in both its unpatched and patched form and compares the computed pre/post outputs against the values the fixture declared, so a witness cannot pass on a label alone. The verifier result record check recomputes each pass or false_positive verdict from the joined hypothesis, proof, patch, test, and witness evidence, and also requires the result record-ref filename to match that recomputed verdict, so a row cannot claim a result its own evidence does not support. The other validators are stricter joins: every hypothesis must resolve to a synthetic target, every patch-required hypothesis must carry both an abstract exploitability ref and a metadata-only patch diff, and every patch must pair with a regression test that fails before the patch and passes after it. A patch without a paired test, or a false positive promoted to a finding, blocks the result.

The runtime deliberately keeps two evidence modes separate. The first-wave fixture includes the negative-case authority, so it must observe the expected overclaim failures such as live target material, real CVE exploitation, weaponized payload export, exploit steps, patch-without-test claims, and benchmark claims claims. The exported bundle is the public runtime example, so its expected_negative_cases can be empty while it still proves the body floor, public trace, digest checks, regression witnesses, and scope limit. Both modes write metadata-only result records; copied bodies stay behind the source_module_manifest.json refs and hashes.

Named Proof Consumers

  • tests/test_agentic_vulnerability_discovery_patch_proof_replay.py::test_agentic_vulnerability_patch_proof_replay_observes_negative_cases consumes the first-wave fixture and checks the expected counts, negative-case coverage, public trace status, body-import boundary, secret-exclusion scan, and scope limit booleans.
  • tests/test_agentic_vulnerability_discovery_patch_proof_replay.py::test_agentic_vulnerability_exported_bundle_validates_runtime_shape consumes the exported bundle and checks runtime mode, target/hypothesis/patch counts, executable regression witnesses, source-module manifest status, copied-body count, metadata-only import summary, secret-exclusion status, and public trace span count.
  • The rejection tests in the same file are the scope limit in executable form: they mutate false-positive promotion, remove regression tests, tamper executable witnesses, omit exploitability proof, cross-wire verifier result records, and alter source-module digests, then require blocked results and specific error codes instead of allowing patch-proof language.

What It Admits

The validator admits only metadata-only patch-proof evidence where trace refs, abstract proof refs, patch diff refs, regression tests, verifier result records, sandbox verdicts, and cold replay line up.

The result record fields to inspect first are target_count, issue_hypothesis_count, patch_diff_count, regression_test_count, verifier_receipt_count, observed_negative_cases, secret_exclusion_scan, public_agent_execution_trace, body_import_verification, and authority_ceiling.

Prior Art Grounding

This component is grounded in the recent line of agentic software-engineering and security-evaluation work that treats code repair as an executable, test-backed claim rather than a prose claim. SWE-bench popularized repository issue resolution as an LLM task with real codebases and test-based patch evaluation, while SWE-agent made the agent-computer interface itself part of the repair system. Security benchmarks such as CyberSecEval 2 and SecCodePLT motivate separating secure-code or vulnerability capability claims from uninspected generated patches.

Microcosm borrows the accountability pattern: issue hypotheses, trace evidence, patch diffs, regression tests, verifier result records, and negative cases must line up before patch-proof language is allowed. It does not import live targets, CVE exploitation authority, weaponized payloads, or benchmark performance claims.

Source-Backed Doctrine Binding

  • Component: src/microcosm_core/organs/agentic_vulnerability_discovery_patch_proof_replay.py
  • Bundle: core/paper_module_capsules.json#paper_module.agentic_vulnerability_discovery_patch_proof_replay
  • Mechanism: core/mechanism_sources.json#mechanism.agentic_vulnerability_discovery_patch_proof_replay.validates_public_agentic_vulnerability_patch_proof_replay
  • Standard: standards/std_microcosm_agentic_vulnerability_discovery_patch_proof_replay.json
  • Evidence class: core/organ_evidence_classes.json::agentic_vulnerability_discovery_patch_proof_replay records algorithmic_projection at rank 3.
  • Source-module manifest: examples/agentic_vulnerability_discovery_patch_proof_replay/exported_patch_proof_bundle/source_module_manifest.json declares nine copied source/control/standard/tool bodies, including strict_json_source_body_import.
  • Runtime result record: receipts/runtime_shell/demo_project/organs/agentic_vulnerability_discovery_patch_proof_replay/exported_patch_proof_bundle_validation_result.json
  • Sign-off result records: receipts/first_wave/agentic_vulnerability_discovery_patch_proof_replay/* and result records/sign-off/first_wave/agentic_vulnerability_discovery_patch_proof_replay_fixture_acceptance.json

Reader Evidence Routing

  • Bundle route: core/paper_module_capsules.json::paper_modules[5:paper_module.agentic_vulnerability_discovery_patch_proof_replay] is the JSON authority row. A diagram view is generated for this module; the Atlas card view is a staged exercise pending the component-atlas lane.
  • Mechanism route: core/mechanism_sources.json::mechanism.agentic_vulnerability_discovery_patch_proof_replay.validates_public_agentic_vulnerability_patch_proof_replay binds the validator command, exported-bundle validator command, focused regression, guardrails, input refs, result record refs, and runtime code locus.
  • Runtime route: src/microcosm_core/organs/agentic_vulnerability_discovery_patch_proof_replay.py owns run, run_patch_proof_bundle, _source_module_manifest_result, _source_open_body_import_summary, _build_result, _freshness_basis, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, SOURCE_MODULE_MANIFEST_NAME, BUNDLE_RESULT_NAME, and CARD_SCHEMA_VERSION.
  • Exported-bundle route: examples/agentic_vulnerability_discovery_patch_proof_replay/exported_patch_proof_bundle is the public runtime bundle for the synthetic patch-proof replay. Open source_module_manifest.json before trusting copied-body counts, then inspect the runtime validation result record.
  • Focused-test route: tests/test_agentic_vulnerability_discovery_patch_proof_replay.py verifies negative cases, public-relative metadata-only result records, exported-bundle runtime shape, exact copied source modules, digest mismatch rejection, command-card result record reuse, and public trace construction.

Cold-Agent Use

Open the source-module manifest first, then the runtime result record, then the component source. The useful claim is not that a real vulnerability was discovered or fixed.

The useful claim is that Microcosm can force an agentic security story to expose synthetic target refs, issue hypotheses, trace evidence, abstract exploitability refs, patch diffs, regression tests, verifier result records, sandbox verdicts, false-positive triage, cold replay, public trace spans, secret-exclusion scan, negative-case result records, and scope limits before patch-proof language is allowed.

Re-entry condition: after the sibling organ_atlas.json lane releases, bind this paper-module bundle, mechanism ref, and code locus into the atlas row and rerun python -m microcosm_core.doctrine_lattice --check.

Negative Cases

The contract rejects live_target_material, real_cve_exploitation, weaponized_payload_export, account_secret_material, network_exfiltration, exploit_instruction_steps, patch_without_tests, and benchmark_score_claim. These are falsification fixtures, not product evidence.

Validation Result record Path

Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:

Then run the exported bundle validator:

cd microcosm-substrate && PYTHONPATH=src ../repo-python -m microcosm_core.organs.agentic_vulnerability_discovery_patch_proof_replay \
  run-patch-proof-bundle \
  --input examples/agentic_vulnerability_discovery_patch_proof_replay/exported_patch_proof_bundle \
  --out /tmp/agentic_vulnerability_patch_proof_bundle_receipt \
  --card > /tmp/agentic_vulnerability_patch_proof_bundle_card.json

The focused regression test and corpus projection checks are:

PYTHONPATH=src ./repo-pytest \
  tests/test_agentic_vulnerability_discovery_patch_proof_replay.py
cd microcosm-substrate && PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py \
  --check-paper-module-corpus

Scope boundary

Scope limit

The result records do not authorize live target testing, real CVE exploitation, weaponized payload export, account secret handling, network exfiltration, actionable exploit instructions, external model access, source-file changes, benchmark security scores, launch, or any whole-system security claim.

Scope limit

This module may claim public fixture evidence that synthetic target refs, issue hypotheses, trace-evidence refs, abstract exploitability refs, patch diff refs, regression-test refs, verifier result records, sandbox verdicts, false-positive triage rows, cold replay rows, public trace spans, source-module digest checks, secret-exclusion scans, negative-case labels, and metadata-only validation result records are checked by the listed runtime witnesses.

This module may not claim live target testing, real CVE exploitation, weaponized payload export, account secret handling, network exfiltration, actionable exploit instructions, live provider behavior, benchmark security scores, patch correctness on real repositories, source-file changes, publishing-scope decision, launch-scope decision, product-progress evidence, or whole-system security.

Source and projection details
Governing Lattice Relation

The governing row is mechanism.agentic_vulnerability_discovery_patch_proof_replay.validates_public_agentic_vulnerability_patch_proof_replay. It binds this reader module to concept.agent_reliability_and_safety_validator_bundle, P-1, P-2, AX-1, and the upstream paper_module.mission_transaction_work_spine dependency. The relation matters because the mechanism is a public safety validator bundle: the paper module can claim that Microcosm checks a source-open, synthetic patch-proof evidence chain, but the lattice ceiling prevents that claim from becoming live vulnerability discovery, exploit proof, benchmark claims, source-file changes, or launch-scope decision.

Source-Open Body Floor

The exported bundle carries nine exact copied source/control/standard/tool bodies under examples/agentic_vulnerability_discovery_patch_proof_replay/exported_patch_proof_bundle/source_modules/. The body floor is governed by source_module_manifest.json, which records digest-verified copies of:

  • the source pattern ledger
  • the high-novelty reconstruction result record
  • the component projection IR
  • the agent-execution trace runtime and standard
  • the extracted-pattern route-readiness standard
  • the mission-transaction preflight wrapper
  • the mission-transaction landing preflight runtime
  • the strict JSON helper

Result records and cards do not duplicate those bodies. They carry source_module_manifest_ref, source_open_body_import_refs, source_open_body_imports, body_material_status, and body_copied_material_count so a cold reader can open the real bodies.

The public result record surface stays free of account secrets, account or browser state, browser state, model-output data bodies, browser UI live access, recipient-send state, weaponized payloads, live targets, exploit steps, and account secret-equivalent material.

Agent Route Observability RuntimeRecomputes an agent run's route-compliance score and anti-pattern flags with real trace-analytics code.5/5

Does This validator takes a sample (synthetic, not live) record of an agent's local run — the route it picked, the work it did, the events it logged, the evidence it pointed to, and the authority limit it declared — and checks that this recorded trail is well-formed and self-consistent, instead of leaving raw log JSON to be read by hand. The record is built to state, up front, where the agent's authority was supposed to stop, so the limits are written down and checkable rather than taken on faith. It checks the recorded evidence; it does not watch a live agent or prove one actually stayed in bounds.

Scope limit It validates only public, recorded trace-feedback metadata and regression fixtures; it does not inspect live operator state, certify or prove runtime behavior, read model-output data, mutate the work log, authorize pattern assimilation, or include launch operations.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.agent_route_observability_runtime run --input fixtures/first_wave/agent_route_observability_runtime/input --out receipts/first_wave/agent_route_observability_runtime

Paper module Computer-Use Action Trace Replay

computer_use_action_trace_replay is a validator-backed claim contract under agent_route_observability_runtime. It asks a narrow eval-harness question: does a claimed computer-use episode bind visible observations, affordances, actions, pre-action authority verdicts, state-transition result records, recovery result records, cold replay, falsification fixtures, non-public-state scan posture, and an explicit scope limit?

Run:

PYTHONPATH=src ../repo-python -m microcosm_core.cli agent-route-observability-runtime \
  --input examples/agent_route_observability_runtime/exported_computer_use_action_trace_bundle \
  --out receipts/runtime_shell/demo_project/organs/agent_route_observability_runtime \
  validate-computer-use-bundle

The fixture rejects live account action, account secret entry, external network mutation, purchase/send without approval, destructive action without review, hidden screen-state claims, actions without observation and affordance refs, and benchmark-score claims.

Purpose

A computer-use agent produces a stream of screenshots, clicks, keystrokes, and "it worked" assertions. The hard question for anyone reviewing such a trace is not whether the agent moved the mouse, but whether the record actually supports the claim that something happened safely. A trace can look complete while hiding the two failures that matter most: an action that was blocked or sent for review but is later narrated as a success, and a success that is asserted without any state evidence to back it. This module exists to make that question decidable on a synthetic episode, offline, before any of the language reaches a reader.

The single question it answers is: does each recorded action line up, row by row, with a prior visible observation, a pre-action authority verdict, and a state-transition result record whose outcome agrees with that verdict? The mechanism is a typed join, not a screenshot replay. An action must cite the observation it reacted to and an affordance that was visible in it; a verdict must be stamped before the action and must explicitly deny live-account, account secret, network, destructive, and purchase or send authority; a transition result record must then match the verdict. If the verdict said allow, the result record has to show the action was executed and an oracle confirmed the resulting state. If the verdict said block or review, the result record has to show the action was not executed and the status reads blocked or review-required. Nondeterministic "it probably succeeded" claims are refused outright.

What is genuinely unusual here is the inversion. Most action-trace tooling treats a screenshot as the proof. This module treats the screenshot as the one thing it will not trust: observations enter only as a digest and a visible-state hash, with raw pixels, hidden-state assertions, and live-browser state all required to be absent. The evidence that carries weight is the agreement between the verdict and the transition, not the image. The result record that comes out the other end records counts, refs, hashes, and the redaction posture, and never the raw bodies it checked. It describes a synthetic episode under the route-observability runtime; it does not drive a live browser or desktop.

Shape

allowblock or reviewJSON source recordJSON source recordgenerated Mermaid availablegenerated Mermaid availablegenerated Atlas linkedgenerated Atlas linkedComponentComponentexported computer-use bundleexported computer-use bundlevisible observations: digest+ visible-state hash, no rawpixelsvisible observations: digest + visible-state hash, no raw pixelsaction rows: cite observation+ affordance, allowed kind,redactedaction rows: cite observation + affordance, allowed kind, redactedpre-action authority verdictper actionpre-action authority verdict per actiontransition: executed + oraclestatus passtransition: executed + oracle status passtransition: not executed +blocked / review-requiredtransition: not executed + blocked / review-requiredrecovery result record, noupgrade to executedrecovery result record, no upgrade to executedcold replay reproducesaction, verdict, transitioncold replay reproduces action, verdict, transitionpublic trace spans: refs,counts, hashes, redactionposturepublic trace spans: refs, counts, hashes, redaction posturemetadata-only validationresult recordmetadata-only validation result recordscope limit: no live controlscope limit: no live control

Source refs

Component
agent_route_observability_runtime runtime
Diagram source
flowchart TD bundle["JSON source record"] bundle --> mermaid["generated Mermaid available"] bundle --> atlas["generated Atlas linked"] bundle --> component["agent_route_observability_runtime runtime"] component --> bundle["exported computer-use bundle"] bundle --> observations["visible observations: digest + visible-state hash, no raw pixels"] observations --> actions["action rows: cite observation + affordance, allowed kind, redacted"] actions --> verdicts["pre-action authority verdict per action"] verdicts -->|allow| executed["transition: executed + oracle status pass"] verdicts -->|block or review| held["transition: not executed + blocked / review-required"] held --> recovery["recovery result record, no upgrade to executed"] executed --> cold["cold replay reproduces action, verdict, transition"] recovery --> cold cold --> trace["public trace spans: refs, counts, hashes, redaction posture"] trace --> result record["metadata-only validation result record"] result record --> ceiling["scope limit: no live control"]

The shape is a reader route over a synthetic computer-use action trace validator. The evidence path runs through the source record, fixture manifest, exported bundle, runtime validator, public trace builder, metadata-only result records, and explicit scope limit. A diagram view and Atlas entry are generated for this module from the source record.

Technical Mechanism

The runtime entry point is run_computer_use_action_trace_bundle in src/microcosm_core/organs/agent_route_observability_runtime.py. It first loads the bundle through the strict JSON path and decides whether the input is the full fixture with negative cases or the public exported bundle. It then checks the projection protocol, interaction policy, task episodes, screen observations, action trace, authority verdicts, state transitions, recovery result records, cold replay rows, source-module manifest, non-public-state scan, and public trace spans before writing a result record. The status is pass only when positive findings are empty, required negative cases are observed for the fixture path, the non-public-state scan passes, and copied public source-module digests verify.

The mechanism is a typed join, not a screenshot replay. Actions must cite prior observation and affordance refs. Authority verdicts must cite action ids before state transitions can be credited. Cold replay rows must cover the action ids and reproduce the action, verdict, and transition relation. Recovery result records cover blocked or review-required actions without upgrading them into executed mutations. The public trace builder then emits bounded spans over refs, counts, hashes, and redaction posture, while the result record deliberately omits raw screen bodies, account secrets, hidden screen state, model-output data, private source bodies, absolute local paths, and benchmark-score claims.

Named Proof Consumers

  • validate-computer-use-bundle is the reader command. On the exported bundle, it should produce exported_computer_use_action_trace_bundle_validation_result.json with four episodes, six observations, eight actions, eight authority verdicts, eight state-transition result records, one recovery result record, four cold replay rows, eight public trace spans, copied source-module digest verification, and an explicit no-live-control scope limit.
  • tests/test_agent_route_observability_runtime.py::test_computer_use_action_trace_replay_observes_negative_cases is the negative fixture consumer. It checks that live account action, account secret entry, external network mutation, unapproved purchase/send, destructive file action, hidden screen-state claims, action-without-observation rows, and benchmark-score claims are rejected.
  • tests/test_agent_route_observability_runtime.py::test_computer_use_action_trace_receipt_is_public_relative_and_redacted is the result record-safety consumer. It verifies public-relative paths and absence of account secret values, hidden screen state, absolute paths, and raw bodies.
  • tests/test_agent_route_observability_runtime.py::test_computer_use_action_trace_exported_bundle_validates_runtime_shape is the public-bundle consumer. It checks the exported-bundle shape, action kinds, source-module digest posture, public trace coverage, and no benchmark authority.
  • tests/test_agent_route_observability_runtime.py::test_computer_use_trace_loader_rejects_duplicate_json_keys is the parser-integrity consumer. It prevents a replay bundle from passing by hiding conflicting values behind duplicate JSON keys.

Reader Evidence Routing

  • Bundle route: core/paper_module_capsules.json::paper_modules[46:paper_module.computer_use_action_trace_replay] is the source-authority row for this module. A diagram view and Atlas entry are generated from that source record.
  • Dependency route: downstream modules may reference paper_module.computer_use_action_trace_replay, but this page's source authority is the source record named above, not those downstream dependencies.
  • Fixture-manifest route: core/fixture_manifests/agent_route_observability_runtime.fixture_manifest.json::computer_use_action_trace_replay_contract_v1 names the positive inputs, negative-case floor, expected result record fields, runtime-example command, and scope limit.
  • Runtime route: src/microcosm_core/organs/agent_route_observability_runtime.py::run_computer_use_action_trace_bundle loads the bundle, validates projection protocol, interaction policy, episodes, observations, actions, authority verdicts, state transitions, recovery result records, cold replay, source-module manifest, negative cases, and public trace spans.
  • Exported-bundle route: examples/agent_route_observability_runtime/exported_computer_use_action_trace_bundle contains bundle_manifest.json, projection_protocol.json, interaction_policy.json, task_episodes.json, screen_observations.json, action_trace.json, authority_verdicts.json, state_transition_receipts.json, recovery_receipts.json, cold_replay.json, and source_module_manifest.json.
  • Source-module route: source_module_manifest.json records copied public source bodies for codex/standards/std_agent_execution_trace.json, system/lib/agent_execution_trace.py, and system/lib/strict_json.py, with body_in_receipt: false.
  • Focused-test route: tests/test_agent_route_observability_runtime.py validates negative cases, public-relative redacted result records, exported-bundle runtime shape, public trace span coverage, source-faithful public refactor status, source digest matching, and duplicate-key rejection.

Prior Art Grounding

This component is grounded in web and desktop agent benchmarks that make action trajectories inspectable. WebArena and Mind2Web anchor realistic web-task evaluation, while OSWorld extends the concern to multimodal agents acting in real computer environments. Browser automation standards such as WebDriver are also prior art for representing actions against visible browser state through a controlled protocol.

Microcosm borrows the action-trace accounting pattern: observations, affordances, actions, pre-action authority verdicts, transition result records, recovery result records, cold replay, and falsification cases must line up before a computer-use episode is credited. It does not operate a live browser or desktop.

The result record proves only this public synthetic replay boundary. It does not control a live browser or desktop, use accounts, enter account secrets, mutate external systems, export raw screenshots, claim benchmark performance, change source files, use external model services, or include launch operations.

Validation Result record Path

Reader-verifiable bundle command, run from microcosm-substrate/:

PYTHONPATH=src ../repo-python -m microcosm_core.cli agent-route-observability-runtime \
  --input examples/agent_route_observability_runtime/exported_computer_use_action_trace_bundle \
  --out receipts/runtime_shell/demo_project/organs/agent_route_observability_runtime \
  validate-computer-use-bundle

The command writes the computer-use replay result record under receipts/runtime_shell/demo_project/organs/agent_route_observability_runtime/, including computer_use_action_trace_replay_result.json and the exported bundle validation result. The tracked fixture result record records the synthetic observations, affordances, authority verdicts, transition result records, recovery result records, falsification cases, non-public-state scan posture, and scope limit.

This result record path is reader-verifiable evidence only. It does not flip Mermaid/Atlas status, create bundle authority, operate a live browser or desktop, use accounts, enter account secrets, mutate external systems, claim benchmark performance, or aggregate doctrine-lattice coverage.

Scope boundary

Scope limit

This module may claim synthetic computer-use action-trace replay over public fixtures: visible observations, affordances, action rows, pre-action authority verdicts, state-transition result records, recovery result records, cold replay rows, public trace spans, source-module digest checks, expected negative cases, and metadata-only result records.

It does not claim live browser or desktop control, account automation, account secret entry, purchase/send authority, external network mutation, destructive host action, hidden screen-state truth, benchmark performance, provider behavior, source-file changes, launch-scope decision, or whole-system correctness. The diagram view and Atlas entry generated for this module are navigation surfaces; they are not additional proof authority.

Source and projection details
Governing Lattice Relation

The source record binds this module to the accepted agent_route_observability_runtime component and to mechanism.agent_route_observability_runtime.validates_public_route_feedback. That places the page under AX-1 and the P-1 / P-2 claim discipline: a computer-use claim is admissible only when the runtime recomputes it from lower level evidence, and the public sentence cannot exceed what the named validator actually checks. The generated JSON instance records nine resolved edges: component, mechanism, concept, axiom, principle, dependency, and code-locus links.

The relevant concept is concept.agent_reliability_and_safety_validator_bundle, not a generic browser agent benchmark. It frames the replay as an evidence bundle: visible observations and affordances are the basis, action rows are candidate transitions, pre-action authority verdicts decide whether a transition may be executed or blocked, and result record rows carry the bounded public result. The dependencies on agent_route_observability_runtime and macro_projection_import_protocol keep the proof below the source-open import and result record lanes instead of treating this Markdown page as source authority.

Provider Context Recipe Budget PolicyRuns the real context harness to measure assembled byte sizes and check each bundle fits its budget.4/5Runs real tools

Does This component checks that the bundles of context an AI agent would assemble before calling an outside model provider stay inside fixed size limits (in bytes), fill their sections in the declared order until the budget runs out, list any section that was dropped for not fitting, and never carry answer keys, proof solutions, or other "correct answer" material. The record shows the exact size ceilings for each recipe, which sections fit versus got left out, and which output each recipe is allowed to produce, so the context boundary is inspectable as plain accounting before any external model access or answer authority is ever in play. It only validates this metadata; it does not itself call any provider.

Scope limit It validates context-budget projection mechanics (byte ceilings, ordered section fill, omitted-section manifests, deliverable routing, and digest-checked source-body imports) only. It excludes provider/API calls, run Lean/Lake, expose or carry proof or oracle truth-side material, assert theorem or domain-level conclusions, or include launch operations.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.provider_context_recipe_budget_policy run --input fixtures/first_wave/provider_context_recipe_budget_policy/input --out receipts/first_wave/provider_context_recipe_budget_policy

EvidenceBounded runtime computationevidence 4/5Real runtime result

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module Provider Context Recipe Budget

provider_context_recipe_budget_policy is the public Microcosm component for turning retrieved proof-support metadata into bounded provider context recipes.

It validates six public recipe shapes: minimal_4kb, premise_16kb, skill_32kb, repair_32kb, fewshot_64kb, and strategy_classification_4kb. Each recipe has a fixed byte ceiling, ordered section fill, a graph role, a reducer deliverable type, and an omitted-sections manifest when a section cannot fit.

Purpose

This component answers one question: when a proof-support pipeline is about to hand material to a model, which sections fit inside a fixed byte budget, in what order, and which sections are dropped? It treats the context window as a budget to spend rather than a place to dump everything retrieved. The board records this stance directly as context_is_budget_not_dump.

The byte sizes are not asserted by the fixture. The validator imports the copied benchmark harness, runs its real _provider_context_pack over each recipe, and measures the actual byte size of each packed section. A recipe is filled in declared order, admitting a section only while the running total stays under the ceiling, so an over-budget section is omitted and named in an explicit manifest rather than silently truncated. If the harness is unavailable the component falls back to declared sizes and says so, rather than guessing.

The second deliberate choice is what cannot enter context at all. A small fixed set of section ids and field keys, covering proof bodies, oracle-only premise ids, ideal answers, and provider output bodies, is rejected structurally, not by convention. Any recipe or section material carrying one of them is blocked before a packet is built. The output is metadata about the context shape: byte ceilings, the admitted and omitted section ids, the deliverable route, and a set of authority claims that stay false. No provider is called and no answer is produced.

Shape

JSON bundleJSON bundleGenerated instance19 relationships, noselective residualsGenerated instance 19 relationships, no selective residualsMarkdown6 public recipe budgets6 public recipe budgetsRuntimeRuntime9 source-backed sections9 source-backed sections8 copied bodies8 copied bodiesnegative fixtures7 forbidden-boundary casesnegative fixtures 7 forbidden-boundary casescontext_packetsincluded/omitted sections,byte counts, routescontext_packets included/omitted sections, byte counts, routesmetadata-only result recordsresult, board, validation,sign-offmetadata-only result records result, board, validation, sign-offscope limitnoprovider/proof/launch-scopedecisionscope limit no provider/proof/launch-scope decision

Source refs

JSON bundle
paper_module.provider_context_recipe_budget
provider_context_recipe_budget.md
6 public recipe budgets
provider_context_recipes.json
Runtime
provider_context_recipe_budget_policy.py
9 source-backed sections
section_materials.json
8 copied bodies
source_module_manifest.json
Diagram source
flowchart TD Bundle["JSON bundle paper_module.provider_context_recipe_budget"] --> Instance["Generated instance 19 relationships, no selective residuals"] Bundle --> Markdown["Reader projection provider_context_recipe_budget.md"] Recipes["provider_context_recipes.json 6 public recipe budgets"] --> Runtime["provider_context_recipe_budget_policy.py"] Sections["section_materials.json 9 source-backed sections"] --> Runtime SourceManifest["source_module_manifest.json 8 copied bodies"] --> Runtime NegativeCases["negative fixtures 7 forbidden-boundary cases"] --> Runtime Runtime --> Projection["context_packets included/omitted sections, byte counts, routes"] Runtime --> Result records["metadata-only result records result, board, validation, sign-off"] Projection --> Ceiling["scope limit no provider/proof/launch-scope decision"] Result records --> Ceiling

Evidence and accounting:

  • Bundle authority: core/paper_module_capsules.json::paper_modules[55:paper_module.provider_context_recipe_budget] sets source_authority: json_capsule, subjects the component provider_context_recipe_budget_policy plus mechanism mechanism.provider_context_recipe_budget_policy.validates_public_context_budget_boundary, and names generated_projections.mermaid.status: available_from_capsule_edges plus generated_projections.atlas_card.status: linked_from_capsule_edges.
  • Generated instance: paper_modules/provider_context_recipe_budget.json::relationships.edges contains 19 bundle-derived relationship edges, and relationships.unpopulated_selective_relations is empty. That is lattice wiring evidence, not implementation-correctness proof.
  • Runtime accounting: src/microcosm_core/organs/provider_context_recipe_budget_policy.py defines EXPECTED_RECIPE_BUDGETS for the six recipes, EXPECTED_DELIVERABLES for their reducer routes, _recipe_projection for included/omitted section accounting, _recipe_findings and _section_findings for boundary errors, and _write_receipts for metadata-only result record output.
  • Fixture inputs: fixtures/first_wave/provider_context_recipe_budget_policy/input/provider_context_recipes.json carries six public recipes with byte budgets from 4096 to 65536, while .../section_materials.json carries nine section rows with source refs and anchors.
  • Body-floor and result records: core/fixture_manifests/provider_context_recipe_budget_policy.fixture_manifest.json records body_copied_material_count: 8, seven negative_case_ids, four expected fixture result record paths, and source_open_body_imports.authority_ceiling fields that keep external model access, Lean/Lake execution, proof authority, truth-side material, payload export, runtime-correctness claims, and launch-scope decision false.
  • Focused tests: tests/test_provider_context_recipe_budget_policy.py checks the six recipe ids, expected negative cases, source-backed section materials, public-relative redacted result records, exported bundle validation, omitted-section movement when section size changes, digest mismatch rejection, and manifest body-text result record-boundary rejection.

Technical Mechanism

The runtime mechanism is a context-packet compiler plus boundary validator. It does not ask a provider for an answer. run loads fixture inputs with negative cases enabled; run_budget_bundle loads the exported bundle shape without the fixture-only negative cases. Both routes call _build_result, which loads recipe rows, section rows, copied source-module bodies, and the non-public-state scan policy before it constructs any result record.

Recipe projection is deterministic. _recipe_projection walks each recipe's ordered section ids, computes each section's byte size with _byte_size, admits a section only while the running total stays within the recipe's byte_budget, and records omitted sections when the next section would exceed the budget. The projection records graph role, deliverable type, included and omitted section ids, included bytes, approximate tokens, and whether the omitted-sections manifest is emitted. The six public recipes are the closed set in EXPECTED_RECIPE_BUDGETS: minimal_4kb, premise_16kb, skill_32kb, repair_32kb, fewshot_64kb, and strategy_classification_4kb.

The validator then checks three independent boundaries. _recipe_findings rejects budget changes, forbidden truth-side section ids, proof/provider body fields, provider-call authorization, deliverable-route drift, and over-budget context with no omitted-sections manifest. _section_findings requires each public section to cite an allowed source ref and source anchor, verifies those anchors against the copied source bodies, and rejects synthetic or truth-side section material. _source_module_findings checks the source-module manifest, expected module ids, metadata-only result record flags, target presence, source/target digest equality, and required anchors for the eight copied source bodies.

The result record mechanism is deliberately metadata-only. _write_receipts writes the fixture result, board, validation result record, and sign-off result record for fixture mode; bundle mode writes only the exported-bundle validation result. result_card emits a compact command card while omitting context packets, source-module imports, source refs, result record paths, private scan hit bodies, and the scope boundary payload. The full result records keep counts, ids, hashes, routes, and verdicts, bounded evidence bodies or provider answers.

In lattice terms, the JSON bundle binds this Markdown projection to provider_context_recipe_budget_policy, to mechanism.provider_context_recipe_budget_policy.validates_public_context_budget_boundary, and to concept.agent_reliability_and_safety_validator_bundle. The principle and axiom refs in the bundle (P-1, P-2, P-3, P-6, P-8, P-16 and AX-1, AX-2, AX-5, AX-7, AX-8, AX-9) are implemented here as admission control over public evidence: bounded context metadata is allowed, truth-side material and provider authority are not.

Runtime Surfaces

PYTHONPATH=src python3 -m microcosm_core.organs.provider_context_recipe_budget_policy run \
  --input fixtures/first_wave/provider_context_recipe_budget_policy/input \
  --out receipts/first_wave/provider_context_recipe_budget_policy
PYTHONPATH=src python3 -m microcosm_core.cli provider-context-recipe-budget-policy run-budget-bundle \
  --input examples/provider_context_recipe_budget_policy/exported_provider_context_budget_bundle \
  --out receipts/runtime_shell/demo_project/organs/provider_context_recipe_budget_policy

Named Proof Consumers

The named proof consumer is tests/test_provider_context_recipe_budget_policy.py. It verifies streaming hash and line-count helpers, real-text byte sizing, all six expected recipe ids, all seven negative cases, source-backed section material, public-relative and redacted result records, exported-bundle validation, omitted-section movement when a section becomes small enough to fit, source-module digest mismatch rejection, source/target digest mismatch rejection, manifest and row body-text result record boundary rejection, compact --card output, exact copied source body imports, and fixture-manifest source-open body-floor counts.

The runtime proof consumers are the two module commands in the Validation Result record Path: provider_context_recipe_budget_policy run for fixture mode and provider_context_recipe_budget_policy run-budget-bundle for exported-bundle mode. Fixture mode must observe the negative-case set and write result, board, validation, and sign-off result records. Bundle mode must validate the exported runtime shape and write one metadata-only bundle validation result.

The corpus proof consumer is scripts/build_doctrine_projection.py --check-paper-module-corpus.

Reader Evidence Routing

  • Start with the JSON Bundle Binding to identify the source record and the launch-safe scope limit before reading any validation result as a capability claim.
  • Use Structured Lattice Bindings for navigation: it names the component, mechanism, generated row, and runtime code locus that the bundle binds.
  • Use Validation Result record Path for reproducibility: fixture and bundle commands produce metadata-only result records, the focused pytest exercises negative cases, and the corpus check verifies paper-module projection parity.
  • The lattice wiring for this module supports discoverability and internal consistency checks; it does not establish external model service, Lean/Lake execution, formal-result correctness, launch-scope decision, or public-send permission.

Negative Cases

  • budget_overflow_recipe rejects recipes above the public byte ceiling.
  • truth_side_section rejects oracle-only section ids.
  • proof_body_leakage rejects proof and provider body fields.
  • provider_call_authorized rejects any public fixture that authorizes a external model access.
  • deliverable_type_route_mismatch rejects a recipe whose reducer output type changed.
  • omitted_sections_suppressed rejects over-budget context without an omitted-sections manifest.
  • synthetic_section_materials rejects section material that lacks an allowed source ref or source anchor, or that is otherwise synthetic.

Why It Matters

Microcosm needs provider context to look like a small operating system, not a prompt dump. This component makes the context boundary inspectable: a cold reader can see the exact byte ceilings, section order, omitted material, and deliverable routes before any provider or proof authority is even in scope.

Prior Art Grounding

The recipe budget is grounded in retrieval-augmented generation and context packing practice. Lewis et al.'s Retrieval-Augmented Generation paper is the direct research anchor for conditioning generation on retrieved supporting material rather than relying only on model parameters. Microcosm narrows that idea into recipe metadata: retrieved proof-support sections are budgeted, ordered, and omitted explicitly before any external model access is in scope.

The command-facing budget style also borrows from the Command Line Interface Guidelines principle of saying enough but not too much. The component turns that UX pressure into fixed byte ceilings, omitted-section manifests, and deliverable-type routing so "more context" does not silently become proof authority or provider authorization.

Validation Result record Path

Run from microcosm-substrate:

PYTHONPATH=src ../repo-python -m microcosm_core.organs.provider_context_recipe_budget_policy run \
  --input fixtures/first_wave/provider_context_recipe_budget_policy/input \
  --out /tmp/microcosm-provider-context-recipe-budget-policy/fixture \
  --card
PYTHONPATH=src ../repo-python -m microcosm_core.organs.provider_context_recipe_budget_policy run-budget-bundle \
  --input examples/provider_context_recipe_budget_policy/exported_provider_context_budget_bundle \
  --out /tmp/microcosm-provider-context-recipe-budget-policy/bundle \
  --card
PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_provider_context_recipe_budget_policy.py -q
PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

A green result record proves only public context-recipe metadata, byte ceilings, omitted sections, deliverable routing, copied source-module refs, and negative cases; it does not use external model services, run Lean or Lake, prove formal-result correctness, export proof bodies, expose oracle-only material, include launch operations, or convert context metadata into proof authority.

Scope boundary

Scope limit

This component does not use external model services, run Lean or Lake, prove a theorem, expose a proof body, or reveal oracle-only truth-side material. Its output is context metadata: which sections would be admitted, which sections were omitted, which deliverable route is allowed, and which authority claims remain false.

The strategy_classification_4kb route emits only strategy_id_classification. It is not a proof-body route and cannot carry a provider answer body.

Scope limit

This module covers only public context-recipe metadata: byte ceilings, ordered section admission, omitted-section manifests, deliverable routing, copied source-module refs, digest and anchor checks, negative cases, and metadata-only result records. They do not authorize provider or API calls, Lean or Lake execution, formal-result correctness, proof-body export, oracle-only truth-side material, provider answer bodies, launch-scope decision, publishing-scope decision, or whole-system correctness.

Source and projection details
Source-Open Body Floor

The public bundle carries exact source bodies for the context recipe compiler, formal ladder consumer, provider result record reducer, set calibration report, transform-job ABI, provider adapter policy, compute-provider policy, and provider-navigation transform result record policy. The validator checks every copied module by digest and required anchors; result records report only paths, hashes, counts, anchor status, and verdicts.

The body floor is deliberately metadata-only at the result record edge: runtime result records may prove copied-module paths, digests, anchor presence, counts, and verdicts, but they must not expose proof bodies, oracle-only truth-side material, provider answer bodies, account state, account secrets, or launch-send authority.

Agent Completion Faithfulness AuditRuns real git and pytest on a sample repo so wrap-up claims state only what the evidence proves.4/5Runs real tools

Does Runs a public fixture repo through real git and pytest subprocesses, then checks that completion claims only say what the evidence supports: commit object exists, ledger cap exists, and pytest pass is claimed only after exit-zero status was checked.

Scope limit verified means the referenced evidence object exists or a pytest span ran; it does not imply the span passed unless exit-zero status was explicitly checked

Run
microcosm agent-closeout-faithfulness-audit run --input fixtures/first_wave/agent_closeout_faithfulness_audit/input --out receipts/first_wave/agent_closeout_faithfulness_audit

EvidenceExternal tool runevidence 4/5Real runtime result

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module Agent Completion Faithfulness Audit

agent_closeout_faithfulness_audit checks the kind of sentence an agent writes when it finishes a task: "I committed the change, closed the ledger item, and the test passed." It runs the supplied public fixture evidence through real git and pytest subprocesses and refuses any claim that the evidence does not actually support.

Purpose

When an agent reports that work is done, the report is prose. The commit may or may not exist, the ledger row may or may not be there, and "the test passed" may mean the test ran, or it may mean nothing was checked at all. This component exists to answer one question over a fixed fixture: is each completion claim backed by an evidence object that genuinely exists, and is a "passed" claim backed by an explicit exit-zero status check rather than by the wording of the claim?

The approach is unusual in that it does not parse the completion prose or score it against a rubric. The fixture's public_fixture_repo is copied into a throwaway directory, initialised and committed with real git subprocesses, and its HEAD is read back with git rev-parse. A commit claim passes only when it points at that observed HEAD. A declared pytest span is run with python -m pytest <nodeid> inside that temporary repo, and only the exit code decides whether the span passed. The result record records the run as bytes of work that happened, not as a paraphrase of what the agent said.

The distinction the audit defends is narrow and easy to lose. "The span ran" and "the span passed" are separate facts, and a completion sentence that conflates them is the precise failure mode here. A pass claim is admitted only when pass_status_checked is true and the subprocess exited zero; a claim that expected a pass without that check is rejected with CLOSEOUT_PYTEST_PASS_STATUS_NOT_CHECKED. The same separation applies to commits and ledger caps, so a referenced commit object is not treated as a landed change and a named cap is not treated as closed work.

Route Card

  • Component id: agent_closeout_faithfulness_audit
  • Accepted-component evidence class: external_subprocess_witness
  • Standard: standards/std_microcosm_agent_closeout_faithfulness_audit.json
  • Runner: src/microcosm_core/organs/agent_closeout_faithfulness_audit.py
  • Fixture input: fixtures/first_wave/agent_closeout_faithfulness_audit/input
  • Runtime bundle: examples/agent_closeout_faithfulness_audit/exported_agent_closeout_faithfulness_audit_bundle
  • Source manifest: examples/agent_closeout_faithfulness_audit/exported_agent_closeout_faithfulness_audit_bundle/source_module_manifest.json
  • Primary result records: receipts/first_wave/agent_closeout_faithfulness_audit/agent_closeout_faithfulness_audit_result.json, receipts/first_wave/agent_closeout_faithfulness_audit/agent_closeout_faithfulness_audit_board.json, receipts/first_wave/agent_closeout_faithfulness_audit/agent_closeout_faithfulness_audit_validation_receipt.json, and result records/sign-off/first_wave/agent_completion_faithfulness_audit_fixture_acceptance.json
  • Generated posture: this paper module is authored doctrine. Refresh them through their owner commands instead of patching them by hand.

Shape

This module is a completion-claim accounting fixture, not a completion oracle. Its single question is: did the supplied public fixture evidence support the completion claims, and did the result record refuse the overclaims that should not pass?

3 fixture claims3 fixture claimsAuditAudit2 cap rows2 cap rowspublic_fixture_repogit fixturepublic_fixture_repo git fixturedeclared nodeiddeclared nodeid1 exact-copy source body1 exact-copy source bodypass result record3 verified claimspass result record 3 verified claimsnegative-case semantics4 overclaim classesnegative-case semantics 4 overclaim classesscope limitno live mutation or launchscope limit no live mutation or launch

Source refs

3 fixture claims
closeout_claims.json
Audit
agent_closeout_faithfulness_audit.run
2 cap rows
fixture_ledger.json
declared nodeid
tests/test_closeout_fixture.py
1 exact-copy source body
source_module_manifest.json
Diagram source
flowchart TD Claims[completion_claims.json 3 fixture claims] --> Audit[agent_completion_faithfulness_audit.run] Ledger[fixture_ledger.json 2 cap rows] --> Audit Repo[public_fixture_repo git fixture] --> Audit Pytest[tests/test_completion_fixture.py declared nodeid] --> Audit Manifest[source_module_manifest.json 1 exact-copy source body] --> Audit Audit --> Pass[pass result record 3 verified claims] Audit --> Neg[negative-case semantics 4 overclaim classes] Audit --> Ceiling[scope limit no live mutation or launch]

The accounting is source-backed:

Evidence inputRuntime checkResult record/accounting field
closeout_claims.json carries claim_public_head_exists, claim_cap_exists, and claim_pytest_span_passedevaluate() loops over the three claim rows in src/microcosm_core/organs/agent_closeout_faithfulness_audit.pyclaim_count: 3, verified_claim_count: 3
public_fixture_repo is copied into a temporary git repo_prepare_public_fixture_repo() runs git init, config, add, commit, and rev-parse HEAD subprocessesgit_subprocess_count: 6, head_verified_by_subprocess: true
fixture_ledger.json names fixture cap rowstask_ledger_cap claims must match task_ledger_caps[].cap_idcap_fixture_closeout_receipt_exists is accepted; missing caps emit CLOSEOUT_FAKE_CAP_CLAIM
tests/test_closeout_fixture.py::test_public_fixture_addition is the declared pytest spanevaluate() runs python -m pytest <nodeid> -q and records return code, span_ran, and explicit pass-status checkingpytest_subprocess_count: 1, pytest_span_ran_count: 1, pytest_pass_status_checked_count: 1
source_module_manifest.json names one copied source source bodythe bundle validator checks digest equality, line count, required anchors, and metadata-only result record posturemodule_count: 1, line_count: 1703, sha256_match: true, body_in_receipt: false

Negative cases are part of the Shape rather than an appendix because they define the claim boundary. EXPECTED_NEGATIVE_CASES names fake commit, fake cap, fake pytest node, and unchecked-pytest-pass classes; the focused tests assert the first three directly against fixture mutation and assert unchecked pass rejection against CLOSEOUT_PYTEST_PASS_STATUS_NOT_CHECKED. The runtime-bundle result record observes all four classes, so a cold reader can distinguish "the span ran" from "the pass claim had exit-zero evidence."

The source-body route is deliberately narrow. The exported bundle copies exactly system/lib/agent_experience_diagnostics.py to examples/agent_closeout_faithfulness_audit/exported_agent_closeout_faithfulness_audit_bundle/source_modules/system/lib/agent_experience_diagnostics.py; the manifest carries the matching digest, 1703 lines, required anchors Agent Experience Grand Rounds and completion, and body_in_receipt: false. Result records carry refs, hashes, counts, verdicts, and scope boundaries only. They do not carry copied body text, private root paths, model-output data, account or browser state, live work log authority, live work log authority, source-file changes, launch-scope decision, or whole-system completion truth.

Technical Mechanism

The fixture validator is centered on evaluate() in src/microcosm_core/organs/agent_closeout_faithfulness_audit.py. It loads closeout_claims.json and fixture_ledger.json, copies public_fixture_repo into a temporary repository, initializes and commits that copy with real git subprocesses, and records the resulting HEAD through git rev-parse HEAD. Commit claims pass only when the claim ref is HEAD or the actual subprocess-observed HEAD; fixture cap claims pass only when the cap id appears in the fixture ledger.

For pytest claims, evaluate() runs python -m pytest <nodeid> -q inside the temporary public fixture repo. A span can be counted as observed when the nodeid runs, but a pass claim is accepted only when pass_status_checked is true and the pytest subprocess exits zero. The same source file carries evaluate_negative_case(), which mutates one claim row at a time to force the fake commit, fake cap, fake pytest node, and unchecked pass paths. The expected error codes are declared in EXPECTED_NEGATIVE_CASES, so the negative floor is source-bound rather than inferred from prose.

The exported-bundle path uses run_agent_closeout_bundle() against examples/agent_closeout_faithfulness_audit/exported_agent_closeout_faithfulness_audit_bundle. That path reuses the same evaluator while making the source-module manifest floor mandatory: the copied diagnostic body must match the manifest digest, include required anchors, and remain absent from result records. AUTHORITY_CEILING then records the scope boundaries in machine-readable form: no live repo mutation, no launch-scope decision, no work log closure, and no pytest-pass claim without exit-zero evidence.

Named Proof Consumers

  • microcosm_core.organs.agent_closeout_faithfulness_audit.run is the first-wave fixture consumer. It materializes the public fixture repo, ledger, completion-claim rows, semantic negative cases, validation result record, board, and sign-off result record.
  • microcosm_core.organs.agent_closeout_faithfulness_audit.run_agent_closeout_bundle is the exported-bundle consumer. It validates the source-open bundle and the copied diagnostic body manifest while preserving body_in_receipt: false.
  • microcosm_core.organs.agent_closeout_faithfulness_audit.evaluate is the subprocess witness consumer. It checks commit, cap, and pytest-span claims against actual fixture evidence instead of accepting completion prose.
  • microcosm_core.organs.agent_closeout_faithfulness_audit.evaluate_negative_case is the falsification consumer for fake commit, fake cap, fake nodeid, and unchecked pytest-pass overclaims.
  • tests/test_agent_closeout_faithfulness_audit.py is the focused regression consumer. It asserts the public subprocess witness path, fake-claim rejections, semantic negative-case evaluation, exported-bundle metadata-only source manifest behavior, digest-mismatch rejection, and pytest-capable interpreter selection.

First Commands

From microcosm-substrate:

Validate the exported bundle when the question is whether the public source-open copy still matches the declared source body:

What It Proves

This component checks completion claims against public fixture evidence instead of trusting completion prose. A positive run proves four things:

  • the fixture repo exists and the referenced commit object is visible to real git subprocesses;
  • fixture HEAD is checked by subprocess evidence rather than by prose;
  • the declared pytest span actually ran;
  • work log style cap claims only point at rows present in the fixture ledger.

The useful distinction is narrow: verified means the referenced evidence object exists or the pytest span ran. A claim that a pytest span passed is valid only when the result record checked an explicit exit-zero status. That is the reader value of this component: it separates "I referenced a test" from "I proved the test passed."

Prior Art Grounding

This component is grounded in claim-verification and reproducibility patterns rather than in trust of summary prose. FEVER popularized fact extraction and verification as a separate task over cited evidence, while TruthfulQA made explicit that fluent model answers can be misleading without a truthfulness check. The artifact-review tradition also motivates separating a claim, its artifact, and its validation evidence instead of treating a report as self-validating.

Microcosm borrows that verification posture for agent completion: commit refs, work log refs, pytest spans, subprocess witnesses, and pass-status checks must line up before completion language is admitted. It does not certify all live completion prose or turn a referenced test into a passed test without exit-zero evidence.

Source-Backed System

The source-open body import is a single exact source body:

  • system/lib/agent_experience_diagnostics.py

The copied target is:

  • examples/agent_closeout_faithfulness_audit/exported_agent_closeout_faithfulness_audit_bundle/source_modules/system/lib/agent_experience_diagnostics.py

The manifest records:

  • source_to_target_relation: exact_copy;
  • body_copied: true;
  • body_in_receipt: false;
  • a 1703-line body;
  • matching source and target sha256 digests;
  • required anchors Agent Experience Grand Rounds and completion.

Result records carry refs, hashes, counts, verdicts, and scope boundaries only.

Result record Floor

A passing fixture run emits:

  • agent_closeout_faithfulness_audit_result.json
  • agent_closeout_faithfulness_audit_board.json
  • agent_closeout_faithfulness_audit_validation_receipt.json
  • agent_closeout_faithfulness_audit_fixture_acceptance.json

A passing runtime-bundle run emits:

  • exported_agent_closeout_faithfulness_audit_bundle_validation_result.json
  • agent_closeout_faithfulness_audit_board.json
  • agent_closeout_faithfulness_audit_validation_receipt.json

The first-wave result must show:

  • status: pass;
  • real_substrate_disposition: real_substrate_capsule;
  • body_in_receipt: false;
  • source_module_manifest.status: pass;
  • all_expected_digests_matched: true;
  • all_required_anchors_present: true;
  • secret_exclusion_scan.blocking_hit_count: 0;
  • receipt_body_scan.status: pass.

The exercise floor is:

  • three verified completion claims;
  • six git subprocess witnesses;
  • one pytest subprocess witness;
  • one checked pass status;
  • one ran pytest span;
  • head_verified_by_subprocess: true.

Negative Cases

The current negative-case floor is:

  • fake_commit_claim -> CLOSEOUT_FAKE_COMMIT_CLAIM
  • fake_cap_claim -> CLOSEOUT_FAKE_CAP_CLAIM
  • fake_test_claim -> CLOSEOUT_FAKE_TEST_CLAIM
  • unchecked_pass_claim -> CLOSEOUT_PYTEST_PASS_STATUS_NOT_CHECKED

These cases are the claim-language guardrail. If they stop appearing in observed negative cases, the component no longer proves that public completion result records reject fabricated commit, cap, test-node, or unchecked-pytest-pass claims.

Evidence Binding

  • JSON bundle authority: core/paper_module_capsules.json#paper_module.agent_closeout_faithfulness_audit.
  • Mechanism source: core/mechanism_sources.json#mechanism.agent_closeout_faithfulness_audit.validates_closeout_evidence_claims.
  • Component atlas edge: core/organ_atlas.json#agent_closeout_faithfulness_audit.
  • Runtime source: src/microcosm_core/organs/agent_closeout_faithfulness_audit.py.
  • First command: PYTHONPATH=src python3 -m microcosm_core.components.agent_completion_faithfulness_audit run --input fixtures/first_wave/agent_completion_faithfulness_audit/input --out result records/first_wave/agent_completion_faithfulness_audit --sign-off-out result records/sign-off/first_wave/agent_completion_faithfulness_audit_fixture_acceptance.json.

Reader Evidence Routing

  • Start with the Route Card and JSON Bundle Binding to identify the component, standard, source row, runner, fixture input, exported bundle, and result record surfaces.
  • For behavior questions, read src/microcosm_core/organs/agent_closeout_faithfulness_audit.py and the focused tests before trusting this prose.
  • For source-open body questions, read the exported bundle's source_module_manifest.json; the manifest is the evidence for exact-copy relation, digest match, anchor match, and metadata-only result record posture.
  • For claim-language questions, read the Negative Cases and Result record Expectations together; the pass path only matters if the overclaim cases still fail.
  • Treat generated component Markdown, atlas cards, graphs, health files, and runtime result records as navigation or validation projections. They do not become source authority for broader completion truth.

Validation Result record Path

The focused proof consumer is tests/test_agent_closeout_faithfulness_audit.py. A passing result record has to show that completion language was checked against public fixture evidence: referenced commit objects, fixture work log rows, git subprocess witnesses, pytest subprocess witnesses, explicit pass-status checks, negative completion cases, and the exported source-module manifest. It must not rely on completion prose as its own proof.

./repo-pytest tests/test_agent_closeout_faithfulness_audit.py -q --basetemp=/tmp/microcosm_agent_closeout_faithfulness_audit_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

For the focused test, the result record boundary is the asserted shape: three verified completion claims, at least five git subprocess witnesses, one pytest subprocess witness, one ran pytest span, one checked pass-status row, head_verified_by_subprocess=true, source-module digest and required-anchor matches, metadata-only result record posture, and semantic observation of the four negative completion classes. For the corpus check, the result record only proves bundle/instance parity; it does not close live work log work, mutate live work log state, certify arbitrary completion prose, prove launch-scope decision, or turn a referenced pytest span into a passed span without exit-zero evidence.

Validation Anchors

Focused coverage lives in tests/test_agent_closeout_faithfulness_audit.py and checks:

  • public git and pytest subprocess witness behavior;
  • fake commit rejection;
  • unchecked pytest pass rejection;
  • fake cap claim rejection;
  • fake pytest node id rejection;
  • metadata-only source manifest behavior in the exported bundle;
  • source-module digest mismatch rejection;
  • pytest-capable Python selection.

Scope boundary

Scope limit

This module may claim public fixture evidence that completion claims are checked against referenced commit objects, fixture work log rows, pytest subprocess witnesses, explicit pass-status checks, negative completion cases, a copied diagnostic body, source-module manifest digest equality, metadata-only result record posture, and validation result records.

This module may not claim live completion truth, live work log mutation, live work log mutation, live Git mutation, external model access, source-file changes, launch-scope decision, publishing-scope decision, deployment posture, all-agent faithfulness, formal-result correctness beyond the listed witnesses, or whole-system correctness.

Scope limit

This component is a public fixture witness for completion evidence. It does not:

  • prove arbitrary live commits landed;
  • close or mutate work log work;
  • mutate Git state;
  • include launch operations;
  • use external model services;
  • certify all completion prose;
  • turn a ran pytest span into a passed span without an explicit exit-zero check.

Its useful claim is narrower: over the supplied fixture repo, fixture ledger, completion claims, and copied diagnostic body, the component proves that completion evidence references are checked and that specific overclaims are refused.

Source and projection details
Governing Lattice Relation

That mechanism is active in core/mechanism_sources.json and says the component validates public completion evidence claims through fixture commit objects, fixture HEAD evidence, git subprocesses, pytest span execution, explicit pass-status checks, fixture-ledger cap rows, copied source-module digests, and stable overclaim negative cases before writing metadata-only result records.

The doctrine edge is narrow and constructive. The JSON instance reports concept.agent_reliability_and_safety_validator_bundle, principles P-1 and P-2, axiom AX-1, and dependency paper_module.durable_agent_work_landing_replay; those edges explain why this module is a validator-bundle proof instrument rather than a general completion truth oracle. The generated Mermaid and Atlas edges are navigation result records for that binding, not launch or correctness authority.

Bounded Autonomy Campaign PacketDrafts proposed work from coverage gaps and proves it cannot repair or rewrite the code itself.4/5Runs real tools

Does Turns synthetic coverage gaps into a draft candidate packet in a subprocess and records the boundary that it proposes work but cannot repair itself or write source.

Scope limit self-proposal campaign packet only; no self-repair or unsupervised source-file changes

Run
microcosm bounded-autonomy-campaign-packet run --input fixtures/first_wave/bounded_autonomy_campaign_packet/input --out receipts/first_wave/bounded_autonomy_campaign_packet

EvidenceExternal tool runevidence 4/5Real runtime result

ai-safetyagent-evaluationred-teaming

Source Design note · Source atlas

Paper module Bounded Autonomy Campaign Packet

bounded_autonomy_campaign_packet is a Crown Jewel import component with real runnable system and a strict public scope limit. It consumes synthetic public fixtures, copied source source bodies, and source manifests that verify sha256 digests, line counts, required anchors, secret-exclusion status, and result record body omission.

What it proves: self-proposal campaign packet only; no self-repair or unsupervised source-file changes.

Purpose

An agent can usefully notice its own coverage gaps and draft a plan to close them. The danger is that "draft a plan" quietly becomes "do the work": a proposal grows a write surface, and a system that was meant to suggest starts mutating its own source unsupervised. This component exists to keep those two steps apart. It answers one question: can an agent emit a draft campaign proposal from real coverage gaps without that proposal carrying any authority to act on them?

The design choice that makes this interesting is where the candidate count comes from. The component does not invent a plausible-looking list of work. It runs a real source campaign builder in read-only mode (build_standard_skill_pairing_campaign.py --check --report) and accepts its witness only when the builder reports candidate targets and leaves wrote_packet unset. The proposal is therefore derived from a surface that could do real work, observed in a mode where it did not. Each drafted candidate is then stamped write_surface: none, source_mutation_authorized: false, and requires_human_review: true, so the act of proposing can never be mistaken for the act of authorising.

Two refusals guard the boundary. A campaign policy that lists write_source among its allowed actions is rejected outright, before any candidate is drafted. And a campaign digest that already appears in the failed-campaign ledger more than once is refused, so a plan that has already failed cannot be quietly re-proposed under a fresh wrapper. Both refusals are checked by mutating the fixture and confirming the expected error code fires, not by trusting a declared label.

Shape

Public synthetic inputscoverage_gaps,campaign_policy,failed_campaign_digestsPublic synthetic inputs coverage_gaps, campaign_policy, failed_campaign_digestscampaign_policy allowswrite_source?campaign_policy allows write_source?Read-only builder witnesscheck --reportRead-only builder witness check --reportreports candidate targetsand wrote_packet unset?reports candidate targets and wrote_packet unset?Draft candidate packetwrite_surface: none,requires_human_review,source_mutation: falseDraft candidate packet write_surface: none, requires_human_review, source_mutation: falsefailed digestrepeated?failed digest repeated?RefuseSOURCE_WRITE_FORBIDDENREPEATED_FAILED_DIGESTwitness blockedRefuse SOURCE_WRITE_FORBIDDEN REPEATED_FAILED_DIGEST witness blockedmetadata-only result recordsrefs, digests, stdout/stderrhashes;builder output bodiesexcludedmetadata-only result records refs, digests, stdout/stderr hashes; builder output bodies excludedScope limitno self-repair, source-filechanges,providers, launch, or publicsharingScope limit no self-repair, source-file changes, providers, launch, or public sharing

Source refs

Read-only builder witness check --report
build_standard_skill_pairing_campaign.py
Diagram source
flowchart TD Inputs["Public synthetic inputs coverage_gaps, campaign_policy, failed_campaign_digests"] PolicyGate{"campaign_policy allows write_source?"} Witness["Read-only builder witness build_standard_skill_pairing_campaign.py --check --report"] WitnessGate{"reports candidate targets and wrote_packet unset?"} Draft["Draft candidate packet write_surface: none, requires_human_review, source_mutation: false"] DigestGate{"failed digest repeated?"} Refuse["Refuse SOURCE_WRITE_FORBIDDEN / REPEATED_FAILED_DIGEST / witness blocked"] Result records["metadata-only result records refs, digests, stdout/stderr hashes; builder output bodies excluded"] Ceiling["Scope limit no self-repair, source-file changes, providers, launch, or public sharing"] Inputs --> PolicyGate PolicyGate -- "yes" --> Refuse PolicyGate -- "no" --> Witness Witness --> WitnessGate WitnessGate -- "no" --> Refuse WitnessGate -- "yes" --> Draft Draft --> DigestGate DigestGate -- "yes" --> Refuse DigestGate -- "no" --> Result records Refuse --> Result records Result records --> Ceiling

This diagram is a reader aid. The machine graph remains the generated paper_module.bounded_autonomy_campaign_packet.mermaid projection derived from the JSON source record.

Technical Mechanism

The runtime is intentionally narrower than "autonomous repair." SPEC declares the four required public inputs, the source-module manifest, the expected negative cases, and an AUTHORITY_CEILING in which self-repair, unsupervised source-file changes, source-write packets, external model access, and launch are all false. run() and run_bounded_autonomy_bundle() then route both the fixture and exported bundle through run_crown_jewel_organ, so the same evaluator, source-manifest checks, metadata-only result record policy, and semantic negative-case evaluator guard both command surfaces.

The positive lane is witnessed by _campaign_builder_witness(), not by a fictional campaign row. It invokes tools/meta/factory/build_standard_skill_pairing_campaign.py --check --report --max-targets <n> from the source root, then accepts the witness only when the builder returns standard_skill_pairing_campaign_summary, reports at least one candidate target, emits a source_digest, and leaves wrote_packet unset. This makes the campaign packet a read-only proposal derived from a real builder surface; the result record stores return code, digest fields, and stdout/stderr hashes, but keeps builder output bodies out of the result record.

_candidate_packet_subprocess() converts the witnessed target count into draft candidate rows. Each candidate is tied to one fixture coverage gap when available, carries the builder ref and builder source digest, sets write_surface: none, requires human review, and records source_mutation_authorized: false. evaluate() then applies the policy checks: write_source in campaign_policy.allowed_actions is a hard refusal; blocked builder witness or empty candidate packet is a hard refusal; any candidate that authorizes source-file changes or writes to the source surface is also refused.

The negative cases are semantic mutations of the input, not trusted labels. evaluate_negative_case() copies the required inputs into a temporary directory and mutates the relevant file: source_write_campaign_packet appends write_source to campaign_policy.allowed_actions, while repeated_failed_campaign_digest rewrites the failed-digest ledger to contain a duplicate digest. The component passes its own evidence floor only when these mutations produce BOUNDED_AUTONOMY_SOURCE_WRITE_FORBIDDEN and BOUNDED_AUTONOMY_REPEATED_FAILED_DIGEST; stale declared error-code labels cannot satisfy the proof consumer.

Reader Evidence Routing

The primary evidence for this module is the fixture result record and the exported-bundle result record, which demonstrate the bounded campaign packet behavior under synthetic public inputs. Source-module manifests and digest checks are evidence for copied body provenance. This page is an explanation of those sources; the underlying JSON and test outputs are the authority.

Prior Art Grounding

This component borrows from AI risk-management, policy gating, and controlled workflow-automation patterns. Useful anchors include:

  • NIST's AI Risk Management Framework, which frames AI work in terms of governance, mapping, measuring, and managing risk rather than assuming autonomy is inherently authorized.
  • Open Policy Agent, as a policy-engine pattern for deciding whether a proposed action may proceed.
  • GitHub Actions workflow syntax, as a widely used automation surface where jobs, permissions, and concurrency behavior are declared before execution.

Microcosm borrows the governed-campaign and preflight-gate shape, but keeps the component to draft self-proposal packets over synthetic public coverage gaps. It does not self-repair, change source files unsupervised, use external model services, or include launch operations.

How to run it:

microcosm bounded-autonomy-campaign-packet run --input fixtures/first_wave/bounded_autonomy_campaign_packet/input --out receipts/first_wave/bounded_autonomy_campaign_packet

Runtime bundle route:

python -m microcosm_core.organs.bounded_autonomy_campaign_packet run-bounded-autonomy-bundle --input examples/bounded_autonomy_campaign_packet/exported_bounded_autonomy_campaign_packet_bundle --out receipts/runtime_shell/demo_project/organs/bounded_autonomy_campaign_packet

Validation Result record Path

If the fixture or bundle reports source-module digest drift, route that through microcosm_exact_copy_refresh; this page is source-linked only for copied source bodies. If the full projection check fails because another active session holds shared lattice outputs, treat that as unrelated contention and use the corpus check as the local gate for this module.

Negative cases covered by the fixture manifest: repeated_failed_campaign_digest, source_write_campaign_packet.

Source provenance is anchored by examples/bounded_autonomy_campaign_packet/exported_bounded_autonomy_campaign_packet_bundle/source_module_manifest.json and result records carry refs, digests, counts, verdicts, and scope boundaries only.

Scope boundary

Scope limit

This component emits a draft self-proposal from public synthetic coverage gaps and refuses source-write or repeated-failure packets. It does not self-repair, change source files unsupervised, use external model services, include launch operations or public sharing, or widen the proof boundary beyond the copied source bodies, synthetic fixtures, source manifests, negative cases, and validation result records.

Scope limit

This paper module demonstrates a bounded-autonomy fixture that builds a draft campaign packet and refuses unsafe packets under public synthetic inputs. A diagram view and atlas card are generated for this module.

It cannot claim autonomous repair, unsupervised source-file changes, external model access, launch-scope decision, publishing-scope decision, production campaign safety, private-system equivalence, or whole-system correctness.

Secondary Runtime Source BundleRuns eight trace, graph, and market engines on test rows without fetching live markets.5/5

Does This bundle imports a second Set 7 runtime slice as public runnable system. It checks agent trace view-model trust classes, lane-progress state normalization, graph-lens focus roles, graph projection summaries, observe-only cartography rendering, stockgrid payload terms, Polymarket CLOB microstructure, and four-lens market scanning over synthetic public fixtures without exporting sessions, fetching live markets, or giving trading decisions.

Scope limit verified source body import only; no browser/session export, wallet authority, live market data, investment-related actions, external model access, source-file changes, private-system equivalence, public sharing, launch, semantic-truth, or whole-system correctness claim

Run
microcosm batch7-secondary-runtime-capsule run --input fixtures/first_wave/batch7_secondary_runtime_capsule/input --out receipts/first_wave/batch7_secondary_runtime_capsule --acceptance-out receipts/acceptance/first_wave/batch7_secondary_runtime_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Set 7 Secondary Runtime Bundle

batch7_secondary_runtime_capsule imports a second Set-7 runtime slice into Microcosm. It exact-copies runtime view-model, lane-progress, graph-lens, graph-projection, cartography, stockgrid, and Polymarket source bodies into a public bundle, runs the bounded witness path, and exercises the Python market/numeric cores against synthetic public fixtures.

Imported Source Bodies

  • system/server/ui/src/components/world/agentTraceViewModel.ts
  • system/server/ui/src/components/world/laneProgress.ts
  • system/server/ui/src/components/graph/universalGraphLens.ts
  • system/server/ui/src/components/graph/graphProjection.ts
  • system/server/ui/src/lib/capCartographyShadowRender.ts
  • their focused Vitest witnesses where public-safe
  • tools/stockgrid/stockgrid.py
  • tools/polymarket/clob_snapshot.py
  • tools/polymarket/score.py
  • tools/polymarket/models.py

Purpose

This module is the reader-facing instrument for the accepted batch7_secondary_runtime_capsule component. Its source authority is the JSON source record in core/paper_module_capsules.json; this Markdown explains what a cold reader may trust from the public secondary-runtime fixture and what remains out of scope.

The component exists to answer one question: do these copied frontend and market bodies still behave the way their original code claims to, when run in isolation over synthetic inputs? It copies eight slices into a bundle, then exercises each one against a small fixture and re-checks the exact behaviour the original author relied on. The interesting part is not that the code runs, but that each engine is paired with a planted regression. The component mutates a single token in the copied body, or feeds an adversarial input, and asserts that the behaviour breaks in the expected way. A check that only passes on good input proves little; a check that also fails on the right bad input is evidence the behaviour is real.

Several of these guards encode a concrete bug that was found in production. The Polymarket order-book reader documents a probe from 2026-05-12: the API can return bids floor-first and asks ceiling-first, so a naive bids[0] / asks[0] reader silently inverts best-bid and best-ask. The body derives best prices by numeric extrema instead, and the polymarket_sorted_book_trap case feeds a deliberately mis-sorted book to confirm the extrema rule still holds. The stockgrid momentum primitive refuses an impossible -100% daily change rather than returning a misleading number. The graph projection drops self-edges so a collapsed cluster does not draw an arrow to itself. The scope stays narrow on purpose: this is local body import and synthetic-fixture witness evidence, not live market access, wallet authority, browser export, or investment-related actions.

Shape

Exported bundlecopied bodies+ source digest anchorsExported bundle copied bodies + source digest anchorsVitest witnessVitest witnessTrace view-modeland lane progressTrace view-model and lane progressGraph lensand graph projectionGraph lens and graph projectionCartographyobserve-only renderCartography observe-only renderStockgrid + PolymarketCLOB and four-lens scoringStockgrid + Polymarket CLOB and four-lens scoringMis-sorted bookmust still find extremaMis-sorted book must still find extrema100% changemust be refused100% change must be refusedSelf-edgemust be droppedSelf-edge must be droppedResolved marketmust gate NEWSBREAKERResolved market must gate NEWSBREAKERmetadata-only result recordsstatus, digests, anchorchecksmetadata-only result records status, digests, anchor checksscope limitscope limitNegativesNegatives

Source refs

Vitest witness
world/graph/cartography tests
Diagram source
flowchart TD bundle["Exported bundle copied bodies + source digest anchors"] witness["Vitest witness world/graph/cartography tests"] subgraph Engines["Eight fixture engines"] ui["Trace view-model and lane progress"] graph["Graph lens and graph projection"] carto["Cartography observe-only render"] market["Stockgrid + Polymarket CLOB and four-lens scoring"] end subgraph Negatives["Planted regressions"] invert["Mis-sorted book must still find extrema"] momentum["-100% change must be refused"] selfedge["Self-edge must be dropped"] resolved["Resolved market must gate NEWSBREAKER"] end result records["metadata-only result records status, digests, anchor checks"] ceiling["scope limit"] bundle --> witness witness --> ui bundle --> graph bundle --> carto bundle --> market ui --> Negatives graph --> Negatives carto --> Negatives market --> Negatives Negatives --> result records result records --> ceiling

Reader Evidence Routing

Start from the component source when checking behavior:

  • EXPECTED_ENGINES names the eight fixture engines for trace view-models, lane progress, graph lenses, graph projection, cartography, stockgrid, CLOB microstructure, and Polymarket scoring.
  • EXPECTED_NEGATIVE_CASES names the planted regressions for raw-authority omission, unknown lane state, hidden descendants, self edges, observe-only cartography, extreme stock momentum, sorted-book traps, and resolved-market gating.
  • AUTHORITY_CEILING keeps launch, public sharing, provider/model dispatch, browser or wallet access, source-file changes, investment-related actions, semantic-truth authority, and test-completeness proof false.
  • run, run_batch7_secondary_bundle, and result_card expose the reproducible command and metadata-only summary.

What the engines check

Each engine reads a copied body and asserts a specific, checkable behaviour. The four with the clearest stakes:

  • Polymarket CLOB microstructure. compute_best_prices derives the best bid as the maximum bid price and the best ask as the minimum ask price, never from the first row of each side. This guards a real failure documented in the source: the API can return bids floor-first and asks ceiling-first, which inverts a naive bids[0] / asks[0] reader. The polymarket_sorted_book_trap case feeds a mis-sorted book and confirms the chosen best bid (0.42) and ask (0.53) are not the first entries, then checks the spread and that depth imbalance stays in [-1, 1].
  • Stockgrid momentum. _daily_log_momentum_bps converts a percentage change into a daily log-return in basis points, but returns nothing when the ratio is at or below -0.999999. A claimed -100% daily change has no finite log return, so the primitive refuses it rather than emitting a misleading value. The stockgrid_extreme_momentum case asserts that refusal.
  • Graph projection. projectGraphForRender groups nodes into per-lane, per-wave summary clusters and rewrites edges between clusters. It drops any edge whose source and target land in the same cluster, so a collapsed cluster never draws an arrow to itself. The graph_projection_self_edge case removes the sourceId === targetId guard from the copied body and confirms the self-edge would otherwise survive.
  • Polymarket four-lens scoring. calculate_lenses zeroes the NEWSBREAKER lens for any market that is resolved, low-volume, low-uncertainty, or an outlier in velocity. The fixture scores one open and one resolved synthetic market and asserts the resolved one scores zero on NEWSBREAKER while the open one does not.

The remaining engines cover the trace view-model trust taxonomy (seven labels including missing and fallback, with an explicit "raw provider JSONL is unavailable" path), lane-progress state normalisation (an unknown state falls back to idle, not an invented status), the graph lens (collapsing a parent keeps the parent visible but hides its descendants), and the cartography render (a fixed set of mutating actions stays blocked, so the surface observes without creating or editing). Each negative case is run by mutating one token in the copied body or supplying an adversarial input, then checking the engine reports blocked. The result records record status, digests, and anchor matches only; copied bodies and command output are never inlined.

Prior Art Grounding

The component borrows from MVVM/read-model UI architecture, graph visualization, and market-data board patterns: view models shape raw state for views, graph projections make relationships inspectable, and market rows must preserve provider identity and missingness. Useful anchors include:

  • Microsoft's MVVM guidance, where view models encapsulate presentation state while separating UI from underlying model logic.
  • D3 force layouts as a common graph visualization family for networks and hierarchies.
  • The CFTC's prediction markets explainer, as a boundary reference for event-market data and consumer caution.

Microcosm borrows the view-model, graph-projection, and market-diagnostic shapes, but runs them only over synthetic runtime packets and synthetic market rows. It is not browser/session export, live market data, trading decisions, or proof that frontend projections are complete.

Validation Result record Path

Reader-verifiable fixture command, run from microcosm-substrate/:

Focused test result record, run from the repository root:

PYTHONPATH=src ./repo-pytest \
  tests/test_batch7_secondary_runtime_capsule.py \
  -q --basetemp /tmp/microcosm-batch7-secondary-runtime-tests

The fixture run writes receipts/first_wave/batch7_secondary_runtime_capsule/batch7_secondary_runtime_capsule_result.json, receipts/first_wave/batch7_secondary_runtime_capsule/batch7_secondary_runtime_capsule_validation_receipt.json, and receipts/first_wave/batch7_secondary_runtime_capsule/batch7_secondary_runtime_capsule_board.json; the sign-off file records fixture sign-off. The exported-bundle re-run uses the run-batch7-secondary-bundle action over exported_batch7_secondary_runtime_capsule_bundle.

This result record path is public fixture evidence only. It does not export browser or account sessions, fetch live market data, provide investment-related actions, complete UI/ranking coverage, include launch operations or public sharing, or aggregate doctrine-lattice coverage.

Scope boundary

Scope limit

This bundle can claim fixture-bound public source-body import evidence and secondary runtime/market witness result records. It cannot authorize browser/session export, wallet authority, live market data, investment-related actions, external model access, source-file changes, launch, public sharing, private-system equivalence, semantic truth, complete UI/ranking coverage, or whole-system correctness.

Research & science (8)

Research Replication Rubric Artifact ReplayAudits whether a paper-replication claim carries the full evidence trail.3/5

Does It checks whether a claim that an AI agent "replicated a research paper" comes with the paper trail real replication would leave behind. It re-runs nothing; instead it confirms the bundle names every required piece of evidence: a breakdown of the paper's contributions, a grading rubric, the list of allowed public inputs, a from-scratch repo scaffold, an experiment plan, the metric scripts, a roster of declared file-hashes for the outputs plus hashes that all stay inside that roster, a grader report, a capped compute/runtime budget, an ablation diff, a failure list, and a cold-rerun result record. It also catches eight ways a claim can cheat: reusing the original authors' code, leaking a hidden rubric, calling a run a "success" when only a write-up backs it, asserting a benchmark claims, leaking a private paper or dataset body, using unbounded compute, grading only the final answer, or pointing at a file-hash that was never declared. The work runs on two made-up sample papers (one machine-learning method, one computational-science study), and the generated result record shows which of the eight cheats each test case triggered, rather than taking "it was replicated" on trust.

Scope limit It validates the shape and presence of synthetic replay metadata and result record references only - it does not run any experiment, metric script, or rerun, excludes any claim that a paper was actually replicated, that a benchmark claims was achieved, or that the underlying science is correct, and it never calls providers, exposes private paper/data bodies, or authorizes public sharing or launch.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.research_replication_rubric_artifact_replay run --input fixtures/first_wave/research_replication_rubric_artifact_replay/input --out receipts/first_wave/research_replication_rubric_artifact_replay

Paper module Research Replication Rubric Artifact Replay

Abstract

research_replication_rubric_artifact_replay is a public Microcosm component that turns "an agent replicated a paper" into a replayable evidence contract. It does not rerun a real paper, use external model services, certify benchmark performance, or grant publishing-scope decision. It checks whether a public replay bundle exposes the objects a replication claim must cite before its authority can rise: contribution decomposition refs, rubric-tree refs, allowed input refs, scratch-scaffold refs, experiment-DAG refs, metric-script refs, declared artifact hashes, grader reports, runtime budgets, ablation diffs, failure taxonomies, cold-rerun refs, public execution-trace spans, and source-module digests.

The technical result is an R3 local artifact replay: one public metric script is executed over one allowed public input table, the produced output is compared with a declared output artifact, and the declared hash file is checked against that artifact. A successful run says the replay packet is structurally accountable, digest-bound, redaction-aware, and negative-case tested. It does not say that a real paper was independently replicated.

Purpose

The single question this component answers is narrow: before an agent is allowed to say it replicated a paper, can the claim be forced into a bundle that a cold runtime can check without trusting any prose? The interesting move is that the component refuses to treat "replicated" as one fact. It pulls the claim apart into the objects a real replication would have left behind, a contribution decomposition, a grading rubric tree, the allowed public inputs, an experiment DAG, metric scripts, declared artifact hashes, a grader report, a runtime budget, an ablation diff, a failure taxonomy, and a cold-rerun result record, and it asks for each one by name.

What keeps this from being a checklist linter is the small executable core. The exported bundle does not just assert that an artifact hash exists. The runtime reads one public metric script, runs it over one allowed public input table, produces an output, and then checks that output against both the declared output artifact and the declared hash file. A replay row can name all the right refs and still fail here if the numbers do not reproduce. The negative-case fixtures attack exactly the gap a plausible fake would exploit: report-only success, benchmark-performance language, final-answer-only grading, undeclared hashes, and reuse of the original author's code.

The deliberately modest part is the subject matter. The two paper bundles are public synthetic examples, and the metric is a single sum over a small table. The component's value is the boundary, not the science. It does not run a real paper, call a provider, search compute without bound, or grant any launch or publishing-scope decision. It only makes a replication claim accountable enough that an independent reader can see where the evidence stops.

Telos

Research-agent demos often collapse four objects into one sentence: the paper, the runnable artifact, the grading rubric, and the evidence that an independent rerun happened. This component keeps those objects separate. A replay is admissible only when it names each evidence object and when the local runtime can check the public artifact replay without touching private paper bodies, non-public data bodies, hidden rubrics, model-output data, original-author code bodies, or launch/publishing-scope decision.

The central bet is modest and technical: before any replication claim is made, the system can force the claim into a falsifiable bundle with declared hashes, bounded metric execution, metadata-only result records, and explicit scope boundaries.

Mechanism

The mechanism row is mechanism.research_replication_rubric_artifact_replay.validates_public_research_replication_replay. It runs in src/microcosm_core/organs/research_replication_rubric_artifact_replay.py and is backed by the functions run, run_replication_bundle, validate_source_module_imports, validate_projection_protocol, validate_replication_policy, validate_research_replays, _build_result, _freshness_basis, and the constants EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, SOURCE_MODULE_MANIFEST_REF, BUNDLE_RESULT_NAME, and CARD_SCHEMA_VERSION.

The runtime has two modes:

  • Fixture mode reads fixtures/first_wave/research_replication_rubric_artifact_replay/input, includes positive replay rows plus eight negative-case fixtures, and writes first-wave result, board, validation, and sign-off result records.
  • Exported-bundle mode reads examples/research_replication_rubric_artifact_replay/exported_research_replication_bundle, validates the public runtime example, checks the source-module manifest, and writes receipts/runtime_shell/demo_project/organs/research_replication_rubric_artifact_replay/exported_research_replication_bundle_validation_result.json.

The proof object is the tuple:

  1. replication_policy.json, which states required replay fields, rubric axes, and forbidden claims.
  2. research_replays.json, which supplies two synthetic paper bundles that cite public inputs, metrics, artifact hashes, grader reports, budgets, failures, and cold-rerun result records.
  3. execution_artifacts/execution_artifact_manifest.json, which authorizes the replayable artifact relation.
  4. source_module_manifest.json, which names copied source bodies and digest obligations.
  5. Runtime result records, which expose refs, counts, digests, trace spans, and scope boundaries without embedding private bodies.

Metric-Script and Artifact Evidence

The exported bundle includes a small but real artifact-replay loop:

RolePublic artifact
Input bodyexecution_artifacts/inputs/public_synthetic_table.json
Input hashexecution_artifacts/inputs/public_synthetic_table.sha256.json
Metric scriptexecution_artifacts/metrics/public_sum_metric.json
Metric hashexecution_artifacts/metrics/public_sum_metric.sha256.json
Declared outputexecution_artifacts/artifacts/result_table.json
Declared output hashexecution_artifacts/artifacts/result_table.sha256.json

run_replication_bundle reads execution_artifacts/execution_artifact_manifest.json, executes the public_sum_metric over the allowed public input, compares the produced payload with execution_artifacts/artifacts/result_table.json, and verifies the declared hash in execution_artifacts/artifacts/result_table.sha256.json. The focused tests mutate each side of that relation, so the pass is not just a field-presence check.

Pipeline

JSON bundle authorityJSON bundle authorityReplication policyrequired fields + rubric axes+ forbidden claimsReplication policy required fields + rubric axes + forbidden claimsResearch replay rows2 synthetic paper bundlesResearch replay rows 2 synthetic paper bundlesExecution artifactsallowed input + metric spec +declared hashExecution artifacts allowed input + metric spec + declared hashLocal metric replaypublic_sum_metric overallowed inputLocal metric replay public_sum_metric over allowed inputSource-module manifest3 source pattern slices + 1exact-copy component bodySource-module manifest 3 source pattern slices + 1 exact-copy component bodyPublic execution trace2 metadata-only spansPublic execution trace 2 metadata-only spansNegative fixtures8 overclaim casesNegative fixtures 8 overclaim casesmetadata-only result recordscounts, refs, digests, scopeboundariesmetadata-only result records counts, refs, digests, scope boundariesScope limitno replication-success orpublishing-scope decisionScope limit no replication-success or publishing-scope decision

Source refs

JSON bundle authority
paper_module.research_replication_rubric_artifact_replay
Diagram source
flowchart TD bundle["JSON bundle authority paper_module.research_replication_rubric_artifact_replay"] policy["Replication policy required fields + rubric axes + forbidden claims"] replay["Research replay rows 2 synthetic paper bundles"] artifacts["Execution artifacts allowed input + metric spec + declared hash"] metric["Local metric replay public_sum_metric over allowed input"] source_manifest["Source-module manifest 3 source pattern slices + 1 exact-copy component body"] trace["Public execution trace 2 metadata-only spans"] negatives["Negative fixtures 8 overclaim cases"] result records["metadata-only result records counts, refs, digests, scope boundaries"] ceiling["Scope limit no replication-success or publishing-scope decision"] bundle --> policy policy --> replay replay --> artifacts artifacts --> metric source_manifest --> result records metric --> result records trace --> result records negatives --> result records result records --> ceiling

Evidence Contract

The policy file requires fourteen replay fields: paper_id, contribution_decomposition_ref, rubric_tree_ref, allowed_public_input_refs, scratch_repo_scaffold_ref, experiment_dag_ref, metric_script_refs, artifact_hash_refs, declared_artifact_hash_refs, grader_report_ref, cost_runtime_budget_ref, ablation_diff_ref, failure_taxonomy_ref, and cold_rerun_receipt_ref.

The policy also requires eight rubric axes: contribution decomposition, artifact replay, experiment DAG, metric script, grader alignment, budget boundary, failure taxonomy, and cold rerun. A replay row can therefore pass only as a structured evidence packet, not as a final answer or narrative report.

The exported runtime result record currently records the following evidence floor: two synthetic paper bundles, two replay rows, two artifact replay rows, two cold-rerun refs, two public execution-trace spans, four copied source modules, no findings, no error codes, source-module status pass, and input_mode: exported_research_replication_bundle. The fixture result record records all eight negative cases as observed.

Failure Modes and Guardrails

The expected negative cases are:

  • original-author code reuse
  • hidden-rubric leakage
  • report-only success
  • benchmark-performance overclaim
  • private paper or data body leakage
  • unbounded compute search
  • final-answer-only grading
  • undeclared artifact hash refs

The tests also cover source-module digest mismatch, local bundle body tamper, rehashing a swapped source module, wrong execution-artifact hashes, wrong artifact refs with matching hashes, report-only exported replays, metric perturbation, replay metric-script ref tamper, input perturbation, output body tamper, baked output swaps, and self-consistent input/output/hash rewrites. These cases make the component stronger than a field-presence linter: it rejects common ways to produce plausible but unaccountable replication prose.

Test Matrix

The focused regression file tests/test_research_replication_rubric_artifact_replay.py carries the source proof for this module.

ClassExamplesWhat it proves
Real-goodtest_research_replication_replay_observes_negative_cases, test_research_replication_exported_bundle_validates_runtime_shape, test_public_agent_execution_trace_refactor_builds_research_replay_spansThe fixture and exported bundle produce metadata-only result records, observe the required negative cases, execute the local metric replay, and build two public trace spans.
Real-badtest_research_replication_rejects_source_module_digest_mismatch, test_research_replication_rejects_bundle_local_source_module_body_tamper, test_research_replication_rejects_rehashed_source_module_body_swap, test_research_replication_rejects_metadata_only_bundleThe validator rejects broken source-module provenance, local bundle tamper, self-consistent source swaps, and metadata-only replay packets.
Perturbationtest_research_replication_rejects_wrong_execution_artifact_hash, test_research_replication_rejects_wrong_artifact_ref_with_matching_hash, test_research_replication_rejects_metric_perturbation, test_research_replication_rejects_valid_metric_script_body_swap, test_research_replication_rejects_replay_metric_script_ref_tamper, test_research_replication_rejects_replay_allowed_input_ref_tamper, test_research_replication_rejects_input_perturbation, test_research_replication_rejects_output_artifact_body_tamper, test_research_replication_rejects_output_artifact_baked_swap, test_research_replication_rejects_self_consistent_input_output_hash_rewriteMetric, input, output, hash, and replay-row mutations stay blocked even when the tampered bundle tries to preserve self-consistency.
Label forgerytest_research_replication_ignores_forged_negative_case_labels, test_research_replication_negative_case_id_follows_semantics_not_filename, test_research_replication_exported_bundle_ignores_self_declared_pass_labelsVerdicts are derived from semantic replay-row fields, not filenames, declared status labels, or expected error-code labels.
Result record economytest_research_replication_receipts_are_public_relative_and_secret_excluded, test_research_replication_bundle_card_reuses_fresh_receipt, test_research_replication_bundle_card_rejects_stale_receipt_after_input_mutationResult records remain public-relative and secret-excluded; command cards reuse fresh result records and reject stale ones after input mutation.

Realness Rungs

This module's realness is intentionally runged:

  1. Synthetic replay subjects. The two paper bundles are public synthetic examples, one ML-method replay and one computational-science replay.
  2. Real schema pressure. The required fields, rubric axes, declared hash roster, source-module manifest, and non-public-state exclusions are enforced by runtime code and focused tests.
  3. Local artifact replay. The exported bundle executes a local metric over allowed public input and compares produced output against declared artifact hashes.
  4. Source-open provenance. Three public source pattern bodies and one exact Python internal control body are copied into the bundle and digest-checked.
  5. metadata-only public result records. Result records carry counts, refs, digests, verdicts, trace spans, and scope boundaries while excluding private/live/provider material.

The rung contract matters: the component is more than generic documentation polish, but it is still not paper-replication authority.

Relation to Concepts, Principles, and Axioms

The JSON bundle binds the module to concept.research_and_science_replay_evidence_bundle. That concept is instantiated by the mechanism above and abides by AX-1, AX-6, AX-8, and AX-12 at the concept layer. The bundle's direct axiom refs are AX-1, AX-2, AX-5, and AX-7.

The bundle's principle refs are P-1, P-2, P-3, P-6, P-8, and P-15. For this component, the important principle pressure is:

  • Evidence must be structured and replayable before authority rises.
  • Result records and scope boundaries are part of the artifact, not commentary after it.
  • Projections stay below source authority; a readable paper module does not outrank the JSON bundle, mechanism row, runtime code, source-module manifest, or result records.
  • Typed refusal is part of the mechanism: benchmark, provider, public sharing, private-body, original-code, and unbounded-compute claims remain false unless another authority surface actually grants them.

The module depends on paper_module.agent_benchmark_integrity_anti_gaming_replay. Benchmark performance overclaim controls stay routed through that sibling instead of being reinvented here.

Reader Evidence Routing

Open evidence in this order:

  1. core/paper_module_capsules.json#paper_module.research_replication_rubric_artifact_replay for the source-authority bundle, scope limit, doctrine refs, generated projection statuses, and code loci.
  2. core/mechanism_sources.json#mechanism.research_replication_rubric_artifact_replay.validates_public_research_replication_replay for the validator command, exported-bundle validator command, focused regression, guardrails, input refs, result record refs, and upstream mechanisms.
  3. standards/std_microcosm_research_replication_rubric_artifact_replay.json for the first-wave standard, public/private boundary, source-body floor, and hard launch/public sharing/provider/source-file changes flags.
  4. examples/research_replication_rubric_artifact_replay/exported_research_replication_bundle/source_module_manifest.json for source-open body-floor counts and digest obligations.
  5. receipts/runtime_shell/demo_project/organs/research_replication_rubric_artifact_replay/exported_research_replication_bundle_validation_result.json for the current exported-bundle validation result.
  6. tests/test_research_replication_rubric_artifact_replay.py for negative cases, digest tamper tests, metric replay tests, public-relative result record tests, command-card economy, and source-body exclusion.

Prior Art Grounding

This replay scores a research artifact against a replication rubric. It follows artifact-evaluation practice from systems and machine-learning venues (ACM Artifact Review and Badging), which separates 'available' from 'functional' from 'reproduced'. Microcosm borrows the rubric-over-artifact shape; the result is fixture-bound replay evidence, not a reproducibility guarantee or a peer-review verdict.

Validation Result record Path

Focused runtime validation:

./repo-pytest tests/test_research_replication_rubric_artifact_replay.py -q --basetemp=/tmp/microcosm_research_replication_rubric_artifact_replay_pytest

Paper-module corpus validation:

./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

The runtime commands behind the result records are:

Scope boundary

Limitations
  • The two replay subjects are synthetic public paper bundles, not real external paper replications.
  • The metric replay is intentionally small: one public metric spec over one public input table with one declared output artifact. Its value is boundary enforcement, not benchmark substance.
  • Source-open proof is limited to three public source pattern body slices and one exact-copy public Python internal control body. It does not expose private source-root bodies, source notes, model-output data, account or browser state, browser UI state, or original-author code bodies.
  • A green run does not establish research truth, paper novelty, formal-result correctness, benchmark performance, external model service, launch-scope decision, or publishing-scope decision.
Authority Boundary

This component validates synthetic public replay metadata, local public artifact replay, source-module digest boundaries, public trace spans, negative-case coverage, and metadata-only result record shape. It does not claim actual paper replication success, benchmark performance, external model service, hidden-rubric access, original-author-code reuse, private paper/data export, unbounded compute search, final-answer-only grading, launch-scope decision, publishing-scope decision, source-file changes, product progress, or whole-system correctness.

Scope limit

This module may claim fixture-bound evidence that the component ran over public synthetic inputs and produced the result records and projections described above, reproduced by the validation result records named on this page.

It may not claim more than its bundle scope limit allows: Copied public source pattern provenance bodies, exact-copy public Python internal control body, metadata-only research-replication replay result records, public agent-execution trace spans, and fixture validation only; no actual paper replication success, benchmark performance claim, private paper/data body export, hidden-rubric export, external model access, unbounded compute search, original-author code reuse, launch-scope decision, publishing-scope decision, source-file changes, or product-progress evidence.

Source and projection details
Source-Open Body Floor

The source-module manifest at examples/research_replication_rubric_artifact_replay/exported_research_replication_bundle/source_module_manifest.json is the source-open body floor. It declares four copied modules:

  • research_replication_extracted_pattern_ledger_row_body_import, a public source pattern body slice.
  • research_replication_high_novelty_growth_receipt_body_import, a public source reconstruction result record slice.
  • research_replication_deterministic_pattern_order_body_import, a public deterministic pattern-order slice.
  • research_replication_replay_control_plane_source_body_import, an exact-copy public Python internal control body for this component.

Each row carries a source ref, target ref, material class, copied-body flag, result record-body exclusion flag, line count or byte count, and sha256 digest. The runtime verifies target digests; for the exact-copy Python row it also checks source currentness and source-target byte equality. Result records expose refs, counts, digests, and verdicts only. They do not embed source bodies.

Spatial World Model Counterfactual Simulation ReplayReplays six what-if robotics scenes to show what a spatial prediction claim is built from.4/5

Does This replay takes six made-up "what if" spatial scenes from robotics and self-driving-style settings (a forklift appears from behind an occlusion, a small pedestrian steps into a crosswalk, a gust pushes a drone off course, a shiny floor fools a robot into seeing free space, a stacked load shifts into a lane, and an oncoming car turns late) and shows each one as inspectable rows: the starting scene, the action taken, the predicted next scene, what changed between them, a sanity check, and honest notes on its limits (it is synthetic, not real-world ground truth). The rows show exactly what a spatial "world model" claim is built from, plus a checklist of dangerous claims it deliberately refuses to make.

Scope limit It validates only the declared public contract of synthetic spatial counterfactual-replay metadata rows. It is evidence for inspectable replay rows and limitation labels, not for real-world spatial accuracy, simulator-product validity, media-only authority, operational deployment, service distribution, or scope decisions.

Run
microcosm spatial-world-model-counterfactual-simulation-replay run-simulation-bundle --input examples/spatial_world_model_counterfactual_simulation_replay/exported_spatial_world_model_simulation_bundle --out receipts/runtime_shell/demo_project/organs/spatial_world_model_counterfactual_simulation_replay

EvidenceContract validatorevidence 4/5Real runtime result

research-workflowsforecasting

Source Design note · Source atlas

Paper module Spatial World Model Counterfactual Simulation Replay

Purpose

Spatial world-model demos are unusually easy to oversell. A plausible-looking video, or a row that simply asserts "the model predicted the next state correctly", can pass for understanding without anything having been checked. This component exists to answer one narrow question: does a declared spatial counterfactual row actually bind a source state, an event, and a predicted outcome that survive an independent recomputation, or is it just a shape that looks right?

The approach is the unusual part. The predicted actor count, transition delta, event label, and spawn cells are derived from the inputs (sensor-packet refs, consistency budget, topology), so a stale or hand-edited prediction no longer matches and the row blocks. The point is not a good simulator. The point is that a spatial-AI claim cannot pass on appearance alone: it has to agree with a recomputation a reader can audit in one screen.

Abstract

spatial_world_model_counterfactual_simulation_replay is a Microcosm component for checking spatial world-model counterfactual claims as metadata transitions, not as generated video, robotics control, AV simulation, geographic truth, or benchmark authority. The component validates six synthetic scene-state rows, six counterfactual replay rows, six predicted transition rows, eight forbidden-claim negative cases, and an exported source-module bundle whose result record stays metadata-only.

The technical claim is deliberately small: for each replay row, the runtime recomputes a deterministic toy gridworld next state from the declared scene state, counterfactual event, sensor-packet refs, consistency budget, topology ref, and limitation labels; it then compares that actual transition against the declared predicted state, transition diff, and oracle check. A green run proves the public replay rows are internally consistent and bounded by their scope limit. It does not establish real-world spatial accuracy, trained simulator quality, generated-video correctness, robot or AV operation, provider behavior, hosting, public sharing, launch-scope decision, or whole-system correctness.

Telos

World-model demos are easy to overstate because visual plausibility can hide whether any state transition was checked. This component makes the proof surface inspectable: a reader can see the scene-state ref, action trace, predicted-state ref, transition-diff ref, oracle-check ref, fidelity limit, limitation labels, negative cases, and source-module digest evidence before accepting any spatial counterfactual claim.

The useful result is not a better simulator. The useful result is an evidence spine that refuses to let a spatial-AI claim advance unless the public row binds input state, counterfactual event, predicted output, actual recomputation, and scope boundary boundary in one result record.

Mechanism

The positive fixture has six scene states and six matching replay rows: warehouse occlusion, crosswalk emergence, drone-corridor gust recovery, mobile robot reflective-floor detour, loading-dock pallet shift, and unprotected-turn late yield. Each row declares a source scene-state ref, action-trace ref, counterfactual event, predicted-state ref, transition-diff ref, oracle-state-check ref, two public sensor-packet refs, a rare-event label, a fidelity-limit label, limitation labels, and explicit false values for private video, raw sensor export, live operation, geography, simulator-product, generated-video-only, benchmark, and launch claims.

Runtime transition checking happens in _state_transition_analysis:

  1. The component resolves each replay to exactly one state-transition row.
  2. It builds an 8 x 8 toy gridworld from the source scene's actor count and topology ref.
  3. It maps the counterfactual event to a deterministic event action such as new_dynamic_actor.
  4. It recomputes the actual next state and transition diff from the input row.
  5. It compares predicted actor count, transition delta, event label, spawn cell or cells, predicted-state ref, diff ref, oracle-check ref, and metadata-only result record status.

The input-driven part matters. Actor-count delta is not copied from the expected fixture. It is recomputed as:

min(
  base_event_actor_count_delta
  + max(0, sensor_packet_count - max_timestep_lag - base_event_actor_count_delta),
  4,
  free_cell_count
)

Spawn cells are also input-derived: the runtime hashes the event, replay id, scene-state ref, topology ref, sensor-packet refs, consistency budget, limitation labels, and source actor count, then walks the bounded grid from the declared event cell. This makes the row sensitive to real input changes while remaining small enough to audit.

Transition Evidence

The current fixture proves a narrow but useful invariant: all six declared predicted states match the runtime's actual toy-gridworld step. The focused test expects:

  • scene_state_count == 6
  • replay_count == 6
  • state_transition_count == 6
  • predicted_state_body_count == 6
  • deterministic_simulation_pass_count == 6
  • gridworld_step_count == 6
  • predicted_actual_match_count == 6
  • transition_diff_count == 6
  • oracle_state_check_count == 6
  • sensor_packet_ref_count == 12

Those counts are technical evidence only because the runtime recomputes the state transition before accepting them. The result record cannot be read as a learned world-model score; it is a public replay consistency check over synthetic metadata and copied source-module digests.

Real-Bad Mutation Contract

The regression suite includes deliberately bad mutations that show the proof is not just shape validation:

  • If a transition row changes actor_count_delta from the recomputed value, run_simulation_bundle blocks with SPATIAL_STATE_TRANSITION_SIMULATION_MISMATCH.
  • If the predicted state misses the gridworld step, the transition row records predicted_state_actor_count_mismatch while the recomputed actual state still shows the expected gridworld execution.
  • If a replay gains an extra sensor-packet ref, the recomputed actor delta moves from 1 to 2. The stale expected transition blocks until the predicted actor count, actor delta, and spawn cells are updated to match the new actual transition.
  • If the source scene actor count and topology ref change, the recomputed source and spawn-cell state moves. The stale predicted state blocks until the transition row is updated.
  • If a source-module manifest tries to place copied body text inside a result record, the source-module summary blocks with SPATIAL_SOURCE_BODY_TEXT_IN_RECEIPT_FORBIDDEN and SPATIAL_SOURCE_MODULE_BODY_TEXT_IN_RECEIPT_FORBIDDEN.

The negative payload cases are similarly typed: private video export, raw sensor export, live robot or AV operation, real-world location claims, simulator-product claims, generated-video-only authority, geographic accuracy claims, and benchmark-score claims without state-diff result records all have explicit forbidden-code coverage.

Shape

yesnonoyesScene-state rowactor count + topologyScene-state row actor count + topologyCounterfactual replay rowevent + sensor refs + budgetCounterfactual replay row event + sensor refs + budgetDeterministic toy gridworldstep8x8 bounded recomputationDeterministic toy gridworld step 8x8 bounded recomputationActual next stateactor delta + spawn cellsActual next state actor delta + spawn cellsDeclared predicted statetransition diff + oraclecheckDeclared predicted state transition diff + oracle checkActual matches declaredtransition?Actual matches declared transition?metadata-only pass resultrecordcounts + refs + digestsmetadata-only pass result record counts + refs + digestsTyped mismatch findingblocked statusTyped mismatch finding blocked statusForbidden payload or claim?Forbidden payload or claim?
Diagram source
flowchart TD Scene["Scene-state row actor count + topology"] --> Replay["Counterfactual replay row event + sensor refs + budget"] Replay --> Step["Deterministic toy gridworld step 8x8 bounded recomputation"] Step --> Actual["Actual next state actor delta + spawn cells"] Replay --> Expected["Declared predicted state transition diff + oracle check"] Actual --> Compare{"Actual matches declared transition?"} Expected --> Compare Compare -->|yes| Result record["metadata-only pass result record counts + refs + digests"] Compare -->|no| Finding["Typed mismatch finding blocked status"] Replay --> Boundary{"Forbidden payload or claim?"} Boundary -->|no| Result record Boundary -->|yes| Finding

This diagram is a reader map for the runtime proof. The generated doctrine lattice Mermaid remains the bundle-derived edge proof.

Reader Evidence Routing

Read this page from source authority outward:

  1. Open core/paper_module_capsules.json::paper_modules[53:paper_module.spatial_world_model_counterfactual_simulation_replay] for the JSON bundle and scope limit.
  2. Open paper_modules/spatial_world_model_counterfactual_simulation_replay.json for generated relationship edges, Mermaid status, Atlas status, and source_authority: json_capsule.
  3. Inspect src/microcosm_core/organs/spatial_world_model_counterfactual_simulation_replay.py, especially _state_transition_analysis, _gridworld_step, _gridworld_actor_count_delta, _gridworld_spawn_cells, _replay_policy_findings, and _source_module_manifest_result.
  4. Inspect fixture inputs under fixtures/first_wave/spatial_world_model_counterfactual_simulation_replay/input and exported-bundle inputs under examples/spatial_world_model_counterfactual_simulation_replay/exported_spatial_world_model_simulation_bundle.
  5. Inspect tests/test_spatial_world_model_counterfactual_simulation_replay.py for the positive replay, public-relative result record, source-module import, body-text rejection, transition-delta mutation, predicted-state mutation, input-perturbation, scene-perturbation, and fresh-card reuse contracts.

Runtime Command

microcosm spatial-world-model-counterfactual-simulation-replay run-simulation-bundle --input examples/spatial_world_model_counterfactual_simulation_replay/exported_spatial_world_model_simulation_bundle --out receipts/runtime_shell/demo_project/organs/spatial_world_model_counterfactual_simulation_replay

The runtime shell also exposes the compressed lens at:

microcosm spatial-simulation

Prior Art Grounding

This replay exercises a spatial world model under counterfactual interventions. It is grounded in the world-models line of work (Ha and Schmidhuber, World Models), where an agent learns a compressed model of its environment it can roll forward under hypothetical actions. Microcosm borrows the counterfactual-rollout shape over synthetic metadata; the result is fixture-bound replay evidence, not robot or AV operation, real-world geography, or a calibrated simulator.

Validation Result record Path

Run from microcosm-substrate:

The expected bundle projection is Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges, and 20 generated relationship edges. These checks prove the public synthetic replay and source-module import boundary only; they do not validate real geography, robot or AV operation, simulator-product claims, benchmark claims, public sharing, hosting, or launch.

Scope boundary

Public Boundary

The exported bundle may include copied Station geometry source bodies as public source-open material, but result records carry refs, digests, counts, and verdicts only. They must not carry private video bodies, raw sensor payloads, GPS trace bodies, model-output data, account or browser state, account secrets, or live-access material.

The scope limit is therefore:

  • allowed: synthetic scene-state refs, action-trace refs, predicted-state refs, transition-diff refs, oracle-check refs, source-open public sensor-packet refs, rare-event labels, fidelity-limit labels, limitation labels, source-module digests, negative-case result records, and metadata-only validation result records;
  • not allowed: simulator-product authority, private video export, raw sensor export, live robot or AV operation, real-world geography claims, benchmark claims, external model access, hosting, public sharing, launch-scope decision, private-system equivalence, or whole-system correctness.
Limitations

The dynamics are toy dynamics. The 8 x 8 gridworld models actor counts and spawn cells from public metadata; it does not model perception, control, physics, sensor calibration, camera geometry, lidar, maps, vehicle dynamics, human behavior, or material truth. The synthetic events are useful because they force state-diff accounting, not because they approximate the real world.

The fixture is also finite. It covers six public replay rows, six transition rows, two sensor refs per replay, eight negative claim families, and three copied source modules. It does not establish all possible spatial counterfactuals, full secret absence outside the scanner envelope, complete robotics safety, simulator correctness, or future fixture coverage.

The source-open body floor is limited to exact copied Station geometry guardrail bodies named by the source-module manifest and verified by digest. That does not certify private source-root equivalence, private video or raw sensor availability, account or browser state, provider behavior, hidden GPS trace bodies, live-access material, or launch-scope decision.

Scope limit

This module may claim fixture-bound evidence that the component ran over public synthetic inputs and produced the result records and projections described above, reproduced by the validation result records named on this page.

It may not claim more than its bundle scope limit allows: Declared public synthetic spatial counterfactual-replay metadata and source-module import evidence only; no robot or AV operation, real-world geographic accuracy, simulator product validation, generated-video authority, benchmark claims, external model access, hosting, launch-scope decision, publishing-scope decision, or whole-system correctness.

Materials Chemistry Closed Loop Lab Safety ReplayReplays a self-driving lab loop as records, with safety gates and no real chemicals, robot, or lab.3/5

Does Takes the pattern of a "self-driving materials lab" (propose a candidate material, run safety screens, simulate an assay, then decide what to try next) and replays it locally as inspectable records: every step, its safety gate, its simulated result, and the decision that followed, plus the pre-recorded points where such a loop would fail and where it would restart. It makes the workflow's structure visible, all on a simulator-only fixture, so how the loop is wired is traceable without any real lab, real chemicals, or real robot ever being involved.

Scope limit It documents projection and replay mechanics only and excludes wetlab protocols, hazardous synthesis steps, reagent amounts, controlled/bioactive targets, robot commands, live assay data, discovery claims, benchmark claims, external model access, or any judgment of domain/chemical correctness.

Run
microcosm materials-chemistry-closed-loop-lab-safety-replay run-lab-bundle --input examples/materials_chemistry_closed_loop_lab_safety_replay/exported_materials_lab_safety_bundle --out receipts/runtime_shell/demo_project/organs/materials_chemistry_closed_loop_lab_safety_replay

EvidenceComputed projectionevidence 3/5Source-faithful refactor

research-workflowsforecasting

Source Design note · Source atlas

Paper module Materials Chemistry Closed-Loop Lab-Safety Replay

Purpose

"Closed-loop materials lab" is one of the easier phrases to overclaim. A fixture can look like an autonomous discovery loop while carrying nothing that should be spoken aloud: wetlab steps, reagent quantities, a controlled or bioactive target, robot commands, or a flat assertion that some material was discovered. This component exists to sit in front of that language and answer one question: is a closed-loop-lab-shaped fixture safe and grounded enough to be talked about at all, in a simulator-only frame, before any lab claim is allowed?

Its real name inside the runtime is the materials_chemistry_artifact_safety_refusal_validator. The public-promise name "closed-loop replay" was deliberately reframed because nothing here executes a wetlab loop or commands a robot. The unusual part is that the component does not trust the fixture's own conclusion. A normal replay would read a declared "selected candidate" label and report it. This validator instead recomputes the winner from public numbers, weighting an assay proxy, an active-learning score, and a safety gate, then treats a mismatch between that recomputed pick and the declared label as a failure rather than a footnote. A stale or flattering label cannot pass.

The second discipline is refusal as a first-class result. Eight categories of dangerous or overclaiming content each have a named forbidden code, and a fixture that smuggles one in is expected to be refused, not quietly accepted. The verdict is computed from public simulator rows, safety fields, source-module manifests, replay-graph status, negative-case coverage, and a sentinel scan, and it stays inside a simulator-only ceiling. It is a safety and refusal check, not a laboratory.

Abstract

materials_chemistry_closed_loop_lab_safety_replay is a public, simulator-only replay validator for materials-lab language. It does not claim a material discovery, a wetlab protocol, a robot loop, or a benchmark. It checks whether a closed-loop-lab shaped public fixture has enough evidence to be talked about at all: candidate material refs, safety-screen refs, simulator-only assay rows, active-learning decisions, a Lab/Evolve replay graph, source-module manifest digests, negative-case refusals, metadata-only result records, and an explicit scope limit.

The technical claim is a numeric verdict proof boundary. A passing run must recompute the selected candidate from score-backed fixture rows rather than trusting a declared label. The baseline fixture contains four candidates and selects mat_polymer_membrane_001 with score 0.917; perturbation tests prove that stale labels, missing score rows, out-of-range scores, and safety-gate failures block the verdict.

Mechanism

The runtime locus is src/microcosm_core/organs/materials_chemistry_closed_loop_lab_safety_replay.py. The relevant entrypoints are run for first-wave fixture validation and run_lab_bundle for exported-bundle validation. The validator loads a replay policy, candidate rows, experiment DAG rows, simulator assays, active-learning decisions, optional source-module manifests, and eight forbidden negative-case fixtures.

The sign-off rule is deliberately small:

  1. Positive rows must link candidates, experiments, assays, safety screens, active-learning decisions, failure taxonomy refs, and cold replay refs.
  2. Negative cases must be observed and refused.
  3. Numeric replay must recompute the selected candidate from public numbers.
  4. Source-module imports must verify copied bodies without putting bodies into result records.
  5. The safety verdict must remain inside the simulator-only scope limit.
stale label or gate failmatchyesnonumeric policy + expectedlabelnumeric policy + expected label4 candidate refs + safetygates4 candidate refs + safety gates4 public assay proxy values4 public assay proxy values4 active-learning scores4 active-learning scoresnumeric replayweighted recompute of thewinnernumeric replay weighted recompute of the winnerrecomputed pick ==declared label?safety gate >= 0.70?recomputed pick == declared label? safety gate >= 0.70?negative-case fixtures8 forbidden lab classesnegative-case fixtures 8 forbidden lab classesany forbiddenMATERIALS_*_FORBIDDENobserved?any forbidden MATERIALS_*_FORBIDDEN observed?4 copied public body modules4 copied public body modulesLab/Evolve replay graphreplay casesLab/Evolve replay graph replay casessafety verdictsafety verdictAcceptedAcceptedBlockedBlockedmetadata-only result recordscounts, digests, findingsmetadata-only result records counts, digests, findingsscope limitno wetlab / no discovery / nolaunchscope limit no wetlab / no discovery / no launch

Source refs

numeric policy + expected label
replay_policy.json
4 candidate refs + safety gates
candidate_materials.json
4 public assay proxy values
simulator_assays.json
4 active-learning scores
active_learning_decisions.json
4 copied public body modules
source_module_manifest.json
Accepted
public_safe_simulator_replay_accepted
Blocked
blocked_public_safety_boundary
Diagram source
flowchart TD policy["replay_policy.json numeric policy + expected label"] candidates["candidate_materials.json 4 candidate refs + safety gates"] assays["simulator_assays.json 4 public assay proxy values"] decisions["active_learning_decisions.json 4 active-learning scores"] numeric["numeric replay weighted recompute of the winner"] labelcheck{"recomputed pick == declared label? safety gate >= 0.70?"} negatives["negative-case fixtures 8 forbidden lab classes"] refuse{"any forbidden MATERIALS_*_FORBIDDEN observed?"} manifest["source_module_manifest.json 4 copied public body modules"] replay["Lab/Evolve replay graph replay cases"] verdict["safety verdict"] accepted["public_safe_simulator_replay_accepted"] blocked["blocked_public_safety_boundary"] result record["metadata-only result records counts, digests, findings"] ceiling["scope limit no wetlab / no discovery / no launch"] policy --> numeric candidates --> numeric assays --> numeric decisions --> numeric numeric --> labelcheck labelcheck -->|stale label or gate fail| blocked labelcheck -->|match| verdict negatives --> refuse refuse -->|yes| blocked refuse -->|no| verdict manifest --> replay replay --> verdict verdict --> accepted accepted --> result record blocked --> result record result record --> ceiling

Numeric Assay And Verdict Evidence

The replay policy declares:

  • selection rule: max_weighted_public_assay_active_learning_and_safety_gate_score
  • minimum safety gate: 0.70
  • expected selected candidate: mat_polymer_membrane_001
  • weighted score: 0.45 * public_assay_proxy_value + 0.35 * public_active_learning_score + 0.20 * public_safety_gate_score

The source fixture binds four score-backed rows:

CandidateSafety gateAssay proxyActive-learningWeighted scoreDecision / action
mat_polymer_membrane_0010.940.920.900.917decision_membrane_001 / simulate_assay
mat_solid_electrolyte_0020.910.840.810.8445decision_electrolyte_002 / update_surrogate_model
mat_catalyst_support_0030.850.780.740.780decision_support_003 / choose_next_simulation
mat_sorbent_surface_0040.880.700.660.722decision_sorbent_004 / screen_candidate

The focused regression test_materials_chemistry_numeric_replay_recomputes_verdict_from_fixture_numbers proves the pass case: status pass, verified_numeric_row_count == 4, selected candidate mat_polymer_membrane_001, selected decision decision_membrane_001, selected next action simulate_assay, score 0.917, realness rung R3, and verdict basis recomputed_from_public_assay_active_learning_and_safety_gate_fixture_numbers.

The verifier does not use expected labels for selection. Expected labels are checked only after the selected row is recomputed from candidate, assay, and decision content.

Test Matrix

ClassEvidenceExpected verdict
Real-good fixtureBaseline first-wave fixture with four candidate, assay, and decision rowspublic_safe_simulator_replay_accepted; numeric replay pass; selected candidate mat_polymer_membrane_001; score 0.917
Real-good source body floorExported bundle manifest with four copied modules and zero manifest findingssource_module_manifest_status: pass; verified_module_count: 4; result records remain metadata-only; current checked-in bundle still needs refreshed numeric rows before it is a full exported-bundle pass
Real-bad lab safetyControlled/bioactive targets, hazardous synthesis flags, mismatched safety refs, robot command, account secrets, private notebooks, or discovery claimsblocked_public_safety_boundary with the relevant MATERIALS_*_FORBIDDEN or positive-linkage finding
Real-bad numeric missingnessScore-backed rows removed while numeric policy is activeMATERIALS_NUMERIC_REPLAY_POLICY_REQUIRES_SCORE_BACKED_ROWS; verified_numeric_row_count: 0
Real-bad numeric requiredNumeric policy removed and score rows absentMATERIALS_NUMERIC_REPLAY_REQUIRED; realness rung blocked
Real-bad stale labelPolicy declares mat_catalyst_support_003 while recomputation selects mat_polymer_membrane_001MATERIALS_NUMERIC_REPLAY_EXPECTED_LABEL_STALE
Real-bad score rangeSafety, assay, or active-learning score outside [0, 1]MATERIALS_NUMERIC_REPLAY_SCORE_OUT_OF_RANGE
Perturbation, low safety gateMembrane safety gate lowered to 0.52Computed pick moves to mat_solid_electrolyte_002, verdict blocks, and findings include stale label plus MATERIALS_NUMERIC_REPLAY_SAFETY_GATE_FAILED
Perturbation, moved valid pickSorbent raised to safety 0.93, assay 0.98, active learning 0.98, and policy expectation updatedNumeric replay passes, selected candidate mat_sorbent_surface_004, selected action screen_candidate, score 0.970
Perturbation, moved pick without expectation updateExported bundle recomputes sorbent as the winner while policy still expects membraneSource manifest stays pass, but numeric replay blocks with MATERIALS_NUMERIC_REPLAY_EXPECTED_LABEL_STALE

These cases are source/test-backed by tests/test_materials_chemistry_closed_loop_lab_safety_replay.py. Fresh local first-wave result record output is the authority for current numeric replay; older archived first-wave result records and the checked-in exported bundle predate the numeric replay rows and should not be read as the numeric proof. The exported bundle still needs refreshed numeric rows before it is a full exported-bundle pass.

Evidence Routes

  • JSON bundle: core/paper_module_capsules.json::paper_module.materials_chemistry_closed_loop_lab_safety_replay
  • Generated JSON instance: paper_modules/materials_chemistry_closed_loop_lab_safety_replay.json
  • Mechanism source: core/mechanism_sources.json::mechanism.materials_chemistry_closed_loop_lab_safety_replay.validates_public_materials_lab_safety_replay
  • Runtime: src/microcosm_core/organs/materials_chemistry_closed_loop_lab_safety_replay.py
  • Domain standard: standards/std_microcosm_materials_chemistry_closed_loop_lab_safety_replay.json
  • Paper-module standard: standards/std_microcosm_paper_module.json
  • Fixture input: fixtures/first_wave/materials_chemistry_closed_loop_lab_safety_replay/input
  • Exported bundle: examples/materials_chemistry_closed_loop_lab_safety_replay/exported_materials_lab_safety_bundle
  • Focused tests: tests/test_materials_chemistry_closed_loop_lab_safety_replay.py

Prior Art Grounding

This replay exercises a closed-loop materials and chemistry lab controller with a safety gate over synthetic experiments. It is grounded in the self-driving laboratory literature, where a propose-run-measure loop is paired with safety interlocks that can refuse an unsafe experiment. Microcosm borrows the loop-plus-safety-gate shape on a simulator; the result is metadata-only simulator evidence, not a real laboratory controller, chemical-safety authority, or launch.

Validation Result record Path

Run the current runtime proof from the Microcosm root:

Inspect the exported source-body bundle. Until the exported fixture is refreshed with score-backed numeric rows, this command may return a blocked numeric verdict while still proving the manifest/body-floor boundary:

cd microcosm-substrate
PYTHONPATH=src ../repo-python -m microcosm_core.organs.materials_chemistry_closed_loop_lab_safety_replay run-lab-bundle --input examples/materials_chemistry_closed_loop_lab_safety_replay/exported_materials_lab_safety_bundle --out /tmp/microcosm_materials_chemistry_lab_safety_bundle

Run the focused regression suite:

cd microcosm-substrate
PYTHONPATH=src ../repo-pytest tests/test_materials_chemistry_closed_loop_lab_safety_replay.py -q
cd microcosm-substrate
PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

This lane intentionally does not run scripts/build_doctrine_projection.py --write; generated projections, atlas cards, and shared bundle surfaces belong to their owner lanes.

Scope boundary

Limitations

This module is a replay validator, not a laboratory. It does not synthesize materials, provide wetlab instructions, control robots, rank real compounds, validate live assay data, authorize external model access, or establish a discovery benchmark. Fixture numbers are public replay coordinates for a safety-gated contract; they are not experimental measurements.

The validator can prove local consistency across fixture rows, exported source-module manifests, replay graph records, negative-case checks, sentinel scans, numeric recomputation, and metadata-only result records. It cannot prove chemical safety, regulatory suitability, lab readiness, deployment readiness, public-site freshness, publishing-scope decision, or launch-scope decision.

Scope limit

This module may claim that Microcosm has a public, source-faithful, simulator-only replay contract that checks candidate refs, safety-screen refs, simulator-only assay rows, active-learning decisions, numeric replay, failure-taxonomy refs, cold replay refs, replay cases, source bundle hashes, copied source-module digests, negative-case result records, metadata-only result record policy, and scope limits.

It must not claim wetlab operation, material synthesis, robot control, hazardous synthesis guidance, reagent quantities, controlled or bioactive targeting, live assay data, private lab notebook export, live account secrets, external model service, material discovery, benchmark performance, safety certification, public sharing, hosting, launch-scope decision, source-file changes, or product-progress authority.

Scope limit

This module may claim fixture-bound evidence that the component ran over public synthetic inputs and produced the result records and projections described above, reproduced by the validation result records named on this page.

It may not claim more than its bundle scope limit allows: Copied public Lab/Evolve source/control/result record/standard bodies, metadata-only simulator-only fixture result records, runtime bundle result records, and artifact safety/refusal validation only; no wetlab execution, hazardous synthesis guidance, reagent quantity, controlled or bioactive target, live assay, robot command, private lab notebook, external model access, discovery claim, benchmark claims, launch-scope decision, publishing-scope decision, or product-progress evidence.

Source and projection details
Source-Open Body Floor

The exported bundle at examples/materials_chemistry_closed_loop_lab_safety_replay/exported_materials_lab_safety_bundle contains a source_module_manifest.json with four copied bodies:

Module idMaterial classRole
materials_lab_evolve_failure_replay_specimen_body_importpublic_macro_tool_bodydeterministic replay graph construction, failure classification, restart-point selection, source-bundle hashing, and result record boundaries
materials_lab_evolve_replay_graph_body_importpublic_macro_control_plane_bodyreplay graph body, restart points, source bundles, global teachings, and public claim boundary
materials_lab_evolve_receipt_body_importpublic_macro_receipt_bodyreplay result record body proving the source evidence shape without moving private material into result records
laboratory_standard_body_importpublic_standard_bodypublic laboratory standard floor for the replay

The bundle validator checks module_count: 4, verified_module_count: 4, source_module_manifest_status: pass, metadata-only result record policy, and zero source module findings. The current checked-in exported bundle is still a source-body floor, not the final numeric exported-bundle proof: run_lab_bundle requires refreshed score-backed numeric rows before it can pass as a full exported-bundle verdict. Focused tests inject those rows to prove the exported-bundle numeric path. The remaining bundle and result record refresh is tracked as outstanding work.

The validator also records the blocked source-open boundary for codex/doctrine/paper_modules/lab_oracle_evolve_pipeline.md: that source paper module cannot be imported as an exact body while raw operator-anchor language remains in scope.

Mechanistic Interpretability Circuit Attribution ReplayRecords which model features drove an answer, each tied to checkable evidence.4/5

Does This takes the workflow of "tracing which internal features inside a model drove an answer" and turns it into inspectable local records. Each row links feature ids to a machine-readable graph of connections, records the before/after results of poking those features (the causal-intervention deltas), notes how far the explanation can be trusted (its faithfulness limit), and points to where the underlying evidence lives. The records show that every interpretability claim is backed by checkable evidence, and that they deliberately hold no model weights, no raw activations, no prompts, and no hidden reasoning — they carry only refs, digests, counts, and verdicts.

Scope limit It validates only the declared public circuit-attribution runtime-result record contract. It excludes model-transparency product claims, live model access, export of private weights/raw activations/proprietary prompts/hidden chain-of-thought, external model access, benchmark claims, or public sharing/launch.

Run
microcosm mechanistic-interpretability-circuit-attribution-replay run-attribution-bundle --input examples/mechanistic_interpretability_circuit_attribution_replay/exported_circuit_attribution_bundle --out receipts/runtime_shell/demo_project/organs/mechanistic_interpretability_circuit_attribution_replay

EvidenceContract validatorevidence 4/5Real runtime result

research-workflowsforecastingprovider operations

Source Design note · Source atlas

Paper module Mechanistic Interpretability Circuit Attribution Replay

Purpose

Interpretability writing is unusually easy to overstate. A named feature can read like understanding, a graph picture can read like a discovered circuit, and a small local script can read like access to a real model. This component exists to hold one kind of claim to a smaller, checkable size. It answers a single question: before Microcosm lets a circuit-attribution story stand as public evidence, does the story survive a deterministic replay rather than being taken on trust?

The part worth noticing is how narrow the proof is, and how that narrowness is the point. The component does not attempt to interpret a trained model. It carries a tiny two-layer toy transformer with weights declared in the fixture, recomputes its forward pass, gradient attribution, and per-feature ablation, and then compares the recomputed top feature against the feature the fixture claims. A row passes only when the declared winner still matches after recomputation. Perturb the toy weights and leave the old claim in place and the row is rejected, because the recomputed answer has moved while the prose has not. That is the failure mode the component is built to catch: an interpretability statement that was once true of its inputs but no longer is.

Around that recomputation sit three further gates. Graph evidence must be machine-readable and traversable from declared sparse features to public error nodes, so a screenshot cannot stand in for a circuit. Transparency language needs a causal-intervention reference and faithfulness language needs an explicit limit, so the strongest words carry the strongest evidence requirements. Private weights, raw activations, proprietary prompts, and hidden reasoning are kept out of every result record. What the component produces is an accounting result record for a public fixture, not a transparency tool for any real model.

Abstract

mechanistic_interpretability_circuit_attribution_replay is a public Microcosm component that validates whether circuit-attribution claims are safe to represent as result record evidence. It is not a model-transparency product and does not inspect a live provider model. The component checks a fixture and exported bundle for machine-readable feature graph rows, causal-intervention references, faithfulness limits, source-module digest evidence, negative cases, and a small input-coupled toy-transformer replay.

The technical proof is deliberately modest. A replay passes only when its declared circuit-attribution story agrees with recomputed toy-transformer forward, gradient, and ablation winners; when graph evidence is traversable from public sparse features to public error nodes; when public result records omit private or raw bodies; and when the source-open body floor is backed by copied, source source modules with matching digests. A stale declared top feature is disconfirmed by perturbing the input fixture while leaving the old claim in place.

Problem Statement

Interpretability prose is easy to overclaim: a feature name can sound like transparency, a graph screenshot can sound like a circuit, and a local fixture can sound like model access. This module makes the public claim smaller and more testable. It asks: before Microcosm lets a circuit-attribution story become public evidence, can the story survive a deterministic replay membrane that checks structure, causality refs, source provenance, and explicit scope boundaries?

The answer is local and result record-scoped. Microcosm may claim public circuit-attribution replay accounting for this fixture and exported bundle. It may not claim live model internals, private weights, raw activations, proprietary prompts, hidden reasoning, provider behavior, benchmark claims, publishing-scope decision, hosting, launch-scope decision, or whole-system interpretability correctness.

The technical contribution is therefore an accounting membrane, not a new interpretability algorithm. The membrane turns an interpretability-shaped fixture into a pass/fail public result record by requiring all claim-bearing rows to cross four gates:

GateAcceptsRejects
Replay schemaFeature ids, graph rows, causal refs, sufficiency and faithfulness limits, contradiction refs, cold-replay refs, target refs, and metadata-only result record flags.Missing required fields, unverifiable feature labels, screenshot-only graph evidence, transparency claims without causal-intervention refs, and faithfulness claims without limits.
Graph traversalMachine-readable nodes and edges with a path from declared sparse features to public error nodes.Disconnected edges and decorative constant-delta edge-weight sequences.
Toy recomputationFixture-coupled forward, gradient, ablation, weight digest, and declared-winner comparison.Internal default toy specs, stale declared winners, or uncoupled cached result records.
Source/body boundaryCopied source bodies with digest, class, anchor, and metadata-only result record checks.Private weights, raw activations, proprietary prompt bodies, hidden reasoning, model-output data, body text in result records, and launch-scope decision.

Technical Mechanism

JSON bundleJSON bundleFixture / exported bundlefeature catalog, replay rows,toy-transformer specFixture / exported bundle feature catalog, replay rows, toy-transformer specPolicy gatesrequired fields, forbiddenprivate/raw exports,faithfulness limitsPolicy gates required fields, forbidden private/raw exports, faithfulness limitsGraph analyzerfeature ids -> edges ->public error nodesGraph analyzer feature ids -> edges -> public error nodesToy-transformer replayforward + gradient + ablationrecomputationToy-transformer replay forward + gradient + ablation recomputationSource-open body floorcopied source bodies + digestchecksSource-open body floor copied source bodies + digest checksmetadata-only result recordsrefs, digests, counts,verdictsmetadata-only result records refs, digests, counts, verdictsScope limitpublic replay accounting onlyScope limit public replay accounting only

Source refs

JSON bundle
paper_module.mechanistic_interpretability_circuit_attribution_replay
Diagram source
flowchart TD Bundle["JSON bundle paper_module.mechanistic_interpretability_circuit_attribution_replay"] Fixture["Fixture / exported bundle feature catalog, replay rows, toy-transformer spec"] Policy["Policy gates required fields, forbidden private/raw exports, faithfulness limits"] Graph["Graph analyzer feature ids -> edges -> public error nodes"] Toy["Toy-transformer replay forward + gradient + ablation recomputation"] Source["Source-open body floor copied source bodies + digest checks"] Result records["metadata-only result records refs, digests, counts, verdicts"] Ceiling["Scope limit public replay accounting only"] Bundle --> Fixture Fixture --> Policy Fixture --> Graph Fixture --> Toy Fixture --> Source Policy --> Result records Graph --> Result records Toy --> Result records Source --> Result records Result records --> Ceiling

The component has four coupled checks:

  1. Replay policy validation: each positive row must carry toy prompt refs, sparse feature ids, machine-readable graph nodes and edges, replacement-model approximation scores, causal inhibition and injection refs, causal-intervention result record refs, sufficiency labels, faithfulness limits, contradiction-case refs, cold-replay refs, target refs, and body_in_receipt: false.
  2. Graph analysis: _graph_analysis_for_replay verifies that graph edges resolve to declared nodes and that at least one path exists from the row's sparse feature ids to a public error node. _weight_sequence_analysis rejects simple decorative arithmetic edge-weight sequences across replay rows.
  3. Toy-transformer replay: _toy_transformer_attribution_runtime recomputes a pure-Python two-layer toy transformer from fixture-provided token_ids, embeddings, layer1, layer2, and target_logit_index, then compares the recomputed top attribution and ablation features against declared winners.
  4. Source/body boundary: _source_module_manifest_result, _source_open_body_import_summary, scan_paths, _write_receipts, and result_card verify copied source bodies while keeping result record payloads metadata-only and public-safe.

Implementation Contract

Runtime locusRole in the mechanismEvidence surface
runFirst-wave fixture validator. It loads the public input directory, negative cases, source-module manifest, secret-exclusion policy, and sign-off output.tests/test_mechanistic_interpretability_circuit_attribution_replay.py::test_mechanistic_interpretability_circuit_attribution_replay_observes_negative_cases
run_attribution_bundleExported-bundle validator for the runtime-shell and public demo path. It uses the same replay gates without requiring first-wave negative-case files.test_mechanistic_interpretability_exported_bundle_validates_runtime_shape
_replay_policy_findingsRow-level policy checker for required fields and forbidden interpretability overclaims.Negative fixtures in fixtures/.../input/* and EXPECTED_NEGATIVE_CASES
_graph_analysis_for_replay / _weight_sequence_analysisCircuit-graph shape checks: resolvable nodes/edges, feature-to-error paths, and non-decorative weights.test_mechanistic_interpretability_rejects_disconnected_graph_edges and test_mechanistic_interpretability_rejects_decorative_weight_sequences
_toy_transformer_attribution_runtimePure-Python recomputation harness for target logit, attribution scores, ablation deltas, declared winners, and fixture digest.Toy runtime, stale-claim, perturbation, and cache-reuse tests
_source_module_manifest_result / _source_open_body_import_summarySource-open body floor: copied source body checks with digest, class, anchor, and metadata-only result record constraints.Source-module exact-import and body-text rejection tests
_write_receipts / result_cardPublic output membrane. Result records and cards carry refs, digests, counts, omitted-payload flags, and scope limits rather than source bodies or private state.Result record-boundary and card-reuse tests

Toy-Transformer Attribution Mechanism

The toy-transformer runtime is intentionally small enough to audit. The fixture in fixtures/first_wave/mechanistic_interpretability_circuit_attribution_replay/input/attribution_replays.json declares:

  • token_ids: [0, 1, 2]
  • a three-row embedding table over two dimensions
  • a two-by-three first layer
  • a three-by-two second layer
  • target_logit_index: 1
  • expected top feature by attribution and ablation: toy_hidden_feature_1

The runtime computes token embeddings, averages them into a context vector, applies the first layer, applies a tanh hidden activation, applies the second layer, and reads the target logit. It then computes activation-gradient scores for each hidden feature, using the analytic tanh derivative 1 - h^2 so the attribution score is grounded in the same forward pass rather than a separate estimate. It also ablates each hidden feature in turn, zeroing it and re-reading the target logit, to measure the output delta that feature is responsible for. The fixture currently produces target logit 0.044176; both the gradient attribution and the ablation delta select toy_hidden_feature_1, and the row passes only because those two independent paths agree with each other and with the fixture's declaration.

The important point is not that this is a serious transformer. It is a deterministic proof harness for the public replay claim. The result record can say the declared top feature agrees with recomputation only because the verifier recomputes from input fields and compares the result. The result record also records a weight digest so cached or exported bundle cards can prove which fixture basis they are coupled to.

Discriminating Tests

The proof is strongest where it distinguishes a real coupling from a plausible but stale story. The focused tests exercise those distinctions directly:

TestFixture moveExpected verdictWhy it matters
test_mechanistic_interpretability_toy_transformer_input_perturbation_moves_verdictChanges layer2[0][1] to -0.5 and updates declared winners to toy_hidden_feature_0.Passes with target logit -0.116939; both attribution and ablation move to toy_hidden_feature_0.The result record follows changed input when declaration and recomputation remain coupled.
test_mechanistic_interpretability_input_perturbation_rejects_stale_claimsApplies the same perturbation but leaves declared winners at toy_hidden_feature_1.Blocks with INTERPRETABILITY_TOY_TRANSFORMER_DECLARED_TOP_FEATURE_MISMATCH.The verifier disconfirms stale interpretability claims instead of trusting old fixture prose.
test_mechanistic_interpretability_rejects_internal_default_toy_runtimeRemoves toy_transformer_runtime from the exported bundle.Blocks with INTERPRETABILITY_TOY_TRANSFORMER_FIXTURE_SPEC_REQUIRED.The public proof must be input-coupled, not backed by an internal default.
test_mechanistic_interpretability_bundle_card_rejects_uncoupled_cached_receiptEdits a cached result record so input_coupled_fixture and input_coupled_verdict are false.The command-card path is a freshness optimization, not permission to reuse uncoupled evidence.
test_mechanistic_interpretability_rejects_decorative_weight_sequencesRewrites graph-edge weights into simple arithmetic sequences.Blocks as suspected decorative graph evidence.Machine-readable graph rows still need anti-fabrication checks.
test_mechanistic_interpretability_rejects_disconnected_graph_edgesBreaks an edge path to a declared public error node.Blocks with zero path count for the affected row.A circuit-shaped graph must be traversable, not merely present.
test_mechanistic_interpretability_source_modules_reject_body_text_in_receiptMarks source body text as present in result record material.Blocks the source/body import.Source-open evidence remains metadata-only at result record boundaries.

Evidence Contract

Evidence classLocal authorityWhat it provesWhat it does not establish
Bundle bindingcore/paper_module_capsules.json row 52The paper module, component, mechanism, source locus, and generated projection statuses are linked.Markdown is not promoted to source authority.
Replay rowsfixtures/.../input/attribution_replays.json and exported bundle mirrorSix public replay rows with feature ids, graph edges, causal refs, faithfulness limits, contradiction refs, cold replay refs, and metadata-only target refs.The refs are fixture/accounting evidence, not live model internals.
Feature catalogfixtures/.../input/feature_catalog.jsonSix public sparse-feature summary ids with labels and no private weights or activation dumps.It does not disclose trained-model features or raw activations.
Toy runtime_toy_transformer_attribution_runtime and focused testsForward, gradient, ablation, digest, and stale-declaration checks are recomputed from the input fixture.The toy runtime is not a general interpretability method.
Graph analysis_graph_analysis_for_replay and _weight_sequence_analysisGraph rows are machine-readable, traversable, and not decorative constant-delta weight sequences.It does not validate a real neural circuit.
Source-open body floorsource_module_manifest.json plus source_modules/Eleven copied source bodies have digest/anchor/material-class checks.Bodies are not copied into result records and do not authorize private/live export.
Result record setreceipts/first_wave/..., result records/sign-off/..., runtime-shell lensPublic outputs carry refs, digests, counts, verdicts, omitted-payload flags, and scope limits.Result records do not publish private model data or launch-scope decision.

Reader Evidence Routing

The proof consumer for this reader slice is the focused interpretability replay suite plus the paper-module corpus parity check. The table below is the route a rank/projection reader should follow before trusting any claim in this module:

Reader questionSource surfaceFocused proof consumerScope limit
Is this module bound to a real component and mechanism?core/paper_module_capsules.json::paper_module.mechanistic_interpretability_circuit_attribution_replay and paper_modules/mechanistic_interpretability_circuit_attribution_replay.jsonscripts/build_doctrine_projection.py --check-paper-module-corpus
Does the replay recompute the attribution claim?_toy_transformer_attribution_runtime over fixture-provided token_ids, weights, and target_logit_indextest_mechanistic_interpretability_toy_transformer_runtime_computes_attribution, perturbation, and stale-claim testsProves fixture-local recomputation, not a general interpretability method.
Are graph rows actual circuit evidence rather than screenshots?_graph_analysis_for_replay and _weight_sequence_analysis over declared graph nodes, edges, and public error nodesdisconnected-graph and decorative-weight regression testsProves machine-readable traversability and anti-decoration checks, not a real neural circuit.
Do source-open bodies stay out of result records?source_module_manifest.json, copied source_modules/, _source_module_manifest_result, and _write_receiptssource-module exact-import and body-text-in-result record rejection testsProves copied body floor and metadata-only result records, not private/live export authority.
Where does a reader start when projections disagree?source record, generated JSON instance, runtime source, focused tests, then result recordscorpus check and focused pytest together

Failure Modes And Limitations

  • Missing required replay fields block with INTERPRETABILITY_REPLAY_FIELD_REQUIRED.
  • Feature names without catalog-backed ids block with INTERPRETABILITY_FEATURE_NAME_UNVERIFIABLE.
  • Graph screenshots or disconnected graph rows block because machine-readable edges and traversable paths are required.
  • Transparency language without a causal-intervention result record blocks with INTERPRETABILITY_INTERVENTION_RECEIPT_REQUIRED.
  • Faithfulness language without explicit limits blocks with INTERPRETABILITY_FAITHFULNESS_REQUIRES_LIMITS.
  • Private model weights, raw activation dumps, proprietary prompt exports, hidden chain-of-thought exports, model-output data bodies, and launch-scope decision are forbidden public outputs.
  • Decorative graph-weight sequences block as suspected fabrication.
  • Stale declared toy-transformer winners block when recomputation selects a different top feature.
  • The proof is fixture-local. It verifies a public replay membrane and copied source evidence; it does not certify real-world model faithfulness.

Relation To Interpretability Literature

The module borrows its accounting shape from the transformer-circuits and mechanistic-interpretability tradition: circuits should be graph-structured, features should be identifiable, causal language should be backed by interventions, and faithfulness language should be bounded. Useful prior-art anchors include Anthropic's transformer-circuits framing, causal scrubbing, and SAE/sparse-feature circuit work.

Microcosm does not reproduce those methods. The local contribution is a public replay boundary around an interpretability-shaped claim: machine-readable edges instead of screenshots, causal-intervention refs instead of bare transparency language, fixture recomputation instead of stale row trust, and explicit scope boundaries before a claim becomes public evidence.

Relation To Microcosm Concepts, Mechanisms, And Principles

The bundle binds this module to:

  • concept.research_and_science_replay_evidence_bundle
  • mechanism.mechanistic_interpretability_circuit_attribution_replay.validates_public_mechanistic_interpretability_circuit_attribution_replay
  • principles P-2, P-4, P-8, and P-9
  • axioms AX-3, AX-5, AX-7, and AX-8

The practical reading is:

  • P-2: claim language stays below the strength of the checker.
  • P-4: public proof routes through result records and explicit evidence refs.
  • P-8: failed preconditions are typed refusals, not vague warnings.
  • P-9: provenance crosses from fixture, source source, and result record without upgrading authority.
  • AX-3: dereferenced proof and policy refs matter more than prose labels.
  • AX-5: status fails closed across all required parts.
  • AX-7: partial computation returns a typed refusal.
  • AX-8: public fixture and copied-source labels propagate without becoming private model access.

Named Proof Consumers

Run from microcosm-substrate:

This consumes the first-wave fixture, negative cases, source-module mirror, secret scan, toy-transformer replay, and result record writer.

PYTHONPATH=src ../repo-python -m microcosm_core.organs.mechanistic_interpretability_circuit_attribution_replay run-attribution-bundle \
  --input examples/mechanistic_interpretability_circuit_attribution_replay/exported_circuit_attribution_bundle \
  --out /tmp/microcosm-mechanistic-interpretability-circuit-attribution-replay/bundle \
  --card

This consumes the exported circuit-attribution bundle, copied body floor, digest checks, metadata-only result records, command-card omission contract, and runtime-shell validation shape.

PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_mechanistic_interpretability_circuit_attribution_replay.py -q
PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

The focused regression pins recomputation, stale-row rejection, graph and source-body gates, card result record reuse, and body-text exclusions.

Reader Route

A cold reader should inspect in this order:

  1. core/paper_module_capsules.json row 52 for authority and projection binding.
  2. paper_modules/mechanistic_interpretability_circuit_attribution_replay.json for generated relationship edges.
  3. src/microcosm_core/organs/mechanistic_interpretability_circuit_attribution_replay.py for runtime logic.
  4. tests/test_mechanistic_interpretability_circuit_attribution_replay.py for the stale-row, perturbation, graph, source-body, and result record-boundary proof.
  5. fixtures/first_wave/mechanistic_interpretability_circuit_attribution_replay/input for the fixture.
  6. examples/mechanistic_interpretability_circuit_attribution_replay/exported_circuit_attribution_bundle for the public bundle.
  7. receipts/first_wave/mechanistic_interpretability_circuit_attribution_replay and receipts/runtime_shell/public_mechanistic_interpretability_circuit_attribution_replay_lens.json for metadata-only public result record evidence.

Prior Art Grounding

This replay exercises a circuit-attribution pass that traces which internal components account for a behaviour. It is grounded in mechanistic interpretability, the study of the internal circuits of neural networks (Anthropic, Transformer Circuits). Microcosm borrows the attribution-replay shape over synthetic fixtures; the result is fixture-bound runtime evidence, not live model access, a transparency product, or a correctness claim about any real model.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

PYTHONPATH=src python3 -m pytest tests/test_mechanistic_interpretability_circuit_attribution_replay.py -q
PYTHONPATH=src python3 scripts/build_doctrine_projection.py --check-paper-module-corpus

These are reader-verifiable evidence only and do not include launch operations, external model access, source-file changes, or whole-system correctness.

Scope boundary

Authority And Evidence Boundary
  • Source authority: core/paper_module_capsules.json::paper_modules[52:paper_module.mechanistic_interpretability_circuit_attribution_replay] with source_authority: json_capsule.
  • Generated instance: paper_modules/mechanistic_interpretability_circuit_attribution_replay.json.
  • Runtime: src/microcosm_core/organs/mechanistic_interpretability_circuit_attribution_replay.py.
  • Focused tests: tests/test_mechanistic_interpretability_circuit_attribution_replay.py.
  • Governing standard: standards/std_microcosm_mechanistic_interpretability_circuit_attribution_replay.json.

This Markdown is a human-readable paper projection. The bundle JSON binds the component, mechanism, source locus, generated Mermaid status available_from_capsule_edges, and Atlas status linked_from_capsule_edges. The runtime, fixtures, tests, result records, and manifests are the technical evidence for the claims below.

Scope limit

This module may claim:

  • public, cold-replayable circuit-attribution accounting for the named fixture and exported bundle;
  • feature ids tied to machine-readable graph edges and traversable public error-node paths;
  • causal-intervention result record refs and faithfulness-limit refs are required before transparency or faithfulness language passes;
  • the toy-transformer declaration is input-coupled to recomputed forward, gradient, and ablation evidence;
  • stale toy-transformer declarations are rejected by focused tests;
  • copied source source bodies are verified by manifest and digest checks while result records remain metadata-only.

It may not claim:

  • live model access or external model access;
  • private weights, raw activation tensors/dumps, proprietary prompts, hidden chain-of-thought, hidden reasoning, or model-output data export;
  • real model-transparency product status;
  • benchmark claims authority;
  • public sharing, hosted-product readiness, launch-scope decision, or recipient-send authority;
  • whole-system interpretability correctness.
Source and projection details
Source-Open Body Floor

The source-open body floor is declared in:

  • examples/mechanistic_interpretability_circuit_attribution_replay/exported_circuit_attribution_bundle/source_module_manifest.json
  • fixtures/first_wave/mechanistic_interpretability_circuit_attribution_replay/input/source_module_manifest.json

The manifest covers copied source bodies: Oracle attribution maps, pattern-ledger rows, high-novelty scout records, component projection IR, projection readiness code, mission transaction preflight code, execution trace code, strict JSON code, and trace/readiness standards. The runtime verifies classification, material class, body-copied status, body-not-in-result record status, target digest, source/target digest agreement, line count when the source is available, and required anchors.

The body floor excludes private model weights, raw activations, proprietary prompts, hidden reasoning, model-output data, account or browser state, browser or HUD state, account secret material, private source-root material, public sharing, hosting, and launch-scope decision.

Prediction Oracle ReconciliationReplays a forecast against the discipline a careful predictor would have to defend.3/5

Does Runs a made-up forecasting case through the discipline a careful predictor would have to defend: which way a fork was called and why the losing side was ruled out, whether each prediction stayed inside the pre-declared list of allowed outcomes, that no "after the fact" evidence got used as if it were known in advance, how the guesses compared to a synthetic "what actually happened" result, and that any edits to the running record were small, allowed changes rather than rewrites. The reasoning is laid out as inspectable records rather than a single handed-down verdict. Everything is invented test data — it makes no real forecast and claims no track record.

Scope limit It exercises projection mechanics on a synthetic, invented packet only. It does not establish forecasting correctness or accuracy, give trading/financial/investment-related actions, call live market data or providers, publish predictions, claim any performance or track record, import non-public data, or include launch operations.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.prediction_oracle_reconciliation run --input fixtures/first_wave/prediction_oracle_reconciliation/input --out receipts/first_wave/prediction_oracle_reconciliation

EvidenceComputed projectionevidence 3/5Source-faithful refactor

research-workflowsforecastingprovider operations

Source Design note · Source atlas

Paper module Prediction Oracle Reconciliation

prediction_oracle_reconciliation is a source-available runtime fixture component for the prediction-engine slice. It compresses the source pattern group around CP1 bifurcation resolution, CP2 valid target universes, oracle grounding firewalls, diff grading, and dossier mutation into a synthetic packet a cold reader can run.

It is deliberately not a market product. The component has no live data, no external model access, no trading authority, no financial or investment-related actions authority, no publishing-scope decision, and no launch-scope decision. Its job is to make the reasoning shape inspectable without making performance or action claims. The result record contract is source-open by default: public fixture packets, exported bundle refs, source refs, and runtime result records carry the evidence, while secret_exclusion_scan blocks only live market feeds, model-output data bodies, account or browser material, private dossiers, and account secret-equivalent access.

Purpose

A forecast that gets the direction right can still be badly wrong about the number, and a forecast can look accurate only because it quietly used evidence that arrived after the outcome it was meant to predict. This component exists to make those two failures visible on a synthetic packet, before any reasoning is dressed up as a track record. The single question it answers is narrow: does this prediction packet keep its evidence honest and its grading recomputable, or does it cut a corner?

The unusual choice is that the component does not trust the numbers the packet reports. For every numeric row it recomputes the absolute error, the percent error, and the direction hit from the snapshot, predicted, and realized prices, then rejects any claimed value that contradicts the recompute. It also surfaces a direction hit that is still a large numeric miss rather than letting the correct arrow hide the size of the error. Evidence is split at the prediction time: a reference that points past the target window is refused, not silently scored.

None of this is forecasting. There is no live market data, no external model access, no trading or investment-related actions, and no performance claim. The packet, its target universe, and its realized values are invented fixtures. A direction hit or a numeric miss inside a result record is a statement about the fixture and the grading mechanics, nothing more.

Public Contract

The input packet names:

  • source_pattern_ids for the source pattern family being projected.
  • valid_prediction_targets and target_universe for the CP2 gate.
  • cp1_branches with selected side, rationale refs, and opposite-side invalidation refs.
  • cp2_predictions with pre-target evidence refs and grounding ids.
  • oracle_diff rows that grade synthetic realized direction against prediction.
  • dossier_mutations constrained to fixture deltas.
  • public_runtime_refs for the public fixture, exported bundle, and paper module system refs.
  • authority_ceiling values that explicitly keep trading, advice, provider, live-market, public sharing, launch, and secret-export authority false.

How it works

validate_reconciliation_packet runs five checks over the packet and folds the findings into one status. Each check guards a specific way a forecast can flatter itself.

CP1 resolution. Every cp1_branches row must name the side it chose, carry rationale refs, and keep an opposite_side_invalidation_ref, the record of why the losing side lost. A branch that asserts a winner without retaining the discarded alternative is rejected as an unresolved bifurcation. Equity or market-lane branches additionally need an explicit confirmation bit before they count.

CP2 universe and pre-target evidence. Predictions must name a target_id inside the declared valid_prediction_targets, so the set of things being predicted is fixed before the outcome rather than chosen afterwards. Evidence refs must be pre-target: a ref is accepted only if it carries the T- time prefix, and a reference that points past the target window raises PREDICTION_ORACLE_POST_T_EVIDENCE_FORBIDDEN. This is the gate that stops a packet from grading itself with hindsight.

Recomputed numeric grading. This is the part that does real arithmetic. For each graded row the component takes the snapshot, predicted, and realized prices and recomputes the absolute delta, the percent delta against the snapshot, and the direction hit. If the row also reports its own abs_error, pred_error_pct, or direction_hit, the claimed value must match the recompute or the row is rejected. Two further rules matter. A row whose direction is correct but whose error clears the floor (ten in absolute terms, or five percent) is surfaced as a large miss, so a right arrow cannot conceal a large numeric error. A row with no realized price is not fabricated into a graded row, a row marked degraded is gated out of grading rather than scored, and the STOCK and ETF asset classes are kept as separate counts rather than blended.

Oracle diff and bounded mutation. The oracle_diff rows grade synthetic realized direction against each prediction, and dossier_mutations may only add a contradiction, revise a confidence band, or retire a claim. A high-severity mutation needs two evidence refs and an explicit public-delta allowlist before it is allowed.

A run passes only when at least two CP1 branches, two CP2 predictions, two graded numeric rows across both asset classes, and one bounded mutation are present, the recompute and evidence gates raise no findings, the source-module digests match, and the secret scan is clean. The result record records counts, verdicts, and authority booleans; the packet body, claimed numbers, and source bodies stay out of it.

Shape

Synthetic prediction packettarget universe, CP1branches,CP2 predictions, oracle diff,numeric rows, dossiermutationsSynthetic prediction packet target universe, CP1 branches, CP2 predictions, oracle diff, numeric rows, dossier mutationsCP1 resolutionchosen side + rationale +why the opposite side lost;equity lane needsconfirmationCP1 resolution chosen side + rationale + why the opposite side lost; equity lane needs confirmationCP2 universe + evidencetarget inside declareduniverse;evidence must be pre-target(T-)CP2 universe + evidence target inside declared universe; evidence must be pre-target (T-)Recomputed numeric gradingabs error, percent error,direction hit recomputed;claimed values must matchRecomputed numeric grading abs error, percent error, direction hit recomputed; claimed values must matchOracle diff + mutationrealized vs predicteddirection;bounded dossier deltasOracle diff + mutation realized vs predicted direction; bounded dossier deltasDirection-right, numeric-misssurfaced, not hiddenDirection-right, numeric-miss surfaced, not hiddenDegraded / missing-truth rowsgated, not fabricatedDegraded / missing-truth rows gated, not fabricatedmetadata-only result recordsresult, board, validation,sign-off; counts and verdictsmetadata-only result records result, board, validation, sign-off; counts and verdictsScope limitsynthetic fixture only;no trading, advice, provider,live market, publish, launchScope limit synthetic fixture only; no trading, advice, provider, live market, publish, launch
Diagram source
flowchart TD Packet["Synthetic prediction packet target universe, CP1 branches, CP2 predictions, oracle diff, numeric rows, dossier mutations"] CP1["CP1 resolution chosen side + rationale + why the opposite side lost; equity lane needs confirmation"] CP2["CP2 universe + evidence target inside declared universe; evidence must be pre-target (T-)"] Numeric["Recomputed numeric grading abs error, percent error, direction hit recomputed; claimed values must match"] Oracle["Oracle diff + mutation realized vs predicted direction; bounded dossier deltas"] LargeMiss["Direction-right, numeric-miss surfaced, not hidden"] Gated["Degraded / missing-truth rows gated, not fabricated"] Result records["metadata-only result records result, board, validation, sign-off; counts and verdicts"] Ceiling["Scope limit synthetic fixture only; no trading, advice, provider, live market, publish, launch"] Packet --> CP1 Packet --> CP2 Packet --> Numeric Packet --> Oracle Numeric --> LargeMiss Numeric --> Gated CP1 --> Result records CP2 --> Result records LargeMiss --> Result records Gated --> Result records Oracle --> Result records Result records --> Ceiling

Evidence/accounting:

  • Bundle authority: core/paper_module_capsules.json::paper_modules[54:paper_module.prediction_oracle_reconciliation] sets source_authority: json_capsule, binds the component, binds mechanism.prediction_oracle_reconciliation.validates_public_prediction_oracle_reconciliation, and resolves src/microcosm_core/organs/prediction_oracle_reconciliation.py.
  • Generated instance: paper_modules/prediction_oracle_reconciliation.json reports paper_module_payload.source_authority: json_capsule, Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges, 15 relationship edges, and no unpopulated selective relations.
  • Runtime and fixture floor: src/microcosm_core/organs/prediction_oracle_reconciliation.py exposes run, run_prediction_bundle, validate_source_module_imports, validate_reconciliation_packet, _source_open_body_import_summary, write_receipts, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING. fixtures/first_wave/prediction_oracle_reconciliation/input/reconciliation_packet.json carries the synthetic CP1/CP2, oracle-diff, target-universe, and dossier-mutation evidence shape.
  • Exported bundle and result records: examples/prediction_oracle_reconciliation/exported_prediction_oracle_bundle/source_module_manifest.json and the exported source artifacts provide source-open replay evidence. receipts/first_wave/prediction_oracle_reconciliation/prediction_oracle_reconciliation_result.json, prediction_oracle_validation_receipt.json, and result records/sign-off/first_wave/prediction_oracle_reconciliation_fixture_acceptance.json keep the result record metadata-only and fixture-bounded.
  • Test and claim boundary: tests/test_prediction_oracle_reconciliation.py checks invalid target universes, unresolved CP1 branches, post-target evidence, unsafe dossier mutation, live-market/trading/advice overclaims, exported-bundle validation, and source-module digest gates. The structured source record scope limit excludes forecasting correctness, financial decisions, trading authority, live market data, external model access, prediction public sharing, performance track record, non-public data import, launch-scope decision, publishing-scope decision, and whole-system correctness.

Reader Evidence Routing

Open this module as a reader map, not as prediction evidence. Use the runtime fixture input for packet shape, the exported bundle for source-open replay, the structured source record for relationship edges, and the test file for the negative cases that enforce the scope limit.

Route evidence in this order:

  1. Read the structured lattice bindings section to confirm the source record path and subject edges.
  2. Inspect the fixture input for declared target universes, CP1 branches, CP2 prediction evidence, oracle-diff rows, and fixture-bounded dossier mutations.
  3. Run the fixture and exported-bundle commands to produce metadata-only result records.
  4. Check tests/test_prediction_oracle_reconciliation.py for the negative cases that reject target-universe escapes, unresolved CP1 branches, post-target evidence, live-market overclaims, and authority overclaims.
  5. Use paper_modules/prediction_oracle_reconciliation.json as the generated relationship graph for this module.

Negative Cases

The fixture rejects:

  • a CP2 prediction outside the target universe;
  • an unresolved CP1 bifurcation;
  • post-target evidence used as prediction evidence;
  • unconfirmed equity or market-lane claims;
  • unsafe high-severity dossier mutation;
  • trading, advice, live-provider, public sharing, launch, or secret-export authority overclaims.

Prior Art Grounding

This component is grounded in probabilistic forecast evaluation and prediction market infrastructure. The Brier score is an early probability-forecast verification anchor, proper-scoring-rule work such as Gneiting and Raftery motivates incentive-compatible forecast scoring, and Hanson's logarithmic market scoring rule grounds the prediction-market idea that forecasts can be updated and evaluated through explicit scoring mechanisms. Forecasting tournament work around tracking and calibration also motivates separating prediction evidence from post-outcome explanation.

Microcosm borrows the reconciliation pattern: declare the target universe before the outcome, keep pre-target evidence separate from post-target evidence, grade against a synthetic oracle diff, and constrain dossier mutation to declared fixture deltas. It does not trade, advise, publish predictions, or claim forecast performance.

Commands

PYTHONPATH=src python3 -m microcosm_core.organs.prediction_oracle_reconciliation run \
  --input fixtures/first_wave/prediction_oracle_reconciliation/input \
  --out receipts/first_wave/prediction_oracle_reconciliation

PYTHONPATH=src python3 -m microcosm_core.organs.prediction_oracle_reconciliation run-prediction-bundle \
  --input examples/prediction_oracle_reconciliation/exported_prediction_oracle_bundle \
  --out receipts/runtime_shell/demo_project/organs/prediction_oracle_reconciliation

Validation Result record Path

Run from microcosm-substrate:

PYTHONPATH=src ../repo-python -m microcosm_core.organs.prediction_oracle_reconciliation run \
  --input fixtures/first_wave/prediction_oracle_reconciliation/input \
  --out /tmp/microcosm-prediction-oracle-reconciliation/fixture \
  --card
PYTHONPATH=src ../repo-python -m microcosm_core.organs.prediction_oracle_reconciliation run-prediction-bundle \
  --input examples/prediction_oracle_reconciliation/exported_prediction_oracle_bundle \
  --out /tmp/microcosm-prediction-oracle-reconciliation/bundle \
  --card
PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_prediction_oracle_reconciliation.py -q
PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

A passing run proves only synthetic target-universe reconciliation, CP1/CP2 accounting, oracle-diff grading, and fixture-bounded dossier mutation; it does not establish forecasting performance, financial decisions, trading authority, live market access, public sharing, or launch.

Scope boundary

Scope limit

This module covers only fixture-bounded prediction-oracle reconciliation: synthetic target-universe accounting, CP1/CP2 separation, oracle-diff grading, dossier mutation constraints, copied source-module import evidence, negative cases, and public result records. They do not prove forecasting accuracy, financial decisions, trading authority, live-market access, provider behavior, prediction public sharing, performance track record, private-data import, launch-scope decision, publishing-scope decision, or whole-system correctness.

Limitations

The target universe, CP1 branches, CP2 evidence, realized values, oracle diff, and dossier mutations are fixture artifacts. They exercise the shape of a reconciliation pipeline, but they are not live market data, a validated forecasting track record, an investment strategy, or a prediction public sharing surface. A direction hit or numeric miss inside the result record is evidence about the synthetic packet only.

The exported bundle is source-open in the narrow body-floor sense. It digest checks copied source contracts, node manifests, tool code, pattern rows, and route-decision artifacts while keeping body text out of result records. That does not certify private source-root equivalence, provider behavior, account or session state, hidden market feeds, private dossiers, or launch-scope decision.

The negative cases are scoped regression guards. They reject invalid targets, unresolved bifurcations, post-target evidence, unconfirmed equity-lane claims, unsafe dossier mutation, trading/advice overclaims, degraded feed misuse, missing realized numeric truth, and asset-class mixing. Those refusals do not prove full financial safety, whole-system correctness, runtime correctness outside the named component, or complete secret absence beyond the declared scanner envelope.

Scope limit

Synthetic invented prediction packet and source-module import evidence only; no forecasting correctness or accuracy, no trading, financial, or investment-related actions, no live market data, no external model access, no prediction public sharing, no performance track record, no non-public data import, no launch-scope decision, no publishing-scope decision, and no whole-system correctness.

Scope boundary

This module demonstrates synthetic prediction-reconciliation mechanics only. It does not trade, give financial or investment-related actions, call live market providers, publish predictions, claim forecasting performance, import non-public data, or include launch operations.

Source and projection details
Governing Lattice Relation
  • source record: core/paper_module_capsules.json::paper_modules[54:paper_module.prediction_oracle_reconciliation].
  • Subject edges: explains component prediction_oracle_reconciliation and mechanism mechanism.prediction_oracle_reconciliation.validates_public_prediction_oracle_reconciliation.
  • Doctrine edges: governed by principles P-2, P-6, P-8, and P-9; abides by axioms AX-5, AX-7, AX-8, and AX-10.
  • Dependency edges: depends on paper_module.finance_forecast_evaluation_spine, paper_module.world_model_projection_drift_control_room, and paper_module.research_replication_rubric_artifact_replay.
  • Runtime code locus: src/microcosm_core/organs/prediction_oracle_reconciliation.py, including run, run_prediction_bundle, validate_source_module_imports, validate_reconciliation_packet, _source_open_body_import_summary, _build_result, write_receipts, result_card, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING.
  • Generated row proof: 15 resolved relationship edges, no unpopulated selective relations, Mermaid available_from_capsule_edges, and Atlas linked_from_capsule_edges.

The governing lattice turns the component into a bounded reconciliation checker rather than a forecast authority. P-2 lowers every positive claim to the checker strength: CP1/CP2 accounting, oracle-diff grading, numeric-row gates, source-module digest checks, negative cases, and metadata-only result records. P-6 fails closed when a branch is unresolved, a target escapes the declared universe, a source digest mismatches, or an authority flag tries to rise above the accepted component ceiling. P-8 makes those refusals typed outcomes instead of prose warnings. P-9 carries source refs, public runtime refs, copied-body material status, and result record refs across the fixture and exported bundle.

The axiom layer supplies the same boundary. AX-5 prevents the fixture from upgrading synthetic reconciliation evidence into trading, advice, live-market, provider, public sharing, launch, or performance-track-record authority. AX-7 permits partiality: degraded feed health, missing realized numeric truth, and asset-class split pressure are surfaced as scoped findings rather than hidden successes. AX-8 keeps copied source bodies while excluding live market data, model-output data bodies, private dossiers, and account secret-equivalent material. AX-10 requires the target-universe, CP1/CP2, oracle-diff, and source-module evidence to be tied to the current fixture or bundle result records before the Markdown projection is treated as current.

The structured source record's 15 edges prove route parity only.

Finance Forecast Evaluation SpineReplays synthetic forecast tests through copied finance stats, recording p-values with no advice.4/5Runs real tools

Does Runs public synthetic forecast-evaluation fixtures through copied finance statistics modules and records p-value/refusal behavior without live market data or advice claims.

Scope limit synthetic fixture forecast-evaluation statistics only; no investment-related actions, live market data, track record, or performance claim

Run
microcosm finance-forecast-evaluation-spine run --input fixtures/first_wave/finance_forecast_evaluation_spine/input --out receipts/first_wave/finance_forecast_evaluation_spine

EvidenceExternal tool runevidence 4/5Real runtime result

research-workflowsforecastingfinance

Source Design note · Source atlas

Paper module Finance Forecast Evaluation Spine

finance_forecast_evaluation_spine is a Crown Jewel import component with real runnable system and a strict public scope limit. It consumes synthetic public fixtures, copied source source bodies, and source manifests that verify sha256 digests, line counts, required anchors, secret-exclusion status, and result record body omission.

Purpose

Comparing two forecasting models is harder than it looks. A lower average loss does not establish that one model genuinely predicts better, because losses are autocorrelated, samples are short, and a careless split can let a model peek at the answer. This component exists to carry the statistical machinery that economists use to answer that question carefully, and to do so without ever claiming the machinery has been pointed at a real market.

The single question it answers is narrow: given two paired loss series over a synthetic fixture, can the difference in predictive accuracy be called significant under an admissible test, or must the test refuse? It computes the Diebold-Mariano loss-differential statistic with a Bartlett HAC long-run variance, the Harvey-Leybourne-Newbold small-sample correction, Hansen's test for superior predictive ability with recentering, a model confidence set, and a Politis-Romano stationary bootstrap.

Failure is handled explicitly. The Harvey-Leybourne-Newbold correction returns its computed statistic, but when SciPy is absent it refuses the p-value with a typed reason rather than fabricating one. The same discipline rejects a horizon that reaches the sample length, a sample too small to estimate anything, a time split that lets the evaluation date sit at or after the event window, and any policy flag that smuggles in advice or a track-record claim. A refusal is recorded as a first-class validator outcome, not an error: "we declined to answer" is itself a valid result.

The guards run before the statistics. If a boundary policy or a leakage check fails, the result record is blocked before any statistics subprocess starts, so an inadmissible request never produces a number that could be misread as a result.

What it proves: synthetic fixture forecast-evaluation statistics only; no investment-related actions, live market data, track record, or performance claim.

How to run it:

microcosm finance-forecast-evaluation-spine run --input fixtures/first_wave/finance_forecast_evaluation_spine/input --out receipts/first_wave/finance_forecast_evaluation_spine

Runtime bundle route:

python -m microcosm_core.organs.finance_forecast_evaluation_spine run-finance-forecast-bundle --input examples/finance_forecast_evaluation_spine/exported_finance_eval_bundle --out receipts/runtime_shell/demo_project/organs/finance_forecast_evaluation_spine

Negative cases covered by the fixture manifest: finance_hln_dependency_refusal, finance_leakage_lookahead_split, finance_no_advice_overclaim.

Source provenance is anchored by examples/finance_forecast_evaluation_spine/exported_finance_eval_bundle/source_module_manifest.json and result records carry refs, digests, counts, verdicts, and scope boundaries only.

Shape

"boundary fails""boundary passes""first-wave fixture""exported bundle"Synthetic fixture inputsfamily_loss_matrix,paired_loss_series,finance_boundary_policy,projection_protocolSynthetic fixture inputs family_loss_matrix, paired_loss_series, finance_boundary_policy, projection_protocolCopied finance modulesplus source manifest digestsCopied finance modules plus source manifest digestsRunnerRunnerGuards run firstpolicy no-advice flags,lookahead-split leakage checkGuards run first policy no-advice flags, lookahead-split leakage checkBlocked result recordstatistics subprocess neverstartsBlocked result record statistics subprocess never startsAdmissible andexported bundle?Admissible and exported bundle?Statistics subprocessDM/HAC, Hansen SPA, MCS,stationary bootstrap, HLNrefusalStatistics subprocess DM/HAC, Hansen SPA, MCS, stationary bootstrap, HLN refusalStandalone statisticscontractno live source-rootsubprocessStandalone statistics contract no live source-root subprocessResult recordsrefs, hashes, counts,verdicts,scope boundaries;body_in_receipt falseResult records refs, hashes, counts, verdicts, scope boundaries; body_in_receipt false

Source refs

Runner
finance_forecast_evaluation_spine.run
Diagram source
flowchart TD Fixture["Synthetic fixture inputs family_loss_matrix, paired_loss_series, finance_boundary_policy, projection_protocol"] Source["Copied finance modules plus source manifest digests"] Runner["finance_forecast_evaluation_spine.run"] Guards["Guards run first policy no-advice flags, lookahead-split leakage check"] Blocked["Blocked result record statistics subprocess never starts"] Branch{"Admissible and exported bundle?"} Subprocess["Statistics subprocess DM/HAC, Hansen SPA, MCS, stationary bootstrap, HLN refusal"] Standalone["Standalone statistics contract no live source-root subprocess"] Result record["Result records refs, hashes, counts, verdicts, scope boundaries; body_in_receipt false"] Fixture --> Runner Source --> Runner Runner --> Guards Guards -->|"boundary fails"| Blocked Guards -->|"boundary passes"| Branch Branch -->|"first-wave fixture"| Subprocess Branch -->|"exported bundle"| Standalone Subprocess --> Result record Standalone --> Result record Blocked --> Result record

Technical Mechanism

The module is a deterministic forecast-evaluation harness around CrownJewelSpec, not a finance product. The spec fixes four required fixture inputs (family_loss_matrix.json, paired_loss_series.json, finance_boundary_policy.json, and projection_protocol.json), names the three required negative cases, binds the source manifest, and restricts the source-open import to required anchors in model_selection_stats.py, spa_statistics.py, loss_differentials.py, and family_loss_matrix.py.

At runtime, run delegates to run_crown_jewel_organ with evaluate and evaluate_negative_case. evaluate loads the synthetic loss matrix, paired loss series, and boundary policy, then calls _evaluate_payloads. That function first enforces the policy and lookahead-split guards; if either boundary fails, it returns a blocked result record before any statistics subprocess can run. Only after those guards pass does it run the copied statistics modules or, for the exported bundle path, use _standalone_exported_statistics_contract so the standalone public bundle does not depend on a live source-root subprocess.

The statistical witness is therefore deliberately narrow: Reality Check, Hansen-SPA, MCS, Diebold-Mariano/HAC, stationary bootstrap, and the HLN refusal are result record fields over the synthetic fixture. The same mechanism treats finance_hln_dependency_refusal as a typed negative case when SciPy support is absent, treats policy overclaims as FINANCE_NO_ADVICE_OVERCLAIM, treats temporal leakage as FINANCE_LOOKAHEAD_SPLIT_FORBIDDEN, and keeps copied source bodies out of result records with body_in_receipt: false.

Reader Evidence Routing

Read the positive fixture as a small statistical witness, not as a market result. The current result record has status: pass, sample_size: 40, candidate_count: 3, reality_check.status: computed_bootstrap, spa.status: computed_bootstrap, mcs.implemented: true, paired_loss.diebold_mariano.status: computed_hac_normal_approximation, and a five-replicate stationary-bootstrap witness. Those fields show that the component can exercise the copied forecast evaluation code paths on public synthetic data.

Read the negative floor as equal evidence. The observed negative cases are finance_hln_dependency_refusal, finance_leakage_lookahead_split, and finance_no_advice_overclaim, with stable error codes FINANCE_HLN_TYPED_REFUSAL_REQUIRED, FINANCE_LOOKAHEAD_SPLIT_FORBIDDEN, and FINANCE_NO_ADVICE_OVERCLAIM. The HLN case refuses because SciPy is unavailable for the t-distribution; that is the intended scope limit, not a missing p-value to fill in by hand.

Read source-open evidence through the manifest, not through result records. The source bundle carries 13 copied finance modules; result records carry references, hashes, counts, verdicts, and scope boundaries, and keep body_in_receipt: false. The local claim therefore stays at "synthetic fixture forecast-evaluation statistics and typed refusals." It does not become investment-related actions, live-market data, a track record, performance proof, optimizer authorization, or launch-scope decision.

Forecast-Evaluation Discipline

This component is evidence that the Microcosm can carry professional forecast evaluation logic without pretending to carry market authority. The admissible statistics include Diebold-Mariano loss-differential testing, the Harvey-Leybourne-Newbold small-sample correction, Hansen's SPA test, a Politis-Romano stationary bootstrap, Bartlett HAC long-run variance, and purged/embargoed cross-validation in the Lopez de Prado style.

The important doctrine is refusal discipline. Horizons greater than or equal to sample length, samples too small to estimate a statistic, leakage-prone splits, missing SciPy support, and advice-shaped claims must return typed refusals instead of crashes or meaningless numbers. Hansen-style recentering of poor or irrelevant alternatives is part of the SPA contract because it is the boundary between a useful superior-predictive-ability test and White Reality Check style over-penalization.

Result records should therefore distinguish "computed statistic" from "refused because inadmissible." Both are successful validator outcomes when the fixture asked for that behavior.

Named Proof Consumers

  • Runtime fixture consumer: finance_forecast_evaluation_spine.run over fixtures/first_wave/finance_forecast_evaluation_spine/input must produce status: pass, the three observed semantic negative cases, false advice/live-data/performance authority flags, and metadata-only source-manifest result record material.
  • Exported-bundle consumer: run-finance-forecast-bundle over examples/finance_forecast_evaluation_spine/exported_finance_eval_bundle must validate the 13 copied finance modules by digest and use the standalone statistics contract rather than a live source subprocess.
  • Focused pytest consumer: tests/test_finance_forecast_evaluation_spine.py must keep the positive statistical fixture, no-advice overclaim, live-market overclaim, lookahead split, semantic-negative-case, standalone-bundle, and digest-mismatch tests green.
  • Corpus consumer: scripts/build_doctrine_projection.py --check-paper-module-corpus must keep the 98-module Microcosm paper-module corpus valid without hand-editing the generated JSON instance.
  • Scope limit consumer: any public or dissemination copy must preserve the local ceiling that this is synthetic fixture forecast-evaluation evidence, not investment-related actions, live data, performance proof, optimizer authorization, or launch-scope decision.

Prior Art Grounding

This component is grounded in forecast-evaluation statistics rather than trading systems. The core anchors are the Diebold-Mariano test for comparing predictive accuracy, the Harvey-Leybourne-Newbold small-sample correction for prediction-error tests (DOI reference), Hansen's test for superior predictive ability, and proper-scoring-rule work such as Gneiting and Raftery. The purged/embargoed split discipline also follows the financial ML concern that temporal leakage can make backtests look stronger than they are.

Microcosm borrows the professional evaluation posture: compute admissible statistics when the fixture supports them, return typed refusals when it does not, and keep evaluation separate from advice, live market data, or performance claims.

Validation Result record Path

PYTHONPATH=src ./repo-pytest tests/test_finance_forecast_evaluation_spine.py -q --basetemp=/tmp/microcosm_finance_forecast_evaluation_spine_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

Finance forecast evaluation spine proves only synthetic market-shaped forecast-evaluation fixture behavior, copied source manifest integrity, metadata-only result records, admissible statistic computation, and typed refusals for inadmissible finance claims. A diagram view and atlas navigation entry are generated for this module, but those navigation projections do not expand the proof. This module is not investment or trading decisions, uses no live market data, proves no track record or performance claim, mutates no optimizer, certifies no trading strategy, and treats SciPy absence as a typed HLN refusal rather than a hidden statistical success.

Source and projection details
Governing Lattice Relation

The generated JSON instance resolves six bundle-derived edges for this module: it explains component finance_forecast_evaluation_spine, explains mechanism mechanism.finance_forecast_evaluation_spine.validates_public_finance_forecast_evaluation_spine, is governed by concept concept.research_and_science_replay_evidence_bundle, is governed by principle P-8, abides by AX-7, and cites the code locus src/microcosm_core/organs/finance_forecast_evaluation_spine.py. Those edges come from core/paper_module_capsules.json::paper_modules[30:paper_module.finance_forecast_evaluation_spine] and the generated structured source record, not from this Markdown prose.

Mechanically, P-8 and AX-7 show up as refusal discipline: an admissible statistic can pass, but advice-shaped policy flags, live-market authority, leakage-prone time splits, source digest mismatch, and fake HLN p-values must block. The concept edge keeps the module in the research/science replay-evidence family, where proof value is a reproducible fixture and source-manifest witness rather than a claim about markets.

Market Dashboard Read-Model BundleRuns a copied market-dashboard reader to catch broken links, stale feeds, and trading overclaims.5/5

Does This bundle imports the market dashboard read-model source as public runnable system. Running it over synthetic market-dashboard rows shows how structural read-model checks, feed freshness classification, and related-situation grouping catch dangling graph edges, unsafe route refs, auto-apply overclaims, trading-language overclaims, silent omissions, stale or missing readiness, and no-overlap relation cases.

Scope limit This is fixture-bound read-model, freshness, and relation-grouping evidence only; it is not live market-level conclusions, not investment-related actions, not external model access, not launch-scope decision, and not whole-system correctness.

Run
microcosm batch12-market-dashboard-read-model-capsule run-market-dashboard-bundle --input examples/batch12_market_dashboard_read_model_capsule/exported_batch12_market_dashboard_read_model_capsule_bundle --out receipts/runtime_shell/demo_project/organs/batch12_market_dashboard_read_model_capsule

EvidenceVerified source importevidence 5/5Copied source body

research-workflowsforecastingfinance

Source Design note · Source atlas

Paper module Set 12 Market Dashboard Read-Model Bundle

Purpose

The underlying source module compiles a generated market-situation graph into a backend read model: a trust strip, a ranked situation queue, a detail index, a graph slice, facets, drilldowns, and an API contract. The read model is the shape a dashboard consumes. It runs the copied read-model helpers over small synthetic fixtures and asks one question: does the read-model layer hold its own claim boundary, or does it quietly become a market-truth or advice surface?

The interesting part is what the validator refuses rather than what it accepts. A presentation layer is the easy place for an overclaim to leak in: a label like "strong buy", an auto_apply_allowed flag left true, a freshness state that reports green from a stale or missing artifact. The copied validate_market_dashboard_read_model scans for trading and action-claim language, requires oracle_evolve.auto_apply_allowed to be false and review_gated to be true, requires no_advice_mode to be enabled, and requires the silent-omission count to be zero. The bundle drives those checks with fixtures designed to trip each one, then records whether the source actually flagged them.

The other two mechanisms guard the read path itself. A feed-freshness overlay classifies the current run into a small set of honest states so historical green proof cannot stand in for live-feed capability, and a related-situations scorer groups situations by shared entities or matching type without inventing links. Everything is fixture-bound: there is no live market data, no external model access, and no investment-related actions anywhere in scope.

Mechanisms

  • validate_market_dashboard_read_model
  • _runtime_feed_freshness_overlay
  • _related_situations

What the checks do

validate_market_dashboard_read_model is the structural and overclaim gate. It first checks the read model is well formed: the schema version matches, every situation in the queue resolves to a detail entry, every graph-slice edge points at a node that exists, and each drilldown source-ref returns metadata only with no arbitrary file read and no .. traversal in its route. It then enforces the claim boundary. auto_apply_allowed must be false, review_gated must be true, no_advice_mode must be enabled, the silent-omission count must be zero, and any copied source text is scanned for trading or action-claim language (buy, sell, short, price target, stop loss, and similar). The bundle feeds it five negative fixtures, one per failure shape, and confirms the source emits the matching error string for each. A read model that passed these checks but stayed silent on a planted overclaim would be the real failure, so the bundle treats a missing error as a finding.

_runtime_feed_freshness_overlay reads a per-run readiness summary and reports one of three honest states. fresh_green_feed requires the run to be ready, all targets met, no blockers, and same-day generation. stale_green_feed is artifact-backed but no longer same-day. blocked_missing_artifact covers the run that is missing its readiness file, falls short on targets, or carries blockers. The point is that a stale or absent run never reports green: historical proof cannot stand in for live-feed capability, and the state carries a plain truth-statement saying so. The bundle writes synthetic readiness files for each case and checks the classifier returns the expected state.

_related_situations builds the "see also" cohort for a situation. It collects other situations that either share an entity or match the situation type, ranks them, excludes the focus situation itself, and caps the list at six. The bundle checks one boundary case in particular: a situation with no entity overlap and a different type produces an empty cohort rather than a spurious link.

Shape

Synthetic dashboard,freshness, related fixturesSynthetic dashboard, freshness, related fixturesCopied read-model helpers(market_dashboard_read_model.py)Copied read-model helpers (market_dashboard_read_model.py)Validate market dashboardread modelValidate market dashboard read modelStructure: schema,queue-to-detail, graph edges,drilldown route safetyStructure: schema, queue-to-detail, graph edges, drilldown route safetyScope limit: no auto-apply,review-gated, no-advice,no trading language,zero silent omissionsScope limit: no auto-apply, review-gated, no-advice, no trading language, zero silent omissions_runtime_feed_freshness_overlay_runtime_feed_freshness_overlayfresh_green_feedfresh_green_feedstale_green_feedstale_green_feedBlocked missing artifactBlocked missing artifact_related_situations_related_situationsEntity overlap or type match;self-excluded, capped at six;no overlap means emptyEntity overlap or type match; self-excluded, capped at six; no overlap means emptymetadata-only result recordand card(refs, digests, counts,verdicts)metadata-only result record and card (refs, digests, counts, verdicts)

Source refs

Validate market dashboard read model
validate_market_dashboard_read_model
Blocked missing artifact
blocked_missing_artifact
Diagram source
flowchart TD A["Synthetic dashboard, freshness, related fixtures"] --> B["Copied read-model helpers (market_dashboard_read_model.py)"] B --> C["validate_market_dashboard_read_model"] C --> C1["Structure: schema, queue-to-detail, graph edges, drilldown route safety"] C --> C2["Scope limit: no auto-apply, review-gated, no-advice, no trading language, zero silent omissions"] B --> D["_runtime_feed_freshness_overlay"] D --> D1["fresh_green_feed"] D --> D2["stale_green_feed"] D --> D3["blocked_missing_artifact"] B --> E["_related_situations"] E --> E1["Entity overlap or type match; self-excluded, capped at six; no overlap means empty"] C1 --> F["metadata-only result record and card (refs, digests, counts, verdicts)"] C2 --> F D1 --> F D2 --> F D3 --> F E1 --> F

Reader Evidence Routing

Start with paper_modules/batch12_market_dashboard_read_model_capsule.json for bundle-derived source authority, then read this Markdown as the explanatory projection. Use examples/batch12_market_dashboard_read_model_capsule/exported_batch12_market_dashboard_read_model_capsule_bundle/source_module_manifest.json to inspect copied-source digest status before opening copied source modules. Use tests/test_batch12_market_dashboard_read_model_capsule.py to verify the fixture and bundle expectations.

The useful evidence is dashboard read-model accounting over synthetic public fixtures: validation rows, freshness overlays, related-situation joins, negative cases, metadata-only result records, and scope limit fields.

Prior Art Grounding

The component is grounded in CQRS/read-model and dashboard-observability patterns: derive presentation-ready projections from source data, make freshness visible, and keep the read surface separate from mutation authority. Useful anchors include:

  • Microsoft's CQRS pattern, where read models are optimized for queries and presentation rather than command handling.
  • Grafana dashboards, which query and transform data sources into operational panels.

Microcosm borrows the read-model shape for dashboard validation, runtime feed freshness overlays, and related-situation joins. The result is fixture-bound mechanism evidence; it does not become market-level conclusions, external model access, investment-related actions, or launch-scope decision.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

The fixture command writes the dashboard read-model result record and sign-off JSON. The bundle command validates copied source system, manifest digests, freshness overlay rows, related-situation joins, negative cases, and metadata-only result record posture. The focused test checks fixture validation, bundle validation, digest/anchor coverage, and scope limits.

This result record path is reader-verifiable evidence only. It excludes launch, external model access, private-system equivalence, market-level conclusions, investment-related actions, or whole-system correctness.

Scope boundary

Scope limit

This module may claim public fixture evidence that the copied source system produced market-dashboard read-model rows, runtime feed freshness overlays, related-situation joins, negative-case checks, metadata-only result record posture, and validation result records over synthetic inputs.

This module may not claim launch-scope decision, external model access, private-system equivalence, live market-level conclusions, investment-related actions, deployment posture, source-file changes, publishing-scope decision, or whole-system correctness.

Scope limit

This is fixture-bound market-dashboard read-model mechanism evidence. It excludes launch, external model access, private-system equivalence, market-level conclusions, investment-related actions, deployment posture, source-file changes, publishing-scope decision, or whole-system correctness.

Prediction Market Board BundleReplays imported quant market math on test rows, with duplicate retention and seven refusals.5/5

Does This bundle imports the quant presentation mart source as public runnable system. Running it over synthetic prediction-market and feed-diagnostic rows shows event identity joining, duplicate-market retention by volume, orphan identity refusal, provider drift flags, missingness rows, unavailable previous-green deltas, and source lifecycle vintage enrichment.

Scope limit This is deterministic fixture evidence for copied quant helpers only; it is not live prediction-market-level conclusions, not provider truth, not forecast correctness, not investment-related actions, not external model access, and not launch-scope decision.

Run
microcosm batch12-prediction-market-board-capsule run-prediction-market-board-bundle --input examples/batch12_prediction_market_board_capsule/exported_batch12_prediction_market_board_capsule_bundle --out receipts/runtime_shell/demo_project/organs/batch12_prediction_market_board_capsule

EvidenceVerified source importevidence 5/5Copied source body

research-workflowsforecastingfinance

Source Design note · Source atlas

Paper module Set 12 Prediction Market Board Bundle

Purpose

Market and source dashboards have a recurring failure: a row looks like a fact when it is really a guess. A duplicate listing inflates a volume figure, an unmatched market slug grows a fabricated identity, a feed reports zero rows but the board shows it as healthy, and a "change since last time" number appears even when there is no prior baseline to compare against. The single question this component answers is whether the copied presentation-mart logic keeps those distinctions honest when run over public synthetic inputs.

It does that by importing the real quant_presentation_mart helper body and running it against fixtures that are built to expose each trap, then asserting the exact diagnostic the body should produce. The interesting choice is that the board never asserts what a market price means. It computes accounting about the data: which event a market belongs to, whether its identity was actually matched, how providers drifted, where rows went missing, and whether a vintage date is genuinely present. Aggregation is deliberately conservative. A missing value stays missing rather than defaulting to a confident zero, and an unmatched slug is reported as missing_from_feed_artifact instead of being given a synthetic event id.

The result is fixture-bound evidence, not a forecast. The board is a diagnostic surface over public synthetic rows. It does not read live markets, use external model services, or claim that any number is tradeable.

Mechanisms

  • _prediction_market_board
  • _polymarket_identity_by_slug
  • _provider_drift_monitor
  • _missingness_board
  • _delta_since_previous_green
  • _macro_lifecycle_by_slug
  • _macro_regime_board

How it works

The bundle loads three fixtures, runs the copied helpers, and checks eight named invariants. Each check targets a specific way a board can quietly mislead.

The event-join engine (_prediction_market_board with _polymarket_identity_by_slug) groups raw market rows into events using the Polymarket identity snapshot. Identity is matched by market_slug. When two rows share the same slug and outcome, only the higher-volume one is kept, so a duplicate listing cannot double a market count or inflate an aggregate. A slug with no identity match is not dropped and is not given a made-up event id. Its event_identity_status becomes missing_from_feed_artifact and its max_liquidity stays at 0.0. The fixture proves all three: the duplicate fold (top volume 900000 with one surviving market), the orphan with a null event id, and the deduped aggregate.

The provider-drift monitor (_provider_drift_monitor) reads each feed's diagnostics and raises typed flags rather than a single health score. Generic transport problems (provider_fallback_used, html_response_seen, fetch_failures) are kept distinct from FRED-specific ones (fred_invalid_series, fred_network_warning). The fixture checks that the stock feed surfaces the generic set, the news feed stays clean, and the source feed surfaces the FRED set. Keeping the families apart means a source data-source fault is not laundered into a generic warning.

The missingness board (_missingness_board) lists only feeds that are not both non-empty and ok. A feed with zero rows is labelled zero_rows; a populated but low-quality feed is labelled quality_degraded; a healthy feed is omitted entirely. The fixture confirms the healthy feed is absent and the two failing lanes carry the correct reason, so an empty feed cannot read as present.

The prior-green delta (_delta_since_previous_green) only computes a "change since last run" when a previous green run actually exists. With no baseline it returns status: unavailable and an empty row_deltas_by_lane, which the fixture asserts directly. This is the guard against a delta number that has nothing to compare against.

The source lifecycle enrichment (_macro_lifecycle_by_slug feeding _macro_regime_board) buckets source series, then binds each bucket's vintage_status and release_calendar_status to whether the lifecycle structured source record genuinely carries that metadata. The fixture proves a series with a present vintage reads available with the expected observation date, while a series whose lifecycle row is absent reads missing_from_feed_artifact. A vintage date is shown only when it is really there.

Shape

yesno, unmatchedno, matchedSynthetic market rowsSynthetic market rowsEvent join + identity match_prediction_market_boardEvent join + identity match _prediction_market_boardPolymarket identity snapshotPolymarket identity snapshotQuant-mart helper fixturesQuant-mart helper fixturesProvider drift monitorgeneric vs FRED flagsProvider drift monitor generic vs FRED flagsMissingness boardzero_rows vs quality_degradedMissingness board zero_rows vs quality_degradedPrior-green deltaunavailable with no baselinePrior-green delta unavailable with no baselineSource regime boardvintage status bound tostructured source recordSource regime board vintage status bound to structured source recordSlug + outcomeseen before?Slug + outcome seen before?Keep higher-volume marketKeep higher-volume marketno fabricated event idno fabricated event idAppend to event aggregateAppend to event aggregatemetadata-only result recordand carddiagnostic rows, negativecases,scope limitmetadata-only result record and card diagnostic rows, negative cases, scope limit

Source refs

no fabricated event id
missing_from_feed_artifact
Diagram source
flowchart TD Rows["Synthetic market rows"] --> Join["Event join + identity match _prediction_market_board"] Identity["Polymarket identity snapshot"] --> Join Helpers["Quant-mart helper fixtures"] --> Drift["Provider drift monitor generic vs FRED flags"] Helpers --> Miss["Missingness board zero_rows vs quality_degraded"] Helpers --> Delta["Prior-green delta unavailable with no baseline"] Helpers --> Source["Source regime board vintage status bound to structured source record"] Join --> Dedup{"Slug + outcome seen before?"} Dedup -->|yes| Keep["Keep higher-volume market"] Dedup -->|no, unmatched| Orphan["missing_from_feed_artifact no fabricated event id"] Dedup -->|no, matched| Append["Append to event aggregate"] Keep --> Result record["metadata-only result record and card diagnostic rows, negative cases, scope limit"] Orphan --> Result record Append --> Result record Drift --> Result record Miss --> Result record Delta --> Result record Source --> Result record

Reader Evidence Routing

Start with paper_modules/batch12_prediction_market_board_capsule.json for bundle-derived source authority, then read this Markdown as the explanatory projection. Use examples/batch12_prediction_market_board_capsule/exported_batch12_prediction_market_board_capsule_bundle/source_module_manifest.json to inspect copied-source digest status before opening copied source modules. Use tests/test_batch12_prediction_market_board_capsule.py to verify the fixture and bundle expectations.

The useful evidence is diagnostic accounting over synthetic public fixtures: provider identity matching, drift rows, missingness boards, prior-green deltas, lifecycle/vintage rows, source-regime enrichment, negative cases, metadata-only result records, and scope limit fields.

Prior Art Grounding

The component borrows from prediction-market information aggregation and public market-data integration practice: event contracts expose market prices and settlement states, while dashboards must keep provider identity, missingness, and vintage drift visible. Relevant anchors include:

Microcosm borrows the information-aggregation and provider-join shape, then keeps the board explicitly diagnostic: identity matching, provider drift, missingness, prior-green deltas, lifecycle vintage, and source-regime enrichment are tested over public synthetic fixtures. It is not market-level conclusions, provider truth, investment-related actions, or launch-scope decision.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

The fixture command writes the prediction-market board result record and sign-off JSON. The bundle command validates copied source system, manifest digests, provider identity and drift diagnostics, missingness rows, lifecycle rows, negative cases, and metadata-only result record posture. The focused test checks fixture validation, bundle validation, digest/anchor coverage, and scope limits.

This result record path is reader-verifiable evidence only. It excludes launch, external model access, private-system equivalence, market-level conclusions, provider truth, investment-related actions, or whole-system correctness.

Scope boundary

Scope limit

This is fixture-bound mechanism evidence for prediction-market joining, quant-mart diagnostics, and source-lifecycle vintage enrichment. It excludes launch, external model access, private-system equivalence, market-level conclusions, provider truth, investment-related actions, source-file changes, publishing-scope decision, or whole-system correctness.

Scope limit

It does not establish live market-level conclusions, provider truth, external model access, investment-related actions, source-file changes, launch-scope decision, publishing-scope decision, private-system equivalence, or whole-system correctness.

Import & drift control (19)

Source Projection Import ProtocolGates private-to-public imports, accepting only files with matching fingerprints and sources.5/5

Does This is the checkpoint that handles bringing material from the larger private project into the public Microcosm folder. When someone proposes a set of files to import, it verifies each one: it only accepts material, and only when the destination file, a content fingerprint (to confirm the copy matches), a record of where it came from, and the supporting checks all line up. Anything held back as private or secret has to come with a written note saying so, and attempts to claim more authority than allowed are rejected. The record shows exactly what was imported, what was deliberately left out, and what was refused, so the public copy stays honest about its limits instead of quietly leaking private source or pretending to be more than it is.

Scope limit It authorizes only verified source body import with provenance and content-digest checks; it does not grant source authority, private-system equivalence, launch, hosted deployment, public sharing, recipient work, provider or Lean/Lake execution, secret or private-source-body export, or any whole-system correctness claim.

Run
microcosm macro-projection-import-protocol plan --input examples/macro_projection_import_protocol/exported_projection_import_bundle

Paper module Source Projection Import Protocol

macro_projection_import_protocol is the source-available membrane for bringing source system into Microcosm. It exists because Microcosm should be dense and alive without becoming a dump of private source bodies, operator context, model-output data, or launch material.

The component validates a projection packet with four public claims:

  • source bodies are copied or source-faithfully refactored only when the target file, digest, provenance, validation refs, and metadata-only result record contract verify;
  • private material is omitted with explicit omission result records;
  • public runtime refs are fixtures, standards, paper modules, exported bundles, copied body targets, and result record refs;
  • authority stays capped below launch, public sharing, private-system equivalence, and live source source authority.

Purpose

Microcosm grows by copying real material out of a much larger private codebase. The danger in that move is obvious: a dense public copy is exactly the kind of artefact that quietly carries a secret, an operator conversation, a model-output data, or launch material along with the genuinely useful code. This component exists to answer one question for every copied slice: was this body allowed out, and is the public copy honestly tied to the source it claims to come from?

The answer is an accounting check, not a trust statement. Each copied row declares its source ref, its public target ref, a content digest, and a material class. The protocol sorts that class into one of two sets. Five classes are source bodies (pattern, standard, tool, result record, proof) and may be copied with provenance. Nine classes are forbidden outright (source note, operator thread, model-output data, account secret, secret, recipient packet, launch packet, and the like) and can never appear as an imported body. Anything claiming to be must also carry a verification record naming the digest, the source-to-target relation, and the command or test that consumes the copy.

The unusual part is how the protocol treats a copy whose source has since changed. For an exact-copy row it re-hashes the live source file on disk and compares it against the digest recorded at import time. A mismatch is not reported as a failed import. It is recorded as live source drift: the original copy was still honest, the source has simply moved on, and the row is flagged for the refresh actuator rather than failed. The protocol deliberately separates a dishonest import from a stale one. That keeps the public copy faithful without forcing it to track every upstream edit in lock-step, and it stops a routine upstream change from being mistaken for a broken proof.

What the check does not do is just as load-bearing. A passing scan proves that the named slice omitted the forbidden material classes and kept result record bodies out of the result record. It does not establish the public copy is complete, equivalent to the private root, or ready to launch. The import is evidence about provenance and boundaries, never a launch decision.

Shape

The protocol is the membrane between source source and public Microcosm evidence. It reads projection cells, classifies the requested import, verifies source/target refs and digest relations, applies the secret-exclusion boundary, and emits metadata-only result records that a public reader can replay without gaining live source authority.

Its shape is deliberately two-level:

  • fixture and exported-bundle commands validate whole projection packets, negative cases, omitted-material result records, and the intake/status board;
  • source-module manifests bind each imported slice to source refs, target refs, digest relation, body-import class, validation refs, and scope limits.

That split keeps the component usable as a source-open body floor while preventing the paper module from becoming a static copy-count ledger. Counts, status totals, and current body-import floors live in result records and runtime status surfaces.

Runtime Shape

Run the fixture:

PYTHONPATH=src python3 -m microcosm_core.organs.macro_projection_import_protocol run --input fixtures/first_wave/macro_projection_import_protocol/input --out receipts/first_wave/macro_projection_import_protocol

Run the exported bundle:

PYTHONPATH=src python3 -m microcosm_core.organs.macro_projection_import_protocol run-projection-bundle --input examples/macro_projection_import_protocol/exported_projection_import_bundle --out receipts/runtime_shell/demo_project/organs/macro_projection_import_protocol

Preview the next import slice without writing result records:

PYTHONPATH=src python3 -m microcosm_core.organs.macro_projection_import_protocol plan --input examples/macro_projection_import_protocol/exported_projection_import_bundle

The public CLI also exposes the same validator through:

The plan action emits macro_projection_import_intake_preview_v1. It does not write result records. It scores each proposed projection cell before import: source refs, public target refs, validation refs, selected pattern ids, copy policy, scope limit, omitted material, secret-exclusion scan count, verified body-import status, and ready/blocked status.

Exact-copy is a relation, not the whole protocol. Rows declared as exact-copy prove byte-identical source and target digests and may be maintained by the exact-copy refresh actuator. Rows declared as source-faithful public edits or refactors prove the source source digest and the improved public target digest separately, cite the rewrite or symbol mapping, and are maintained by their own validator/test lane. This is the lane for public-safety redaction, dependency trimming, Microcosm-standard compliance, or runnable local cleanup.

It also self-hosts the intake cell state machine. Every projection cell carries projection_status, cell_state, action_required, status reason, landed evidence refs, and a next runtime surface. The board totals those fields as status counts plus an open-actionable count so future passes can distinguish a ready but unlanded cell from a verified public runtime import, self-hosted protocol, or runtime bridge that is already consumed.

microcosm intake is the runtime bridge over that plan. It writes receipts/runtime_shell/intake_bridge/runtime_reveal_import_bridge.json, links the projection cells to the spine and reveal commands, and projects the same statuses into the first-run bridge. Current landed statuses are: public_runtime_import_landed for formal_math_readiness_extensions, self_hosted_status_protocol_landed for projection_protocol_self_host, and runtime_bridge_landed for runtime_reveal_import_bridge. These statuses do not raise authority above public metadata, fixture shape, and result record refs.

microcosm status and microcosm spine also expose the computed macro_body_import_floor. Treat that value as a result record-backed floor, not a stable prose constant: the current authority lives in result records/sign-off/first_wave/macro_projection_import_protocol_fixture_acceptance.json and the first-wave runtime result records under receipts/first_wave/macro_projection_import_protocol/. Cold readers should inspect public_safe_body_import_count, public_safe_body_import_status, projection_status_counts, open_actionable_cell_count, and secret_exclusion_scan there instead of trusting an old markdown count. The floor is still not a launch signal or private-system equivalence claim.

Trace-Bundle Source-Body Import

The trace-bundle slice is the current proof-grade example of a source-body import. Its source-module manifest is examples/macro_projection_import_protocol/exported_projection_import_bundle/trace_capsule_source_module_manifest.json; the projection cell is trace_capsule_prompt_edit_capture_source_modules_import. The cell imports four source source bodies into the bundle:

  • tools/meta/observability/cli_prompt_trace.py -> source_modules/tools/meta/observability/cli_prompt_trace.py;
  • system/server/tests/test_cli_prompt_trace_capsule.py -> source_modules/system/server/tests/test_cli_prompt_trace_capsule.py;
  • tools/agent_trace_structurer/parser.mjs -> source_modules/tools/agent_trace_structurer/parser.mjs;
  • tools/agent_trace_structurer/parser.test.mjs -> source_modules/tools/agent_trace_structurer/parser.test.mjs.

The manifest is the body-floor result record for this slice. It records module_count: 4, body_copied: true, body_in_receipt: false, sha256_match: true, line counts, byte counts, required anchors, source refs, target refs, and the shared copied_non_secret_macro_body classification. That means the public bundle carries the source bodies, while runtime result records carry paths, hashes, counts, anchors, and validation refs without duplicating the bodies.

Copied material rowsource ref, target ref,digest, material classCopied material row source ref, target ref, digest, material classMaterial class?Material class?Reject: forbidden body importReject: forbidden body importVerification record presentand target digest bound?Verification record present and target digest bound?Reject: unverified importReject: unverified importRe-hash live source source ondiskRe-hash live source source on diskSource digest still matches?Source digest still matches?body floorbody floorFlag live source drift(honest copy, refresh later)Flag live source drift (honest copy, refresh later)Per-slice manifest +metadata-only result recordPer-slice manifest + metadata-only result recordI
Diagram source
flowchart TD A["Copied material row source ref, target ref, digest, material class"] --> B{"Material class?"} B -- "forbidden class (secret, account secret, source note, operator, provider, launch)" --> R["Reject: forbidden body import"] B -- "class (pattern, standard, tool, result record, proof)" --> C{"Verification record present and target digest bound?"} C -- "no" --> R2["Reject: unverified import"] C -- "yes, exact copy" --> D["Re-hash live source source on disk"] D --> E{"Source digest still matches?"} E -- "yes" --> F["body floor"] E -- "no" --> G["Flag live source drift (honest copy, refresh later)"] G --> F F --> H["Per-slice manifest + metadata-only result record"] H --> I["Reader projection"] H -. does not grant .-> J["live source authority, public sharing, launch, or source-file changes"]

The imported Python side supplies the trace-bundle runtime surface: cli_prompt_trace.py reads selected source files, rejects binary paths, supports line-range and symbol selection, redacts selected excerpt text, and emits numbered source lines with schema metadata. Its companion test module proves terminal validation semantics, repeated prompt interning, source excerpt priority, and completion-report behavior. The imported JavaScript side supplies the Agent Trace Structurer surface: parser.mjs preserves source_text as the exact copied string, treats source_lines and indexes as deterministic navigation projections, and builds lossless attachment clips where exact text is reconstructed from source_segments[].text. parser.test.mjs proves embedded file artifact indexing, Codex trace shape, final-message extraction, AIW thread classification, and bounded export behavior.

This is a mechanism/evidence claim, not a launch claim. The slice proves that these four named, source bodies were imported into the public bundle with manifest-backed digest and anchor checks, and that the parser and trace bundle behavior have public fixture coverage. It does not establish that live provider logs, browser UI state, account or browser state, account secrets, raw operator thread bodies, recipient-send material, or future trace-bundle bodies are or exported.

Those artifacts are the source-open floor. The result record bodies stay metadata-only, and private source note, operator thread content, model-output data bodies, account secrets, account or browser state, and launch or recipient material remain outside the public bundle.

Evidence Binding

The component's current public authority is the accepted component row in core/organ_registry.json plus the sign-off result record result records/sign-off/first_wave/macro_projection_import_protocol_fixture_acceptance.json. The JSON paper-module bundle is core/paper_module_capsules.json#paper_module.macro_projection_import_protocol, and the resolved mechanism row is core/mechanism_sources.json#mechanism.macro_projection_import_protocol.validates_public_macro_projection_imports. The runtime source locus is src/microcosm_core/organs/macro_projection_import_protocol.py, with focused regression coverage in tests/test_macro_projection_import_protocol.py.

The exported bundle does not have a single catch-all source-module manifest. It carries one *_source_module_manifest.json file per imported slice under examples/macro_projection_import_protocol/exported_projection_import_bundle/, plus copied targets under that bundle's source_modules/ tree. That per-slice manifest shape is part of the evidence: it lets each imported route, tool, standard, result record, proof, or runtime body keep its own source ref, target ref, digest relation, validation refs, and scope limit.

The first command for the fixture lane is:

PYTHONPATH=src python3 -m microcosm_core.organs.macro_projection_import_protocol run --input fixtures/first_wave/macro_projection_import_protocol/input --out receipts/first_wave/macro_projection_import_protocol

Reader Evidence Routing

Use this order when checking the module:

  1. Read the JSON bundle and standard to confirm the paper-module binding, scope limit, source-module manifest contract, and result record fields.
  2. Run the fixture command to validate projection cells and negative cases against temporary result records.
  3. Run the exported-bundle command to validate the public bundle and copied source-module surfaces.
  4. Inspect the source-module manifests for exact-copy versus source-faithful edit relations before deciding which refresh lane applies.
  5. Run the focused regression and paper-module corpus checks before landing a markdown or manifest update.

If a manifest is dry but a bundle-level validator still fails, check whether a bundle manifest carries its own expected digest or line-count rows. Do not infer that all companion manifest surfaces were refreshed just because an exact-copy source-module dry run is clean.

Prior Art Grounding

The import membrane follows established provenance and software-supply-chain patterns: copied or refactored artifacts need source refs, target refs, digests, validation refs, omission records, and a claim boundary. The closest public anchors are W3C PROV for describing entity/activity/agent provenance, the SLSA specification for artifact integrity and provenance in software supply chains, and in-toto for linking supply-chain steps through signed metadata.

Microcosm applies those patterns to a public/private projection boundary rather than to launch attestation. The per-slice source-module manifests, secret-exclusion scans, metadata-only result records, and omission result records are inspired by that provenance lineage, but they remain a local validator contract for public Microcosm fixtures and exported bundles.

Negative Cases

The validator intentionally rejects:

  • private body import requests;
  • omitted source material without omission result record refs;
  • authority upgrades into live source source authority;
  • projection cells without validation refs;
  • launch, public sharing, recipient-work, or secret-export claims.

Validation Result record Path

From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:

These checks validate projection cells, per-slice manifests, omitted-material result records, and metadata-only result record policy only. A diagram view is generated for this module and an atlas card is linked. The checks do not authorize live source source authority, secret export, launch, public sharing, source-file changes, provider or Lean/Lake execution, or whole-system correctness.

Re-enter this module when a new projection cell lands, a source-module manifest is refreshed, or a result record count changes. The repair route is to rerun the component validator, refresh the first-wave and sign-off result records, and update the standard or paper module only where the result record contract changed. Do not raise the scope limit from documentation edits.

Scope boundary

Scope limit

This module can claim that the protocol validates projection cells, per-slice manifests, copied or source-faithful target bodies, omission result records, negative cases, and metadata-only result record policy. It can also claim that accepted result records expose current public_safe_body_import_count, public_safe_body_import_status, projection_status_counts, open_actionable_cell_count, and secret_exclusion_scan fields.

It cannot claim that Microcosm is launch-ready, equivalent to the private root, free of all private material, or authorized to publish. It also cannot raise an exact-copy refresh into permission to rewrite source-faithful public refactors, mutate live source source, use external model services, run Lean/Lake, or export operator/session bodies. Any stronger claim must come from the owning result record, standard, or launch gate.

Scope boundary: metadata, provenance, public runtime refs, copied-body presence, green fixture result records, digest refs, and intake status counts are bounded import evidence only. They are not launch-scope decision, publishing-scope decision, private-system equivalence, live source authority, semantic truth, complete secret-scan coverage, external model service, Lean/Lake execution, or whole-system correctness.

Scope limit

This paper module explains a public projection protocol. It excludes launch, hosted deployment, public sharing, recipient work, external model access, Lean/Lake execution, secret export, private source-body export, or whole-system correctness.

Source and projection details
Source-Open Body Floor

Exact-copy rows are refreshed by refresh-exact-copy-source-modules; source-faithful edit rows stay with their own validator/test lane because their target body is intentionally public cleanup, normalization, or path redaction rather than byte identity.

The bundle body floor is never inferred from prose. A reader should inspect:

  • examples/macro_projection_import_protocol/exported_projection_import_bundle/*_source_module_manifest.json for per-slice source-to-target relations;
  • the copied targets under examples/macro_projection_import_protocol/exported_projection_import_bundle/source_modules/;
  • receipts/first_wave/macro_projection_import_protocol/projection_import_intake_board.json for cell state, open actions, and landed evidence refs;
  • result records/sign-off/first_wave/macro_projection_import_protocol_fixture_acceptance.json for the accepted public authority result record.
World Model Projection Drift Control RoomPinpoints where a projected world-model copy drifted from its real source, with repair routes.5/5

Does This component shows, in plain result records, where a projected copy of a world model has drifted from its real source. For each drift it names the signal, points to where the real source lives, gives a suggested repair route, and cites the test that would confirm the fix. The result records show the drift is being flagged honestly: the projection never claims to be the source of truth, and the result records deliberately leave out any non-public or secret-backed data.

Scope limit It only validates the declared public, metadata-only drift-result record contract. It supports inspection of recorded drift rows and source-linked refs; live repair, source control, doctrine changes, model-output export, public sharing, and launch are outside the fixture. It does not claim complete drift coverage or live repair control.

Run
microcosm world-model-projection-drift-control-room run-drift-control-bundle --input examples/world_model_projection_drift_control_room/exported_projection_drift_control_bundle --out receipts/runtime_shell/demo_project/organs/world_model_projection_drift_control_room

Paper module World-Model Projection Drift Control Room

Abstract

world_model_projection_drift_control_room is Microcosm's public projection-drift control component. It turns projected world-model rows into an auditable runtime result record: each row must carry a source signal, source ref, target ref, repair route, validation ref, fact-authority mesh, and explicit scope boundary booleans before the projection can pass.

The mechanism is deliberately narrow. It validates that public, metadata-only projection rows remain tied to named source evidence and rejection policy; it does not claim that the projection is source authority, that a live route was repaired, that private runtime state was inspected, or that Microcosm is public sharing-authorized or launch-authorized.

Purpose

This component exists to answer one question: when a public read model says something has drifted, can that claim still be traced back to a real source artifact, or has the read model quietly started to stand in for the source?

The design choice that makes this more than a shape check is that the supplied drift_rows.json is never trusted as input. The validator recomputes the drift rows from the public runtime result record, then treats the supplied file only as an expected snapshot whose role is recorded as expected_snapshot_not_source_authority. If the snapshot disagrees with the recomputed rows, that is flagged as staleness, not accepted as fact. Each recomputed row is then diffed against a real source-state artifact: a row from the extracted-pattern ledger, or a view-quality action-map lens whose own summary is re-derived from its action rows. A row that cannot be re-derived from source, or whose guard reference or derivation path has changed, moves the verdict to blocked.

The same boundary holds in the other direction. A drift row may name a repair route, but the route stays a label rather than an action: the validator rejects any row that authorises live repair, source-file changes, automatic doctrine changes, or launch. A projection here can describe what is wrong and where to go next without ever being allowed to act on it or to speak for the source it describes.

Telos

Projection drift is the failure mode where a useful read model begins to look like truth. A dashboard row, generated structured source record, route card, or public runtime result record can be correct enough for navigation while still being downstream of a source artifact that owns the actual authority.

This component makes that boundary executable. It accepts public drift rows only when they retain:

  • a real source signal and source ref
  • a target ref that names where the projection appears
  • a repair-route label that remains a route, not a live mutation
  • a validation ref that can witness the row
  • a fact-authority record with authority, appearance, derivation, guard, and residual-route fields
  • metadata-only result record policy and an explicit scope limit

Technical Object

The runtime locus is src/microcosm_core/organs/world_model_projection_drift_control_room.py. The exported public example is examples/world_model_projection_drift_control_room/exported_projection_drift_control_bundle. The accepted first-wave fixture is fixtures/first_wave/world_model_projection_drift_control_room/input.

The component exposes two public validation routes:

cd microcosm-substrate
PYTHONPATH=src ../repo-python -m microcosm_core.organs.world_model_projection_drift_control_room \
  run-drift-control-bundle \
  --input examples/world_model_projection_drift_control_room/exported_projection_drift_control_bundle \
  --out /tmp/microcosm_world_model_projection_drift_bundle

Projection-Drift Mechanism

The validator recomputes the public projection rows from runtime result records and source artifacts, then compares them with the supplied fixture snapshot. A row passes only when the recomputed projection, supplied snapshot, source-ref evidence, source-state diff, source-module manifest check, copied-body geometry probe, runtime result record witness, and non-public-state exclusion scan all stay inside the public boundary.

The core result payload records:

  • drift_summary.row_count: 8
  • source_ref_count: 8
  • target_ref_count: 8
  • repair_route_count: 8
  • validation_ref_count: 8
  • fact_authority_row_count: 8
  • guarded_projection_treatment_count: 8
  • unguarded_duplicate_count: 0
  • runtime_receipt_witnessed_row_count: 8
  • source_authority_claim_count: 0
  • live_repair_authorized_count: 0
  • source_mutation_authorized_count: 0
  • automatic_doctrine_promotion_count: 0

The source-state result record evidence is intentionally small and inspectable. The focused test suite expects exactly two source-state evidence classes: extracted_pattern_ledger_row_diff and view_quality_action_map_summary_diff.

Runtime Result record Evidence

The public result record floor is metadata-only. The first-wave result records live at:

  • receipts/first_wave/world_model_projection_drift_control_room/world_model_projection_drift_control_room_result.json
  • receipts/first_wave/world_model_projection_drift_control_room/world_model_projection_drift_control_room_validation_receipt.json
  • result records/sign-off/first_wave/world_model_projection_drift_control_room_fixture_acceptance.json

The exported-bundle result record lives at:

  • receipts/runtime_shell/demo_project/organs/world_model_projection_drift_control_room/exported_projection_drift_control_bundle_validation_result.json

The exported-bundle result record records body_import_status: real_runtime_receipt_landed, body_material_status: copied_non_secret_macro_body_landed, body_copied_material_count: 4, body_in_receipt: false, and release_authorized: false. Its scope limit also sets source_authority_claim, source_mutation_authorized, live_route_repair_authorized, automatic_doctrine_promotion_authorized, provider_payload_exported, publication_authorized, and release_authorized to false.

Source-Available Body Floor

The exported bundle includes copied source bodies so a reader can inspect the implementation class without receiving private runtime state in the result record. The source-module manifest is:

  • examples/world_model_projection_drift_control_room/exported_projection_drift_control_bundle/source_module_manifest.json

It records four copied modules:

  • world_model_drift_aggregate_source_body_import
  • world_model_drift_endpoint_source_body_import
  • view_quality_action_map_source_body_import
  • view_quality_action_map_test_body_import

Every manifest row is body_copied: true, body_in_receipt: false, classification: copied_non_secret_macro_body, and material_class: public_macro_tool_body, with sha256_match: true. The largest bodies are the Station world-model reducer system/server/world_model.py, the /api/drift endpoint in system/server/main.py, the view-quality action-map builder tools/meta/observability/view_quality_census.py, and its focused source regression test system/server/tests/test_view_quality_census.py.

The body floor is therefore source-available by bundle, not by result record. Result records carry paths, hashes, counts, anchor checks, and verdicts; they do not duplicate private bodies, model-output data, browser UI state, account or browser material, source notes, recipient-send state, or account secret-equivalent payloads.

Mutation and Rejection Contract

The validator is not a shape-only check. The focused test suite mutates the public inputs and requires the verdict to move to blocked when authority or freshness is broken:

  • missing source refs produce DRIFT_SOURCE_REF_REQUIRED
  • missing repair or validation refs produce DRIFT_VALIDATION_REF_REQUIRED
  • missing fact-authority mesh produces DRIFT_FACT_AUTHORITY_REQUIRED
  • projection rows claiming source authority produce DRIFT_SOURCE_AUTHORITY_FORBIDDEN
  • live repair authority produces DRIFT_LIVE_REPAIR_FORBIDDEN
  • non-public runtime export produces DRIFT_PRIVATE_RUNTIME_EXPORT_FORBIDDEN
  • model-output data export produces DRIFT_PROVIDER_PAYLOAD_FORBIDDEN
  • automatic doctrine changes produces DRIFT_AUTOMATIC_DOCTRINE_PROMOTION_FORBIDDEN
  • launch-scope decision produces DRIFT_RELEASE_AUTHORITY_FORBIDDEN

Additional source-drift tests cover unwitnessed runtime rows, stale supplied snapshots, mutated runtime result record refs, missing source-ledger rows, source ledger rows without source_refs, view-quality source-file changes, internally consistent fake source refs, and selected-row order drift. These cases matter because a projection can be internally coherent and still lose authority if its source evidence, guard result record, or derivation path changes.

Shape

Public runtime result recordPublic runtime result recordRecompute drift rowsfrom selected_pattern_ids +result record rowsRecompute drift rows from selected_pattern_ids + result record rowsexpected snapshot,source-linked onlyexpected snapshot, source-linked onlySource-state diffextracted-pattern ledger +view-quality action mapSource-state diff extracted-pattern ledger + view-quality action mapView-quality geometry gradevia copiedview_quality_census.pyView-quality geometry grade via copied view_quality_census.pyRuntime result record witnessevery recomputed row appearsin the result recordRuntime result record witness every recomputed row appears in the result recordRejection gatesmissing/fake refs, privateexport, source authority,live repair, source-filechanges, doctrine changes,launchRejection gates missing/fake refs, private export, source authority, live repair, source-file changes, doctrine changes, launchmetadata-only result recordsfirst-wave, sign-off,exported bundlemetadata-only result records first-wave, sign-off, exported bundleScope limitprojection evidence onlyScope limit projection evidence only

Source refs

Public runtime result record
public_projection_drift_control_lens.json
expected snapshot, source-linked only
Supplied drift_rows.json
Diagram source
flowchart TD Result record["Public runtime result record public_projection_drift_control_lens.json"] Recompute["Recompute drift rows from selected_pattern_ids + result record rows"] Snapshot["Supplied drift_rows.json expected snapshot, source-linked only"] SourceDiff["Source-state diff extracted-pattern ledger + view-quality action map"] Geometry["View-quality geometry grade via copied view_quality_census.py"] Witness["Runtime result record witness every recomputed row appears in the result record"] Reject["Rejection gates missing/fake refs, private export, source authority, live repair, source-file changes, doctrine changes, launch"] Result records["metadata-only result records first-wave, sign-off, exported bundle"] Ceiling["Scope limit projection evidence only"] Result record --> Recompute Recompute --> Snapshot Recompute --> SourceDiff Recompute --> Witness Recompute --> Geometry Snapshot --> Reject SourceDiff --> Reject Witness --> Reject Geometry --> Reject Reject --> Result records Result records --> Ceiling

Reader Evidence Routing

Read in this order:

  1. Bundle and generated instance: core/paper_module_capsules.json::paper_modules[27:paper_module.world_model_projection_drift_control_room] and paper_modules/world_model_projection_drift_control_room.json.
  2. Runtime source and focused tests: src/microcosm_core/organs/world_model_projection_drift_control_room.py and tests/test_world_model_projection_drift_control_room.py.
  3. First-wave fixture and result records: fixtures/first_wave/world_model_projection_drift_control_room/input, receipts/first_wave/world_model_projection_drift_control_room/, and result records/sign-off/first_wave/world_model_projection_drift_control_room_fixture_acceptance.json.
  4. Exported-bundle evidence: examples/world_model_projection_drift_control_room/exported_projection_drift_control_bundle/ and receipts/runtime_shell/demo_project/organs/world_model_projection_drift_control_room/exported_projection_drift_control_bundle_validation_result.json.
  5. Generated projection evidence: Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges_after_atlas_binding, and the one selective dependency residual preserved by the generated JSON instance.

Prior Art Grounding

This control room watches a world-model projection for drift between what the model expects and what the runtime reports. It draws on the model-monitoring and concept-drift literature, which treats a growing gap between predicted and observed behaviour as an operational signal. Microcosm borrows the drift-as-signal shape over metadata-only result records; the result is fixture-bound monitoring evidence, source-linked only, private runtime inspection, or whole-system correctness.

Validation Result record Path

Focused runtime validation:

PYTHONPATH=src ./repo-pytest \
  tests/test_world_model_projection_drift_control_room.py -q

Paper-module corpus validation:

cd microcosm-substrate
PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Paper-module index validation from the repo root:

./repo-python tools/meta/factory/build_paper_module_index.py --check

Scope boundary

Limitations

The component validates metadata-only drift result records and public-source refs. It supports inspection of recorded drift rows; live repair, source control, doctrine changes, model-output export, public sharing, and launch are outside the fixture. It also does not claim that every possible world-model drift source is covered. Its claim is narrower: the named public drift rows are guarded by source refs, target refs, validation refs, fact-authority mesh, copied source body evidence, metadata-only result records, and negative-case rejection.

Scope limit

This module may claim fixture-bound evidence that the component ran over public synthetic inputs and produced the result records and projections described above, reproduced by the validation result records named on this page.

It may not claim more than its bundle scope limit allows: Public metadata-only runtime result record and copied source-module evidence only; no private runtime body inspection, source authority, source-file changes, live route repair, automatic doctrine changes, model-output data export, launch-scope decision, publishing-scope decision, or whole-system correctness claim.

Source and projection details
Governing Lattice Bindings
  • source record: core/paper_module_capsules.json::paper_modules[27:paper_module.world_model_projection_drift_control_room]
  • Generated instance: paper_modules/world_model_projection_drift_control_room.json
  • Standard: standards/std_microcosm_world_model_projection_drift_control_room.json
  • Mechanism: mechanism.world_model_projection_drift_control_room.validates_public_projection_drift_control_boundary
  • Concept: concept.import_projection_and_drift_control_bundle
  • Principle refs: P-1, P-2, P-3, P-5, P-6, P-8, P-9, P-12, P-15
  • Axiom refs: AX-1, AX-4, AX-5, AX-7, AX-8, AX-11

The generated JSON instance reports source_authority: json_capsule, 19 resolved relationship edges, Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges_after_atlas_binding, and one honest selective residual for paper_module.depends_on.paper_module because the bundle does not yet name a sibling dependency module.

Unsurfaced Source Primitives BundleExposes eleven real but under-surfaced parts and rejects non-public-state and overclaim cases.5/5

Does This bundle imports the Set-6 source primitives that were real but under-surfaced. It exposes the 11 mechanisms, exact source-module manifest, source execution outcomes, and negative cases without exposing copied body text, raw operator transcripts, prompt-shelf private logs, provider/browser state, live market data, or media assets in result records.

Scope limit It validates only a public source-open bundle and bounded public exercises; it is not raw operator memory, not prompt-shelf capture authority, not live market data, not provider/browser state, not media launch, and not public sharing or launch-scope decision.

Run
microcosm batch6-unsurfaced-primitives-capsule run --input fixtures/first_wave/batch6_unsurfaced_primitives_capsule/input --out receipts/first_wave/batch6_unsurfaced_primitives_capsule --acceptance-out receipts/acceptance/first_wave/batch6_unsurfaced_primitives_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Set 6 Unsurfaced Primitives Bundle

This component imports the Set-6 source primitives that the scout identified as real but under-surfaced. It is a source-open bundle: exact copied source source bodies plus bounded public exercises and stable negative cases.

The bundle covers text structuring, provenance reconciliation, epistemic display guards, governance policy judgment, clone-local concurrency, market-clock scheduling, provider recovery scoping, and demo-take temporal remapping. It does not import raw operator transcripts, prompt-shelf private logs, browser/provider state, live market data, account secrets, audio, video, or public sharing state.

Purpose

A scout found eleven small primitives scattered across the wider system that were real and load-bearing but had never been surfaced as public evidence. They are the sort of utility code that quietly decides whether a larger feature is correct: a finance unit-scale check, a clock that fires market events once per session, a function that subtracts paused time from a recorded video offset. This bundle exists to bring those eleven into the public system without pretending they are anything grander than they are.

The single question it answers is narrow but useful: do the copied bodies still behave as claimed? It is easy to copy a function into a public bundle, check its file hash, and call that proof. That only shows the bytes match. It says nothing about whether the logic is right. This bundle goes one step further. For each primitive it imports the copied body and runs it on a small public synthetic input, then asserts the specific output the real code should produce.

The unusual part is that the eleven primitives are checked by execution, not by description. The Markdown prose and the JSON bundle say what each one is meant to do; the component proves it by calling the real function and comparing the answer. Each primitive also carries a paired negative case, a deliberately malformed input that the code must reject or correct, so the bundle shows both the working path and the guard. No private bodies, transcripts, or live data enter the result records; only refs, digests, anchor names, and the pass or fail of each exercise.

Prior Art Grounding

This bundle borrows from provenance modeling, risk-governance frameworks, policy-engine design, and temporal modeling. Useful anchors include:

  • W3C PROV, for reconciling derived artifacts back to entities, activities, and responsible agents.
  • NIST's AI Risk Management Framework, as a governance vocabulary for mapping, measuring, and managing system risk without turning every guard into a launch claim.
  • Open Policy Agent, which separates policy evaluation from application code through a general-purpose policy engine.
  • Martin Fowler's bitemporal history, as a prior pattern for preserving event time separately from record time.

Microcosm borrows the provenance, governance, policy-evaluation, and temporal separation patterns, but keeps this bundle at source-open public fixtures. It does not expose private operator memory, live market data, provider state, or publishing-scope decision.

Shape

Start from the bundle JSON, not from this prose. The source row core/paper_module_capsules.json::paper_modules[78:paper_module.batch6_unsurfaced_primitives_capsule] is the authority for the component subject, mechanism subject, concept edge, principle and axiom refs, dependency modules, runtime locus, generated projection statuses, and the scope limit. The generated JSON instance is paper_modules/batch6_unsurfaced_primitives_capsule.json; it is the parity projection that carries source_authority: json_capsule, the resolved relationship edges, the generated Mermaid and Atlas statuses, and the explicit scope boundaries.

seedsboundsnames standard contract and ceilingcites resolved code locusruns fixture and bundle validatorspublic inputs and exact copied source bodies26 copied modules; sha256 and anchor checks; body_in_receipt falsederives relationship edgesnavigation projection onlypublic/private and launch boundarypass/fail evidence remains bounded byexcludesmust not outrankJSON bundle source rowJSON bundle source rowGenerated JSON instancesource basis: source recordGenerated JSON instance source basis: source recordMarkdownStandardsstd_microcosm publicMicrocosm boundaryStandards std_microcosm public Microcosm boundaryRuntime/source lociruntime_shell andsource_engines_gallery routesRuntime/source loci runtime_shell and source_engines_gallery routesFixtures, examples, sourcebundleFixtures, examples, source bundleTests and result recordsTests and result recordsGenerated navigationprojectionsGenerated navigation projectionsScope limitfixture-bound publicsource-body importdigest/anchor checks,synthetic exercises, negativecases, metadata-only resultrecords onlyScope limit fixture-bound public source-body import digest/anchor checks, synthetic exercises, negative cases, metadata-only result records onlyNot authorizedlive operator memory, promptcapture authority, livemarket data,provider/browser state, medialaunch, source-file changes,public sharing orlaunch-scope decision,private-system equivalence,whole-system correctnessNot authorized live operator memory, prompt capture authority, live market data, provider/browser state, media launch, source-file changes, public sharing or launch-scope decision, private-system equivalence, whole-system correctness

Source refs

JSON bundle source row
core/paper_module_capsules.jsonpaper_module.batch6_unsurfaced_primitives_capsule
Generated JSON instance source basis: source record
paper_modules/batch6_unsurfaced_primitives_capsule.json
paper_modules/batch6_unsurfaced_primitives_capsule.md
Standards std_microcosm public Microcosm boundary
standards/std_microcosm_batch6_unsurfaced_primitives_capsule.json
Runtime/source loci runtime_shell and source_engines_gallery routes
src/microcosm_core/organs/batch6_unsurfaced_primitives_capsule.py
Fixtures, examples, source bundle
fixtures/first_wave/batch6_unsurfaced_primitives_capsule/inputexamples/.../exported_batch6_unsurfaced_primitives_capsule_bundlesource_module_manifest.json
Tests and result records
tests/test_batch6_unsurfaced_primitives_capsule.pyreceipts/first_wave/... validation/result/boardreceipts/acceptance/... fixture_acceptance.json
Diagram source
flowchart LR Bundle["JSON bundle source row core/paper_module_capsules.json paper_module.batch6_unsurfaced_primitives_capsule"] Instance["Generated JSON instance paper_modules/batch6_unsurfaced_primitives_capsule.json source basis: source record"] Markdown["Markdown reader projection paper_modules/batch6_unsurfaced_primitives_capsule.md"] Standards["Standards standards/std_microcosm_batch6_unsurfaced_primitives_capsule.json std_microcosm public Microcosm boundary"] Runtime["Runtime/source loci src/microcosm_core/components/batch6_unsurfaced_primitives_capsule.py runtime_shell and macro_engines_gallery routes"] Fixtures["Fixtures, examples, source bundle fixtures/first_wave/batch6_unsurfaced_primitives_capsule/input examples/.../exported_batch6_unsurfaced_primitives_capsule_bundle source_module_manifest.json"] Result records["Tests and result records tests/test_batch6_unsurfaced_primitives_capsule.py result records/first_wave/... validation/result/board result records/sign-off/... fixture_acceptance.json"] Projections["Generated navigation projections Mermaid: available_from_capsule_edges Atlas: linked_from_capsule_edges"] Ceiling["Scope limit fixture-bound public source-body import digest/anchor checks, synthetic exercises, negative cases, metadata-only result records only"] Forbidden["Not authorized live operator memory, prompt capture authority, live market data, provider/browser state, media launch, source-file changes, public sharing or launch-scope decision, private-system equivalence, whole-system correctness"] Bundle -->|seeds| Instance Bundle -->|bounds| Markdown Bundle -->|names standard contract and ceiling| Standards Bundle -->|cites resolved code locus| Runtime Runtime -->|runs fixture and bundle validators| Result records Fixtures -->|public inputs and exact copied source bodies| Runtime Fixtures -->|26 copied modules; sha256 and anchor checks; body_in_receipt false| Result records Instance -->|derives relationship edges| Projections Projections -->|navigation projection only| Markdown Standards -->|public/private and launch boundary| Ceiling Result records -->|pass/fail evidence remains bounded by| Ceiling Ceiling -->|excludes| Forbidden Markdown -->|must not outrank| Bundle

The module is "actual" only because the reader can traverse these concrete surfaces:

  • Bundle/source row: paper_module.batch6_unsurfaced_primitives_capsule binds the accepted batch6_unsurfaced_primitives_capsule component, the mechanism.batch6_unsurfaced_primitives_capsule.validates_public_unsurfaced_primitives_capsule mechanism, concept.import_projection_and_drift_control_bundle, principles P-2, P-5, P-9, P-15, axioms AX-4, AX-8, AX-10, AX-11, and the dependency modules named in the structured lattice table below.
  • Generated instance: paper_modules/batch6_unsurfaced_primitives_capsule.json reports active status, public_paper_module_json_seeded_from_capsule_registry_not_legacy_markdown_authority, generated Mermaid available_from_capsule_edges, generated Atlas linked_from_capsule_edges, no unpopulated selective relations, and scope boundaries that the row is not runtime-correctness, launch-readiness, or whole-system authority.
  • Standards: standards/std_microcosm_batch6_unsurfaced_primitives_capsule.json is the specific public bundle standard, backed by std_microcosm for the wider Microcosm entry and public/private boundary. It allows public mechanism ids, source refs, digests, anchors, exact copied source modules, synthetic outcomes, scope limits, and scope boundaries; it forbids account secrets, account or browser state, model-output data bodies, browser UI live-access material, raw operator transcripts, prompt-shelf private logs, live market data responses, media assets, and public sharing operation state.
  • Runtime/source loci: the resolved locus is src/microcosm_core/organs/batch6_unsurfaced_primitives_capsule.py, with the runtime shell bundle-validation route and source-engines gallery route as readers over the same public component. The source bundle manifest records 26 copied source bodies with exact-copy source-to-target relations, SHA-256 matches, required anchors, and body_in_receipt: false.
  • Fixtures/examples/source bundle: fixture inputs live under fixtures/first_wave/batch6_unsurfaced_primitives_capsule/input; the exported bundle lives under examples/batch6_unsurfaced_primitives_capsule/exported_batch6_unsurfaced_primitives_capsule_bundle; source_module_manifest.json is the source-open body-floor manifest for copied modules and metadata-only result record handling.
  • Tests/result records: tests/test_batch6_unsurfaced_primitives_capsule.py covers the runtime component, copied subengine proofs, exact-copy imports, bundle shape, and private body omission. Result record authority is the fixture sign-off row plus receipts/first_wave/batch6_unsurfaced_primitives_capsule/batch6_unsurfaced_primitives_capsule_result.json, batch6_unsurfaced_primitives_capsule_board.json, batch6_unsurfaced_primitives_capsule_validation_receipt.json, and result records/sign-off/first_wave/batch6_unsurfaced_primitives_capsule_fixture_acceptance.json; the validation result record reports pass for source-module manifest status, exercise status, negative-case status, secret exclusion, and result record body scan.
  • Scope limit: this page can claim fixture-bound public source-body import, copied-module digest/anchor evidence, synthetic source-exercise evidence, negative-case coverage, and metadata-only result records only. It cannot claim live operator memory, prompt-shelf capture authority, live market data, provider/browser state, media launch, source-file changes, publishing-scope decision, launch-scope decision, private-system equivalence, or whole-system correctness.

Source Modules

The exported bundle copies the relevant source sources under examples/batch6_unsurfaced_primitives_capsule/exported_batch6_unsurfaced_primitives_capsule_bundle/source_modules/. Result records carry source refs, digests, anchors, counts, and exercise outcomes, not copied body text or private state.

Reader Evidence Routing

Read this module through the fixture command, exported-bundle validation, focused pytest, structured source record, and result record paths. The fixture proves a public source-open Set-6 exercise, while the bundle proves copied source digests, anchors, synthetic source exercises, negative cases, copied-subengine proofs, and metadata-only cards. The generated structured source record proves that Mermaid and Atlas availability come from bundle edges.

The validator's mechanism set remains evidence for the accepted Set-6 component result record. It does not turn this page into live operator memory, prompt-shelf capture authority, trading decisions, live provider recovery, browser state, demo media launch, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness.

Mechanism Set

The validator requires exactly these 11 mechanism rows: source note keyphrase engine, schema-loose distillation index, operator handoff linkage, observed-turn window merge, market situation graph, finance numeric assurance, fail-closed status judge, idea-microcosm concurrency guard, metabolism market clock, population-lane provider recovery, and demo-take temporal join.

The source module manifest requires 14 exact copied source source/support modules. The fixture requires 11 stable negative cases, one per mechanism row. The command card is the intended cold-reader first surface; the full result record is the drilldown.

How it works

For each mechanism the component loads the copied source body, runs it on a fixed public synthetic input, and checks the exact result. A few of the exercises make the idea concrete.

  • Demo-take temporal join. video_t_seconds converts a wall-clock offset into a position in a recorded video by subtracting elapsed paused time. The exercise feeds it a 120-second wall offset with one pause and resume fifteen seconds apart, and asserts the result is exactly 105.0. A second call with a pause that has not yet resumed checks the open-pause branch returns 15.0. The negative case confirms a still-open pause is handled rather than ignored.
  • Finance numeric assurance. build_finance_numeric_assurance recomputes declared numbers instead of trusting them. The exercise hands it a flow row tagged usd_millions whose flow and flow_usd fields disagree by orders of magnitude, plus a probability declared as 70.2. The check raises stockgrid_flow_unit_scale_mismatch and probability_bounds, and the result record's display_state becomes blocked rather than trusted. The point is that a mislabelled unit or an out-of-range probability fails closed.
  • Operator handoff linkage. score_pair compares an agent's suggestion (a Type B capture) against what the operator later typed (a Type A input) using containment, token overlap, and anchor matching. The exercise scores a related pair above the 0.8 floor with containment true, then scores an unrelated "summarize the weather" input and asserts it falls below 0.3. This is how the primitive tells a real handoff from a coincidence.
  • Market-clock scheduling. due_fire_points decides which scheduled market events are due at a given moment. The exercise sets the clock to 15:31 UTC with the open event already fired earlier that day, then asserts the hourly points fire while the already-fired open event is suppressed. The guard is idempotence: an event that already fired must not fire again in the same session.

The other mechanisms follow the same shape: keyphrase ranking returns ranked phrases for real text but an empty list for stopword-only input; the schema-loose distiller keeps assistant text and operator tail as separate roles without persisting either body; the fail-closed status judge blocks a transition when its policy is malformed; the concurrency guard reports that a parent directory and a child path overlap. Every exercise records only its pass or fail and a few summary numbers, never the copied body it ran against.

Copied-Subengine Proofs

The post-Set-12 proof surface exercises two copied dormant subengines directly from the exported source bundle:

  • operator_thread_memory is loaded from the copied manifest and checked with synthetic observed-window cases for observed_window_within_memory and preserved_existing_no_overlap.
  • market_situation_graph is loaded from the same copied bundle and checked with a public synthetic mart that covers fixture scoring, counterevidence, context rows, and source refs.

These are public test-level proofs in tests/test_batch6_unsurfaced_primitives_capsule.py. They do not add an accepted component, do not widen the fixture scope limit, and do not export private thread memory or live market data.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

The fixture command writes the Set-6 public primitive-import result record and sign-off JSON. The bundle command validates copied source digests, anchor evidence, synthetic source exercises, negative cases, and metadata-only cards. The focused test covers the runtime component, copied subengine proofs, exported bundle shape, exact-copy imports, and private body omission. The corpus and projection checks prove only that the generated paper-module instance remains fresh for this bundle-backed Markdown state.

This result record path is public fixture evidence only. It does not establish live operator memory, capture authority, live market data, provider/browser state, media launch, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness.

Scope boundary

Scope limit

This is not live operator memory, not capture authority, not trading decisions, not live provider recovery, not demo media launch, not publishing-scope decision, and not launch-scope decision. It is an exact-source public bundle with digest checks, source exercises, and negative-case coverage.

Authority Systems Source BundleReplays eight authority and systems checks, rejecting provider, proof, and launch overclaims.5/5

Does This bundle imports Set 5 public authority and systems source bodies as a bounded source-open replay. It checks post-execution result record validation, reasoning replay scope and lineage, proof-contract gating, process orphan classification, generated-state fixpoint settlement, trace-tape compaction, code blast radius, and doctrine graph compilation, with negative cases that prevent live provider, proof-success, process-signal, generated-state-mutation, source-file changes, public sharing, or launch overclaims.

Scope limit It validates only copied Set 5 authority-system source bodies and bounded deterministic exercises; it does not dispatch providers, prove Lean success, send live process signals, mutate generated state, change source files, authorize public sharing, include launch operations, or claim private-system equivalence.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.batch5_authority_systems_capsule run --input fixtures/first_wave/batch5_authority_systems_capsule/input --out receipts/first_wave/batch5_authority_systems_capsule --acceptance-out receipts/acceptance/first_wave/batch5_authority_systems_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

source intakeprovenanceauthority-boundary

Source Design note · Source atlas

Paper module Set 5 Authority and Systems Bundle

Set 5 imports the next authority/systems contour as a bundle: post-execution result record validation, reasoning replay scope and lineage, verifier-gated Lean repair harnessing, process orphan classification, generated state fixpoint settlement, trace-tape compaction, code blast radius, and doctrine graph compilation.

The bundle carries exact copied source bodies in examples/batch5_authority_systems_capsule/exported_batch5_authority_systems_capsule_bundle/source_modules/ and tests those copies against source-root digests and anchors. The runnable Microcosm exercise is deliberately bounded: it uses synthetic public inputs to prove the negative claim fences while preserving the real source source as the source-open system.

Purpose

This page answers one question: can a cold reader inspect eight separate authority and systems mechanisms, and confirm each one refuses the wrong thing, without the reader having to run any of the real machinery?

The eight mechanisms are unrelated in subject. One validates post-execution result records; another decides when a reasoning step needs re-running; another gates a Lean proof attempt; another classifies a stray process; another settles generated-state residuals; another compacts a trace tape; another computes a code blast radius; another compiles a doctrine graph. What they share is a single discipline: each must decline to claim more than it has earned. The result record validator must not accept a drifted result record; the proof gate must not hand a placeholder proof to Lean; the orphan reaper must not signal a live process; the blast-radius pass must not invent coverage for a leaf with no dependents.

The unusual choice is that the bundle does not replay the real tools. It carries an exact copy of each source source body, checks those copies against the source-root digests and required anchors, and then runs a small synthetic re-derivation for each mechanism. Each re-derivation recomputes its own verdict from the fixture input rather than echoing a stored answer, so a negative case passes only when the exercise itself reaches the refusal, not when a fixture asserts it. The page is therefore a way to read eight refusal behaviours at once, with the genuine source bodies kept verifiable alongside.

Shape

Copied source bundle +exercise manifestdigests and required anchorschecked firstCopied source bundle + exercise manifest digests and required anchors checked firstRuntime componentRuntime componentResult record validatorflagprovider/context/artifactdriftResult record validator flag provider/context/artifact driftReplay scopeno_replay when changedcontext is disjointReplay scope no_replay when changed context is disjointProof gatereject sorry/plan-only beforeLeanProof gate reject sorry/plan-only before LeanOrphan reaperlive descendant ->requires_owner_checkOrphan reaper live descendant -> requires_owner_checkFixpoint drainerresidual source moved ->non-convergingFixpoint drainer residual source moved -> non-convergingTrace tapeover-budget -> pointer +omission result recordTrace tape over-budget -> pointer + omission result recordBlast radiusreverse closure; empty leafstays emptyBlast radius reverse closure; empty leaf stays emptyDoctrine graphreport deleted paths andtombstonesDoctrine graph report deleted paths and tombstonesShared refusal checkeach exercise recomputes itsown verdictShared refusal check each exercise recomputes its own verdictmetadata-only result recordsmetadata-only result recordsScope limit:no external model access,mutation,proof success, launch, orprivate equivalenceScope limit: no external model access, mutation, proof success, launch, or private equivalence

Source refs

Runtime component
batch5_authority_systems_capsule.py
metadata-only result records
receipts/first_wave/batch5_authority_systems_capsule
Diagram source
flowchart TD Manifest["Copied source bundle + exercise manifest digests and required anchors checked first"] --> Component["Runtime component batch5_authority_systems_capsule.py"] Component --> E1["Result record validator flag provider/context/artifact drift"] Component --> E2["Replay scope no_replay when changed context is disjoint"] Component --> E3["Proof gate reject sorry/plan-only before Lean"] Component --> E4["Orphan reaper live descendant -> requires_owner_check"] Component --> E5["Fixpoint drainer residual source moved -> non-converging"] Component --> E6["Trace tape over-budget -> pointer + omission result record"] Component --> E7["Blast radius reverse closure; empty leaf stays empty"] Component --> E8["Doctrine graph report deleted paths and tombstones"] E1 --> Refusal["Shared refusal check each exercise recomputes its own verdict"] E2 --> Refusal E3 --> Refusal E4 --> Refusal E5 --> Refusal E6 --> Refusal E7 --> Refusal E8 --> Refusal Refusal --> Result records["metadata-only result records result records/first_wave/batch5_authority_systems_capsule"] Result records --> Ceiling["Scope limit: no external model access, mutation, proof success, launch, or private equivalence"]

The diagram starts where the runtime starts: the copied source bundle and the exercise manifest, checked against source-root digests and anchors. The component then fans out to the eight mechanism exercises, each recomputing its own pass or refusal verdict, and folds the results into metadata-only result records under a single scope limit. Generated-state mutation, external model access, proof-success claims, and launch-scope decision all stay outside that ceiling.

What the eight exercises check

Each exercise reads a small synthetic block from the fixture manifest and recomputes a verdict. None of them call a provider, run Lean, signal a process, or mutate generated state. What follows is the specific question each one answers.

  • Result record validator. Given a runtime grant and two post-execution result records, it recomputes the drift codes for the second result record: a substituted provider, a context class outside the grant's allowed set, an output artifact hash that diverges from the grant, or runtime_execution claimed when no runtime grant was issued. The valid result record must pass and the drifted one must be flagged; the exercise will not call drift "absent".
  • Replay scope. It compares the context classes a step consumed against the classes that changed. When the two sets are disjoint, the classification is no_replay. In the fixture, a step consumed a task spec and a public fixture while only ambient browser state changed, so re-running the step is not demanded.
  • Proof gate. It scans a candidate proof string before any Lean call. A sorry token, a plan-only phrasing such as "plan:" or "I will", or a proof that merely restates the declared theorem without an exact are each treated as failure classes, and the gate verdict becomes rejected_before_lean. The exercise records 0/8 historical banked attempts; no proof-success claim.
  • Orphan reaper. A process marked as a live-session descendant is classified requires_owner_check, not safe_close_candidate, and no signal is sent even when the fixture requests SIGKILL. The refusal is the point: a stray-looking process that belongs to a live session must not be killed on inventory alone.
  • Fixpoint drainer. It walks residual signatures. If the same residual id reappears under a moved source signature, the settlement is classified settlement_residual_source_moved, which marks a non-converging residual rather than a settled one. No generated-state mutation is authorised either way.
  • Trace tape. When the joined trace text exceeds the byte budget, the exercise truncates to a head budget and appends a pointer row plus an omission result record that records the omitted byte count. A budget breach with no omission result record is treated as a failure, so compaction can never silently drop trace bytes.
  • Blast radius. It builds the reverse-dependency graph and takes the transitive closure of dependents for a target. A target with real dependents reports them; a leaf with no dependents reports an honestly empty bucket rather than inventing coverage.
  • Doctrine graph. It scans doctrine nodes for two conditions: a node whose code path no longer exists, reported as an authority gap, and a node marked tombstone, reported with its replacement id. The exercise passes only when both a drift finding and a tombstone candidate are present, so a deleted code path behind a doctrine claim cannot pass unnoticed.

Reader Evidence Routing

  • A source-authenticity reader starts with the exported bundle source_module_manifest.json, then checks the copied files under examples/batch5_authority_systems_capsule/exported_batch5_authority_systems_capsule_bundle/source_modules/ against the source source refs and anchor rows. The useful question is whether the public bundle is source-faithful, not whether it grants live generated-state authority.
  • A runtime reader runs the fixture command and the run-batch5-bundle command in the Validation Result record Path. The useful question is whether the synthetic exercise and exported bundle return bounded pass evidence while keeping body material out of result records.
  • A launch-boundary reader opens tests/test_batch5_authority_systems_capsule.py and the Scope limit before trusting any card copy. The useful question is whether negative fences block external model access, generated-state mutation, Lean proof-success claims, and launch-scope decision.

If any digest or exact-copy test is red, treat that as source-body import drift for the body-import owner. It does not make this Markdown a bundle source row, and it must not be patched here by hand.

Prior Art Grounding

This bundle borrows from provenance interchange, trace instrumentation, and software supply-chain attestation practice. Useful anchors include:

  • W3C PROV, which models the entities, activities, and agents involved in producing data so readers can assess reliability and trustworthiness.
  • OpenTelemetry, as a vendor-neutral pattern for traces, metrics, and logs across composed systems.
  • SLSA provenance, which treats artifact origin, builder identity, and build parameters as explicit attestable metadata.

Microcosm borrows the lineage, trace, and attestation shape, but keeps the exercise bounded to copied public source bodies, synthetic inputs, and negative claim fences. It excludes generated-state mutation, external model access, proof success, or launch.

First Command

PYTHONPATH=src python3 -m microcosm_core.organs.batch5_authority_systems_capsule run --input fixtures/first_wave/batch5_authority_systems_capsule/input --out /tmp/batch5_authority_systems_capsule --card

Source Bodies

The bundle imports these source bodies as exact public snapshots:

  • tools/meta/factory/validate_reasoning_execution_receipt.py
  • tools/meta/factory/build_reasoning_execution_replay_scope.py
  • tools/meta/factory/build_reasoning_execution_lineage.py
  • tools/meta/factory/build_reasoning_execution_schedule_preflight.py
  • tools/meta/factory/run_verisoftbench_micro10_c_arm_provider_repair.py
  • tools/meta/control/orphan_reaper.py
  • system/lib/generated_state_drainer.py
  • system/lib/agent_execution_trace.py
  • system/lib/code_architecture_projection.py
  • system/lib/doctrine_graph.py

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

PYTHONPATH=src ../repo-python -m microcosm_core.organs.batch5_authority_systems_capsule run \
  --input fixtures/first_wave/batch5_authority_systems_capsule/input \
  --out /tmp/microcosm-batch5-authority-systems-vrp \
  --card
PYTHONPATH=src ../repo-python -m microcosm_core.organs.batch5_authority_systems_capsule run-batch5-bundle \
  --input examples/batch5_authority_systems_capsule/exported_batch5_authority_systems_capsule_bundle \
  --out /tmp/microcosm-batch5-authority-systems-bundle-vrp \
  --card
PYTHONPATH=src ../repo-pytest tests/test_batch5_authority_systems_capsule.py -q --basetemp /tmp/microcosm-batch5-authority-systems-tests

The fixture command writes the bounded synthetic exercise result record. The exported-bundle command validates the copied authority-system source modules, manifest digests, anchor rows, and secret-exclusion posture while keeping source bodies out of the result record. The focused test file checks the runtime exercise, exported bundle, omission result records, body-scan boundary, and negative claim fences.

This result record path is reader-verifiable evidence only. It does not flip Mermaid/Atlas status, create bundle authority, authorize generated-state mutation, dispatch providers, certify Lean proof success, claim launch-scope decision, or aggregate doctrine-lattice coverage.

Scope boundary

Scope limit
  • No live model/external model access.
  • No Lean proof-success or benchmark claim.
  • No process signals are sent.
  • No generated-state mutation is authorized.
  • No private-system equivalence, public sharing, or launch-scope decision.
Scope limit

Legacy Markdown path inventory only; no JSON bundle authority, typed subject coverage, runtime correctness, or launch proof.

This ceiling is deliberately lower than the runnable component evidence. The code and tests can show that the Batch5 exercise is inspectable and that its negative claim fences hold, but this page cannot promote itself into bundle authority, typed doctrine coverage, generated-state mutation permission, Lean proof success, provider correctness, publishing-scope decision, or aggregate doctrine-lattice health.

Trace, Code-Map & Scheduling Engines BundleRuns fifteen trace, code-map, and scheduling engines on test data, blocking truth overclaims.5/5

Does This bundle imports the Set-7 source engines as exact copied source bodies plus deterministic public exercises. It exposes fifteen JS, TS, and Python engine bodies for trace parsing, code-map layout, DAG scheduling, source indexing, patch validation, hermetic clean-clone execution, robust numeric scoring, personalized PageRank routing, and regression-test selection, with negative cases that prevent launch, private-system, semantic-truth, or test-sufficiency overclaims.

Scope limit It validates only a public source-open bundle and bounded exercises; it is not live source authority, private-system equivalence, semantic truth, investment-related actions, complete sandbox proof, selected-test sufficiency proof, public sharing, or launch-scope decision.

Run
microcosm batch7-macro-engines-capsule run --input fixtures/first_wave/batch7_macro_engines_capsule/input --out receipts/first_wave/batch7_macro_engines_capsule --acceptance-out receipts/acceptance/first_wave/batch7_macro_engines_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Set 7 Source Engines Bundle

TLDR

batch7_macro_engines_capsule imports the Set-7 source engines as source bodies and runs focused exercises around them. It is a real-system bundle: source copies, original JS/TS witnesses, deterministic Python exercises, negative cases, digest checks, and fenced claims.

What It Makes Visible

  • tools/agent_trace_structurer/parser.mjs as a trace-IR/edit-claim witness with node --test parser.test.mjs.
  • system/server/ui/src/lib/codemap/ as a code-map layout witness with Vitest.
  • DAG wave scheduling, source indexing, patch context validation, network blocking, robust numeric center/scale, PageRank mass preservation, and never-empty regression-test selection.

What Each Exercise Proves

Each engine has a single deterministic check with a known answer, plus a paired negative case that must keep failing. The exercises are concrete:

  • Trace IR parser (agent_trace_ir_compiler). Runs node --test parser.test.mjs against the copied parser. The paired negative case is a commit claim with no diff evidence, which the parser's own test rejects, so a pass means the edit-claim gate is intact rather than merely that the file copied.
  • Code-map layout (codemap_orbit_layout). Runs the Vitest suite for the layout module and, in process, places five nodes on an orbit and measures every pair distance. The pass condition requires zero circle overlaps, so the layout proves geometric non-collision, not route meaning.
  • DAG scheduler (constitutional_dag_kernel). Calls compute_waves on a six-node graph and checks the schedule is exactly [["a","f"], ["b","c"], ["d"], ["e"]]. A two-node cycle must raise, and an impure config path must be flagged, so the kernel proves wave ordering and cycle rejection together.
  • launch-root index (release_root_compiler). Parses the copied module's AST and confirms the expected report-building functions exist and that a missing-reference count is reported. This is source indexing, not launch-scope decision.
  • Source surgeon (source_surgeon_patch). Applies a one-line unified diff and checks the result is exactly a = 'B'. A diff whose context does not match must raise, and broken Python must fail to parse, so the engine proves patch-context and syntax validation, not semantic correctness.
  • Clean clone (hermetic_clean_clone). Temporarily replaces the socket factory and confirms an outbound connection raises a network-disabled error. It proves a hermetic baseline, not complete sandboxing.
  • Robust calculator (calculator_standard_actor). Feeds [1, 2, 3, 4, 5, 100] to the robust centre/scale routine. The robust centre stays at 3.5 while the naive mean is dragged above 19, so the outlier is resisted. It is a numeric primitive, not market data or investment-related actions.
  • PageRank ranker (personalized_pagerank_ranker). Ranks a four-node graph and checks the score mass sums to 1.0; an unknown source node must return an empty map. It proves the rank invariant and missing-source refusal, not semantic understanding.
  • Regression selection (regression_test_selection). Confirms the impacted- test selector never returns an empty set: an empty selection must fall back to a non-empty bundle. It proves the never-empty contract, not that the selected tests are sufficient.

When the input is the exported source-open bundle rather than the live fixture, the same nine engine rows are gated on the copied source manifest instead: every expected digest must match and every required anchor must be present before any row passes. The exercises stay metadata-only throughout; result records carry status, counts, digests, and refs, never the copied source or command output.

Prior Art Grounding

The component is grounded in trace instrumentation, graph analysis, and regression selection practice: parse execution traces into structured spans, project code or route graphs into navigable layouts, preserve graph-rank invariants, and choose focused tests without claiming sufficiency. Relevant anchors include:

  • OpenTelemetry, especially traces/spans as a vendor-neutral model for representing units of work and their relationships.
  • D3 force layouts, a common graph layout pattern for visualizing networks and hierarchies.
  • NetworkX PageRank, which documents the PageRank family for graph-link analysis.

Microcosm borrows the structured-trace, graph-layout, and invariant-checking shape across its mixed Set-7 engines. The bundle remains a bundle of focused source witnesses and deterministic exercises; it is not a complete sandbox, semantic truth engine, or proof that selected tests are sufficient.

Source Body Imports

The source-module manifest at examples/batch7_macro_engines_capsule/exported_batch7_macro_engines_capsule_bundle/source_module_manifest.json lists the exact copied source bodies and required anchors. Result records store digests and counts, not source bodies.

Purpose

This module is the reader-facing instrument for the accepted batch7_macro_engines_capsule component. Its source authority is the JSON source record in core/paper_module_capsules.json; this Markdown explains the proof boundary for a cold reader and points back to the runtime component, copied source manifest, and focused tests.

The component answers one narrow question: do nine unrelated source engines, copied out of the larger system as source, still behave the way their own tests and invariants say they should? Rather than describe them in prose, the bundle runs each one. A trace-IR parser is checked by its own Node test runner; a code-map layout is checked by its Vitest suite; a dependency-graph scheduler, a robust numeric scorer, a PageRank ranker, a patch applier, a network-isolation guard, an AST source index, and a regression-test selector are each driven through a small deterministic exercise with a known correct answer.

What is worth noting is the mix. Most validators in this set check one shape of evidence. This one deliberately binds several kinds under a single fixture and a single scope limit: an external JavaScript test process, an external TypeScript test process, in-process Python function calls, and static AST reads. The point is not that any one engine is impressive in isolation. It is that nine engines with quite different runtimes can be exercised together, each with a concrete pass condition, while every exercise stays below launch, semantic-truth, and source-file changes.

The failure mode this guards against is the comfortable assumption that copied code still works. A source body can be copied faithfully, pass a digest check, and still be broken or subtly different from the original. The bundle refuses to treat a digest match as behaviour: each engine has to produce the expected output, and each negative case has to keep failing, before the row is allowed to pass.

Shape

livelivelivelivebundleInput dirInput dirLive fixtureor exported bundle?Live fixture or exported bundle?Trace IR parsernode --testTrace IR parser node --testCode-map layoutVitest + orbit non-overlapCode-map layout Vitest + orbit non-overlapDAG schedulerwaves + cycle rejectDAG scheduler waves + cycle rejectlaunch index, source surgeon,clean clone, calculator,PageRank, regressionselectionlaunch index, source surgeon, clean clone, calculator, PageRank, regression selectionSource manifest:digests match + anchorspresentSource manifest: digests match + anchors presentNine engine rowsNine engine rowsNegative casesmust keep failingNegative cases must keep failingmetadata-only resultstatus, counts, digestsmetadata-only result status, counts, digestsscope limitno launch, no semantic truth,no source-file changesscope limit no launch, no semantic truth, no source-file changes

Source refs

Nine engine rows
source_open_manifest_verified
Diagram source
flowchart LR input["Input dir"] mode{"Live fixture or exported bundle?"} subgraph Live["Live fixture: run each engine"] trace["Trace IR parser node --test"] codemap["Code-map layout Vitest + orbit non-overlap"] dag["DAG scheduler waves + cycle reject"] rest["launch index, source surgeon, clean clone, calculator, PageRank, regression selection"] end subgraph Bundle["Exported bundle: gate on manifest"] manifest["Source manifest: digests match + anchors present"] rows["Nine engine rows source_open_manifest_verified"] end neg["Negative cases must keep failing"] result["metadata-only result status, counts, digests"] ceiling["scope limit no launch, no semantic truth, no source-file changes"] input --> mode mode -->|live| trace mode -->|live| codemap mode -->|live| dag mode -->|live| rest mode -->|bundle| manifest manifest --> rows trace --> neg codemap --> neg dag --> neg rest --> neg rows --> result neg --> result result --> ceiling

Reader Evidence Routing

Start from the component source when checking behavior:

  • EXPECTED_NEGATIVE_CASES names the rejected cases.
  • AUTHORITY_CEILING names the forbidden claims.
  • _source_open_bundle_exercises and _evaluate assemble the accepted public witness set.
  • run_batch7_bundle and result_card expose the reproducible command and metadata-only summary.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

The fixture command writes the Set-7 source-engine result record and sign-off JSON. The exported-bundle command validates copied trace, codemap, DAG, source-rank, and regression-selection witnesses without emitting private bodies. The focused test covers the runtime component, exported bundle shape, exact-copy source imports, negative cases, card body omission, and numeric dependencies. The corpus and projection checks prove only that the generated paper-module instance remains fresh for this bundle-backed Markdown state.

This result record path is public fixture evidence only. It does not establish semantic truth, selected-test sufficiency, sandbox completeness, private-system equivalence, launch-scope decision, external model access, source-file changes, or whole-system correctness.

Scope boundary

Scope limit

This bundle is not launch-scope decision, hosted-public authority, semantic truth, investment-related actions, a complete sandbox, or proof that selected tests are sufficient. It excludes raw operator transcripts, provider/browser state, wallet/account state, account secrets, and live market fetches.

Scope limit

The module can support only fixture-bound public source-body import evidence and deterministic exercise result records. It cannot authorize external model access, source-file changes, launch, public sharing, investment-related actions, private-system equivalence, or whole-system correctness.

Oracle Sibling Source BundleReplays subject-index and truth-diff logic on copied code, rejecting reasoning overclaims.5/5

Does This bundle imports the Set 7 public Oracle sibling source bodies as a bounded, source-open replay. It checks subject-index grounding, subject-snapshot hydration, source truth-diff deltas, quartet repair alias planning, and original pytest witness evidence, with negative cases that prevent Oracle reasoning, external model access, source-file changes, semantic-truth, coverage, public sharing, or launch overclaims.

Scope limit It validates only Oracle sibling copied source bodies and bounded deterministic exercises; it does not run Oracle reasoning, dispatch providers or bridges, invoke private orchestration engine, change source files, prove semantic truth, prove all Oracle paths are covered, authorize public sharing, or include launch operations.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.batch7_oracle_sibling_capsule run --input fixtures/first_wave/batch7_oracle_sibling_capsule/input --out receipts/first_wave/batch7_oracle_sibling_capsule --acceptance-out receipts/acceptance/first_wave/batch7_oracle_sibling_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

source intakeprovenancedrift-control

Source Design note · Source atlas

Demo Take Console Source BundleReplays the recording console's Swift logic without launching the app or capturing audio.5/5

Does This bundle imports the Set 7 Demo Take Console public Swift source bodies as a bounded source-open replay. It checks SwiftPM build-witness posture, recording-state control, capture-helper bridge contracts, recorder-store capture FSM boundaries, hotkey/audio-meter behavior, and transcribe-payload construction, with negative cases that prevent app-launch, capture, model-dispatch, source-file changes, UI-coverage, public sharing, or launch overclaims.

Scope limit It validates only Demo Take Console copied Swift source bodies and bounded deterministic exercises; it does not launch the app, authorize screen or microphone capture, export recording sessions, execute FFmpeg, dispatch WhisperKit or other models, change source files, prove complete UI coverage, authorize public sharing, or include launch operations.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.batch7_demo_take_console_capsule run --input fixtures/first_wave/batch7_demo_take_console_capsule/input --out receipts/first_wave/batch7_demo_take_console_capsule --acceptance-out receipts/acceptance/first_wave/batch7_demo_take_console_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

source intakeprovenancedrift-control

Source Design note · Source atlas

Tools-Tail Primitives BundleExercises four copied helper tools over fixed inputs without touching live systems or data.5/5

Does This bundle imports four Set-8 tools-tail primitives as copied public source bodies with deterministic fixture exercises. It exposes observer set diffs, JSON patch interpretation, stable ledger-id hashing, and shadow envelope parsing without invoking live oracles, repository mutation, external model access, public sharing, or launch-scope decision.

Scope limit It validates only the imported source body. It does not claim source authority, private-system equivalence, launch, or public sharing.

Run
microcosm batch8-tools-tail-primitives-capsule run --input fixtures/first_wave/batch8_tools_tail_primitives_capsule/input --out receipts/first_wave/batch8_tools_tail_primitives_capsule --acceptance-out receipts/acceptance/first_wave/batch8_tools_tail_primitives_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Set 8 Tools-Tail Primitives Bundle

This component imports four Set-8 tools-tail primitives as exact copied source source bodies plus bounded public exercises: observer set diffing, JSON patch interpretation, ledger identity hashing, and shadow envelope parse coverage.

The bundle is intentionally source-open and bounded. It exercises pure mechanics over synthetic public fixtures. It does not run GodMode, use external model services, execute live bridge work, mutate repositories, export private lab artifacts, claim oracle truth, authorize public sharing, or approve launch.

Purpose

When a piece of tooling is copied from the private system into the public system, the obvious question is whether the copy still behaves the way the original did, or whether it has quietly drifted into a stub that only looks right. This bundle answers that one question for four small "tools-tail" primitives: does the copied source body, when run on a fixed public input, still produce the exact output the original would?

The unusual choice here is that the bundle does not re-describe the primitives or re-implement them. It loads the copied module straight from the exported bundle and runs the real functions, then checks the results against hard-coded expected values. If the copy were a hollow shell, the assertion would fail rather than pass with a green tick. The evidence is therefore behavioural, not merely a digest match: the code is executed, not just hashed.

What it deliberately does not do is treat any of that execution as truth about the world. Diffing two sets of observer rows is set arithmetic, not a claim that either set is correct. Applying a JSON patch is interpreting an edit script, not a claim that the edit is the right one. The bundle keeps the gap between "the mechanism runs as copied" and "the answer is correct" explicit, which is why the scope limit refuses oracle truth, prediction correctness, and semantic edit correctness even though real code ran.

Shape

The shape is a tools-tail primitive evidence map.

JSON source recordsource basis: source recordJSON source record source basis: source recordGenerated JSON instance20 edges; 0 unresolvedselective relationsGenerated JSON instance 20 edges; 0 unresolved selective relationsMarkdownLocal standardLocal standardRuntime/source locusloads copied modules, runsfour exercises, checks exactoutputRuntime/source locus loads copied modules, runs four exercises, checks exact outputFour primitive exercisesobserver set diff |JSON-patch VMledger-id hash | shadowenvelope parseeach: accept path + negativecaseFour primitive exercises observer set diff | JSON-patch VM ledger-id hash | shadow envelope parse each: accept path + negative casePublic fixture inputfour primitives + negativecasesPublic fixture input four primitives + negative casesCopied source bundleCopied source bundleTests and result recordsresult records/first_wave +sign-off + bundle validationTests and result records result records/first_wave + sign-off + bundle validationGenerated navigationGenerated navigationScope limitdeterministic publicprimitive exercises andmetadata-only source refsonlyno oracle truth, semanticedit correctness, livebridge/Lab execution,external model access, repomutation, public sharing,launch, or whole-system proofScope limit deterministic public primitive exercises and metadata-only source refs only no oracle truth, semantic edit correctness, live bridge/Lab execution, external model access, repo mutation, public sharing, launch, or whole-system proof

Source refs

JSON source record source basis: source record
core/paper_module_capsules.json[64]
Generated JSON instance 20 edges; 0 unresolved selective relations
paper_modules/batch8_tools_tail_primitives_capsule.json
paper_modules/batch8_tools_tail_primitives_capsule.md
Local standard
standards/std_microcosm_batch8_tools_tail_primitives_capsule.json
Runtime/source locus loads copied modules, runs four exercises, checks exact output
src/microcosm_core/organs/batch8_tools_tail_primitives_capsule.py
Public fixture input four primitives + negative cases
fixtures/first_wave/batch8_tools_tail_primitives_capsule/input
Copied source bundle
examples/batch8_tools_tail_primitives_capsule/exported_batch8_tools_tail_primitives_capsule_bundlesource_module_manifest.json
Tests and result records result records/first_wave + sign-off + bundle validation
tests/test_batch8_tools_tail_primitives_capsule.py
Diagram source
flowchart TD bundle["JSON source record core/paper_module_capsules.json[64] source basis: source record"] instance["Generated JSON instance paper_modules/batch8_tools_tail_primitives_capsule.json 20 edges; 0 unresolved selective relations"] markdown["Reader projection paper_modules/batch8_tools_tail_primitives_capsule.md"] standard["Local standard standards/std_microcosm_batch8_tools_tail_primitives_capsule.json"] runtime["Runtime/source locus src/microcosm_core/components/batch8_tools_tail_primitives_capsule.py loads copied modules, runs four exercises, checks exact output"] exercises["Four primitive exercises observer set diff | JSON-patch VM ledger-id hash | shadow envelope parse each: accept path + negative case"] fixture["Public fixture input fixtures/first_wave/batch8_tools_tail_primitives_capsule/input four primitives + negative cases"] bundle["Copied source bundle examples/batch8_tools_tail_primitives_capsule/exported_batch8_tools_tail_primitives_capsule_bundle source_module_manifest.json"] tests["Tests and result records tests/test_batch8_tools_tail_primitives_capsule.py result records/first_wave + sign-off + bundle validation"] projections["Generated navigation Mermaid: available_from_capsule_edges Atlas: linked_from_capsule_edges"] ceiling["Scope limit deterministic public primitive exercises and metadata-only source refs only no oracle truth, semantic edit correctness, live bridge/Lab execution, external model access, repo mutation, public sharing, launch, or whole-system proof"] bundle --> instance bundle --> runtime instance --> projections instance --> markdown standard --> runtime runtime --> bundle bundle --> runtime fixture --> runtime runtime --> exercises exercises --> tests fixture --> tests bundle --> tests tests --> ceiling projections --> ceiling markdown --> ceiling

The bundle explains the batch8_tools_tail_primitives_capsule component and the public tools-tail mechanism, binds the import/projection drift concept plus the principle and axiom edges, and resolves the runtime locus to src/microcosm_core/organs/batch8_tools_tail_primitives_capsule.py. The local standard keeps the evidence to four primitive mechanics: observer set diffs, JSON-patch interpretation, ledger identity hashing, and shadow-envelope parse coverage. Public evidence may include primitive ids, source refs, digests, anchors, counts, stable negative cases, metadata-only result record posture, and scope limits; it must not include private lab artifacts, model-output data, bridge payloads, account or browser state, or account secret-equivalent material.

The fixture path fixtures/first_wave/batch8_tools_tail_primitives_capsule/input and exported bundle examples/batch8_tools_tail_primitives_capsule/exported_batch8_tools_tail_primitives_capsule_bundle hold the public inputs and exact copied source modules. The focused test and result records prove fixture mechanics, bundle validation, negative cases, source-module digest/anchor posture, and no body text in result records. Generated Mermaid and Atlas links only make the bundle edges walkable; they do not authorize live tool execution, bridge work, external model access, repository mutation, publishing-scope decision, launch-scope decision, or whole-system correctness.

How it works

The evaluator loads four copied modules by manifest reference and runs one bounded exercise against each, comparing the live output to a fixed expected value. A primitive passes only when every checked field matches.

  • Observer set diff. The copied diff_evidence and diff_predictions functions take two lists of rows keyed by id and partition them. For evidence, three lab rows and two oracle rows resolve to one overlap, one missed id, and one extra id; a row with no ledger_id is dropped rather than crashing the diff. For predictions, rows are split into matching, divergent, and missing-target sets. The exercise also asserts the dropped malformed row never appears in the serialised result, so a parse gap cannot leak through as silent data.
  • Version committer JSON-patch VM. The copied _apply_op interprets a small set of edit operations (set, merge, append) over a nested document by path. The exercise applies four ops, checks the resulting document exactly, and confirms that attempting to traverse into a scalar (/profile/name where profile is a string) raises VersionCommitterError instead of corrupting the document. The interesting property is the refusal: a malformed path is a controlled error, not a partial write.
  • Ledger-id identity hash. The copied generate_ledger_id produces a stable id from a lane and a record. The exercise checks that the lane alias poly and POLYMARKET normalise to the same canonical lane and hash to the same id, so the id is identity-stable across spelling; an unknown lane falls back to an X_ prefix; and a record missing the identity field its lane requires raises ValueError rather than hashing a blank.
  • Shadow envelope parser coverage. The copied run parses a small envelope DSL (miner tuples, a spine line, prediction rows) written into a temporary run directory. The exercise feeds it one well-formed line and one malformed tuple per node, then checks that parsing did not hard-fail, that the well-formed rows parsed, and that the malformed tuple was counted as a comma_arity coverage gap. The point is that the parser reports its own coverage holes rather than swallowing them.

Each exercise also has a matching negative case (EXPECTED_NEGATIVE_CASES) that re-runs the same code on input designed to be rejected and confirms the rejection. So for every primitive the page shows both the accepting path and the refusing path. None of these checks open a network, a provider, or the live bridge; they run copied source bodies in process and keep the bodies out of the result records.

Reader Evidence Routing

  • Bundle route: read core/paper_module_capsules.json::paper_modules[64] before treating this Markdown as explanation.
  • Generated route: inspect paper_modules/batch8_tools_tail_primitives_capsule.json for the current generated instance.
  • Bundle route: inspect examples/batch8_tools_tail_primitives_capsule/exported_batch8_tools_tail_primitives_capsule_bundle for the copied source source modules.
  • Runtime route: run tests/test_batch8_tools_tail_primitives_capsule.py and the commands in ## Validation Result record Path.

Prior Art Grounding

This bundle borrows from standardized patch formats, transparency-log identity patterns, provenance metadata, and parser coverage practice. Useful anchors include:

  • IETF RFC 6902, which defines JSON Patch operations such as add, remove, replace, move, copy, and test.
  • IETF RFC 9162, where Certificate Transparency uses an append-only Merkle tree as an auditable log pattern.
  • W3C PROV, for representing the provenance of derived artifacts and their generating activities.

Microcosm borrows the patch-operation, identity-hash, append-only-log, and provenance shapes, but keeps this bundle at deterministic fixture exercises. It does not claim oracle truth, semantic edit correctness, live bridge authority, external model access, repository mutation authority, or launch-scope decision.

Source Modules

The exported bundle copies the relevant source sources under examples/batch8_tools_tail_primitives_capsule/exported_batch8_tools_tail_primitives_capsule_bundle/source_modules/. Result records carry source refs, digests, anchors, counts, and exercise outcomes, not copied body text or private state.

Mechanism Set

The validator requires exactly these four mechanism rows: observer set diff kernel, version-committer JSON patch VM, ledger-id identity hash engine, and shadow envelope DSL parser coverage.

The source module manifest requires four exact copied source source modules. The fixture requires four stable negative cases, one per mechanism row. Shared registry, sign-off, runtime-shell, CLI, atlas, and generated docs wiring is intentionally deferred while the existing shared Microcosm core lease is active.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

The fixture command writes the bounded tools-tail primitives result record and sign-off JSON. The bundle command validates copied source sources, manifest digests, observer-diff, JSON-patch, ledger-id, and shadow-envelope exercises, body-exclusion posture, and scope limit fields. The focused test checks fixture mechanics, bundle validation, negative cases, and the no-live-bridge scope limit.

This result record path is reader-verifiable evidence only. It is not oracle truth, not prediction correctness, not semantic edit correctness, not live bridge or Lab execution authority, not external model access, not repository mutation authority, not publishing-scope decision, and not launch-scope decision.

Scope boundary

Scope limit

This is deterministic public-system evidence over fixture inputs only. It is not oracle truth, not prediction correctness, not semantic edit correctness, not provenance by itself, not Lab execution authority, not live Oracle bridge authority, not repository mutation authority, not external model access, and not launch-scope decision.

Scope limit

This paper module can claim a tools-tail primitives fixture with a diagram view generated for navigation. It can explain deterministic public-system checks over fixture inputs and metadata-only source-module result records.

It cannot claim oracle truth, prediction correctness, semantic edit correctness, provenance sufficiency by itself, Lab execution authority, live Oracle bridge authority, repository mutation authority, external model access, publishing-scope decision, launch-scope decision, or whole-system correctness.

Policy Engines BundleMaps three policy engines over test data without model calls or live campaign execution.5/5

Does This bundle imports three Set-8 policy-engine bodies as copied public source modules with deterministic fixture exercises. It exposes lab contract audit, market fusion readiness, and campaign transition adjudication mechanics without model dispatch, live campaign execution, public sharing, or launch-scope decision.

Scope limit It validates only the imported source body. It does not claim source authority, private-system equivalence, launch, or public sharing.

Run
microcosm batch8-policy-engines-capsule run --input fixtures/first_wave/batch8_policy_engines_capsule/input --out receipts/first_wave/batch8_policy_engines_capsule --acceptance-out receipts/acceptance/first_wave/batch8_policy_engines_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Set 8 Policy Engines Bundle

This component imports three Set-8 policy engines as exact copied source source bodies plus bounded public exercises: Lab contract audit red/green gating, market-fusion fail-closed claim preflight, and campaign dispatch transition adjudication.

The bundle is source-open and bounded. It exercises deterministic policy mechanics over synthetic public fixtures. It does not run live campaigns, use external model services, mutate repositories, export private artifacts, claim market-level conclusions, authorize public sharing, or approve launch.

Purpose

The three engines copied here share one design idea: a machine-checkable gate that runs before any judgement, model call, or downstream action, and refuses by default rather than passing on absent evidence. The bundle answers a single question for a cold reader: do these copied gate bodies still make the same deterministic decisions when run against public fixture inputs?

The Lab contract audit reads persisted Lab node artifacts from disk and applies fixed structural rules: a ban on question marks in compute-node outputs (an output carrying a ? is treated as an unresolved hedge, not an answer), tuple formatting and two-sentence annotation rules, exact thesis inheritance between nodes, prediction targets grounded against an allowed set, and contradiction reconciliation. Any hard fail flips the report from green to red. The interesting choice is that this audit is deterministic and runs ahead of any semantic interpretation, so a runtime gate can fail closed on structure without asking a model whether the output looks right.

The market-fusion readiness gate decides whether a consumer may turn raw feed presence into a cross-feed claim. Every registered candidate situation is currently set to refuse, each for named, specific reasons (a missing provider, an absent event window, relation edges that are not measurement-conditioned). An unregistered situation also refuses, but with the distinct reason candidate_situation_gate_missing. That distinction is the point: the gate fails closed on anything it has not explicitly reasoned about, and the bundle checks that a registered refusal and a fail-closed refusal stay legible as different things.

The campaign dispatch adjudicator is a small state machine over a fixed table of legal status transitions. It returns legal_transition for an allowed move, already_target for a no-op, and raises an error for an illegal one. Its load- bearing rule is that completed is terminal: a completed dispatch cannot move back to running without an explicit superseding event.

Shape

Read this module as a bounded evidence pipeline: the JSON bundle names the paper-module authority, runtime locus, standard, and generated projections; the runtime exercises copied policy sources against public fixtures; the tests and result record commands verify those fixture mechanics and scope boundaries. Everything below the bundle is reader or navigation evidence, not live policy, source-file changes, market, public sharing, provider, production, or launch-scope decision.

Copied source source bodiesCopied source source bodiesPublic synthetic fixturesLab node artifacts, candidateclaims,dispatch status pairsPublic synthetic fixtures Lab node artifacts, candidate claims, dispatch status pairsquestion-mark ban,tuple/annotation,thesis inheritance, targetgroundingquestion-mark ban, tuple/annotation, thesis inheritance, target groundinggreenno hard failsgreen no hard failsredQUESTION_MARK_OUTPUT andothersred QUESTION_MARK_OUTPUT and othersMkrunMkrunrefuse: named reasonsregistered situationrefuse: named reasons registered situationfail-closed defaultfail-closed defaultCprunCprunlegal_transition /already_targetlegal_transition / already_targetCampaignTransitionErrorcompleted is terminalCampaignTransitionError completed is terminalBundle evaluatorthree engines must pass,three stable negative casesBundle evaluator three engines must pass, three stable negative casesScope limitfixture evidence and copiedsource refs onlyno live campaign, provider,market, repo, or launch-scopedecisionScope limit fixture evidence and copied source refs only no live campaign, provider, market, repo, or launch-scope decision

Source refs

Copied source source bodies
lab_contract_audit.pymarket_fusion_readiness.pycampaign_state_transition.py
question-mark ban, tuple/annotation, thesis inheritance, target grounding
compute_lab_contract_audit
Mkrun
preflight_candidate_situation
fail-closed default
refuse: candidate_situation_gate_missing
Cprun
validate_dispatch_transition
Diagram source
flowchart TD bundle["Copied source source bodies lab_contract_audit.py market_fusion_readiness.py campaign_state_transition.py"] fixtures["Public synthetic fixtures Lab node artifacts, candidate claims, dispatch status pairs"] subgraph Lab["Lab contract audit"] labrun["compute_lab_contract_audit question-mark ban, tuple/annotation, thesis inheritance, target grounding"] labgreen["green no hard fails"] labred["red QUESTION_MARK_OUTPUT and others"] end subgraph Market["Market-fusion readiness"] mkrun["preflight_candidate_situation"] mknamed["refuse: named reasons registered situation"] mkmissing["refuse: candidate_situation_gate_missing fail-closed default"] end subgraph Campaign["Campaign dispatch adjudicator"] cprun["validate_dispatch_transition"] cplegal["legal_transition / already_target"] cpillegal["CampaignTransitionError completed is terminal"] end exercises["Bundle evaluator three engines must pass, three stable negative cases"] ceiling["Scope limit fixture evidence and copied source refs only no live campaign, provider, market, repo, or launch-scope decision"] bundle --> labrun bundle --> mkrun bundle --> cprun fixtures --> labrun fixtures --> mkrun fixtures --> cprun labrun --> labgreen labrun --> labred mkrun --> mknamed mkrun --> mkmissing cprun --> cplegal cprun --> cpillegal labred --> exercises mkmissing --> exercises cpillegal --> exercises exercises --> ceiling

Reader Evidence Routing

  • Bundle route: read core/paper_module_capsules.json::paper_modules[61] before treating this Markdown as explanation.
  • Generated route: inspect paper_modules/batch8_policy_engines_capsule.json for the current generated instance of this module.
  • Bundle route: inspect examples/batch8_policy_engines_capsule/exported_batch8_policy_engines_capsule_bundle for the three copied source policy sources.
  • Runtime route: run tests/test_batch8_policy_engines_capsule.py and the commands in ## Validation Result record Path for recomputation evidence.

Prior Art Grounding

This bundle borrows from policy-as-code, risk-management, and market-claim boundary practice. Useful anchors include:

  • Open Policy Agent, which treats policy as a separately evaluated engine over structured input.
  • NIST's AI Risk Management Framework, whose govern/map/measure/manage posture is a useful precedent for explicit risk gates and red/green decision surfaces.
  • The CFTC's prediction markets explainer, as a boundary reminder for market-facing claims and event-contract language.

Microcosm borrows the deterministic policy-gate and market-claim-preflight shape, but keeps the component to fixture inputs and copied public source. It does not run campaigns, use external model services, claim market-level conclusions, mutate repositories, or approve launch.

Source Modules

The exported bundle copies the relevant source sources under examples/batch8_policy_engines_capsule/exported_batch8_policy_engines_capsule_bundle/source_modules/. Result records carry source refs, digests, anchors, counts, and exercise outcomes, not copied body text or private state.

Mechanism Set

The validator requires exactly these three engine rows: Lab contract audit deterministic red gate, market-fusion readiness fail-closed gate, and campaign dispatch status transition adjudicator. The source module manifest requires three exact copied source source modules. The fixture requires three stable negative cases, one per engine row.

Each engine exercise runs the copied body and checks a concrete decision, so a silent change in gate behaviour shows up as a blocked exercise:

  • Lab contract audit: a green artifact set must return green, and the same set with a banned ? injected into a compute-node output must return red with QUESTION_MARK_OUTPUT in its hard fails. The negative case BATCH8_LAB_CONTRACT_QUESTION_MARK_RED_GATE confirms the red gate fires.
  • Market-fusion readiness: a registered candidate situation must refuse with named reasons, while an unregistered situation and a malformed payload must both refuse with candidate_situation_gate_missing. The negative case BATCH8_MARKET_FUSION_MISSING_GATE_REFUSED confirms the fail-closed default.
  • Campaign dispatch adjudicator: candidate -> blocked is a legal_transition, completed -> completed is already_target, and completed -> running raises a terminal-state error. The negative case BATCH8_CAMPAIGN_COMPLETED_TO_RUNNING_REFUSED confirms the refusal.

Shared registry, sign-off, runtime-shell, CLI, atlas, package-data, and generated docs wiring is intentionally deferred while the existing shared Microcosm core lease is active.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

The fixture command writes the bounded policy-engine result record and sign-off JSON. The bundle command validates copied source policy sources, manifest digests, negative cases, source-body exclusion, and scope limit posture. The focused test checks deterministic red/green gates, bundle validation, private-boundary scans, and the no-launch scope limit.

This result record path is reader-verifiable evidence only. It does not run live campaigns, use external model services, mutate repositories, validate markets, certify whole system safety, authorize public sharing, or approve launch.

Scope boundary

Scope limit

This is deterministic public-system evidence over fixture inputs only. It is not Lab correctness, not live campaign execution authority, not market validation, not whole-system safety, not repository mutation authority, not external model access, and not launch-scope decision.

Scope limit

This paper module covers a bounded policy-engines fixture. A diagram view and atlas card are generated for this module. It can explain deterministic policy checks over public fixture inputs and metadata-only source-module result records.

It cannot claim Lab correctness, live campaign execution authority, market validation, whole-system safety, repository mutation authority, external model access, publishing-scope decision, launch-scope decision, or private-system equivalence.

Audio Level RMS PortComputes the audio loudness math on test arrays without opening a microphone or capturing input.3/5

Does This port projects the pure AudioLevelMonitor normalized-level RMS math into a runnable Python fixture over synthetic sample arrays. It exposes the normalization behavior without starting an audio session, requesting microphone permission, capturing device input, publishing, or granting launch control.

Scope limit projection mechanics only, not domain-level conclusions

Run
microcosm batch8-audio-level-rms-port run --input fixtures/first_wave/batch8_audio_level_rms_port/input --out receipts/first_wave/batch8_audio_level_rms_port --acceptance-out receipts/acceptance/first_wave/batch8_audio_level_rms_port_fixture_acceptance.json

EvidenceComputed projectionevidence 3/5Source-faithful refactor

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Set 8 Audio Level RMS Port

This component ports the pure AudioLevelMonitor.normalizedLevel RMS math from Swift to Python and exercises it over public synthetic sample arrays.

The bundle is bounded to numeric parity. It does not start an AVCaptureSession, request microphone permission, read recorded audio, capture a device, claim UI readiness, authorize public sharing, or approve launch.

Purpose

The Swift AudioLevelMonitor feeds a live microphone level meter in a recording app. Most of that file is platform machinery: opening a capture session, selecting a device, reading sample buffers off a callback. Buried inside is one small, pure function, normalizedLevel, that turns a block of audio samples into a single number between zero and one. That number is the only part that can be checked without a microphone, so it is the only part this component ports.

The single question this component answers is: does the Python re-implementation of that calculation produce the same level value as the Swift original, on inputs we can publish? Everything device-specific, permission-gated, or stateful is deliberately left on the Swift side. What crosses into Python is the arithmetic alone.

The interesting choice here is what is held out, not what is included. A live level meter is hard to test because it depends on real audio hardware and OS permissions that cannot live in a public fixture. By isolating the pure amplitude maths and exercising it over synthetic sample arrays, the component keeps a checkable parity claim about the part that matters for the meter reading, while making no claim at all about capture, permission, or device state. The test is scoped to being a maths port and nothing more.

How it works

normalized_level takes a sequence of samples and a format tag. It accepts only float32 and int16; any other tag raises ValueError, which is how the "unsupported format" case is exercised. An empty buffer returns 0.0 immediately, before any arithmetic.

For each sample it accumulates the square of the value. Float samples are used as-is; int16 samples are first divided by 32767.0 (the Swift Int16.max) to map the integer range onto roughly minus-one to one. It then takes the root mean square, sqrt(total / count), which summarises the block's energy as a single amplitude. That value is multiplied by 8.0 and clamped to the [0.0, 1.0] range with min(max(rms * 8.0, 0.0), 1.0). The gain of eight is a display choice carried over verbatim from the Swift source: quiet speech sits low on a zero-to-one meter without it, so the level is scaled up and then capped so loud input cannot overshoot one. These two lines, the int16 divisor and the rms * 8 clamp, are the anchors the bundle requires to match the copied Swift text.

The runtime checks three reference cases drawn from a public probe manifest (float32, int16, and an over-one buffer that must clamp), optionally decodes mono 16-bit PCM WAV byte fixtures and recomputes their level from the raw bytes, and runs three negative exercises: empty buffer must read zero, an over-one buffer must clamp to one, and an unknown format must be refused. Each case compares the observed level against the manifest's expected value within a small tolerance. A mismatch, a missing expected case, or a failed refusal is recorded as a finding, and any finding turns the verdict from pass to blocked.

Shape

Read this module as a bounded RMS-parity pipeline: the JSON bundle names the reader authority, runtime locus, standard, and generated navigation edges; the runtime ports Swift normalizedLevel math over public fixture arrays; tests and result records verify numeric parity and metadata-only evidence. Generated Mermaid and Atlas links are navigation status, not macOS audio-session, microphone, device, source-file changes, public sharing, or launch-scope decision.

"not float32/int16""float32 or int16""yes""no""yes""no"Copied Swift sourceAudioLevelMonitor.normalizedLevelmetadata-only; anchors onlyCopied Swift source AudioLevelMonitor.normalizedLevel metadata-only; anchors onlyPublic probe manifestsynthetic sample arrays + WAVbytesexpected level per casePublic probe manifest synthetic sample arrays + WAV bytes expected level per casenormalized_level(samples,format)normalized_level(samples, format)format tag?format tag?raise ValueErrorunsupported format refusedraise ValueError unsupported format refusedbuffer empty?buffer empty?return 0.0return 0.0square + accumulateint16 divided by 32767square + accumulate int16 divided by 32767rms = sqrt(total / count)rms = sqrt(total / count)min(max(rms * 8, 0), 1)scaled, then clamped to 0..1min(max(rms * 8, 0), 1) scaled, then clamped to 0..1compare observed vs expectedwithin tolerancecompare observed vs expected within toleranceany finding?any finding?status: blockedstatus: blockedstatus: passstatus: passScope limitRMS parity over publicfixtures onlyno audio session, microphone,device,source-file changes, publicsharing, or launchScope limit RMS parity over public fixtures only no audio session, microphone, device, source-file changes, public sharing, or launch
Diagram source
flowchart TD swift["Copied Swift source AudioLevelMonitor.normalizedLevel metadata-only; anchors only"] manifest["Public probe manifest synthetic sample arrays + WAV bytes expected level per case"] samples["normalized_level(samples, format)"] fmt{"format tag?"} refuse["raise ValueError unsupported format refused"] empty{"buffer empty?"} zero["return 0.0"] scale["square + accumulate int16 divided by 32767"] rms["rms = sqrt(total / count)"] clamp["min(max(rms * 8, 0), 1) scaled, then clamped to 0..1"] compare["compare observed vs expected within tolerance"] verdict{"any finding?"} blocked["status: blocked"] passed["status: pass"] ceiling["Scope limit RMS parity over public fixtures only no audio session, microphone, device, source-file changes, public sharing, or launch"] swift --> samples manifest --> samples samples --> fmt fmt -->|"not float32/int16"| refuse fmt -->|"float32 or int16"| empty empty -->|"yes"| zero empty -->|"no"| scale scale --> rms rms --> clamp clamp --> compare refuse --> compare zero --> compare compare --> verdict verdict -->|"yes"| blocked verdict -->|"no"| passed blocked --> ceiling passed --> ceiling

Reader Evidence Routing

  • Bundle route: read core/paper_module_capsules.json::paper_modules[59] before treating this Markdown as explanation.
  • Generated route: inspect paper_modules/batch8_audio_level_rms_port.json for the current generated instance derived from the source record.
  • Bundle route: inspect examples/batch8_audio_level_rms_port/exported_batch8_audio_level_rms_port_bundle for copied Swift source refs and digest evidence.
  • Runtime route: run tests/test_batch8_audio_level_rms_port.py and the commands in ## Validation Result record Path for recomputation evidence.

Prior Art Grounding

The component is grounded in standard digital-audio metering practice: root mean square amplitude is a common way to summarize signal energy for level displays, while OS capture APIs and media tools are kept outside pure numeric tests. Useful anchors include:

  • Apple's AVFoundation media framework family for time-based audiovisual capture and processing on Apple platforms.
  • FFmpeg audio/video documentation, as a broad media-processing toolchain where audio streams and levels are handled as explicit inputs and transforms.

Microcosm borrows only the pure RMS-level calculation shape and ports it to fixture-bound Python parity tests. It does not start an audio session, request microphone permission, read recorded audio, capture a device, or approve UI or launch-scope decision.

Source Reference

The exported bundle copies apps/demo-take-console/Sources/DemoTakeConsoleApp/AudioLevelMonitor.swift under examples/batch8_audio_level_rms_port/exported_batch8_audio_level_rms_port_bundle/source_modules/. Result records carry refs, digests, anchors, sample counts, and parity verdicts, not copied body text, recorded audio, or private device state.

Mechanism Set

The validator requires float32 parity, int16 parity, over-one clamp behavior, empty-buffer zero behavior, and unsupported-format refusal. Shared registry, sign-off, runtime-shell, CLI, atlas, package-data, and generated docs wiring is intentionally deferred while the existing shared Microcosm core lease is active.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

The fixture command writes the bounded RMS parity result record and sign-off JSON. The bundle command validates the copied Swift source module, digest anchors, negative exercises, body-exclusion scan, and source-ref boundary. The focused test checks the Python port, bundle validation, result record body scan, and scope limit.

This result record path is reader-verifiable evidence only. It does not start an audio session, request microphone permission, read recorded audio, prove device capture, approve UI readiness, change source files, authorize public sharing, or approve launch.

Scope boundary

Scope limit

This is deterministic Python-port evidence over fixture inputs only. It is not macOS audio-session evidence, not microphone permission authority, not device capture, not UI readiness, not source-file changes, and not launch-scope decision.

Scope limit

This paper module can claim a deterministic Python port of the audio-level RMS calculation with a diagram view generated for this module and navigation links available from the same source row. It can explain deterministic numeric RMS/level behavior over fixture inputs and metadata-only result records.

It cannot claim macOS audio-session evidence, microphone permission authority, device capture, UI readiness, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness. Those claims would need new supporting evidence before this module could narrate them.

Structural Theses Finance BundleRuns a copied finance-thesis model through dated test cases with no live market data or advice.5/5

Does This bundle imports the structural_theses finance spine as a copied public source body with synthetic dated thesis-card exercises. The exercises run the lifecycle/backtest mechanics without live market data, investment-related actions, portfolio action, external model access, public sharing, or launch-scope decision.

Scope limit It validates only the imported source body. It does not claim source authority, private-system equivalence, launch, or public sharing.

Run
microcosm batch8-structural-theses-capsule run --input fixtures/first_wave/batch8_structural_theses_capsule/input --out receipts/first_wave/batch8_structural_theses_capsule --acceptance-out receipts/acceptance/first_wave/batch8_structural_theses_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Set 8 Structural Theses Bundle

This component imports tools/finance/structural_theses.py as exact copied source source and exercises it over public synthetic structural-thesis fixtures.

The bundle is bounded to replayable CP1/CP2 thesis-family validation. It excludes financial decisions, investment recommendations, live market data, external model access, portfolio action, public sharing, or launch.

Purpose

The copied source, tools/finance/structural_theses.py, takes a tempting idea and disciplines it. The tempting idea is that some market moves look structurally obvious, so a corpus of "obvious" theses ought to predict the next one. The trap is survivorship: it is easy to assemble a list of patterns that worked in hindsight and call the list a method.

The single question the source answers is narrower and harder. Given claims that looked structurally obvious at the time they were written, which reasoning families still survive once you resolve every claim forward and keep the ones that failed? The load-bearing inversion is that "obvious" is treated as a claim-status frozen at commitment time, never as a label applied to outcomes afterwards. A thesis whose meaning shifts once the result is known is a post-hoc mutation, and the leakage guard rejects it.

What is unusual is that losers and negative controls are first-class, required evidence rather than noise. A refuted thesis must flow through the same pipeline as a confirmed one and stay legible as valid evidence; a negative control must be present and must not resolve into a confirmed claim. The output vocabulary deliberately has no tradable "winner": the strongest a surviving pattern can earn is review_candidate, a flag for human review and nothing more.

This bundle does not assert any of those findings as market-level conclusions. It imports the source verbatim, runs it over public synthetic rows, and checks that the discipline holds. It is not financial decisions, an investment recommendation, or live-market validation.

What it validates

The component loads the copied finance source, builds one public winner, loser, and control family from a synthetic probe, and then exercises the source's own validator over both the clean family and three deliberately broken variants.

The clean path confirms the at-time semantics survive a full run: the winner resolves claim_confirmed_forward, the loser resolves claim_refuted_forward and is marked valid evidence, the control resolves as a control without becoming a confirmed claim, the surviving pattern lands in family memory as a candidate_set, and the authority boundary keeps investment_recommendation_authorized false. Under the hood the source maps each thesis onto the existing forecast-claim shape and drives the real CP1 admission, CP2 resolution, proper-scoring replay, and purged walk-forward replay with deterministic fixture prices rather than building a new evaluator.

The three negative exercises are the substance of the proof, because each one forces a specific discipline to fire:

  • Survivor-only. A family built from winners alone, with no failed thesis, must be rejected. The source raises NO_LOSER_FLOWED_THROUGH, NO_NEGATIVE_CONTROL, and SURVIVORSHIP_SAMPLE; the component confirms all three appear (error code BATCH8_STRUCTURAL_THESES_SURVIVOR_ONLY_REJECTED).
  • Forward-gate breach. A refuted pattern is smuggled into the forward review candidates. The source must raise FORWARD_GATE_BREACH, because only a pattern that survived at-time replay may produce a review_candidate (BATCH8_STRUCTURAL_THESES_FORWARD_GATE_BREACH_REJECTED).
  • Control leak. A negative control is mutated to claim it confirmed forward. The source must raise CONTROL_LEAK (BATCH8_STRUCTURAL_THESES_CONTROL_LEAK_REJECTED).

If any of these refusals fails to fire, the component records a blocked finding rather than a pass. Alongside the family check it verifies exact digest parity and required anchors for the copied source, so the page cannot drift away from the code it claims to exercise. Result records carry verdicts, counts, error codes, and refs only; copied bodies, market data, and model-output data stay out.

Shape

This module's shape is bundle-first and projection-bounded. The source row is core/paper_module_capsules.json::paper_modules[63:paper_module.batch8_structural_theses_capsule]; the generated JSON instance is paper_modules/batch8_structural_theses_capsule.json, and it preserves source_authority: json_capsule.

digest + anchor parityyesnorefusal firesrefusal firesrefusal firesrefusal missingrefusal missingrefusal missingJSON source recordJSON source recordRuntime locusRuntime locusExact copied sourceExact copied sourcePublic synthetic probewinner, loser, control rowsplus realized returnsPublic synthetic probe winner, loser, control rows plus realized returnsCP1 admit forward-onlyCP2 resolve vs frozencriterionproper-scoring + purgedreplayCP1 admit forward-only CP2 resolve vs frozen criterion proper-scoring + purged replayon the clean familyon the clean familyWinner confirmed,loser refuted + validevidence,control not confirmed?Winner confirmed, loser refuted + valid evidence, control not confirmed?Three broken variantsThree broken variantsSurvivor-only familyNO_LOSER_FLOWED_THROUGHNO_NEGATIVE_CONTROLSURVIVORSHIP_SAMPLESurvivor-only family NO_LOSER_FLOWED_THROUGH NO_NEGATIVE_CONTROL SURVIVORSHIP_SAMPLERefuted pattern smuggledinto forward candidatesFORWARD_GATE_BREACHRefuted pattern smuggled into forward candidates FORWARD_GATE_BREACHControl mutated to confirmedCONTROL_LEAKControl mutated to confirmed CONTROL_LEAKBounded pass result recordBounded pass result recordBlocked findingBlocked findingScope limitpublic synthetic fixture +copied source onlyScope limit public synthetic fixture + copied source only

Source refs

JSON source record
core/paper_module_capsules.json[63]
Runtime locus
organs/batch8_structural_theses_capsule.py
Exact copied source
tools/finance/structural_theses.py
CP1 admit forward-only CP2 resolve vs frozen criterion proper-scoring + purged replay
build_structural_thesis_family
on the clean family
validate_structural_thesis_family
Diagram source
flowchart TD Bundle["JSON source record core/paper_module_capsules.json[63]"] --> Runtime["Runtime locus components/batch8_structural_theses_capsule.py"] Source["Exact copied source tools/finance/structural_theses.py"] -->|digest + anchor parity| Runtime Probe["Public synthetic probe winner, loser, control rows plus realized returns"] --> Runtime Runtime --> Build["build_structural_thesis_family CP1 admit forward-only CP2 resolve vs frozen criterion proper-scoring + purged replay"] Build --> Clean["validate_structural_thesis_family on the clean family"] Clean --> CleanCheck{"Winner confirmed, loser refuted + valid evidence, control not confirmed?"} Runtime --> Neg["Three broken variants"] Neg --> Survivor["Survivor-only family NO_LOSER_FLOWED_THROUGH NO_NEGATIVE_CONTROL SURVIVORSHIP_SAMPLE"] Neg --> Forward["Refuted pattern smuggled into forward candidates FORWARD_GATE_BREACH"] Neg --> Control["Control mutated to confirmed CONTROL_LEAK"] CleanCheck -->|yes| Pass["Bounded pass result record"] CleanCheck -->|no| Block["Blocked finding"] Survivor -->|refusal fires| Pass Forward -->|refusal fires| Pass Control -->|refusal fires| Pass Survivor -->|refusal missing| Block Forward -->|refusal missing| Block Control -->|refusal missing| Block Pass --> Ceiling["Scope limit public synthetic fixture + copied source only"] Ceiling -. forbids .-> NoClaims["No advice, recommendation, live market data, external model access, portfolio action, public sharing, launch"]

The standards lane is split deliberately. The module-specific public runtime standard, standards/std_microcosm_batch8_structural_theses_capsule.json, governs the fixture fields, public/private boundary, result record contract, validator command, negative-case count, and explicit anti-purpose. The wider codex/standards/std_microcosm.json::paper_module_coverage_contract governs how paper-module coverage, Atlas cards, generated Mermaid, and context-pack depth stay navigable without promoting generated projections into source truth.

The runtime/source lane is likewise bounded. The Microcosm component src/microcosm_core/organs/batch8_structural_theses_capsule.py loads the copied structural-theses source, builds the winner/loser/control family, evaluates survivor-only, forward-gate-breach, and control-leak negative exercises, and writes metadata-only result records. The exported bundle at examples/batch8_structural_theses_capsule/exported_batch8_structural_theses_capsule_bundle contains source_module_manifest.json; that manifest records 12 exact copied source modules for bundle validation, including source_modules/tools/finance/structural_theses.py, while the first-wave result record narrows the copied-source proof to the structural-theses module itself.

The proof lane is fixture-level. The public fixture input under fixtures/first_wave/batch8_structural_theses_capsule/input and the focused regression tests/test_batch8_structural_theses_capsule.py validate digest and anchor parity, thesis-family replay, winner/loser/control semantics, stable negative cases, body exclusion, scope limits, and the runtime-shell bundle path. Result record evidence lives under receipts/first_wave/batch8_structural_theses_capsule/, result records/sign-off/first_wave/batch8_structural_theses_capsule_fixture_acceptance.json, and receipts/runtime_shell/demo_project/organs/batch8_structural_theses_capsule/exported_batch8_structural_theses_capsule_bundle_validation_result.json.

The generated Mermaid and Atlas statuses are useful only as navigation result records: available_from_capsule_edges and linked_from_capsule_edges mean the JSON bundle edges are walkable. They do not authorize financial decisions, investment recommendations, live-market validation, external model access, portfolio action, public sharing, launch, private-system equivalence, or whole-system correctness.

Reader Evidence Routing

  • Bundle route: read core/paper_module_capsules.json::paper_modules[63] before treating this Markdown as explanation.
  • Generated route: inspect paper_modules/batch8_structural_theses_capsule.json for current generated state.
  • Bundle route: inspect examples/batch8_structural_theses_capsule/exported_batch8_structural_theses_capsule_bundle for copied source refs and digest evidence.
  • Runtime route: run tests/test_batch8_structural_theses_capsule.py and the commands in ## Validation Result record Path.

Prior Art Grounding

This bundle borrows from empirical-finance validation and bias-control patterns. Useful anchors include:

  • Fama and French's common risk factors work and data-library tradition, as a precedent for decomposing structural market claims into named factor families and testable rows.
  • MacKinlay's event-study methodology, as a prior pattern for separating an event window, expected baseline, and abnormal-return evidence.
  • Brown, Goetzmann, Ibbotson, and Ross on survivorship bias, which motivates explicit loser/control cases rather than winner-only thesis replay.

Microcosm borrows the factor-family, event-window, and bias-control shape, but keeps the component to public synthetic thesis rows and copied source. It is not financial decisions, an investment recommendation, live-market validation, portfolio authority, publishing-scope decision, or launch-scope decision.

Source Reference

The exported bundle copies tools/finance/structural_theses.py under examples/batch8_structural_theses_capsule/exported_batch8_structural_theses_capsule_bundle/source_modules/. Result records carry refs, digests, anchors, counts, and runtime verdicts, not copied body text, model-output data, market data, or private runtime state.

Mechanism Set

The validator requires exact source digest parity, structural-thesis source anchors, a public winner/loser/control family, valid loser evidence, a negative control that does not become a confirmed claim, and rejection of survivor-only, forward-gate-breach, and control-leak exercises. Shared registry, sign-off, runtime-shell, CLI, atlas, package-data, and generated docs wiring is intentionally deferred while shared Microcosm core leases are active.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

The fixture command writes the bounded thesis-family result record and sign-off JSON. The bundle command validates copied source refs, digest anchors, public winner/loser/control cases, negative controls, body-exclusion posture, and scope limit fields. The focused test checks fixture validation, bundle validation, survivor-bias refusal, control-leak refusal, and claim boundaries.

This result record path is reader-verifiable evidence only. It is not financial decisions, not an investment recommendation, not live-market validation, not external model access, not portfolio authority, not publishing-scope decision, and not launch-scope decision.

Scope boundary

Scope limit

This is deterministic fixture evidence over public synthetic thesis rows and exact copied source only. It is not advice, not an investment recommendation, not live-market validation, not external model access, not portfolio authority, not publishing-scope decision, and not launch-scope decision.

Scope limit

This paper module demonstrates a bounded structural-theses fixture: deterministic validation over public synthetic thesis rows, exact copied source refs, and metadata-only result records. A diagram view is generated for this module and it appears in the Atlas navigation surface.

It cannot claim advice, investment recommendation, live-market validation, external model access, portfolio authority, publishing-scope decision, launch-scope decision, private-system equivalence, or whole-system correctness. Higher claims must be authorized by the JSON bundle and generated projection state first.

Engine Room DemoRuns proof, runtime, security, and routing demos through bounded public examples with stated limits.5/5

Does This component turns the staged Engine Room bundles into one accepted public demo surface. It exercises the proof-search, runtime, integrity, security, navigation, orchestration, and reference-routing bundles through bounded public fixtures with explicit scope boundaries.

Scope limit It validates only the public Engine Room composition contract; it is not deployment posture, private-system equivalence, frontier theorem proving, complete security proof, public sharing, or launch-scope decision.

Run
microcosm engine-room-demo run --input fixtures/first_wave/engine_room_demo/input --out receipts/first_wave/engine_room_demo --acceptance-out receipts/acceptance/first_wave/engine_room_demo_fixture_acceptance.json

EvidenceContract validatorevidence 5/5Import validation

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Engine Room Demo

engine_room_demo is the accepted Microcosm composition component for the staged Engine Room set. It wraps the bundles under microcosm_core.engine_room, runs the composed demo/audit path, and writes first-wave result records without promoting fixture rows into private-system or launch-scope decision.

Purpose

The Engine Room set is ten separate bundles: a Lean proof-search lab, a metabolism runtime, command singleflight, a generated-projection drift gate, a derived-fact engine, a public-projection leak gate, an egress self-compliance gate, a navigation-fitness benchmark, a bridge-campaign DAG, and an reference knowledge router. Each bundle has its own fixture and result record. This component exists so that a reader does not have to trust ten claims separately. It answers one question: do the ten bundles together cover the fourteen targets the controller asked for, and does each one still own its full surface and run.

A bundle "owns its surface" only when six files exist for it: module source, fixture input, fixture manifest, paper module, standard, and test. The audit checks all six per bundle, runs each fixture through its declared evaluator, and unions the targets the bundles actually declare against the fourteen the controller expected. A passing run means the set is complete and every fixture executed, not that any single bundle is finished or correct.

The design choice worth noting is in the negative case. Rather than compare against a frozen answer key, the negative fixture recomputes the live set of covered targets and fails only when the fixture names a target that is genuinely outside it. That keeps the refusal honest as the bundle set grows: the test cannot drift into agreement with a stale list, because there is no stored list to agree with.

A second deliberate boundary is that the runner reads the shared component registry, sign-off file, and atlas, but never writes to them. It reports whether the composition component is integrated into those shared surfaces as a separate visibility line, and always records shared_registry_mutated: false. Composition coverage and shared-registry integration are kept as two distinct facts, so a green demo cannot quietly imply registry authority it does not hold.

What It Runs

  • Verifies the 14 Engine Room jewel targets selected by the controller prompt.
  • Checks the owned staged bundle surfaces: module source, fixture input, fixture manifest, paper module, standard, and tests.
  • Executes the staged bundle demo through the public fixture chain.
  • Observes a negative fixture where an expected target is intentionally absent.

Shape

Engine Room fixture casesEngine Room fixture casesAccepted component wrapperAccepted component wrapperController coverage auditController coverage audit10 staged bundle evaluators10 staged bundle evaluators14 covered jewel targets14 covered jewel targetsShared surface integrationcheckShared surface integration checkResult, board, validationresult recordResult, board, validation result recordSign-off result recordSign-off result recordMissing-target negative caseMissing-target negative case
Diagram source
flowchart LR A["Engine Room fixture cases"] --> B["Accepted component wrapper"] B --> C["Controller coverage audit"] C --> D["10 staged bundle evaluators"] D --> E["14 covered jewel targets"] C --> F["Shared surface integration check"] B --> G["Result, board, validation result record"] G --> H["Sign-off result record"] A --> I["Missing-target negative case"] I --> C

The shape is a composition proof over declared public bundles. The wrapper asks the staged Engine Room runner to verify target coverage, surface presence, fixture execution, shared-surface visibility, and the missing-target negative case. It writes public result records and an sign-off result record without exporting private source run state or turning the staged demo into launch-scope decision.

Technical Mechanism

src/microcosm_core/organs/engine_room_demo.py is a result record-writing wrapper around src/microcosm_core/engine_room/demo.py. The wrapper loads one or more fixture cases, calls _evaluate_case for each case, and writes four metadata-only artifacts: result, board, validation result record, and optional sign-off result record. The positive case delegates to audit_controller_coverage; the negative case does not compare against a static answer key, but recomputes the actual staged target set and fails only when the fixture names a target outside that set.

audit_controller_coverage is the mechanism that makes the composition claim specific. It enumerates the ten CAPSULES, unions their declared jewel targets against EXPECTED_JEWEL_TARGETS, checks each bundle's owned source, fixture, manifest, paper module, standard, and test surface, optionally runs the staged bundle exercises through run_demo, and reads registry, sign-off, and atlas ids only as visibility evidence. The resulting result record distinguishes staged bundle completion from shared-registry integration and always reports shared_registry_mutated: false.

run_demo is the execution spine below the audit. It imports each staged bundle module, calls the declared evaluator (evaluate_fixture_dir or validate_fixture_dir), records compact per-bundle status, and summarizes the covered jewel targets. A pass therefore means the selected public fixture chain ran for the declared bundle set and covered the expected target lattice; it does not mean the Engine Room set is deployment-posture, privately equivalent, benchmark-complete, or launch-approved.

Governing Doctrine Relations

The generated structured source record binds this page to concept.import_projection_and_drift_control_bundle, mechanism.engine_room_demo.validates_public_engine_room_demo, and three adjacent Engine Room mechanisms for projection leakage, generated-projection drift, and command singleflight. Its governing principle refs are P-1, P-2, P-3, P-5, P-6, P-8, P-9, P-12, and P-15; its axiom refs are AX-1, AX-4, AX-5, AX-7, AX-8, and AX-11. In this module those refs all converge on one rule: composition evidence must be routed through explicit source, fixture, result record, and projection boundaries before it can support a reader claim.

The ten dependency modules are not decorative neighbors. They are the actual staged Engine Room bundle families consumed by the demo runner: Lean/proof-search, metabolism runtime, command singleflight, generated projection drift, derived facts, public projection leak checks, egress self-compliance, navigation fitness, bridge campaign DAGs, and reference knowledge routing. The bundle edge set is therefore a mechanism lattice over those bounded components, not an invitation to generalize beyond their result records.

Named Proof Consumers

  • Fixture wrapper consumer: PYTHONPATH=src ../repo-python -m microcosm_core.components.engine_room_demo run --input fixtures/first_wave/engine_room_demo/input --out /tmp/microcosm-engine-room-demo/fixture --sign-off-out /tmp/microcosm-engine-room-demo/sign-off.json --json consumes build_result, the positive controller-audit fixture, the semantic missing-target negative case, result record writing, metadata-only sign-off output, and the module scope limit.
  • Controller audit consumer: PYTHONPATH=src ../repo-python -m microcosm_core.engine_room.demo audit --root . --json consumes the ten-bundle inventory, 14-target coverage set, staged surface checks, shared-surface visibility readback, and the no-shared-mutation boundary.
  • Staged bundle execution consumer: PYTHONPATH=src ../repo-python -m microcosm_core.engine_room.demo run --root . --json consumes each public bundle evaluator and proves the composition runner can execute the declared Engine Room fixture chain without touching shared registry, sign-off, atlas, or generated projection surfaces.
  • Focused regression consumer: PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_engine_room_demo.py tests/test_engine_room_demo_organ.py -q pins the bundle inventory, CLI JSON output, controller audit, semantic negative case, result record writer, public-relative fixture refs, and private-path redaction floor.
  • It is a read-only result record for the Markdown slice, not permission to hand-edit generated projections.

Reader Evidence Routing

Read expected_jewel_count: 14 and covered_jewel_count: 14 as controller target coverage for the staged Engine Room set. Read capsule_count: 10 and passed_capsule_count: 10 as successful execution of the selected public fixture evaluators.

Read shared_registry_mutated: false as an authority boundary: the staged runner observes registry, sign-off, and atlas visibility, but it does not mutate those shared surfaces. Read shared_integration_status as a visibility result record, not as permission to alter the shared registry from this page.

Read body_in_receipt: false as the public-copy boundary. Result records can expose counts, target ids, fixture refs, stable error codes, scope limits, and omission-safe summaries; they must not copy private source run state, model-output data, raw operator threads, browser UI material, account secrets, or cloned third-party body text.

Prior Art Grounding

The component borrows from integration-testing and CI composition practice: multiple component checks are assembled into one public demo/audit path, negative fixtures prove refusal behavior, and result records summarize execution without upgrading fixture evidence into launch claims. Useful anchors include:

  • IBM's integration testing overview, which frames testing around whether composed modules interact as intended.
  • pytest fixtures, as a common pattern for public synthetic setup and reusable test inputs.
  • GitHub Actions, as a widely used workflow surface for composing build, test, and publish stages with explicit status.

Microcosm borrows the composed-demo and audit-pipeline shape, but keeps the claim at declared public composition only. It is not deployment posture, private-system equivalence, benchmark validation, a security proof, or launch-scope decision.

Public Command

The CLI alias is:

The fixture manifest names one positive case (positive_controller_audit) and one negative case (missing_expected_target_negative) that expects ENGINE_ROOM_EXPECTED_TARGET_MISSING. The expected component result is status: pass, expected_jewel_count: 14, positive_case_count: 1, negative_case_count: 1, and observed_negative_case_count: 1.

The staged composition runner can also be inspected without writing sign-off result records:

PYTHONPATH=src python3 -m microcosm_core.engine_room.demo audit --root . --json
PYTHONPATH=src python3 -m microcosm_core.engine_room.demo run --root . --json

Focused verification from the source repo root:

PYTHONPATH=src ./repo-pytest tests/test_engine_room_demo.py tests/test_engine_room_demo_organ.py -q --basetemp /tmp/microcosm-engine-room-demo
cd microcosm-substrate && PYTHONPATH=src python3 scripts/build_doctrine_projection.py --check-paper-module-corpus

Validation Result record Path

PYTHONPATH=src ./repo-pytest tests/test_engine_room_demo.py tests/test_engine_room_demo_organ.py -q --basetemp=/tmp/microcosm_engine_room_demo_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

This component validates the declared public composition contract only. It is not deployment posture, not private-system equivalence, not a frontier theorem-proving claim, not a complete security proof, not benchmark validation, and not launch-scope decision.

Backend & Governance Engines BundleExercises thirteen copied backend and governance engines over fixed public test cases.5/5

Does This bundle imports the Set-9 source engines as exact copied source bodies plus deterministic public exercises. The exercises inspect thirteen backend, governance, projection, frontend data-shaping, worker-gate, and quality-accounting mechanisms, their source-module digest evidence, and the negative cases that prevent live-authority or result record-only overclaims.

Scope limit It validates only a public source-open bundle and bounded synthetic exercises; it is not live lineage truth, human approval authority, market/news truth, host-state truth, work log truth, external model access, source-file changes, public sharing, launch-scope decision, or private-system equivalence.

Run
microcosm batch9-macro-engines-capsule run --input fixtures/first_wave/batch9_macro_engines_capsule/input --out receipts/first_wave/batch9_macro_engines_capsule --acceptance-out receipts/acceptance/first_wave/batch9_macro_engines_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Set 9 Source Engines Bundle

Purpose

Copying a file into a public bundle proves only that the bytes match. It does not establish that the imported logic still behaves the way it did in the larger system it came from. This component exists to close that gap for thirteen backend, governance, and frontend data-shaping modules. The single question it answers is: do these copied source bodies still compute what they claim to compute, when run against bounded fixtures, here in the public repository?

The unusual part is how it checks. Rather than asserting against pre-baked result files, the component loads each copied module and calls its real functions. It imports system/lib/approval_registry.py and runs decide_approval against a temporary approvals tree to confirm a pre-acquired claim is refused. It imports system/lib/python_documentation_tree.py and runs build_file_entry over written-out Python to read symbols back. It runs the copied mission-graph compiler, the dependency-pin parser, the config-authority registry validator, the host-pressure admission builder, the worker budget guard, and the milestone metric computer, each on its own fixture. The three TypeScript bodies for finance clustering, edge extraction, and WorkAtlas aggregation are parsed for their load-bearing constants and branches, then mirrored deterministically. Each exercise carries both a positive shape and a paired negative case, so the proof moves with source behaviour, not with a static result record.

The reader should treat the result as fixture-bound evidence and nothing more. A passing bundle shows that representative mechanics still match the imported bodies under positive and negative cases. It does not assert live lineage truth, approval authority, real market or news truth, host-state truth, work log truth, external model access, source-file changes, public sharing, or launch-scope decision.

Abstract

Set 9 Source Engines Bundle is a public Microcosm paper module for a source-open, body-import-backed component. The component copies thirteen source source bodies into examples/batch9_macro_engines_capsule/exported_batch9_macro_engines_capsule_bundle/source_modules/, checks their digests and required anchors, then runs deterministic public exercises over fixture data. The result is a reproducible evidence bundle for backend, governance, frontend data-shaping, worker-gate, and quality-accounting mechanics without granting live system authority.

The useful claim is narrow: the copied bodies and public fixtures can show that representative mechanics still behave like the imported source bodies under bounded positive and negative cases. They do not prove live lineage truth, approval authority, market or news truth, host-state truth, work log truth, external model access, source-file changes, public sharing, launch-scope decision, private-system equivalence, or whole-system correctness.

Telos

This module exists to make the Set-9 import legible as technical evidence rather than as generic public copy. A cold reader should be able to answer four questions:

  • Which source bodies were copied, and how are they checked?
  • Which mechanisms are exercised, and which ones are source-body-sensitive?
  • Which result records prove only fixture truth, and which claims remain forbidden?
  • How does this component relate to the Microcosm concept/mechanism/principle lattice?

Mechanism Map

13 copied source bodies13 copied source bodiesfirst_wave fixture inputprobe manifest + 13 negativecasesfirst_wave fixture input probe manifest + 13 negative casesrun / run_batch9_bundlerun / run_batch9_bundleDigest + anchor checkcopied bytes match source,required anchors presentDigest + anchor check copied bytes match source, required anchors presentRe-execute imported logic_run_all_exercisesRe-execute imported logic _run_all_exercises10 Python bodiesimportlib load, call realfunctions(lineage, approval, AST,mission graph,pin drift, config, hostpressure,doctrine, worker gate,milestone)10 Python bodies importlib load, call real functions (lineage, approval, AST, mission graph, pin drift, config, host pressure, doctrine, worker gate, milestone)3 TS-backed bodiesparse constants/branches,mirror(finance, WorkAtlas, edgeextractor)3 TS-backed bodies parse constants/branches, mirror (finance, WorkAtlas, edge extractor)Positive caseexpected shapePositive case expected shapeNegative casee.g. self-loop pruned,preacquired claim refused,forbidden surface blockedNegative case e.g. self-loop pruned, preacquired claim refused, forbidden surface blockedmetadata-only result recordsresult, board, validation,sign-off; body_in_receiptfalsemetadata-only result records result, board, validation, sign-off; body_in_receipt falseScope limitfixture evidence onlyScope limit fixture evidence only

Source refs

13 copied source bodies
source_module_manifest.json
run / run_batch9_bundle
batch9_macro_engines_capsule.py
Diagram source
flowchart TD manifest["source_module_manifest.json 13 copied source bodies"] fixtures["first_wave fixture input probe manifest + 13 negative cases"] runtime["batch9_macro_engines_capsule.py run / run_batch9_bundle"] digest["Digest + anchor check copied bytes match source, required anchors present"] exercise["Re-execute imported logic _run_all_exercises"] py["10 Python bodies importlib load, call real functions (lineage, approval, AST, mission graph, pin drift, config, host pressure, doctrine, worker gate, milestone)"] ts["3 TS-backed bodies parse constants/branches, mirror (finance, WorkAtlas, edge extractor)"] pos["Positive case expected shape"] neg["Negative case e.g. self-loop pruned, preacquired claim refused, forbidden surface blocked"] result records["metadata-only result records result, board, validation, sign-off; body_in_receipt false"] ceiling["Scope limit fixture evidence only"] manifest --> runtime fixtures --> runtime runtime --> digest runtime --> exercise exercise --> py exercise --> ts py --> pos py --> neg ts --> pos ts --> neg digest --> result records pos --> result records neg --> result records result records --> ceiling

The runtime source is src/microcosm_core/organs/batch9_macro_engines_capsule.py. Its load-bearing symbols are EXPECTED_MECHANISMS, EXPECTED_MODULE_IDS, EXPECTED_NEGATIVE_CASES, SOURCE_REQUIRED_ANCHORS, AUTHORITY_CEILING, run, run_batch9_bundle, and result_card.

Set-9 Pipeline

The Set-9 pipeline has four stages.

  1. Source import. source_module_manifest.json declares thirteen copied source bodies, each with source_ref, copied target path, digest equality fields, line and byte counts, material class, and required anchors. The manifest states source_import_class: copied_non_secret_macro_body, body_copied_material_count: 13, and body_in_receipt: false.
  1. Fixture execution. run consumes fixtures/first_wave/batch9_macro_engines_capsule/input, including batch9_macro_engines_capsule_probe_manifest.json plus thirteen negative-case files. It writes the result, board, validation result record, and optional sign-off JSON.
  1. Exported-bundle validation. run_batch9_bundle validates examples/batch9_macro_engines_capsule/exported_batch9_macro_engines_capsule_bundle. The bundle manifest names exported_batch9_macro_engines_capsule_bundle as the input mode, points at source_module_manifest.json, and declares thirteen negative cases.
  1. Result record and ceiling. The public result records may expose refs, digests, anchors, counts, verdicts, negative-case outcomes, and omission evidence. They must not inline copied source bodies or private/live payloads.

Mechanism Set

Mechanism idImported source bodyWhat the public exercise checks
lineage_temporal_provenance_chain_resolversystem/server/lineage.pyParent/truth lineage chain behavior and self-loop pruning.
approval_sign_off_claim_adjudicatorsystem/lib/approval_registry.pyApproval decision shape and claim-conflict enforcement.
python_ast_symbol_index_doc_treesystem/lib/python_documentation_tree.pyPython AST symbol extraction, including async/function/class coverage.
finance_news_dedup_cluster_rankersystem/server/ui/src/lib/financePresentation.tsHeadline fingerprinting, stopword behavior, and duplicate clustering.
mission_graph_topological_compilersystem/server/graph.pyDAG compilation, group closure, upstream dependency walk, and missing-target handling.
dependency_pin_drift_auditortools/dev/check_pin_drift.pyRequirement parsing and drift/missing/unparseable classification.
config_authority_drift_auditsystem/lib/config_authority_registry.pyConfig authority registry validation and mutation-allowed rejection.
heterogeneous_graph_edge_extractorsystem/server/ui/src/pages/RootNavigator.tsxGeneric edge-field map extraction and relation normalization.
work_atlas_cell_histogram_aggregatorsystem/server/ui/src/components/intelligence/WorkAtlas.tsxCell aggregation and the unrouted-only route-reason histogram gate.
host_pressure_admission_decision_gatesystem/lib/admission_consumer.pyAdmission normalization and summary-first blocking behavior.
doctrine_file_enrichment_multihop_joinsystem/server/doctrine_enrichment.pyFile-to-doctrine enrichment join and empty-envelope detection.
worker_job_budget_forbidden_surface_gatesystem/lib/type_a_worker_harness.pyProvider budget and forbidden-surface pre-dispatch gates.
milestone_relative_promotion_quality_accountingsystem/lib/population_lane_metrics.pyMilestone-relative promotion metrics and blocker-to-next-action classification.

Several tests deliberately mutate copied source bodies in a temporary public bundle and refresh the manifest digest. Finance, lineage, approval, AST, mission graph, dependency, config, WorkAtlas, heterogeneous edge, doctrine, worker-gate, host-pressure, and milestone tests prove the exercise result moves with source-body behavior rather than with static result record fixtures alone. Two tamper modes are load-bearing: an unapproved copied-body edit without a manifest digest refresh fails CROWN_JEWEL_SOURCE_DIGEST_MISMATCH, while a body edit with a refreshed digest is only accepted when the required witnesses and semantic exercise still pass. Removing a required witness while refreshing the digest still fails CROWN_JEWEL_SOURCE_ANCHOR_MISSING. The fixture path also resolves through the copied source-module manifest, so a fixture-only or static result record replacement is outside the accepted proof shape.

Copied-Body and Import Authority

The source-module manifest is the body-import authority for this paper module. It proves that the public bundle contains copied bodies and that the runtime can compare copied target digests with expected source digests and required anchors. It does not make the Markdown source authority.

The authority chain is:

  • core/paper_module_capsules.json::paper_modules[73:paper_module.batch9_macro_engines_capsule] is the paper-module bundle source row.
  • paper_modules/batch9_macro_engines_capsule.json is the governed generated instance derived from that bundle.
  • organs/batch9_macro_engines_capsule.json and mechanisms/mechanism.batch9_macro_engines_capsule.validates_public_macro_engines_capsule.json bind the accepted component and mechanism to the runtime, result records, and scope limit.
  • standards/std_microcosm_batch9_macro_engines_capsule.json defines the public standard: exactly thirteen mechanisms, exactly thirteen copied source source modules, metadata-only result records, and forbidden live-authority claims.

Current Partial-Realness Limitations

Set 9 is real system progress because it copies source bodies and verifies source-sensitive behavior in public fixtures. It is still partial-realness, not live authority.

  • The lineage exercise is a public provenance specimen, not live lineage truth.
  • The approval exercise checks adjudication mechanics, not human approval authority.
  • The finance exercise checks headline clustering over synthetic rows, not real market-level conclusions, investment-related actions, or news-truth authority.
  • The host-pressure exercise checks admission-consumer behavior over quoted fixtures, not host-state truth.
  • The WorkAtlas, worker-gate, and milestone exercises validate bounded mechanics, not live work log authority or external model access readiness.
  • The generated Markdown/JSON/site projections remain navigation and reader surfaces; source authority stays in JSON contracts, source manifests, tests, and result records.

Failure Modes

The standard and tests protect against these failure modes:

  • Mechanism count drifts away from thirteen.
  • Source-module count drifts away from thirteen without manifest and test updates.
  • The source manifest stops declaring copied_non_secret_macro_body.
  • A copied source body changes without a matching manifest digest update.
  • A copied source body loses required anchors, even if the manifest digest is refreshed.
  • Runtime exercises stop checking named engine semantics and become result record-only assertions.
  • Negative-case files declare error codes that the semantic evaluator does not actually observe.
  • Result records include copied body text, raw operator transcripts, provider/browser state, account secrets, live market data, private runtime state, or source bodies.
  • Public prose expands fixture evidence into launch, public sharing, provider, source-file changes, live-system, or private-system-equivalence authority.

Evidence Contract

Run these commands from microcosm-substrate/:

The fixture command proves the public fixture path. The bundle command proves the exported bundle path. The focused test suite covers exact-copy source imports, source-sensitive behavior shifts, copied-body digest mismatch blocking, source-import-class perturbation, required-witness removal with a refreshed digest, semantic negative cases, bundle validation, and metadata-only command cards. The doctrine projection checks prove only that the bundle-backed generated instance remains fresh for the current corpus. Rank saturation, rerank, and projection inheritance remain downstream routing work; this paper module does not apply or claim those projection mutations.

Reader Evidence Routing

Use this order when auditing the module:

  1. Read standards/std_microcosm_batch9_macro_engines_capsule.json for the governing standard and scope boundaries.
  2. Read src/microcosm_core/organs/batch9_macro_engines_capsule.py for expected mechanisms, expected modules, required source anchors, negative-case semantics, and scope limit.
  3. Read examples/batch9_macro_engines_capsule/exported_batch9_macro_engines_capsule_bundle/source_module_manifest.json for copied-body authority.
  4. Run the fixture and bundle validators, then the focused tests.
  5. Treat result records as metadata-only evidence summaries, not as copied body storage or live-system proof.

Prior Art Grounding

This bundle imports copied source engine bodies and exercises them over fixtures. It follows the characterization, or golden-master, testing tradition (Feathers, Working Effectively with Legacy Code), which pins existing behaviour with deterministic fixtures before trusting it. Microcosm borrows the pin-then-exercise shape; the result is fixture-bound import evidence, not lineage truth, human-approval authority, or market-level conclusions.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

PYTHONPATH=src python3 -m pytest tests/test_batch9_macro_engines_capsule.py -q
PYTHONPATH=src python3 scripts/build_doctrine_projection.py --check-paper-module-corpus

These are reader-verifiable evidence only and do not include launch operations, external model access, source-file changes, or whole-system correctness.

Scope boundary

Scope limit

This module may claim fixture-bound evidence that the component ran over public synthetic inputs and produced the result records and projections described above, reproduced by the validation result records named on this page.

It may not claim more than its bundle scope limit allows: Fixture-bound public source-body import and deterministic exercise evidence only; no live lineage truth, human approval authority, real market/news truth, host-state truth, work log truth, external model access, source-file changes, public sharing, launch-scope decision, or private-system equivalence.

Governance & Compiler Mechanisms BundleChecks thirteen copied governance and compiler routines against the code they were copied from.5/5

Does This bundle imports the Set-10 governance, compiler, launch, finance, dependency, DAG, table, reference, and recent-change source mechanisms as source-open system. It exposes for inspection the exact source-module digest evidence, the source-faithful public refactor for public sharing-manifest selector checks, the deterministic exercises, and the planted negative cases without exposing non-public paths, copied body text in result records, live ledgers, or launch-scope decision.

Scope limit It validates only the imported source body. It does not claim source authority, private-system equivalence, launch, public sharing, live ledger control, or source-file changes.

Run
microcosm batch10-governance-compilers-capsule run --input fixtures/first_wave/batch10_governance_compilers_capsule/input --out receipts/first_wave/batch10_governance_compilers_capsule --acceptance-out receipts/acceptance/first_wave/batch10_governance_compilers_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Set 10 Governance And Compilers Bundle

Purpose

This bundle answers one question: when the wider system claims that a governance gate, a compiler, or a launch check behaves correctly, can a cold reader confirm that claim from copied source and a re-run, rather than taking the claim on trust? It collects fourteen mechanisms that already exist in the main system, copies their source bodies into the public bundle, and re-runs a small, source-faithful port of each one against controlled inputs.

The mechanisms span the work they were drawn from: a mutation gate that reads the latest user message and blocks file writes when the intent is diagnostic; an observe/apply compiler that turns an artifact into an apply plan and refuses malformed input; a reviewer gauntlet that checks a public proof bundle from several reader personas; launch-blocker triage; a public sharing path-contract check; result record-reuse staleness; a no-lookahead finance horizon; a session dependency wave; claim-conflict detection; role-aware blocking in a task graph; and three frontend helpers for table shaping, reference grouping, and recent-change coalescing.

What is unusual is the stance towards its own fixtures. The negative-case files on disk hold only a label and an expected error code. The bundle does not treat that error code as proof of anything. For each negative case it recomputes the outcome itself, in code, and compares the computed result against the expectation. A fixture that merely declares the right error code, without the ported logic actually producing it, is flagged rather than passed. The point is to stop a test from grading itself green by assertion.

Route Card

  • Component id: batch10_governance_compilers_capsule
  • JSON bundle authority: core/paper_module_capsules.json::paper_module.batch10_governance_compilers_capsule
  • Accepted-component evidence class: verified_macro_body_import
  • Runtime source: src/microcosm_core/organs/batch10_governance_compilers_capsule.py
  • Fixture input: fixtures/first_wave/batch10_governance_compilers_capsule/input
  • Runtime bundle: examples/batch10_governance_compilers_capsule/exported_batch10_governance_compilers_capsule_bundle
  • Exact-copy authority: the bundle source_module_manifest.json plus copied source modules; refresh through macro_projection_import_protocol, not by hand.

This Microcosm component imports and exercises Set-10 source system for governed mutation, observe/apply compilation, public-proof review, launch blocker triage, public sharing path contracts, result record reuse, no-lookahead horizons, session-wave execution, claim conflict wait tax, role-aware DAG blocking, frontend data shaping, reference grouping, and recent-change coalescing.

The bundle carries exact source source snapshots where safe. publication_manifest_selector_contract_verifier is represented as a source-faithful public refactor because the source source contains a private home-path example. weighted_lane_width_apportionment_solver is recorded as a binding repair deferred to the Set-9 RootNavigator body, not as a fresh Set-10 import.

Integrity hardening: negative-case fixture files are labels and stable-code rows only. The result record's exercise.integrity_matrix is the verdict surface: each Set-10 mechanism records source relation, positive computed output, negative input shape, negative computed output, scope limit, and whether the result was computed by the bundle evaluator. A fixture-supplied error_codes row is never enough to prove refusal behavior.

Shape

The source row is core/paper_module_capsules.json::paper_modules[75:paper_module.batch10_governance_compilers_capsule]; the generated instance is paper_modules/batch10_governance_compilers_capsule.json; and the runtime source locus is src/microcosm_core/organs/batch10_governance_compilers_capsule.py. The specific standard is standards/std_microcosm_batch10_governance_compilers_capsule.json, with Microcosm-wide coverage and entry boundaries governed by std_microcosm.

seedsbounds prosenames laws and source authoritycites code locuscomputes integrity matrix and result recordspublic inputs, exact copies, declared refactormanifest and source bundle validatederives edgesnavigation onlyresult record evidence remains belowenforces public/private and launch boundarymust not outrankJSON bundle source rowJSON bundle source rowGenerated JSON instanceGenerated JSON instanceMarkdownStandardsstd_microcosmStandards std_microcosmRuntime/source lociexercise 14 mechanism portsresolve source evidence permechanismrecompute each negative caseRuntime/source loci exercise 14 mechanism ports resolve source evidence per mechanism recompute each negative caseFixtures and source bundlefixtures/first_wave/.../input(labels + expected codes)exported bundle: 13 copiedsource modulesFixtures and source bundle fixtures/first_wave/.../input (labels + expected codes) exported bundle: 13 copied source modulesTests and result recordsTests and result recordsGenerated navigationprojectionsGenerated navigation projectionsScope limitfixture-bound publicsource-open evidence onlyno live ledger truth,source-file changes, publicsharing, launch, provider,private-system, benchmark, ormarket authorityScope limit fixture-bound public source-open evidence only no live ledger truth, source-file changes, public sharing, launch, provider, private-system, benchmark, or market authority

Source refs

JSON bundle source row
core/paper_module_capsules.jsonpaper_module.batch10_governance_compilers_capsule
Generated JSON instance
paper_modules/batch10_governance_compilers_capsule.json
paper_modules/batch10_governance_compilers_capsule.md
Standards std_microcosm
std_microcosm_batch10_governance_compilers_capsule
Runtime/source loci exercise 14 mechanism ports resolve source evidence per mechanism recompute each negative case
batch10_governance_compilers_capsule.pyflag fixture_verdict_echo_risk
Fixtures and source bundle fixtures/first_wave/.../input (labels + expected codes) exported bundle: 13 copied source modules
source_module_manifest.json
Tests and result records
tests/test_batch10_governance_compilers_capsule.pyreceipts/runtime_shell/demo_project/organs/batch10_governance_compilers_capsule
Diagram source
flowchart LR Bundle["JSON bundle source row core/paper_module_capsules.json paper_module.batch10_governance_compilers_capsule"] Instance["Generated JSON instance paper_modules/batch10_governance_compilers_capsule.json"] Markdown["Markdown reader projection paper_modules/batch10_governance_compilers_capsule.md"] Standard["Standards std_microcosm_batch10_governance_compilers_capsule std_microcosm"] Runtime["Runtime/source loci batch10_governance_compilers_capsule.py exercise 14 mechanism ports resolve source evidence per mechanism recompute each negative case flag fixture_verdict_echo_risk"] Fixtures["Fixtures and source bundle fixtures/first_wave/.../input (labels + expected codes) exported bundle: 13 copied source modules source_module_manifest.json"] Tests["Tests and result records tests/test_batch10_governance_compilers_capsule.py result records/runtime_shell/demo_project/components/batch10_governance_compilers_capsule"] Projections["Generated navigation projections Mermaid: available_from_capsule_edges Atlas: linked_from_capsule_edges"] Ceiling["Scope limit fixture-bound public source-open evidence only no live ledger truth, source-file changes, public sharing, launch, provider, private-system, benchmark, or market authority"] Bundle -->|seeds| Instance Bundle -->|bounds prose| Markdown Bundle -->|names laws and source authority| Standard Bundle -->|cites code locus| Runtime Runtime -->|computes integrity matrix and result records| Tests Fixtures -->|public inputs, exact copies, declared refactor| Runtime Fixtures -->|manifest and source bundle validate| Tests Instance -->|derives edges| Projections Projections -->|navigation only| Markdown Tests -->|result record evidence remains below| Ceiling Standard -->|enforces public/private and launch boundary| Ceiling Markdown -->|must not outrank| Bundle

The bundle makes the module actual by binding five reader questions to typed authority surfaces:

  • What is the source of record? The source record and generated JSON instance, not this Markdown file and not generated Mermaid or Atlas output.
  • What is being exercised? The accepted batch10_governance_compilers_capsule component, the mechanism.batch10_governance_compilers_capsule.validates_public_governance_compilers_capsule mechanism, and the concept.import_projection_and_drift_control_bundle concept edge named by the bundle.
  • Which runtime and source artifacts matter? The component module computes the integrity matrix, negative-case verdicts, source evidence, fixture run, bundle validation, result card, and AUTHORITY_CEILING; the exported bundle carries source_module_manifest.json, copied source modules, and the declared public refactor for the private-path-bearing public sharing manifest selector body.
  • Which result records and tests are binding? The focused test file verifies the fixture run, bundle validation, digest mismatch rejection, private-body omission, negative-case semantics, source-evidence classifications, source helper parity, and reviewer-gauntlet behavior; the result record directory under receipts/runtime_shell/demo_project/organs/batch10_governance_compilers_capsule holds the runtime shell validation result, board, and validation result record.
  • What is the honest ceiling? The module can claim fixture-bound public source-open import/refactor evidence, deterministic exercise evidence, integrity-matrix verdicts, metadata-only result records, and validation result records. It cannot claim live work log truth, live work log truth, source-file changes, publishing-scope decision, launch-scope decision, external model access, private root equivalence, neutral benchmark evidence, market advice, deployment posture, or whole-system correctness.

Bundle-Bound Reader Shape

The JSON bundle binds this paper module to one accepted subject: the batch10_governance_compilers_capsule component. The executable proof locus is src/microcosm_core/organs/batch10_governance_compilers_capsule.py, especially _build_integrity_matrix, _source_evidence, _evaluate, run, run_batch10_governance_compilers_bundle, result_card, EXPECTED_MECHANISMS, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING.

The bundle keeps the mechanism and concept layer intentionally narrow: it names the resolving governance/compiler mechanism subject and the concept.import_projection_and_drift_control_bundle concept, while additional concept or mechanism edges stay residual until resolving Microcosm rows exist. Its law edges are bounded to content-addressed reuse, provenance, freshness, and projection-below-source rules: P-2, P-5, P-9, P-15, AX-4, AX-8, AX-10, and AX-11. Its sibling paper-module dependencies are macro_projection_import_protocol, batch10_live_source_drift_capsule, and batch9_macro_engines_capsule.

If a projection disagrees with the bundle or refreshed source-open bundle, refresh the projection; do not edit generated output by hand.

How it works

The run takes a public input directory, validates the source-module manifest, and exercises each of the fourteen mechanisms against inputs the evaluator constructs itself. _build_integrity_matrix then writes one row per mechanism. Each row records the source evidence for that mechanism, the positive computed output, the attached negative cases with their computed outputs, the scope limit, and a current_action of keep, harden, or block.

Source evidence is resolved per mechanism by _source_evidence. A mechanism's named source reference is looked up in the manifest. If the body was copied exactly, the row carries the copy's digest status and anchor-match count. If the body could not be copied verbatim, the row instead names a declared source-faithful public refactor and records the original source digest. Two mechanisms are honest about not being plain copies. publication_manifest_selector_contract_verifier is a public refactor, because the source source carried a private home-path example that cannot ship. weighted_lane_width_apportionment_binding_repair is recorded as an under-bound repair deferred to the Set-9 RootNavigator body, so it is held as a block rather than presented as a fresh Set-10 import.

The negative cases are handled the same way. For each case, _compute_negative_case_probe runs the ported logic over the case's declared input and reads the result at a named path. For example, the mutation case feeds a diagnostic message and confirms prohibit_file_writes is true; the finance case feeds an unparseable horizon and confirms it is rejected; the public sharing case feeds a non-public paths against a hard-exclude rule and confirms it is caught. A row counts as proven only when the computed value matches the expectation. If any negative case lacks computed evidence, the summary raises fixture_verdict_echo_risk, and the run is blocked. The bundle also requires exactly thirteen copied source modules, so a thinned bundle fails rather than passes quietly.

Prior Art Grounding

The component is grounded in policy-as-code, admission-control, and supply-chain assurance patterns: compile rules into deterministic checks, reject unsupported actions before they mutate state, and preserve provenance for the decision. Relevant anchors include:

  • Open Policy Agent, which decouples policy decisions from enforcement and evaluates structured input against machine-readable rules.
  • Kubernetes validating admission policies, which can block, warn, or audit non-compliant API requests before admission.
  • SLSA and OpenSSF Scorecard, which represent the broader software-supply-chain pattern of typed assurance levels, checks, and provenance.

Microcosm borrows the compiler/gate shape for governed mutation, public sharing path contracts, blocker triage, result record reuse, and claim-conflict accounting. The bundle remains fixture-bound evidence over copied or refactored source system; it is not live work log truth, source-file changes, publishing-scope decision, or investment-related actions.

Reader Evidence Routing

A cold reader should inspect the evidence in this order:

  1. Open the JSON source record to confirm source authority, subject ids, dependency ids, principle and axiom refs, code locus, Mermaid status, Atlas status, and the absence of unresolved selective relations.
  2. Run the focused component test to prove the public fixture still computes the integrity matrix and observes the required negative cases.
  3. Run the exported bundle validator when copied source digests, declared public refactors, metadata-only result records, or source-evidence rows are the question.
  4. Treat generated JSON, Mermaid, Atlas, and coverage as projection evidence only; if they drift, refresh them through the doctrine-lattice builder.
  5. Use the result record floor to verify source relations, positive and negative computed outputs, scope limits, and metadata-only result record payloads.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

The fixture command writes the governance/compiler integrity-matrix result record and sign-off JSON. The bundle command validates copied or source-faithful source system, source evidence, positive and negative exercise rows, metadata-only result records, and scope limit fields. The focused test verifies the mechanism matrix, negative floor, bundle validation, and scope limit.

This result record path is reader-verifiable evidence only. It does not establish live work log truth, live work log truth, source-file changes, publishing-scope decision, launch-scope decision, external model access, neutral benchmark evidence, private-system equivalence, or investment-related actions.

Scope boundary

Scope limit

This module may claim public fixture evidence that the copied or declared governance/compiler source system produced source-evidence rows, computed positive and negative exercise rows, integrity-matrix verdicts, metadata-only result records, and validation result records with explicit scope limits.

This module may not claim live work log truth, live work log truth, source-file changes, publishing-scope decision, launch-scope decision, external model access, neutral benchmark evidence, private-system equivalence, investment-related actions, deployment posture, or whole-system correctness.

Scope limit

This is not live work log truth, not live work log truth, not source-file changes, not public sharing or launch-scope decision, not external model access, not neutral benchmark evidence, not private-system equivalence, and not investment-related actions.

The useful claim is narrower: over the public fixtures and refreshed source-open bundle, the component shows that the Set-10 governance/compiler mechanisms have copied or declared source evidence, computed positive and negative exercise rows, and metadata-only result records with explicit scope limits.

Saturation Engines BundleVerifies twelve copied engine routines and computes each failure probe from inputs, not echoes.5/5

Does This bundle imports twelve Set-11 saturation-engine mechanisms as source-open system. It exposes for inspection exact source-module digest evidence, source-faithful computed exercises, and computed negative-case probes without exposing non-public paths, copied body text in result records, live runtime state, source-file changes, public sharing, or launch-scope decision.

Scope limit It validates only the imported source body. It does not claim source authority, private-system equivalence, launch, public sharing, live runtime control, or source-file changes.

Run
microcosm batch11-saturation-engines-capsule run --input fixtures/first_wave/batch11_saturation_engines_capsule/input --out receipts/first_wave/batch11_saturation_engines_capsule --acceptance-out receipts/acceptance/first_wave/batch11_saturation_engines_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Set 11 Saturation Engines Bundle

Purpose

batch11_saturation_engines_capsule is a Microcosm component for the Set-11 saturation pass. It takes thirteen unrelated pieces of internal machinery, copies their source bodies into a public bundle, and re-runs each one against small synthetic fixtures so a reader can see the logic behave rather than take a claim on trust. The thirteen targets are deliberately mixed:

  • run affinity session scoring
  • calculator cluster insight derivation
  • std_python delta ratchet gating
  • exogenous navigation ladder grading
  • portability gate supersession rollup
  • shard browse context-priority sectioning
  • holographic research evidence selection
  • projection secret scanning
  • stockgrid flow multisource merge and unit normalization
  • source regime bucketing and z-score board construction
  • frontend navigation wayfinding
  • agent session diagnostic lenses
  • demo-take story coverage auditing

The single question the bundle answers is narrow: for each of these mechanisms, does the imported source actually compute the guard it claims to, on inputs designed to fail? It is a saturation pass because the targets share nothing except that pattern. They are a route ranker, a few financial-data normalisers, a navigation grader, a secret scanner, a graph wayfinder, and so on, swept up together so a reviewer can audit a broad slice of the codebase from one place.

The part worth noticing is how a negative case is treated. A fixture file named ..._stale_terminal_rejected is only a label. The bundle never lets that label stand in for a result. It re-runs the real function on the fixture's own probe_input, computes whether the guard fired, and refuses to mark the case verified unless the mechanism's own exercise and the independent probe both agree. A fixture that asserts a failure it cannot demonstrate is flagged, not counted. That guard against self-congratulating fixtures is the reason the page exists.

How it works

The run loop is the same for every target. The bundle first imports the copied source bundle and checks each module against the recorded source digest, line count, and a handful of required provenance anchors, so a drifted or partial copy is caught before any logic runs. It then exercises all thirteen mechanisms in a fixed order, and any reordering, blocked exercise, or missing module fails the run.

Each mechanism's exercise feeds an integrity matrix row. A row pairs the mechanism's own computed output with an independently computed fixture probe and a binding disposition that records how the mechanism relates to the rest of the system: a new import, an already-bound gate the bundle is only re-checking, or an under-bound path it is extending. The two computed values must agree. The matrix marks a row's negative result verified only when the mechanism exercise and the fixture probe both come out true, and it sets fixture_verdict_echo_risk on any row where they do not. A non-zero echo-risk count is a finding that blocks the whole run.

Two short examples show what the probes actually compute. For run affinity, the probe builds a recommendation over candidate runs and confirms that a stale terminal run, even when made sticky and feed-rich, is not the one selected. For projection secret scanning, the probe runs the redaction patterns over a file carrying a synthetic key shape and a private ledger path and confirms both are blocked. The fixtures are synthetic and the key shapes are deliberate test strings, never live material.

The failure mode all of this guards against is the quiet pass: a fixture whose filename promises a rejection while the code underneath was never exercised, or was exercised and did not reject. By recomputing the guard from the fixture's own input and refusing to count a label it cannot reproduce, the bundle keeps the negative cases honest. The result records carry refs, digests, counts, and the computed verdicts; the copied bodies stay in the bundle's source_modules tree and are never inlined.

Shape

This module's shape is a reader map over source-backed artifacts, not a new authority layer. The source record in core/paper_module_capsules.json is the source of record for subjects, code loci, doctrine refs, dependency edges, and projection status; paper_modules/batch11_saturation_engines_capsule.json is the governed JSON parity seed; this Markdown only narrates the proof boundary.

seeds subjects, dependencies, code locus, projection statusgoverned bycites resolved runtime/source locusrequires fixture and result record contractrequires copied/source-faithful public bundleexercisesvalidates exact-copy/source-faithful evidencewrites metadata-only result and validation result recordschecks runtime, bundle, corpus, projection freshnessgenerated projection edge statusbounded evidence, not launch-scope decisionprojection, source-linked onlypaper_modules[76:paper_module.batch11_saturation_engines_bundle]source basis: source recordpaper_modules[76:paper_module.batch11_saturation_engines_bundle] source basis: source recordgoverned JSON instancegoverned JSON instanceactive public runtimestandardboundary: not livenavigation/ledger/market/secret authorityactive public runtime standard boundary: not live navigation/ledger/market/secret authorityrun, validate-bundle,result_card, scope_limitrun, validate-bundle, result_card, scope_limitpublic mechanism andnegative-case probespublic mechanism and negative-case probessource_module_manifest.json:12 copied/refactored publicsource modulessource_module_manifest.json: 12 copied/refactored public source modulesTestsTestsstatus: pass; accepted: true;body_in_receipt: falsestatus: pass; accepted: true; body_in_receipt: falseatlas/doctrine_lattice_graph.mmd anddoctrine_lattice_projection.jsonatlas/doctrine_lattice_graph.mmd and doctrine_lattice_projection.jsonScope limitfixture-bound source-bodyimport, source-faithfulpublic ports,computed negative probes,metadata-only result recordsonlyScope limit fixture-bound source-body import, source-faithful public ports, computed negative probes, metadata-only result records only

Source refs

paper_modules[76:paper_module.batch11_saturation_engines_bundle] source basis: source record
core/paper_module_capsules.json
governed JSON instance
paper_modules/batch11_saturation_engines_capsule.jsonmarkdown: legacy_import_projection_until_roundtrip_builder
active public runtime standard boundary: not live navigation/ledger/market/secret authority
standards/std_microcosm_batch11_saturation_engines_capsule.json
run, validate-bundle, result_card, scope_limit
src/microcosm_core/organs/batch11_saturation_engines_capsule.py
public mechanism and negative-case probes
fixtures/first_wave/batch11_saturation_engines_capsule/input
source_module_manifest.json: 12 copied/refactored public source modules
examples/batch11_saturation_engines_capsule/exported_batch11_saturation_engines_capsule_bundle
Tests
tests/test_batch11_saturation_engines_capsule.pyscripts/build_doctrine_projection.py --check-paper-module-corpusscripts/build_doctrine_projection.py --check
status: pass; accepted: true; body_in_receipt: false
receipts/first_wave/batch11_saturation_engines_capsule/*receipts/acceptance/first_wave/batch11_saturation_engines_capsule_fixture_acceptance.json
Diagram source
flowchart TD bundle["core/paper_module_capsules.json paper_modules[76:paper_module.batch11_saturation_engines_capsule] source basis: source record"] instance["paper_modules/batch11_saturation_engines_capsule.json governed JSON instance markdown: legacy_import_projection_until_roundtrip_builder"] standard["standards/std_microcosm_batch11_saturation_engines_capsule.json active public runtime standard boundary: not live navigation/ledger/market/secret authority"] runtime["src/microcosm_core/components/batch11_saturation_engines_capsule.py run, validate-bundle, result_card, scope_limit"] fixture["fixtures/first_wave/batch11_saturation_engines_capsule/input public mechanism and negative-case probes"] bundle["examples/batch11_saturation_engines_capsule/exported_batch11_saturation_engines_capsule_bundle source_module_manifest.json: 12 copied/refactored public source modules"] tests["tests/test_batch11_saturation_engines_capsule.py scripts/build_doctrine_projection.py --check-paper-module-corpus scripts/build_doctrine_projection.py --check"] result records["result records/first_wave/batch11_saturation_engines_capsule/* result records/sign-off/first_wave/batch11_saturation_engines_capsule_fixture_acceptance.json status: pass; accepted: true; body_in_receipt: false"] atlas["atlas/doctrine_lattice_graph.mmd and doctrine_lattice_projection.json Mermaid: available_from_capsule_edges Atlas: linked_from_capsule_edges"] ceiling["Scope limit fixture-bound source-body import, source-faithful public ports, computed negative probes, metadata-only result records only"] bundle -->|seeds subjects, dependencies, code locus, projection status| instance bundle -->|governed by| standard instance -->|cites resolved runtime/source locus| runtime standard -->|requires fixture and result record contract| fixture standard -->|requires copied/source-faithful public bundle| bundle runtime -->|exercises| fixture runtime -->|validates exact-copy/source-faithful evidence| bundle runtime -->|writes metadata-only result and validation result records| result records tests -->|checks runtime, bundle, corpus, projection freshness| result records instance -->|generated projection edge status| atlas result records -->|bounded evidence, not launch-scope decision| ceiling atlas -->|projection, source-linked only| ceiling

The public/private and launch boundary stays narrow: the fixture inputs, source refs, digest rows, computed values, negative-probe labels, sign-off status, and metadata-only result records are evidence for the standalone microcosm-substrate bundle. They do not authorize live work log claims, navigation decisions, market or investment conclusions, complete secret detection, transcript or video authority, source-file changes, external model access, publishing-scope decision, launch-scope decision, private-system equivalence, generated-lattice source authority, or whole-system correctness.

Reader Evidence Routing

Read this module through the fixture, exported-bundle, focused-test, and generated-row surfaces. The fixture and bundle commands prove public source-body import discipline: exact copied-source digests, source-faithful public ports, computed negative-probe values, and metadata-only result cards. The structured source record proves that the paper module is bundle-backed and that Mermaid and Atlas availability come from bundle edges rather than prose.

The mixed Set-11 target list remains evidence routing, not an authority expansion. The reader should treat each target as a public fixture exercise inside the accepted saturation-engines component, not as live work log truth, complete secret detection, live market data, investment-related actions, raw transcript authority, video capture, publishing-scope decision, or launch-scope decision.

Prior Art Grounding

The component borrows from overload management, backpressure, and observability practice: systems need explicit signals for saturation, queue pressure, freshness, and recoverability instead of relying on a single success/failure bit. Relevant anchors include:

Microcosm borrows the saturation-signal and pressure-accounting pattern across its mixed Set-11 targets: route affinity, delta gates, shard browse priorities, evidence selection, secret scanning, market boards, wayfinding, and diagnostic lenses. The bundle computes public fixture verdicts; it is not live work log truth, complete secret detection, live market data, or launch-scope decision.

Binding Dispositions

Set-11 contained a mixed target set. The bundle records the distinction explicitly:

  • New or under-bound imports: run affinity, calculator insight, exogenous nav grading, shard browsing, holographic evidence selection, quant stockgrid, source regime board, frontend wayfinding, and session diagnostics.
  • Already-bound validations: projection secret scan and portability gate are covered by the engine-room public projection leak gate family; demo-take coverage is already represented by the Set-7 demo-take component. Set-11 validates the relevant scoring or gate behavior rather than claiming a standalone authority surface.
  • Partial existing system: the std_python ratchet path had existing assay coverage; the Set-11 bundle adds a bounded delta-regression witness.

Shared Wiring Status

The component-owned system can validate independently. Shared registry, atlas, sign-off, Components, ARCHITECTURE, preflight, and package wiring must be serialized behind the live shared Microcosm binding owner before this component is promoted to whole-surface discoverability.

Validation Result record Path

Negative-case fixture files are inputs, not verdicts. Each file carries a public probe_input; the component computes the corresponding fixture probe and records fixture_probe_input_digest, fixture_computed_value, and mechanism_computed_value in the integrity matrix before counting a negative case as verified.

Reader-verifiable commands, run from the microcosm-substrate/ public root:

The fixture command writes the Set-11 saturation-engine result record and sign-off JSON. The bundle command validates copied source-source digests, source-faithful public port evidence, computed negative-probe evidence, and metadata-only cards. The focused test covers the runtime component, exported bundle shape, exact-copy imports, private body omission, stable negative cases, and tier-B mechanism output coverage. The corpus and projection checks prove only that the generated paper-module instance remains fresh for this bundle-backed Markdown state.

This result record path is public fixture evidence only. It does not establish live work log truth, navigation authority, complete secret detection, live market data, investment-related actions, raw transcript authority, video capture, source-file changes, publishing-scope decision, launch-scope decision, external model access, or whole-system correctness.

Scope boundary

Boundary

This bundle is not live work log truth, navigation authority, complete secret detection, live market data, investment-related actions, raw transcript authority, video capture, publishing-scope decision, or launch-scope decision. Result records expose only refs, digests, counts, computed verdicts, public negative-case probe digests, and omission result records; copied source source bodies remain under the public bundle's source_modules tree.

Scope limit

This bundle is fixture-bound public source-body import, source-faithful public port evidence, computed negative-probe evidence, and metadata-only result record evidence only. It does not establish live work log truth, navigation authority, complete secret detection, live market data, investment-related actions, raw transcript authority, video capture, source-file changes, publishing-scope decision, launch-scope decision, external model access, private-system equivalence, or whole-system correctness.

Scope limit

Those result records do not prove live work log truth, navigation authority, complete secret detection, live market data, investment-related actions, raw transcript authority, video capture, external model access, source-file changes, publishing-scope decision, launch-scope decision, private-system equivalence, or whole-system correctness.

Tool Server Pressure InventoryFlags detached helper processes and launch pressure from synthetic rows, not live hosts.5/5

Does This component imports the source helper-process pressure inventory pattern as a public-safe, read-only validator. Over synthetic ps-shaped process rows it surfaces detached helper candidates, active-owner descendants, keep runtimes, and owner-launch pressure groups without reading live host processes or exposing command bodies.

Scope limit validates declared public helper-process pressure inventory contract only; no live process reads, process signalling, host mutation, launch-scope decision, external model access, non-public data equivalence, or whole-system correctness

Run
microcosm tool-server-pressure-inventory run --input fixtures/first_wave/tool_server_pressure_inventory/input --out receipts/first_wave/tool_server_pressure_inventory --acceptance-out receipts/acceptance/first_wave/tool_server_pressure_inventory_fixture_acceptance.json

EvidenceContract validatorevidence 5/5Import validation

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Tool Server Pressure Inventory

tool_server_pressure_inventory is the public read-only import of the source helper-process pressure inventory pattern from tools/meta/control/orphan_reaper.py. It validates the classifier without exposing live host state: fixtures inject a synthetic ps-shaped process table, synthetic helper-kind policy rows, and a synthetic owner-status taxonomy.

The accepted component keeps the load-bearing mechanism:

  • parse helper processes from ps-shaped rows
  • classify helper kind and owner status
  • distinguish detached orphan candidates from active-owner descendants
  • emit launch requests for over-budget active owners
  • keep all rows digest-only through command_hash

The exported bundle carries a source module manifest plus a source-faithful refactor body under source_modules/tools/meta/control/. That manifest records the source source ref, target digest, source digest, relation, material class, and required anchors. Result records carry refs, hashes, counts, and verdicts only; they do not inline the copied/refactored body.

The component rejects seven boundary failures:

  • active-owner descendants marked as safe-close candidates
  • unknown-owner processes marked as safe-close candidates
  • detached processes younger than the minimum age marked safe-close
  • process-signal results on the public surface
  • live command bodies instead of digest-only rows
  • absolute host paths
  • active-owner launch requests that overclaim kill or termination

Purpose

Long-running agent sessions leave helper processes behind: MCP servers, dev servers, keepalives. Over time these accumulate and the host slows down. The obvious fix is a reaper that walks the process table and kills stale helpers, and the source tool this component is ported from does exactly that. But a reaper is dangerous. The hard case is telling a genuinely abandoned process apart from a helper that a live session is still using. Kill the wrong one and you break the work in flight.

This component answers a single question: given a process table, which helper processes are safe to close, and which must be left alone because a live owner still depends on them? It does so by reconstructing each process's owner chain. A helper whose parent is launchd (ppid == 1) has been detached from any session and is a candidate. A helper that still traces back through a live agent session is not. The decision is deliberately narrow: a process is a safe-close candidate only when it is detached, its kind is on an allowlist, and it has been idle past a minimum age. Everything else routes to "needs an owner check" or "keep".

What is unusual is the second half of the design. When an active owner is over its helper budget, the component does not propose a kill. It emits a launch *request*: a row that asks the owning session to launch or reuse its own lease. The inventory is explicitly not a kill list. The central invariant, enforced by an audit pass over the component's own output, is that an active-owner descendant can never become a safe-close candidate.

The public version keeps that classifier and that invariant but removes every actuator. There is no os.kill, no signal, no live ps call. Input is synthetic process text from a fixture, rows carry a command_hash rather than a command line, and a redaction guard rejects any fixture that smuggles an absolute path, a live command body, or a process-signal claim onto the public surface. The result is the safety reasoning of a reaper presented as a read-only validator, with the part that could actually harm a host left out.

Shape

Synthetic pressure fixtureprocess_table,pressure_policy,owner_classesSynthetic pressure fixture process_table, pressure_policy, owner_classesClassify helper kind,walk owner chain (up to 8hops),hash command to command_hashClassify helper kind, walk owner chain (up to 8 hops), hash command to command_hashOwner status?Owner status?Detached orphanppid == 1Detached orphan ppid == 1Active owner or keep runtimeActive owner or keep runtimecandidate_safe_closeonly if allowlistedand age >= mincandidate_safe_close only if allowlisted and age >= minrequires_owner_checkor keeprequires_owner_check or keepOver-budget owner:launch REQUEST,never a killOver-budget owner: launch REQUEST, never a killBoundary failuresunsafe safe-close, commandleak,process signal, absolutepath,launch overclaimBoundary failures unsafe safe-close, command leak, process signal, absolute path, launch overclaimSource manifestpublic refactor digest +anchorsSource manifest public refactor digest + anchorsmetadata-only result recordsresult, board, validation,fixture sign-offmetadata-only result records result, board, validation, fixture sign-off
Diagram source
flowchart LR Fixture["Synthetic pressure fixture process_table, pressure_policy, owner_classes"] Classifier["Classify helper kind, walk owner chain (up to 8 hops), hash command to command_hash"] Owner{"Owner status?"} Detached["Detached orphan ppid == 1"] Keep["Active owner or keep runtime"] SafeClose["candidate_safe_close only if allowlisted and age >= min"] Check["requires_owner_check or keep"] launch["Over-budget owner: launch REQUEST, never a kill"] Negative["Boundary failures unsafe safe-close, command leak, process signal, absolute path, launch overclaim"] Source["Source manifest public refactor digest + anchors"] Result records["metadata-only result records result, board, validation, fixture sign-off"] Fixture --> Classifier Classifier --> Owner Owner --> Detached Owner --> Keep Detached --> SafeClose Detached --> Check Keep --> Check Keep --> launch Classifier --> Negative SafeClose --> Result records Check --> Result records launch --> Result records Negative --> Result records Source --> Result records

Technical Mechanism

The runtime mechanism is an actuatorless port of the read-only pressure path in tools/meta/control/orphan_reaper.py. The component receives injected synthetic ps_text plus pressure_policy.json and owner_classes.json; it never shells out to ps, imports process-control modules, or sends signals. _parse_process_rows normalizes process rows, _process_kind maps command tokens to helper kinds, and _owner_status_for_process walks parent links up to eight hops to separate launchd_detached helpers from active owner chains and keep runtimes.

The decision law is deliberately narrow. _inventory_owner_and_decision emits candidate_safe_close only when a helper is detached (ppid == 1), its kind is allowlisted, and its age exceeds the configured threshold. Active-owner chains, unknown parents, young detached helpers, and keep runtimes route to requires_owner_check or keep. Over-budget active-owner groups are summarized by _active_owner_pressure_groups, but the emitted helper_owner_release_request_v1 can only ask the owner to launch the helper; it cannot claim that Microcosm killed, terminated, or safely closed a process.

The source-open body floor and the public result records enforce the same membrane. _source_module_manifest_result verifies the exported orphan_reaper_pressure_inventory_public_refactor body, its source_faithful_public_refactor relation, target digest, and required anchors. _redaction_findings rejects command previews, absolute host paths, and process-signal claims before result record writing. The result is a pressure classifier with executable evidence and a hard no-actuator boundary, not a live host cleanup tool.

Reader Evidence Routing

Read the positive fixture as pressure-inventory evidence, not host process control. The fixture supplies process_table.json, pressure_policy.json, and owner_classes.json; the component classifies helper kind, owner status, detached safe-close eligibility, active-owner descendants, keep runtimes, and over-budget active-owner groups. Active-owner pressure becomes a launch request row, not a kill, terminate, or signal action.

Read the negative cases as the scope limit. The required failures are active_owner_kill_candidate.json, unknown_owner_kill.json, premature_safe_close.json, process_signal_sent.json, command_preview_leak.json, absolute_path_leak.json, and owner_release_overclaim.json. They prove the public surface rejects unsafe safe-close candidates, live command bodies, absolute host paths, process-signal claims, and launch-overclaim language.

Read source-open evidence through the source module manifest. The exported bundle includes one copied public refactor body at examples/tool_server_pressure_inventory/exported_tool_server_pressure_inventory_bundle/source_modules/tools/meta/control/orphan_reaper_pressure_inventory.py. The manifest binds source and target digests, declares source_faithful_public_refactor, requires anchors such as build_tool_server_pressure_inventory, build_pressure_hygiene_relief_receipt, no_process_signal_sent, and request_owner_release, and keeps body_in_receipt and body_text_in_receipt false.

Named Proof Consumers

  • Runtime fixture consumer: microcosm_core.organs.tool_server_pressure_inventory run consumes the synthetic pressure fixture and writes the result, board, validation result record, and sign-off result record.
  • Source-body consumer: microcosm_core.organs.tool_server_pressure_inventory run-pressure-bundle consumes the exported source-module bundle and blocks on missing manifests, target-ref mismatch, digest mismatch, unsafe body classes, or redaction hits.
  • Focused pytest consumer: tests/test_tool_server_pressure_inventory.py asserts every expected negative case, verifies that active-owner descendants are never safe-close candidates, checks owner-launch requests instead of kill actions, scans the component and public refactor AST for process-control imports or .kill(...), validates target-ref/digest parity, and checks compact card omission result records.
  • Scope limit consumer: the standard standards/std_microcosm_tool_server_pressure_inventory.json and the scope limit in the component require process_signal_authority, live_process_table_read_authorized, host_mutation_authorized, release_authorized, provider_calls_authorized, and whole_system_correctness_claim to remain false.

Prior Art Grounding

This component draws on process-inventory, tool-server, and owner-reference patterns. psutil.process_iter() is a common API for iterating over process metadata without shelling out to ad hoc ps parsing. Kubernetes garbage collection uses owner references to distinguish objects that may be collected from objects still owned by live controllers. The Model Context Protocol's tool-server model gives the local "server exposes callable tools" shape. The Microcosm version keeps the result deliberately weaker: synthetic rows are classified for pressure and safe-close eligibility, but the component does not read live host state or send signals.

Prior-art anchors:

  • psutil process iteration: https://psutil.readthedocs.io/en/latest/#psutil.process_iter
  • Kubernetes owner-reference garbage collection: https://kubernetes.io/docs/concepts/architecture/garbage-collection/
  • Model Context Protocol tool servers: https://modelcontextprotocol.io/docs/concepts/tools

Scope limit: this is projection and validation only. It does not read the live process table, signal processes, mutate host state, include launch operations, use external model services, export private account or browser state, or prove whole-system correctness.

Validation Result record Path

From microcosm-substrate, validate with result records under /tmp:

Passing result records prove synthetic inventory classification and source-manifest shape only; they do not read live host process state, send process signals, mutate host state, authorize cleanup, use external model services, or certify launch-scope decision. A diagram view and an atlas entry are generated for this module from the same source row.

Scope boundary

Scope limit

This module can claim that synthetic process-table fixtures, owner-status policy rows, digest-only command rows, boundary-failure cases, source manifest evidence, and metadata-only result records validate a public tool-server pressure classifier. It cannot claim live host inspection, process signaling, safe cleanup authority, host-state mutation, provider authority, launch-scope decision, private account or session export, or whole-system correctness.

Source and projection details
Governing Lattice Relation

The generated row binds this module to mechanism mechanism.tool_server_pressure_inventory.validates_public_tool_server_pressure_inventory, concept concept.import_projection_and_drift_control_bundle, principles P-2, P-4, P-6, and P-9, axioms AX-3, AX-5, AX-7, and AX-8, and the runtime code locus src/microcosm_core/organs/tool_server_pressure_inventory.py. Those edges make the module a Microcosm import-and-validation proof: source-open digest evidence is allowed, while private host state, process control, provider authority, launch-scope decision, and whole-system correctness stay outside the claim.

The dependency edges to mission_transaction_work_spine, provider_context_recipe_budget, and world_model_projection_drift_control_room define the reader route. This module can explain how a helper-pressure row becomes a metadata-only result record and an owner-launch request, but it must borrow mission-landing, provider-budget, and projection-drift boundaries from those sibling modules before any broader operational or launch claim is made.

Compliance Pipeline BundleConfirms six copied compliance source files carry their functions; runs one helper on sample text.3/5

Does This component imports the compliance adapter registry, compliance coverage and baseline scanners, Microcosm compliance adapter, bounded compliance-ledger builder, and observe pipeline stages as public runnable system. Running it shows how registered compliance adapters, bounded no-write checks, baseline companion scans, digest normalization, observe-plan helper selection, and dispatch/process boundaries fit together without refreshing the live ledger or dispatching providers.

Scope limit validates declared public Set 8 compliance pipeline bundle contract only; no full compliance-ledger freshness, external model access, model dispatch, source-file changes, source note mutation, launch, public sharing, non-public data equivalence, or whole-system correctness

Run
microcosm batch8-compliance-pipeline-capsule validate-bundle --input examples/batch8_compliance_pipeline_capsule/exported_batch8_compliance_pipeline_capsule_bundle --out /tmp/microcosm-batch8-compliance-pipeline-capsule

EvidenceComputed projectionevidence 3/5Source-faithful refactor

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Set 8 Compliance Pipeline Bundle

batch8_compliance_pipeline_capsule copies two source subsystems into Microcosm as source bodies and then exercises them. The first is the compliance scanner registry and its bounded ledger builder. The second is the six-stage observe pipeline that turns a source note into a synthesis seed. The component runs six engines over the copied bodies and writes metadata-only result records.

Purpose

Most of the bundle components in this set are shape linters: they grep the copied source for expected tokens and pass when the tokens are present. This one goes further. Four of its six engines run the copied bodies on synthetic inputs, importing the pipeline and scanner helpers directly or driving the ledger builder as a subprocess, so the result record records observed behaviour rather than mere presence. The question it answers is narrow and testable: when these two subsystems are imported as copied bodies, do they still behave as their source contracts say, without touching the live ledger or dispatching any work?

The behaviour worth singling out is digest preservation. The pipeline compresses a long source note down to a short digest before deciding what to inspect next. If that compression silently drops an instruction, the agent downstream loses it. The component feeds the real digest_raw_seed an eighty-line block of low-signal text with one directive line buried inside, then checks the directive survives the compression. The matching negative case removes the directive marker from the copied source and confirms the directive is then lost. That pairing is what the page is really about: a compression step that is asserted to keep the one line that matters, with a test that fails when it does not.

The standing limit is just as deliberate. The bounded compliance check runs the ledger builder in --check --report mode, which reads and reports but never writes the ledger. The pipeline engines stop before any bridge or external model access. The bundle is evidence that the imported mechanics work on a sample, not a claim that the full compliance ledger is fresh or that every branch is covered.

Role

This module imports the source compliance scanner registry, the bounded compliance ledger builder, and the observe-loop pipeline stages into Microcosm as copied source bodies with a runnable component.

Imported system

  • system/lib/compliance/__init__.py
  • system/lib/compliance/compliance_coverage_adapter.py
  • system/lib/compliance/standard_baseline_adapter.py
  • system/lib/compliance/microcosm_adapter.py
  • tools/meta/factory/build_compliance_ledger.py
  • system/lib/pipeline/stage_extract.py
  • system/lib/pipeline/stage_select.py
  • system/lib/pipeline/stage_emit.py
  • system/lib/pipeline/stage_compile.py
  • system/lib/pipeline/stage_execute.py
  • system/lib/pipeline/stage_process.py

What the engines check

The component runs six engines and passes only if all six pass and every required source body is present.

  • compliance_registry_runtime_witness confirms the copied registry exposes the adapter table, the domain and baseline standard-id sets, and a scan_all entry point, that the coverage adapter carries its self-audit fields, and that the ledger builder carries its bounded-check command. When the live registry is importable it also reads the adapter, domain, and baseline counts as a shape witness, never as a freshness claim.
  • compliance_coverage_bounded_check runs the ledger builder with --check --report for two named standards. The pass condition is strict: the check reports ok, wrote_ledger is false, there are no error findings, and a next-step ratchet command is present. The point is a check that reads and reports without writing. Stale ledger rows that were not selected stay outside the claim.
  • baseline_companion_scanner_contract runs the baseline scanner on a sample standard and checks the returned row is honest about its own shallowness: it must be marked a baseline-inventory row with no domain-specific adapter, so a bare file-exists check can never read as a real compliance pass.
  • pipeline_digest_and_shard_normalization exercises three pure helpers from the extract stage. It checks the buried directive survives digest compression, that an unknown shard status is normalised to pending while the original value is preserved as a variant, and that diverse-shard selection caps how many shards one group can contribute.
  • pipeline_observe_compile_helpers runs the compile-stage helpers on a small fixture and checks they pull the right known-file mentions from free text, order follow-up files, and lift probe questions from a plan while skipping synthesis and summary roles.
  • pipeline_dispatch_process_boundary_contract confirms the execute and process stages keep the dispatch boundary explicit. It checks the copied bodies carry the observe_dispatch_skipped and observe_dispatch_started markers and the result record-selection helper, so the page can state plainly that bridge dispatch stays disabled.

Each engine carries its own scope limit in the result record. The six negative cases each remove one load-bearing token from a copied body and confirm the matching engine then reports blocked, so a pass means the contract was actually exercised rather than skipped.

Shape

The authoritative source record is core/paper_module_capsules.json::paper_modules[60:paper_module.batch8_compliance_pipeline_capsule]. The generated JSON instance is paper_modules/batch8_compliance_pipeline_capsule.json, whose source_refs mark that source record as the source of record and this Markdown as legacy_markdown_projection_not_source_authority.

Copied source bundle11 source bodiesbody_in_receipt: falseCopied source bundle 11 source bodies body_in_receipt: falseRegistry runtime witnessadapter table, scan_all,coverage self-auditRegistry runtime witness adapter table, scan_all, coverage self-auditBounded ledger checkcheck --reportreports ok, wrote_ledger:falseBounded ledger check check --report reports ok, wrote_ledger: falseBaseline scanner contractrow admits no domain adapterBaseline scanner contract row admits no domain adapterDigest and shard helpersburied directive survives;status normalised, variantkeptDigest and shard helpers buried directive survives; status normalised, variant keptCompile helpersfile mentions, follow-ups,probe questionsCompile helpers file mentions, follow-ups, probe questionsDispatch and process boundaryDispatch and process boundary6 negative casesremove one token per body;matching engine reportsblocked6 negative cases remove one token per body; matching engine reports blockedmetadata-only result recordsresult, board, validationmetadata-only result records result, board, validationScope limitno ledger refresh, noprovider/bridge dispatch,no source note or source-filechanges,no public sharing or launchScope limit no ledger refresh, no provider/bridge dispatch, no source note or source-file changes, no public sharing or launch

Source refs

Dispatch and process boundary
observe_dispatch_skipped
Diagram source
flowchart LR bundle["Copied source bundle 11 source bodies body_in_receipt: false"] subgraph Compliance["Compliance subsystem (3 engines)"] reg["Registry runtime witness adapter table, scan_all, coverage self-audit"] bounded["Bounded ledger check --check --report reports ok, wrote_ledger: false"] base["Baseline scanner contract row admits no domain adapter"] end subgraph Pipeline["Observe pipeline (3 engines)"] digest["Digest and shard helpers buried directive survives; status normalised, variant kept"] compile["Compile helpers file mentions, follow-ups, probe questions"] boundary["Dispatch and process boundary observe_dispatch_skipped"] end neg["6 negative cases remove one token per body; matching engine reports blocked"] result records["metadata-only result records result, board, validation"] ceiling["Scope limit no ledger refresh, no provider/bridge dispatch, no source note or source-file changes, no public sharing or launch"] bundle --> reg & bounded & base bundle --> digest & compile & boundary bundle --> neg reg & bounded & base --> result records digest & compile & boundary --> result records neg --> result records result records --> ceiling

The shape is a bounded compliance and observe-pipeline witness. The bundle names the component subject batch8_compliance_pipeline_capsule, the mechanism subject mechanism.batch8_compliance_pipeline_capsule.validates_public_compliance_pipeline_capsule, the resolved runtime/source locus src/microcosm_core/organs/batch8_compliance_pipeline_capsule.py, and the dependency/concept/law edges.

The local standard, when read as standards/std_microcosm_batch8_compliance_pipeline_capsule.json, keeps the same boundary: public engine ids, stable negative-case codes, source refs, digests, line counts, required anchors, bounded synthetic outcomes, scope limits, and scope boundaries are public-safe; keys, account secrets, browser state, account or browser state, model-output data bodies, browser UI live-access material, raw operator transcripts, private artifact bodies, live observe dispatch state, and source note bodies are forbidden public inputs. Its validator contract expects eleven copied source source modules and six negative cases, with the runtime command routed through microcosm_core.organs.batch8_compliance_pipeline_capsule.

The runtime locus writes and validates result records through run, run_batch8_compliance_pipeline_bundle, result_card, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING. The fixture path fixtures/first_wave/batch8_compliance_pipeline_capsule/input and the example bundle examples/batch8_compliance_pipeline_capsule/exported_batch8_compliance_pipeline_capsule_bundle carry the public exercise inputs, source-module manifest, and copied compliance/pipeline source bodies. The manifest currently records source_import_class: copied_non_secret_macro_body, module_count: 11, and body_in_receipt: false.

Validation evidence is the focused test tests/test_batch8_compliance_pipeline_capsule.py, the first-wave result record set under receipts/first_wave/batch8_compliance_pipeline_capsule/, the sign-off result record result records/sign-off/first_wave/batch8_compliance_pipeline_capsule_fixture_acceptance.json, the runtime-shell exported validation result record under receipts/runtime_shell/demo_project/organs/batch8_compliance_pipeline_capsule/, and the verifier cycle result record state/microcosm_verifier/receipts/20260604T0346Z_batch8_compliance_pipeline_capsule_cycle.json. Those result records can show pass status, exact-copy digest/anchor checks, stable negative cases, no-write behavior, secret/body exclusion scans, and body_in_receipt: false; they do not become full compliance-ledger freshness, pipeline dispatch, external model access, source-file changes, public sharing, launch, or whole-system correctness authority.

Reader Evidence Routing

  • Bundle route: read core/paper_module_capsules.json::paper_modules[60] before treating this Markdown as explanation.
  • Generated route: inspect paper_modules/batch8_compliance_pipeline_capsule.json for the current generated instance (relationship graph, diagram availability, and lattice position).
  • Bundle route: inspect examples/batch8_compliance_pipeline_capsule/exported_batch8_compliance_pipeline_capsule_bundle for copied compliance and pipeline source refs.
  • Runtime route: run tests/test_batch8_compliance_pipeline_capsule.py and the commands in ## Validation Result record Path for recomputation evidence.

Prior Art Grounding

This bundle borrows from control-assessment, policy-as-code, provenance, and observability practice. Useful anchors include:

  • NIST SP 800-53 Rev. 5, as a control-catalog pattern for naming, assessing, and reporting control posture.
  • Open Policy Agent, as a general-purpose policy engine pattern for evaluating structured inputs without embedding every rule in the caller.
  • SLSA provenance, for treating artifact origin and process metadata as explicit attestations.
  • OpenTelemetry, for instrumentation patterns around pipeline stages, traces, metrics, and logs.

Microcosm borrows the scanner, policy, provenance, and pipeline-stage shape, but the component only validates bounded no-write behavior and pure helper mechanics. It stays with bounded registry/helper checks; broader compliance refresh, provider work, source-record changes, and complete branch certification are outside this fixture.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

The fixture command writes the bounded compliance/pipeline exercise result record and sign-off JSON. The bundle command validates copied compliance and pipeline source modules, manifest digests, observed negative cases, result record body scans, and public/private boundary checks. The focused test confirms the no-write runtime boundary, bundle validation, omission posture, and scope limit.

This result record path is reader-verifiable evidence only. It does not refresh the full compliance ledger, dispatch bridge or provider work, change source records, certify every compliance branch, authorize public sharing, or approve launch.

Scope boundary

Scope limit

The bundle validates registry shape, bounded no-write compliance checks, baseline scanner truth accounting, and pure pipeline helper behavior. It does not refresh the full compliance ledger, dispatch bridge/provider work, change source records, or certify every compliance and pipeline branch.

Scope limit

This paper module can claim a compliance pipeline fixture with a diagram view generated for this module and a navigable atlas card. It can explain registry shape checks, bounded no-write compliance probes, scanner truth accounting, pure pipeline helper behavior, and metadata-only result records.

It cannot claim full compliance-ledger refresh, bridge or external model access, source-record changes, complete compliance branch certification, public sharing, launch, or whole-system correctness.

Live Source Drift BundleCompares four copied router and landing routines against current code to surface stale copies.5/5

Does This component imports exact current public source bodies for the option-surface router, mission-transaction landing preflight, work landing controller, and work log controller. Running it inspects stale-versus-current digest repair, source anchors, and compile-only validation without reading private runtime state or granting live mutation authority.

Scope limit verified source body import only, not route authority, work log or work log mutation authority, mission-transaction execution, git staging or commit approval, source-file changes, non-public runtime export, launch, or public sharing

Run
microcosm batch10-live-source-drift-capsule run --input fixtures/first_wave/batch10_live_source_drift_capsule/input --out receipts/first_wave/batch10_live_source_drift_capsule --acceptance-out receipts/acceptance/first_wave/batch10_live_source_drift_capsule_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Set 10 Live Source Drift Bundle

Purpose

batch10_live_source_drift_capsule answers one narrow question: can Microcosm prove that selected internal control source copies match the current source source bytes, still compile without import execution, and still carry the scope limit that prevents copied code from becoming route or mutation authority?

The component imports exact current Python source bodies for four source internal control files:

  • system/lib/standard_option_surface.py
  • system/lib/mission_transaction_landing_preflight.py
  • tools/meta/control/work_landing.py
  • tools/meta/factory/work_ledger.py

The bundle exists because the source source moved ahead of older public source-module records. The interesting part is that it keeps the old, wrong digest visible on purpose. Each digest row carries three fingerprints that must agree before a copy passes: the copied public body, the manifest target it claims to match, and the current source source. In the same row it keeps the stale recorded digest and asserts it differs from the current one, so the proof of freshness and the evidence of the earlier drift sit side by side.

That makes the component a drift sentinel rather than a one-off check. It is built to go red when the public copies fall behind the source source again, and a red result is the signal to refresh the copies through the exact-copy source lane, not a defect in the page. Two cheap checks back the freshness claim without running anything dangerous: the copied Python is compiled but never imported, so a malformed body is caught without executing source code, and a small set of named anchors is matched in each body so a copy that compiles but has quietly lost a command or contract surface is still flagged.

Shape

Probe manifeststale + current digestsProbe manifest stale + current digestsDigest refresh matrixcopied = target = current,stale differs from currentDigest refresh matrix copied = target = current, stale differs from currentCopied internal controlbodiesand source manifestCopied internal control bodies and source manifestScope limit gateimport is not route ormutation authorityScope limit gate import is not route or mutation authorityCompile gatepy_compile, no importCompile gate py_compile, no importAnchor matrixnamed command andcontract surfaces presentAnchor matrix named command and contract surfaces presentmetadata-only result recordand cardmetadata-only result record and card
Diagram source
flowchart TD A["Probe manifest stale + current digests"] --> C B["Copied internal control bodies and source manifest"] --> C C["Digest refresh matrix copied = target = current, stale differs from current"] --> F B --> D["Compile gate py_compile, no import"] B --> E["Anchor matrix named command and contract surfaces present"] D --> F E --> F F["Scope limit gate import is not route or mutation authority"] --> G["metadata-only result record and card"] C -. mismatch .-> H["Blocked: refresh copies via exact-copy source lane"]

Prior Art Grounding

The component borrows from reproducible-build and supply-chain provenance practice: declared source inputs are fingerprinted, generated or copied artifacts are checked against those fingerprints, and result records avoid shipping unnecessary private state. Useful anchors include:

  • Bazel hermeticity, especially the emphasis on source identity, declared inputs, and repeatable outputs.
  • SLSA provenance, which records how software artifacts relate to build inputs and supply-chain guarantees.

Microcosm applies that pattern to live source-copy drift: stale digest rows remain visible as regression fixtures, current public copies must match source digests byte-for-byte, and result records carry digest/anchor/negative-case evidence instead of private source bodies or runtime state.

Reader Evidence Routing

The copied bodies are real system, not result record-only metadata. The evidence route is still metadata-only at result record time: result records keep digest rows, required anchors, negative-case outcomes, compile status, and scope limit evidence.

The engine ids are:

  • live_source_drift_digest_refresh_matrix: compares stale recorded digests, current source digests, copied target digests, and target digest status.
  • copied_python_source_compile_gate: compiles each copied Python target without importing or executing it.
  • control_surface_anchor_matrix: checks that each copied body still exposes expected command, route, landing, claim, or read-result record anchors.
  • claim_ceiling_gate: verifies the copied-body import excludes live route decisions, work log mutation, work log mutation, mission execution, git staging, source-file changes, launch, public sharing, external model access, or non-public runtime export.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

This component is also a drift sentinel. The fixture command writes digest-refresh, compile-gate, anchor-matrix, and scope limit result records, and it is allowed to exit blocked when copied public source no longer matches current source source. That blocked result is evidence for the exact-copy/source-refresh owner lane, not a paper-module corpus defect. Re-entry after a blocked result is to refresh the copied public source bodies and manifest digests through the source-open exact-copy lane, then rerun the fixture, bundle, and focused test.

The bundle command validates current copied source digests, source manifests, compile-without-import checks, stale-digest negative cases, metadata-only cards, and scope limit fields when the exact-copy refresh is current. The focused pytest command is therefore a green-gate after refresh: if the sentinel is blocked, that test file is expected to fail on pass-status or exact-body equality and should be reported with the same exact-copy refresh residual. When current, the focused test covers stale digest replay, compile bypass, private runtime state export, and live mutation-authority claims.

This result record path is reader-verifiable evidence only. It does not provide route authority, Work or work log mutation authority, mission execution, git approval, source-file changes, launch, public sharing, external model access, or non-public runtime export.

Scope boundary

Scope limit

Fixture-bound source-digest, anchor, compile, and scope limit evidence only; no route authority, Work or work log mutation, mission execution, git approval, source-file changes, launch, public sharing, external model access, or non-public runtime export.

Scope limit

This module supports only the reader-verifiable claim that selected internal control source copies can be compared with current source digests, compiled without import execution, checked for required anchors, and guarded by stale-digest and scope limit negative cases when the exact-copy lane is current. A green result does not grant route authority, Work or work log mutation authority, mission execution, git approval, source-file changes, launch-scope decision, publishing-scope decision, external model access, non-public runtime export, or whole-system correctness.

Release Public Wording GateFlags affirmative open-source and deployment-posture wording while allowing safe boundary notes.5/5

Does This component imports the launch claim-language gate as public runnable system. Running it over small public sharing-manifest fixtures shows boundary-only warnings allowed while affirmative open-source and production-readiness wording is classified as active claim language and blocked by the assert-clear contract.

Scope limit This is lexical fixture evidence only; it is not launch-scope decision, not publishing-scope decision, not semantic NLP truth, not secret-scan coverage, and not whole-system correctness.

Run
microcosm batch12-release-claim-language-gate run-release-claim-language-gate-bundle --input examples/batch12_release_claim_language_gate/exported_batch12_release_claim_language_gate_bundle --out receipts/runtime_shell/demo_project/organs/batch12_release_claim_language_gate

EvidenceVerified source importevidence 5/5Copied source body

source intakeprovenancedrift-control

Source Design note · Source atlas

Paper module Set 12 launch claim-Language Gate

Purpose

Public copy drifts towards over-claiming. A page that started as "fixture-proven, not yet published" gets edited over months until someone writes launch, licensing, or maturity language without noticing that nothing changed underneath. This component answers one question: does a piece of public copy claim more than the result records behind it can support, and would the launch gate catch it if it did?

The mechanism it wraps is a deterministic regex scan, not a language model. The copied gate body reads a public sharing manifest, walks every claim-bearing file it lists, and matches each line against fixed families of risky launch, licensing, maturity, and private launch-control wording. What makes the scan more than a grep is the classification step. The same family of wording is read three ways depending on context: a bare affirmative launch or maturity claim becomes an active_claim_blocker; the same wording inside a forbidden-example block or near a negation marker becomes boundary_or_negative_context and is allowed; and a phrase that has neither an affirmative verb nor a clear negation marker is parked in a needs_review queue rather than waved through.

That last branch is the interesting design choice. The gate fails closed. An ambiguous claim does not pass quietly; it lands in a no-go review state, and main --assert-clear exits non-zero whenever any active blocker or unresolved review item remains. The scan never rewrites a file, never authorises launch, and treats marketing copy as just another claim surface with an evidence ledger rather than a looser register of speech.

This paper module is the public, fixture-bound check that the wrapped gate behaves as described over the shipped fixtures. The component runs the copied gate over a safe fixture and an active fixture, then checks that boundary-context language clears, that bare launch language blocks, and that the assert-clear exit code is 2 when blockers remain. It is a check on the checker, held behind digest, result record, and scope limit boundaries.

Mechanisms

  • _classify_hit
  • build_gate
  • main --assert-clear

Shape

  • Runtime locus: src/microcosm_core/organs/batch12_release_claim_language_gate.py, especially _blocked_exercise, _write_gate_fixture, _run_main_assert_clear, _evaluate, run, run_batch12_release_claim_language_gate_bundle, result_card, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING.
  • Source source import: tools/meta/dissemination/release_claim_language_gate.py, copied into the exported bundle as one source body with digest equality and anchors RISKY_PHRASES, NEGATIVE_CONTEXT_MARKERS, def _classify_hit, and def build_gate.
  • Positive fixture shape: one safe boundary-context claim surface passes because limiting language keeps does_not_authorize_release: true.
  • Active fixture shape: two active claim blockers are reported for bare unsupported launch-language surfaces, while boundary/negative context remains counted separately.
  • Negative floor: affirmative_open_source_production_ready_blocks and assert_clear_returns_exit_2, with stable error codes BATCH12_RELEASE_CLAIM_ACTIVE_BLOCKER and BATCH12_RELEASE_CLAIM_ASSERT_CLEAR_EXIT_2.
  • Public result record posture: real-system bundle, source manifest pass, secret-exclusion scan pass, result record body scan pass, and a false body_in_receipt flag.
allowedblockedambiguoussafe and active public copysurfacessafe and active public copy surfacesexact copied source gate bodyexact copied source gate bodyload source moduledigest equality and requiredanchorsload source module digest equality and required anchorssafe fixture root_write_gate_fixture(active=false)safe fixture root _write_gate_fixture(active=false)active fixture root_write_gate_fixture(active=true)active fixture root _write_gate_fixture(active=true)build_gatescan manifest files forRISKY_PHRASESbuild_gate scan manifest files for RISKY_PHRASES_classify_hitread each phrase in context_classify_hit read each phrase in contextnegation marker or forbiddenexample=> allowednegation marker or forbidden example => allowedactive_claim_blockeraffirmative line, nodowngrade=> statusactive_claim_blockedactive_claim_blocker affirmative line, no downgrade => status active_claim_blockedneeds_reviewno clear marker either way=> fail-closed no-go queueneeds_review no clear marker either way => fail-closed no-go queuemain --assert-clearexit 2 when notpublic_copy_cleanmain --assert-clear exit 2 when not public_copy_cleancomputed negative casesaffirmative claim blocksassert-clear exits 2private internal control leakblockscomputed negative cases affirmative claim blocks assert-clear exits 2 private internal control leak blocksmetadata-only result recordsresult, board, validation,sign-offmetadata-only result records result, board, validation, sign-offscope limitno launch, public sharing,NLP truth,secret completeness, orwhole-system claimscope limit no launch, public sharing, NLP truth, secret completeness, or whole-system claim

Source refs

safe and active public copy surfaces
release_gate_fixture.json
exact copied source gate body
source_module_manifest.json
negation marker or forbidden example => allowed
boundary_or_negative_context
Diagram source
flowchart TD Fixture["release_gate_fixture.json safe and active public copy surfaces"] Manifest["source_module_manifest.json exact copied source gate body"] Loader["load source module digest equality and required anchors"] SafeRoot["safe fixture root _write_gate_fixture(active=false)"] ActiveRoot["active fixture root _write_gate_fixture(active=true)"] Scan["build_gate scan manifest files for RISKY_PHRASES"] Classify{"_classify_hit read each phrase in context"} Boundary["boundary_or_negative_context negation marker or forbidden example => allowed"] Active["active_claim_blocker affirmative line, no downgrade => status active_claim_blocked"] Review["needs_review no clear marker either way => fail-closed no-go queue"] Assert["main --assert-clear exit 2 when not public_copy_clean"] Negatives["computed negative cases affirmative claim blocks assert-clear exits 2 private internal control leak blocks"] Result records["metadata-only result records result, board, validation, sign-off"] Ceiling["scope limit no launch, public sharing, NLP truth, secret completeness, or whole-system claim"] Fixture --> SafeRoot Fixture --> ActiveRoot Manifest --> Loader Loader --> Scan SafeRoot --> Scan ActiveRoot --> Scan Scan --> Classify Classify -->|allowed| Boundary Classify -->|blocked| Active Classify -->|ambiguous| Review Active --> Assert Review --> Assert Boundary --> Negatives Active --> Negatives Assert --> Negatives Negatives --> Result records Result records --> Ceiling

This component is the public copy gate for result record-backed evidence accounting. It does not ask whether a phrase sounds impressive; it asks whether the phrase is within the evidence class and scope limit that result records can support.

Evidence strength is typed ordinal data, not vibes: ranks, real-system flags, and fail-closed defaults constrain how far public language may climb. Independent validators reconcile each component's declared class against result record-backed facts so over-claiming is blocked and stale under-claiming can be surfaced for review. Result record scanners may downgrade when bodies or account secret-equivalent payloads leak; they cannot upgrade merely because a narrative is strong.

The boundary-context classifier is allowed to pass negated or limiting language such as "not a hosted product" while blocking bare maturity claims when no launch-scope decision exists. Marketing copy is therefore treated as another claim surface with an accounting ledger, not as a looser mode of speech.

Reader Evidence Routing

  • Start with paper_modules/batch12_release_claim_language_gate.json for source authority, then read this Markdown as the projection.
  • Open standards/std_microcosm_batch12_release_claim_language_gate.json for the required witnesses, negative floor, denied authority, result record contract, validator command, and runtime bundle command.
  • Open core/fixture_manifests/batch12_release_claim_language_gate.fixture_manifest.json for source-open body import count, source manifest refs, and durable result record refs.
  • Open examples/batch12_release_claim_language_gate/exported_batch12_release_claim_language_gate_bundle/source_module_manifest.json before inspecting copied source modules; result records carry refs, hashes, counts, verdicts, and omissions rather than copied body text.
  • Open tests/test_batch12_release_claim_language_gate.py for assertions on pass result records, digest mismatch rejection, fixture path safety, duplicate-key rejection, duplicate fixture names, exact source body import, and card body omission.
  • Run fixture and bundle routes from microcosm-substrate/. The CLI supports --card, but it does not expose a --json flag.
  • Use scripts/build_doctrine_projection.py --check-paper-module-corpus to verify this Markdown projection still satisfies the shared paper-module coverage contract.

Prior Art Grounding

The component borrows a narrow pattern from advertising-substantiation and regulated-communication practice: public claims should stay within evidence actually held, and stronger language requires stronger support. This is prior art for the proof-consumer shape only. The module does not implement legal compliance, include launch operations, or decide whether public copy is fit to publish.

External source result record, checked 2026-06-05:

SourceExact URLWhy it matters hereLocal boundary
FTC advertising substantiation policyhttps://www.ftc.gov/legal-library/browse/ftc-policy-statement-regarding-advertising-substantiationObjective claims need a reasonable basis before dissemination, and express or implied support claims must match the support actually held.Microcosm maps this to result record-backed evidence classes and fail-closed launch-language blockers, not to legal sufficiency.
FINRA Rule 2210https://www.finra.org/rules-guidance/rulebooks/finra-rules/2210Public communications must be fair and balanced, give a sound factual basis, and avoid false, exaggerated, unwarranted, promissory, or misleading claims.The module only uses this as a prior-art analogue for keeping benefits, risks, and qualifications in the same local claim context.
SEC investment adviser marketing guidehttps://www.sec.gov/resources-small-businesses/small-business-compliance-guides/investment-adviser-marketingThe marketing rule guide summarizes general prohibitions on untrue or misleading material statements, unsupported material facts, unfair treatment of risks, and constrained performance or endorsement claims.The module's investment-advice scope boundary stays negative: a green result record is not adviser marketing compliance or investment-related actions.
SEC marketing compliance FAQhttps://www.sec.gov/rules-regulations/staff-guidance/division-investment-management-frequently-asked-questions/marketing-compliance-frequently-asked-questionsCurrent staff FAQ entries still route extracted performance and characteristics through Rule 206(4)-1 general prohibitions.This is a currency/source-link result record for scope limit posture, not a new Microcosm capability or finance claim.

Microcosm adapts the substantiation pattern to launch and evidence language. Result record-backed classes, ordinal evidence strength, real-system flags, boundary-context exceptions, and fail-closed defaults constrain what public copy may say. The gate blocks unsupported elevation without turning itself into public launch permission, market-level conclusions, investment-related actions, legal review, or whole-system correctness.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

The fixture command writes the claim-language gate result record and sign-off JSON. The bundle command validates copied source system, source manifest digests, active-blocker and boundary-context classification, negative cases, metadata-only result records, and scope limit fields. The focused test checks pass result records, digest mismatch rejection, fixture path safety, duplicate-key and duplicate-fixture rejection, exact source body import, and card body omission.

This result record path is reader-verifiable evidence only. It excludes launch, public sharing, external model access, semantic NLP truth, complete secret detection, private-system equivalence, portability proof, market-level conclusions, investment-related actions, source-file changes, or whole-system correctness.

Scope boundary

Scope limit

This module may claim fixture-bound evidence that the Set 12 public launch-language gate can classify result record-backed public copy against an scope limit. Positive claims stay within typed claim hits, evidence strength ranks, real-system flags, boundary-context classification, fail-closed defaults, active blockers, negative cases, copied source source-module refs and bodies, source-manifest pass status, metadata-only result record scan status, secret-exclusion scan status, and validation result records.

This module may not claim public launch permission, public sharing posture, hosted product status, external model dispatch authority, semantic NLP truth, complete secret detection, private-system equivalence, portability proof, market-level conclusions, investment-related actions, source editing authority, deployment maturity, formal-result correctness beyond the listed witnesses, or whole-system correctness.

Limitations

The gate is a lexical and fixture-driven proof consumer, not a launch oracle. It exercises copied release_claim_language_gate.py behavior over bounded public markdown fixtures, so it can detect active over-claiming phrases, boundary-context exceptions, digest drift, fixture path hazards, and stable negative-case regressions. It cannot prove that public copy is semantically complete, market-accurate, legally sufficient, safe for public sharing, or free of all secrets.

The exact-copy evidence floor is intentionally narrow. The source-module manifest proves one copied source body, required anchors, digest equality, and metadata-only result record posture; it excludes refreshing the source module, accepting private-system equivalence, mutating launch policy, or publishing copied bodies into result records. Any change to the copied source body, fixture corpus, negative cases, or scope limit belongs in the source, standard, and bundle lanes before this Markdown can expand its claim.

The focused test proves the runtime contract only for the shipped fixtures and bundle shape. Passing test_batch12_release_claim_language_gate.py means the public proof consumer still rejects digest mismatch, unsafe fixture names, duplicate fixture inputs, unstable negative labels, and result record body leakage in that bundle. It does not establish launch-scope decision for other documents, providers, frontends, markets, or future site projections.

Scope limit

This is fixture-bound launch claim-language gate evidence. Its scope stops before public launch permission, public sharing posture, external model dispatch, semantic NLP truth, complete secret detection, private-system equivalence, portability proof, market-level conclusions, investment-related actions, source editing authority, deployment maturity, formal-result correctness beyond the listed witnesses, or whole-system correctness.

Source and projection details
Governing Lattice Relation

This paper module sits under concept.import_projection_and_drift_control_bundle: a copied source mechanism is imported into the public system, exercised through public fixtures, and held behind digest, result record, and projection boundaries. The bundle therefore does not treat Markdown prose as authority; it treats the JSON bundle, generated instance, mechanism row, standard, source manifest, and result records as the lattice that the prose must explain.

The governing principles P-2, P-6, P-13, and P-15 map onto the component's operational checks. Typed evidence ranks and real-system flags keep public claims below the result record-backed ceiling; public/private boundary rules keep source bodies and private launch state out of result records; negative fixtures and fail-closed defaults prevent optimistic marketing language from bypassing the validator; and generated Mermaid/Atlas rows remain projections of bundle edges, not independent launch-scope decision.

The axiom boundary is the hard scope limit. AX-5, AX-7, AX-11, and AX-12 require the gate to preserve source truth, avoid projection drift, route public copy through explicit authority checks, and block unsupported launch language. That is why the mechanism couples _write_gate_fixture, _evaluate, run_batch12_release_claim_language_gate_bundle, exact-copy source manifest validation, and metadata-only result records instead of asking a prose reviewer to decide whether a claim sounds acceptable.

The sibling dependencies define how to read the result. public_reveal_walkthrough supplies the public-copy setting, proof_derived_governed_mutation_authorization supplies the proof-before-mutation posture, and batch8_validator_checker_capsule supplies the validator/checker pattern. This module is the claim-language checker within that lattice, not the public launch decision itself.

The generated JSON row currently contributes 15 relationship edges: two paper_module.explains.organ_or_mechanism edges, one paper_module.governed_by.concept edge, four paper_module.governed_by.principle edges, four paper_module.abides_by.axiom edges, three sibling paper_module.depends_on.paper_module edges, and one resolved paper_module.cites.code_locus edge.

At this HEAD the generated instance reports zero unresolved selective relations. If future bundle edits introduce residuals, this Markdown may name them but must not invent concept ids or promote candidate doctrine.

Work & continuity (4)

Mission Transaction Work SpineRuns the real work-ledger engine on a sanitised snapshot to re-derive each change's verdict.4/5Runs real tools

Does Replays a fixed set of pre-recorded work-landing situations against a toy repository and shows when a change would be allowed to "land" versus blocked: two claims competing on the same file, a claim built on a stale parent commit, a claim missing its owned path, a clean preflight check that wrongly says the work is already finished, and which commit lane (a narrow scoped commit vs a broad checkpoint) a dirty working tree is permitted to use. Its exported bundle also anchors the public work log seed-speed source imports for session heartbeat, mutation-check, active-claim snapshot, and path-collision handling. The resulting result records show exactly why each situation was permitted or refused, instead of an opaque "it's done" message.

Scope limit It validates work-landing, claim, checkpoint-lane, and dependency metadata projections over fixed fixtures only; it does not mutate live ledgers or git, certify real completion, authorize broad staging without operator intent, or prove any change is actually correct or complete.

Run
microcosm mission-transaction-work-spine run --input fixtures/first_wave/mission_transaction_work_spine/input --out receipts/first_wave/mission_transaction_work_spine

Paper module Mission Transaction Work Spine

Purpose

This component exists because the riskiest moment in agentic code work is the one that feels safest: the agent runs a few checks, sees no errors, and concludes that its work is finished and committed. Those are different facts. A clean preflight describes the state of the checks. It says nothing about whether a competing claim already owns the same path, whether the branch has moved under the agent, or whether the commit ever actually landed. The single question this module answers is narrow and concrete: what evidence has to hold before a unit of work is allowed to land, and is that evidence checkable rather than asserted?

The interesting design choice is that the module refuses to trust its own declared verdicts. Most fixtures pass when their inputs carry the right labels. It then perturbs the input one field at a time: a same-path claim conflict, a stale expected-parent hash, a checkpoint lane mutated into an unauthorised broad commit. A genuine check has to break under each of those and stay clear under harmless ones, such as a claim on an unrelated path. That asymmetry, not the bare pass, is the claim.

The result is deliberately bounded. A pass means the public fixture, the exported source bodies, and the negative cases together preserve the work-landing contract and that its discriminating tests still discriminate. It does not touch the live work log, the live work log, or Git, and it grants no authority to commit, checkpoint broadly, back up, or launch.

Abstract

mission_transaction_work_spine is the public Microcosm paper module for work-landing discipline. Its telos is to make the boundary between "a check looked clean" and "work is actually allowed to land" inspectable as source, fixture, result record, and test evidence rather than as chat confidence or status arithmetic.

The component validates a fixed public mission-transaction bundle: Work item rows, work log path claims, dependency unlocks, transaction plans, result record drains, completion projections, scoped mutation policy, checkpoint-lane decisions, copied internal control source modules, and metadata-only result records.

The result is intentionally narrow. A pass means the public fixture and exported bundle preserve the mission-transaction contract, its source-open body floor, and its negative cases. It does not mutate work log, work log, or Git; it does not certify arbitrary live completion; and it does not grant broad checkpoint, backup, launch, public sharing, provider, or whole-system authority.

Problem

Agentic code work fails most often at transaction boundaries, not at isolated syntax checks. Common false positives include:

  • treating a clean preflight as a landed commit;
  • ignoring a competing work log claim on the same path;
  • accepting a claim whose expected parent no longer matches the repository;
  • marking a downstream Work item ready without hard-dependency evidence;
  • reading a dirty tree as a blocker for scoped commits while allowing broad staging without explicit operator authorization;
  • writing result records that smuggle private ledger or provider bodies into a public artifact.

The module turns those failures into a deterministic replay. A cold reader can inspect which public rows are projections, which copied source bodies implement the checks, which negative cases must be observed, and which authority claims remain forbidden even when every validator passes.

Shape

JSON bundleJSON bundleFirst-wave fixtureWork items, claims, deps,lanes, result recordsFirst-wave fixture Work items, claims, deps, lanes, result recordsExported bundlework log, work log,checkpoint, scoped commit,preflight source bodiesExported bundle work log, work log, checkpoint, scoped commit, preflight source bodiesReal work log sessionsnapshotactive claims, heartbeat,source hashReal work log session snapshot active claims, heartbeat, source hashComponent runtimeComponent runtimeR3 replay verdictruntime-derived, notlabel-derivedR3 replay verdict runtime-derived, not label-derivedmetadata-only result recordsrefs, hashes, counts, limitsmetadata-only result records refs, hashes, counts, limitsScope limitno live ledger, git, launch,or provider authorityScope limit no live ledger, git, launch, or provider authority

Source refs

JSON bundle
paper_module.mission_transaction_work_spine
Component runtime
mission_transaction_work_spine.py
Diagram source
flowchart TD Bundle["JSON bundle paper_module.mission_transaction_work_spine"] Fixture["First-wave fixture Work items, claims, deps, lanes, result records"] Bundle["Exported bundle work log, work log, checkpoint, scoped commit, preflight source bodies"] Snapshot["Real work log session snapshot active claims, heartbeat, source hash"] Runtime["Component runtime mission_transaction_work_spine.py"] R3["R3 replay verdict runtime-derived, not label-derived"] Result records["metadata-only result records refs, hashes, counts, limits"] Ceiling["Scope limit no live ledger, git, launch, or provider authority"] Bundle --> Runtime Fixture --> Runtime Bundle --> Runtime Snapshot --> Runtime Runtime --> R3 Runtime --> Result records R3 --> Ceiling Result records --> Ceiling

This Mermaid diagram is the reader flow. The generated lattice Mermaid remains available_from_capsule_edges, and the generated Atlas card remains linked_from_capsule_edges; both are derived from bundle and doctrine-lattice rows, not from this prose.

Technical Mechanism

The component exposes two validator paths.

The first-wave fixture command validates the local replay fixture and writes the canonical result record set:

PYTHONPATH=src python3 -m microcosm_core.organs.mission_transaction_work_spine run \
  --input fixtures/first_wave/mission_transaction_work_spine/input \
  --out receipts/first_wave/mission_transaction_work_spine

That path loads public fixture rows, validates dependency unlocks, claim preflight, scoped result record authority, private-marker rejection, preflight overclaim rejection, checkpoint lane policy, and the real active-claims snapshot.

The exported-bundle command validates source-open import and bundle replay:

PYTHONPATH=src python3 -m microcosm_core.organs.mission_transaction_work_spine \
  validate-mission-transaction-bundle \
  --input examples/mission_transaction_work_spine/exported_mission_transaction_bundle \
  --out receipts/first_wave/mission_transaction_work_spine

That path checks copied work log, work log, checkpoint, scoped-commit, and mission-preflight source modules by manifest, digest, anchor strings, secret-exclusion scan, and body_in_receipt: false. It also requires the real work log snapshot in the mission bundle. Commit da97bc6394 (Require real work log snapshot in mission bundle) landed the snapshot as a required bundle input; later source/test commits recomputed the snapshot verdict and bound the R3 claim to runtime evidence.

Prior Art Grounding

This component is the mission-transaction member of Microcosm's local work-landing family. Its closest sibling is durable_agent_work_landing_replay, which checks recorded landing rows, validation-before-commit ordering, HEAD movement, blocker capture, and work log completion evidence without performing live Git work. mission_transaction_work_spine narrows that pattern to the transaction preflight and work log seed-speed membrane: same-path claim conflicts, expected-parent mismatches, checkpoint-lane selection, dependency unlocks, result record drains, and session finalization posture.

It also supplies a source-import anchor used by adjacent public components such as concurrency_mission_control and macro_projection_import_protocol. Those links are structural evidence routes, not runtime invocation or launch-scope decision. The prior-art claim is therefore local and source-bounded: this paper module inherits the work-landing accounting shape, then tests the particular mission-transaction and work log session-snapshot boundary.

Data And Evidence Contract

The public evidence bundle is composed of source refs, hashes, rows, and result records. The source bodies live only in the exported bundle's source_modules/ tree; result records carry refs, counts, hashes, verdicts, and ceilings, not private or live internal control bodies.

  • JSON bundle: core/paper_module_capsules.json::paper_modules[20:paper_module.mission_transaction_work_spine]
  • Runtime locus: src/microcosm_core/organs/mission_transaction_work_spine.py
  • Fixture input: fixtures/first_wave/mission_transaction_work_spine/input
  • Exported bundle: examples/mission_transaction_work_spine/exported_mission_transaction_bundle
  • Real snapshot: examples/mission_transaction_work_spine/exported_mission_transaction_bundle/real_work_ledger_active_claims_snapshot.json
  • Fixture manifest: core/fixture_manifests/mission_transaction_work_spine.fixture_manifest.json
  • Mechanism row: mechanism.mission_transaction_work_spine.validates_public_mission_transaction_bundle
  • Standard: standards/std_microcosm_mission_transaction_work_spine.json

The result record floor includes preflight, dependency blocked, work landing attempt, claim preflight, scoped mutation, checkpoint lane, completion projection, dependency unlock scheduler, reconcile plan, and exported-bundle validation result records. The fields must preserve schema and component ids, validator id, command, status, observed and missing negative cases, error codes, scope boundary, secret-exclusion status, public work-landing status, body-import status, body_in_receipt: false, scope limit, and result record paths.

Discriminating Tests

The positive claim is not "the fixture passes." The positive claim is that the fixture accepts real-good evidence and rejects targeted perturbations.

  • Real-good case: the real active-claims snapshot passes with R3 public_safe_real_work_ledger_session_snapshot_replay, a state/work_ledger/active_claims_snapshot.json source ref, a matching source hash, a bound session heartbeat, and five source-session claims.
  • Same-path perturbation: adding a competing claim on the requested path blocks preflight through work_ledger_runtime.active_claim_collisions_for_paths and emits SAME_PATH_CLAIM_CONFLICT.
  • Parent perturbation: changing the expected parent for a real claim blocks with EXPECTED_PARENT_MISMATCH; changing it back to the current parent clears.
  • Disjoint perturbation: adding a claim on a disjoint path does not create a collision for the requested path, so the public preflight remains pass.
  • Landing-row perturbation: mutating the checkpoint lane into an unauthorized broad checkpoint blocks with the checkpoint-lane violation floor.
  • Private-body perturbation: a fixture row that carries live private work log body material is rejected, while source bodies copied into the public bundle remain outside result records.
  • Overclaim perturbation: a clean preflight cannot claim that work is already landed.
  • Dependency perturbations: dangling dependency refs and ready rows with incomplete hard dependencies remain blockers.

Focused regression coverage lives in tests/test_mission_transaction_work_spine.py. The R3 tests assert that the verdict is re-derived from runtime evidence, expected labels are not sufficient, source hashes are bound, mutated or stale snapshots are rejected, clear perturbations move the verdict, and body_in_receipt is false.

Reader Evidence Routing

Read this module as an evidence-accounting paper, not as a live controller.

  1. Open the mechanism row and standard to see the required bundle fields: work items, claim table, dependency graph, transaction plan, result record drain, completion projection, scoped mutation policy, checkpoint lane policy, copied source imports, body import verification, scope boundary, and scope limit.
  2. Inspect the real active-claims snapshot to see the source ref, source hash, snapshot time, source session id, owned paths, checkpoint lane case, runtime session, and metadata-only posture.
  3. Read the focused tests to verify R3 is runtime-derived: same-path conflicts, stale parents, landing-row violations, disjoint paths, and equal-parent mutations are all discriminated.
  4. Treat generated JSON, generated Mermaid, Atlas, public-site docs, and result records as projections or validator outputs.

Limits And Non-Claims

The module's useful claim is compact: public fixture rows, copied control source bodies, a real work log session snapshot replay, discriminating negative cases, metadata-only result records, and focused tests preserve the mission-transaction work spine at R3.

It may not claim live work log authority, live work log authority, live Git mutation, broad checkpoint authorization, private backup execution, current repository completion, source-file changes, provider behavior, browser UI state, launch-scope decision, publishing-scope decision, hosted-product readiness, or whole-system correctness.

Validation Result record Path

For this Markdown-only paper-module update, use non-mutating checks from repo root:

./repo-pytest tests/test_mission_transaction_work_spine.py \
  -q \
  --basetemp=/tmp/microcosm_mission_transaction_work_spine_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

For a source, bundle, or projection landing, run the owner lane from microcosm-substrate:

PYTHONPATH=src python3 scripts/build_doctrine_projection.py --check-paper-module-corpus
PYTHONPATH=src python3 scripts/build_doctrine_projection.py --check

Do not run --write from this Markdown-only lane.

Scope boundary

Scope limit

This module may claim that public fixture rows, copied control source bodies, a real work log session snapshot replay, discriminating negative cases, metadata-only result records, and focused tests preserve the mission-transaction work spine at R3. That is a replay and evidence-shape claim.

This module may not claim live work log authority, live work log authority, live Git mutation, broad checkpoint authorization, private backup execution, current repository completion, source-file changes, provider behavior, browser UI state, launch-scope decision, publishing-scope decision, hosted-product readiness, or whole-system correctness.

Durable Agent Work Landing ReplayAudits recorded work-claims so each cites files, validates before commit, and proves HEAD moved.5/5

Does This checks recorded examples of an agent finishing a piece of work the careful way: each example must name the exact files it claims to have touched, show that validation was recorded before any commit was attempted, and only label itself "committed" if the example also records the repository's HEAD moving. It also checks that blockers and ledger completion were captured. The check shows whether each recorded work-claim carries the required evidence and ordering, rather than being an unbacked chat boast. It judges the recorded claims against the contract; it does not run Git or prove that any commit truly landed in a real repository.

Scope limit It validates only the declared public work-landing contract over recorded rows. It is evidence for fixture-local completion mechanics, not for live Git side effects, unrelated-path staging, non-public body export, service operation, or distribution clearance.

Run
microcosm durable-agent-work-landing-replay run-work-landing-bundle

EvidenceContract validatorevidence 5/5Import validation

workflow-engineeringcontinuity

Source Design note · Source atlas

Paper module Durable Agent Work-Landing Replay

Durable agent work-landing replay is the public work-spine component for showing how Microcosm treats agent work as a transaction instead of a chat claim. It binds owned-path claims, owner-native validation, scoped commit attempts, protected Git-metadata blockers, work log capture, work log finalizers, and seed reentry into a source-available replay contract.

The component is useful to a cold agent because it turns a landing claim into an evidence checklist: a row is not "landed" unless claimed paths, validation refs, commit-attempt refs, HEAD-before/after evidence, blocker capture, and ledger completion all line up in the recorded replay. It validates the replay contract and the negative fixtures. It does not perform the live landing itself.

Purpose

This component exists because an agent saying "I committed the fix" is cheap, and the claim is the part that tends to be wrong. The single question it answers is narrow: given a recorded landing attempt, does the evidence actually support the words used to describe it?

The approach worth noticing is that two ordinary-sounding rules are made into rejections rather than suggestions. A row that uses landed-commit language is rejected unless the recorded Git HEAD moved between before and after, so "I landed it" cannot stand on a HEAD that never advanced. A row on the commit path is rejected unless validation is recorded as preceding the commit attempt, so "it passed" cannot be back-filled after the fact. Those two checks, plus blocker capture for metadata-blocked rows and work log completion for every row, are what separate a transaction from a chat claim.

The replay is also source-backed rather than described from memory. The mechanics it checks rows against are not paraphrased; the actual source internal control files (work landing, mission preflight, scoped commit, the work log) are copied into the bundle by digest, so a reader can see which code the model was tested against. The component reads that evidence and rejects overclaims; it never runs Git, stages anything, or authorises a launch.

Shape

Public replay fixtureclaimed rows, validationrefs,commit attempts, blocker rowsPublic replay fixture claimed rows, validation refs, commit attempts, blocker rowsCopied internal controlsource bodieswork landing, preflight,scoped commit, work logCopied internal control source bodies work landing, preflight, scoped commit, work logValidatorValidatorReplay mechanicsclaim before mutation,validate before commit,HEAD movement before landedlanguageReplay mechanics claim before mutation, validate before commit, HEAD movement before landed languageNegative floorlive Git authority, missingcompletion,uncaptured blocker, privateleakageNegative floor live Git authority, missing completion, uncaptured blocker, private leakageResult recordsboard, result, validation,sign-off; no live mutationauthorityResult records board, result, validation, sign-off; no live mutation authority

Source refs

Validator
durable_agent_work_landing_replay validator
Diagram source
flowchart LR Fixture["Public replay fixture claimed rows, validation refs, commit attempts, blocker rows"] Source["Copied internal control source bodies work landing, preflight, scoped commit, work log"] Validator["durable_agent_work_landing_replay validator"] Mechanics["Replay mechanics claim before mutation, validate before commit, HEAD movement before landed language"] Negative["Negative floor live Git authority, missing completion, uncaptured blocker, private leakage"] Result record["Result records board, result, validation, sign-off; no live mutation authority"] Fixture --> Validator Source --> Validator Validator --> Mechanics Validator --> Negative Mechanics --> Result record Negative --> Result record

Public Contract

  • The source pattern is durable_agent_work_landing_replay_compound.
  • The fixture lives at fixtures/first_wave/durable_agent_work_landing_replay/input/.
  • The runtime example lives at examples/durable_agent_work_landing_replay/exported_work_landing_replay_bundle/.
  • The validator is microcosm_core.organs.durable_agent_work_landing_replay.
  • The CLI command is microcosm durable-agent-work-landing-replay run-work-landing-bundle.
  • The governing standard is standards/std_microcosm_durable_agent_work_landing_replay.json.
  • The component model row is core/organ_atlas.json#durable_agent_work_landing_replay.
  • The sign-off row is core/organ_registry.json#durable_agent_work_landing_replay.

Technical Mechanism

The replay fixture imports six source internal control bodies through examples/durable_agent_work_landing_replay/exported_work_landing_replay_bundle/source_module_manifest.json. Those bodies are copied into source_modules/ with digest provenance instead of being summarized from memory:

  • system/lib/workitem_runtime_entrypoint.py
  • system/lib/work_landing_status.py
  • tools/meta/control/work_landing.py
  • tools/meta/control/mission_transaction_preflight.py
  • tools/meta/control/scoped_commit.py
  • tools/meta/factory/work_ledger.py

The validator checks the replay rows against those source-backed mechanics rather than accepting a prose landing claim. validate_projection_protocol requires source pattern refs, projection result record refs, and public runtime refs. validate_landing_policy requires the scoped-commit, broad-checkpoint, metadata-blocked patch-bundle, and hard-stop lanes, with broad checkpointing kept behind explicit operator authorization and launch-scope decision kept false. validate_work_landing_runs enforces claim-before-mutation evidence, validation before commit attempt, HEAD movement before landed language, blocker capture before metadata-blocked completion, dirty-tree boundary evidence, and work log finalizer evidence.

The source-open body floor is enforced separately by validate_source_module_imports. The manifest must declare copied_non_secret_macro_body, body_in_receipt: false, exact-copy source-to-target relations, allowed public source material classes, expected digests, and required anchors inside each copied source body. That check keeps the reader claim tied to actual source internal control files while result records carry only refs, digests, counts, and verdicts.

The result builder merges projection-protocol, landing-policy, work-run, source-module, source-open-body, and secret-exclusion checks into one metadata-only result record set. The board result record records three claimed-path rows, two validation-before-commit mechanics, one metadata-blocked row, one landed-commit row, nine observed negative cases for the first-wave fixture, and zero authority for live Git mutation or launch.

Prior Art Grounding

This component is grounded in provenance and software supply-chain integrity patterns. The W3C PROV family provides a general model for entities, activities, and agents involved in producing an artifact. SLSA brings a similar concern to software builds: source, build process, provenance, and artifact integrity are tracked so consumers can reason about where an artifact came from and how it was produced.

Microcosm borrows that provenance posture for agent work landing: claimed paths, validation refs, commit attempts, HEAD-before/after evidence, blocker capture, Task/work log completion, and seed reentry are separate evidence fields. It does not perform a live Git landing or prove arbitrary commits outside the replay.

Reader Evidence Routing

Read the replay as an evidence-accounting component, not as a live landing controller. The board result record is the primary reader surface: it shows which claimed-path rows carried validation evidence, which rows were blocked by Git-metadata or dirty-tree constraints, and which rows had enough HEAD before/after evidence to use landed language.

Read the source-module manifest as provenance evidence for the imported control plane, not as a permission slip to mutate those source files. The manifest binds the copied bodies by digest and line count so a cold agent can see which mechanics the replay model was checked against.

Read negative cases as the authority floor. Rows that claim live Git mutation, broad checkpoint authority, missing work log completion, uncaptured blockers, launch-scope decision, or non-public paths/body export are supposed to fail. Passing those refusals is part of the positive claim.

Evidence Result records

  • receipts/first_wave/durable_agent_work_landing_replay/durable_agent_work_landing_replay_result.json
  • receipts/first_wave/durable_agent_work_landing_replay/durable_agent_work_landing_replay_board.json
  • receipts/first_wave/durable_agent_work_landing_replay/durable_agent_work_landing_replay_validation_receipt.json
  • result records/sign-off/first_wave/durable_agent_work_landing_replay_fixture_acceptance.json

Run the fixture result record refresh from microcosm-substrate with:

PYTHONPATH=src python3 -m microcosm_core.organs.durable_agent_work_landing_replay run --input fixtures/first_wave/durable_agent_work_landing_replay/input --out receipts/first_wave/durable_agent_work_landing_replay

Run the exported bundle validator without mutating durable result records with:

PYTHONPATH=src python3 -m microcosm_core.organs.durable_agent_work_landing_replay run-work-landing-bundle --input examples/durable_agent_work_landing_replay/exported_work_landing_replay_bundle --out /tmp/durable-agent-work-landing-replay

Named Proof Consumers

  • First-wave runtime consumer: microcosm_core.organs.durable_agent_work_landing_replay run consumes the fixture input, writes result, board, validation, and optional sign-off result records, and observes the nine negative cases declared in EXPECTED_NEGATIVE_CASES.
  • Exported-bundle consumer: microcosm_core.organs.durable_agent_work_landing_replay run-work-landing-bundle consumes the exported bundle without durable result record mutation, validates the source-module manifest, checks copied source-body digests and anchors, and emits the command card path used by runtime-shell demos.
  • Scope limit consumer: standards/std_microcosm_durable_agent_work_landing_replay.json, the component AUTHORITY_CEILING, and the fixture negative cases keep live Git mutation, broad checkpoint authority, unrelated dirty-path staging, live Task/work log mutation, external model access, source-file changes, public sharing, launch, non-public body export, and whole-system correctness outside this module.

Negative Cases

The fixture rejects the nine named negative cases in core/fixture_manifests/durable_agent_work_landing_replay.fixture_manifest.json: missing validation evidence, validation recorded after a commit attempt, missing recorded completion, commit-landed language without a HEAD advance, live Git side-effect authority, missing dirty-tree boundary, uncaptured metadata blockers, overbroad distribution claims, and non-public path/body leakage.

Validation Result record Path

./repo-pytest tests/test_durable_agent_work_landing_replay.py -q --basetemp=/tmp/microcosm_durable_agent_work_landing_replay_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

This module may claim public replay evidence that claimed-path rows, validation-before-commit rows, HEAD before/after evidence, blocker-capture rows, work log finalizer evidence, copied internal control bodies, source manifests, metadata-only result records, and negative cases support the declared work-landing replay contract. The component, mechanism, code locus, governed concept, and principles are bound in the structured lattice bindings above.

This module may not claim live Git mutation, arbitrary commit-landed truth, live work log mutation, live work log mutation, external model access, broad checkpoint authority, source-file changes, hosted-public posture, launch-scope decision, publishing-scope decision, implementation correctness beyond the listed witnesses, or whole-system correctness.

Scope limit

This component is source-open replay evidence for synthetic result records and copied source bodies with digest provenance. It supports local inspection of recorded work-landing mechanics, while operational distribution and live Git side effects stay outside the public fixture.

Source and projection details
Governing Lattice Relation

The JSON bundle binds this module to mechanism mechanism.durable_agent_work_landing_replay.validates_public_work_landing_replay_contract, component durable_agent_work_landing_replay, concept concept.work_landing_and_continuity_control_bundle, principles P-5, P-10, P-14, P-15, and P-16, axioms AX-4 and AX-9, and the runtime code locus src/microcosm_core/organs/durable_agent_work_landing_replay.py. That lattice position makes the module a bounded work-landing accounting replay: it explains how evidence is recorded and rejected, not how to perform live Git mutation.

The concept edge is the scope limit. Broader work-continuity claims must route through sibling modules such as bridge_phase_continuity_runtime and work_landing_control_spine, while live landing behavior remains with the source internal control source files and work log/scoped-commit owner lanes. This module can cite their copied bodies as evidence, but it cannot promote itself into their live authority.

Bridge Phase Continuity RuntimeReplays a paused job to prove the rules for safely resuming it hold and reject duplicate resumes.5/5

Does Replays a small synthetic record of a paused background job to check that the rules for safely resuming it hold. It confirms the job left behind a real resume note, that trying to resume the same job twice is refused, that "still alive" pings can never count as permission to resume (and a long-stale ping can't be passed off as proof the job is still healthy), and that only a proper completion result record is allowed to claim the work actually finished. It also includes deliberately broken cases to prove each rule rejects them. These continuity rules are inspectable in plain result records without any live job, network call, or non-public data ever being touched.

Scope limit It validates only the declared public continuity contract over synthetic fixtures; it does not run live bridge transport, use external model services, read operator HUD/browser/phase-runtime or private-memory state, prove provider or UI uptime, land work, change source files, or include launch operations.

Run
microcosm bridge-phase-continuity-runtime run --input fixtures/second_wave/bridge_phase_continuity_runtime/input --out /tmp/microcosm-bridge-continuity

EvidenceContract validatorevidence 5/5Import validation

workflow-engineeringcontinuity

Source Design note · Source atlas

Paper module Bridge Phase Continuity Runtime

Route Card

bridge_phase_continuity_runtime is the public, executable synthetic transport continuity membrane for detached bridge work. It lets a cold agent validate the disk-first observe/apply handoff without opening live bridge transport, model-output data, operator HUD/browser state, prompt-shelf bodies, private memory, or active phase runtime state.

Purpose

This paper module exists to make detached bridge continuity testable as a public fixture instead of a trust story about hidden agents or provider sessions. The component asks one bounded question: can a disk-first observe/apply handoff be represented by public synthetic transport inputs, validated through continuation, heartbeat, resource-pressure, resume, worker-skip, and completion result records, and kept below live bridge/provider/UI/source-file changes?

The important mechanism is not "run a bridge." It is a continuity membrane: every claim must pass through explicit packet fields, negative-case checks, metadata-only result record writes, and an scope limit that says heartbeat is liveness evidence, resume is resume evidence, and neither is proof that work landed.

First command:

microcosm bridge-phase-continuity-runtime run --input fixtures/second_wave/bridge_phase_continuity_runtime/input --out /tmp/microcosm-bridge-continuity

Prior Art Grounding

This runtime borrows from durable execution, workflow orchestration, leases, and provenance practice. Useful anchors include:

  • Temporal, whose durable-execution model keeps workflow state resumable across process failure and retries.
  • Apache Airflow DAGs, which separate task ordering and retry/timeout policy from task internals.
  • Kubernetes Lease-based leader election, as a prior pattern for liveness evidence, lease renewal, and failover without confusing a heartbeat with work completion.
  • W3C PROV, for provenance records that let readers evaluate how an output was produced.

Microcosm borrows the resumable-workflow, DAG, lease, and provenance shapes, but keeps the component to public synthetic observe/apply fixture sign-off. It does not run live bridge transport, use external model services, prove UI uptime, land work, change source files, or include launch operations.

Primary authority surfaces:

  • Runtime: src/microcosm_core/organs/bridge_phase_continuity_runtime.py
  • Standard: standards/std_microcosm_bridge_phase_continuity_runtime.json
  • Fixture manifest: core/fixture_manifests/bridge_phase_continuity_runtime.fixture_manifest.json
  • Source-module manifest: examples/macro_projection_import_protocol/exported_projection_import_bundle/observe_runtime_source_module_manifest.json
  • Result record set: receipts/second_wave/bridge_phase_continuity_runtime/*.json

Shape

"yes""no""env set""env absent"Six synthetic transportinputsdetached job, continuationpacket,heartbeat rows, resourcepressure,worker-skip result record,forbidden termsSix synthetic transport inputs detached job, continuation packet, heartbeat rows, resource pressure, worker-skip result record, forbidden terms_validate_synthetic_transport_contract_validate_synthetic_transport_contractValid job?yielded to disk, packet notconsumed,fresh heartbeat, phase andcontinuity matchValid job? yielded to disk, packet not consumed, fresh heartbeat, phase and continuity matchPositive path acceptedPositive path acceptedRefusal floor:missing packet, missingfields,duplicate resume, heartbeatclaims resume,stale heartbeat overclaim,dispatch blockedRefusal floor: missing packet, missing fields, duplicate resume, heartbeat claims resume, stale heartbeat overclaim, dispatch blockedConcrete error codesConcrete error codes_validate_fixture_contractsource digests, completionfinalizer,apply-failure rollback,public boundary_validate_fixture_contract source digests, completion finalizer, apply-failure rollback, public boundaryprivate_state scanfixture and transport inputsprivate_state scan fixture and transport inputsFive metadata-only resultrecordscontinuation, heartbeat,resource pressure,resume, completion transitionFive metadata-only result records continuation, heartbeat, resource pressure, resume, completion transitionTracked result record-writegateTracked result record-write gateResult records writtenResult records writtenBlockedBlockedScope limit:no live bridge transport,external model access,HUD/browser/private memory,source-file changes,launch, or whole-system proofScope limit: no live bridge transport, external model access, HUD/browser/private memory, source-file changes, launch, or whole-system proof

Source refs

Blocked
tracked_receipt_writes_blocked
Diagram source
flowchart TD Inputs["Six synthetic transport inputs detached job, continuation packet, heartbeat rows, resource pressure, worker-skip result record, forbidden terms"] --> Transport["_validate_synthetic_transport_contract"] Transport --> Good{"Valid job? yielded to disk, packet not consumed, fresh heartbeat, phase and continuity match"} Good -->|"yes"| Accept["Positive path accepted"] Good -->|"no"| Refuse["Refusal floor: missing packet, missing fields, duplicate resume, heartbeat claims resume, stale heartbeat overclaim, dispatch blocked"] Refuse --> Codes["Concrete error codes"] Accept --> Fixture["_validate_fixture_contract source digests, completion finalizer, apply-failure rollback, public boundary"] Codes --> Fixture Fixture --> Scan["private_state scan fixture and transport inputs"] Scan --> Result records["Five metadata-only result records continuation, heartbeat, resource pressure, resume, completion transition"] Result records --> Gate{"Tracked result record-write gate"} Gate -->|"env set"| Written["Result records written"] Gate -->|"env absent"| Blocked["tracked_receipt_writes_blocked"] Written --> Ceiling["Scope limit: no live bridge transport, external model access, HUD/browser/private memory, source-file changes, launch, or whole-system proof"] Blocked --> Ceiling

The shape is the public continuity membrane: six synthetic transport inputs are checked for a single valid resumable job and against a refusal floor, the accepted and rejected paths both feed the fixture-contract and non-public-state checks, and only then are the five metadata-only result records written through the tracked-write gate. The result record roles delimit what a reader can trust.

Mechanism Pipeline

The runtime source locus is src/microcosm_core/organs/bridge_phase_continuity_runtime.py. Its public entry point run reads the fixture manifest, resolves public-relative fixture paths, and validates six synthetic transport inputs: detached_job.json, continuation_packet.json, heartbeat_rows.jsonl, resource_pressure.json, worker_skip_receipt.json, and private_state_forbidden_terms.json. JSONL heartbeat rows are streamed by _read_required_jsonl so malformed rows are findings, not a reason to ingest a whole live heartbeat body.

The central validator is _validate_synthetic_transport_contract. It separates five result record roles: continuation packet, heartbeat, resource pressure, resume result record, and completion transition. The implementation then writes the canonical result record set only through the result record-write gate. When the requested output is a tracked result record path and MICROCOSM_TRACKED_RECEIPT_WRITES=1 is absent, the component reports tracked_receipt_writes_blocked instead of silently refreshing tracked evidence.

The negative-case floor is source-declared in EXPECTED_NEGATIVE_CASES and validated from fixture contents. Missing continuation packets, missing required fields, duplicate resume attempts, heartbeat rows that claim resume authority, stale heartbeat overclaims, resource-pressure dispatch blocks, private HUD body leakage, resume-pass work-landing overclaims, and observe/apply validation rollback all become explicit error codes. A pass therefore means the fixture both accepted the positive path and observed the refusal floor.

Reader Evidence Routing

Reader evidence routes from this module to the runtime source locus, fixture manifest, source-module manifest, public result records, and focused regression. A diagram view and an atlas card are generated for this module. This page explains what a reader can infer from them.

Evidence classWhat it supportsProof consumer
Positive synthetic fixtureThe runner consumes the observe/apply fixture, writes five metadata-only result record roles, keeps non-public-state scan clean, and preserves the scope limit.tests/test_bridge_phase_continuity_runtime.py::test_bridge_phase_continuity_runner_consumes_observe_apply_fixture
JSONL input handlingHeartbeat rows are streamed, invalid JSONL rows become findings, and non-object rows are rejected without reading live transport state.test_bridge_phase_continuity_jsonl_reader_streams
Tracked result record gateDurable tracked result record paths are not refreshed unless the explicit tracked-write environment variable is present.test_bridge_phase_continuity_runner_reports_tracked_receipt_write_gate
Public synthetic labelFixture inputs use synthetic_transport and do not carry stale legacy transport or expected-error-code labels.test_bridge_phase_continuity_fixture_inputs_use_public_synthetic_transport_label
Negative floorThe seven expected negative case classes are observed as concrete error codes rather than prose warnings.Focused bridge-continuity negative-case tests in tests/test_bridge_phase_continuity_runtime.py
CLI card boundaryCompact command cards can summarize status without leaking forbidden private/live body classes.Bridge-continuity CLI/card tests in tests/test_bridge_phase_continuity_runtime.py

What It Proves

The component proves a bounded public fixture contract:

  • A yielded synthetic job can be resumed only through an explicit continuation packet.
  • Missing packets, missing packet fields, and already consumed packets are rejected.
  • Heartbeat rows stay liveness evidence only; fresh or stale heartbeat rows do not become resume authority or provider/UI uptime evidence.
  • Resource pressure can block dispatch and must be recorded as a blocked decision.
  • Resume success is resume-only; it does not establish work landed without the completion transition result record.
  • Worker-skip result records dedupe a no-op without silently closing the claim.
  • The fixture and result records stay metadata-only for private/live-state classes.

The reusable mechanism is not "subagents are good." It is the concrete continuity membrane that future agents can run before relying on observe/apply bridge resumption claims.

Source-Backed System

The runtime consumes seven public fixture inputs:

  • observe_apply_session_fixture.json
  • detached_job.json
  • continuation_packet.json
  • heartbeat_rows.jsonl
  • resource_pressure.json
  • worker_skip_receipt.json
  • private_state_forbidden_terms.json

The fixture manifest declares five copied source body imports: codex_paths_body_import, markdown_routing_body_import, observe_memory_body_import, observe_surfaces_body_import, and observe_runtime_body_import. The component validates the copied target digests from observe_runtime_source_module_manifest.json; result record output keeps those bodies out of result records and records digest verdicts instead.

Result record Floor

A passing run writes five canonical result record roles:

  • continuation_packet.json
  • heartbeat.json
  • resource_pressure.json
  • resume_receipt.json
  • closeout_transition.json

Each result record carries organ_id, fixture_id, validator_id, checker_id, status, continuation_packet_status, heartbeat_status, resource_pressure_decision, resume_once_status, duplicate_resume_rejection, worker_skip_receipt_status, private_state_scan, authority_ceiling, anti_claim, and the full result record path set.

The runtime also enforces tracked result record-write gating. A direct run to tracked result record paths without MICROCOSM_TRACKED_RECEIPT_WRITES=1 reports tracked_receipt_writes_blocked rather than mutating tracked evidence silently.

Negative Cases

The expected negative-case floor is source-declared in the runtime and manifest:

  • missing_packet_duplicate_resume_and_resource_block
  • continuation_packet_missing_required_fields
  • heartbeat_claims_resume_authority
  • bridge_packet_private_hud_body
  • stale_heartbeat_overclaims_liveness
  • resume_success_overclaims_work_landed
  • apply_validation_failure_rolls_back_observe_promotion

The current result record error-code set includes MISSING_CONTINUATION_PACKET, MISSING_CONTINUATION_PACKET_FIELDS, CONTINUATION_PACKET_ALREADY_CONSUMED, HEARTBEAT_NOT_RESUME_AUTHORITY, STALE_HEARTBEAT_LIVENESS_CLAIM, RESOURCE_PRESSURE_DISPATCH_BLOCKED, BRIDGE_PACKET_PRIVATE_HUD_BODY, RESUME_PASS_OVERCLAIMS_WORK_LANDED, and OBSERVE_APPLY_VALIDATION_FAILED.

Validation Anchors

Focused tests:

./repo-pytest --host-pressure-policy=warn tests/test_bridge_phase_continuity_runtime.py

Source-form runtime:

cd microcosm-substrate && PYTHONPATH=src python3 -m microcosm_core.organs.bridge_phase_continuity_runtime run --input fixtures/second_wave/bridge_phase_continuity_runtime/input --out /tmp/microcosm-bridge-continuity

Compact card:

cd microcosm-substrate && PYTHONPATH=src python3 -m microcosm_core.organs.bridge_phase_continuity_runtime run --input fixtures/second_wave/bridge_phase_continuity_runtime/input --out /tmp/microcosm-bridge-continuity --card

Validation Result record Path

./repo-pytest --host-pressure-policy=warn tests/test_bridge_phase_continuity_runtime.py -q --basetemp=/tmp/microcosm_bridge_phase_continuity_runtime_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

This module may claim public fixture evidence that synthetic observe/apply continuation packets, heartbeat rows, resource-pressure decisions, resume-once behavior, worker-skip result records, completion-transition result records, source-module manifests, negative cases, validation result records, and generated projections support the declared bridge-continuity fixture contract.

This module may not claim live bridge transport health, external model access, operator HUD/browser access, prompt-shelf or private-memory disclosure, live phase runtime truth, source-file changes, hosted-public posture, launch-scope decision, publishing-scope decision, implementation correctness beyond the listed witnesses, or whole-system correctness.

Scope limit

The component authorizes only public synthetic observe/apply fixture sign-off. It does not run live bridge transport, use external model services, read operator HUD/browser state, read live phase runtime state, read prompt-shelf or private-memory bodies, prove provider or UI uptime, land work, change source files, include launch operations, or certify whole-system correctness.

Read the five result records as fixture evidence, not as a bridge-health statement. A pass means the declared public continuity contract held for the synthetic fixture and copied body floor.

Concurrency Mission ControlRuns copied claim-coordination code so duplicate, stale, and conflicting claims get blocked.5/5

Does This component imports the real concurrency mission-control specimen builder plus public provider and work log bridge artifacts as exact copies. Running it shows duplicate claims, dependency conflicts, stale leases, missing result records, supervised finalizers, and misanchored claims blocked through repair rows while authority-collapse counters stay at zero.

Scope limit verified concurrency mission-control source body import only, not a live scheduler, external model access, hosted orchestration, production concurrency-safety proof, source authority, private-system equivalence, public sharing, or launch-scope decision

Run
microcosm concurrency-mission-control run --input fixtures/first_wave/concurrency_mission_control/input --out receipts/first_wave/concurrency_mission_control --acceptance-out receipts/acceptance/first_wave/concurrency_mission_control_fixture_acceptance.json

EvidenceVerified source importevidence 5/5Copied source body

agent-concurrencyworkflow-engineeringcontinuity

Source Design note · Source atlas

Paper module Concurrency Mission Control

concurrency_mission_control imports the real self-indexing-cognitive-system/src/idea_microcosm/concurrency_mission_control_specimen.py source builder plus its public provider-canary and work log bridge artifacts as exact source copies. The component runs the copied builder in a temporary public seed root, then checks the transaction failure matrix, authority membrane, and a public work log seed-speed topology fixture. The work log code body itself is consumed through the existing mission_transaction_work_spine source-body import surfaces rather than duplicated here.

The component is deliberately narrow: it demonstrates fail-closed transaction gating for synthetic multi-agent lanes, not private mission-control runtime, external model access, live scheduling, production concurrency safety, hosted orchestration, or launch-scope decision.

Purpose

When several agents work the same repository at once, the dangerous moment is not a crash. It is a quiet one: two lanes edit the same generated file, or one lane commits work whose owner has not finished, and nobody notices until the state is already wrong. This component exists to make that moment a checkable verdict rather than a judgement call.

The single question it answers is: given a dirty path and the live claim topology around it, is acting on that path safe, and if not, what must happen first? The answer is never "probably fine". Each case resolves to a named classification and one allowed action, so a lane can decide whether to proceed, hand off, or wait.

What is unusual is where the evidence comes from. Rather than re-implementing a scheduler, the component runs the real source mission-control builder over public synthetic lanes and reads a public snapshot of the work log's seed-speed topology: who holds which claim, whether their heartbeat is current, and where path claims collide. The most pointed part is the pair of classifier lenses. The closure-state lens then folds in validation, commitability, and residual evidence to say whether a piece of work is genuinely closed or only looks closed. Both lenses default to the cautious verdict when the evidence is thin, which is the behaviour the page is really about.

Prior Art Grounding

This component borrows from workflow DAGs, lease-based coordination, atomic commit protocols, and CI concurrency controls. Useful anchors include:

  • Apache Airflow DAGs, for representing tasks, dependencies, retries, and scheduling separately from task internals.
  • Kubernetes Lease-based leader election, as a prior pattern for lease holders, renewals, and failover-sensitive internal control coordination.
  • IBM Research on two-phase commit, as a transaction-consistency pattern for distributed participants under failure.
  • GitHub Actions workflow syntax, for declared workflow concurrency and job orchestration controls.

Microcosm borrows the DAG, lease, commit-gate, and workflow-concurrency shapes, but keeps the component to fail-closed synthetic multi-agent transaction gating. It does not claim private mission-control runtime, external model access, live scheduling, production concurrency safety, hosted orchestration, or launch.

Shape

Copied source builderrun in temp seed root:mission board, bridges,result recordCopied source builder run in temp seed root: mission board, bridges, result recordPublic bridge artifactsprovider canary andwork log cap economyPublic bridge artifacts provider canary and work log cap economywork log seed-speed snapshotclaims, heartbeats,collisions, session cardswork log seed-speed snapshot claims, heartbeats, collisions, session cardsfailure_matrix_gateconflict, duplicate run,dependency, lease,result record, finalizervisiblefailure_matrix_gate conflict, duplicate run, dependency, lease, result record, finalizer visiblebridges green,authority-collapse zero,forbidden claims blockedbridges green, authority-collapse zero, forbidden claims blockedheartbeat current,path claims collision-freeheartbeat current, path claims collision-freedirty generated file:owner live / stale / absent> allowed actiondirty generated file: owner live / stale / absent > allowed actionclosure_state_lensclosed and committed,validation deferred,or open and unclassifiedclosure_state_lens closed and committed, validation deferred, or open and unclassifiedNegative floormissing seed root,blocked bridge,authority collapse,private runtime,claim collisionNegative floor missing seed root, blocked bridge, authority collapse, private runtime, claim collisionmetadata-only result recordsrefs, digests, anchors,counts, verdicts;no session or proof bodiesmetadata-only result records refs, digests, anchors, counts, verdicts; no session or proof bodiesEnginesEngines

Source refs

bridges green, authority-collapse zero, forbidden claims blocked
bridge_authority_membrane
heartbeat current, path claims collision-free
work_ledger_seed_speed_gate
dirty generated file: owner live / stale / absent > allowed action
generated_surface_claim_lens
Diagram source
flowchart LR Builder["Copied source builder run in temp seed root: mission board, bridges, result record"] Bridge["Public bridge artifacts provider canary and work log cap economy"] Seed["work log seed-speed snapshot claims, heartbeats, collisions, session cards"] subgraph Engines["Six engines (all must pass)"] Matrix["failure_matrix_gate conflict, duplicate run, dependency, lease, result record, finalizer visible"] Membrane["bridge_authority_membrane bridges green, authority-collapse zero, forbidden claims blocked"] SeedGate["work_ledger_seed_speed_gate heartbeat current, path claims collision-free"] SurfaceLens["generated_surface_claim_lens dirty generated file: owner live / stale / absent -> allowed action"] ClosureLens["closure_state_lens closed and committed, validation deferred, or open and unclassified"] end Negative["Negative floor missing seed root, blocked bridge, authority collapse, private runtime, claim collision"] Result record["metadata-only result records refs, digests, anchors, counts, verdicts; no session or proof bodies"] Builder --> Matrix Builder --> Membrane Bridge --> Membrane Seed --> SeedGate Seed --> SurfaceLens Seed --> ClosureLens Engines --> Negative Negative --> Result record

Engines

  • mission_transaction_original_builder dynamically loads the copied source builder and emits the mission board, provider repair bridge, work-metabolism bridge, residual replay bridge, and result record.
  • failure_matrix_gate checks that owner-path conflicts, duplicate command runs, dependency gaps, stale leases, missing result records, supervised-scope gaps, missing parent finalizers, and misanchored claims all remain visible.
  • bridge_authority_membrane checks that bridge statuses are green while authority-collapse counters remain zero and forbidden claims stay blocked.
  • work_ledger_seed_speed_gate checks that public session heartbeat, seed-speed status, mutation-check commands, multi-session/claim counts, and collision-free path-claim rows are present without exporting private work log session bodies.
  • Each classification carries the single allowed action, so the verdict is what a lane should do, not just what it observed.
  • closure_state_lens decides whether a unit of work is genuinely closed. It folds the generated-surface classification together with validation state, commitability, and any open residual, separating closed_and_committed from the cases that only look done: closed_validation_deferred (validation parked under host pressure), closed_uncommitted_authority (event authority exists but shared append logs are unsafe to stage), false_residual_stale (a residual left open against a passing generator check), or open_unclassified when the closure evidence is simply insufficient. The default is the last of these, so absent evidence never reads as success.

Reader Evidence Routing

Read this module as a coordination-evidence membrane, not as a live scheduler. Start with paper_modules/concurrency_mission_control.json for the full structured binding, then open standards/std_microcosm_concurrency_mission_control.json for required copied-body counts, negative cases, result record fields, and the public/private boundary.

Open core/fixture_manifests/concurrency_mission_control.fixture_manifest.json and examples/concurrency_mission_control/exported_concurrency_mission_control_bundle/source_module_manifest.json before inspecting copied source modules. The manifest floor names one source builder body and six public bridge artifacts; result record payloads should carry source refs, hashes, anchors, counts, verdicts, and omission result records, not body text.

Read the work log seed-speed topology as a public coordination fixture. It can show heartbeat participation, mutation-check commands, session and claim counts, and collision-free selected rows, but it cannot export private work log session bodies or authorize live scheduling.

Negative Cases

The fixture carries stable cases for missing seed roots, blocked provider bridges, authority-collapse claims, private runtime overclaims, and unresolved work log seed-speed claim collisions. If focused validation reports an exact-copy source-module body mismatch, route that repair through microcosm_exact_copy_refresh; do not treat this Markdown projection as source authority for copied source bodies.

Validation Result record Path

From microcosm-substrate, validate with throwaway result record outputs first:

A diagram view and navigation card are generated for this module from its declared component, mechanism, concept, principle, axiom, dependency, and code-locus relationships. Fixture and bundle passes prove only public fail-closed coordination evidence over the declared copied bodies and synthetic fixtures. Source-copy digest drift belongs to microcosm_exact_copy_refresh; shared lattice projection drift belongs to the live projection owner lane.

Scope boundary

Scope limit

This module may claim public fixture evidence that the exact public source builder copy, provider-canary and work log bridge artifacts, failure-matrix fixture, bridge authority membrane, work log seed-speed topology fixture, source manifests, metadata-only result records, negative cases, and generated navigation projections support the declared concurrency mission-control fixture contract. It may also claim that the structured binding row resolves the accepted component subject, resolved mechanism subject, runtime source locus, governed concept, five principles, four axioms, and three dependency modules.

This module may not claim private mission-control runtime truth, external model access, live scheduling, production concurrency safety, hosted orchestration, source-file changes, hosted-public posture, launch-scope decision, publishing-scope decision, implementation correctness beyond the listed witnesses, or whole-system correctness.

Source and projection details
Governing Lattice Relation

The governing lattice claim is that this module turns concurrency coordination from a status narrative into a transaction-scoped evidence check. The bundle structured source record reports sixteen resolved edges and zero unresolved selective relations: the page explains the accepted component and mechanism, cites the runtime source locus, depends on the mission-transaction, bridge-continuity, and work-landing modules, and is governed by concept.work_landing_and_continuity_control_bundle. That concept binds this component to the same family shape as work landing and continuity controls: public fixture or exported bundle input becomes a coordination validator, and the result is a scoped transaction or continuity result record rather than chat status or generated projection authority.

The mechanism row mechanism.concurrency_mission_control.validates_public_concurrency_mission_control is the source-backed explanation edge. In source, run, run_concurrency_mission_control_bundle, classify_generated_surface_claim_lens, and classify_concurrency_closure_state_lens require copied-source digest equality, required anchors, failure-class coverage, work log seed-speed topology checks, metadata-only result records, and explicit scope limits. The focused proof consumer is tests/test_concurrency_mission_control.py: it checks the happy-path fixture, exported-bundle validation, digest-mismatch rejection, exact source-body imports, semantic negative cases, owner-state classification, and closure-state classification. The standard std_microcosm_concurrency_mission_control.json supplies the same ceiling in schema form, including seven copied public source modules, five negative cases, no non-public body export, and no live scheduler/provider/launch-scope decision.

The principle and axiom edges keep the proof boundary from drifting upward. P-10, P-16, and AX-9 make coordination effects transaction-scoped and compensable; P-2, P-6, P-8, AX-5, AX-7, and AX-8 force the validator to lower claim strength when evidence, preconditions, provenance, or refusal reasons are missing. A passing run therefore proves only the public concurrency mission-control fixture contract over declared copied bodies and synthetic fixtures. It does not establish private mission-control runtime truth, live scheduling, external model access, hosted orchestration, production concurrency safety, source-file changes, launch-scope decision, or whole-system correctness.