A paper module is the long-form write-up behind one component: a single document covering what the mechanism does, how it is checked, the test matrix behind it, its realness rungs, and where its scope stops. Each component card embeds its own write-up in place; this page is the whole collection in one list, including write-ups still missing, so you can read across mechanisms without opening cards one by one.
Rule feed
Paper modules carry typed handles into the same static rule lattice used by the Doctrine reference page. The browser reads this precomputed feed from content-graph.json; it does not fetch source JSON or run a backend.
Principles
20 records · 470 module links
Axioms
12 records · 374 module links
Concepts
11 records · 97 module links
Mechanisms
95 records · 108 module links
Paper modules
93 records · 223 module links
Principles, axioms, and anti-principles read on Doctrine; concepts and mechanisms read in the Doctrine reference until a separate body route exists. This page projects only the counts and each module's direct source-declared handles.
Verifier Lab KernelThe public verifier-lab composition root folds bounded formal-math component result records into one leak-proof result record while separating verifier-backed, oracle-compared, provider-suggested, retrieval-miss, CP2, Evolve, and contract-rejected rows.
Verifier Lab Kernel is the public result record compiler for the formal-math fixture cohort. It runs or consumes named component result records, classifies proof-lab rows by authority class, preserves scope boundaries, and excludes proof/provider/oracle bodies from exported result records.
Scope limit Public fixture and exported-bundle result records only; no theorem-prover authority, Mathlib import authority, oracle-to-forward success, provider proof authority, benchmark solve-rate claim, launch-scope decision, publishing-scope decision, source-file changes, or secret export.
verifier_lab_kernel is the public composition root for the formal-math verifier lab. It is not a theorem prover, a benchmark runner, a private Lean import, or a frontend surface. It composes already-public Microcosm components into one leak-proof result record so a reader can see which claim came from a verifier, which claim came from an oracle comparator, which claim came from a provider hypothesis, and which rows were rejected by contract.
The component consumes:
a public ForwardProblem packet with target shape, statement summary, public input hash, and allowed premise ids;
an OracleSidecar packet that may compare against hidden or hindsight knowledge but never increments forward success;
verifier attempts and verifier result classes;
provider/NIM hypotheses as advisory residual diagnoses only;
CP2 typed action candidates, bounded evidence bodies or raw tactic scripts;
bounded Evolve candidates over policy artifacts only.
The runnable fixture also calls the existing public components:
tactic_portfolio_availability_probe;
target_shape_tactic_routing_gate;
formal_math_verifier_trace_repair_loop;
formal_math_lean_proof_witness.
Purpose
In a formal-math agent loop, several different things can look like progress. A Lean checker can accept a term. An oracle holding a hindsight answer can say a candidate matches. A provider model can offer a plausible next tactic. A retrieval step can return a premise. Treated loosely, all of these blur into a single sense of "it worked", and oracle or provider success quietly inflates the count of theorems actually proved. This component exists to stop that blur. The one question it answers is: for each row of evidence, which authority class does it belong to, and what may that class claim?
The composition root runs or consumes nine named component components (corpus readiness, Lean Std premise indexing, premise retrieval, tactic availability, target-shape routing, Ring2 precision and recall, verifier trace repair, proof diagnostics, and the Lean proof witness) and sorts every result into seven separate buckets: verifier-checked, provider-suggested, oracle-compared, retrieval-miss, CP2-translated, Evolve-candidate, and contract-rejected. Each bucket keeps its own authority. A passing component cannot lend its standing to a different bucket.
The unusual part is how the boundary is enforced rather than merely described. The kernel keeps two counters, oracle_forward_success_increment_count and provider_results_counted, and they must read zero. An oracle that marks itself as forward success, or a provider hypothesis that claims proof authority, is recorded as a contract violation, not as a result. The same discipline applies to data: forward problems and CP2 actions are scanned for fields that would smuggle in a proof body, an ideal answer, or an oracle's needed premise ids, and CP2 and Evolve outputs are confined to a fixed vocabulary of action classes and policy artifacts. What the reader receives is a single aggregate result record that carries references, digests, counts, and verdicts, with the proof, provider, oracle, and stdout bodies left out.
Shape
Read the verifier lab kernel as a public result record composition route, not as a proof oracle. The local path spine is the bundle and structured source record (core/paper_module_capsules.json::paper_modules[0:paper_module.verifier_lab_kernel], paper_modules/verifier_lab_kernel.json), the runtime composition root (src/microcosm_core/organs/verifier_lab_kernel.py), the public packet (fixtures/first_wave/verifier_lab_kernel/input/verifier_lab_packet.json), and the emitted public result records under receipts/first_wave/verifier_lab_kernel/.
flowchart TD bundle["Bundle and structured source record core/paper_module_capsules.json paper_modules/verifier_lab_kernel.json"] packet["Public verifier packet fixtures/first_wave/verifier_lab_kernel/input/verifier_lab_packet.json"] kernel["Composition root src/microcosm_core/components/verifier_lab_kernel.py"] components["Public component result records tactic portfolio / target shape / trace repair / Lean witness"] buckets["Separated claim buckets lean_verified | oracle_compared | provider_suggested | retrieval_miss | cp2_translated | evolve_candidate | contract_rejected"] result records["Public board/result/validation result records result records/first_wave/verifier_lab_kernel/*.json"] ceiling["Scope limit no proof-body import; no oracle/provider forward success; no launch claim"] bundle --> packet --> kernel kernel --> components --> buckets --> result records kernel --> ceiling buckets --> ceiling
Prior Art Grounding
This component is grounded in small-kernel theorem-proving and proof-certificate composition patterns. The LCF approach and HOL Light anchor the idea that a verifier lab should distinguish trusted checked results from heuristics and automation. Lean-oriented work such as LeanDojo adds the modern agent context: retrieval, provider hypotheses, and proof-state interaction need explicit boundaries before they can influence proof claims.
Microcosm borrows the composition discipline: verifier success, oracle comparison, provider hypothesis, CP2 translation, and Evolve candidate rows are separate buckets with separate authority. It does not count oracle or provider success as forward proof success.
The sign-off result record must separate these buckets:
lean_verified;
provider_suggested;
oracle_compared;
contract_rejected;
retrieval_miss;
cp2_translated;
evolve_candidate.
The kernel rejects five contract failures:
forward problems that carry candidate, ideal, repair, oracle, source proof, proof body, or base-index fields;
oracle comparator success counted as forward success;
provider hypotheses claiming proof authority;
CP2 candidates carrying proof bodies, raw tactic scripts, provider bodies, or oracle templates;
Evolve candidates mutating anything outside the bounded policy-artifact set.
Reader Evidence Routing
Cold-reader audit starts with the generated structured source record for this module, not with a broad theorem-proving claim. The structured source record must confirm that verifier and mechanism subjects resolve and that a diagram view and atlas card are available for this module.
Evidence should be read in this order:
Module definition: core/paper_module_capsules.json::paper_module.verifier_lab_kernel and paper_modules/verifier_lab_kernel.json.
Runtime proof: src/microcosm_core/organs/verifier_lab_kernel.py, the fixture input packet, and the public component calls listed above.
Bucket-separation proof: result record rows for lean_verified, provider_suggested, oracle_compared, contract_rejected, retrieval_miss, cp2_translated, and evolve_candidate.
This paper module describes public fixture and exported bundle result records only. It excludes private proof-body import, Mathlib-dependent proof authority, oracle-to-forward success, provider proof authority, CP2 proof bodies, arbitrary Evolve mutation, source-file changes, benchmark solve-rate claims, launch, public sharing, hosted deployment, or secret export.
Limitations
The verifier lab kernel is a composition and result record-boundary mechanism. It does not establish formal-result correctness beyond the public component result records it consumes or emits, and it does not create Mathlib import authority when the corpus-readiness gate reports only bounded fixture evidence. A Lean/Lake return code or compiled declaration count is evidence for the corresponding public fixture or exported bundle, not a license to generalize to arbitrary formal math benchmarks.
Oracle structured source record remain hindsight or comparator evidence. They can diagnose a forward problem but cannot increment forward_success; the runtime authority counters must keep oracle_forward_success_increment_count at zero. Provider or NIM hypotheses remain residual diagnoses until a verifier result record or other system effect exists, so provider_results_counted must also remain zero.
CP2 rows are limited to typed action candidates from the bounded action-class vocabulary, with disconfirmation tests before rerun promotion. They are bounded evidence bodies, raw tactic scripts, provider output bodies, or oracle templates. Evolve rows are limited to the named policy-artifact set and must cite baseline or rerun result records; they do not authorize arbitrary source-file changes. Public result records must keep proof, provider, oracle, stdout/stderr, and private-source bodies out of exported evidence.
Coverage is finite: the present proof consumer exercises the first-wave fixture and exported-bundle contracts, the five named negative cases, and the component-stack result record shape. New claim classes, new fixture packets, or new launch/public sharing language need a fresh proof consumer and negative cases before this module can carry them.
Scope limit
This paper module can claim reader wiring for the verifier lab kernel composition root: verifier and mechanism subjects resolve, the runtime source locus is named, a diagram view and atlas card are generated for this module. It cannot claim private proof-body import, Mathlib-dependent proof authority, oracle-to-forward success, provider proof authority, CP2 proof bodies, arbitrary Evolve mutation, source-file changes, benchmark solve-rate claims, publishing-scope decision, hosted deployment, launch-scope decision, secret export, or whole-system correctness.
Fixture result records, exported-bundle result records, focused tests, and public component composition can support only bucket separation across verifier, oracle, provider, CP2, and Evolve rows. The diagram view and atlas card are navigation aids; they do not convert oracle or provider success into forward proof success, and they do not authorize benchmark or launch claims.
Source and projection details
Governing Lattice Relation
The governing lattice should be read as a claim-separation contract. The concept edge to concept.formal_math_and_proof_witness_bundle says the reader is looking at a proof-witness bundle, not a single proof oracle. The mechanism edge to mechanism.verifier_lab_kernel.composes_public_formal_math_receipts narrows that concept to one public operation: compose formal-math component result records into a leak-proof aggregate while keeping verifier, oracle, provider, retrieval, CP2, Evolve, and contract-rejected buckets distinct.
The code-locus edge is the runtime authority boundary. run and run_kernel_bundle select fixture or exported-bundle mode, _build_result loads the public packet and negative cases, validates the proof-lab route, runs or consumes the component stack, scans for forbidden classes, builds claim_separation, and records authority counters. _write_receipts then emits the board, result, validation, and sign-off result records with body_in_receipt: false, the result record-transparency contract, and the same scope boundary.
The nine depends_on paper-module edges are not a loose bibliography. They are the proof-lab dependency spine: corpus readiness, Lean Std premise indexing, premise retrieval, tactic availability, target-shape routing, Ring2 precision and recall, verifier trace repair, proof diagnostic evidence, and the Lean proof witness each remain separately bounded before the kernel aggregates their result records. This prevents a successful component from lending authority to a different bucket. The principle refs P-1, P-2, P-3, P-6, P-8, and P-15, plus axiom refs AX-1, AX-2, AX-5, and AX-7, are therefore read as ceiling law: public result record evidence may be composed, but hidden bodies, provider/oracle success, source-file changes, launch-scope decision, and whole-system correctness cannot cross the lattice boundary.
Focused test evidence checks the same relation. The verifier-lab test asserts that all expected negative cases are observed, all component statuses pass, claim_separation contains exactly the seven public buckets, oracle/provider authority counters stay at zero, body_in_receipt is false, public result record paths do not leak local roots, and legacy redaction fields do not survive result record normalization. Those checks make the lattice relation concrete for this module: the public aggregate result record is evidence of separation and containment, not of unbounded proof authority.
Navigation Hologram Route PlaneThe public navigation route-plane fixture validates bounded route projections, source-coupling gates, entry floors, affordance passports, and copied navigation source-module digests without treating browse rows as authority.
Navigation Hologram Route Plane is the public route-surface validator for moving from a Microcosm-local control entry into browsable route projections. It checks route rows, source-coupling fingerprints, copied source-module anchors and hashes, route-lease policy, entry-packet floors, affordance-passport selection, and code-architecture projection packets while keeping full source bodies out of result records.
Scope limit Public fixture and exported-bundle result records only; no live route freshness, source authority, provider/live-kernel execution, private operator state, later-component authorization, launch-scope decision, or whole-wave certification.
A large codebase has a recurring failure: the agent or reader that lands in it starts from whatever browse surface is nearest to hand, treats that surface as the authority, and acts on a stale or partial view. The route plane exists to make the first move legible and to stop a browse row from being mistaken for the thing it describes. It answers one question: given a control entry, what is the safe ordered path into the browsable route projections, and what proof says that path is wired rather than asserted?
The unusual part is that the component never asserts a route is correct from prose. It treats every browse row as a projection and demands a coupling result record before that projection is allowed any authority. Source coupling is a plain SHA-256 over the route rows: the manifest carries an expected fingerprint and an expected row count, and if either disagrees with the rows on disk the projection is denied current authority. A route summary that claims to be current while its coupling is stale is rejected outright.
The other half of the design is what it refuses to do. First contact must begin at the control entry, not at a drilldown projection, so a request that tries to start from a browse row is replaced with the entry route. Compaction of the entry packet may not drop a required control field. An affordance row whose passport carries an anti-trigger is demoted before similarity search can ever select it. None of these are stylistic preferences; each is a named negative case the fixture must keep catching, so the route plane is defined as much by the eight things it blocks as by the path it permits.
Teleology
The navigation route plane gives a public clone a typed way to move from a control entry to browseable route projections without treating browse rows as authority.
Public Contract
The component runs in two modes against the same checks. The fixture mode loads a set of synthetic inputs, builds a toy option-surface from the rows (a cluster-flag summary plus one selected card), and then runs the negative-case validators that prove each guard still fires. The exported-bundle mode runs the same kind of checks against a real copied bundle: it validates the route rows, the source-coupling fingerprint, the source-module manifest, the route-lease policy, the entry-packet floor, the affordance passports, and the code-architecture projection packet, and only reports a pass when the secret scan is clean, a card row is selected, and every component validator passes.
The source-coupling gate is the spine. It hashes the route rows with SHA-256 and compares that against the fingerprint and row count declared in the manifest; a mismatch denies the projection any current authority, and a summary that claims current authority while coupling is stale is recorded as an overclaim. The source-module manifest names five exact copies of source route and control bodies. Each is checked by digest and by required navigation anchors, and each must declare that its body is copied but never written into the result record, so the evidence is reproducible without exposing the source text.
Shape
Diagram source
flowchart TD Entry["Control entry first browse row that claims first contact is replaced with the entry route"] subgraph Gates["Route-plane gates"] Couple["Source coupling SHA-256 over route rows vs manifest fingerprint + row count"] Rows["Route rows surface role, actionable command, no source-authority claim, omission result record when required"] Modules["Source-module manifest 5 copied source bodies digest + required anchors, body never in result record"] Lease["Route-lease policy selected lane, permitted actions, source authority rejected"] Floor["Entry-packet floor required control fields survive compaction"] Pass["Affordance passports anti-trigger rows demoted before similarity can select them"] end Verdict{"Coupling current, all gates pass, card row selected?"} Entry --> Couple Couple --> Rows --> Modules --> Lease --> Floor --> Pass --> Verdict Verdict -->|yes| Result records["metadata-only result records cluster flag, card, coupling, route lease, entry admission, affordance, code-architecture packet"] Verdict -->|no| Blocked["Blocked stable error codes, findings, bodies redacted"] Negative["Negative-case floor BANNED_FIRST_CONTACT_ROUTE, SOURCE_COUPLING_STALE, and 7 more"] -.-> Gates Result records --> Ceiling["Scope limit projection evidence only; no live route freshness, source authority, or launch"] Blocked --> Ceiling
Source-Backed Doctrine Packet
core/organ_registry.json::implemented_organs[navigation_hologram_route_plane] is the accepted component authority. It records status accepted_current_authority, evidence class semantic_validator, evidence strength rank 5, scope limit validates declared public contract only, and validator command python -m microcosm_core.organs.navigation_hologram_route_plane run --input fixtures/first_wave/navigation_hologram_route_plane/input --out receipts/first_wave/navigation_hologram_route_plane.
core/organ_atlas.json::organs[navigation_hologram_route_plane] gives the cold-reader gloss: control entry comes first, browse rows stay projections, eight route-plane negative cases are detected, exact copied navigation source modules validate, and result records omit body text.
standards/std_microcosm_navigation_hologram_route_plane.json governs the standard authority boundary public_navigation_route_plane_runtime_and_copied_source_body_validator_not_live_source_authority. It requires route rows, option-surface contracts, source coupling, source-module manifests, route leases, entry-packet floors, affordance passports, code-architecture packets, body-import verification, scope limit, and scope boundary.
src/microcosm_core/organs/navigation_hologram_route_plane.py is the runtime source for fixture validation, route-plane bundle validation, secret-exclusion scan, route-lease checks, entry-admission floor checks, affordance-passport demotion, code-architecture packet result records, and source-module digest/anchor validation.
core/fixture_manifests/navigation_hologram_route_plane.fixture_manifest.json binds fixture expectations: body_copied_material_count=5, body_material_status=copied_non_secret_macro_route_substrate_with_provenance, body_in_receipt=false, and negative cases tied to stable error codes.
examples/navigation_hologram_route_plane/exported_route_plane_bundle/source_module_manifest.json names five exact copied source route-control bodies: navigation_route_plane_intervention_source_body_import, navigation_route_plane_context_pack_source_body_import, navigation_route_plane_entry_packet_source_body_import, navigation_route_plane_option_surface_source_body_import, and navigation_route_plane_navigation_contract_source_body_import.
tests/test_navigation_hologram_route_plane.py is the regression floor for fixture result records, exact source-source digest matches, source-module anchors, result record redaction, exported bundle validation, digest-mismatch rejection, and this source-backed paper-module packet.
receipts/first_wave/navigation_hologram_route_plane/*.json carries public result records for cluster/card output, source coupling, route lease, entry-payload admission, affordance-passport selection, code-architecture packet, and exported bundle validation.
Atlas scope limit restated: It validates only the declared public toy route-plane contract and its regression fixtures (plus exact copied navigation source modules in the bundle path); it does not establish live route freshness, grant source authority, authorize any later component, run any provider/live-kernel call, or certify the whole wave.
The negative-case floor is part of the doctrine, not incidental test trivia. Across the eight negative cases, the fixture must keep detecting these stable error codes (one case carries two codes, so the list runs to nine):
BANNED_FIRST_CONTACT_ROUTE
SOURCE_COUPLING_STALE
MISSING_OMISSION_RECEIPT
ATLAS_PROJECTION_NOT_CONTROL_ENTRY
ROUTE_CARD_PRIVATE_BODY_LEAK
ROUTE_SUMMARY_OVERCLAIMS_FRESHNESS
DUPLICATE_ROUTE_ID_CONFLICT
ENTRY_ADMISSION_CONTROL_FLOOR_DROPPED
AFFORDANCE_PASSPORT_ANTITRIGGER_IGNORED
Reader Evidence Routing
Reader evidence starts at the generated JSON instance, then routes through the route-plane runtime, fixture manifest, source-module manifest, public result records, and focused regression. The browse rows, Mermaid diagram, and Atlas card are derived projections; they are not control-entry or source authority.
Prior Art Grounding
The route plane is grounded in information-architecture and graph-navigation patterns. The first-contact rule follows the same usability pressure as progressive disclosure: show the control entry and immediate affordances before deeper browse rows. The CLI-facing surface is also informed by the Command Line Interface Guidelines, especially the emphasis on discoverable commands, examples, and clear next actions.
The graph side maps to established directed-graph tooling. NetworkX documents topological sorting as an ordering over dependency edges, and graph-ranking algorithms such as PageRank show the older pattern of computing route salience from graph structure. Microcosm keeps those ideas below authority: route cards, leases, and browse rows are projections unless source-coupling and entry-admission result records agree.
Validation Result record Path
From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:
These checks validate the public fixture and exported route-plane bundle only; they do not grant live route freshness, source authority, provider/live-kernel execution, later-component authorization, launch-scope decision, or whole-wave certification.
Scope boundary
Scope limit
This module can be cited as evidence that the public fixture and exported route-plane bundle validate their declared contract. It does not establish live route freshness, grant live source-kernel authority, authorize source-file changes, authorize external model access, export account or browser state, expose browser UI live access, authorize recipient work, authorize public sharing or launch, prove whole-system correctness, or certify private-system equivalence.
Scope limit
This module may claim public fixture evidence that the route-plane rows, exported bundle, copied navigation source modules, source manifests, negative cases, and validation result records agree on the declared public route-plane contract. It may also claim that the generated JSON row resolves the accepted component subject, resolved mechanism subject, runtime source locus, governed concept, and the full set of declared principles, axioms, dependency modules, and relationship bindings.
This module may not claim live route freshness, live source-kernel authority, provider or browser UI access, source-file changes, recipient work authorization, hosted-public posture, launch-scope decision, publishing-scope decision, private-system equivalence, implementation correctness beyond the listed witnesses, or whole-system correctness.
Scope boundary
This module documents a public route-plane fixture and exported source-body bundle. It does not certify live corpus freshness, later public components, launch operations, provider/account or browser access, private root equivalence, whole-system correctness, or secret export.
Agent Route Observability RuntimeThe public route-observability fixture validates synthetic route feedback, route leases, hook-shadow advisory rows, anti-pattern debt, copied source trace manifests, and metadata-only result records without claiming live session authority.
Agent Route Observability Runtime is the public evidence membrane for recorded agent route feedback. It checks actor-axis boundaries, selected and replacement routes, route-lease consumption, duplicate trace ids, hook-shadow advisory status, anti-pattern debt retirement, agent-principle-lens admission, egress-mirror boundaries, source-module manifests, and non-public-state exclusion while keeping transcript, provider, browser, HUD, account, account secret, and live-hook bodies out of result records.
Scope limit Public synthetic route-observability fixtures, copied source body digests, and exported-bundle result records only; no live session introspection, provider/browser UI/account authority, live hook control, benchmark-performance proof, source-file changes, launch-scope decision, or whole-system correctness.
This public slice validates synthetic route-feedback fixtures for the agent observability component.
It checks actor-axis authority boundaries, route-lease consumption, duplicate trace ids, hook-shadow advisory status, anti-pattern debt retirement, and behavior-change evidence gates.
Purpose
The observability surface is the place where a cold reader should see that a local run produced a route, work transaction, event trail, evidence ref, and authority boundary. It should not force that reader to start from raw JSON, and it should not replace command-backed evidence with motion or dashboard style.
The useful first artifact is therefore a compact causal board: one command, one selected route, one work/event/evidence chain, one result record or validator handle, and one scope limit. Browser views, screenshots, and videos are allowed projections of that board, not separate claims.
Underneath the board, the component is a replay validator rather than a live tap. It reads recorded trace rows and turns them into paper-visible evidence only after a set of authority-boundary checks agree, and the choice that does the real work is the actor axis. A row tagged as an advisory actor cannot also claim live mutation authority, and a behaviour-change claim is blocked unless it names the trace ids that evidence it. This guards the specific failure mode of observability that reads as proof: a trace that asserts it changed how an agent behaved, with nothing recorded that a reader can check. Here the assertion is rejected, the private transcript body it might carry is redacted, and the run is marked blocked rather than green.
Shape
The reader path starts at the source record, then follows the governed standard into the runtime component, public fixture inputs, source-open manifests, result records, and regression checks. The path is local and inspectable: core/paper_module_capsules.json::paper_modules[2:paper_module.agent_route_observability_runtime] points to standards/std_microcosm_agent_route_observability_runtime.json and src/microcosm_core/organs/agent_route_observability_runtime.py; the runtime then writes bounded public result records under receipts/first_wave/agent_route_observability_runtime/.
flowchart TD Bundle["JSON bundle core/paper_module_capsules.json::paper_modules[2]"] Standard["Authority boundary standards/std_microcosm_agent_route_observability_runtime.json"] Runtime["Runtime component src/microcosm_core/components/agent_route_observability_runtime.py"] Fixtures["Public fixture input fixtures/first_wave/agent_route_observability_runtime/input"] Manifests["Source-open manifests examples/agent_route_observability_runtime/*/source_module_manifest.json"] Negatives["Required negative cases actor axis, route lease, hook shadow, non-public-state floors"] Bundles["10 exported bundle validators 31 source-module rows"] Card["Compact result card large/private payloads omitted"] Result records["metadata-only result records result records/first_wave/agent_route_observability_runtime/"] Tests["Focused checks tests/test_agent_route_observability_runtime.py"] Ceiling["Scope limit public route-observability fixtures only"] Bundle --> Standard --> Runtime Runtime --> Fixtures --> Negatives --> Result records Runtime --> Manifests --> Bundles --> Result records Result records --> Card Result records --> Tests --> Ceiling
Technical Mechanism
The component is a route-feedback replay validator, not a live observability tap. run loads the first-wave fixture, streams JSONL trace rows without materializing the whole file, scans public inputs for forbidden non-public-state classes, and then composes six validation gates: route compliance, hook-shadow coverage, anti-pattern debt retirement, route-lease mode control, agent-principle-lens admission, and egress mirror boundaries. The result only passes when the expected negative-case set is complete, the non-public-state scan passes, hook-shadow coverage passes, agent-principle-lens rows do not mint principles or promote candidate axioms, and the egress mirror keeps private state, model-output data, and browser UI/operator UI state false.
The first-wave fixture is deliberately small but adversarial. The focused test expects 10 trace rows, one actor-axis mismatch, one authority rejection, one route-miss replacement, six hook-shadow cases, six egress cases, one anti-pattern debt retirement, and two route-lease control failures (KERNEL_BLOAT_BEFORE_DIRECT_ACTION and ROUTE_LEASE_NOT_CONSUMED). The negative-case floor covers wrong actor axis, missing route lease, private transcript body, duplicate trace id, route-compliance overclaim, route miss replacement, hook-shadow missing authority, banned-route intervention, command displacement, live-state read attempt, and hook-shadow budget overrun.
The exported-bundle side is the source-open body floor. Ten source_module_manifest.json files under examples/agent_route_observability_runtime/ declare 31 copied or sanitized public source-module rows with body_in_receipt=false. Most rows are copied source bodies; the route-compliance-audit bundle is a mixed manifest with one public-reference sanitized row and copied body rows. Bundle validators check source-target digests, line counts, byte counts where declared, required anchors, validation refs, and non-public-state scans. A manifest digest mismatch or synthetic non-public-state regression token blocks the bundle and still keeps result record bodies redacted.
Result records are generated as public-relative JSON proof surfaces. write_receipts emits route-compliance, hook-shadow, debt-retirement, route-lease, agent-principle-lens, and egress-mirror result records with common fields: validator id, command, status, expected and observed negative cases, findings, scope boundary, non-public-state scan, scope limit, source pattern ids, and result record paths. result_card then exposes a compact card while omitting large or private payload classes such as findings, private scans, source body imports, and scope limit bodies.
Named Proof Consumers
run is the first-wave fixture consumer. It proves the public trace-row, hook-shadow, route-lease, agent-principle-lens, egress, negative-case, and metadata-only result record boundary for the local fixture.
run_observability_bundle is the main exported-bundle consumer. It validates public route events, agent-path observations, session diagnostics, hook-shadow rows, actor-axis checks, debt rows, process-audit rows, observability policy, source-module manifest integrity, and result record-card reuse for the exported observability bundle.
The companion bundle consumers run_route_compliance_audit_bundle, run_session_attribution_bundle, run_harness_configuration_audit_bundle, run_multi_agent_fanin_bundle, run_bridge_dispatch_yield_resume_bundle, run_controller_heartbeat_bundle, run_agent_trace_route_repair_bundle, run_agent_observability_store_bundle, and run_computer_use_action_trace_bundle prove the same route-observability membrane across adjacent public route, session, bridge, controller, store, and computer-use evidence slices.
tests/test_agent_route_observability_runtime.py is the focused regression consumer. It asserts source-module manifest body-copy contracts, digest and line-count checks, sanitized-row handling, duplicate-key rejection, streaming loaders, required negative cases, public-relative redacted result records, bundle blocking on digest mismatch/non-public-state hits, and compact card omission of private scans.
tests/test_macro_projection_import_protocol.py::test_agent_execution_trace_body_import_is_unified_under_macro_projection_spine is the cross-module consumer that keeps the route-observability body import under the source projection import spine rather than a local-only copy story.
Prior Art Grounding
This component is grounded in distributed tracing and agent trajectory work. The W3C Trace Context recommendation and OpenTelemetry show the established observability pattern: propagate trace identity, collect events, and preserve enough context to debug a distributed transaction. Agent work such as ReAct also made the interleaved reasoning/action trajectory a first-class object for interpreting agent behavior.
Microcosm borrows the traceability shape for route feedback: selected route, route lease, trace id, work/event/evidence chain, validator ref, and scope limit are exposed together. It does not read live operator traces, model-output data, browser HUD state, or certify runtime behavior outside the public fixture.
Source-Backed Doctrine Packet
This module is source-backed only when a reader can move from the public doctrine claim to the runtime component, standard, source-module manifests, result records, and negative cases without guessing. The compact packet is:
This governs the public route-observability schema, body import posture, authority boundary public_route_observability_runtime_metadata_and_copied_macro_trace_bodies_not_live_session_provider_browser_hud_or_hook_authority, and scope boundary language.
These bind the authored Markdown projection to the accepted component, mechanism row, code locus, generated projection hooks, result record refs, guardrails, and focused regression command without treating this prose as source authority.
The runtime builds public fixture result records, exported bundle validators, observability cards, source-manifest checks, non-public-state scans, and typed negative-case results. The manifests bind the copied route/observability source body materials recorded by microcosm workingness::agent_route_observability_runtime.source_open_body_imports while preserving body_in_receipt=false.
Reader evidence starts at the JSON source record, then follows the accepted component and mechanism refs into the runtime source, fixture manifests, result record set, and focused regression. The route is intentionally source-backed but not source-authoritative: this Markdown helps a cold reader find the proof surfaces, while the bundle, registry, standard, mechanism row, runtime source, and result records remain the authority.
Observable First Artifact Contract
The first observable artifact must fit a single browser or terminal viewport and preserve this order:
Required cues:
Local action: show the exact command, normally microcosm hello <project> or microcosm tour --card <project>. A visual board cannot be the first proof if the producing command is hidden.
Selected route: show selected_route_id plus a short reason. Route explanation stays tied to the local project, not whole-system capability.
Work transaction: show work id, state, and result record ref when present. State changes are local system events, not source-file changes or external model service.
Event and evidence chain: show event ids, evidence class, proof surface, and scope boundary. Counts remain accounting fields, not progress or launch scores.
Authority boundary: place the scope limit beside the positive claim. The board rejects hosted launch, private-data equivalence, external model access, and whole-system correctness.
Structural scale bridge: name the larger system surface exercised by this run. Scale is a drilldown path, not an implied proof upgrade.
If a renderer cannot show all slots in one viewport, it should show the command, route, evidence class, result record ref, and scope limit first, then link to the full route model as drilldown.
Validation Result record Path
From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:
The focused regression file is the proof consumer for this result record section. Its 51 tests cover the Markdown source-backed packet, streaming JSONL loaders, duplicate-key rejection, source-module manifest contracts, digest and line count helpers, exported observability and companion bundles, public-relative redacted result records, field-floor ratchets, card reuse, private-scan blockers, exact or source body imports, and computer-use action-trace boundaries. The source-projection focused test keeps the agent execution trace body import under the shared source projection spine.
That result record path validates declared public metadata and writes bounded result records. It does not inspect live sessions, export transcript/provider bodies, read browser HUD state, expose account secrets or browser state, control accounts, send recipients, change source files, prove benchmark performance, authorize public sharing, claim hosted readiness, or include launch operations.
The negative-case floor is part of the doctrine, not an implementation detail. Keep these cases visible when strengthening the module: actor-axis mismatch, missing route lease, private transcript body, duplicate trace id, route-compliance overclaim, kernel bloat before direct action, reusable-lease metadata without trace feedback, route miss replacement, hook-shadow missing authority, hook-shadow banned-route intervention, command displacement, live-state read attempt, and hook-shadow budget overrun.
Validation Shape
Fixture validation should continue to require actor-axis boundaries, route-lease consumption, duplicate trace-id detection, hook-shadow advisory status, anti-pattern debt retirement, and behavior-change evidence gates. When an observable-first board or endpoint is present, validation should also prefer fields that prove the compact causal order:
command ref before visual state;
selected route before full route graph;
work/event/evidence refs before explanation prose;
evidence class and scope boundary beside any counter;
scope limit before hosted, launch, provider, or correctness language;
compact endpoint or board ref before raw JSON drilldown.
Scope boundary: this module does not inspect live operator traces, prompt/provider bodies, HUD/browser/operator UI state, live work log rows, model-output data, private source bodies, or runtime behavior. It only defines the public fixture and projection boundary for observable route evidence.
Scope boundary
Scope limit
The positive claim is limited to public recorded route-feedback and observability metadata fixtures. The runtime can validate fixture result records, copied body refs, route-lease consumption, trace attribution, hook-shadow advisory posture, debt retirement, and behavior-change evidence gates. It does not read live sessions, mutate routes or source, install hooks, authorize providers, or turn observability into launch-scope decision.
Exact claim_ceiling from core/organ_registry.json::implemented_organs[organ_id=agent_route_observability_runtime]:
validates only public recorded route-feedback and observability metadata fixtures, including route-lease consumption, trace attribution, hook-shadow advisory status, anti-pattern debt retirement, behavior-change evidence gates, and public source body import refs; does not read live operator/provider/browser UI/account state, mutate work log or source, install hooks, certify runtime behavior, authorize pattern assimilation, private-system equivalence, launch, public sharing, or whole-system correctness
The observatory can be made browser-first or video-friendly only by projecting the same compact causal board. It may animate route selection, highlight event edges, or show a result record reveal, but it must keep the command, result record/evidence ref, scope boundary, and scope limit visible before any decorative motion.
It must not expose live operator traces, model-output data, account or browser state, private source bodies, HUD/browser/cockpit internals, or hosted-product claims. It may point to public fixtures, exported public bundles, generated result records, and public-root card emitters.
Source and projection details
Governing Lattice Relation
The bundle binds this page to mechanism.agent_route_observability_runtime.validates_public_route_feedback, the agent_reliability_and_safety_validator_bundle concept, principles P-1 and P-2, axiom AX-1, and five dependency modules that supply route-plane, cold-reader, anti-pattern, pattern-binding, and source-import context. Within that lattice, the mechanism is an evidence membrane: route feedback becomes paper-visible only after trace rows, route leases, hook-shadow rows, anti-pattern debt, egress boundaries, source manifests, non-public-state scans, and result record fields agree.
The governing relation is deliberately narrower than live observability. A green run can show that a public fixture or exported bundle carries coherent route feedback and metadata-only proof surfaces; it cannot infer live session state, mutate a route, install hooks, authorize provider/browser UI access, promote candidate axioms, prove benchmark behavior, or approve launch.
Source-Open Body Floor
The source-open body floor is the copied public route/observability body import set plus its manifests and result records, not live operator/provider/browser state. The materials below must stay inspectable through source_module_manifest.json refs, source-module digests, bundle validators, and metadata-only result records.
Agent Benchmark Integrity Anti-Gaming ReplayThe agent benchmark integrity anti-gaming replay validates copied public source pattern provenance bodies and metadata-only benchmark replay rows before any score-like language is allowed.
Agent Benchmark Integrity Anti-Gaming Replay is the public benchmark-claim boundary for Microcosm. It checks locked evaluator ids and config hashes, declared benchmark case rosters, replay rows, file-access and contamination refs, trusted-reference score refs, output-replay refs, computed-vs-declared integrity verdicts, three copied source pattern provenance bodies, and eleven anti-gaming negative cases while keeping private issue, oracle patch, hidden-gold, provider, raw patch, and score payload bodies out of result records.
Scope limit Copied public source pattern provenance bodies and metadata-only synthetic benchmark-integrity replay result records only; no benchmark claims, SWE-bench performance claim, hidden-gold access, oracle patch body export, private issue body export, external model access, live repository mutation, launch-scope decision, publishing-scope decision, source-file changes, or product-progress evidence.
This module is the public Microcosm projection of the rule that agent benchmark claims must be replay-backed before they are score-backed. It carries copied source-open source pattern provenance bodies for the benchmark-integrity pattern row and reconstruction state, plus a metadata-only regression integrity component. It is not a benchmark runner or product-progress claim.
The fixture models a repository repair benchmark with public case ids, task and patch hashes, locked evaluator ids, evaluator config hashes, file-access log refs, contamination-check refs, trusted-reference score refs, output-replay refs, held-out guard ids, and body_in_receipt=false rows. It deliberately keeps issue bodies, oracle patch bodies, hidden-gold answers, model-output data, and live repository paths out of the public boundary.
The exported bundle includes source_module_manifest.json and source_artifacts/ copies of the source pattern provenance rows from state/microcosm_portfolio. The validator verifies those copied bodies by manifest digest and keeps body text out of result records.
Purpose
Agent benchmark numbers are easy to state and hard to trust. A single headline like "passes N percent of repository repair tasks" hides every decision that produced it: which evaluator ran, whether its configuration was frozen, whether the agent could see held-out answers, whether the test cases leaked into training, and whether one lucky attempt was promoted as the score. This component exists to answer one question before any of that language is allowed: can each claimed pass be replayed from public refs that name their evaluator, their configuration hash, and the evidence that the run was not gamed?
A positive result cannot be asserted. A replay row that simply declares integrity_pass is recomputed from scratch. The validator checks that the evaluator id is on a locked list, that the configuration hash is one the policy declared in advance, that file-access, contamination, and output-replay evidence artifacts exist and pass, and that the case id was registered up front. If any of those is missing or contradicted, the row is recomputed as quarantine regardless of what it declared. Declaring success is treated as the thing to be checked, not as the proof.
There is a further floor: an integrity_pass must be backed by a sanitised real command-run trace, not only by hand-written replay refs. Each row cites a real_benchmark_trace_ref that has to resolve to a copied artifact carrying a passing focused pytest run for this component, with sha256 digests bound to the recorded command-run id and an explicit list of omitted live material (model-output data, account secrets, private issue bodies, oracle patch bodies). The point is to stop a benchmark claim from resting on prose. The evidence has to trace back to a command that actually ran and is reproducible from public refs, while the private and live material that command touched stays out of the public boundary.
This is a discipline fixture, not a leaderboard. It proves that a metadata-only replay respected an anti-gaming boundary over public case ids and locked evaluator refs. It never reports a score, a SWE-bench result, or a capability claim, and the eleven negative cases below are there to demonstrate the boundary holding rather than to advertise a number.
Technical Mechanism
The component turns a benchmark claim into a replay-verification problem. Its inputs are the projection protocol, locked evaluator policy, benchmark case roster, replay observations, exported bundle manifest, source-module manifest, and copied source_artifacts/ rows. _build_result loads those inputs, validates source-module imports, scans public inputs and copied source bodies against the non-public-state forbidden-class policy, checks projection protocol density, validates the locked evaluator policy, validates the case roster, and then validates each replay row against the same public boundary.
A positive replay cannot pass by declaring success. The replay row must name a case id present in benchmark_cases.json, cite a locked evaluator id, carry an evaluator config hash allowed by locked_evaluator_policy.json, expose file-access, contamination-check, trusted-reference, and output-replay refs, and cite source-artifact evidence refs that match the exported source-module manifest targets. Each of those evidence refs must resolve to a metadata-only benchmark_integrity_evidence_artifact_v1 artifact bound to the same replay, case, evaluator, and config hash, with file-access marked passed, contamination flags clear, a trusted reference present without a claimed score, and an output replay that is not final-answer-only grading. The validator recomputes whether each row is integrity_pass or quarantine; missing refs, unregistered cases, unlocked or mutated evaluators, score authorization, private issue bodies, oracle patch bodies, hidden-gold access, model-output data, pass-k cherry-picking, and misleading tests force quarantine or a blocking finding.
A further gate is the real-trace floor. Every positive replay row also cites a real_benchmark_trace_ref, and that ref must resolve to a copied source-module artifact whose material_class is public_sanitized_real_benchmark_trace. The validator opens that artifact and checks that it records a completed, exit-zero command run of the focused pytest for this component, carries a passing pytest summary, binds sha256 digests for the command metadata, stdout, and stderr to a declared command-run id, cites state/command_runs/ source refs for that id, and declares the omission of model-output data, account secrets, private issue bodies, and oracle patch bodies. A replay whose real_benchmark_trace_ref is missing, unverified, or not also listed in the source-artifact evidence refs cannot stand as a pass. This is what stops a benchmark claim from resting on hand-authored refs alone: the integrity verdict has to trace back to a command that actually ran and is reproducible from public refs.
The copied body floor is verified separately from the public result record. The source-module manifest must declare copied_non_secret_macro_body material, public source pattern body classes, body_in_receipt=false, and digest-stable targets. validate_source_module_imports checks that each manifest row points to an existing copied artifact and that its recorded SHA-256 digest matches disk. Result records and command cards then omit the bodies and carry only ids, refs, digests, classes, counts, verdicts, findings, and scope limits.
The public trace is a second proof pass rather than a display copy of replay rows. build_public_benchmark_integrity_anti_gaming_trace recomputes each span from locked-evaluator status, contamination signals, file-access refs, contamination-check refs, trusted-reference refs, and declared quarantine reasons. The expected public fixture has three spans: two recompute as integrity_pass, one recomputes as quarantine, and the trace must agree with the declared replay verdicts before the component can return status=pass.
Named Proof Consumers
run consumes the first-wave fixture and writes the result, board, validation result record, sign-off result record, and metadata-only command card. It is the proof consumer for the canonical fixture boundary and required negative-case floor.
run-benchmark-integrity-bundle consumes the exported public bundle and proves that source-open body imports, bundle shape, manifest digests, and metadata-only result record/card rules survive outside the fixture directory.
tests/test_agent_benchmark_integrity_anti_gaming_replay.py is the focused regression consumer. It asserts negative-case observation, digest verification, source-artifact evidence refs, public trace verdict recomputation, positive/negative verdict handling, metadata-only result records, bundle runtime shape, and command-card reuse of a fresh result record.
A cold reader consumes this Markdown only after checking the JSON bundle, generated JSON instance, exported source manifest, case roster, replay observations, focused test path, and scope limit. The reader may verify the replay boundary but must not infer a benchmark claims, provider behavior, product-progress state, public sharing state, or launch-scope decision.
Shape
Source refs
Protocol
projection_protocol.json
Manifest
source_module_manifest.json
Diagram source
flowchart LR Bundle["JSON bundle authority"] --> Markdown["Reader projection"] Protocol["projection_protocol.json"] --> ProtocolGate["source refs and result record density"] Manifest["source_module_manifest.json"] --> DigestGate["material class and digest gate"] DigestGate --> Bodies["copied public source provenance bodies"] DigestGate --> RealTrace["sanitised real command-run trace passing pytest, sha256 digests, declared omissions"] Cases["3 public case ids"] --> ReplayGate["case roster and required replay refs"] Policy["locked evaluator policy"] --> EvaluatorGate["locked ids and config hashes"] Replays["3 replay observations"] --> ReplayGate EvaluatorGate --> ReplayGate ProtocolGate --> ReplayGate ReplayGate --> EvidenceGate["per-ref evidence artifacts file-access, contamination, trusted reference, output replay"] EvidenceGate --> Recompute["recompute integrity_pass or quarantine"] RealTrace --> Recompute Recompute --> Trace["public trace verdict recomputation"] Trace --> Verdicts["2 integrity_pass and 1 quarantine"] Negatives["11 anti-gaming fixtures"] --> Quarantine["quarantine or blocking finding"] Bodies --> PrivateScan["metadata-only non-public-state scan"] RealTrace --> PrivateScan Verdicts --> Result record["metadata-only integrity result record"] Quarantine --> Result record PrivateScan --> Result record Result record --> Ceiling["anti-score scope limit"]
The page shape is a bounded replay spine, not a benchmark leaderboard. A reader starts at the JSON bundle, follows the source-open manifest into three copied public source provenance bodies, then checks the public case roster, locked evaluator policy, replay observations, recomputed trace verdicts, and metadata-only result records. The output is an integrity-boundary verdict: two public case replays pass the boundary, one public case replay is quarantined, and no score or hidden-gold authority is created.
Reader Evidence Routing
Bundle route: read core/paper_module_capsules.json::paper_modules[3], then the generated JSON instance, before treating this Markdown as explanatory projection.
Bundle route: read examples/agent_benchmark_integrity_anti_gaming_replay/exported_benchmark_integrity_bundle/source_module_manifest.json for module_count=3, body_in_receipt=false, copied body refs, digest refs, and the explicit secret-exclusion boundary.
Case route: read benchmark_cases.json for repo_issue_public_001, repo_issue_public_002, and repo_issue_public_003; the rows expose ids, hashes, splits, and held-out guard ids, not issue bodies or oracle patches.
Replay route: read replay_observations.json for the locked evaluator ids, config hashes, file-access refs, contamination refs, trusted-reference refs, output-replay refs, and the two integrity_pass plus one quarantine verdict pattern.
Runtime route: run tests/test_agent_benchmark_integrity_anti_gaming_replay.py when the reader needs recomputation evidence. The focused tests assert source-module digest verification, public trace verdict recomputation, required negative cases, and metadata-only result record boundaries.
Public Mechanics
A replay cannot pass unless the evaluator id and config hash are locked.
A replay row cannot pass unless its case id appears in the declared benchmark_cases.json roster.
File-access logs, contamination checks, trusted references, and output replay refs are required before any benchmark-style language can be considered.
Train/test leakage, hidden-gold access, oracle patch bodies, model-output data, final-answer-only grading, pass-k cherry-picking, misleading tests, private issue bodies, unregistered case replays, and score overclaims are quarantine cases.
integrity_pass is evidence that a metadata-only regression replay respected the boundary, not evidence of a SWE-bench score, live agent capability, or product-spine system progress.
Result records expose ids, refs, verdicts, counts, negative cases, and scope limits only.
Source body imports expose source pattern provenance artifacts in the bundle, with result records limited to refs, digests, classes, and validation status.
Prior Art Grounding
This component is grounded in the long-running observation that optimized metrics can become targets and lose evidential force, plus the AI-safety literature on reward hacking and specification gaming. Concrete Problems in AI Safety frames reward hacking as a practical accident-risk problem, DeepMind's specification-gaming survey collects concrete examples of agents satisfying a proxy in the wrong way, and benchmark-contamination work such as Benchmarking Benchmark Leakage in Large Language Models motivates explicit leakage and benchmark-use documentation.
Microcosm borrows the anti-gaming accounting pattern: evaluator ids, config hashes, case rosters, file-access logs, contamination checks, trusted-reference refs, and replay refs must be present before benchmark-style language is allowed. It does not report or imply a model score.
Validation Result records
The focused proof consumer is tests/test_agent_benchmark_integrity_anti_gaming_replay.py. A passing result record has to show that the fixture and exported-bundle validators recompute benchmark-integrity replay from public case ids, locked evaluator ids, config hashes, file-access refs, contamination-check refs, trusted-reference refs, output-replay refs, source-module manifest digests, and negative-case rows rather than trusting declared benchmark language.
For the focused test, the result record boundary is the asserted shape: three public case ids, three replay rows, two recomputed integrity_pass rows, one quarantine row, three public trace spans, locked-evaluator and config-hash coverage, three copied source-module imports, nine source-artifact evidence refs, three verified source-artifact evidence refs, body_in_receipt=false, and negative cases for verdict mismatch, invalid declared verdict, evaluator config hash swaps, missing replay/source evidence, digest mismatches, manifest boundary violations, hidden-gold/oracle/provider/score overclaims, and unsafe command-card body reuse. For the corpus check, the result record only proves bundle/instance parity; it does not create benchmark claims, product-progress, provider, public sharing, or launch-scope decision.
Validation Result record Path
Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:
The focused regression test and corpus projection checks are:
cd microcosm-substrate && ../repo-pytest tests/test_agent_benchmark_integrity_anti_gaming_replay.py
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
Scope boundary
Scope limit
This module may claim only that the public fixture and exported bundle preserve a metadata-only benchmark-integrity replay boundary: public case ids, locked evaluator refs, config hashes, contamination refs, output-replay refs, manifest digests, negative cases, and scope limits are recomputed or checked.
It must not claim benchmark performance, SWE-bench score, provider capability, hidden-gold access, oracle patch access, private issue access, live repository mutation, publishing-scope decision, product-progress evidence, or launch-scope decision.
Scope boundary
This module does not claim benchmark performance, run providers, expose private issue or oracle patch bodies, access hidden-gold answers, mutate live repositories, publish results, host a benchmark, or include launch operations.
Source and projection details
Source-Open Body Floor
The standard treats the bundle source_module_manifest.json as the body-row authority for three copied source pattern provenance bodies: benchmark_integrity_extracted_pattern_ledger_row_body_import, benchmark_integrity_high_novelty_growth_receipt_body_import, and benchmark_integrity_deterministic_pattern_order_body_import.
Those rows stay in source_artifacts/; result records and workingness/status cards carry refs, digests, classes, counts, and scope limits only. The body floor is accepted as regression-negative fixture evidence, not as a benchmark claims, SWE-bench performance claim, hidden-gold export, provider authority, live repository mutation authority, product-progress evidence, public sharing, or launch-scope decision.
Governing Lattice Relation
The bundle binds this page to mechanism.agent_benchmark_integrity_anti_gaming_replay.validates_public_benchmark_integrity_replay, the agent_reliability_and_safety_validator_bundle concept, provisional principles P-1 and P-2, provisional axiom AX-1, and the paper_module.mission_transaction_work_spine dependency. Within that lattice, the mechanism is an evidence-before-score gate: benchmark-style language has no paper authority unless the source record, copied-source manifest, locked policy, case roster, replay observations, public trace, negative-case floor, and metadata-only result records agree.
The governing concept is accountability for validator bundles, not public leaderboard construction. The principle/axiom ceiling is enforced as a refusal surface: private issue bodies, hidden-gold answers, oracle patch bodies, model-output data, source-file changes, live repository mutation, publishing-scope decision, product-progress evidence, and launch-scope decision remain false even when the replay fixture passes.
Research Replication Rubric Artifact ReplayThe research replication rubric-artifact replay validates source-backed public replication bundles before any paper-replication language is allowed.
Research Replication Rubric Artifact Replay is the public research-replication claim boundary for Microcosm. It checks contribution decomposition refs, rubric trees, allowed public inputs, scratch repo scaffolds, experiment DAG refs, metric scripts, declared artifact-hash rosters, artifact hashes, grader reports, compute/runtime budgets, ablation diffs, failure taxonomies, cold-rerun result records, public agent-execution trace spans, four copied source modules, and eight replication-overclaim negative cases while keeping private paper/data bodies, hidden rubrics, model-output data, original-author code bodies, benchmark claims, and public-sharing claims out of result records.
Scope limit Copied public source pattern provenance bodies, exact-copy public Python internal control body, metadata-only research-replication replay result records, public agent-execution trace spans, and fixture validation only; no actual paper replication success, benchmark performance claim, private paper/data body export, hidden-rubric export, external model access, unbounded compute search, original-author code reuse, launch-scope decision, publishing-scope decision, source-file changes, or product-progress evidence.
research_replication_rubric_artifact_replay is a public Microcosm component that turns "an agent replicated a paper" into a replayable evidence contract. It does not rerun a real paper, use external model services, certify benchmark performance, or grant publishing-scope decision. It checks whether a public replay bundle exposes the objects a replication claim must cite before its authority can rise: contribution decomposition refs, rubric-tree refs, allowed input refs, scratch-scaffold refs, experiment-DAG refs, metric-script refs, declared artifact hashes, grader reports, runtime budgets, ablation diffs, failure taxonomies, cold-rerun refs, public execution-trace spans, and source-module digests.
The technical result is an R3 local artifact replay: one public metric script is executed over one allowed public input table, the produced output is compared with a declared output artifact, and the declared hash file is checked against that artifact. A successful run says the replay packet is structurally accountable, digest-bound, redaction-aware, and negative-case tested. It does not say that a real paper was independently replicated.
Purpose
The single question this component answers is narrow: before an agent is allowed to say it replicated a paper, can the claim be forced into a bundle that a cold runtime can check without trusting any prose? The interesting move is that the component refuses to treat "replicated" as one fact. It pulls the claim apart into the objects a real replication would have left behind, a contribution decomposition, a grading rubric tree, the allowed public inputs, an experiment DAG, metric scripts, declared artifact hashes, a grader report, a runtime budget, an ablation diff, a failure taxonomy, and a cold-rerun result record, and it asks for each one by name.
What keeps this from being a checklist linter is the small executable core. The exported bundle does not just assert that an artifact hash exists. The runtime reads one public metric script, runs it over one allowed public input table, produces an output, and then checks that output against both the declared output artifact and the declared hash file. A replay row can name all the right refs and still fail here if the numbers do not reproduce. The negative-case fixtures attack exactly the gap a plausible fake would exploit: report-only success, benchmark-performance language, final-answer-only grading, undeclared hashes, and reuse of the original author's code.
The deliberately modest part is the subject matter. The two paper bundles are public synthetic examples, and the metric is a single sum over a small table. The component's value is the boundary, not the science. It does not run a real paper, call a provider, search compute without bound, or grant any launch or publishing-scope decision. It only makes a replication claim accountable enough that an independent reader can see where the evidence stops.
Telos
Research-agent demos often collapse four objects into one sentence: the paper, the runnable artifact, the grading rubric, and the evidence that an independent rerun happened. This component keeps those objects separate. A replay is admissible only when it names each evidence object and when the local runtime can check the public artifact replay without touching private paper bodies, non-public data bodies, hidden rubrics, model-output data, original-author code bodies, or launch/publishing-scope decision.
The central bet is modest and technical: before any replication claim is made, the system can force the claim into a falsifiable bundle with declared hashes, bounded metric execution, metadata-only result records, and explicit scope boundaries.
Mechanism
The mechanism row is mechanism.research_replication_rubric_artifact_replay.validates_public_research_replication_replay. It runs in src/microcosm_core/organs/research_replication_rubric_artifact_replay.py and is backed by the functions run, run_replication_bundle, validate_source_module_imports, validate_projection_protocol, validate_replication_policy, validate_research_replays, _build_result, _freshness_basis, and the constants EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, SOURCE_MODULE_MANIFEST_REF, BUNDLE_RESULT_NAME, and CARD_SCHEMA_VERSION.
The runtime has two modes:
Fixture mode reads fixtures/first_wave/research_replication_rubric_artifact_replay/input, includes positive replay rows plus eight negative-case fixtures, and writes first-wave result, board, validation, and sign-off result records.
Exported-bundle mode reads examples/research_replication_rubric_artifact_replay/exported_research_replication_bundle, validates the public runtime example, checks the source-module manifest, and writes receipts/runtime_shell/demo_project/organs/research_replication_rubric_artifact_replay/exported_research_replication_bundle_validation_result.json.
The proof object is the tuple:
replication_policy.json, which states required replay fields, rubric axes, and forbidden claims.
research_replays.json, which supplies two synthetic paper bundles that cite public inputs, metrics, artifact hashes, grader reports, budgets, failures, and cold-rerun result records.
execution_artifacts/execution_artifact_manifest.json, which authorizes the replayable artifact relation.
source_module_manifest.json, which names copied source bodies and digest obligations.
Runtime result records, which expose refs, counts, digests, trace spans, and scope boundaries without embedding private bodies.
Metric-Script and Artifact Evidence
The exported bundle includes a small but real artifact-replay loop:
run_replication_bundle reads execution_artifacts/execution_artifact_manifest.json, executes the public_sum_metric over the allowed public input, compares the produced payload with execution_artifacts/artifacts/result_table.json, and verifies the declared hash in execution_artifacts/artifacts/result_table.sha256.json. The focused tests mutate each side of that relation, so the pass is not just a field-presence check.
The policy also requires eight rubric axes: contribution decomposition, artifact replay, experiment DAG, metric script, grader alignment, budget boundary, failure taxonomy, and cold rerun. A replay row can therefore pass only as a structured evidence packet, not as a final answer or narrative report.
The exported runtime result record currently records the following evidence floor: two synthetic paper bundles, two replay rows, two artifact replay rows, two cold-rerun refs, two public execution-trace spans, four copied source modules, no findings, no error codes, source-module status pass, and input_mode: exported_research_replication_bundle. The fixture result record records all eight negative cases as observed.
Failure Modes and Guardrails
The expected negative cases are:
original-author code reuse
hidden-rubric leakage
report-only success
benchmark-performance overclaim
private paper or data body leakage
unbounded compute search
final-answer-only grading
undeclared artifact hash refs
The tests also cover source-module digest mismatch, local bundle body tamper, rehashing a swapped source module, wrong execution-artifact hashes, wrong artifact refs with matching hashes, report-only exported replays, metric perturbation, replay metric-script ref tamper, input perturbation, output body tamper, baked output swaps, and self-consistent input/output/hash rewrites. These cases make the component stronger than a field-presence linter: it rejects common ways to produce plausible but unaccountable replication prose.
Test Matrix
The focused regression file tests/test_research_replication_rubric_artifact_replay.py carries the source proof for this module.
The fixture and exported bundle produce metadata-only result records, observe the required negative cases, execute the local metric replay, and build two public trace spans.
Result records remain public-relative and secret-excluded; command cards reuse fresh result records and reject stale ones after input mutation.
Realness Rungs
This module's realness is intentionally runged:
Synthetic replay subjects. The two paper bundles are public synthetic examples, one ML-method replay and one computational-science replay.
Real schema pressure. The required fields, rubric axes, declared hash roster, source-module manifest, and non-public-state exclusions are enforced by runtime code and focused tests.
Local artifact replay. The exported bundle executes a local metric over allowed public input and compares produced output against declared artifact hashes.
Source-open provenance. Three public source pattern bodies and one exact Python internal control body are copied into the bundle and digest-checked.
metadata-only public result records. Result records carry counts, refs, digests, verdicts, trace spans, and scope boundaries while excluding private/live/provider material.
The rung contract matters: the component is more than generic documentation polish, but it is still not paper-replication authority.
Relation to Concepts, Principles, and Axioms
The JSON bundle binds the module to concept.research_and_science_replay_evidence_bundle. That concept is instantiated by the mechanism above and abides by AX-1, AX-6, AX-8, and AX-12 at the concept layer. The bundle's direct axiom refs are AX-1, AX-2, AX-5, and AX-7.
The bundle's principle refs are P-1, P-2, P-3, P-6, P-8, and P-15. For this component, the important principle pressure is:
Evidence must be structured and replayable before authority rises.
Result records and scope boundaries are part of the artifact, not commentary after it.
Projections stay below source authority; a readable paper module does not outrank the JSON bundle, mechanism row, runtime code, source-module manifest, or result records.
Typed refusal is part of the mechanism: benchmark, provider, public sharing, private-body, original-code, and unbounded-compute claims remain false unless another authority surface actually grants them.
The module depends on paper_module.agent_benchmark_integrity_anti_gaming_replay. Benchmark performance overclaim controls stay routed through that sibling instead of being reinvented here.
Reader Evidence Routing
Open evidence in this order:
core/paper_module_capsules.json#paper_module.research_replication_rubric_artifact_replay for the source-authority bundle, scope limit, doctrine refs, generated projection statuses, and code loci.
core/mechanism_sources.json#mechanism.research_replication_rubric_artifact_replay.validates_public_research_replication_replay for the validator command, exported-bundle validator command, focused regression, guardrails, input refs, result record refs, and upstream mechanisms.
standards/std_microcosm_research_replication_rubric_artifact_replay.json for the first-wave standard, public/private boundary, source-body floor, and hard launch/public sharing/provider/source-file changes flags.
examples/research_replication_rubric_artifact_replay/exported_research_replication_bundle/source_module_manifest.json for source-open body-floor counts and digest obligations.
receipts/runtime_shell/demo_project/organs/research_replication_rubric_artifact_replay/exported_research_replication_bundle_validation_result.json for the current exported-bundle validation result.
tests/test_research_replication_rubric_artifact_replay.py for negative cases, digest tamper tests, metric replay tests, public-relative result record tests, command-card economy, and source-body exclusion.
Prior Art Grounding
This replay scores a research artifact against a replication rubric. It follows artifact-evaluation practice from systems and machine-learning venues (ACM Artifact Review and Badging), which separates 'available' from 'functional' from 'reproduced'. Microcosm borrows the rubric-over-artifact shape; the result is fixture-bound replay evidence, not a reproducibility guarantee or a peer-review verdict.
The runtime commands behind the result records are:
Scope boundary
Limitations
The two replay subjects are synthetic public paper bundles, not real external paper replications.
The metric replay is intentionally small: one public metric spec over one public input table with one declared output artifact. Its value is boundary enforcement, not benchmark substance.
Source-open proof is limited to three public source pattern body slices and one exact-copy public Python internal control body. It does not expose private source-root bodies, source notes, model-output data, account or browser state, browser UI state, or original-author code bodies.
A green run does not establish research truth, paper novelty, formal-result correctness, benchmark performance, external model service, launch-scope decision, or publishing-scope decision.
Authority Boundary
This component validates synthetic public replay metadata, local public artifact replay, source-module digest boundaries, public trace spans, negative-case coverage, and metadata-only result record shape. It does not claim actual paper replication success, benchmark performance, external model service, hidden-rubric access, original-author-code reuse, private paper/data export, unbounded compute search, final-answer-only grading, launch-scope decision, publishing-scope decision, source-file changes, product progress, or whole-system correctness.
Scope limit
This module may claim fixture-bound evidence that the component ran over public synthetic inputs and produced the result records and projections described above, reproduced by the validation result records named on this page.
It may not claim more than its bundle scope limit allows: Copied public source pattern provenance bodies, exact-copy public Python internal control body, metadata-only research-replication replay result records, public agent-execution trace spans, and fixture validation only; no actual paper replication success, benchmark performance claim, private paper/data body export, hidden-rubric export, external model access, unbounded compute search, original-author code reuse, launch-scope decision, publishing-scope decision, source-file changes, or product-progress evidence.
Source and projection details
Source-Open Body Floor
The source-module manifest at examples/research_replication_rubric_artifact_replay/exported_research_replication_bundle/source_module_manifest.json is the source-open body floor. It declares four copied modules:
research_replication_extracted_pattern_ledger_row_body_import, a public source pattern body slice.
research_replication_high_novelty_growth_receipt_body_import, a public source reconstruction result record slice.
research_replication_deterministic_pattern_order_body_import, a public deterministic pattern-order slice.
research_replication_replay_control_plane_source_body_import, an exact-copy public Python internal control body for this component.
Each row carries a source ref, target ref, material class, copied-body flag, result record-body exclusion flag, line count or byte count, and sha256 digest. The runtime verifies target digests; for the exact-copy Python row it also checks source currentness and source-target byte equality. Result records expose refs, counts, digests, and verdicts only. They do not embed source bodies.
Agentic Vulnerability Discovery Patch-Proof ReplayThe agentic vulnerability discovery patch-proof replay validates metadata-only synthetic vulnerability evidence chains before any found-and-fixed security language is allowed.
Agentic Vulnerability Discovery Patch-Proof Replay is the public security-claim boundary for Microcosm. It checks projection protocol, vulnerability policy, synthetic target refs, issue hypotheses, trace evidence, abstract exploitability refs, patch diff refs, regression tests, verifier result records, sandbox verdicts, false-positive triage, cold replay, public agent-execution trace spans, secret-exclusion scan, nine copied source/control/standard/tool bodies, source-module manifest digests, metadata-only result record policy, and eight security-overclaim negative cases while keeping live targets, real CVE exploitation, weaponized payloads, account secrets, network exfiltration steps, actionable exploit instructions, model-output data, raw issue or patch bodies, benchmark claims, and source-file changes out of result records.
Scope limit Copied public source/control/standard/tool bodies, metadata-only synthetic patch-proof replay result records, public agent-execution trace spans, and fixture validation only; no live target testing, real CVE exploitation, weaponized payload export, account secret handling, network exfiltration, actionable exploit instructions, external model access, source-file changes, benchmark security score, launch-scope decision, publishing-scope decision, whole-system security claim, or product-progress evidence.
This module documents the source-available claim contract for agentic_vulnerability_discovery_patch_proof_replay. It turns an agentic vulnerability-discovery claim into a public trace-backed local replay: synthetic metadata-only targets, issue hypotheses, trace evidence, abstract exploitability refs, patch diffs, regression tests, verifier result records, sandbox policy verdicts, false-positive triage, cold replay, negative cases, and scope limits.
Purpose
An agent that says it found and fixed a security bug is making a claim that is easy to assert and hard to check. The phrase "found and fixed" can stand for a real, tested repair, or for a plausible-looking patch that was never run, a false positive promoted to a finding, or a benchmark number with no evidence behind it. This component exists to refuse that ambiguity. It answers one question: before any "found and fixed" language is allowed, does a complete evidence chain line up, from a synthetic target through a hypothesis, a trace, an abstract exploitability ref, a patch diff, a regression test, and a verifier result record?
The part worth noticing is that two of those checks are not field checks. They recompute the thing the fixture is claiming. Each executable regression witness names one of three small, public mini-targets, a webhook redirect allowlist, a notebook log redactor, and a scheduler path normaliser. The validator runs that function twice, once in its unpatched form and once patched, and compares the results it computes against the expected_pre_patch and expected_post_patch values the fixture declared. A witness whose declared output does not match the computed output is rejected. In the same spirit, each verifier result record has its pass or false_positive verdict recomputed from the joined proof, patch, test, and witness evidence; the row's own label and result record filename are not taken on trust. The failure mode this guards against is a fixture that asserts a green result without the work behind it ever having run.
This is a synthetic, metadata-only replay, not live security work. The synthetic overclaim fixtures, live targets, real CVE exploitation, weaponised payloads, exploit steps, patch-without-test claims, benchmark claims, are regression boundaries the runtime must reject, not capabilities it offers. The useful claim is narrow and is stated plainly below: Microcosm can hold an agentic security story to a checked evidence chain before it admits patch-proof language.
Shape
Diagram source
flowchart TD bundle["JSON bundle authority"] markdown["Markdown reader projection"] mechanism["mechanism source row"] component["patch-proof replay runtime"] fixture["first-wave fixture"] bundle["exported patch-proof bundle"] targets["synthetic target refs"] hypotheses["issue hypotheses"] traces["trace evidence refs"] proofs["abstract exploitability refs"] patches["patch diff refs"] regressions["regression test refs"] executable["executable regression witnesses"] verifiers["verifier result records"] sandbox["sandbox verdicts"] negative["negative-case fixtures"] secret_scan["secret-exclusion scan"] replay["cold replay rows"] public_trace["public trace spans"] source_modules["source-module body floor"] result records["metadata-only result records"] consumer["focused proof-consumer tests"] ceiling["scope limit"] bundle --> markdown bundle --> mechanism mechanism --> component component --> fixture component --> bundle fixture --> targets bundle --> targets targets --> hypotheses hypotheses --> traces traces --> proofs proofs --> patches patches --> regressions regressions --> executable executable --> verifiers verifiers --> sandbox negative --> result records secret_scan --> result records sandbox --> replay replay --> public_trace source_modules --> secret_scan source_modules --> public_trace public_trace --> result records result records --> consumer result records --> ceiling
The module shape is a metadata-only synthetic patch-proof replay, not a live vulnerability discovery or fix-correctness claim. The runtime forces target refs, hypotheses, trace refs, abstract exploitability refs, patch diff refs, regression test refs, verifier result records, sandbox verdicts, false-positive triage, cold replay, public trace spans, source-module digests, negative cases, and scope boundaries to line up before bounded patch-proof language is admitted.
Technical Mechanism
The mechanism is an evidence join, not a scanner. The JSON bundle names the component and mechanism row, and the component resolves every claim through _build_result in src/microcosm_core/organs/agentic_vulnerability_discovery_patch_proof_replay.py. That function loads the projection protocol and vulnerability policy, then validates targets, issue hypotheses, trace evidence, exploitability refs, patch diffs, regression tests, executable regression witnesses, verifier result records, sandbox verdicts, false-positive triage, cold replay rows, optional negative-case fixtures, the public trace builder, and the source-module manifest. A result can pass only when those validators agree, the secret-exclusion scan has zero blocking hits, the public trace status is pass, all positive validators are pass, and the exported bundle's manifest digests match copied source bodies.
Two of those validators do work the others do not. The executable regression witness check runs each declared mini-target function in both its unpatched and patched form and compares the computed pre/post outputs against the values the fixture declared, so a witness cannot pass on a label alone. The verifier result record check recomputes each pass or false_positive verdict from the joined hypothesis, proof, patch, test, and witness evidence, and also requires the result record-ref filename to match that recomputed verdict, so a row cannot claim a result its own evidence does not support. The other validators are stricter joins: every hypothesis must resolve to a synthetic target, every patch-required hypothesis must carry both an abstract exploitability ref and a metadata-only patch diff, and every patch must pair with a regression test that fails before the patch and passes after it. A patch without a paired test, or a false positive promoted to a finding, blocks the result.
The runtime deliberately keeps two evidence modes separate. The first-wave fixture includes the negative-case authority, so it must observe the expected overclaim failures such as live target material, real CVE exploitation, weaponized payload export, exploit steps, patch-without-test claims, and benchmark claims claims. The exported bundle is the public runtime example, so its expected_negative_cases can be empty while it still proves the body floor, public trace, digest checks, regression witnesses, and scope limit. Both modes write metadata-only result records; copied bodies stay behind the source_module_manifest.json refs and hashes.
Named Proof Consumers
tests/test_agentic_vulnerability_discovery_patch_proof_replay.py::test_agentic_vulnerability_patch_proof_replay_observes_negative_cases consumes the first-wave fixture and checks the expected counts, negative-case coverage, public trace status, body-import boundary, secret-exclusion scan, and scope limit booleans.
tests/test_agentic_vulnerability_discovery_patch_proof_replay.py::test_agentic_vulnerability_exported_bundle_validates_runtime_shape consumes the exported bundle and checks runtime mode, target/hypothesis/patch counts, executable regression witnesses, source-module manifest status, copied-body count, metadata-only import summary, secret-exclusion status, and public trace span count.
The rejection tests in the same file are the scope limit in executable form: they mutate false-positive promotion, remove regression tests, tamper executable witnesses, omit exploitability proof, cross-wire verifier result records, and alter source-module digests, then require blocked results and specific error codes instead of allowing patch-proof language.
What It Admits
The validator admits only metadata-only patch-proof evidence where trace refs, abstract proof refs, patch diff refs, regression tests, verifier result records, sandbox verdicts, and cold replay line up.
The result record fields to inspect first are target_count, issue_hypothesis_count, patch_diff_count, regression_test_count, verifier_receipt_count, observed_negative_cases, secret_exclusion_scan, public_agent_execution_trace, body_import_verification, and authority_ceiling.
Prior Art Grounding
This component is grounded in the recent line of agentic software-engineering and security-evaluation work that treats code repair as an executable, test-backed claim rather than a prose claim. SWE-bench popularized repository issue resolution as an LLM task with real codebases and test-based patch evaluation, while SWE-agent made the agent-computer interface itself part of the repair system. Security benchmarks such as CyberSecEval 2 and SecCodePLT motivate separating secure-code or vulnerability capability claims from uninspected generated patches.
Microcosm borrows the accountability pattern: issue hypotheses, trace evidence, patch diffs, regression tests, verifier result records, and negative cases must line up before patch-proof language is allowed. It does not import live targets, CVE exploitation authority, weaponized payloads, or benchmark performance claims.
Evidence class: core/organ_evidence_classes.json::agentic_vulnerability_discovery_patch_proof_replay records algorithmic_projection at rank 3.
Source-module manifest: examples/agentic_vulnerability_discovery_patch_proof_replay/exported_patch_proof_bundle/source_module_manifest.json declares nine copied source/control/standard/tool bodies, including strict_json_source_body_import.
Runtime result record: receipts/runtime_shell/demo_project/organs/agentic_vulnerability_discovery_patch_proof_replay/exported_patch_proof_bundle_validation_result.json
Sign-off result records: receipts/first_wave/agentic_vulnerability_discovery_patch_proof_replay/* and result records/sign-off/first_wave/agentic_vulnerability_discovery_patch_proof_replay_fixture_acceptance.json
Reader Evidence Routing
Bundle route: core/paper_module_capsules.json::paper_modules[5:paper_module.agentic_vulnerability_discovery_patch_proof_replay] is the JSON authority row. A diagram view is generated for this module; the Atlas card view is a staged exercise pending the component-atlas lane.
Mechanism route: core/mechanism_sources.json::mechanism.agentic_vulnerability_discovery_patch_proof_replay.validates_public_agentic_vulnerability_patch_proof_replay binds the validator command, exported-bundle validator command, focused regression, guardrails, input refs, result record refs, and runtime code locus.
Exported-bundle route: examples/agentic_vulnerability_discovery_patch_proof_replay/exported_patch_proof_bundle is the public runtime bundle for the synthetic patch-proof replay. Open source_module_manifest.json before trusting copied-body counts, then inspect the runtime validation result record.
Focused-test route: tests/test_agentic_vulnerability_discovery_patch_proof_replay.py verifies negative cases, public-relative metadata-only result records, exported-bundle runtime shape, exact copied source modules, digest mismatch rejection, command-card result record reuse, and public trace construction.
Cold-Agent Use
Open the source-module manifest first, then the runtime result record, then the component source. The useful claim is not that a real vulnerability was discovered or fixed.
The useful claim is that Microcosm can force an agentic security story to expose synthetic target refs, issue hypotheses, trace evidence, abstract exploitability refs, patch diffs, regression tests, verifier result records, sandbox verdicts, false-positive triage, cold replay, public trace spans, secret-exclusion scan, negative-case result records, and scope limits before patch-proof language is allowed.
Re-entry condition: after the sibling organ_atlas.json lane releases, bind this paper-module bundle, mechanism ref, and code locus into the atlas row and rerun python -m microcosm_core.doctrine_lattice --check.
Negative Cases
The contract rejects live_target_material, real_cve_exploitation, weaponized_payload_export, account_secret_material, network_exfiltration, exploit_instruction_steps, patch_without_tests, and benchmark_score_claim. These are falsification fixtures, not product evidence.
Validation Result record Path
Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:
The result records do not authorize live target testing, real CVE exploitation, weaponized payload export, account secret handling, network exfiltration, actionable exploit instructions, external model access, source-file changes, benchmark security scores, launch, or any whole-system security claim.
Scope limit
This module may claim public fixture evidence that synthetic target refs, issue hypotheses, trace-evidence refs, abstract exploitability refs, patch diff refs, regression-test refs, verifier result records, sandbox verdicts, false-positive triage rows, cold replay rows, public trace spans, source-module digest checks, secret-exclusion scans, negative-case labels, and metadata-only validation result records are checked by the listed runtime witnesses.
This module may not claim live target testing, real CVE exploitation, weaponized payload export, account secret handling, network exfiltration, actionable exploit instructions, live provider behavior, benchmark security scores, patch correctness on real repositories, source-file changes, publishing-scope decision, launch-scope decision, product-progress evidence, or whole-system security.
Source and projection details
Governing Lattice Relation
The governing row is mechanism.agentic_vulnerability_discovery_patch_proof_replay.validates_public_agentic_vulnerability_patch_proof_replay. It binds this reader module to concept.agent_reliability_and_safety_validator_bundle, P-1, P-2, AX-1, and the upstream paper_module.mission_transaction_work_spine dependency. The relation matters because the mechanism is a public safety validator bundle: the paper module can claim that Microcosm checks a source-open, synthetic patch-proof evidence chain, but the lattice ceiling prevents that claim from becoming live vulnerability discovery, exploit proof, benchmark claims, source-file changes, or launch-scope decision.
Source-Open Body Floor
The exported bundle carries nine exact copied source/control/standard/tool bodies under examples/agentic_vulnerability_discovery_patch_proof_replay/exported_patch_proof_bundle/source_modules/. The body floor is governed by source_module_manifest.json, which records digest-verified copies of:
the source pattern ledger
the high-novelty reconstruction result record
the component projection IR
the agent-execution trace runtime and standard
the extracted-pattern route-readiness standard
the mission-transaction preflight wrapper
the mission-transaction landing preflight runtime
the strict JSON helper
Result records and cards do not duplicate those bodies. They carry source_module_manifest_ref, source_open_body_import_refs, source_open_body_imports, body_material_status, and body_copied_material_count so a cold reader can open the real bodies.
The public result record surface stays free of account secrets, account or browser state, browser state, model-output data bodies, browser UI live access, recipient-send state, weaponized payloads, live targets, exploit steps, and account secret-equivalent material.
Materials Chemistry Closed-Loop Lab-Safety ReplayThe materials chemistry lab-safety replay validates metadata-only simulator-only closed-loop rows before any materials-lab or discovery language is allowed.
Materials Chemistry Closed-Loop Lab-Safety Replay is the public lab-safety claim boundary for Microcosm. It checks candidate material refs, safety-screen refs, simulator-only assay refs, active-learning decisions, failure taxonomy refs, cold replay refs, source bundle hashes, Lab/Evolve replay graph evidence, copied source/control/result record/standard bodies, metadata-only result record policy, and eight lab-safety overclaim negative cases while keeping wetlab protocols, hazardous synthesis steps, reagent quantities, controlled or bioactive targets, live lab account secrets, robot commands, private lab notebook bodies, live assay data, discovery claims, benchmark claims, model-output data, source notes, and launch-scope decision out of result records.
Scope limit Copied public Lab/Evolve source/control/result record/standard bodies, metadata-only simulator-only fixture result records, runtime bundle result records, and artifact safety/refusal validation only; no wetlab execution, hazardous synthesis guidance, reagent quantity, controlled or bioactive target, live assay, robot command, private lab notebook, external model access, discovery claim, benchmark claims, launch-scope decision, publishing-scope decision, or product-progress evidence.
"Closed-loop materials lab" is one of the easier phrases to overclaim. A fixture can look like an autonomous discovery loop while carrying nothing that should be spoken aloud: wetlab steps, reagent quantities, a controlled or bioactive target, robot commands, or a flat assertion that some material was discovered. This component exists to sit in front of that language and answer one question: is a closed-loop-lab-shaped fixture safe and grounded enough to be talked about at all, in a simulator-only frame, before any lab claim is allowed?
Its real name inside the runtime is the materials_chemistry_artifact_safety_refusal_validator. The public-promise name "closed-loop replay" was deliberately reframed because nothing here executes a wetlab loop or commands a robot. The unusual part is that the component does not trust the fixture's own conclusion. A normal replay would read a declared "selected candidate" label and report it. This validator instead recomputes the winner from public numbers, weighting an assay proxy, an active-learning score, and a safety gate, then treats a mismatch between that recomputed pick and the declared label as a failure rather than a footnote. A stale or flattering label cannot pass.
The second discipline is refusal as a first-class result. Eight categories of dangerous or overclaiming content each have a named forbidden code, and a fixture that smuggles one in is expected to be refused, not quietly accepted. The verdict is computed from public simulator rows, safety fields, source-module manifests, replay-graph status, negative-case coverage, and a sentinel scan, and it stays inside a simulator-only ceiling. It is a safety and refusal check, not a laboratory.
Abstract
materials_chemistry_closed_loop_lab_safety_replay is a public, simulator-only replay validator for materials-lab language. It does not claim a material discovery, a wetlab protocol, a robot loop, or a benchmark. It checks whether a closed-loop-lab shaped public fixture has enough evidence to be talked about at all: candidate material refs, safety-screen refs, simulator-only assay rows, active-learning decisions, a Lab/Evolve replay graph, source-module manifest digests, negative-case refusals, metadata-only result records, and an explicit scope limit.
The technical claim is a numeric verdict proof boundary. A passing run must recompute the selected candidate from score-backed fixture rows rather than trusting a declared label. The baseline fixture contains four candidates and selects mat_polymer_membrane_001 with score 0.917; perturbation tests prove that stale labels, missing score rows, out-of-range scores, and safety-gate failures block the verdict.
Mechanism
The runtime locus is src/microcosm_core/organs/materials_chemistry_closed_loop_lab_safety_replay.py. The relevant entrypoints are run for first-wave fixture validation and run_lab_bundle for exported-bundle validation. The validator loads a replay policy, candidate rows, experiment DAG rows, simulator assays, active-learning decisions, optional source-module manifests, and eight forbidden negative-case fixtures.
The sign-off rule is deliberately small:
Positive rows must link candidates, experiments, assays, safety screens, active-learning decisions, failure taxonomy refs, and cold replay refs.
Negative cases must be observed and refused.
Numeric replay must recompute the selected candidate from public numbers.
Source-module imports must verify copied bodies without putting bodies into result records.
The safety verdict must remain inside the simulator-only scope limit.
Source refs
numeric policy + expected label
replay_policy.json
4 candidate refs + safety gates
candidate_materials.json
4 public assay proxy values
simulator_assays.json
4 active-learning scores
active_learning_decisions.json
4 copied public body modules
source_module_manifest.json
Accepted
public_safe_simulator_replay_accepted
Blocked
blocked_public_safety_boundary
Diagram source
flowchart TD policy["replay_policy.json numeric policy + expected label"] candidates["candidate_materials.json 4 candidate refs + safety gates"] assays["simulator_assays.json 4 public assay proxy values"] decisions["active_learning_decisions.json 4 active-learning scores"] numeric["numeric replay weighted recompute of the winner"] labelcheck{"recomputed pick == declared label? safety gate >= 0.70?"} negatives["negative-case fixtures 8 forbidden lab classes"] refuse{"any forbidden MATERIALS_*_FORBIDDEN observed?"} manifest["source_module_manifest.json 4 copied public body modules"] replay["Lab/Evolve replay graph replay cases"] verdict["safety verdict"] accepted["public_safe_simulator_replay_accepted"] blocked["blocked_public_safety_boundary"] result record["metadata-only result records counts, digests, findings"] ceiling["scope limit no wetlab / no discovery / no launch"] policy --> numeric candidates --> numeric assays --> numeric decisions --> numeric numeric --> labelcheck labelcheck -->|stale label or gate fail| blocked labelcheck -->|match| verdict negatives --> refuse refuse -->|yes| blocked refuse -->|no| verdict manifest --> replay replay --> verdict verdict --> accepted accepted --> result record blocked --> result record result record --> ceiling
The focused regression test_materials_chemistry_numeric_replay_recomputes_verdict_from_fixture_numbers proves the pass case: status pass, verified_numeric_row_count == 4, selected candidate mat_polymer_membrane_001, selected decision decision_membrane_001, selected next action simulate_assay, score 0.917, realness rung R3, and verdict basis recomputed_from_public_assay_active_learning_and_safety_gate_fixture_numbers.
The verifier does not use expected labels for selection. Expected labels are checked only after the selected row is recomputed from candidate, assay, and decision content.
Test Matrix
Class
Evidence
Expected verdict
Real-good fixture
Baseline first-wave fixture with four candidate, assay, and decision rows
Exported bundle manifest with four copied modules and zero manifest findings
source_module_manifest_status: pass; verified_module_count: 4; result records remain metadata-only; current checked-in bundle still needs refreshed numeric rows before it is a full exported-bundle pass
Perturbation, moved pick without expectation update
Exported bundle recomputes sorbent as the winner while policy still expects membrane
Source manifest stays pass, but numeric replay blocks with MATERIALS_NUMERIC_REPLAY_EXPECTED_LABEL_STALE
These cases are source/test-backed by tests/test_materials_chemistry_closed_loop_lab_safety_replay.py. Fresh local first-wave result record output is the authority for current numeric replay; older archived first-wave result records and the checked-in exported bundle predate the numeric replay rows and should not be read as the numeric proof. The exported bundle still needs refreshed numeric rows before it is a full exported-bundle pass.
This replay exercises a closed-loop materials and chemistry lab controller with a safety gate over synthetic experiments. It is grounded in the self-driving laboratory literature, where a propose-run-measure loop is paired with safety interlocks that can refuse an unsafe experiment. Microcosm borrows the loop-plus-safety-gate shape on a simulator; the result is metadata-only simulator evidence, not a real laboratory controller, chemical-safety authority, or launch.
Validation Result record Path
Run the current runtime proof from the Microcosm root:
Inspect the exported source-body bundle. Until the exported fixture is refreshed with score-backed numeric rows, this command may return a blocked numeric verdict while still proving the manifest/body-floor boundary:
cd microcosm-substrate
PYTHONPATH=src ../repo-pytest tests/test_materials_chemistry_closed_loop_lab_safety_replay.py -q
cd microcosm-substrate
PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
This lane intentionally does not run scripts/build_doctrine_projection.py --write; generated projections, atlas cards, and shared bundle surfaces belong to their owner lanes.
Scope boundary
Limitations
This module is a replay validator, not a laboratory. It does not synthesize materials, provide wetlab instructions, control robots, rank real compounds, validate live assay data, authorize external model access, or establish a discovery benchmark. Fixture numbers are public replay coordinates for a safety-gated contract; they are not experimental measurements.
The validator can prove local consistency across fixture rows, exported source-module manifests, replay graph records, negative-case checks, sentinel scans, numeric recomputation, and metadata-only result records. It cannot prove chemical safety, regulatory suitability, lab readiness, deployment readiness, public-site freshness, publishing-scope decision, or launch-scope decision.
Scope limit
This module may claim that Microcosm has a public, source-faithful, simulator-only replay contract that checks candidate refs, safety-screen refs, simulator-only assay rows, active-learning decisions, numeric replay, failure-taxonomy refs, cold replay refs, replay cases, source bundle hashes, copied source-module digests, negative-case result records, metadata-only result record policy, and scope limits.
It must not claim wetlab operation, material synthesis, robot control, hazardous synthesis guidance, reagent quantities, controlled or bioactive targeting, live assay data, private lab notebook export, live account secrets, external model service, material discovery, benchmark performance, safety certification, public sharing, hosting, launch-scope decision, source-file changes, or product-progress authority.
Scope limit
This module may claim fixture-bound evidence that the component ran over public synthetic inputs and produced the result records and projections described above, reproduced by the validation result records named on this page.
It may not claim more than its bundle scope limit allows: Copied public Lab/Evolve source/control/result record/standard bodies, metadata-only simulator-only fixture result records, runtime bundle result records, and artifact safety/refusal validation only; no wetlab execution, hazardous synthesis guidance, reagent quantity, controlled or bioactive target, live assay, robot command, private lab notebook, external model access, discovery claim, benchmark claims, launch-scope decision, publishing-scope decision, or product-progress evidence.
Source and projection details
Source-Open Body Floor
The exported bundle at examples/materials_chemistry_closed_loop_lab_safety_replay/exported_materials_lab_safety_bundle contains a source_module_manifest.json with four copied bodies:
deterministic replay graph construction, failure classification, restart-point selection, source-bundle hashing, and result record boundaries
materials_lab_evolve_replay_graph_body_import
public_macro_control_plane_body
replay graph body, restart points, source bundles, global teachings, and public claim boundary
materials_lab_evolve_receipt_body_import
public_macro_receipt_body
replay result record body proving the source evidence shape without moving private material into result records
laboratory_standard_body_import
public_standard_body
public laboratory standard floor for the replay
The bundle validator checks module_count: 4, verified_module_count: 4, source_module_manifest_status: pass, metadata-only result record policy, and zero source module findings. The current checked-in exported bundle is still a source-body floor, not the final numeric exported-bundle proof: run_lab_bundle requires refreshed score-backed numeric rows before it can pass as a full exported-bundle verdict. Focused tests inject those rows to prove the exported-bundle numeric path. The remaining bundle and result record refresh is tracked as outstanding work.
The validator also records the blocked source-open boundary for codex/doctrine/paper_modules/lab_oracle_evolve_pipeline.md: that source paper module cannot be imported as an exact body while raw operator-anchor language remains in scope.
Certificate Kernel Execution LabThe certificate kernel execution lab validates bounded public Lean/Lake certificate-kernel rows before any proof-adjacent claim is allowed.
Certificate Kernel Execution Lab is the public proof-adjacent execution boundary for Microcosm. It checks a Lean/Lake certificate-kernel fixture, generated certificate rows, analyzer metadata, transition traces, typed CP2 action translations, bounded Evolve reruns, source-module manifest digests, copied Lean/tool/profile bodies, metadata-only result record policy, and four negative cases while keeping proof bodies, raw tactic scripts, model-output data, oracle ideal answers, oracle-needed premise ids, private source paths, stdout/stderr bodies, account secrets, private Erdos #257 proof bodies, launch-scope decision, benchmark solve-rate, and general theorem-proof authority out of result records.
Scope limit Public Lean/Lake subprocess witness, copied source proof/tool/profile bodies, first-wave fixture result records, and exported bundle result records only; no general formal-result correctness, private proof body export, external model access, oracle authority, source-file changes, benchmark solve-rate, launch-scope decision, publishing-scope decision, or whole-system proof claim.
certificate_kernel_execution_lab is a source-available public runtime refactor of the source certificate-kernel pattern. It runs a small Lean/Lake certificate kernel, generated certificate rows, analyzer metadata, CP2 typed-action reruns, and bounded Evolve policy reruns without importing private proof bodies. The exported bundle also carries copied source body modules from the real Erdos #257 certificate-kernel system: Lean kernel files, generated certificates, the strike runner, toolchain files, and Lean profile result records. The v2 fixture carries both a simple NatSumCertificate row family and a miniature BoundedOrderCertificate family so the public lab is no longer only a single-shape arithmetic result record.
Purpose
This component exists to stop a proof-adjacent claim from resting on prose. The single question it answers is narrow: did a small Lean kernel actually compile and accept the declared certificate rows, here and now, with the command, the return code, and the source hashes on record? Everything else in the page is accounting that keeps the answer honest.
The reduction it relies on is the interesting part. A large class of proof-adjacent facts can be expressed as a finite certificate plus a decidable Boolean checker shaped like validate : Cert -> Bool. The agent is never asked to write a human proof. It is asked to supply the right certificate rows, and Lean decides. The fixture carries two checker families, NatSumCertificate over arithmetic and BoundedOrderCertificate over a bounded modular order, so the sign-off is not a single hard-coded shape. A row counts as accepted only when the runner shells out to lake env lean over a temporary copy of the public project and receives exit code 0.
What is unusual is the weight placed on rejection. Deliberately wrong rows, a missing certificate, a bad arithmetic certificate, a bad bounded-order certificate, must fail through the same real Lean route, in the residual class the fixture predicted. A bundle that can show only green sign-off is treated as a replay artifact, not as certificate-kernel evidence. The runner also keeps the proof channel separate from the language model channel: a transition that can see oracle structured source record or provider hypothesis text is rejected before execution, so a model's confidence can never be quietly counted as a proof. The result record records command identity, counts, and verdicts, and never the proof bodies themselves.
Shape
Diagram source
flowchart TD bundle["JSON bundle authority"] markdown["Markdown reader projection"] mechanism["mechanism source row"] component["certificate-kernel runtime"] fixture["first-wave Lean fixture"] bundle["exported certificate bundle"] manifest["certificate manifest"] lake["Lean/Lake subprocess"] analyzer["Lean analyzer metadata"] transitions["transition trace rows"] cp2["CP2 typed-action reruns"] evolve["bounded Evolve reruns"] source_modules["source-module body floor"] readout["public readout"] result records["metadata-only result records"] ceiling["scope limit"] bundle --> markdown bundle --> mechanism mechanism --> component component --> fixture component --> bundle fixture --> manifest bundle --> manifest manifest --> lake lake --> analyzer lake --> transitions transitions --> cp2 cp2 --> evolve source_modules --> analyzer analyzer --> readout evolve --> result records readout --> result records result records --> ceiling
The module shape is a bounded public certificate-kernel execution witness, not general theorem authority. This page points at the mechanism and runtime component; the runtime validates Lean/Lake command identity, source hashes, generated certificate rows, analyzer metadata, transition traces, CP2 typed-action reruns, bounded Evolve reruns, source-module manifest digests, negative cases, public readout, metadata-only result records, and an scope limit.
Mechanism
The mechanism is a finite-certificate execution reducer. The public entrypoints run and run_certificate_bundle both call _build_result, which loads the certificate lab packet, certificate manifest, Lean project, optional negative fixtures, and optional exported-bundle source manifest before any claim is recorded. The fixture path may run Lean/Lake in a temporary public workspace; the exported-bundle path validates the standalone runtime contract and copied body floor without rerunning private source machinery.
The reducer first establishes source and result record boundaries. _input_paths enumerates the public Lean files and JSON inputs, then scan_paths checks them against core/private_state_forbidden_classes.json. _source_module_manifest_result verifies the exported bundle's nine copied source bodies by material class, target presence, required anchors, and SHA-256 equality; _source_open_body_import_summary turns that manifest into the body floor that result records can cite without carrying proof bodies.
Execution evidence is split into three layers. _build_lake_project runs lake build MicrocosmCertificateLab for the fixture path, while _analyze_lean_project records public Lean imports, declarations, line counts, and hashes with body_in_receipt: false. _execute_transitions then sets certificate transition rows through Lean: accepted rows must return zero, missing or bad certificate rows must fail in the expected residual class, and CP2/Evolve rows must rerun within allowed action and artifact classes instead of mutating arbitrary source.
The negative cases are part of the proof consumer, not examples around it. EXPECTED_NEGATIVE_CASES requires rejection of provider/oracle-visible transition rows, CP2 proof-body leakage, Evolve source-file changes, and non-public source refs in the manifest. The focused regression test tests/test_certificate_kernel_execution_lab.py exercises those refusals, digest mismatch handling, cached command-card economy, public readout generation, and the counters that keep oracle/provider/proof-body/source-file changes at zero.
AUTHORITY_CEILING and RECEIPT_TRANSPARENCY_CONTRACT bind the mechanism back to the lattice relation. The module can claim bounded public fixture and bundle evidence over Lean/Lake command identity, certificate rows, analyzer metadata, transition outcomes, CP2/Evolve reruns, source manifest digests, and metadata-only result records. It cannot claim general theorem authority, provider proof authority, benchmark solve rate, private-body equivalence, source-file changes, launch, or whole-system correctness.
Public Surfaces
Component runner: python -m microcosm_core.organs.certificate_kernel_execution_lab run --input fixtures/first_wave/certificate_kernel_execution_lab/input --out receipts/first_wave/certificate_kernel_execution_lab
This component is grounded in proof-carrying and proof-assistant traditions. Necula's Proof-Carrying Code anchors the idea that an untrusted producer can supply a certificate checked by a small trusted verifier. The Lean theorem prover continues the small-kernel proof-assistant lineage, and LeanDojo shows why reproducible Lean environments, premise access, and programmatic proof-state interaction matter for theorem-proving agents.
Microcosm borrows the certificate-kernel discipline: certificate rows, Lean/Lake command identity, return codes, source hashes, transition traces, negative rows, and metadata-only result records must be visible before proof-adjacent language is allowed. It does not claim general theorem proof authority.
Research Bet
This component is the certificate-kernel bet in runnable form: a large class of proof-adjacent facts can be reduced to a finite certificate plus a decidable Boolean checker. The public lab keeps the agent task narrow. It does not ask the agent to synthesize a human proof; it asks for the right certificate rows, then lets Lean/Lake decide whether the checker accepts them.
The toy path uses a Lean certificate kernel shaped like validate : Cert -> Bool and accepts only when Lean can compile and run the declared check. The source-body import path carries the real Erdos #257 source floor: Lean kernel files, generated certificate shards, toolchain files, and profile result records from the Mathlib formalization family. The result record may say "accepted" only when the public runner shells out to Lean/Lake and receives exit code 0 for the declared bundle.
The negative floor is part of the proof, not decoration. Deliberately wrong certificate rows must be rejected by the real Lean route, including arithmetic and bounded-order failures. A bundle that cannot show genuine rejection cases is only a replay artifact, not certificate-kernel evidence.
Evidence class: core/organ_evidence_classes.json::certificate_kernel_execution_lab records external_subprocess_witness at rank 4.
Source-module manifest: examples/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle/source_module_manifest.json declares nine copied Lean/tool/profile body modules.
Runtime result record: receipts/runtime_shell/demo_project/organs/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle_validation_result.json
Sign-off result records: receipts/first_wave/certificate_kernel_execution_lab/* and result records/sign-off/first_wave/certificate_kernel_execution_lab_fixture_acceptance.json
Reader Evidence Routing
Bundle route: core/paper_module_capsules.json::paper_modules[7:paper_module.certificate_kernel_execution_lab] is the JSON authority row. A diagram view is generated for this module; the Atlas card for this module is staged and will appear once the component-atlas lane completes its binding pass.
Mechanism route: core/mechanism_sources.json::mechanism.certificate_kernel_execution_lab.validates_public_certificate_kernel_execution binds the validator command, exported-bundle validator command, focused regression, guardrails, input refs, result record refs, and runtime code locus.
Exported-bundle route: examples/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle is the public runtime bundle. Open source_module_manifest.json before using copied-body counts, then inspect the runtime validation result record and public readout.
Focused-test route: tests/test_certificate_kernel_execution_lab.py verifies Lean/Lake execution, analyzer output, transition batching, CP2/Evolve counters, public structured bundle shape, digest mismatch rejection, exact copied source modules, cached command-card economy, transparent metadata-only result records, and the cold-reader public readout.
Cold-Agent Use
Open the source-module manifest first, then the runtime result record, then the component source. The useful claim is not that Microcosm proved the Erdos #257 theorem, solved a benchmark, imported private proof bodies, or gained provider/oracle authority. The useful claim is that Microcosm can force a proof-adjacent story to expose Lean/Lake command identity, return codes, source hashes, declaration counts, certificate rows, transition traces, typed CP2 actions, bounded Evolve reruns, source-module body refs, negative-case result records, and authority counters before certificate-kernel language is allowed.
Re-entry condition: after the sibling organ_atlas.json lane releases, bind this paper-module bundle, mechanism ref, and code locus into the atlas row and rerun python -m microcosm_core.doctrine_lattice --check.
Validation Result record Path
Run the first-wave fixture into disposable result records from the Microcosm root:
Run the exported bundle through the same component:
cd microcosm-substrate
../repo-pytest tests/test_certificate_kernel_execution_lab.py -q
cd ..
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
Scope boundary
Authority Boundary
The lab proves only that the declared public Lean fixture compiled and that the declared transition rows were accepted, rejected, or left residual under the local verifier. The copied source body modules are public source-open body material, but result records cite them only by manifest row, hash, class, count, and required anchor. It does not expose proof text through result records, count oracle/provider output as proof authority, change source files, claim benchmark solve-rate, or include launch operations.
Result record Shape
Result records are public evidence. The lab exposes structured theorem/declaration names, Lean/Lake command identity, return codes, hashes, declaration counts, accepted/residual counts, negative-case ids, CP2 action classes, Evolve policy artifact ids, source-module manifest status, copied body-material counts, authority counters, scope limit, and scope boundary. It omits only proof, provider, oracle-answer, private-source, and stdout/stderr payload bodies, and records that omission through secret_exclusion_scan and body_in_receipt: false rather than treating absence as product evidence.
Lean/Lake build result record for MicrocosmCertificateLab.
Analyzer metadata for public Lean files: imports, declarations, hashes, and line counts with proof bodies omitted from JSON result records.
Transition rows for valid certificates, missing certificate rows, bad generated certificate rows, and bounded order-certificate rows.
CP2 typed-action translations over missing-certificate residuals, with Lean reruns proving downstream effect.
Bounded Evolve mutations over certificate row selection policy, accepted only after reruns and no leakage regression.
Source-open body import rows for the real source certificate-kernel body floor: exact copied targets under source_modules/ai_workflow, source/target hashes, material classes, and provenance anchors, with result record body text forbidden.
Scope boundary
This is a source-available certificate-kernel laboratory with copied source body material, not a private source dump and not general proof authority beyond the declared fixture rows and source-module body refs.
Scope limit
This paper module can claim a certificate-kernel laboratory backed by a structured doctrine row, with a diagram view generated from that row. The Atlas card for this module is staged pending the component-atlas lane's binding pass; that is honest coordination state, not a content gap.
It cannot claim formal-result correctness, benchmark solve rate, private proof body export, provider or oracle authority, source-file changes, publishing-scope decision, launch-scope decision, or whole-system proof authority. The Atlas card must be completed by the owning component-atlas/bundle route and builder regeneration, not by hand-editing Markdown.
Limitations
This module is a bounded public execution witness, not a theorem-proving authority. Its evidence depends on the shipped public Lean/Lake fixture, generated certificate rows, analyzer metadata, CP2 typed-action reruns, bounded Evolve reruns, and copied source-module manifest. A green run proves that this certificate-kernel bundle follows those constraints; it does not establish the Erdos #257 theorem, Mathlib coverage, benchmark solve rate, or correctness of private source proof bodies.
The source-open body floor is intentionally narrow. The exported bundle carries nine copied Lean/tool/profile bodies under source_modules/, and the result records may cite only refs, hashes, material classes, counts, required anchors, and verdicts. Proof bodies, raw tactic scripts, model-output data, oracle answers, private source paths, stdout/stderr bodies, account secrets, and private source-root material remain outside the public result record surface.
The focused regression covers the declared fixture and exported bundle shape. It checks Lean/Lake execution boundaries, analyzer output, transition batching, CP2/Evolve counters, digest mismatch rejection, exact copied source modules, cached command-card economy, transparent metadata-only result records, and public readout shape. It excludes future certificate families, generated Atlas/site public sharing, source-file changes, or public launch without the owning builder and launch lanes.
Source and projection details
Governing Lattice Relation
The bundle places this module under concept.formal_math_and_proof_witness_bundle: proof-adjacent public claims must be reduced to explicit witness artifacts before a reader is allowed to treat them as evidence. In this module, the witness artifacts are the public Lean/Lake subprocess result, generated certificate rows, analyzer metadata, transition traces, CP2/Evolve rerun evidence, copied source-module manifest, and metadata-only result records. Markdown explains that lattice; it does not replace the JSON bundle or the validator result records.
P-3 is the governing principle edge for the module's claim discipline. The runtime does not ask whether a proof story is persuasive; it requires a finite certificate family, a named verifier route, visible command identity, explicit return codes, public-relative refs, and result record transparency. That is why the mechanism row binds run, run_certificate_bundle, _source_module_manifest_result, _source_open_body_import_summary, _build_result, _receipt_freshness, and build_public_readout as the code locus instead of treating the paper module as independent proof evidence.
AX-2 is the hard boundary: public proof language must remain inside the declared certificate-kernel execution evidence. The standard's scope limit keeps formal_proof_authority limited to bounded public fixture rows and keeps external model access, oracle success, source-file changes, private-system equivalence, launch-scope decision, runtime correctness, and whole-system correctness false.
The dependency on paper_module.verifier_lab_execution_spine tells a reader how to interpret the lab. The certificate kernel is one proof-adjacent execution cell inside the verifier-lab spine: it can show accepted/residual transition rows and rerun effects, but it cannot promote those rows into launch, public sharing, benchmark, or theorem-authority claims without the sibling verifier and launch lanes.
Corpus Readiness Mathlib Absence GateThe corpus readiness Mathlib absence gate validates copied corpus/toolchain readiness bodies before any Mathlib-dependent proof or retrieval claim is allowed.
Corpus Readiness Mathlib Absence Gate is the public formal-math corpus readiness boundary for Microcosm. It checks copied PROVER smoke-run corpus readiness rows, Lean/Std toolchain probe rows, Mathlib absence status, consumer-gate decisions, absent corpus blocking, source-module manifest digests, metadata-only result record policy, and five negative cases while keeping proof bodies, model-output data, non-public source refs, benchmark-completeness claims, launch-scope decision, and Mathlib proof authority out of result records.
Scope limit Public algorithmic projection over copied corpus/toolchain readiness system, first-wave fixture result records, and exported bundle result records only; no Lean/Lake rerun, Mathlib availability claim, formal-result correctness, proof body export, external model access, benchmark/corpus completeness, launch-scope decision, publishing-scope decision, or whole-system proof claim.
corpus_readiness_mathlib_absence_gate is the public formal-math corpus readiness boundary for Microcosm. It carries copied corpus/toolchain rows from the 2026-05-11 proof-state curriculum smoke run and forces Mathlib absence, absent-corpus blocking, consumer gate decisions, and source-module digest coupling to be visible before any downstream retrieval, tactic-routing, or proof-witness language is allowed.
Purpose
Formal-math agents fail in a specific way: they treat "there is a corpus" as if it meant "this corpus is usable for the proof route I am about to take". A roster lists miniF2F, PutnamBench, ProofNet, LeanDojo and Mathlib, the agent assumes the libraries are present, and the failure only surfaces later as a broken import or a tactic that needs a premise the host cannot resolve. This component answers one question before that happens: for each corpus, is it actually present on this host, and is the Mathlib import lane actually available, or not?
The unusual part is that the gate does not take the answer on trust. It runs a bounded Lean/Lake import probe in a temporary directory: one small file that imports Std and is expected to compile, and one that imports Mathlib and is expected to be rejected with the toolchain's own unknown module prefix 'Mathlib' error. A corpus is only marked usable for Mathlib-dependent work when the runtime evidence agrees the corpus exists, carries a Lake file, and the Mathlib lane probe passes. In the current system the Mathlib probe stays false, so every Mathlib-dependent consumer is blocked, and the one consumer that passes is the Lean3 translation smoke, which needs no Mathlib project at all.
This closes the most common way a readiness claim drifts. Stale alias fields such as mathlib_available, or a PASS lean status, cannot turn the gate green on their own; they must agree with the live probe or the row is flagged. The probe is deliberately narrow. It checks that imports resolve and that Mathlib is genuinely absent. It does not run a lake build, prove any theorem, or claim Mathlib is installed. The output is a readiness board and a set of blocked consumer verdicts, bounded evidence.
Shape
Source refs
lake env lean: Std compiles, Mathlib import rejected
runtime_lean_import_probe
check SHA-256 digests, parse probe JSON
validate_runtime_source_artifacts
7 corpus rows, alias fields must agree with probe
validate_corpus_readiness
derive verdicts from readiness facts
validate_consumer_gate_cases
4 copied source artifacts, digest match
validate_source_module_imports
Diagram source
flowchart TD fixture["Fixture or exported bundle input corpus readiness rows + consumer gate cases"] probe["runtime_lean_import_probe lake env lean: Std compiles, Mathlib import rejected"] artifacts["validate_runtime_source_artifacts check SHA-256 digests, parse probe JSON"] mathlib{"Mathlib lane available? corpus exists + Lake file + probe passes"} corpus["validate_corpus_readiness 7 corpus rows, alias fields must agree with probe"] gates["validate_consumer_gate_cases derive verdicts from readiness facts"] imports["validate_source_module_imports 4 copied source artifacts, digest match"] allowed["Allowed: Lean3 translation smoke (needs no Mathlib project)"] blocked["Blocked: Mathlib-dependent and absent-corpus consumers"] result records["metadata-only result records result, board, validation, sign-off, bundle"] ceiling["Scope limit no Mathlib availability, proof, provider, launch"] fixture --> artifacts artifacts --> probe probe --> mathlib mathlib -->|no, probe false| corpus corpus --> gates gates --> allowed gates --> blocked fixture --> imports corpus --> result records gates --> result records imports --> result records result records --> ceiling
This reader diagram is intentionally smaller than the generated doctrine-lattice graph.
Mechanism
The mechanism is a readiness reducer, not a theorem-proving backend. The runtime entrypoints run and run_projection_bundle both call _build_result, which loads public fixture or exported-bundle inputs, scans those inputs against the non-public-state exclusion policy, verifies copied source artifacts, and then combines corpus readiness, consumer gate, source-module import, negative-case, and scope limit fields into one metadata-only result.
validate_runtime_source_artifacts anchors the reducer to four source refs: the corpus readiness rows, tactic-affordance probe, Mathlib import probe Lean file, and tactic portfolio availability JSON. It checks expected SHA-256 digests, parses the JSON source artifacts, and runs a bounded Lean/Lake import probe that can show Std imports and Mathlib remains absent without running a Lake build or exporting Lean bodies.
validate_corpus_readiness normalizes seven corpus rows against those runtime source artifacts. A corpus is usable for Mathlib-dependent work only when the runtime evidence says the corpus exists, has a Lake file, and mathlib_lake_project_import_available is true. In the current fixture and bundle evidence that field remains false, so Mathlib-dependent capabilities are blocked, absent corpora are recorded, and stale alias fields such as mathlib_available cannot turn the gate green.
validate_consumer_gate_cases then derives consumer verdicts from the normalized readiness facts instead of trusting expected-decision labels. The translation smoke consumer can pass because it does not require a Mathlib Lake project and names an available Lean3 reference corpus; Mathlib-dependent or absent-corpus consumers stay blocked. validate_source_module_imports adds the exported bundle floor by requiring the manifest class copied_non_secret_macro_body, material classes, target/source digest agreement, and no body material in result records.
The proof consumers are the two component commands, the focused regression test tests/test_corpus_readiness_mathlib_absence_gate.py, the paper-module corpus check, and the command-card surfaces emitted by result_card. Together they exercise the success path, contradictory Mathlib claims, consumer-gate skips, source digest tampering, private-path rewrites, runtime-probe blocks, and result record body exclusion. The resulting evidence relates the bundle's two mechanisms to concept.formal_math_and_proof_witness_bundle, P-8, and AX-7 by making readiness visibility a precondition for downstream formal-math claims while keeping the scope limit below theorem, provider, benchmark, or launch-scope decision.
Public Surfaces
Component runner: python -m microcosm_core.organs.corpus_readiness_mathlib_absence_gate run --input fixtures/first_wave/corpus_readiness_mathlib_absence_gate/input --out receipts/first_wave/corpus_readiness_mathlib_absence_gate
Runtime result record: receipts/runtime_shell/demo_project/organs/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle_validation_result.json
Reader Evidence Routing
Read this module in five passes:
Start with the source record at core/paper_module_capsules.json::paper_modules[8:paper_module.corpus_readiness_mathlib_absence_gate]. It is the source authority that names source_authority: json_capsule, the component subject, two mechanism subjects, the resolved runtime code locus, the concept concept.formal_math_and_proof_witness_bundle, the dependency paper_module.tactic_portfolio_availability, P-8, and AX-7.
The reader proof is the current row shape: eight generated relationship edges, Mermaid available_from_capsule_edges, Atlas blocked_until_organ_atlas_owner_lane_binds_edges, and no unpopulated paper-module selective dependency residual for the tactic-portfolio edge. The structured source record is wiring evidence, not theorem-correctness, runtime-correctness, launch, provider, or production authority.
Inspect the runtime locus src/microcosm_core/organs/corpus_readiness_mathlib_absence_gate.py. The load-bearing symbols are run, run_projection_bundle, validate_corpus_readiness, validate_consumer_gate_cases, validate_source_module_imports, _build_result, write_receipts, result_card, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, SOURCE_MODULE_MANIFEST_NAME, BUNDLE_RESULT_NAME, and CARD_SCHEMA_VERSION.
For fixture evidence, use fixtures/first_wave/corpus_readiness_mathlib_absence_gate/input and the result records under receipts/first_wave/corpus_readiness_mathlib_absence_gate/ plus result records/sign-off/first_wave/corpus_readiness_mathlib_absence_gate_fixture_acceptance.json. The first-wave result result record records seven corpus rows, seven consumer cases, one allowed Lean3 translation-smoke case, six blocked absent or Mathlib-dependent cases, mathlib_lake_project_import_available: false, body_in_receipt: false, and the five negative cases mathlib_available_without_probe, consumer_skips_readiness_gate, private_corpus_source_ref, proof_body_leakage, and release_overclaim.
For exported-bundle evidence, use examples/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle/source_module_manifest.json and receipts/runtime_shell/demo_project/organs/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle_validation_result.json. The manifest verifies four copied source artifacts: corpus readiness JSON, tactic-affordance probe JSON, the Mathlib import probe Lean file, and tactic portfolio availability JSON. The exported result record records source_module_import_count: 4, copied_source_artifact_count: 4, source_modules_pass: true, body_in_receipt: false, and three blocked absent or Mathlib-dependent bundle consumer cases.
If a reader needs validation result records rather than prose, run the commands in ## Validation Result record Path, including the focused regression test and paper-module corpus check. Treat every result record as corpus-readiness boundary evidence only; it does not create Lean/Lake execution authority, Mathlib availability, theorem-proof authority, provider authority, private-system equivalence, or launch-scope decision.
Prior Art Grounding
This component is grounded in Lean corpus and neural theorem-proving work where library availability, premise access, and benchmark splits are part of the claim. The Lean mathematical library establishes Mathlib as a large community-maintained formal mathematics corpus, miniF2F gives a cross-system benchmark for formal Olympiad statements, and LeanDojo shows why reproducible corpus extraction and accessible-premise metadata matter for theorem-proving agents.
Microcosm borrows the readiness gate: corpus rows, Mathlib availability probes, blocked consumer cases, source-module digests, and negative leakage guards must be visible before retrieval, tactic-routing, or proof-witness language is allowed. It does not claim Mathlib is present or that any theorem was proved.
Research Bet
Formal-math agents fail when they treat "there is a corpus" as equivalent to "this corpus is usable for this proof route." This component makes that boundary runnable. It records seven corpus rows, blocks six absent or Mathlib-dependent consumer cases, allows only the Lean3 translation-smoke case, and keeps the Mathlib probe false until an actual passing probe is present.
The exported bundle carries four copied body artifacts: corpus readiness JSON, tactic-affordance probe JSON, the Mathlib import probe Lean file, and tactic portfolio availability JSON. Two rows are exact copies and two use a verified private-path rewrite. The result record records the manifest status, counts, material classes, digests, and metadata-only policy; the copied bodies stay under source_artifacts/, not inside result records.
Sign-off result records: receipts/first_wave/corpus_readiness_mathlib_absence_gate/* and result records/sign-off/first_wave/corpus_readiness_mathlib_absence_gate_fixture_acceptance.json
Cold-Agent Use
Open the source-module manifest first, then the runtime bundle result record, then the first-wave result result record. The useful claim is not that Microcosm has Mathlib or can prove downstream theorems. The useful claim is that Microcosm can force a formal-math route to expose corpus availability, Mathlib absence, consumer gating, source-module digest evidence, copied-body boundaries, negative-case result records, and an explicit scope boundary before any proof route is treated as usable.
Re-entry condition: the current atlas row already points at this paper module. After the sibling organ_atlas.json lane releases, bind this bundle's mechanism ref and code locus into the atlas row and rerun python -m microcosm_core.doctrine_lattice --check.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the public corpus-readiness board, result result record, and validation result record. The bundle command validates the exported source-module manifest and metadata-only runtime result record. The corpus check and jq structured source record query prove the bundle-derived projection currentness without hand-editing generated JSON. The focused test keeps the Mathlib absence boundary, consumer gate cases, source-module digest checks, non-public paths rewrite policy, and scope boundary behavior from regressing.
Passing these commands does not establish Mathlib is installed, rerun Lean/Lake, validate downstream formal-result correctness, benchmark a corpus, authorize external model access, or approve launch; it only proves the bounded fixture and exported bundle result records preserve the declared readiness boundary.
Scope boundary
Scope limit
This component is algorithmic projection over copied source system, not a Lean/Lake rerun and not Mathlib proof authority. Its strongest public claim is that a fixture and exported bundle agree about corpus readiness, Mathlib absence, blocked consumers, copied source-module digests, metadata-only result records, and negative leakage guards. It does not establish formal-result correctness, claim Mathlib is available, benchmark a corpus, expose proof/provider/private bodies, call a provider, change source files, or include launch operations.
Scope limit
The JSON bundle proves a public corpus-readiness boundary only: copied corpus/toolchain rows, absent-Mathlib blocking, consumer gate decisions, source-module digest coupling, metadata-only result records, and negative leakage guards. Mermaid availability reflects bundle edges, while the Atlas row still waits on the component-atlas owner lane. This module does not establish Mathlib is installed, rerun Lean or Lake, validate formal-result correctness, benchmark corpus quality, authorize retrieval or tactic routing, use external model services, expose private proof bodies, change source records, or approve launch.
Result record Shape
The first-wave result result record records corpus_count: 7, consumer_case_count: 7, allowed_case_ids, blocked_case_ids, absent_corpus_ids, mathlib_lake_project_import_available: false, body_in_receipt: false, the scope limit, and five observed negative cases:
mathlib_available_without_probe
consumer_skips_readiness_gate
private_corpus_source_ref
proof_body_leakage
release_overclaim
The exported runtime result record records source_module_import_count: 4, copied_source_artifact_count: 4, source_modules_pass: true, and the same metadata-only result record boundary.
Scope boundary
This is a source-backed corpus readiness boundary with copied source corpus/toolchain material, not Lean/Lake execution, Mathlib availability, theorem-proof authority, corpus benchmark authority, provider authority, or launch-scope decision.
Pattern Binding ContractThe public pattern-binding component validates pattern rows, source bundles, authority handles, exported system bundles, and route-readiness selector overlays while keeping mined rows component-first and fixture-bound.
Pattern Binding Contract is the public system membrane for mined pattern rows. It validates binding fields, source bundles, reference bundles, authority-chain handles, secret-exclusion scans, exported system bundles, and route-readiness selector overlays, then writes bounded result records that keep private bodies out and prevent individual pattern rows from becoming standalone public leaves.
Scope limit Public pattern-binding fixtures, exported system bundles, route-readiness selector bundles, source-bundle refs, and metadata-only result records only; no private pattern-ledger certification, launch-scope decision, external model access, hosted-public posture, recipient work, private-data equivalence, source-file changes, or whole-system correctness.
pattern_binding_contract is the public root component that binds pattern rows to source-available source bundles, public runtime refs, authority-chain handles, scope boundaries, and secret-exclusion result records. Synthetic rows are allowed only as regression controls or negative cases; they are not product evidence.
Purpose
A mined engineering pattern is a tempting thing to publish on its own. It reads like a self-contained insight, so it is easy to lift a single row out of a private ledger and present it as a finished public claim. This component exists to stop that. It answers one question: can a given pattern row be admitted to the public surface, and if so, under exactly what evidence and what ceiling?
The check is binding rather than display. Every pattern row must name a source bundle that points at a real public runtime ref or regression-harness ref, a governing standard, and an scope boundary. A row that lacks any of these, duplicates another row's id, or claims to be a standalone public leaf is rejected. The same validator runs deliberate negative cases alongside the positive control, so the result record proves not only that good rows pass but that each known failure mode is still caught.
The less obvious idea is truth accounting. When an exported bundle is validated, the component separates rows that merely describe runtime metadata from rows that represent a real pattern-ledger import, and records that a high accepted-row count is not the same as system progress. This guards against the quiet inflation where counting accepted rows starts to read like a measure of how much real work has landed. The route-readiness layer closes the matching gap on the selector side: a row can look selectable in isolation, but it is only admitted through the component that owns it, its fixture contract, and a gate that refuses to let hard no-standalone rows appear as selectable targets.
Public Contract
The validator checks required binding fields, duplicate pattern conflicts, unsupported authority-chain handles, unresolved reference bundles, secret/provider/operator body sentinels, and public-leaf overclaim failures. It emits command-owned result records under receipts/first_wave/pattern_binding_contract/.
The exported system bundle also carries the source route-readiness selector overlays as public source-open bodies: examples/pattern_binding_contract/exported_route_readiness_bundle/. The validator recomputes the selector contract against the imported pattern ledger, route-readiness audit, row-to-component router, route cards, fixture specs, decision matrix, dependency DAG, internal routing graph, and copied source validation report. This closes the old gap where a mined pattern row could look selectable without opening the component bundle that owns it.
Cold readers should use microcosm pattern-route-readiness validate-bundle against examples/pattern_binding_contract/exported_route_readiness_bundle/ when the question is selector admission rather than generic pattern binding. The older pattern-binding validate-route-readiness-bundle action remains a compatibility route to the same validator.
flowchart LR subgraph Inputs["Pattern-binding inputs"] Patterns["Pattern rows id, governing standard, scope boundary, source refs, projection posture"] Bundles["Source bundles metadata-only refs to public runtime or regression harness"] Handles["Authority-chain handles resolver result records"] end Validator["pattern_binding_contract required fields, duplicate ids, bundle resolution, secret-exclusion scan"] subgraph Negative["Refusal floor"] Dup["Duplicate id rejected"] Leak["Private body leak rejected"] Overclaim["Public-leaf overclaim rejected"] Unsupported["Unsupported authority handle not upgraded"] end subgraph Bundle["Exported-bundle path"] Truth["Truth accounting runtime-metadata rows vs real pattern-ledger import"] RouteReadiness["Route-readiness selector component-first admission, fixture contract, hard no-standalone gate"] end Result records["Result records refs, digests, counts, verdicts; body text omitted"] Patterns --> Validator Bundles --> Validator Handles --> Validator Validator --> Negative Validator --> Bundle Truth --> RouteReadiness Negative --> Result records Bundle --> Result records
Evidence Binding
Accepted component row: core/organ_registry.json::implemented_organs[pattern_binding_contract]. Evidence class: core/organ_evidence_classes.json::organ_evidence_classes[pattern_binding_contract] with rank 5 semantic-validator authority. The runtime locus is src/microcosm_core/organs/pattern_binding_contract.py, with focused coverage in tests/test_pattern_binding_contract.py.
Paper bundle authority: core/paper_module_capsules.json#paper_module.pattern_binding_contract. Mechanism source: core/mechanism_sources.json#mechanism.pattern_binding_contract.validates_public_pattern_bindings.
Reader Evidence Routing
Read this module as a public binding membrane for pattern rows, not as a private pattern-ledger certificate or a standalone public-leaf selector. Start with paper_modules/pattern_binding_contract.json for the bundle payload, then open standards/std_microcosm_pattern_binding_contract.json to check the required fields, public/private boundary, source-open body import floor, route-readiness rules, and result record expectations.
Use core/fixture_manifests/pattern_binding_contract.fixture_manifest.json before inspecting fixtures or exported bundles. The manifest and the source_module_manifest.json files name the copied source body floor; result record payloads should carry source refs, digests, anchors, counts, verdicts, and omission result records rather than inlining body text.
Treat route-readiness selection as component-first evidence. A mined pattern row can be selectable only through the route-readiness bundle, selector contract, and result records that keep duplicates, unknown refs, private leakage, missing fixture contracts, dependency cycles, hard no-standalone rows, and companion-overlay gaps rejected.
Prior Art Grounding
This component follows the software pattern-language tradition of making reusable engineering structures explicit, named, and reviewable. The Hillside patterns library is the direct prior-art family for treating patterns as shared vocabulary rather than loose implementation notes.
The binding layer also borrows from provenance and supply-chain attestation patterns. W3C PROV motivates the source/ref/evidence relation shape, while SLSA and in-toto motivate digest-bound artifact claims and step-level metadata. Microcosm applies those ideas to pattern rows and route-readiness selectors, not to launch certification.
Re-entry condition: if copied source bodies, route-readiness overlays, or negative-case rules change, rerun the three first commands above and update this paper module plus standards/std_microcosm_pattern_binding_contract.json from the new result record fields. Do not raise the scope limit from selector and binding validation to launch, public sharing, private-data equivalence, or standalone public-leaf authority.
Validation Result record Path
From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:
These checks validate public pattern-binding fixtures, system-bundle result records, route-readiness selector result records, and metadata-only authority handles only; they do not certify the private pattern ledger, hosted readiness, launch, external model access, private-data equivalence, or whole-system correctness.
The current authority is the runtime result record set under receipts/first_wave/pattern_binding_contract/; do not cite a separate pattern-specific sign-off result record unless an sign-off-lane artifact is actually present. Cold readers should inspect result record fields rather than markdown constants: status, secret_exclusion_scan, source_open_body_imports, truth_accounting, route_readiness_summary, selection_contract, and source_manifest.
Scope boundary
Scope limit
This module covers public pattern-binding mechanics: source-bundle validation, reference-bundle validation, authority-handle validation, route-readiness selector admission, duplicate and unknown-ref rejection, private-leakage sentinel checks, and metadata-only result record shape. It is evidence for the pattern_binding_contract component and mechanism.pattern_binding_contract.validates_public_pattern_bindings.
The ceiling stops before private pattern-ledger authority, hosted or public launch-scope decision, deployment posture, standalone public-leaf selector status, private-data equivalence, external model access, recipient work, source-file changes, publishing-scope decision, or whole-system correctness.
Scope boundary
This module documents public pattern-binding mechanics and regression harnesses. It does not certify the private pattern ledger, public launch operations, hosted-public posture, public sharing, recipient work, external model access, private-data equivalence, or whole-system correctness. Route-readiness import does not make any mined pattern row a standalone public leaf; selection remains component-first and fixture-bound.
Source and projection details
Source-Open Body Floor
The source-open body floor is the imported public bundle, not the private pattern ledger. Cold readers can open examples/pattern_binding_contract/exported_substrate_bundle/ and examples/pattern_binding_contract/exported_route_readiness_bundle/ to inspect the copied source module manifests, source bundles, reference bundles, authority-chain handles, route-readiness overlays, selector contract inputs, and copied source validation report. The required body floor is named by each source_module_manifest.json plus source_capsules.json, reference_capsules.json, and authority_chain_handles.json.
Result records and manifests must stay metadata-only where the standard requires it: they carry refs, digests, anchors, counts, verdicts, omission result records, and secret-exclusion results. They do not inline private source bodies, raw operator payloads, model-output data, recipient data, or hidden pattern-ledger material.
Bridge Phase Continuity RuntimeThe public bridge-continuity fixture validates disk-first continuation packets, heartbeat/resource-pressure boundaries, resume-once semantics, worker-skip dedupe, tracked result record-write gates, and non-public-state exclusion without live bridge transport.
Bridge Phase Continuity Runtime is the public observe/apply continuity membrane for detached work. It consumes synthetic transport fixtures, validates copied observe-runtime body digests, writes five metadata-only result records, checks seven negative-case classes, and keeps heartbeat/resume evidence below work-landing, provider, UI, source-file changes, and launch-scope decision.
Scope limit Public synthetic observe/apply fixture and copied body digest evidence only; no live bridge transport, provider/UI uptime, operator HUD/browser state, live phase runtime, prompt-shelf/private-memory bodies, work landing, source-file changes, launch-scope decision, or whole-system correctness.
bridge_phase_continuity_runtime is the public, executable synthetic transport continuity membrane for detached bridge work. It lets a cold agent validate the disk-first observe/apply handoff without opening live bridge transport, model-output data, operator HUD/browser state, prompt-shelf bodies, private memory, or active phase runtime state.
Purpose
This paper module exists to make detached bridge continuity testable as a public fixture instead of a trust story about hidden agents or provider sessions. The component asks one bounded question: can a disk-first observe/apply handoff be represented by public synthetic transport inputs, validated through continuation, heartbeat, resource-pressure, resume, worker-skip, and completion result records, and kept below live bridge/provider/UI/source-file changes?
The important mechanism is not "run a bridge." It is a continuity membrane: every claim must pass through explicit packet fields, negative-case checks, metadata-only result record writes, and an scope limit that says heartbeat is liveness evidence, resume is resume evidence, and neither is proof that work landed.
First command:
microcosm bridge-phase-continuity-runtime run --input fixtures/second_wave/bridge_phase_continuity_runtime/input --out /tmp/microcosm-bridge-continuity
Prior Art Grounding
This runtime borrows from durable execution, workflow orchestration, leases, and provenance practice. Useful anchors include:
Temporal, whose durable-execution model keeps workflow state resumable across process failure and retries.
Apache Airflow DAGs, which separate task ordering and retry/timeout policy from task internals.
Kubernetes Lease-based leader election, as a prior pattern for liveness evidence, lease renewal, and failover without confusing a heartbeat with work completion.
W3C PROV, for provenance records that let readers evaluate how an output was produced.
Microcosm borrows the resumable-workflow, DAG, lease, and provenance shapes, but keeps the component to public synthetic observe/apply fixture sign-off. It does not run live bridge transport, use external model services, prove UI uptime, land work, change source files, or include launch operations.
Result record set: receipts/second_wave/bridge_phase_continuity_runtime/*.json
Shape
Source refs
Blocked
tracked_receipt_writes_blocked
Diagram source
flowchart TD Inputs["Six synthetic transport inputs detached job, continuation packet, heartbeat rows, resource pressure, worker-skip result record, forbidden terms"] --> Transport["_validate_synthetic_transport_contract"] Transport --> Good{"Valid job? yielded to disk, packet not consumed, fresh heartbeat, phase and continuity match"} Good -->|"yes"| Accept["Positive path accepted"] Good -->|"no"| Refuse["Refusal floor: missing packet, missing fields, duplicate resume, heartbeat claims resume, stale heartbeat overclaim, dispatch blocked"] Refuse --> Codes["Concrete error codes"] Accept --> Fixture["_validate_fixture_contract source digests, completion finalizer, apply-failure rollback, public boundary"] Codes --> Fixture Fixture --> Scan["private_state scan fixture and transport inputs"] Scan --> Result records["Five metadata-only result records continuation, heartbeat, resource pressure, resume, completion transition"] Result records --> Gate{"Tracked result record-write gate"} Gate -->|"env set"| Written["Result records written"] Gate -->|"env absent"| Blocked["tracked_receipt_writes_blocked"] Written --> Ceiling["Scope limit: no live bridge transport, external model access, HUD/browser/private memory, source-file changes, launch, or whole-system proof"] Blocked --> Ceiling
The shape is the public continuity membrane: six synthetic transport inputs are checked for a single valid resumable job and against a refusal floor, the accepted and rejected paths both feed the fixture-contract and non-public-state checks, and only then are the five metadata-only result records written through the tracked-write gate. The result record roles delimit what a reader can trust.
Mechanism Pipeline
The runtime source locus is src/microcosm_core/organs/bridge_phase_continuity_runtime.py. Its public entry point run reads the fixture manifest, resolves public-relative fixture paths, and validates six synthetic transport inputs: detached_job.json, continuation_packet.json, heartbeat_rows.jsonl, resource_pressure.json, worker_skip_receipt.json, and private_state_forbidden_terms.json. JSONL heartbeat rows are streamed by _read_required_jsonl so malformed rows are findings, not a reason to ingest a whole live heartbeat body.
The central validator is _validate_synthetic_transport_contract. It separates five result record roles: continuation packet, heartbeat, resource pressure, resume result record, and completion transition. The implementation then writes the canonical result record set only through the result record-write gate. When the requested output is a tracked result record path and MICROCOSM_TRACKED_RECEIPT_WRITES=1 is absent, the component reports tracked_receipt_writes_blocked instead of silently refreshing tracked evidence.
The negative-case floor is source-declared in EXPECTED_NEGATIVE_CASES and validated from fixture contents. Missing continuation packets, missing required fields, duplicate resume attempts, heartbeat rows that claim resume authority, stale heartbeat overclaims, resource-pressure dispatch blocks, private HUD body leakage, resume-pass work-landing overclaims, and observe/apply validation rollback all become explicit error codes. A pass therefore means the fixture both accepted the positive path and observed the refusal floor.
Reader Evidence Routing
Reader evidence routes from this module to the runtime source locus, fixture manifest, source-module manifest, public result records, and focused regression. A diagram view and an atlas card are generated for this module. This page explains what a reader can infer from them.
Evidence class
What it supports
Proof consumer
Positive synthetic fixture
The runner consumes the observe/apply fixture, writes five metadata-only result record roles, keeps non-public-state scan clean, and preserves the scope limit.
The seven expected negative case classes are observed as concrete error codes rather than prose warnings.
Focused bridge-continuity negative-case tests in tests/test_bridge_phase_continuity_runtime.py
CLI card boundary
Compact command cards can summarize status without leaking forbidden private/live body classes.
Bridge-continuity CLI/card tests in tests/test_bridge_phase_continuity_runtime.py
What It Proves
The component proves a bounded public fixture contract:
A yielded synthetic job can be resumed only through an explicit continuation packet.
Missing packets, missing packet fields, and already consumed packets are rejected.
Heartbeat rows stay liveness evidence only; fresh or stale heartbeat rows do not become resume authority or provider/UI uptime evidence.
Resource pressure can block dispatch and must be recorded as a blocked decision.
Resume success is resume-only; it does not establish work landed without the completion transition result record.
Worker-skip result records dedupe a no-op without silently closing the claim.
The fixture and result records stay metadata-only for private/live-state classes.
The reusable mechanism is not "subagents are good." It is the concrete continuity membrane that future agents can run before relying on observe/apply bridge resumption claims.
Source-Backed System
The runtime consumes seven public fixture inputs:
observe_apply_session_fixture.json
detached_job.json
continuation_packet.json
heartbeat_rows.jsonl
resource_pressure.json
worker_skip_receipt.json
private_state_forbidden_terms.json
The fixture manifest declares five copied source body imports: codex_paths_body_import, markdown_routing_body_import, observe_memory_body_import, observe_surfaces_body_import, and observe_runtime_body_import. The component validates the copied target digests from observe_runtime_source_module_manifest.json; result record output keeps those bodies out of result records and records digest verdicts instead.
Result record Floor
A passing run writes five canonical result record roles:
continuation_packet.json
heartbeat.json
resource_pressure.json
resume_receipt.json
closeout_transition.json
Each result record carries organ_id, fixture_id, validator_id, checker_id, status, continuation_packet_status, heartbeat_status, resource_pressure_decision, resume_once_status, duplicate_resume_rejection, worker_skip_receipt_status, private_state_scan, authority_ceiling, anti_claim, and the full result record path set.
The runtime also enforces tracked result record-write gating. A direct run to tracked result record paths without MICROCOSM_TRACKED_RECEIPT_WRITES=1 reports tracked_receipt_writes_blocked rather than mutating tracked evidence silently.
Negative Cases
The expected negative-case floor is source-declared in the runtime and manifest:
The current result record error-code set includes MISSING_CONTINUATION_PACKET, MISSING_CONTINUATION_PACKET_FIELDS, CONTINUATION_PACKET_ALREADY_CONSUMED, HEARTBEAT_NOT_RESUME_AUTHORITY, STALE_HEARTBEAT_LIVENESS_CLAIM, RESOURCE_PRESSURE_DISPATCH_BLOCKED, BRIDGE_PACKET_PRIVATE_HUD_BODY, RESUME_PASS_OVERCLAIMS_WORK_LANDED, and OBSERVE_APPLY_VALIDATION_FAILED.
This module may claim public fixture evidence that synthetic observe/apply continuation packets, heartbeat rows, resource-pressure decisions, resume-once behavior, worker-skip result records, completion-transition result records, source-module manifests, negative cases, validation result records, and generated projections support the declared bridge-continuity fixture contract.
This module may not claim live bridge transport health, external model access, operator HUD/browser access, prompt-shelf or private-memory disclosure, live phase runtime truth, source-file changes, hosted-public posture, launch-scope decision, publishing-scope decision, implementation correctness beyond the listed witnesses, or whole-system correctness.
Scope limit
The component authorizes only public synthetic observe/apply fixture sign-off. It does not run live bridge transport, use external model services, read operator HUD/browser state, read live phase runtime state, read prompt-shelf or private-memory bodies, prove provider or UI uptime, land work, change source files, include launch operations, or certify whole-system correctness.
Read the five result records as fixture evidence, not as a bridge-health statement. A pass means the declared public continuity contract held for the synthetic fixture and copied body floor.
Cognitive Operator RegistryThe public cognitive-operator registry fixture validates operator-shape rows, active-operator dogfood result records, anti-sprawl decisions, copied source registry/standard/tool bodies, and scope limits without becoming operator source authority.
Cognitive Operator Registry is the public evidence membrane for reusable cognition as typed system. It validates public operator rows, dogfood result records with cognition-delta evidence, negative cases for missing fields, missing dogfood, sprawl, operator-voice claims, authority overclaims, and private-source leakage, then checks copied source registry, standard, and validator bodies by digest while keeping bodies out of result records.
Scope limit Public registry-contract fixture and copied source body evidence only; no live operator execution, registry mutation, source-file changes, external model access, launch-scope decision, private-data equivalence, operator correctness proof, or whole-system correctness.
cognitive_operator_registry is the public contract diagnostic for the source system's typed cognitive-operator system. It checks that each public operator row carries the required operator-shape fields, that every active operator is backed by a dogfood result record proving it changed a live decision, and that the registry policy declares explicit scope limits before a cold reader trusts the operators as real reusable cognition rather than inspirational prose.
Purpose
A team that writes down its reusable thinking moves as a registry tends to accumulate entries faster than it can prove any of them help. The single question this component answers is: which of these listed operators has actually changed a live decision, and which is just a tidy description of one? An entry may only call itself active if it points to a dogfood result record, and that result record must carry cognition_delta_evidence recording a concrete decision that came out differently because the operator was applied.
The unusual part is that the check refuses to take a row at its word. Where a result record cites evidence surfaces, command paths, or task-ledger handles, the validator resolves each one against the public system (see _dogfood_receipt_ref_resolves and _record_dogfood_evidence_resolution_findings in the source). A row whose prose says it was dogfooded but whose evidence does not resolve is recorded as a failure, not a pass. A second check, the anti-sprawl case, flags two operators that share a slug or a near-identical claim unless an accretion decision was recorded, so the registry cannot quietly grow two near copies of the same idea.
The evidence contract is source-open by default. The validator emits refs, hashes, counts, and verdicts; secret_exclusion_scan proves that secrets, account or session material, model-output data bodies, source notes, and account secret-equivalent access material are excluded. Operator bodies are never inlined into the JSON result record, so the positive evidence carries body_in_receipt: false, real_runtime_receipt: true, and synthetic_receipt_standin_allowed: false.
Prior Art Grounding
This component borrows from cognitive work analysis, provenance, schema validation, and policy-gated registries. Useful anchors include:
Cognitive Work Analysis, summarized in this information-systems design overview, as prior art for analyzing cognitive work in complex sociotechnical systems.
W3C PROV, for connecting operator claims to activities, agents, and evidence used to evaluate trustworthiness.
JSON Schema, for the required-shape validation pattern behind public operator rows.
Open Policy Agent, as a precedent for policy evaluation that remains distinct from the registry data being evaluated.
Microcosm borrows the cognitive-work, provenance, shape-checking, and policy registry patterns, but keeps this component to a public contract diagnostic. It does not mutate operators, prove operator correctness, expose private operator bodies or source notes, authorize providers, or include launch operations.
It consumes public operator_registry.json, operator_standard.json, and dogfood_index.json inputs that project real source operator rows and dogfood result records. Its result record contract is source-open by default: secret_exclusion_scan proves that secrets, account or browser material, model-output data bodies, source notes, and account secret-equivalent live-access material are excluded, while public_runtime_refs point at the real standard, component, sign-off, fixture, bundle, and paper-module system. Bodies are not inlined into JSON result records, so the positive evidence uses body_in_receipt: false, real_runtime_receipt: true, and synthetic_receipt_standin_allowed: false.
active operators with no backing dogfood result record
dogfood result records missing cognition_delta_evidence
near-duplicate operators (identical slug or near-identical claim) with no recorded accretion decision (the anti-sprawl governor case)
launch, provider, source-file changes, registry-mutation, or operator-correctness overclaims
operator rows that claim operator-voice or source note authority
private operator source bodies or model-output data bodies in public inputs
The exported bundle also imports three verbatim source bodies behind an import membrane: the cognitive-operator registry (codex/doctrine/cognitive_operators.json), the cognitive-operator standard (codex/standards/std_cognitive_operator.json), and the registry projection/validation tool (system/lib/cognitive_operator_registry.py). Each is copied byte-for-byte with a sha256 digest and required anchors; result records carry refs, hashes, counts, and verdicts only.
Shape
Source refs
Validator
cognitive_operator_registry validator
Diagram source
flowchart LR Registry["Public operator registry operator ids, roles, runtime refs"] Standard["Operator standard required fields, scope limit"] Dogfood["Dogfood result records cognition-delta evidence"] Validator["cognitive_operator_registry validator"] Source["Copied source bodies registry, standard, validator tool"] Negative["Negative floor missing fields, no dogfood, sprawl, overclaim, private leakage"] Result record["Result records refs, hashes, counts, verdicts; body text omitted"] Registry --> Validator Standard --> Validator Dogfood --> Validator Source --> Validator Validator --> Negative Validator --> Result record
Reader Evidence Routing
Read this module as a public contract diagnostic, not as a glossary of operators or a live execution surface. This page explains the shape a reader should verify; the structured data lives in the JSON files below.
Start with paper_modules/cognitive_operator_registry.json for the full module record, then use standards/std_microcosm_cognitive_operator_registry.json to check required fields, forbidden authority, public/private boundary rules, and result record expectations. Open core/fixture_manifests/cognitive_operator_registry.fixture_manifest.json before inspecting fixtures or copied source modules, because the manifest names the source-open body floor and the body-omission contract.
Read dogfood result records as evidence that an active operator changed a live decision; do not read them as proof that the operator is generally correct. Read negative cases as part of the positive claim: missing roles, missing dogfood, missing cognition-delta evidence, duplicate/sprawl pressure, operator-voice claims, authority overclaims, and private-source leakage must remain rejected.
Technical Mechanism
The runtime mechanism lives in src/microcosm_core/organs/cognitive_operator_registry.py. run() loads the first-wave public fixture inputs: operator_registry.json, operator_standard.json, and dogfood_index.json. _positive_findings() checks that operator rows have required ids, slugs, roles, claims, runtime refs, evidence refs, and scope limits, then requires each active operator to resolve to a dogfood result record with cognition-delta evidence. The dogfood evidence resolver follows public fixture refs and copied bundle handles rather than accepting a row because its prose says it was dogfooded.
Negative pressure is source-declared in EXPECTED_NEGATIVE_CASES. _negative_findings() exercises missing required fields, active operators without dogfood result records, dogfood rows without cognition-delta evidence, operator sprawl without accretion decisions, operator-voice authority claims, provider/source/launch/correctness overclaims, and private source or model-output data leakage. A pass is therefore not only "the positive rows parsed"; it also means the expected refusal classes were observed and recorded.
run_registry_bundle() is the body-floor consumer. It executes the same registry contract against examples/cognitive_operator_registry/exported_cognitive_operator_registry_bundle and makes _source_module_manifest_result() mandatory. The manifest must prove exact copied source bodies for codex/doctrine/cognitive_operators.json, codex/standards/std_cognitive_operator.json, and system/lib/cognitive_operator_registry.py; _source_open_body_import_summary() then records body ids, classes, line counts, hashes, and body_in_receipt: false. AUTHORITY_CEILING keeps those result records below registry mutation, operator correctness, provider authority, source-file changes, launch, and whole-system correctness.
Named Proof Consumers
microcosm_core.organs.cognitive_operator_registry.run is the first-wave fixture consumer. It reads the public registry, standard, and dogfood index, writes the result, board, validation, and sign-off result records, and checks the expected negative floor.
microcosm_core.organs.cognitive_operator_registry.run_registry_bundle is the exported-bundle consumer. It proves the copied source registry, standard, and validator bodies through source-module manifest equality while keeping copied body text out of result records.
tests/test_cognitive_operator_registry.py::test_cognitive_operator_registry_observes_negative_cases is the public-contract regression. It asserts that all expected negative cases are observed and that all fixture operators have dogfood result records.
tests/test_cognitive_operator_registry.py::test_cognitive_operator_registry_bundle_validates_runtime_shape is the bundle-shape regression. It checks operator counts, source-module manifest status, body-material ids, and the metadata-only result record boundary.
tests/test_cognitive_operator_registry.py::test_cognitive_operator_registry_source_modules_are_exact_macro_body_imports is the exact-copy proof consumer. It byte-compares every manifest source ref with the copied target and verifies the recorded sha256 digests.
Validation Result record Path
Run the first-wave fixture into disposable result records from the Microcosm root:
Run the exported bundle through the same component:
cd microcosm-substrate
../repo-pytest tests/test_cognitive_operator_registry.py -q
cd ..
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
The source atlas row carries the matching paper_module_ref, mechanism_refs, and code_loci entries.
Scope boundary
Scope limit
This paper module can claim a public cognitive-operator registry contract fixture with source-backed operator-shape checks, active-operator dogfood result record checks, cognition-delta evidence resolution, anti-sprawl accretion checks, expected negative cases, exact copied source body manifest equality, metadata-only result records, and a generated diagram view derived from the module's structured bindings.
It cannot become source authority for the cognitive-operator registry, mutate operators, prove operator correctness, expose private operator bodies or source notes, authorize providers, change source files, include launch operations or public sharing, or certify whole-system correctness.
If focused validation reports an exact-copy source-module body mismatch, route that repair through microcosm_exact_copy_refresh; do not treat this Markdown projection as source authority for copied source bodies.
Source and projection details
Governing Lattice Relation
That mechanism states the proof obligation in operational terms: operator rows must carry required shape fields, active operators must have dogfood result records, dogfood result records must include cognition-delta evidence, duplicate or near-duplicate operators must carry an accretion decision, and the exported bundle must prove copied registry, standard, and validator bodies by source module digest before any result record is trusted.
The generated JSON instance links this module to concept.architecture_and_navigation_route_contract_bundle, principles P-1, P-2, P-3, P-5, P-6, P-12, and P-15, and axioms AX-1, AX-4, AX-5, AX-7, AX-8, and AX-11. Those edges frame the module as an architecture-and-navigation contract validator. They do not make the Markdown or generated Atlas card source authority for operator definitions, live operator execution, or provider action.
Agent Completion Faithfulness AuditThe public completion-faithfulness fixture checks commit, ledger-cap, and pytest-span claims with real git/pytest subprocess witnesses while refusing unchecked pass overclaims.
Agent Completion Faithfulness Audit is the public fixture witness for completion evidence language. It builds a fixture repo, verifies commit and HEAD evidence with git subprocesses, runs pytest for the declared span, checks cap claims against a fixture ledger, validates a copied source-module manifest for agent experience diagnostics, and writes bounded result records that keep source bodies out.
Scope limit Public fixture and exported-bundle result records only; no arbitrary live commit proof, work log mutation or closure, live Git mutation, external model access, launch-scope decision, broad completion certification, or pytest-pass claim without explicit exit-zero status.
agent_closeout_faithfulness_audit checks the kind of sentence an agent writes when it finishes a task: "I committed the change, closed the ledger item, and the test passed." It runs the supplied public fixture evidence through real git and pytest subprocesses and refuses any claim that the evidence does not actually support.
Purpose
When an agent reports that work is done, the report is prose. The commit may or may not exist, the ledger row may or may not be there, and "the test passed" may mean the test ran, or it may mean nothing was checked at all. This component exists to answer one question over a fixed fixture: is each completion claim backed by an evidence object that genuinely exists, and is a "passed" claim backed by an explicit exit-zero status check rather than by the wording of the claim?
The approach is unusual in that it does not parse the completion prose or score it against a rubric. The fixture's public_fixture_repo is copied into a throwaway directory, initialised and committed with real git subprocesses, and its HEAD is read back with git rev-parse. A commit claim passes only when it points at that observed HEAD. A declared pytest span is run with python -m pytest <nodeid> inside that temporary repo, and only the exit code decides whether the span passed. The result record records the run as bytes of work that happened, not as a paraphrase of what the agent said.
The distinction the audit defends is narrow and easy to lose. "The span ran" and "the span passed" are separate facts, and a completion sentence that conflates them is the precise failure mode here. A pass claim is admitted only when pass_status_checked is true and the subprocess exited zero; a claim that expected a pass without that check is rejected with CLOSEOUT_PYTEST_PASS_STATUS_NOT_CHECKED. The same separation applies to commits and ledger caps, so a referenced commit object is not treated as a landed change and a named cap is not treated as closed work.
Primary result records: receipts/first_wave/agent_closeout_faithfulness_audit/agent_closeout_faithfulness_audit_result.json, receipts/first_wave/agent_closeout_faithfulness_audit/agent_closeout_faithfulness_audit_board.json, receipts/first_wave/agent_closeout_faithfulness_audit/agent_closeout_faithfulness_audit_validation_receipt.json, and result records/sign-off/first_wave/agent_completion_faithfulness_audit_fixture_acceptance.json
Generated posture: this paper module is authored doctrine. Refresh them through their owner commands instead of patching them by hand.
Shape
This module is a completion-claim accounting fixture, not a completion oracle. Its single question is: did the supplied public fixture evidence support the completion claims, and did the result record refuse the overclaims that should not pass?
Source refs
3 fixture claims
closeout_claims.json
Audit
agent_closeout_faithfulness_audit.run
2 cap rows
fixture_ledger.json
declared nodeid
tests/test_closeout_fixture.py
1 exact-copy source body
source_module_manifest.json
Diagram source
flowchart TD Claims[completion_claims.json 3 fixture claims] --> Audit[agent_completion_faithfulness_audit.run] Ledger[fixture_ledger.json 2 cap rows] --> Audit Repo[public_fixture_repo git fixture] --> Audit Pytest[tests/test_completion_fixture.py declared nodeid] --> Audit Manifest[source_module_manifest.json 1 exact-copy source body] --> Audit Audit --> Pass[pass result record 3 verified claims] Audit --> Neg[negative-case semantics 4 overclaim classes] Audit --> Ceiling[scope limit no live mutation or launch]
The accounting is source-backed:
Evidence input
Runtime check
Result record/accounting field
closeout_claims.json carries claim_public_head_exists, claim_cap_exists, and claim_pytest_span_passed
evaluate() loops over the three claim rows in src/microcosm_core/organs/agent_closeout_faithfulness_audit.py
claim_count: 3, verified_claim_count: 3
public_fixture_repo is copied into a temporary git repo
_prepare_public_fixture_repo() runs git init, config, add, commit, and rev-parse HEAD subprocesses
Negative cases are part of the Shape rather than an appendix because they define the claim boundary. EXPECTED_NEGATIVE_CASES names fake commit, fake cap, fake pytest node, and unchecked-pytest-pass classes; the focused tests assert the first three directly against fixture mutation and assert unchecked pass rejection against CLOSEOUT_PYTEST_PASS_STATUS_NOT_CHECKED. The runtime-bundle result record observes all four classes, so a cold reader can distinguish "the span ran" from "the pass claim had exit-zero evidence."
The source-body route is deliberately narrow. The exported bundle copies exactly system/lib/agent_experience_diagnostics.py to examples/agent_closeout_faithfulness_audit/exported_agent_closeout_faithfulness_audit_bundle/source_modules/system/lib/agent_experience_diagnostics.py; the manifest carries the matching digest, 1703 lines, required anchors Agent Experience Grand Rounds and completion, and body_in_receipt: false. Result records carry refs, hashes, counts, verdicts, and scope boundaries only. They do not carry copied body text, private root paths, model-output data, account or browser state, live work log authority, live work log authority, source-file changes, launch-scope decision, or whole-system completion truth.
Technical Mechanism
The fixture validator is centered on evaluate() in src/microcosm_core/organs/agent_closeout_faithfulness_audit.py. It loads closeout_claims.json and fixture_ledger.json, copies public_fixture_repo into a temporary repository, initializes and commits that copy with real git subprocesses, and records the resulting HEAD through git rev-parse HEAD. Commit claims pass only when the claim ref is HEAD or the actual subprocess-observed HEAD; fixture cap claims pass only when the cap id appears in the fixture ledger.
For pytest claims, evaluate() runs python -m pytest <nodeid> -q inside the temporary public fixture repo. A span can be counted as observed when the nodeid runs, but a pass claim is accepted only when pass_status_checked is true and the pytest subprocess exits zero. The same source file carries evaluate_negative_case(), which mutates one claim row at a time to force the fake commit, fake cap, fake pytest node, and unchecked pass paths. The expected error codes are declared in EXPECTED_NEGATIVE_CASES, so the negative floor is source-bound rather than inferred from prose.
The exported-bundle path uses run_agent_closeout_bundle() against examples/agent_closeout_faithfulness_audit/exported_agent_closeout_faithfulness_audit_bundle. That path reuses the same evaluator while making the source-module manifest floor mandatory: the copied diagnostic body must match the manifest digest, include required anchors, and remain absent from result records. AUTHORITY_CEILING then records the scope boundaries in machine-readable form: no live repo mutation, no launch-scope decision, no work log closure, and no pytest-pass claim without exit-zero evidence.
Named Proof Consumers
microcosm_core.organs.agent_closeout_faithfulness_audit.run is the first-wave fixture consumer. It materializes the public fixture repo, ledger, completion-claim rows, semantic negative cases, validation result record, board, and sign-off result record.
microcosm_core.organs.agent_closeout_faithfulness_audit.run_agent_closeout_bundle is the exported-bundle consumer. It validates the source-open bundle and the copied diagnostic body manifest while preserving body_in_receipt: false.
microcosm_core.organs.agent_closeout_faithfulness_audit.evaluate is the subprocess witness consumer. It checks commit, cap, and pytest-span claims against actual fixture evidence instead of accepting completion prose.
microcosm_core.organs.agent_closeout_faithfulness_audit.evaluate_negative_case is the falsification consumer for fake commit, fake cap, fake nodeid, and unchecked pytest-pass overclaims.
tests/test_agent_closeout_faithfulness_audit.py is the focused regression consumer. It asserts the public subprocess witness path, fake-claim rejections, semantic negative-case evaluation, exported-bundle metadata-only source manifest behavior, digest-mismatch rejection, and pytest-capable interpreter selection.
First Commands
From microcosm-substrate:
Validate the exported bundle when the question is whether the public source-open copy still matches the declared source body:
What It Proves
This component checks completion claims against public fixture evidence instead of trusting completion prose. A positive run proves four things:
the fixture repo exists and the referenced commit object is visible to real git subprocesses;
fixture HEAD is checked by subprocess evidence rather than by prose;
the declared pytest span actually ran;
work log style cap claims only point at rows present in the fixture ledger.
The useful distinction is narrow: verified means the referenced evidence object exists or the pytest span ran. A claim that a pytest span passed is valid only when the result record checked an explicit exit-zero status. That is the reader value of this component: it separates "I referenced a test" from "I proved the test passed."
Prior Art Grounding
This component is grounded in claim-verification and reproducibility patterns rather than in trust of summary prose. FEVER popularized fact extraction and verification as a separate task over cited evidence, while TruthfulQA made explicit that fluent model answers can be misleading without a truthfulness check. The artifact-review tradition also motivates separating a claim, its artifact, and its validation evidence instead of treating a report as self-validating.
Microcosm borrows that verification posture for agent completion: commit refs, work log refs, pytest spans, subprocess witnesses, and pass-status checks must line up before completion language is admitted. It does not certify all live completion prose or turn a referenced test into a passed test without exit-zero evidence.
Source-Backed System
The source-open body import is a single exact source body:
These cases are the claim-language guardrail. If they stop appearing in observed negative cases, the component no longer proves that public completion result records reject fabricated commit, cap, test-node, or unchecked-pytest-pass claims.
First command: PYTHONPATH=src python3 -m microcosm_core.components.agent_completion_faithfulness_audit run --input fixtures/first_wave/agent_completion_faithfulness_audit/input --out result records/first_wave/agent_completion_faithfulness_audit --sign-off-out result records/sign-off/first_wave/agent_completion_faithfulness_audit_fixture_acceptance.json.
Reader Evidence Routing
Start with the Route Card and JSON Bundle Binding to identify the component, standard, source row, runner, fixture input, exported bundle, and result record surfaces.
For behavior questions, read src/microcosm_core/organs/agent_closeout_faithfulness_audit.py and the focused tests before trusting this prose.
For source-open body questions, read the exported bundle's source_module_manifest.json; the manifest is the evidence for exact-copy relation, digest match, anchor match, and metadata-only result record posture.
For claim-language questions, read the Negative Cases and Result record Expectations together; the pass path only matters if the overclaim cases still fail.
Treat generated component Markdown, atlas cards, graphs, health files, and runtime result records as navigation or validation projections. They do not become source authority for broader completion truth.
Validation Result record Path
The focused proof consumer is tests/test_agent_closeout_faithfulness_audit.py. A passing result record has to show that completion language was checked against public fixture evidence: referenced commit objects, fixture work log rows, git subprocess witnesses, pytest subprocess witnesses, explicit pass-status checks, negative completion cases, and the exported source-module manifest. It must not rely on completion prose as its own proof.
For the focused test, the result record boundary is the asserted shape: three verified completion claims, at least five git subprocess witnesses, one pytest subprocess witness, one ran pytest span, one checked pass-status row, head_verified_by_subprocess=true, source-module digest and required-anchor matches, metadata-only result record posture, and semantic observation of the four negative completion classes. For the corpus check, the result record only proves bundle/instance parity; it does not close live work log work, mutate live work log state, certify arbitrary completion prose, prove launch-scope decision, or turn a referenced pytest span into a passed span without exit-zero evidence.
Validation Anchors
Focused coverage lives in tests/test_agent_closeout_faithfulness_audit.py and checks:
public git and pytest subprocess witness behavior;
fake commit rejection;
unchecked pytest pass rejection;
fake cap claim rejection;
fake pytest node id rejection;
metadata-only source manifest behavior in the exported bundle;
source-module digest mismatch rejection;
pytest-capable Python selection.
Scope boundary
Scope limit
This module may claim public fixture evidence that completion claims are checked against referenced commit objects, fixture work log rows, pytest subprocess witnesses, explicit pass-status checks, negative completion cases, a copied diagnostic body, source-module manifest digest equality, metadata-only result record posture, and validation result records.
This module may not claim live completion truth, live work log mutation, live work log mutation, live Git mutation, external model access, source-file changes, launch-scope decision, publishing-scope decision, deployment posture, all-agent faithfulness, formal-result correctness beyond the listed witnesses, or whole-system correctness.
Scope limit
This component is a public fixture witness for completion evidence. It does not:
prove arbitrary live commits landed;
close or mutate work log work;
mutate Git state;
include launch operations;
use external model services;
certify all completion prose;
turn a ran pytest span into a passed span without an explicit exit-zero check.
Its useful claim is narrower: over the supplied fixture repo, fixture ledger, completion claims, and copied diagnostic body, the component proves that completion evidence references are checked and that specific overclaims are refused.
Source and projection details
Governing Lattice Relation
That mechanism is active in core/mechanism_sources.json and says the component validates public completion evidence claims through fixture commit objects, fixture HEAD evidence, git subprocesses, pytest span execution, explicit pass-status checks, fixture-ledger cap rows, copied source-module digests, and stable overclaim negative cases before writing metadata-only result records.
The doctrine edge is narrow and constructive. The JSON instance reports concept.agent_reliability_and_safety_validator_bundle, principles P-1 and P-2, axiom AX-1, and dependency paper_module.durable_agent_work_landing_replay; those edges explain why this module is a validator-bundle proof instrument rather than a general completion truth oracle. The generated Mermaid and Atlas edges are navigation result records for that binding, not launch or correctness authority.
Cold-Reader Route MapThe public cold-reader route-map fixture validates first-run command order, docs refs, result record refs, scope limits, copied cold-entry source-module digests, and non-public-state exclusion without becoming route registry control.
Cold-Reader Route Map is the public first-run route membrane. It validates route rows, route-to-result record bindings, ordinal first-run sequencing, command/docs/result record refs, launch/provider/private-source overclaim rejection, exact copied source cold-entry source modules, and metadata-only result records so a cold agent can see what to run first and what proof bounds each row.
Scope limit Public route-map fixture, exported-bundle result records, and copied source cold-entry body evidence only; no route registry control, external model access, source-file changes, launch-scope decision, private-data equivalence, trading or financial decisions, or whole-system correctness.
cold_reader_route_map makes Microcosm's first ten minutes executable. It validates a public route map whose rows bind the first-run sequence to runnable commands, docs refs, result record refs, and scope limits.
Purpose
A cold technical reader should not have to infer the product path from a long README or raw result record tree. The route map answers one question: what should I run first, and what evidence proves that path is wired?
The unusual part is how the validator checks that proof. It does not merely confirm that each route row carries the right fields. It replays every route against real source: each row's command, its docs refs, its result record refs, and the human-readable signals it claims to show are matched against the actual text of copied source modules and public docs. A command whose material tokens do not appear anywhere in that source corpus is blocked, as is a docs ref that does not resolve to a real heading and a result record ref that does not open a pass-status result record. So a route cannot promise a command the system does not actually run, which is the failure mode a hand-written quick-start guide drifts into the moment the commands change underneath it.
The evidence contract is source-open by default: public route cards, route result record bindings, route policy, exported bundle refs, and generated result records carry the system, while secret_exclusion_scan excludes only private source bodies, model-output data, account or browser material, secrets, and account secret-equivalent live-access data. Result record bodies are not inlined; they are represented by body_in_receipt: false plus public runtime refs.
Shape
Diagram source
flowchart LR subgraph Entry["First-screen entry"] Project["Public project path repo -> .microcosm"] First["First-screen card claim frame, first command, evidence legend, exit rule"] end subgraph Accounting["Route-map accounting"] Route["Ordered route rows command, result record ref, evidence class"] Ceiling["Scope boundary and scope limit attached to each row"] end subgraph ReaderBranch["Reader branch"] Branch["Reader branch choose one first action"] Safety["Safety/evals status, authority, workingness"] Hiring["Hiring reviewer first-screen card, legibility scorecard"] Developer["Peer developer tour, observe, explain or compile"] end subgraph Boundary["Proof boundary"] Drilldown["Drilldowns tour, status, explain, observe, compile, serve"] Result record["Result records and route refs public refs only; body_in_receipt false"] end Project --> First First --> Route Route --> Ceiling Route --> Branch Branch --> Safety Branch --> Hiring Branch --> Developer Safety --> Drilldown Hiring --> Drilldown Developer --> Drilldown Ceiling --> Result record Drilldown --> Result record
Reader Evidence Routing
Start with core/paper_module_capsules.json::paper_modules[13:paper_module.cold_reader_route_map], then read the generated JSON projection for the resolved relationships. A diagram view is generated for this module and an atlas card entry is available. The route-map fixture, exported bundle, source-module manifests, and temporary result records are evidence for replay shape. This Markdown gives cold readers the interpretation order, source-linked only.
Prior Art Grounding
This component is grounded in documentation systems that treat reader state and task shape as first-class. Diataxis separates tutorials, how-to guides, reference, and explanation so readers are not forced through one undifferentiated documentation pile. Knuth's literate programming is an older anchor for the idea that executable systems should be written for human comprehension as well as machine execution.
Microcosm borrows the reader-route pattern: first command, result record ref, evidence class, scope boundary, scope limit, and next drilldown are ordered for a cold reader. It does not make the route map source authority or substitute documentation sequence for validator evidence.
Reader-Specific Evidence Routing
The route map should make the evidence-count frame visible before the reader chooses a drilldown. Honest counters are not progress badges:
A safety/evals engineer follows microcosm status --card, microcosm authority --card, and microcosm workingness --card first. The useful question is whether each claim names its evidence class, validator, failure mode, and scope limit.
A hiring reviewer follows the first-screen card and legibility scorecard first. The useful question is whether small verified counts are framed as honest proof boundaries instead of hidden or inflated.
A peer developer follows microcosm tour --card, microcosm observe --card, and then full microcosm observe, microcosm compile, or microcosm explain as drilldowns. The useful question is whether a fresh clone can reproduce the route/work/event/evidence chain locally without opening full event rows first.
The route map must therefore preserve both the command order and the evidence interpretation order: command, result record ref, evidence class, scope boundary, scope limit, then deeper route. Reader-specific branches may hide other branches, but they may not hide the accounting frame that prevents "1 verified import" from being read as either failure or marketing.
One-Screen Handoff Contract
The route map consumes the first-screen card as the handoff, not as another route row. A cold reader should see this sequence:
Route map: the accepted command order, with result record refs and scope limits attached to each command.
Reader branch: one audience-specific first action, one proof surface, one success criterion, and one next drilldown.
The handoff fails when the first screen turns into a complete route inventory, or when the route map assumes the reader already understands evidence classes. The first screen should compress; the route map should sequence; the reveal should demonstrate the path against public result records.
Comparison-Backed Route Rows
Each route row should make the unusual discipline visible by naming the normal failure mode it is avoiding. The route map is not just a command list; it is a sequence of claim-boundary checks:
Route row field
Failure avoided
Required reader cue
command_ref
Prose-only claims about what runs.
Show the exact local command before the claim it supports.
receipt_ref
Trusting generated summaries as source authority.
Point to the result record or validator that bounds the row.
evidence_class
Treating all evidence as equal proof.
Label body import, subprocess witness, projection, validator, or fixture evidence.
anti_claim
Letting a successful demo imply launch, production, provider, or proof authority.
State the forbidden read beside the positive claim.
failure_mode_ref
Governance looking like abstract ceremony.
Name the concrete overclaim or missing-standard case this row catches.
Rows that omit the comparison cue are still technically navigable, but they make the rigor invisible to a cold reader. The validator should prefer a shorter row with command, result record, class, scope boundary, and failure mode over a longer row that lists more components without explaining what each boundary prevents.
Observable Drilldown Order
Browser-first readers follow the same route map as terminal-first readers. The route order is compressed, not replaced:
First-screen card or compact browser board.
microcosm tour --card <project> as the shared behavior proof.
Selected route plus work/event/evidence refs.
Compact observatory view for the same route.
Full route map, result records, standards, and raw JSON drilldowns.
The compact observatory row must carry the same command ref, result record ref, evidence class, scope boundary, and scope limit as the terminal route row. If the browser board cannot show those fields, it is a preview only and cannot serve as the cold-reader route handoff.
readme_onboarding_route is the selected route only for projects with a README; folders without one still get a route/work/event/evidence path through the selected route emitted by tour and compile.
Each route card must include a command and public docs refs. Each route id must also resolve to at least one result record ref. The sequence must be ordinal sorted so the public entry does not drift into a bag of impressive but unordered components.
Validation
The fixture observes negative cases for missing command refs, missing result record refs, route sequence gaps, launch/provider overclaims, and private source body fields. The exported bundle omits negative cases and validates the real runtime shape used by microcosm run, with synthetic result record stand-ins explicitly disallowed as product evidence. If focused validation reports an exact-copy source-module body mismatch, route that repair through microcosm_exact_copy_refresh; do not treat this Markdown projection as the authority for copied source bodies.
Validation Result record Path
From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:
The focused pytest file is the proof consumer for this Markdown section. It asserts the fixture status, ten-route command and result record-ref counts, front-door route order, expected negative cases, route-source replay support, exported bundle shape, copied source-module digest and anchor matches, source-open fixture-manifest counts, no source bodies in public result records, streamed line-count and digest handling, and fresh exported-bundle card reuse. The corpus check verifies that this page remains in the 98-module paper-module set and that the JSON bundle, generated Mermaid, Atlas card, and Markdown projection stay mutually consistent.
These result records validate the route-map fixture, exported bundle result records, copied cold-entry evidence, and paper-module corpus membership only; they do not grant route registry control, external model service, source-file changes, launch-scope decision, private-data equivalence, financial decisions, publishing-scope decision, hosted readiness, or whole-system correctness.
Scope boundary
Scope limit
This module covers public cold-reader route-map validation: command refs, result record refs, ordinal route sequencing, evidence classes, scope boundaries, scope limits, exported-bundle provenance, copied cold-entry evidence, and negative cases for missing refs, sequence gaps, overclaims, and private body fields.
The ceiling stops before route-registry source authority, live session inspection, external model service, source-file changes, hosted readiness, launch, public sharing, private-data equivalence, or whole-system correctness. The route map can tell a cold reader what to run first and which result record bounds that run; it cannot promote the docs sequence into proof beyond those public fixtures and result records.
Scope limit
This component is projection-only metadata. It is not route registry control, it does not change source files projects, it does not use external model services, and it excludes launch, public sharing, trading or financial decisions, private-data equivalence, or whole-system correctness claims.
Source and projection details
Source-Open Body Floor
The source-open body floor is the public route-map fixture, route card set, route policy, exported cold-reader route-map bundle, source-module manifests, and generated result records. It carries public refs, digests, route ids, result record refs, evidence classes, scope boundaries, scope limits, and body_in_receipt: false markers instead of inlining private source or live state.
The floor excludes private source bodies, model-output data, account or browser material, browser or HUD state, account secret-equivalent live-access data, recipient state, and route-registry mutation authority. A reader can inspect the route map and exported bundle to reproduce the first-run sequence, but the bundle remains evidence for public replay shape rather than launch or production authority.
Proof Diagnostic Evidence SpineThe public proof-diagnostic evidence spine fixture validates Ring2 diagnostic result record refs, copied runtime artifact digests, provider/proof-body exclusions, stale-coupling visibility, and scope limits without becoming formal proof authority.
Proof Diagnostic Evidence Spine is the public evidence membrane before formal proof authority. It validates Ring2 failure-taxonomy and graph-update artifacts, verifier-trace repair and evidence-cell result record refs, copied runtime artifact digests, model-output data policy rows, negative cases, and a copied public component source-body floor while keeping proof bodies and provider output bodies out of public result records.
Scope limit Public diagnostic result record refs, copied Ring2 runtime artifacts, and copied public component source body only; no Lean/Lake execution, formal proof authority, formal-result correctness, external model access authority, runtime correctness, launch-scope decision, publishing-scope decision, private-system equivalence, or whole-system correctness.
proof_diagnostic_evidence_spine sits one step before formal proof authority. It holds diagnostic evidence from the formal-math evaluation and premise-retrieval pipeline as result record-backed cells, and refuses to let any of them be read as a proof.
Purpose
The component answers a single question: does a diagnostic check that claims to be backed by real Ring2 runtime evidence actually recompute against that evidence, or is it asserting more than its refs support? Without this membrane, a check row could name a failure-taxonomy report or a graph-update candidate set, declare itself passing, and be trusted on its own word. The spine refuses that.
What is unusual is that the validator does not trust the fixture's own pass label. It ignores the legacy expected_result field as a non-authoritative fixture label and rederives the verdict itself. For each check it resolves the named source_ref to a real file, re-hashes that file with sha256, and confirms the hash matches the expected digest. It then opens the named result record anchor and checks that the result record payload actually contains that source ref and digest. A check is accepted only when the source, the digest, and the result record all agree. The pass is a recomputation, not a claim copied from the fixture.
The second idea is that negative evidence is kept rather than hidden. A stale source fingerprint is recorded as source_fingerprint_status: stale and retained as diagnostic evidence; a provider advisory row is preserved as metadata while being rejected as authority; a forbidden proof-body field turns a row into a regression fixture rather than silently dropping it. The board shows what did not hold, which is the point of an evidence membrane.
Teleology
proof_diagnostic_evidence_spine is the body-safe evidence membrane before formal proof work. It records proof/evidence diagnostics while rejecting proof bodies, provider output bodies, source-authority upgrades, stale coupling, and runtime-correctness overclaims.
Public Contract
The validator consumes failure-taxonomy records, graph-update traces, verifier-trace repair artifacts, and formal evidence-cell anchor result record refs from the formal-math evaluation and premise-retrieval pipeline, then emits diagnostic result records over those refs. Provider-advisory rows are bounded evidence authority. Passing diagnostic checks do not become formal proof authority or formal-result correctness.
How a check is accepted
A check row carries three lists: source_refs, receipt_anchor_refs, and source_digest_refs. The validator does not take the row's word for whether it passes. It recomputes the verdict from the system.
For each source_ref it resolves a real file, reads it, and hashes the bytes with sha256. That hash must equal the expected digest the component holds for the ref. It then opens each result record anchor and checks that the result record payload actually contains the source ref and its digest, so a check is only "result record-backed" if the result record it cites genuinely references it. On top of that the component applies a semantic floor: a check whose id mentions a failure taxonomy must point at a source file that carries a failure-taxonomy report with representative failures and at a result record that carries a failure-mode ledger; a graph-update check needs graph-update candidates with ids and a matching result record anchor. The check is accepted only when every source resolves, every digest matches, every cited result record backs the ref, the semantic floor is satisfied, and no expected-negative error code is declared.
The concrete failure mode this guards against is a plausible-looking row that names real artifact paths but does not actually recompute: a digest that has drifted, a result record that does not mention the ref it claims, or a check labelled as failure-taxonomy evidence while pointing at an unrelated file. Each of those becomes a rejection finding rather than a silent pass. The recompute is also why a passing check stays bounded. It establishes that the named evidence is present and coupled, not that the underlying runtime is correct, which is why a row that adds claims_runtime_correctness is rejected as an overclaim.
Shape
Source refs
evidence accounting only
diagnostic_board.json
Diagram source
flowchart TD Check["Diagnostic check row source_refs, receipt_anchor_refs, source_digest_refs"] Resolve["Resolve source ref to real public file"] Hash["Re-hash file (sha256) compare to expected digest"] Result record["Open result record anchor does payload contain this ref and digest?"] Floor["Semantic floor failure-taxonomy / graph-update source and result record match"] Accept["Accepted check verdict = recomputed, body_in_receipt false"] Reject["Rejected / retained as diagnostic evidence"] Stale["Stale source fingerprint"] Provider["Provider advisory payload"] Proofbody["Forbidden proof-body field"] Check --> Resolve --> Hash --> Result record --> Floor Floor -->|all agree| Accept Floor -->|any mismatch| Reject Stale -. retained as evidence .-> Reject Provider -. metadata kept, authority denied .-> Reject Proofbody -. scrubbed, kept as regression .-> Reject Accept --> Board["diagnostic_board.json evidence accounting only"] Reject --> Board Board -. denies .-> Ceiling["no Lean/Lake run, no formal-result correctness, no provider authority, no launch"]
Evidence/accounting refs:
Bundle authority: core/paper_module_capsules.json::paper_modules[14] sets source_authority: json_capsule, names subjects proof_diagnostic_evidence_spine and mechanism.proof_diagnostic_evidence_spine.validates_ring2_diagnostic_evidence_membrane, resolves code_loci[0].path to src/microcosm_core/organs/proof_diagnostic_evidence_spine.py, and keeps generated_projections.markdown.generated: false, generated_projections.mermaid.status: available_from_capsule_edges, and generated_projections.atlas_card.status: linked_from_capsule_edges.
Generated instance boundary: paper_modules/proof_diagnostic_evidence_spine.json::paper_module_payload.projection_contract records authority_flip_status: not_flipped, while paper_modules/proof_diagnostic_evidence_spine.json::relationships.edges carries source-justified links to the component, mechanism, concept, principles, axioms, dependencies, and code locus.
Component/source locus: organs/proof_diagnostic_evidence_spine.json::organ_payload.source_atlas_row names the first command, claim_ceiling_restated, mechanism_refs[0], wires_to, and the same code-locus symbols implemented in src/microcosm_core/organs/proof_diagnostic_evidence_spine.py (PROOF_AUTHORITY_CEILING, EXPECTED_NEGATIVE_CASES, validate_copied_macro_body_artifacts, validate_evidence_receipts, validate_provider_payload_policy, validate_authority_ceiling, run, and run_evidence_bundle).
Standard contract: standards/std_microcosm_proof_diagnostic_evidence_spine.json::authority_boundary_detail limits the component to copied Ring2 diagnostic runtime artifacts, summary metrics, graph-variant metadata, and anchor result record refs. Its body_import_verification.source_open_body_import_floor records 13 copied artifact bodies, 10 exact copies, 3 public-light edits, and body_text_exported_in_receipts: false; its body_import_verification.public_organ_source_body_floor records one exact copied public component source body.
Bundle floor: examples/proof_diagnostic_evidence_spine/exported_evidence_bundle/bundle_manifest.json has schema_version: proof_diagnostic_evidence_spine_exported_evidence_bundle_v1, bundle_id: ring2_proof_diagnostic_evidence_runtime_example, copied_macro_body_artifacts count 13, and an scope limit of Ring2 diagnostic result record refs only, not formal proof authority.
Source-body floor: examples/proof_diagnostic_evidence_spine/exported_evidence_bundle/source_body_floor/source_module_manifest.json::modules[0] records source ref src/microcosm_core/organs/proof_diagnostic_evidence_spine.py, source_to_target_relation: exact_copy, sha256_match: true, body_in_receipt: false, and omitted material including model-output data bodies, account or browser state, browser UI live-access state, recipient-send state, private proof bodies, and oracle-needed premise ids.
Result record behavior: receipts/first_wave/proof_diagnostic_evidence_spine/proof_evidence_validation_receipt.json records accepted_count: 2, rejected_count: 1, missing_negative_cases: [], body_in_receipt: false, source_fingerprint_status: stale, and observed negative cases for source-authority upgrade, missing result record fields, runtime-correctness overclaim, provider/proof body rejection, and stale coupling. The sibling provider_payload_policy_result.json::provider_payload_policy preserves advisory metadata while rejecting the forbidden proof-body payload, and diagnostic_board.json::authority_ceiling rejects model-output data authority, source-authority upgrade, runtime-correctness claims, and formal prover execution.
Focused regression surface: tests/test_proof_diagnostic_evidence_spine.py asserts the observed negative cases match EXPECTED_NEGATIVE_CASES and checks the exported evidence bundle path. These tests support reader wiring and evidence accounting only; they do not establish formal-result correctness, provider authority, runtime correctness, publishing-scope decision, or launch-scope decision.
Reader Evidence Routing
Route currentness questions through ## JSON Bundle Binding and the validation commands in ## Validation Result record Path. The tests and corpus check confirm reader wiring and projection health; they do not establish proof authority.
Route source/body-floor questions through ## Source-Open Body Floor and the fixture/example paths named under ## Structured Lattice Bindings. The diagnostic artifact copies from the formal-math evaluation pipeline, public component-source copy, manifests, and digest coupling are evidence-accounting inputs; they are bounded evidence bodies, model-output data bodies, runtime correctness claims, or source-authority upgrades.
Route claim-safety and public-copy questions through ## Scope limit, ## Evidence-As-Accounting Shape, and ## Scope boundary, then pair this module with batch12_release_claim_language_gate when public wording is being checked. If the question is "did the validator still enforce the membrane?", use the focused pytest and corpus check in ## Validation Result record Path before citing the reader page.
Evidence-As-Accounting Shape
This component is the proof-adjacent evidence membrane behind Microcosm's scope limits. It accepts diagnostic runtime artifacts, result record refs, source digests, and negative-case results as evidence cells, while refusing to treat any of them as theorem authority.
The accounting rule is two-sided. A copied artifact from the formal-math evaluation and premise-retrieval pipeline can strengthen only the diagnostic claim named by its result record, digest, and validator; it cannot upgrade itself into formal-result correctness, provider authority, launch-scope decision, or private-system equivalence. Stale source coupling is retained as diagnostic evidence instead of hidden, and provider-advisory rows remain metadata without payload bodies.
Use this module with batch12_release_claim_language_gate when evaluating public copy: the evidence spine says what result record-backed cells exist, and the language gate decides whether a public sentence stays within that ceiling.
Prior Art Grounding
The evidence spine is grounded in assurance-case practice: evidence should be connected to claims, assumptions, and limits before it is treated as support. NASA's Goal Structuring Notation example for spacecraft assurance is a useful public analogue because it frames assurance as model-structured evidence rather than document-level persuasion: NTRS 20160005295.
The result record membrane also borrows from W3C PROV and observability practice: diagnostic artifacts are evidence cells with provenance, not theorem authority. That is why the component accepts digest-coupled diagnostic refs and negative cases while rejecting proof bodies, model-output data bodies, and stale source-coupling overclaims.
This module can claim reader wiring for the proof-diagnostic evidence membrane: the component and mechanism subject resolve, and the runtime source locus is named. It cannot claim Lean or Lake execution, formal proof authority, formal-result correctness, provider authority, runtime correctness of the imported systems, source-file changes, launch-scope decision, publishing-scope decision, hosted deployment, or whole-system correctness.
Diagnostic result records, copied runtime artifacts from the formal-math evaluation pipeline, copied public component source, source digests, and focused tests can support only bounded evidence-accounting claims: which public refs, manifests, negative cases, and body-hygiene checks were validated. A diagram view and atlas entry are generated for this module; they do not convert diagnostics into formal-result correctness or provider/publishing-scope decision.
Scope boundary
This module documents diagnostic result record anchors over real system from the formal-math evaluation and premise-retrieval pipeline, and keeps forbidden proof/provider body cases as regression-only guards. It does not run Lean, use external model services, expose proof bodies, prove runtime correctness, certify public launch operations, authorize public sharing or recipient work, establish secret export, or claim whole-system correctness.
Source and projection details
Source-Open Body Floor
The public bundle carries two bounded body floors. The runtime-artifact floor copies thirteen diagnostic artifacts from the formal-math evaluation and premise-retrieval pipeline under examples/proof_diagnostic_evidence_spine/exported_evidence_bundle/source_artifacts and records their source/target digest coupling in bundle_manifest.json. Three rows are source-faithful public-light edits that redact operator absolute paths and retain both source and target digests.
The component-source floor copies the public source body for src/microcosm_core/organs/proof_diagnostic_evidence_spine.py under source_body_floor/source_modules. Generated state/runs JSON artifacts are evidence bodies, not source-body authority. Neither body floor places body text in result records or workingness cards, and neither imports proof bodies, model-output data bodies, account or browser state, browser UI live access, recipient-send state, account secrets, private proof bodies, or oracle-needed premise ids.
Proof-Derived Governed Mutation AuthorizationThe public proof-derived governed-mutation fixture validates synthetic mutation proposals through proof cells, visible pre-execution policy verdicts, logged side effects, rollback result records, cold replay, negative cases, and copied source internal control bodies without granting live mutation authority.
Proof-Derived Governed Mutation Authorization is the public mutation-authority replay contract. It checks three synthetic proposals, proof-cell validator refs, visible policy verdicts, side-effect logs, rollback result records, cold replay, eight negative cases, and six exact copied source pattern/result record/internal control bodies while keeping account secrets, proof bodies, model-output data, account refs, source-file changes, and launch-scope decision out of result records.
Scope limit Public synthetic governed-mutation result records and copied source body digest evidence only; no standing account secret authority, live cloud/account mutation, source-file changes, external model access, hidden vote, policy-after-execution authority, benchmark security claim, launch-scope decision, public sharing, hosting, or whole-system correctness.
proof_derived_governed_mutation_authorization is the public mutation-authority replay component for showing that a mutation proposal cannot grant itself authority. It validates a synthetic governed-mutation bundle where read-only inspection, scoped config write, and rollback proposals are admitted only when proof cells, visible pre-execution policy verdicts, side-effect logs, rollback result records, cold replay, negative cases, non-public-state scan, and scope limits line up.
This module is source-backed public doctrine, not the source of authority. The source rows are the JSON bundle, mechanism registry row, component atlas binding, standard contract, fixture, exported bundle, component source module, and result records named below. Markdown remains an authored projection over those rows.
Purpose
The component answers one question: can a mutation proposal acquire the authority to change something just by asserting that it should? In an agent system the danger is an action that grants itself permission, for example by claiming a standing account secret, by recording a governance-vote nobody can see, or by reporting success after the fact. This fixture is the boundary that refuses each of those moves.
Authorisation here is derived, not asserted. A proposal is admitted only when an independent chain resolves: redacted proof cells that name validator result records, at least two visible policy verdicts evaluated before any execution identity is minted, a logged side-effect diff for write and rollback proposals, a paired rollback result record, and a cold-replay result record. The validator recomputes an evidence-chain hash from those resolved rows and rejects the proposal if the declared hash does not match. Impressive language, an admin-looking identity, or a final answer that says it worked all fail on their own.
The less obvious part is the anti-bake gate. Passing the synthetic chain is not enough: every authorised proposal must also bind to a real repository record, a concrete git commit that the validator resolves with a git subprocess and checks touched this component's own source or its focused test. The validator then re-derives the proof, policy, and rollback refs from the evidence indices and compares them to what the record declares. A fixture cannot pre-bake its answer, because the answer is reconstructed from real commit scope and the resolved rows rather than read from the file. The fixture admits exactly three synthetic proposals (read-only inspection, scoped config write, rollback) and rejects eight named overclaims; none of this grants any live mutation authority.
Shape
Subject: proof_derived_governed_mutation_authorization, with mechanism mechanism.proof_derived_governed_mutation_authorization.validates_synthetic_governed_mutation_authorization.
Runtime locus: src/microcosm_core/organs/proof_derived_governed_mutation_authorization.py, especially run, run_authorization_bundle, validate_mutation_proposals, validate_proof_evidence_cells, validate_policy_verdicts, validate_side_effect_ledger, validate_rollback_receipts, validate_cold_replay, _source_module_manifest_result, _source_open_body_import_summary, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING.
The positive fixture admits exactly three synthetic proposals: read-only inspection, scoped config write, and rollback.
Every admitted proposal must bind intent bundle refs, proof-cell validator result records, visible pre-execution policy verdicts, ephemeral execution identity refs, an evidence-chain hash, cold replay refs, and an scope limit.
Write and rollback proposals also need logged side-effect diff refs and a paired rollback result record before authorization.
The exported bundle imports six copied source bodies through source_module_manifest.json and validates them by exact-copy digest evidence without exporting source body text in result records.
flowchart TD Proposals["mutation_proposals.json 3 synthetic proposals: read-only, scoped write, rollback"] subgraph Evidence["Resolved evidence chain"] ProofCells["proof_evidence_cells.json validator-backed proof refs"] Policies["policy_verdicts.json 2+ visible verdicts before execution identity"] Effects["side_effect_ledger.json logged diff for write / rollback"] Rollbacks["rollback_receipts.json paired rollback result record"] Replay["cold_replay.json cold rerun per proposal"] end Hash{"Recompute evidence-chain hash declared == derived?"} Records["governed_mutation_records.json real repo record + git commit ref"] AntiBake{"Anti-bake gate git commit touched this source/test? re-derived refs match declared?"} SourceManifest["source_module_manifest.json 6 copied source bodies verified by digest"] Negatives["8 negative cases standing account secret, hidden vote, policy-after-execution, ..."] Result records["metadata-only result records result, board, validation, sign-off"] Ceiling["scope limit no account secrets, live mutation, provider, source-file changes, hosting, public sharing, or launch"] Proposals --> Evidence Evidence --> Hash Hash -->|match| AntiBake Hash -->|mismatch| Negatives Records --> AntiBake AntiBake -->|real record bound| Result records AntiBake -->|unbound or baked| Negatives SourceManifest --> Result records Negatives --> Result records Result records --> Ceiling
How it works
Take the scoped config write proposal. To be admitted it must carry the fourteen required fields, including proof_cell_refs, policy_verdict_refs, policy_evaluated_before_execution, side_effect_class, evidence_chain_hash, and cold_replay_ref. The validator then checks each one against the other input files rather than trusting the proposal's own summary.
For the proof refs it confirms each cell names the same proposal, carries evidence refs and validator-result record refs, is body-redacted, and does not export a proof body. For the policy refs it counts how many verdicts are visible to the result record, are not hidden votes, read allow or warn, and resolve back to a proof cell for that proposal. Fewer than two visible resolving verdicts blocks the proposal under GOV_MUT_CONSENSUS_WITHOUT_EVIDENCE. Because a scoped write has a reversible side effect, it also needs a logged diff ref in the side-effect ledger and a passing rollback result record for the same proposal. A write or rollback proposal with no rollback ref is rejected as an irreversible mutation.
The validator then recomputes the evidence-chain hash. It hashes the resolved proof digests, policy digests, side-effect ref, rollback ref, and cold-replay ref together and compares the result to the proposal's declared evidence_chain_hash. A mismatch fails the proposal, so the hash cannot be a hand-written constant. Only after the synthetic chain resolves does the real-record gate run. The governed-mutation record must declare a repo record class, a forty-character-or-shorter hex commit ref, and source refs covering git, mission-transaction, work-landing, and ledger material. The validator shells out to git to confirm the commit exists and that its changed files include this component's source module or its focused test, and it re-derives the proof, policy, and rollback refs from the indices so the record's claims must match independently computed values. An authorised proposal whose proposal id is not in the accepted real-record set is downgraded to blocked. The result is that a green result record requires three synthetic proposals, three real records bound to real commits, and a matching anti-bake status, none of which a static fixture can fake.
Public Contract
The source pattern is proof_derived_governed_mutation_authorization_compound.
The fixture lives at fixtures/first_wave/proof_derived_governed_mutation_authorization/input/.
The runtime example lives at examples/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle/.
The validator is microcosm_core.organs.proof_derived_governed_mutation_authorization.
The governing standard is standards/std_microcosm_proof_derived_governed_mutation_authorization.json.
The component model row is core/organ_atlas.json#proof_derived_governed_mutation_authorization.
The sign-off row is core/organ_registry.json#proof_derived_governed_mutation_authorization.
The fixture has three positive proposals: read-only inspection, scoped config write, and rollback. Every admitted proposal must cite an intent bundle, scope limit, proof cell, visible policy verdicts, ephemeral execution identity, evidence-chain hash, and cold replay ref. Write and rollback proposals also require logged side-effect diff refs and a verified rollback result record paired before the mutation is admitted.
Source-Backed Mechanism
The mechanism row mechanism.proof_derived_governed_mutation_authorization.validates_synthetic_governed_mutation_authorization points at these runnable source loci:
run and run_authorization_bundle for fixture and exported-bundle entry.
validate_mutation_proposals, validate_proof_evidence_cells, validate_policy_verdicts, validate_side_effect_ledger, validate_rollback_receipts, and validate_cold_replay for the authorization predicate.
_source_module_manifest_result and _source_open_body_import_summary for digest-verified copied source-body evidence without body text in result records.
EXPECTED_NEGATIVE_CASES and AUTHORITY_CEILING for falsification and scope boundary enforcement.
The exported governed-mutation bundle imports six source bodies through examples/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle/source_module_manifest.json. Those bodies are copied into source_modules/ with digest provenance:
Result records may report module ids, refs, counts, classes, hashes, and verdicts. They may not duplicate source body text, proof bodies, governance-vote bodies, model-output data, account secrets, account refs, or live access material.
Reader Evidence Routing
Open standards/std_microcosm_proof_derived_governed_mutation_authorization.json for required witnesses, negative-floor classes, denied authority, result record expectations, validator contract, and source refs.
Open core/fixture_manifests/proof_derived_governed_mutation_authorization.fixture_manifest.json for positive fixture inputs, eight negative fixtures, body-import summary, durable result record refs, and source-open omission rules.
Open examples/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle/source_module_manifest.json before inspecting copied source modules; result records carry refs, hashes, counts, and verdicts, not copied source body text.
Open tests/test_proof_derived_governed_mutation_authorization.py for the focused assertions on proposal counts, negative cases, source-module digest mismatch, public-relative redaction, and card result record reuse.
Run the fixture or exported-bundle route from microcosm-substrate/. The CLI supports --card, but it does not expose a --json flag.
Use scripts/build_doctrine_projection.py --check-paper-module-corpus to verify this Markdown projection still satisfies the shared paper-module coverage contract.
First Commands
From microcosm-substrate/, a cold agent can refresh the fixture result records with:
The exported bundle validator proves the copied source-body floor without writing durable result records:
result records/sign-off/first_wave/proof_derived_governed_mutation_authorization_fixture_acceptance.json
Current result record evidence records three proposals, three authorized synthetic mutations, three proof cells, six visible policy verdicts, two logged side effects, two rollback passes, three cold replay passes, no missing negative cases, private_state_scan.status=pass, and body_in_receipt=false for copied source source modules.
Negative Cases
The fixture rejects the eight named negative cases in core/fixture_manifests/proof_derived_governed_mutation_authorization.fixture_manifest.json: standing account secret authority, policy-after-execution, hidden governance-vote, live cloud account secret, irreversible mutation, unlogged side effect, consensus without evidence, and final-answer-only success.
These negative fixtures are the security argument. A proposal with impressive language, an admin-looking identity, hidden votes, post-hoc approvals, or a final answer that says it succeeded still fails unless the public evidence tables resolve to the authorization predicate.
Prior Art Grounding
The governed-mutation shape is grounded in admission-control and policy-as-code practice: a proposed state change is evaluated before it mutates the system, and the decision is separate from the actor's own assertion. The closest public anchors are Open Policy Agent, which separates policy decision-making from enforcement over structured input, and Kubernetes admission controllers, which validate or mutate API requests before persistence.
The rollback and side-effect portions are also adjacent to controlled rollout practice, including feature-flag and canary-launch patterns described by Martin Fowler. Microcosm keeps the pattern synthetic and replay-only: the component validates visible policy verdicts, side-effect logs, rollback result records, and cold replay without granting live mutation authority.
Public Scope
This component is a synthetic, public, source-open replay. It validates fixture and exported-bundle result records plus copied source bodies with digest provenance. The replay stays inside local files and does not use standing account secrets, access live cloud or account systems, use external model services, change source files, expose private proofs, expose policy-vote bodies, or claim benchmark safety.
This paper module can claim backed reader wiring for the synthetic governed-mutation replay: component and mechanism subjects resolve, the runtime source locus is named, and diagram and atlas views are generated for this module. It cannot claim live mutation authority, standing account secrets, cloud or account access, irreversible approval, source-file changes permission, provider authority, proof-body export, benchmark safety, launch-scope decision, hosted deployment, publishing-scope decision, or whole-system correctness.
Fixture result records, exported-bundle result records, focused tests, and source-copy digests can support only the bounded replay claim: synthetic proposal admission, proof-cell refs, visible policy verdicts, side-effect logs, rollback result records, cold replay refs, negative cases, and body-hygiene behavior. The diagram and atlas views are navigation aids derived from the module bindings; they do not expand the proof boundary.
Durable Agent Work-Landing ReplayThe public durable work-landing replay fixture validates recorded agent landing rows, copied source internal control source bodies, validation-before-commit ordering, HEAD-advance evidence, blocker capture, and work log completion without performing live Git work.
Durable Agent Work-Landing Replay is the public work-spine replay contract for agent landing claims. It checks claimed paths, owner-native validation refs, commit-attempt order, HEAD-before/after evidence, metadata-blocked rows, work log finalizer evidence, nine negative cases, and six exact copied source source bodies while keeping raw diffs, non-public paths, model-output data, and source bodies out of result records.
Scope limit Public synthetic replay result records and copied source work-landing/internal control source bodies only; no live Git mutation, unrelated staging, broad checkpoint authority, arbitrary commit proof, external model access, non-public body export, public sharing, hosting, launch-scope decision, or whole-system correctness.
Durable agent work-landing replay is the public work-spine component for showing how Microcosm treats agent work as a transaction instead of a chat claim. It binds owned-path claims, owner-native validation, scoped commit attempts, protected Git-metadata blockers, work log capture, work log finalizers, and seed reentry into a source-available replay contract.
The component is useful to a cold agent because it turns a landing claim into an evidence checklist: a row is not "landed" unless claimed paths, validation refs, commit-attempt refs, HEAD-before/after evidence, blocker capture, and ledger completion all line up in the recorded replay. It validates the replay contract and the negative fixtures. It does not perform the live landing itself.
Purpose
This component exists because an agent saying "I committed the fix" is cheap, and the claim is the part that tends to be wrong. The single question it answers is narrow: given a recorded landing attempt, does the evidence actually support the words used to describe it?
The approach worth noticing is that two ordinary-sounding rules are made into rejections rather than suggestions. A row that uses landed-commit language is rejected unless the recorded Git HEAD moved between before and after, so "I landed it" cannot stand on a HEAD that never advanced. A row on the commit path is rejected unless validation is recorded as preceding the commit attempt, so "it passed" cannot be back-filled after the fact. Those two checks, plus blocker capture for metadata-blocked rows and work log completion for every row, are what separate a transaction from a chat claim.
The replay is also source-backed rather than described from memory. The mechanics it checks rows against are not paraphrased; the actual source internal control files (work landing, mission preflight, scoped commit, the work log) are copied into the bundle by digest, so a reader can see which code the model was tested against. The component reads that evidence and rejects overclaims; it never runs Git, stages anything, or authorises a launch.
Shape
Source refs
Validator
durable_agent_work_landing_replay validator
Diagram source
flowchart LR Fixture["Public replay fixture claimed rows, validation refs, commit attempts, blocker rows"] Source["Copied internal control source bodies work landing, preflight, scoped commit, work log"] Validator["durable_agent_work_landing_replay validator"] Mechanics["Replay mechanics claim before mutation, validate before commit, HEAD movement before landed language"] Negative["Negative floor live Git authority, missing completion, uncaptured blocker, private leakage"] Result record["Result records board, result, validation, sign-off; no live mutation authority"] Fixture --> Validator Source --> Validator Validator --> Mechanics Validator --> Negative Mechanics --> Result record Negative --> Result record
Public Contract
The source pattern is durable_agent_work_landing_replay_compound.
The fixture lives at fixtures/first_wave/durable_agent_work_landing_replay/input/.
The runtime example lives at examples/durable_agent_work_landing_replay/exported_work_landing_replay_bundle/.
The validator is microcosm_core.organs.durable_agent_work_landing_replay.
The CLI command is microcosm durable-agent-work-landing-replay run-work-landing-bundle.
The governing standard is standards/std_microcosm_durable_agent_work_landing_replay.json.
The component model row is core/organ_atlas.json#durable_agent_work_landing_replay.
The sign-off row is core/organ_registry.json#durable_agent_work_landing_replay.
Technical Mechanism
The replay fixture imports six source internal control bodies through examples/durable_agent_work_landing_replay/exported_work_landing_replay_bundle/source_module_manifest.json. Those bodies are copied into source_modules/ with digest provenance instead of being summarized from memory:
The validator checks the replay rows against those source-backed mechanics rather than accepting a prose landing claim. validate_projection_protocol requires source pattern refs, projection result record refs, and public runtime refs. validate_landing_policy requires the scoped-commit, broad-checkpoint, metadata-blocked patch-bundle, and hard-stop lanes, with broad checkpointing kept behind explicit operator authorization and launch-scope decision kept false. validate_work_landing_runs enforces claim-before-mutation evidence, validation before commit attempt, HEAD movement before landed language, blocker capture before metadata-blocked completion, dirty-tree boundary evidence, and work log finalizer evidence.
The source-open body floor is enforced separately by validate_source_module_imports. The manifest must declare copied_non_secret_macro_body, body_in_receipt: false, exact-copy source-to-target relations, allowed public source material classes, expected digests, and required anchors inside each copied source body. That check keeps the reader claim tied to actual source internal control files while result records carry only refs, digests, counts, and verdicts.
The result builder merges projection-protocol, landing-policy, work-run, source-module, source-open-body, and secret-exclusion checks into one metadata-only result record set. The board result record records three claimed-path rows, two validation-before-commit mechanics, one metadata-blocked row, one landed-commit row, nine observed negative cases for the first-wave fixture, and zero authority for live Git mutation or launch.
Prior Art Grounding
This component is grounded in provenance and software supply-chain integrity patterns. The W3C PROV family provides a general model for entities, activities, and agents involved in producing an artifact. SLSA brings a similar concern to software builds: source, build process, provenance, and artifact integrity are tracked so consumers can reason about where an artifact came from and how it was produced.
Microcosm borrows that provenance posture for agent work landing: claimed paths, validation refs, commit attempts, HEAD-before/after evidence, blocker capture, Task/work log completion, and seed reentry are separate evidence fields. It does not perform a live Git landing or prove arbitrary commits outside the replay.
Reader Evidence Routing
Read the replay as an evidence-accounting component, not as a live landing controller. The board result record is the primary reader surface: it shows which claimed-path rows carried validation evidence, which rows were blocked by Git-metadata or dirty-tree constraints, and which rows had enough HEAD before/after evidence to use landed language.
Read the source-module manifest as provenance evidence for the imported control plane, not as a permission slip to mutate those source files. The manifest binds the copied bodies by digest and line count so a cold agent can see which mechanics the replay model was checked against.
Read negative cases as the authority floor. Rows that claim live Git mutation, broad checkpoint authority, missing work log completion, uncaptured blockers, launch-scope decision, or non-public paths/body export are supposed to fail. Passing those refusals is part of the positive claim.
First-wave runtime consumer: microcosm_core.organs.durable_agent_work_landing_replay run consumes the fixture input, writes result, board, validation, and optional sign-off result records, and observes the nine negative cases declared in EXPECTED_NEGATIVE_CASES.
Exported-bundle consumer: microcosm_core.organs.durable_agent_work_landing_replay run-work-landing-bundle consumes the exported bundle without durable result record mutation, validates the source-module manifest, checks copied source-body digests and anchors, and emits the command card path used by runtime-shell demos.
Scope limit consumer: standards/std_microcosm_durable_agent_work_landing_replay.json, the component AUTHORITY_CEILING, and the fixture negative cases keep live Git mutation, broad checkpoint authority, unrelated dirty-path staging, live Task/work log mutation, external model access, source-file changes, public sharing, launch, non-public body export, and whole-system correctness outside this module.
Negative Cases
The fixture rejects the nine named negative cases in core/fixture_manifests/durable_agent_work_landing_replay.fixture_manifest.json: missing validation evidence, validation recorded after a commit attempt, missing recorded completion, commit-landed language without a HEAD advance, live Git side-effect authority, missing dirty-tree boundary, uncaptured metadata blockers, overbroad distribution claims, and non-public path/body leakage.
This module may claim public replay evidence that claimed-path rows, validation-before-commit rows, HEAD before/after evidence, blocker-capture rows, work log finalizer evidence, copied internal control bodies, source manifests, metadata-only result records, and negative cases support the declared work-landing replay contract. The component, mechanism, code locus, governed concept, and principles are bound in the structured lattice bindings above.
This module may not claim live Git mutation, arbitrary commit-landed truth, live work log mutation, live work log mutation, external model access, broad checkpoint authority, source-file changes, hosted-public posture, launch-scope decision, publishing-scope decision, implementation correctness beyond the listed witnesses, or whole-system correctness.
Scope limit
This component is source-open replay evidence for synthetic result records and copied source bodies with digest provenance. It supports local inspection of recorded work-landing mechanics, while operational distribution and live Git side effects stay outside the public fixture.
Source and projection details
Governing Lattice Relation
The JSON bundle binds this module to mechanism mechanism.durable_agent_work_landing_replay.validates_public_work_landing_replay_contract, component durable_agent_work_landing_replay, concept concept.work_landing_and_continuity_control_bundle, principles P-5, P-10, P-14, P-15, and P-16, axioms AX-4 and AX-9, and the runtime code locus src/microcosm_core/organs/durable_agent_work_landing_replay.py. That lattice position makes the module a bounded work-landing accounting replay: it explains how evidence is recorded and rejected, not how to perform live Git mutation.
The concept edge is the scope limit. Broader work-continuity claims must route through sibling modules such as bridge_phase_continuity_runtime and work_landing_control_spine, while live landing behavior remains with the source internal control source files and work log/scoped-commit owner lanes. This module can cite their copied bodies as evidence, but it cannot promote itself into their live authority.
Work Landing Control SpineThe public work-landing control spine validates copied work-landing internal control source bodies without authorizing live Git, ledger, claim-launch, private-index, public sharing, or launch operations.
Work Landing Control Spine is the public source-open import for work-landing internal control bodies. It validates copied work_landing, work_landing_status, mission transaction preflight, landing preflight, and scoped-commit source modules by manifest digest, required anchors, no-live-mutation contract flags, secret-exclusion scan, and metadata-only result record policy while keeping live Git, work log, work log, claim launch, shared-index mutation, private-index execution, external model access, public sharing, and launch-scope decision out of scope.
Scope limit Copied source internal control source bundle, validator result record, source-manifest digest evidence, required-anchor evidence, no-live-mutation contract flags, and secret-exclusion evidence only; no live Git mutation, live work log or work log mutation, claim launch, shared-index mutation, private-index commit execution, broad staging, external model access, account secret export, public sharing, launch-scope decision, or whole-system correctness.
work_landing_control_spine makes the source work-landing control plane inspectable inside Microcosm by copying the command, reconcile, mission preflight, and private-index scoped commit source bodies into a public bundle. The point is not to let the public validator mutate Git or ledgers; it is to expose the real internal control mechanics that govern claims, owned paths, same-path conflicts, expected-parent checks, shared-index quarantine, finalizer ordering, and scoped commit discipline.
Purpose
The source repository this slice comes from is edited by several agents at once, so its hardest engineering problem is mundane: how does one agent land a small, finished commit without colliding with another agent's half-done work in the same files, and without claiming more than it actually did? The control plane that answers this lives in a handful of source modules. This module copies those bodies, non-secret, into a public bundle so a reader can inspect the real mechanics rather than take a description on trust.
The single question it answers is narrow: are these copied internal control bodies the genuine ones, with their load-bearing logic still present, and does the bundle keep within its stated limits? It is a witness over copied source, not a runner. It never touches Git, ledgers, or claims.
Two ideas in the copied source are worth a reader's attention. The first is the scoped commit. scoped_commit.py builds a throwaway Git index from the current HEAD, stages only the exact paths or hunks the agent declares it owns, writes a tree from that private index, and commits it against the captured parent with a compare-and-swap on the branch ref. The shared .git/index is never written. What was a three-step behavioural rule (add exact paths, check nothing else is staged, then commit) becomes an infrastructure invariant: an agent physically cannot sweep up a neighbour's dirty changes, because those changes were never in the index it committed from. The second is ordering. work_landing_status.py fixes a sequence of controller actions and a set of prerequisites, so that claims are only released after the work log session is finalised, and convergence is only recomputed after claims are released. The module checks that these anchors are still present in the copied bodies, which is what separates an honest copy from a stale hash bag.
Shape
Diagram source
flowchart LR A["Copied source source bundle 5 internal control bodies + manifest + contract"] --> B["Manifest digest and line-count check is the copy exact?"] B --> C["Required anchor scan is the load-bearing logic still present?"] C --> D["Runtime no-live-mutation contract are all authority flags false?"] D --> E["Secret-exclusion scan any private/account secret material?"] E --> F["Validation result record refs, hashes, counts, findings"] B --> G["Reject stale or missing source body"] C --> G D --> H["Reject authority overclaim"] E --> I["Reject private payload leakage"]
The shape is a public validation spine for copied work-landing internal control source. It validates exact copied module bodies, required anchors, contract flags, and secret-exclusion posture, then writes a result record that contains metadata, hashes, counts, gates, and findings. It does not execute live Git mutation, mutate work log or work log, launch claims, stage broadly, or run private-index commits.
Technical Mechanism
The validator in src/microcosm_core/macro_tools/work_landing_control_spine.py is a staged copied-source witness, not a live work-landing actuator. It loads bundle_manifest.json, source_module_manifest.json, and work_landing_control_runtime_contract.json; then it checks seven required inputs: five copied source source bodies plus the two manifest/contract JSON files. The source body rows must exist under source_modules/, keep the expected SHA-256 digests and line counts from the bundle manifest, stay inside the allowed material classes, and classify source modules as copied_non_secret_macro_body.
After file parity, the validator scans required anchors for each copied source body: work_landing.py must still expose the parser and admission/begin/status anchors; work_landing_status.py must still expose controller action and reconcile/finalizer models; the mission-transaction preflight wrapper and kernel must still expose owned-path, session-id, shared-index quarantine, and private-index admission anchors; and scoped_commit.py must still carry the private-index scoped-commit and shared-index non-mutation anchors. These anchor checks make the copied bundle more than a hash bag: the result record shows that the specific internal control mechanisms readers care about are still present.
The final gates are authority and payload boundaries. The runtime contract must keep every live-mutation flag false, including live Git mutation, work log mutation, work log mutation, claim launch, shared-index mutation, private-index commit execution, broad staging, external model access, public sharing, and launch-scope decision. The secret-exclusion scan runs over the copied source bodies and manifest/contract inputs, while the output result record records refs, hashes, counts, anchor rows, authority rows, and findings with body_in_receipt: false. Focused tests pin the pass case, the live-mutation overclaim blocker, streaming line-count behavior, source-manifest exact-copy checks, and the CLI smoke path.
The validator checks copied module digests, line counts, required source anchors, the no-live-mutation runtime contract, the originating overclaim Work item reference, and a secret-exclusion scan over the copied bundle. Source bodies live in the bundle; result records carry refs, hashes, counts, gates, and findings.
Governing Standard
standards/std_microcosm_work_landing_control_spine.json owns the result record contract, source refs, allowed public inputs, forbidden private inputs, and scope limit for this import.
This closes the old work_landing_tool_body_import overclaim by adding an exact copied-source bundle beneath the existing public dry-run refactor.
Governing Doctrine Relations
The generated structured source record reports sixteen bundle-derived edges for this page and zero unresolved selective relations. Its subjects bind the paper module to macro_projection_import_protocol and mechanism.macro_projection_import_protocol.validates_public_macro_projection_imports; its code-locus edges bind the reader page to src/microcosm_core/macro_tools/work_landing_control_spine.py and src/microcosm_core/organs/macro_projection_import_protocol.py. The mechanism claim is therefore narrow: Microcosm validates a public source-projection import bundle by copied-source parity, required anchors, no-live-mutation flags, and secret-exclusion evidence.
The concept edges are concept.work_landing_and_continuity_control_bundle and concept.import_projection_and_drift_control_bundle. The governing principles are P-5, P-10, P-14, P-15, and P-16; the governing axioms are AX-4 and AX-9; and the declared paper-module dependencies are paper_module.macro_projection_import_protocol, paper_module.durable_agent_work_landing_replay, and paper_module.mission_transaction_work_spine. Together these relations explain why the validator treats source-copy fidelity, transaction preflight, private-index containment, and no-launch/no-live-mutation ceilings as one control mechanism rather than as separate prose claims.
Reader Evidence Routing
Read a passing control-bundle result record as "the copied source bodies matched the manifest, required anchors were present, the no-live-mutation scope limit held, and no forbidden private material was found." Do not read it as a live landing operation or a proof that future work sessions are safe.
Read digest and line-count failures as stale-copy evidence only.
Read authority-overclaim failures as launch-boundary evidence. A contract that claims live Git, ledger mutation, claim launch, broad staging, private-index commit execution, external model access, or launch-scope decision must stay blocked.
Named Proof Consumers
Bundle validator consumer: PYTHONPATH=src ../repo-python -m microcosm_core.cli work-landing-control-spine validate-control-bundle --input examples/work_landing_control_spine/exported_work_landing_control_bundle --out /tmp/microcosm-work-landing-control-spine/result record consumes the copied source bodies, bundle manifest, source-module manifest, runtime contract, anchor scan, scope limit flags, secret-exclusion scan, and metadata-only result record writer.
Focused regression consumer: PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_work_landing_control_spine.py -q pins the green bundle path, exact-copy source-manifest relation, digest and line-count checks, live-mutation overclaim rejection, result record body omission, streaming line-count behavior, and CLI argument order.
Corpus consumer: PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus verifies that this Markdown remains consistent with the structured source record and bundle-backed corpus. It is a read-only consistency result record; it is not permission to hand-edit generated projections or shared bundle rows.
Prior Art Grounding
The landing spine is grounded in version-control staging, deterministic workflow history, and provenance-control patterns. Git's index separates selected changes from the rest of a dirty worktree, which is the practical ancestor of scoped path ownership. Temporal workflows show the value of recorded event history and deterministic replay for long-running work. Microcosm imports those ideas as a public control spine: preflight, scoped selection, reconcile/finalize checks, and copied source-body digests make work landing auditable without broad staging, private-index leakage, or live mutation from the paper module itself.
The command writes the copied-source validation result record under receipts/first_wave/work_landing_control_spine/, including exported_work_landing_control_bundle_validation_result.json. That result record is the public replay boundary for module digests, required source anchors, secret-exclusion posture, and the no-live-mutation runtime contract.
This result record path is reader-verifiable evidence only. It does not flip Mermaid/Atlas status, create bundle authority, run live Git mutation, mutate work log or work log state, execute private-index commits, launch claims, or aggregate doctrine-lattice coverage.
Scope boundary
Scope limit
This spine validates copied internal control source bodies for public inspection. It excludes live Git mutation, shared-index mutation, work log or work log mutation, claim launch, broad staging, private-index commit execution, external model access, account secret export, public sharing, hosting, or launch-scope decision.
Scope boundary
This spine is local internal control system for inspection and validation. It does not run live Git mutations; mutate work log or work log state; launch claims; stage broadly; execute private-index commits; use external model services; export account secrets, account or browser state, model-output data bodies, or recipient-send state; publish; host; or include launch operations.
Source and projection details
Source-Open Body Floor
The source-open floor for this module is the copied bundle plus the validator that checks it:
That floor lets a reader inspect the public internal control mechanics and replay the digest, anchor, no-live-mutation, and secret-exclusion checks. It is not a license to run live Git commands, mutate ledgers, claim launch-scope decision, or treat copied source bodies as a private-system-equivalent control plane.
Executable Doctrine GrammarThe public executable-doctrine grammar fixture validates public standard rows, paper-module sections, negative cases, copied executable-grammar and standards/type-plane source-module bodies, and metadata-only result records without claiming doctrine completeness.
Executable Doctrine Grammar is the public grammar membrane for doctrine-shaped runtime fixtures. It validates standard row fields, paper-module teleology and result record sections, duplicate-slug and overclaim negative cases, exact copied executable-grammar, standards-registry/type-plane, lattice, kind-atlas, and standards option-surface bodies, then emits metadata-only result records with source refs, digests, counts, and scope limits.
Scope limit Public fixture, exported standards bundle, and copied source specimen/body digest evidence only; no private standards engine export, source doctrine completeness, source notes export, model-output data export, source-file changes, launch-scope decision, publishing-scope decision, private-data equivalence, or whole-system correctness.
Doctrine in most systems is prose convention. A standard says a rule should hold, a paper module says a section should be present, and nothing checks whether the claim is actually true. This component exists to make doctrine shape a thing a program can pass or fail. It answers one question: does a standard row or a paper-module fixture carry the structure that doctrine here requires, or is it just text that looks the part?
What it checks is deliberately structural rather than semantic. A standard row must declare a teleology, a governing standard, result record expectations, and an scope boundary. A paper module must carry the matching sections by heading. The validator does not judge whether the prose is good. It judges whether the load-bearing fields are present, so a row cannot quietly drop its result record expectations or its scope boundary and still pass.
The less obvious part is that the failures are first-class. Five negative cases are part of the contract: a row missing its required fields, a prose-only standard that tries to claim executable authority, a source doctrine body copied into a public fixture, a duplicate standard slug, and a grammar pass that overclaims doctrine completeness. A run that does not observe each of these classes is blocked, so the checker is held to demonstrating that it can reject, not only that it can accept.
The component also imports copied source bodies, but only through a source-module manifest with declared SHA-256 digests, and never inlines a body into a result record. The result record reports refs, hashes, counts, and verdicts; the bodies live in the bundle. The point is to make the doctrine shape checkable without turning the public surface into an export of the private standards engine.
Teleology
executable_doctrine_grammar turns toy public standards and paper-module fixtures into deterministic grammar result records. It makes doctrine-shape claims checkable while importing copied, source bodies only through source-module manifests, digests, and result record boundaries.
flowchart TD A["Public doctrine fixtures fixtures/first_wave/executable_doctrine_grammar/input"] --> B["Executable grammar validator src/microcosm_core/components/executable_doctrine_grammar.py"] C["Exported standards bundle examples/executable_doctrine_grammar/exported_standards_bundle"] --> B D["Source-module manifest examples/executable_doctrine_grammar/exported_executable_grammar_metabolism_bundle/source_module_manifest.json"] --> B B --> E["metadata-only deterministic result records result records/first_wave/executable_doctrine_grammar/"] E --> F["Bundle and atlas evidence core/paper_module_capsules.json::paper_modules[18]"] F --> G["Bounded reader claim doctrine-shape validation, not launch-scope decision"]
Reader Evidence Routing
Reader evidence routes through the executable-grammar runtime, fixture inputs, exported standards bundle, executable-grammar metabolism bundle, source-module manifests, public result records, and focused tests. The Mermaid diagram and Atlas card are generated navigation projections; this page is the cold-reader explanation of the proof boundary.
Public Contract
The validator checks standard slugs, teleology, governing standard refs, result record expectations, scope boundaries, paper-module sections, source-body sentinels, duplicate slug conflicts, prose-only authority claims, and doctrine-completeness overclaims. It also validates the imported public executable-grammar specimen, standards registry, standards type-plane, lattice registry, kind-atlas runtime, and standards option-surface runtime as exact copied source modules.
Prior Art Grounding
This component is grounded in schema validation, parser generators, and executable semantics traditions. JSON Schema anchors the idea that document shape can be validated by a shared machine contract, Tree-sitter shows the practical value of generated grammars for inspectable source structure, and the K framework is a close reference point for turning semantic rules into executable artifacts.
Microcosm borrows the executable-contract pattern: doctrine shape, result record expectations, duplicate slugs, imported source bodies, and scope boundaries are checked by a validator instead of left as prose convention. It does not claim source doctrine completeness or launch-scope decision.
First Commands
From microcosm-substrate/, a cold agent can prove the fixture path:
This module documents a public grammar fixture plus exact source body imports. It does not claim source doctrine completeness, public launch-scope decision, hosted-public posture, public sharing, recipient work, external model access, private-data equivalence, or whole-system correctness.
Scope limit
This paper module can claim an executable-doctrine grammar fixture with a generated diagram view and an Atlas card. It can explain the public grammar specimen, exact source body imports, and metadata-only result record boundary.
It cannot claim source doctrine completeness, public launch-scope decision, hosted-public posture, publishing-scope decision, recipient execution, external model access, private-data equivalence, source-file changes, launch-scope decision, or whole-system correctness. Higher claims must land in the JSON bundle and generated projection before Markdown can narrate them.
Source and projection details
Source-Open Body Floor
examples/executable_doctrine_grammar/exported_executable_grammar_metabolism_bundle/source_module_manifest.json declares 12 copied source bodies. Result records may report refs, hashes, counts, classes, and verdicts, but body_in_receipt=false remains required.
The body-material classes are public_macro_receipt_body, public_macro_standard_body, and public_macro_tool_body. The body set covers the executable-grammar specimen README, board, and result record; standards registry and group-index standards; standard type-plane and core authority index; lattice registry and standard; and the kind-atlas / standards option-surface runtime tools.
Source Projection Import ProtocolThe public source-projection import protocol validates classified source-to-Microcosm projection cells, per-slice source-module manifests, digest relations, omission result records, intake statuses, and scope limits without claiming source or launch-scope decision.
Source Projection Import Protocol is the public import membrane for source-backed Microcosm growth. It validates fixture and exported projection bundles by checking source refs, public target refs, content digests, source-to-target relations, per-slice source-module manifests, validation refs, omitted-material result records, metadata-only result record policy, secret-exclusion scans, projection cell statuses, and negative cases while keeping true private bodies, model-output data, launch material, and static count claims out of public authority.
Scope limit Verified source body imports, fixture result records, exported projection-bundle result records, per-slice manifest refs, and metadata-only result record fields only; no live source source authority, private-system equivalence, launch, public sharing, hosted deployment, recipient work, provider or Lean/Lake execution, secret/non-public body export, source-file changes, or whole-system correctness.
macro_projection_import_protocol is the source-available membrane for bringing source system into Microcosm. It exists because Microcosm should be dense and alive without becoming a dump of private source bodies, operator context, model-output data, or launch material.
The component validates a projection packet with four public claims:
source bodies are copied or source-faithfully refactored only when the target file, digest, provenance, validation refs, and metadata-only result record contract verify;
private material is omitted with explicit omission result records;
public runtime refs are fixtures, standards, paper modules, exported bundles, copied body targets, and result record refs;
authority stays capped below launch, public sharing, private-system equivalence, and live source source authority.
Purpose
Microcosm grows by copying real material out of a much larger private codebase. The danger in that move is obvious: a dense public copy is exactly the kind of artefact that quietly carries a secret, an operator conversation, a model-output data, or launch material along with the genuinely useful code. This component exists to answer one question for every copied slice: was this body allowed out, and is the public copy honestly tied to the source it claims to come from?
The answer is an accounting check, not a trust statement. Each copied row declares its source ref, its public target ref, a content digest, and a material class. The protocol sorts that class into one of two sets. Five classes are source bodies (pattern, standard, tool, result record, proof) and may be copied with provenance. Nine classes are forbidden outright (source note, operator thread, model-output data, account secret, secret, recipient packet, launch packet, and the like) and can never appear as an imported body. Anything claiming to be must also carry a verification record naming the digest, the source-to-target relation, and the command or test that consumes the copy.
The unusual part is how the protocol treats a copy whose source has since changed. For an exact-copy row it re-hashes the live source file on disk and compares it against the digest recorded at import time. A mismatch is not reported as a failed import. It is recorded as live source drift: the original copy was still honest, the source has simply moved on, and the row is flagged for the refresh actuator rather than failed. The protocol deliberately separates a dishonest import from a stale one. That keeps the public copy faithful without forcing it to track every upstream edit in lock-step, and it stops a routine upstream change from being mistaken for a broken proof.
What the check does not do is just as load-bearing. A passing scan proves that the named slice omitted the forbidden material classes and kept result record bodies out of the result record. It does not establish the public copy is complete, equivalent to the private root, or ready to launch. The import is evidence about provenance and boundaries, never a launch decision.
Shape
The protocol is the membrane between source source and public Microcosm evidence. It reads projection cells, classifies the requested import, verifies source/target refs and digest relations, applies the secret-exclusion boundary, and emits metadata-only result records that a public reader can replay without gaining live source authority.
Its shape is deliberately two-level:
fixture and exported-bundle commands validate whole projection packets, negative cases, omitted-material result records, and the intake/status board;
source-module manifests bind each imported slice to source refs, target refs, digest relation, body-import class, validation refs, and scope limits.
That split keeps the component usable as a source-open body floor while preventing the paper module from becoming a static copy-count ledger. Counts, status totals, and current body-import floors live in result records and runtime status surfaces.
Runtime Shape
Run the fixture:
PYTHONPATH=src python3 -m microcosm_core.organs.macro_projection_import_protocol run --input fixtures/first_wave/macro_projection_import_protocol/input --out receipts/first_wave/macro_projection_import_protocol
Preview the next import slice without writing result records:
PYTHONPATH=src python3 -m microcosm_core.organs.macro_projection_import_protocol plan --input examples/macro_projection_import_protocol/exported_projection_import_bundle
The public CLI also exposes the same validator through:
The plan action emits macro_projection_import_intake_preview_v1. It does not write result records. It scores each proposed projection cell before import: source refs, public target refs, validation refs, selected pattern ids, copy policy, scope limit, omitted material, secret-exclusion scan count, verified body-import status, and ready/blocked status.
Exact-copy is a relation, not the whole protocol. Rows declared as exact-copy prove byte-identical source and target digests and may be maintained by the exact-copy refresh actuator. Rows declared as source-faithful public edits or refactors prove the source source digest and the improved public target digest separately, cite the rewrite or symbol mapping, and are maintained by their own validator/test lane. This is the lane for public-safety redaction, dependency trimming, Microcosm-standard compliance, or runnable local cleanup.
It also self-hosts the intake cell state machine. Every projection cell carries projection_status, cell_state, action_required, status reason, landed evidence refs, and a next runtime surface. The board totals those fields as status counts plus an open-actionable count so future passes can distinguish a ready but unlanded cell from a verified public runtime import, self-hosted protocol, or runtime bridge that is already consumed.
microcosm intake is the runtime bridge over that plan. It writes receipts/runtime_shell/intake_bridge/runtime_reveal_import_bridge.json, links the projection cells to the spine and reveal commands, and projects the same statuses into the first-run bridge. Current landed statuses are: public_runtime_import_landed for formal_math_readiness_extensions, self_hosted_status_protocol_landed for projection_protocol_self_host, and runtime_bridge_landed for runtime_reveal_import_bridge. These statuses do not raise authority above public metadata, fixture shape, and result record refs.
microcosm status and microcosm spine also expose the computed macro_body_import_floor. Treat that value as a result record-backed floor, not a stable prose constant: the current authority lives in result records/sign-off/first_wave/macro_projection_import_protocol_fixture_acceptance.json and the first-wave runtime result records under receipts/first_wave/macro_projection_import_protocol/. Cold readers should inspect public_safe_body_import_count, public_safe_body_import_status, projection_status_counts, open_actionable_cell_count, and secret_exclusion_scan there instead of trusting an old markdown count. The floor is still not a launch signal or private-system equivalence claim.
Trace-Bundle Source-Body Import
The trace-bundle slice is the current proof-grade example of a source-body import. Its source-module manifest is examples/macro_projection_import_protocol/exported_projection_import_bundle/trace_capsule_source_module_manifest.json; the projection cell is trace_capsule_prompt_edit_capture_source_modules_import. The cell imports four source source bodies into the bundle:
The manifest is the body-floor result record for this slice. It records module_count: 4, body_copied: true, body_in_receipt: false, sha256_match: true, line counts, byte counts, required anchors, source refs, target refs, and the shared copied_non_secret_macro_body classification. That means the public bundle carries the source bodies, while runtime result records carry paths, hashes, counts, anchors, and validation refs without duplicating the bodies.
Diagram source
flowchart TD A["Copied material row source ref, target ref, digest, material class"] --> B{"Material class?"} B -- "forbidden class (secret, account secret, source note, operator, provider, launch)" --> R["Reject: forbidden body import"] B -- "class (pattern, standard, tool, result record, proof)" --> C{"Verification record present and target digest bound?"} C -- "no" --> R2["Reject: unverified import"] C -- "yes, exact copy" --> D["Re-hash live source source on disk"] D --> E{"Source digest still matches?"} E -- "yes" --> F["body floor"] E -- "no" --> G["Flag live source drift (honest copy, refresh later)"] G --> F F --> H["Per-slice manifest + metadata-only result record"] H --> I["Reader projection"] H -. does not grant .-> J["live source authority, public sharing, launch, or source-file changes"]
The imported Python side supplies the trace-bundle runtime surface: cli_prompt_trace.py reads selected source files, rejects binary paths, supports line-range and symbol selection, redacts selected excerpt text, and emits numbered source lines with schema metadata. Its companion test module proves terminal validation semantics, repeated prompt interning, source excerpt priority, and completion-report behavior. The imported JavaScript side supplies the Agent Trace Structurer surface: parser.mjs preserves source_text as the exact copied string, treats source_lines and indexes as deterministic navigation projections, and builds lossless attachment clips where exact text is reconstructed from source_segments[].text. parser.test.mjs proves embedded file artifact indexing, Codex trace shape, final-message extraction, AIW thread classification, and bounded export behavior.
This is a mechanism/evidence claim, not a launch claim. The slice proves that these four named, source bodies were imported into the public bundle with manifest-backed digest and anchor checks, and that the parser and trace bundle behavior have public fixture coverage. It does not establish that live provider logs, browser UI state, account or browser state, account secrets, raw operator thread bodies, recipient-send material, or future trace-bundle bodies are or exported.
Those artifacts are the source-open floor. The result record bodies stay metadata-only, and private source note, operator thread content, model-output data bodies, account secrets, account or browser state, and launch or recipient material remain outside the public bundle.
Evidence Binding
The component's current public authority is the accepted component row in core/organ_registry.json plus the sign-off result record result records/sign-off/first_wave/macro_projection_import_protocol_fixture_acceptance.json. The JSON paper-module bundle is core/paper_module_capsules.json#paper_module.macro_projection_import_protocol, and the resolved mechanism row is core/mechanism_sources.json#mechanism.macro_projection_import_protocol.validates_public_macro_projection_imports. The runtime source locus is src/microcosm_core/organs/macro_projection_import_protocol.py, with focused regression coverage in tests/test_macro_projection_import_protocol.py.
The exported bundle does not have a single catch-all source-module manifest. It carries one *_source_module_manifest.json file per imported slice under examples/macro_projection_import_protocol/exported_projection_import_bundle/, plus copied targets under that bundle's source_modules/ tree. That per-slice manifest shape is part of the evidence: it lets each imported route, tool, standard, result record, proof, or runtime body keep its own source ref, target ref, digest relation, validation refs, and scope limit.
The first command for the fixture lane is:
PYTHONPATH=src python3 -m microcosm_core.organs.macro_projection_import_protocol run --input fixtures/first_wave/macro_projection_import_protocol/input --out receipts/first_wave/macro_projection_import_protocol
Reader Evidence Routing
Use this order when checking the module:
Read the JSON bundle and standard to confirm the paper-module binding, scope limit, source-module manifest contract, and result record fields.
Run the fixture command to validate projection cells and negative cases against temporary result records.
Run the exported-bundle command to validate the public bundle and copied source-module surfaces.
Inspect the source-module manifests for exact-copy versus source-faithful edit relations before deciding which refresh lane applies.
Run the focused regression and paper-module corpus checks before landing a markdown or manifest update.
If a manifest is dry but a bundle-level validator still fails, check whether a bundle manifest carries its own expected digest or line-count rows. Do not infer that all companion manifest surfaces were refreshed just because an exact-copy source-module dry run is clean.
Prior Art Grounding
The import membrane follows established provenance and software-supply-chain patterns: copied or refactored artifacts need source refs, target refs, digests, validation refs, omission records, and a claim boundary. The closest public anchors are W3C PROV for describing entity/activity/agent provenance, the SLSA specification for artifact integrity and provenance in software supply chains, and in-toto for linking supply-chain steps through signed metadata.
Microcosm applies those patterns to a public/private projection boundary rather than to launch attestation. The per-slice source-module manifests, secret-exclusion scans, metadata-only result records, and omission result records are inspired by that provenance lineage, but they remain a local validator contract for public Microcosm fixtures and exported bundles.
Negative Cases
The validator intentionally rejects:
private body import requests;
omitted source material without omission result record refs;
authority upgrades into live source source authority;
projection cells without validation refs;
launch, public sharing, recipient-work, or secret-export claims.
Validation Result record Path
From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:
These checks validate projection cells, per-slice manifests, omitted-material result records, and metadata-only result record policy only. A diagram view is generated for this module and an atlas card is linked. The checks do not authorize live source source authority, secret export, launch, public sharing, source-file changes, provider or Lean/Lake execution, or whole-system correctness.
Re-enter this module when a new projection cell lands, a source-module manifest is refreshed, or a result record count changes. The repair route is to rerun the component validator, refresh the first-wave and sign-off result records, and update the standard or paper module only where the result record contract changed. Do not raise the scope limit from documentation edits.
Scope boundary
Scope limit
This module can claim that the protocol validates projection cells, per-slice manifests, copied or source-faithful target bodies, omission result records, negative cases, and metadata-only result record policy. It can also claim that accepted result records expose current public_safe_body_import_count, public_safe_body_import_status, projection_status_counts, open_actionable_cell_count, and secret_exclusion_scan fields.
It cannot claim that Microcosm is launch-ready, equivalent to the private root, free of all private material, or authorized to publish. It also cannot raise an exact-copy refresh into permission to rewrite source-faithful public refactors, mutate live source source, use external model services, run Lean/Lake, or export operator/session bodies. Any stronger claim must come from the owning result record, standard, or launch gate.
Scope boundary: metadata, provenance, public runtime refs, copied-body presence, green fixture result records, digest refs, and intake status counts are bounded import evidence only. They are not launch-scope decision, publishing-scope decision, private-system equivalence, live source authority, semantic truth, complete secret-scan coverage, external model service, Lean/Lake execution, or whole-system correctness.
Scope limit
This paper module explains a public projection protocol. It excludes launch, hosted deployment, public sharing, recipient work, external model access, Lean/Lake execution, secret export, private source-body export, or whole-system correctness.
Source and projection details
Source-Open Body Floor
Exact-copy rows are refreshed by refresh-exact-copy-source-modules; source-faithful edit rows stay with their own validator/test lane because their target body is intentionally public cleanup, normalization, or path redaction rather than byte identity.
The bundle body floor is never inferred from prose. A reader should inspect:
examples/macro_projection_import_protocol/exported_projection_import_bundle/*_source_module_manifest.json for per-slice source-to-target relations;
the copied targets under examples/macro_projection_import_protocol/exported_projection_import_bundle/source_modules/;
receipts/first_wave/macro_projection_import_protocol/projection_import_intake_board.json for cell state, open actions, and landed evidence refs;
result records/sign-off/first_wave/macro_projection_import_protocol_fixture_acceptance.json for the accepted public authority result record.
Mission Transaction Work SpineThe public mission-transaction fixture validates work-landing, claim, dependency, scoped-commit, checkpoint-lane, result record-drain, completion, and copied control source-module contracts without mutating live ledgers or git.
Mission Transaction Work Spine is the public replay membrane for Microcosm work-landing discipline. It checks fixed Work item, claim, dependency, transaction, result record-drain, completion, scoped mutation, and checkpoint-lane rows; validates copied work log, work log, checkpoint, scoped-commit, and mission-preflight source modules by manifest; and writes metadata-only result records with secret-exclusion and scope limits.
Scope limit Public fixture and exported-bundle result records only; no live work log mutation, live work log mutation, live git mutation, private backup execution, broad checkpoint authorization, launch-scope decision, publishing-scope decision, or whole-system correctness.
This component exists because the riskiest moment in agentic code work is the one that feels safest: the agent runs a few checks, sees no errors, and concludes that its work is finished and committed. Those are different facts. A clean preflight describes the state of the checks. It says nothing about whether a competing claim already owns the same path, whether the branch has moved under the agent, or whether the commit ever actually landed. The single question this module answers is narrow and concrete: what evidence has to hold before a unit of work is allowed to land, and is that evidence checkable rather than asserted?
The interesting design choice is that the module refuses to trust its own declared verdicts. Most fixtures pass when their inputs carry the right labels. It then perturbs the input one field at a time: a same-path claim conflict, a stale expected-parent hash, a checkpoint lane mutated into an unauthorised broad commit. A genuine check has to break under each of those and stay clear under harmless ones, such as a claim on an unrelated path. That asymmetry, not the bare pass, is the claim.
The result is deliberately bounded. A pass means the public fixture, the exported source bodies, and the negative cases together preserve the work-landing contract and that its discriminating tests still discriminate. It does not touch the live work log, the live work log, or Git, and it grants no authority to commit, checkpoint broadly, back up, or launch.
Abstract
mission_transaction_work_spine is the public Microcosm paper module for work-landing discipline. Its telos is to make the boundary between "a check looked clean" and "work is actually allowed to land" inspectable as source, fixture, result record, and test evidence rather than as chat confidence or status arithmetic.
The component validates a fixed public mission-transaction bundle: Work item rows, work log path claims, dependency unlocks, transaction plans, result record drains, completion projections, scoped mutation policy, checkpoint-lane decisions, copied internal control source modules, and metadata-only result records.
The result is intentionally narrow. A pass means the public fixture and exported bundle preserve the mission-transaction contract, its source-open body floor, and its negative cases. It does not mutate work log, work log, or Git; it does not certify arbitrary live completion; and it does not grant broad checkpoint, backup, launch, public sharing, provider, or whole-system authority.
Problem
Agentic code work fails most often at transaction boundaries, not at isolated syntax checks. Common false positives include:
treating a clean preflight as a landed commit;
ignoring a competing work log claim on the same path;
accepting a claim whose expected parent no longer matches the repository;
marking a downstream Work item ready without hard-dependency evidence;
reading a dirty tree as a blocker for scoped commits while allowing broad staging without explicit operator authorization;
writing result records that smuggle private ledger or provider bodies into a public artifact.
The module turns those failures into a deterministic replay. A cold reader can inspect which public rows are projections, which copied source bodies implement the checks, which negative cases must be observed, and which authority claims remain forbidden even when every validator passes.
Shape
Source refs
JSON bundle
paper_module.mission_transaction_work_spine
Component runtime
mission_transaction_work_spine.py
Diagram source
flowchart TD Bundle["JSON bundle paper_module.mission_transaction_work_spine"] Fixture["First-wave fixture Work items, claims, deps, lanes, result records"] Bundle["Exported bundle work log, work log, checkpoint, scoped commit, preflight source bodies"] Snapshot["Real work log session snapshot active claims, heartbeat, source hash"] Runtime["Component runtime mission_transaction_work_spine.py"] R3["R3 replay verdict runtime-derived, not label-derived"] Result records["metadata-only result records refs, hashes, counts, limits"] Ceiling["Scope limit no live ledger, git, launch, or provider authority"] Bundle --> Runtime Fixture --> Runtime Bundle --> Runtime Snapshot --> Runtime Runtime --> R3 Runtime --> Result records R3 --> Ceiling Result records --> Ceiling
This Mermaid diagram is the reader flow. The generated lattice Mermaid remains available_from_capsule_edges, and the generated Atlas card remains linked_from_capsule_edges; both are derived from bundle and doctrine-lattice rows, not from this prose.
Technical Mechanism
The component exposes two validator paths.
The first-wave fixture command validates the local replay fixture and writes the canonical result record set:
That path loads public fixture rows, validates dependency unlocks, claim preflight, scoped result record authority, private-marker rejection, preflight overclaim rejection, checkpoint lane policy, and the real active-claims snapshot.
The exported-bundle command validates source-open import and bundle replay:
That path checks copied work log, work log, checkpoint, scoped-commit, and mission-preflight source modules by manifest, digest, anchor strings, secret-exclusion scan, and body_in_receipt: false. It also requires the real work log snapshot in the mission bundle. Commit da97bc6394 (Require real work log snapshot in mission bundle) landed the snapshot as a required bundle input; later source/test commits recomputed the snapshot verdict and bound the R3 claim to runtime evidence.
Prior Art Grounding
This component is the mission-transaction member of Microcosm's local work-landing family. Its closest sibling is durable_agent_work_landing_replay, which checks recorded landing rows, validation-before-commit ordering, HEAD movement, blocker capture, and work log completion evidence without performing live Git work. mission_transaction_work_spine narrows that pattern to the transaction preflight and work log seed-speed membrane: same-path claim conflicts, expected-parent mismatches, checkpoint-lane selection, dependency unlocks, result record drains, and session finalization posture.
It also supplies a source-import anchor used by adjacent public components such as concurrency_mission_control and macro_projection_import_protocol. Those links are structural evidence routes, not runtime invocation or launch-scope decision. The prior-art claim is therefore local and source-bounded: this paper module inherits the work-landing accounting shape, then tests the particular mission-transaction and work log session-snapshot boundary.
Data And Evidence Contract
The public evidence bundle is composed of source refs, hashes, rows, and result records. The source bodies live only in the exported bundle's source_modules/ tree; result records carry refs, counts, hashes, verdicts, and ceilings, not private or live internal control bodies.
The result record floor includes preflight, dependency blocked, work landing attempt, claim preflight, scoped mutation, checkpoint lane, completion projection, dependency unlock scheduler, reconcile plan, and exported-bundle validation result records. The fields must preserve schema and component ids, validator id, command, status, observed and missing negative cases, error codes, scope boundary, secret-exclusion status, public work-landing status, body-import status, body_in_receipt: false, scope limit, and result record paths.
Discriminating Tests
The positive claim is not "the fixture passes." The positive claim is that the fixture accepts real-good evidence and rejects targeted perturbations.
Real-good case: the real active-claims snapshot passes with R3 public_safe_real_work_ledger_session_snapshot_replay, a state/work_ledger/active_claims_snapshot.json source ref, a matching source hash, a bound session heartbeat, and five source-session claims.
Same-path perturbation: adding a competing claim on the requested path blocks preflight through work_ledger_runtime.active_claim_collisions_for_paths and emits SAME_PATH_CLAIM_CONFLICT.
Parent perturbation: changing the expected parent for a real claim blocks with EXPECTED_PARENT_MISMATCH; changing it back to the current parent clears.
Disjoint perturbation: adding a claim on a disjoint path does not create a collision for the requested path, so the public preflight remains pass.
Landing-row perturbation: mutating the checkpoint lane into an unauthorized broad checkpoint blocks with the checkpoint-lane violation floor.
Private-body perturbation: a fixture row that carries live private work log body material is rejected, while source bodies copied into the public bundle remain outside result records.
Overclaim perturbation: a clean preflight cannot claim that work is already landed.
Dependency perturbations: dangling dependency refs and ready rows with incomplete hard dependencies remain blockers.
Focused regression coverage lives in tests/test_mission_transaction_work_spine.py. The R3 tests assert that the verdict is re-derived from runtime evidence, expected labels are not sufficient, source hashes are bound, mutated or stale snapshots are rejected, clear perturbations move the verdict, and body_in_receipt is false.
Reader Evidence Routing
Read this module as an evidence-accounting paper, not as a live controller.
Open the mechanism row and standard to see the required bundle fields: work items, claim table, dependency graph, transaction plan, result record drain, completion projection, scoped mutation policy, checkpoint lane policy, copied source imports, body import verification, scope boundary, and scope limit.
Inspect the real active-claims snapshot to see the source ref, source hash, snapshot time, source session id, owned paths, checkpoint lane case, runtime session, and metadata-only posture.
Read the focused tests to verify R3 is runtime-derived: same-path conflicts, stale parents, landing-row violations, disjoint paths, and equal-parent mutations are all discriminated.
Treat generated JSON, generated Mermaid, Atlas, public-site docs, and result records as projections or validator outputs.
Limits And Non-Claims
The module's useful claim is compact: public fixture rows, copied control source bodies, a real work log session snapshot replay, discriminating negative cases, metadata-only result records, and focused tests preserve the mission-transaction work spine at R3.
It may not claim live work log authority, live work log authority, live Git mutation, broad checkpoint authorization, private backup execution, current repository completion, source-file changes, provider behavior, browser UI state, launch-scope decision, publishing-scope decision, hosted-product readiness, or whole-system correctness.
Validation Result record Path
For this Markdown-only paper-module update, use non-mutating checks from repo root:
This module may claim that public fixture rows, copied control source bodies, a real work log session snapshot replay, discriminating negative cases, metadata-only result records, and focused tests preserve the mission-transaction work spine at R3. That is a replay and evidence-shape claim.
This module may not claim live work log authority, live work log authority, live Git mutation, broad checkpoint authorization, private backup execution, current repository completion, source-file changes, provider behavior, browser UI state, launch-scope decision, publishing-scope decision, hosted-product readiness, or whole-system correctness.
Formal Math Readiness GateThe public formal-math readiness gate validates declared corpus, tactic, premise, routing, provider-budget, source-module manifest, copied PROVER probe body, and negative-case boundaries without claiming Lean/Lake or proof authority.
Formal Math Readiness Gate is the public readiness membrane before downstream proof work. It checks declared Mathlib and corpus readiness, tactic probe result records, proof-metadata-only premise indexes, target-shape routing, provider context budgets, formal_math_readiness_extensions intake rows, copied PROVER smoke-run artifacts, public component source body imports, digest manifests, secret exclusion, and negative cases, then emits readiness boards rather than theorem evidence.
Scope limit Public readiness metadata, copied PROVER smoke-run readiness/probe artifacts, public component source body floor, fixture result records, and exported-bundle result records only; no Lean/Lake execution, Mathlib availability beyond probe status, formal proof authority, formal-result correctness, external model access, private proof body, oracle premise id, launch-scope decision, publishing-scope decision, or whole-system correctness.
formal_math_readiness_gate is the public runtime cell that turns the formal math slice from a deferred slogan into an executable boundary. It validates synthetic readiness metadata for corpus availability, tactic probes, premise indexes, target-shape routing, and provider context recipes before any future Lean witness can claim authority.
The page should let a cold reader answer one question without rereading the component: what evidence has Microcosm actually validated, and where does that evidence stop?
Purpose
Formal-math tooling fails quietly when a library, tactic, or corpus is assumed present rather than checked. A pipeline that routes a proof to aesop when aesop is not actually available, or that treats a premise index as proof evidence because it happens to carry a proof body, has already lost the boundary between "ready to attempt" and "proven". This component exists to make that boundary explicit before any downstream proof work begins. It answers one question: which declared formal-math inputs are well-formed and honest enough that a later proof witness could safely consume them, and where exactly does that warrant stop?
The mechanism is a deterministic reducer over five public JSON inputs: corpus readiness, tactic-portfolio availability, a premise index, target-shape routing, and provider context recipes. It does not run Lean or Lake. Instead it reads what those inputs declare and refuses the specific ways they can lie. A corpus that claims Mathlib is available without a passing probe is rejected. A tactic marked available without a probe result record is rejected. A premise row carrying a proof_body or oracle_needed_premise_ids field is rejected. A route that admits a tactic the portfolio probe already marked unavailable is rejected. The output is a readiness board, not theorem evidence.
The design choice worth noticing is that the gate proves its own discipline through negative cases. Alongside the positive inputs, the fixture carries five inputs that each commit a known overclaim, and the run passes only when every one of those overclaims is caught and no unexpected finding appears. The gate is therefore not merely asserting "we check Mathlib availability"; it is demonstrating, on each run, that a falsified Mathlib claim is actually refused. A second guard keeps the floor source-open without leaking: copied prover probe bodies are verified by digest through a manifest, while proof bodies, model-output data, and private state stay out of the result records entirely.
Shape
Source refs
reject Mathlib-availability overclaim
validate_corpus_readiness
each available tactic needs a probe result record
validate_tactic_portfolio
reject route admitting an unavailable tactic
validate_target_shape_routing
reject over-budget or proof-body recipe
validate_provider_context_recipes
copied probe bodies, digest-checked
validate_source_module_imports
Diagram source
flowchart TD Inputs["Five public JSON inputs corpus, tactics, premises, routes, provider recipes"] Scan["Secret-exclusion scan zero blocking hits required"] Corpus["validate_corpus_readiness reject Mathlib-availability overclaim"] Tactics["validate_tactic_portfolio each available tactic needs a probe result record"] Premises["validate_premise_index reject proof_body / oracle premise ids"] Routing["validate_target_shape_routing reject route admitting an unavailable tactic"] Provider["validate_provider_context_recipes reject over-budget or proof-body recipe"] SourceFloor["validate_source_module_imports copied probe bodies, digest-checked"] Reconcile["Reconcile findings vs EXPECTED_NEGATIVE_CASES every known overclaim must be caught"] Board["Readiness board + extension board available / blocked capabilities, counts"] Ceiling["Scope limit no Lean/Lake, proof, provider, launch, or private-system authority"] Inputs --> Scan Scan --> Corpus Scan --> Tactics Scan --> Premises Scan --> Provider Tactics -->|unavailable tactic ids| Routing Corpus --> Reconcile Tactics --> Reconcile Premises --> Reconcile Routing --> Reconcile Provider --> Reconcile SourceFloor --> Reconcile Reconcile --> Board Board --> Ceiling
The machine graph remains the generated paper_module.formal_math_readiness_gate.mermaid projection derived from the source record, not from this hand-authored Mermaid block.
Reader Evidence Routing
Read this module in evidence order:
Start at core/paper_module_capsules.json::paper_modules[21:paper_module.formal_math_readiness_gate]. That row names the source authority, subjects, mechanism refs, code locus, Microcosm concept/principle/axiom refs, generated projection statuses, and the bundle scope limit.
Check the generated structured source record paper_modules/formal_math_readiness_gate.json. Its relationships.edges cite the bundle source refs and show the generated Mermaid status, Atlas status, source_authority: json_capsule, and unresolved selective-relation count.
Inspect the runtime locus src/microcosm_core/organs/formal_math_readiness_gate.py, especially run, run_readiness_bundle, validate_source_module_imports, write_receipts, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, and SOURCE_MODULE_MANIFEST_NAME.
Use fixture evidence for the gate behavior: fixtures/first_wave/formal_math_readiness_gate/input, receipts/first_wave/formal_math_readiness_gate/readiness_gate_result.json, formal_math_readiness_board.json, formal_math_readiness_extension_board.json, formal_math_readiness_validation_receipt.json, and result records/sign-off/first_wave/formal_math_readiness_gate_fixture_acceptance.json.
Use exported-bundle evidence for source-open body-floor claims: examples/formal_math_readiness_gate/exported_formal_math_readiness_bundle/source_module_manifest.json, bundle_manifest.json, source_artifacts/, source_body_floor/source_modules/, and receipts/runtime_shell/demo_project/organs/formal_math_readiness_gate/exported_formal_math_readiness_bundle_validation_result.json.
Use tests/test_formal_math_readiness_gate.py for the behavioral result record boundary. The tests cover negative cases, exported bundle sign-off, source-module digest and target-ref mismatch rejection, bounded command-card output, source-body omission from result records, secret-exclusion/public-relative result record paths, and non-writing plan preview.
Do not route a proof claim through this page. It routes readiness evidence, result record integrity, and source-body-floor accounting only.
Technical Mechanism
The runtime is a deterministic readiness reducer over declared public inputs. run() evaluates the first-wave fixture directory with positive and negative JSON cases enabled; run_readiness_bundle() evaluates the exported public bundle without fixture-negative cases and requires the bundle source-module manifest. Both entrypoints call _build_result(), so the fixture and exported bundle result records share one scope limit, one secret scan, one source-module digest checker, and one readiness-board schema.
_build_result() first loads the five public input families: corpus_readiness.json, tactic_portfolio_availability.json, premise_index.json, target_shape_tactic_routing.json, and provider_context_recipes.json. It then scans those inputs plus any declared source artifacts through secret_exclusion_scan.scan_paths, using the public Microcosm forbidden-class policy. The scan is not advisory: the result can pass only when the scan has zero blocking hits, source-module imports pass, all expected fixture-negative cases are observed, and no unexpected positive-case findings remain.
The mechanism is split into six validators:
validate_corpus_readiness() records Lean and Mathlib readiness metadata and adds lean_std_synthetic_core:mathlib to blocked capabilities when Mathlib is unavailable. A Mathlib availability claim without a passing probe becomes MATHLIB_AVAILABILITY_OVERCLAIM.
validate_tactic_portfolio() separates available from unavailable tactics and requires every available tactic to carry a probe result record. Synthetic probe labels are accepted only when _tactic_probe_realness_evidence() binds them to copied source modules or fixture-manifest source-open evidence.
validate_premise_index() admits premise rows as metadata only. It counts premises, namespaces, retrieval terms, and split eligibility, but rejects proof_body, ground_truth_proof, provider_output_body, and oracle_needed_premise_ids.
validate_target_shape_routing() intersects each route case's allowed tactics with the unavailable tactics emitted by the portfolio validator. Any overlap becomes ROUTING_ALLOWS_UNAVAILABLE_TACTIC, so routing cannot silently re-enable a tactic that the probe plane blocked.
validate_provider_context_recipes() records byte budgets and deliverable shape while rejecting public recipes over 32,768 bytes or recipes that allow proof bodies or provider-body material.
validate_source_module_imports() verifies the exported bundle's source_module_manifest.json, target refs, source refs, line counts, target digests, source digests, exact-copy rows, and the two permitted private-path rewrites. It reports digest/ref failures without placing copied source bodies in result records.
After the validators run, _merge_observed() and _merge_findings() compare observed fixture failures against EXPECTED_NEGATIVE_CASES. This is the local scope limit: the fixture run must prove that the known overclaims are caught, while the exported-bundle run must prove that the positive public bundle has no unexpected findings. _build_extension_board() then projects the accepted metadata into the extension board: selected pattern ids, namespace and split counts, tactic availability counts, Mathlib-dependent unavailable tactics, blocked route cases, provider budgets, source-body import counts, the scope limit, and the scope boundary.
Result record writing preserves the same boundary. write_receipts() emits the gate result, readiness board, extension board, validation result record, and sign-off result record for fixture mode. run_readiness_bundle() emits the exported-bundle result record. The focused test suite asserts the mechanism rather than just file existence: it checks the five expected negative case ids, local Lean/Lake probe metadata with Mathlib unavailable, six available tactics with aesop blocked, eleven premises, five route cases, three provider recipes, thirteen verified source artifacts, source/target digest mismatch rejection, target-ref mismatch rejection, secret-exclusion/public-relative result record paths, and result record omission of copied body text.
Public Contract
The component does not run Lean or Lake. It consumes public JSON fixtures and exported bundles, records which capabilities are available or blocked, rejects Mathlib availability overclaims, rejects unprobed tactics, rejects premise rows that contain proof bodies, rejects routes that admit unavailable tactics, and rejects provider recipes that exceed the public budget or allow proof bodies.
The accepted result is a readiness board. That board can tell a later component what is safe to attempt, but it is bounded evidence evidence, benchmark evidence, or permission to execute a theorem prover.
Prior Art Grounding
This component is grounded in formal-math benchmark and environment-readiness work where the presence of a library, tactic, or corpus is not enough by itself. miniF2F motivates explicit benchmark split discipline for formal mathematics, LeanDojo motivates reproducible theorem-proving environments, and mathlib makes the availability of library imports a concrete precondition rather than a vague capability claim.
Microcosm borrows the readiness-gate pattern: corpus availability, Mathlib probes, tactic probes, premise indexes, target-shape routing, and context budgets must be checked before downstream proof or retrieval language is allowed. It excludes Lean execution or proof authority.
Runtime Surfaces
python -m microcosm_core.organs.formal_math_readiness_gate run --input fixtures/first_wave/formal_math_readiness_gate/input --out receipts/first_wave/formal_math_readiness_gate
python -m microcosm_core.organs.formal_math_readiness_gate plan --input fixtures/first_wave/formal_math_readiness_gate/input
microcosm formal-math-readiness-gate run --input fixtures/first_wave/formal_math_readiness_gate/input --out receipts/first_wave/formal_math_readiness_gate
microcosm formal-math-readiness-gate plan --input fixtures/first_wave/formal_math_readiness_gate/input
Relationship To Lean Witness
formal_math_lean_proof_witness remains deferred. This gate makes the deferral typed and testable: Mathlib is absent until a passing probe says otherwise, unavailable tactics cannot be routed, premise indexes cannot carry proof or oracle bodies, and provider recipes cannot smuggle proof-body deliverables.
This module may claim that Microcosm has a public readiness gate for formal math system preparation. The valid claim is bounded to corpus availability, Mathlib and tactic probe metadata, premise-index coverage, target-shape tactic routing, provider context budget checks, extension-board pattern ids, public PROVER smoke-run source artifacts, an exact public component-source body floor, and fixture or exported-bundle result records.
The module must not claim Lean/Lake execution, theorem proving, formal proof authority, formal-result correctness, Mathlib-dependent proof success, benchmark performance, provider-call execution, private proof-body import, oracle-needed premise disclosure, source-file changes, publishing-scope decision, hosted deployment, recipient work, secret export, or whole-system correctness. Its strongest launch-facing statement is readiness-boundary enforcement over public metadata and copied source artifacts.
Limitations
The runtime validates finite public fixtures and exported-bundle manifests. It does not execute Lean or Lake, import Mathlib in the current environment, call a provider, or check theorem statements. When the result reports blocked capabilities such as lean_std_synthetic_core:mathlib, that is a readiness boundary for downstream components, not an invitation to route around the gate.
The copied source artifacts are source-open body-floor evidence only. Digest and target-ref checks show that selected PROVER readiness/probe bodies and the public component source copy match their manifests; they do not authorize source-file changes, private source-root export, proof-body disclosure, recipient work, hosted deployment, or public sharing. Result records intentionally carry counts, digests, paths, negative-case coverage, and authority flags instead of copied body text.
The negative cases are also finite. They cover the known overclaims encoded in EXPECTED_NEGATIVE_CASES: Mathlib availability without a passing probe, unprobed tactic availability, premise rows with proof bodies, target routes that admit unavailable tactics, and provider recipes that exceed public budgets or permit proof bodies. A new formal-math claim needs either a new source-backed negative case here or a different proof consumer; this page should not be used as a generic formal-proof claim surface.
Scope boundary
This module documents a public readiness gate only. It excludes Lean/Lake execution, formal proof authority, Mathlib-dependent proof attempts, external model access, benchmark claims, public launch, hosted deployment, public sharing, recipient work, secret export, or whole-system correctness. It also does not make private source-root material, browser UI state, account or browser material, browser state, account secrets, source notes, model-output data bodies, recipient-send state, or private proof bodies part of the public Microcosm body floor.
Source and projection details
Source-Open Body Floor
The exported readiness bundle carries thirteen PROVER smoke-run readiness/probe bodies under source_artifacts. They cover corpus readiness, tactic-affordance probe metadata, Mathlib and trace probes, and the copied portfolio-core Lean probes used to decide which tactics are blocked or available. Two JSON rows are private-path rewrites; those rows retain source and target digests plus the rewrite mode.
The bundle also carries an exact public component-source copy for src/microcosm_core/organs/formal_math_readiness_gate.py under source_body_floor/source_modules. Generated state/runs Lean artifacts are runnable readiness evidence, not source-body authority. Neither floor places body text in result records or workingness cards, and neither imports model-output data bodies, account or browser state, browser UI live access, recipient-send state, account secrets, private proof bodies, or oracle-needed premise ids.
The source-module manifest and bundle manifest are the right surfaces for body-floor inspection. The validation result records intentionally carry status, digests, counts, and public-relative refs rather than copied source bodies.
Wave 011 adds the explicit extension board for the source intake cell formal_math_readiness_extensions. The board is still metadata-only, but it is more useful than the older flat counts: it records the selected pattern ids (lean_std_toolchain_premise_index, tactic_portfolio_availability_probe, target_shape_tactic_routing_gate), the source projection intake ref, public target refs, validation refs, namespace and split coverage for the premise index, tactic availability status counts, Mathlib-dependent unavailable tactics, target-shape routing admissibility, and provider context budgets.
Governing Lattice Relation
The bundle binds this module to concept.formal_math_and_proof_witness_bundle because the component is not a theorem prover; it is the membrane that decides which public formal-math inputs are safe enough for a later proof witness to consume. The governing mechanisms split that membrane in two. The validates_public_formal_math_readiness_bundle mechanism names the positive bundle path: run, run_readiness_bundle, validate_source_module_imports, and write_receipts validate the declared corpus, tactic, premise, routing, provider-budget, source-module-manifest, and source-body-floor evidence before writing readiness boards. The validates_public_readiness_boundary mechanism names the negative path: validate_corpus_readiness, validate_tactic_portfolio, validate_premise_index, validate_target_shape_routing, and validate_provider_context_recipes reject the cases that would turn readiness metadata into proof authority.
The principle and axiom refs are therefore operational, not decorative. P-1, P-2, and P-3 are expressed by keeping the JSON bundle, generated structured source record, runtime code locus, and result records as separate authority classes. P-6 and P-8 are expressed by the body-floor and secret-exclusion contracts: copied PROVER probe bodies and the public component source copy can be inspected through digests and manifests, while private proof bodies, model-output data bodies, and browser or account state stay outside the public floor. AX-1, AX-2, AX-5, and AX-7 are the local reason the downstream paper_module.formal_math_lean_proof_witness remains a dependency rather than an already-proven conclusion.
The generated lattice edge count is small on purpose: it proves that this page is bundle-backed, source-bound, and connected to one deferred proof-witness module.
Formal Math Lean Proof WitnessThe public Lean proof witness runs local Lean/Lake over a tiny synthetic project, validates copied public source-module digests and negative cases, and emits redacted result records without claiming general proof authority.
Formal Math Lean Proof Witness is the public runtime crossing from readiness metadata into a real local Lean/Lake subprocess witness. It copies a bounded public Lake project, records tool availability, Lake build status, source hashes, declaration names, line counts, source-module manifest verdicts, a Mathlib-blocked debt row, and four leakage/invalid-proof negative cases while keeping proof bodies and command output bodies out of result records.
Scope limit Declared public toy Lean/Lake fixture result records only; no Mathlib/Aesop/Batteries authority, general formal-result correctness, private proof import, external model access, benchmark performance, launch-scope decision, publishing-scope decision, source-file changes, or whole-system correctness.
This component exists to make one claim checkable instead of asserted: that Microcosm can actually run the Lean toolchain, not merely talk about it. The single question it answers is whether the installed Lean toolchain will compile a declared, tiny synthetic Lean project end to end, and whether that run can be recorded without leaking the proof.
The unusual part is the discipline around the run, not the run itself. The component copies a bounded public Lake project into a temporary workspace and invokes lake build, but the result record keeps only the return code, the standard-output and standard-error line counts, the source hashes, and the declaration names pulled out by a regular expression. The proof text and the raw command output never reach the result record. A reader gets evidence that the build happened and what it contained, without the page becoming a copy of the proof.
Two failure modes drive the design. The first is a proof-assistant integration that reports success without ever running the checker; the witness guards against that by executing a real subprocess and recording its exit status, and by deliberately compiling an invalid Lean file in a negative case to confirm the toolchain rejects it. The second is a circular pass, where the manifest quietly carries the answer. The component refuses manifests that embed a proof_body, a ground-truth proof, provider output, or oracle premise ids, so a green result cannot be smuggled in through the inputs.
The scope is small on purpose. Imports of Mathlib, Aesop, and Batteries are rejected before anything runs, so this is a witness for a toy theorem under a local toolchain, not a claim about library-dependent proof work. That boundary is the point: it shows the result record discipline a larger formal-math component would need, without borrowing authority it has not earned.
Teleology
formal_math_lean_proof_witness is the bounded public crossing from formal-math readiness into an actual local Lean/Lake run. It exists so a cold reader can see Microcosm compile a tiny synthetic proof witness with the installed toolchain while the result records stay redacted, public-relative, and honest about the narrow authority boundary.
Shape
Source refs
First-wave fixture
fixtures/first_wave/.../input
Exported public bundle
examples/.../exported_lean_proof_witness_bundle
Diagram source
flowchart TD A["First-wave fixture fixtures/first_wave/.../input"] --> B["run() include_negative=true"] C["Exported public bundle examples/.../exported_lean_proof_witness_bundle"] --> D["run_witness_bundle() include_negative=false"] B --> E["Validate witness manifest: reject embedded proof bodies, oracle ids, non-public source refs"] D --> F["Validate source_module_manifest.json: copied public source digests, exact-copy vs replacement"] E --> G["Copy Lake project to temp workspace lake build MicrocosmProofWitness"] G --> H["Negative cases run real Lean: invalid proof rejected, Mathlib/Aesop/Batteries import blocked"] F --> I["Standalone exported-witness contract or fresh bundle result record reuse (no live build)"] G --> J["metadata-only JSON result records: return code, line counts, hashes, declaration names"] H --> J I --> J J --> K["Scope limit: toy public witness only"]
Reader Evidence Routing
Route bundle/currentness questions through ## JSON Bundle Binding, the source record, and the structured source record. The expected generated-row evidence is source_authority: json_capsule, edge_count: 8, Mermaid available_from_capsule_edges, Atlas blocked_until_organ_atlas_owner_lane_binds_edges, and zero unresolved selective relations. That evidence proves reader wiring and source authority placement, not formal-result correctness.
Route runtime questions through the runtime locus and the two public input surfaces. The first-wave fixture runs run() against the public Lake project and checks the four expected negative cases. The exported bundle runs run_witness_bundle() against copied public source modules, validates source_module_manifest.json, and records digest/source-module status without placing proof bodies in JSON result records.
Route result record and test questions through the required result record paths, the focused pytest, and the corpus check. The focused test asserts local Lake build success for the tiny witness when Lean/Lake are available, eight compiled declarations, four negative-case observations for the fixture, public-relative redacted result records, five exported source-module rows, source digest checks, metadata-only result record policy, and tamper-blocking behavior. Those validation result records do not authorize Mathlib-dependent proofs, external model access, private proof import, benchmark claims, launch-scope decision, deployment posture, public sharing, hosted deployment, source-file changes, or private-system equivalence.
Public Contract
The component copies examples/formal_math_lean_proof_witness/exported_lean_proof_witness_bundle or the first-wave fixture Lake project into a temporary workspace and runs lake build. The public result record records tool availability, Lake build status, source hashes, declaration names, line counts, negative-case coverage, and the scope limit. It does not export proof bodies in JSON result records.
The accepted witness scope is deliberately small:
public synthetic Lean source is allowed;
JSON manifests and result records may not embed proof bodies;
Mathlib, Aesop, and Batteries imports are rejected until a wider scope limit exists;
non-public source refs, model-output data, oracle proofs, and private source run bodies remain outside the public root.
Prior Art Grounding
This component is grounded in the Lean proof-assistant lineage and the broader small-kernel theorem-proving tradition. The Lean theorem prover system description anchors the local Lean/Lake witness route, and the Lean mathematical library shows why proof authority depends on explicit imports, declarations, and checked environments.
Microcosm borrows the proof-witness discipline: a local toolchain run, source hashes, declarations, negative cases, and metadata-only result records must be visible before Lean witness language is allowed. It does not claim Mathlib-dependent proof authority or benchmark performance.
This module is a bounded public witness, not a formal-proof authority. Its positive evidence is one declared toy Lean/Lake fixture, one exported public witness bundle, five copied source-module body rows, local toolchain metadata, eight compiled declarations when Lean/Lake are available, and four expected negative-case observations. That evidence is enough to show the mechanism's result record discipline; it is not enough to prove arbitrary Lean goals, Mathlib coverage, formal-result correctness, benchmark performance, or private proof import equivalence.
The copied-body floor is public but narrow. Result records may cite source refs, hashes, material classes, declaration names, counts, manifest verdicts, tool-return summaries, and scope limit fields. They may not embed proof bodies, model-output data, oracle answers, non-public source refs, raw command output bodies, account secrets, account or browser state, or private source-root material. The source-open claim is therefore limited to the declared public fixture and exported bundle body classes.
The focused regression validates the stated fixture and exported-bundle shape. It checks streaming source scans, tool-version caching, temporary Lake project reuse, Lake build behavior, public-relative redacted result records, source-module digest parity, standalone exported-bundle handling, tamper rejection, negative case coverage, and the generated-row proof. It excludes future fixture families, Atlas/site public sharing, source-file changes, launch, or a larger formal-math proof claim without the owning builder and launch lanes.
Scope limit
This module authorizes only a tiny public fixture witness compiled by local Lean/Lake in a temporary workspace. It excludes Mathlib-dependent proofs, external model access, private proof import, benchmark performance claims, launch operations, hosted deployment, public sharing, recipient work, secret export, or whole-system correctness.
Scope limit
This module supports only the reader-verifiable claim that a tiny public Lean fixture witness can run in a temporary local workspace, emit metadata-only result records, and expose source hashes, declarations, and negative cases. It does not establish Mathlib-dependent theorems, benchmark performance, provider outputs, private proof imports, launch-scope decision, hosted deployment, publishing-scope decision, secret export safety, or whole-system correctness.
Source and projection details
Governing Lattice Relation
The bundle binds this module to concept.formal_math_and_proof_witness_bundle: public proof-adjacent language must pass through explicit witness artifacts before it becomes reader evidence. Here the witness artifacts are the temporary Lake project copy, local Lean/Lake tool probes, lake build MicrocosmProofWitness, source hashes, declaration metadata, source-module manifest checks, negative-case observations, and metadata-only result records. The Markdown page explains that lattice; it does not upgrade the generated JSON row, the local toolchain, or the copied source body floor into theorem authority.
P-3 is the governing principle edge for claim discipline. The mechanism rows do not ask a reader to trust a proof story from prose; they route the claim through run, run_witness_bundle, validate_source_module_imports, _build_result, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, and SOURCE_MODULE_MANIFEST_NAME. Those symbols are the mechanism's concrete boundary: they decide which public source refs may be copied, which imports are blocked, which negative cases count, and which result record fields may be exposed.
AX-2 supplies the hard law boundary. Public proof claims stay inside declared fixture evidence, public-relative refs, source digests, declaration counts, tool-return metadata, and negative-case verdicts. Proof bodies, model-output data, non-public source refs, stdout/stderr bodies, private source-root material, launch decisions, and whole-system correctness remain outside the module's authority even when the focused test and corpus check are green.
The dependency on paper_module.corpus_readiness_mathlib_absence_gate prevents the most tempting overread. This witness intentionally rejects Mathlib, Aesop, and Batteries imports until a different scope limit exists. A reader can therefore interpret the module as a toy Lean/Lake execution cell upstream of larger formal-math components, not as evidence that Microcosm can certify Mathlib-dependent theorem work.
Formal Math Verifier Trace Repair LoopThe public verifier-trace repair fixture validates copied Ring2 failure taxonomy, graph-update, oracle-repair contrast rows, source-module digests, negative cases, and one deterministic toy rerun without claiming proof authority.
Formal Math Verifier Trace Repair Loop is the public evidence membrane for proof-lab repair mechanics. It checks copied Ring2 run refs and digests, verifier attempts, trace grades, repair actions, promotion gates, source-module manifests, secret exclusion, and seven leakage or overclaim negative cases, then writes metadata-only result records that keep proof bodies, oracle premise ids, model-output data, Lean/Lake execution, and theorem-correctness claims out of scope.
Scope limit Copied Ring2 verifier-trace repair metadata, source-module digests, public fixture result records, and one deterministic public toy-theorem rerun only; no Lean/Lake authority, formal-result correctness, proof body, oracle premise id, external model access, human approval as proof authority, launch-scope decision, publishing-scope decision, or whole-system correctness.
formal_math_verifier_trace_repair_loop is the source-available replay of a source proof-lab pattern over copied Ring2 run system: verifier feedback becomes a teaching signal only after a trace grade, a repair action, a failure-mode ledger append, a curriculum delta, and a cold rerun result record.
It is deliberately not a Lean/Lake proof component. It sits between the existing readiness, premise retrieval, tactic routing, proof diagnostic, and Lean witness surfaces so a cold reader can inspect real failure taxonomy, graph-update candidates, and oracle-repair contrast rows without seeing proof bodies, oracle premise ids, model-output data bodies, or private run logs.
Purpose
A failed proof attempt is cheap to throw away and expensive to learn from. The question this component answers is narrow: can a verifier's failure be turned into a reusable repair signal, on the public side, without that signal quietly inheriting the authority of a real theorem prover? It exists because the interesting work in a proof-repair loop is the bookkeeping, not the proving, and that bookkeeping is where overclaim usually creeps in.
The design choice worth noticing is that the loop refuses to collapse its stages into a single verdict. A verifier failure only counts as a teaching signal once it carries a trace grade backed by trace events, a repair action named against the verifier failure class it responds to, a failure-mode ledger append, a curriculum delta, and a cold-rerun result record. Each of those is a separate field, and promotion is blocked until the cold-rerun result record is present. The same separation keeps the dangerous material out: proof bodies, oracle-needed premise ids, and model-output data bodies are forbidden keys, so a row may name a failure class without ever exposing the proof or the oracle answer that produced it.
The failure mode it guards against is stale copied rows pretending to be live proof-lab evidence. The repair rows here are imported from a real Ring2 benchmark run, so the temptation is to treat the copy as if the run were happening now. The realness gate is the answer: it only reaches its top rung when every verifier attempt and curriculum row replays cleanly against the imported source bodies, and the focused tests deliberately perturb an oracle row, a manifest digest, an attempt label, and a curriculum count so that any drift downgrades the verdict rather than passing quietly. A single deterministic toy-theorem rerun is the one thing actually executed here, and it is plain arithmetic over public inputs, not a Lean proof.
Shape
Diagram source
flowchart TD Input["Fixture input or exported bundle copied Ring2 rows + source-module manifest"] Protocol["Projection protocol copied-material provenance"] Manifest["Source-module manifest digest, line and byte match, body_in_receipt false"] Secret["Secret-exclusion scan proof bodies, oracle ids, model-output data forbidden"] Attempts["Verifier-attempt replay grade needs trace events, repair needs failure class"] Curriculum["Repair-curriculum replay failure-mode ledger, curriculum deltas"] Promotion["Promotion policy requires cold-rerun result record"] Toy["Deterministic toy rerun fail then repair over public inputs"] Realness["Realness gate clean source replay -> top rung; any drift downgrades"] Result records["metadata-only result records result, board, validation, sign-off"] Ceiling["Scope limit repair-loop accounting, bounded evidence"] Input --> Protocol Protocol --> Manifest Manifest --> Secret Secret --> Attempts Attempts --> Curriculum Curriculum --> Promotion Promotion --> Toy Attempts --> Realness Curriculum --> Realness Toy --> Realness Realness --> Result records Result records --> Ceiling
Technical Mechanism
The named mechanism mechanism.formal_math_verifier_trace_repair_loop.validates_public_verifier_trace_repair_bundle is a staged public verifier-repair validator, not a proof executor. _build_result composes five checks over the fixture or exported bundle: projection-protocol density, copied source-module manifest integrity, verifier attempt replay, repair-curriculum replay, promotion policy, and one deterministic toy-theorem repair rerun. The result is pass only when the projection protocol has copied-material provenance, the secret scan has no blocking hits, source modules pass when required, verifier attempts and curriculum rows replay against their imported Ring2 source bodies, promotion requires a cold rerun reference, and the toy rerun succeeds.
The exported-bundle path is intentionally stricter than the fixture path. validate_source_module_manifest requires a source import class, body_in_receipt: false, one row for each declared Ring2 source ref, matching target digests, line counts, and byte counts, and a metadata-only source_open_body_imports summary. _validate_attempt_source_replay then dereferences the premise-run row, oracle-repair contrast row, and graph-update candidate for each verifier attempt. Mismatches become typed findings such as VERIFIER_TRACE_SOURCE_REPLAY_MISMATCH, VERIFIER_TRACE_ORACLE_REPLAY_MISMATCH, VERIFIER_TRACE_COLD_RERUN_SOURCE_MISMATCH, or VERIFIER_TRACE_CANDIDATE_REPLAY_MISMATCH; curriculum-source mismatches are checked separately by validate_repair_curriculum.
The realness gate is also mechanical. _runtime_realness_evidence reaches the R4 state only for an exported bundle with verified source modules, at least 30 source replay checks, zero source replay mismatches, at least three attempts, at least nine trace events, at least three failure modes, and a passing toy rerun. The focused tests deliberately perturb the oracle source row, a manifest digest, a verifier-attempt source label, and a curriculum source count; each mutation blocks the verdict or downgrades the realness evidence instead of letting stale copied rows masquerade as proof-lab evidence.
The proof consumer is tests/test_formal_math_verifier_trace_repair_loop.py: it asserts five attempts, 15 trace events, five repair actions, three cold-rerun promotions, three toy-theorem failures repaired into four passing rerun inputs, seven exported source modules, 37 source replay checks, compact-card omission and fresh-result record reuse, public-relative result record paths, no private/body fields in result records, and exact source module copies. Those checks consume the same fixture, bundle, source-module manifest, and mechanism row cited by this page, so the evidence is executable replay accounting rather than a prose-only description.
The governing lattice is deliberately narrow: the bundle binds the module to concept.formal_math_and_proof_witness_bundle, principles P-1, P-2, P-3, P-6, and P-8, axioms AX-1, AX-2, AX-5, and AX-7, and dependency modules for the Lean standard premise index, tactic portfolio availability, target-shape tactic routing, and formal-math premise retrieval. The standard allows only copied Ring2 verifier-trace repair result record schemas and metadata-only public fields. It does not widen a passing replay into Lean/Lake authority, formal-result correctness, proof-body evidence, oracle premise authority, provider authority, human-approval proof authority, publishing-scope decision, launch-scope decision, or whole-system correctness.
Evidence/accounting:
Bundle authority: core/paper_module_capsules.json::paper_modules[23:paper_module.formal_math_verifier_trace_repair_loop] sets source_authority: json_capsule, binds the component, binds mechanism.formal_math_verifier_trace_repair_loop.validates_public_verifier_trace_repair_bundle, and resolves src/microcosm_core/organs/formal_math_verifier_trace_repair_loop.py.
Generated instance: paper_modules/formal_math_verifier_trace_repair_loop.json reports paper_module_payload.source_authority: json_capsule, Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges, 17 relationship edges, and resolved paper_module.depends_on.paper_module edges to the Lean standard premise index, tactic portfolio, target-shape routing, and formal-math premise retrieval modules named by the active standard.
Runtime, fixture, and bundle: src/microcosm_core/organs/formal_math_verifier_trace_repair_loop.py exposes run, run_loop_bundle, validate_source_module_manifest, _write_receipts, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, and SOURCE_MODULE_MANIFEST_REF. The fixture input and exported bundle replay copied Ring2 verifier-trace repair metadata, source-module digests, failure classes, repair actions, promotion gates, and one deterministic public toy-theorem rerun.
Result record and test floor: receipts/first_wave/formal_math_verifier_trace_repair_loop/formal_math_verifier_trace_repair_loop_result.json, verifier_trace_repair_board.json, formal_math_verifier_trace_repair_loop_validation_receipt.json, and result records/sign-off/first_wave/formal_math_verifier_trace_repair_loop_fixture_acceptance.json are metadata-only evidence. tests/test_formal_math_verifier_trace_repair_loop.py checks source-module manifest validation, negative cases, toy rerun evidence, and scope limits.
Claim boundary: standards/std_microcosm_formal_math_verifier_trace_repair_loop.json and the generated structured source record limit this module to copied Ring2 verifier-trace repair metadata, source-module digests, public fixture result records, and deterministic toy rerun evidence. They do not authorize Lean/Lake authority, formal-result correctness, proof bodies, oracle premise ids, external model access, human approval as proof authority, launch-scope decision, publishing-scope decision, or whole-system correctness.
Reader Evidence Routing
Those rows prove reader wiring, not formal-result correctness.
Route runtime and replay questions through ## Runtime, ## Receipts, and the fixture/bundle paths in the validation command. The fixture runner, exported bundle runner, CLI route, standard, and fixture manifest show how verifier-trace repair accounting is replayed over copied public rows without importing proof bodies, oracle-needed premise ids, model-output data bodies, or private logs.
Route claim-safety questions through ## What It Proves, ## What It Refuses, ## Result record Expectations, and ## Scope limit. If the question is whether the repair loop is still body-safe and result record-backed, run the focused pytest and paper-module corpus check before citing this page.
Prior Art Grounding
This component is grounded in interactive theorem-proving feedback loops and learning environments where failed proof attempts become structured training or repair signals. GamePad and HOList both expose theorem-proving interaction data for machine-learning experiments, while LeanDojo reinforces the need to keep proof assistant feedback, retrieval, and proof-state interaction reproducible.
Microcosm borrows the repair-loop accounting pattern: verifier events, grades, failure classes, repair actions, curriculum deltas, and cold rerun result records are separate fields. It does not treat human or provider advice as formal-result correctness.
Runtime
Component runner: python -m microcosm_core.organs.formal_math_verifier_trace_repair_loop run --input fixtures/first_wave/formal_math_verifier_trace_repair_loop/input --out receipts/first_wave/formal_math_verifier_trace_repair_loop
The authority boundary is copied Ring2 verifier trace repair public fields only. The component demonstrates control-loop mechanics over real run rows, not formal-result correctness.
Scope limit
This module supports only the reader-verifiable claim that copied Ring2 verifier rows can drive a public verifier-trace repair loop with trace-event requirements, failure-class routing, promotion gates, and metadata-only result records. It does not establish formal-result correctness, expose proof bodies, authorize human or provider advice as proof authority, publish private run logs, approve launch, or certify whole-system correctness.
Formal Evidence Cell Anchor ResolverThe formal evidence cell anchor resolver binds proof-language paper claims to public evidence cells, source anchors, machine-anchor metadata, copied source modules, and negative-case result records without claiming formal-result correctness.
Formal Evidence Cell Anchor Resolver is the evidence-legibility membrane for Microcosm's formal math claims. It resolves three paper claims to public evidence-cell ids, checks source-anchor and machine-anchor metadata, anchors the verifier-trace cell to real Ring2 verifier-trace repair result records, validates six copied source-open body modules, observes seven proof/private/human-approval/theorem-correctness negative cases, and emits metadata-only result records that make proof-language boundaries inspectable without becoming proof authority.
Scope limit Evidence-cell anchor metadata and source-open runtime result records only; no formal-result correctness, proof-body import, private source-ref authority, human approval as proof authority, Lean/Lake execution, external model access, launch-scope decision, publishing-scope decision, source-file changes, or formal-proof certification.
formal_evidence_cell_anchor_resolver makes Microcosm's formal-math evidence claims inspectable without turning result record summaries into proof authority. It resolves paper-module claims to evidence-cell ids, checks source-anchor refs, records machine-anchor classes, and enforces a claim-strength boundary before any proof-language claim can pass. Its formal-math trace cell anchors the real Ring2 verifier-trace repair result records.
It is not a theorem prover. It does not execute Lean or Lake, expose proof bodies, expose non-public source refs, use external model services, or claim formal-result correctness. It emits real runtime result records over the imported evidence-cell system, carries digest-bearing Ring2 failure-taxonomy and graph-update source refs, and uses secret-exclusion scanning only for account secret-equivalent or non-result record body payloads.
Purpose
Proof-adjacent prose is the easiest place for a claim to drift. A paper module can write "this proves the theorem" or "this is certified" and a cold reader has no cheap way to tell whether the words are backed by a checked artifact or by nothing at all. This component answers one question: when a claim uses proof language, can the words be resolved to a specific piece of public evidence, and does that evidence stay below theorem-correctness authority?
The mechanism is an evidence cell. A cell is a stable id that stands in for a bundle of result record-backed evidence: its source-anchor refs, a machine_anchor_class that names what kind of machine artifact backs it, and the list of claim strengths the cell is allowed to support. The policy proof_language_requires_machine_anchor is the rule that makes the resolver useful. A claim that uses proof language must name a cell, the cell must resolve in the registry, and its source anchors must point at files that actually exist on the public path. A claim that uses proof language but names no cell, or names a cell that is not in the registry, lowers the run to a blocked status rather than passing as green prose.
What is worth noticing is what the cell id buys. It is a compressed handle: one short reference that a reader can follow back to the real result records behind a claim, instead of inlining proof bodies or trusting narrative. Two boundaries sit on top of that handle. Claim strength is capped by the cell, so a claim cannot assert more than its anchored evidence allows. And human approval is refused as a substitute for a machine anchor, which keeps a sign-off from being treated as proof.
flowchart TD Bundle["source record core/paper_module_capsules.json::paper_modules[24]"] --> structured source record["structured source record paper_modules/formal_evidence_cell_anchor_resolver.json source basis: source record"] structured source record --> Mermaid["diagram view available_from_capsule_edges"] structured source record --> Atlas["map view blocked_until_organ_atlas_owner_lane_binds_edges"] structured source record --> Reader["this page this page"] Reader --> Runtime["runtime locus src/microcosm_core/components/formal_evidence_cell_anchor_resolver.py"] Runtime --> Fixture["first-wave fixture input fixtures/first_wave/formal_evidence_cell_anchor_resolver/input"] Runtime --> Bundle["exported evidence-cell anchor bundle examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle"] Bundle --> Manifest["source-open body manifest source_module_manifest.json"] Fixture --> Result records["validation result records result records/first_wave/... + result records/sign-off/..."] Bundle --> BundleReceipt["runtime-shell result record result records/runtime_shell/demo_project/components/formal_evidence_cell_anchor_resolver/..."] Result records --> Ceiling["proof boundary + scope limit anchor metadata only, not formal-result correctness"] BundleReceipt --> Ceiling
Read the diagram left to right: the bundle and generated structured source record name the relationships; the runtime validates fixture and bundle inputs; the result records show what passed; the scope limit prevents any of those surfaces from becoming proof, launch, provider, private-system, or theorem-correctness authority.
Reader Evidence Routing
A cold reader should inspect this module through these system surfaces, in order:
Authority seed: core/paper_module_capsules.json::paper_modules[24:paper_module.formal_evidence_cell_anchor_resolver]. This is the source record that binds the Markdown projection, generated JSON, runtime locus, fixture, exported bundle, mechanism rows, and scope boundaries.
Generated structured source record: paper_modules/formal_evidence_cell_anchor_resolver.json. Check relationships.source_authority, the 15 relationship edges, the generated_projections statuses, unpopulated_selective_relations, and the bundle-carried scope limit before trusting any prose summary.
Runtime locus: src/microcosm_core/organs/formal_evidence_cell_anchor_resolver.py. The relevant runtime symbols are run, run_anchor_bundle, validate_source_module_manifest, _build_result, _source_module_summary_card, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, SOURCE_MODULE_MANIFEST_REF, BUNDLE_RESULT_NAME, and CARD_SCHEMA_VERSION.
Fixture and exported bundle: fixtures/first_wave/formal_evidence_cell_anchor_resolver/input, examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle, and examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle/source_module_manifest.json. The first-wave fixture exercises negative cases and Ring2 result record anchors; the exported bundle validates six source-open body modules by digest while keeping source bodies out of result records.
Result records: receipts/first_wave/formal_evidence_cell_anchor_resolver/formal_evidence_cell_anchor_resolver_result.json, receipts/first_wave/formal_evidence_cell_anchor_resolver/evidence_cell_anchor_board.json, receipts/first_wave/formal_evidence_cell_anchor_resolver/formal_evidence_cell_anchor_resolver_validation_receipt.json, result records/sign-off/first_wave/formal_evidence_cell_anchor_resolver_fixture_acceptance.json, and receipts/runtime_shell/demo_project/organs/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle_validation_result.json. These result records report pass/fail state, metadata-only public refs, negative-case observations, and explicit release_authorized=false, provider_calls_authorized=false, lean_lake_execution_authorized=false, formal_proof_authority=false, and theorem_correctness_authority=false ceilings.
Focused checks: tests/test_formal_evidence_cell_anchor_resolver.py, scripts/build_doctrine_projection.py --check-paper-module-corpus, and the JSON-row proof query in the validation section below. Those checks validate the reader route and generated-row parity; they do not authorize public sharing or formal proof claims.
Prior Art Grounding
This component is grounded in provenance and proof-certificate work where claims must point at checkable evidence rather than untyped narrative. The W3C PROV model is a general anchor for linking entities, activities, and agents in an evidence graph, while Proof-Carrying Code and small-kernel proof assistants motivate separating a certificate or anchor from the trusted checker that bounds its meaning.
Microcosm borrows the anchor-resolution pattern: proof-language claims must name evidence-cell ids, source anchors, machine-anchor classes, and claim strength limits. It does not turn metadata cells into theorem-correctness authority.
Runtime
Component runner: python -m microcosm_core.organs.formal_evidence_cell_anchor_resolver run --input fixtures/first_wave/formal_evidence_cell_anchor_resolver/input --out receipts/first_wave/formal_evidence_cell_anchor_resolver
Proof-language claims must resolve to a public evidence cell before this reader treats them as routed evidence.
Evidence cells must carry source-anchor refs.
Machine-anchor metadata is visible as metadata, not formal-result correctness.
Claim strength is bounded by the resolved cell.
Secret, account secret-equivalent, or non-result record body payloads must have explicit exclusion result records.
The verifier-trace cell is anchored to the first-wave formal_math_verifier_trace_repair_loop result, board, validation result record, and Ring2 failure-taxonomy source digest.
What It Refuses
Unknown evidence-cell ids used as proof authority.
Proof-language claims without evidence-cell ids.
Proof bodies in public claim rows.
non-public source refs in public claim or cell rows.
Human approval as proof authority.
Theorem-correctness claims from metadata cells.
launch, public sharing, secret export, or provider authority.
This module is a proof-adjacent evidence router, not a proof system. The fixture proves a bounded resolver contract over three paper claims, three evidence cells, seven declared negative-case classes, eight source anchors, three machine anchors, and zero copied source modules in fixture mode. The exported bundle proves the same public runtime shape over three claims, three evidence cells, five source anchors, six copied source-open body modules, and metadata-only result records. These counts are the claim boundary, not a scale claim about the formal-math corpus.
The source-module proof is digest and authority-ref parity for the six exported body modules named by the bundle manifest. It does not establish that every source formal-math source file has been imported, that future source drift is absent, or that copied body availability confers public launch-scope decision. A digest match also excludes exporting proof bodies, non-public source refs, model-output data, oracle material, account secrets, browser UI/operator UI state, or source notes.
The checker rejects unknown cells, missing source anchors, proof language without cells, non-public refs, proof bodies, theorem-correctness overclaims, and human approval as proof authority. That refusal coverage does not certify Lean or Lake execution, formal-result correctness, proof completeness, benchmark performance, deployment posture, or whole-system correctness.
Scope limit
The authority boundary is evidence-cell anchor resolution backed by real runtime result records. The component makes claim boundaries legible; it does not certify mathematical truth.
Scope limit
This module supports only the reader-verifiable claim that public evidence-cell anchor metadata can bind proof-language claims to result record-backed cells and exclude private bodies, proof bodies, model-output data, oracle material, and secret-equivalent refs. Its generated Mermaid/Atlas statuses and relationship counts are JSON-bundle projections; they do not certify formal-result correctness, proof completeness, launch-scope decision, publishing-scope decision, provider authority, or whole-system correctness.
Source and projection details
Governing Lattice Relation
The lattice edge is not just that this page "mentions" formal math evidence. The generated structured source record binds the page to one component, two mechanism rows, concept.formal_math_and_proof_witness_bundle, P-1, P-2, P-3, P-6, P-8, AX-1, AX-2, AX-5, AX-7, the sibling paper_module.formal_math_verifier_trace_repair_loop, and the resolved runtime source locus. That is the governing shape: proof-adjacent claims enter as paper-claim rows, evidence-cell ids, source anchors, machine-anchor classes, and copied source-module manifests; _build_result recomputes the pass or blocked status from those lower-level artifacts; _source_module_summary_card and run_anchor_bundle export compact, metadata-only evidence.
P-1 and AX-1 require a recomputed checker result rather than a label. P-2 and AX-2 keep the scope limit at the strength of the resolver and its certificates. P-3 makes the small resolver/manifest checker the authority surface instead of broad proof-language prose. P-6, P-8, AX-5, and AX-7 explain the blocked path: missing anchors, proof bodies, non-public source refs, source-module digest drift, theorem-correctness language, or human approval as proof authority must lower the status or return a refusal with evidence rather than preserving a green reader claim.
The focused proof consumer is tests/test_formal_evidence_cell_anchor_resolver.py. It asserts the fixture path observes all seven expected negative cases, resolves three claims to three evidence cells, records eight source anchors and three machine anchors, anchors the verifier-trace row to Ring2 result records, keeps formal-proof and theorem-correctness authority false, validates the exported bundle with six copied source modules, rejects theorem-correctness overclaims, rejects digest and rehashed-body swaps, and keeps command-card result records compact and metadata-only. Those checks are the local mechanism witness for the lattice relation.
Source-Open Body Floor
The exported bundle carries a source-open body floor at examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle/source_module_manifest.json. It imports the paper-module formal-evidence auditor, formal evidence-cell registry builder, focused runtime tests, public formal-evidence registry state, Erdos257 issue217 evidence-cell manifest, and the std_paper_module formal-evidence-cell contract body. Result records and workingness cards expose digests and validation status, not body text, proof bodies, model-output data, non-public refs, oracle material, or theorem-correctness authority.
Formal Math Premise RetrievalFormal math premise retrieval validates copied public Lean/Std premise metadata, query scoring, context budgets, strategy gates, body-floor provenance, and leakage negative cases without claiming proof authority.
Formal Math Premise Retrieval is the source-backed retrieval slice between the Lean/Std premise catalog and proof-witness boundary. It validates eleven public premise descriptors, four retrieval queries, forty-four considered candidates, exact and source-faithful source body imports, context recipe byte budgets, strategy ids, card freshness, and five leakage/overclaim negative cases while keeping proof bodies, oracle premise ids, model-output data, Lean/Lake execution, and launch claims out of result records.
Scope limit Copied public source retrieval metadata and runtime validation result records only; no formal-result correctness, proof-body import, oracle-needed premise authority, Mathlib authority, Lean/Lake execution, external model access, benchmark claim, launch-scope decision, publishing-scope decision, source-file changes, or general formal-proof authority.
formal_math_premise_retrieval is the source-available first real formal-math import slice after the source projection protocol. It turns the source prover lab's premise-index, term-scoring, context-budget, and strategy-selection patterns into a runnable Microcosm component.
It is still deliberately below proof authority. It validates:
Lean/Std premise metadata;
query term scoring across public premise ids, namespaces, declaration names, statement excerpts, and retrieval terms;
split eligibility;
context recipe budgets;
public strategy ids;
redacted result records;
negative cases.
It does not run Lean or Lake, use external model services, expose proof bodies, expose oracle-needed premise ids, tune on test split truth, claim formal-result correctness, or include launch operations.
Purpose
Before a model can attempt a formal proof, it has to find the right lemmas. A Lean library holds thousands of theorems and definitions, and the useful ones for a given goal are a handful. Premise selection is the step that narrows that library down to candidates worth putting in front of a prover. This component is the smallest honest version of that step: it takes a query, scores every public premise against it, and returns a ranked shortlist.
The single question it answers is narrow and checkable: given a copied catalogue of public Lean/Std premise metadata, does a transparent term-scoring retrieval return the premises a query should find, without ever touching a proof? Both halves matter. The retrieval has to actually work, so each fixture query carries the premise ids it is expected to surface and the run fails if the shortlist misses them. And the boundary has to hold, so the same run refuses any input that smuggles in a proof body, an oracle answer, or test-split truth.
What is unusual is the restraint. The retrieval index is not a learned embedding model and the scoring is not a benchmark claims. It is plain term overlap over fields that a reader can inspect: premise ids, namespaces, declaration names, statement excerpts, and retrieval terms. The interesting claim is therefore not "this retrieves well" but "this retrieves over real, copied Lean metadata and can be audited end to end, and the design forbids the shortcuts that would make a premise-selection result look better than it is".
Shape
Source refs
JSON source record
paper_module.formal_math_premise_retrieval
Runtime component
formal_math_premise_retrieval.py
Diagram source
flowchart TD bundle["JSON source record paper_module.formal_math_premise_retrieval"] --> instance["Generated paper-module instance 15 relationship edges"] instance --> component["Runtime component formal_math_premise_retrieval.py"] subgraph Inputs["Public inputs"] index["Premise index copied Lean/Std metadata"] queries["Retrieval queries terms, split, strategy, top_k"] recipes["Context recipes byte budgets"] negatives["Negative-case inputs proof body, oracle ids, test-split tuning, budget, strategy"] end component --> index component --> queries component --> recipes component --> negatives index --> split["Split gate skip premises not in allowed_for_split"] queries --> split split --> score["Term-overlap scoring shared tokens + strategy bonus"] score --> shortlist["Ranked top_k shortlist"] shortlist --> recall["Recall check vs expected premise ids"] negatives --> reject["Required rejections five leakage/overclaim guards"] recipes --> reject recall --> result records["metadata-only result records board, validation, sign-off"] reject --> result records result records --> ceiling["Scope limit metadata coherence, no Lean/Lake, no proof"]
Evidence/accounting:
Bundle authority: core/paper_module_capsules.json::paper_modules[25:paper_module.formal_math_premise_retrieval] has source_authority: json_capsule, three subjects, one resolved code_loci[0].path, depends_on naming paper_module.formal_math_lean_proof_witness, and generated projection statuses for Markdown, Mermaid, and Atlas.
Generated instance: paper_modules/formal_math_premise_retrieval.json::paper_module_payload repeats the bundle authority_ceiling, reports Mermaid status available_from_capsule_edges, and derives 15 relationships.edges with relationships.unpopulated_selective_relations: [].
Component atlas: core/organ_atlas.json::organs[9:formal_math_premise_retrieval] classifies the component in family: formal_math_and_proof, cites the runtime locus, and restates that retrieval metadata coherence is not Lean/Lake, provider, theorem-correctness, benchmark, or launch-scope decision.
Mechanism rows: core/mechanism_sources.json::mechanisms[27:mechanism.formal_math_premise_retrieval.validates_public_premise_retrieval_slice] and core/mechanism_sources.json::mechanisms[37:mechanism.formal_math_premise_retrieval.validates_public_premise_retrieval_projection] point at src/microcosm_core/organs/formal_math_premise_retrieval.py and name first-wave, sign-off, and runtime-shell result record refs.
Runtime and tests: src/microcosm_core/organs/formal_math_premise_retrieval.py exposes run, run_retrieval_bundle, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING; tests/test_formal_math_premise_retrieval.py checks 11 premises, 4 queries, 44 considered candidates, five negative cases, metadata-only result records, and compact runtime-shell cards.
Result records: receipts/first_wave/formal_math_premise_retrieval/formal_math_premise_retrieval_result.json records status: pass, 11 premises, 4 queries, 44 considered candidates, five observed negative cases, missing_negative_cases: [], and a secret-exclusion scan with blocking_hit_count: 0; the exported runtime result record at receipts/runtime_shell/demo_project/organs/formal_math_premise_retrieval/exported_premise_retrieval_bundle_validation_result.json records status: pass, the same premise/query/candidate counts, no negative cases, and secret_exclusion_scan.scanned_path_count: 11.
Standard ceiling: standards/std_microcosm_formal_math_premise_retrieval.json::authority_ceiling has status: pass while keeping formal_proof_authority, lean_lake_authority, provider_authority, and release_authority false.
Runtime Surfaces
Component runner: python -m microcosm_core.organs.formal_math_premise_retrieval run --input fixtures/first_wave/formal_math_premise_retrieval/input --out receipts/first_wave/formal_math_premise_retrieval
Microcosm can show a real formal-math retrieval mechanism in miniature:
a source-available Lean/Std premise index;
public field-haystack term-scored queries;
split-aware eligibility;
context recipe ceilings;
strategy gates;
redacted validation result records.
How retrieval scoring works
Each premise row contributes five inspectable fields to the haystack: its premise id, namespace, declaration name, statement excerpt, and a list of retrieval terms. A query carries its own terms, a data split, an optional strategy id, a context recipe, and the public premise ids it is expected to return.
Scoring is term overlap, computed per query. Both the query and each premise are tokenised into lowercase word counts. A premise is only considered if the query's split appears in that premise's allowed_for_split list, which is how test-split leakage is kept out at the structural level rather than by trust. For each eligible premise the score is the summed minimum count of every shared token across the five fields, so a term that appears in both the query and the premise contributes as many points as the smaller of the two counts. A premise that also carries the query's strategy id as a tag gets a single extra point. The ranked list is sorted by score descending, ties broken by premise id, and the top of that list up to the query's top_k is taken as the retrieval.
The retrieval is then graded against itself. Each query declares the public premise ids it should surface, and the component computes recall as the fraction of those expected ids that actually landed in the shortlist. A query that declares expectations but misses any of them blocks the run. In the first-wave fixture this is eleven premises and four queries, scoring forty-four considered candidates in total, and every query is expected to reach full recall.
The failure mode this guards against is a premise-selection result that looks good because it cheated. The five negative-case inputs each encode one such shortcut: a premise index that ships a proof body, a query that lists the oracle premise ids it is "meant" to find, a query that tunes on test-split truth, a context recipe that blows past the byte budget, and a query naming a strategy id outside the allowed set. The run is required to observe all five rejections; if any expected rejection is missing, the whole fixture is blocked rather than passed. Recall over copied real metadata is the positive signal; the refusals are what keep that signal honest.
Prior Art Grounding
This component is grounded in premise-selection and retrieval-augmented theorem proving work. LeanDojo is the closest modern anchor because it couples Lean interaction with retrieval-augmented premise selection. Earlier theorem-proving environments such as HOList and GamePad also motivate extracting proof-state or premise metadata for learning-assisted theorem proving.
Microcosm borrows the retrieval accounting pattern: premise ids, namespaces, statement excerpts, retrieval terms, split eligibility, context budgets, and strategy gates must be inspectable before premise-retrieval claims are admitted. It does not run Lean/Lake or expose proof bodies.
Negative Cases
premise_index_proof_body_forbidden
query_oracle_ids_forbidden
test_split_tuning_attempt
context_recipe_budget_overflow
unknown_strategy_id
Reader Evidence Routing
Start with the JSON Bundle Binding to identify the source record, generated instance, proof boundary, and scope limit.
Use Structured Lattice Bindings for navigation; the generated JSON row is the authority for relationship counts and dependency state.
Use Runtime Surfaces and Result record Expectations when checking metadata coherence, redaction, leakage checks, and source-available bundle behavior.
Use Negative Cases, Scope limit, and Scope limit together before admitting any formal-math public claim.
The component proves only that public retrieval metadata is internally coherent and leakage-checked. The deferred formal_math_lean_proof_witness boundary remains unchanged.
Scope limit
This module supports only the reader-verifiable claim that public premise metadata, retrieval terms, split eligibility, strategy gates, and redacted result records are coherent and leakage-checked. It does not run Lean or Lake, prove formal-result correctness, expose proof bodies, authorize oracle-needed premise ids, tune on test split truth, use external model services, approve public sharing, or expand the deferred Lean proof-witness boundary.
Lean/Std Premise IndexThe Lean/Std premise index validates a copied public Lean/Std descriptor catalog plus Ring2 premise-retrieval source bodies without claiming proof, Mathlib, Lean/Lake, provider, launch, or theorem-correctness authority.
Lean/Std Premise Index is the source-open formal-math catalog component for Microcosm. It imports a premise descriptor index, validates eleven Lean/Std premise rows across Nat, Bool, List, and Iff namespaces, checks six copied body modules through a source-module manifest, observes Mathlib/proof-body/oracle/test-split/source-ref negative cases, and writes metadata-only result records that make premise system inspectable without turning metadata into proof authority.
Scope limit Copied public Lean/Std descriptor index and Ring2 premise-retrieval source result records only; no Lean/Lake execution, Mathlib authority, proof-body import, oracle-needed premise authority, external model access, benchmark claim, launch-scope decision, publishing-scope decision, source-file changes, or theorem-correctness claim.
lean_std_premise_index is the closed public premise-index lane for the formal-math slice. It validates premise metadata and selected Ring2 premise-retrieval source result record bodies that a cold reader can inspect without importing Mathlib, exposing proof bodies, or relying on private source run state.
Purpose
A premise index is the catalogue a theorem-proving system reads before it tries to prove anything: a list of the named lemmas and definitions it is allowed to cite, with enough metadata to retrieve the relevant ones. This component answers a narrower question. Given that such an index already exists inside a private Ring2 benchmark run, can a cold reader inspect its public shape and be sure that what they are reading is a faithful copy of the real thing, and not a separate hand-written stand-in?
The answer rests on one design choice that is worth noticing. The validator does not just describe eleven premise rows; it opens the declared source artifact from the Ring2 premise-retrieval run, recomputes its SHA-256, and checks every public row against the matching source row by premise_id. The only permitted difference is a path rewrite: a raw Lean toolchain path becomes a public lean-toolchain://.../Init/... reference, so the reader sees where a lemma lives in the standard library without seeing a private filesystem. If the public catalogue ever drifts from the source it claims to copy, the digest or the row-signature comparison fails and the result record is blocked.
The interesting tension is the line between a useful index and a leaked answer key. A premise index for a benchmark is one edit away from telling a solver exactly which lemmas it needs. So the same pass that admits names, namespaces, retrieval terms, and train/dev/test eligibility rejects the things that would turn the catalogue into proof authority: Mathlib references, proof bodies, the oracle-needed premise ids that name the answer, and any flag that authorises tuning on the test split. The catalogue stays inspectable precisely because those are kept out.
Shape
This module is a cold-reader map from a JSON bundle and copied public Lean/Std premise artifacts into metadata-only validation result records. The readable path is bundle -> generated instance/status -> runtime validator -> fixtures and exported source bundle -> tests and result records -> scope limit; none of those projections expands the closed-index boundary.
flowchart TD bundle["core/paper_module_capsules.json paper_module.lean_std_premise_index source basis: source record"] instance["paper_modules/lean_std_premise_index.json generated instance from source record Markdown stays reader projection"] generated["Generated status Mermaid: available_from_capsule_edges Atlas: blocked_until_organ_atlas_owner_lane_binds_edges"] runtime["src/microcosm_core/components/lean_std_premise_index.py run / run_index_bundle / scope_limit"] standard["standards/std_microcosm_lean_std_premise_index.json closed Lean/Std premise-index contract"] fixtures["fixtures/first_wave/lean_std_premise_index/input projection_protocol, premise_index, index_policy, negative cases"] bundle["examples/lean_std_premise_index/exported_lean_std_premise_index_bundle source_module_manifest: 6 copied body modules"] tests["tests/test_lean_std_premise_index.py fixture, manifest, bundle, and runtime-shape checks"] result records["result records/first_wave/lean_std_premise_index result records/runtime_shell/demo_project/components/lean_std_premise_index"] ceiling["Scope limit no Lean/Lake, Mathlib, proof bodies, providers, benchmark authority, source-file changes, public sharing, or launch-scope decision"] bundle --> instance instance --> generated standard --> runtime fixtures --> runtime bundle --> runtime runtime --> tests tests --> result records generated --> ceiling result records --> ceiling
Technical Mechanism
The mechanism is a two-entry validator over copied public artifacts, not a proof engine. run reads the first-wave fixture inputs, opens the declared source premise-index source artifact, verifies the declared source_sha256, normalizes Lean toolchain paths into lean-toolchain://.../Init/... public refs, compares every public row against the source row signature, and then checks the protocol, policy, copied-material contract, namespace coverage, split coverage, negative cases, secret exclusion scan, and scope limit before writing metadata-only result, board, validation, and sign-off result records. run_index_bundle applies the same public boundary to the exported bundle and requires the source-module manifest to verify six copied body-material files by source ref, target ref, digest, line count, byte count, and source-to-target equivalence while keeping body text out of result records.
The proof consumer is therefore concrete and local: tests/test_lean_std_premise_index.py asserts that the validator observes all five negative cases, imports the real Ring2 premise-index source artifact, rejects digest, row-count, row-signature, source-ref, source-module digest, and rehash-body-swap mutations, and validates the runtime-shell bundle shape. The positive fixture carries 11 premise rows across Nat, Bool, List, and Iff; the source-open body floor carries one normalized Lean/Std premise index plus five Ring2 source result record or pattern bodies. This is evidence of a bounded public premise catalog and copied-source manifest, not evidence of Lean formal-result correctness.
The governing lattice is source-backed through the bundle-generated instance: paper_module.lean_std_premise_index explains the lean_std_premise_index component and the two mechanism.lean_std_premise_index.* mechanisms, is governed by concept.formal_math_and_proof_witness_bundle, cites P-1, P-2, P-3, P-6, and P-8, abides by AX-1, AX-2, AX-5, and AX-7, and depends only on paper_module.formal_math_premise_retrieval.
Inputs
projection_protocol.json records source pattern ids, source source refs, public replacement refs, projection result records, omitted material, and copy policy.
premise_index.json carries public metadata rows: premise id, declaration name, namespace, Init/ source ref, retrieval terms, and split eligibility.
index_policy.json keeps the closed-index scope limit explicit.
source_module_manifest.json records six source-open body imports: the normalized Lean/Std premise index plus five exact bodies from the formal-math premise-retrieval pipeline (source result records and graph-pattern bodies) under source_modules/.
Prior Art Grounding
This component is grounded in formal-library indexing and premise-selection work. The Lean mathematical library anchors the library-as-corpus side, while LeanDojo and HOList anchor the need for premise metadata, retrieval splits, and theorem-proving environments that can be inspected by learning systems.
Microcosm borrows the closed-index discipline: premise ids, declaration names, namespaces, source refs, retrieval terms, split eligibility, and source-module digests are public metadata, while proof bodies and oracle-needed ids remain outside the public boundary. It does not import Mathlib or prove theorems.
Negative Cases
The fixture rejects:
Mathlib premise refs;
proof-body leakage;
oracle-needed premise ids;
test-split tuning authority;
namespace rows without Init/ source refs.
These are stable negative cases because the index is intended to be useful without becoming proof authority.
Result records
The validator emits:
lean_std_premise_index_result.json;
lean_std_premise_index_board.json;
lean_std_premise_index_validation_receipt.json;
an sign-off result record under result records/sign-off/first_wave/.
Runtime-shell execution emits exported_lean_std_premise_index_bundle_validation_result.json after checking the source-module manifest, target file digests, line counts, byte counts, and secret-exclusion boundary.
Reader Evidence Routing
Start with the JSON Bundle Binding to identify the source row, generated instance, and scope limit.
Use Structured Lattice Bindings only as navigation evidence; the resolved dependency edge points to the premise-retrieval module and does not expand the closed-index proof boundary.
Use Inputs and Result records when checking whether public metadata, copied body manifests, and runtime-shell validation stayed body-safe.
Use Negative Cases and Scope limit together when deciding whether a proposed public claim exceeds the closed-index boundary.
This module supports only the reader-verifiable claim that public Lean/Std premise metadata, source refs, retrieval terms, split eligibility, and copied source-module digests can be indexed without exposing proof bodies or oracle-needed ids. It does not run Lean or Lake, import Mathlib, prove formal-result correctness, tune on test split truth, use external model services, include launch operations, or certify secret-export safety.
World-Model Projection Drift Control RoomThe world-model projection drift control room validates public metadata-only projection-drift rows and copied source-module bodies without treating projections as source authority or repair authority.
World-Model Projection Drift Control Room is the public projection-drift boundary for Microcosm. It validates eight drift rows, source refs, repair routes, validation refs, target refs, source-module digest evidence, copied world-model and view-quality source bodies, secret-exclusion policy, and eight negative cases while keeping private runtime bodies, model-output data, live repair, source-file changes, automatic doctrine changes, launch, public sharing, and source-authority claims out of scope.
Scope limit Public metadata-only runtime result record and copied source-module evidence only; no private runtime body inspection, source authority, source-file changes, live route repair, automatic doctrine changes, model-output data export, launch-scope decision, publishing-scope decision, or whole-system correctness claim.
world_model_projection_drift_control_room is Microcosm's public projection-drift control component. It turns projected world-model rows into an auditable runtime result record: each row must carry a source signal, source ref, target ref, repair route, validation ref, fact-authority mesh, and explicit scope boundary booleans before the projection can pass.
The mechanism is deliberately narrow. It validates that public, metadata-only projection rows remain tied to named source evidence and rejection policy; it does not claim that the projection is source authority, that a live route was repaired, that private runtime state was inspected, or that Microcosm is public sharing-authorized or launch-authorized.
Purpose
This component exists to answer one question: when a public read model says something has drifted, can that claim still be traced back to a real source artifact, or has the read model quietly started to stand in for the source?
The design choice that makes this more than a shape check is that the supplied drift_rows.json is never trusted as input. The validator recomputes the drift rows from the public runtime result record, then treats the supplied file only as an expected snapshot whose role is recorded as expected_snapshot_not_source_authority. If the snapshot disagrees with the recomputed rows, that is flagged as staleness, not accepted as fact. Each recomputed row is then diffed against a real source-state artifact: a row from the extracted-pattern ledger, or a view-quality action-map lens whose own summary is re-derived from its action rows. A row that cannot be re-derived from source, or whose guard reference or derivation path has changed, moves the verdict to blocked.
The same boundary holds in the other direction. A drift row may name a repair route, but the route stays a label rather than an action: the validator rejects any row that authorises live repair, source-file changes, automatic doctrine changes, or launch. A projection here can describe what is wrong and where to go next without ever being allowed to act on it or to speak for the source it describes.
Telos
Projection drift is the failure mode where a useful read model begins to look like truth. A dashboard row, generated structured source record, route card, or public runtime result record can be correct enough for navigation while still being downstream of a source artifact that owns the actual authority.
This component makes that boundary executable. It accepts public drift rows only when they retain:
a real source signal and source ref
a target ref that names where the projection appears
a repair-route label that remains a route, not a live mutation
a validation ref that can witness the row
a fact-authority record with authority, appearance, derivation, guard, and residual-route fields
metadata-only result record policy and an explicit scope limit
Technical Object
The runtime locus is src/microcosm_core/organs/world_model_projection_drift_control_room.py. The exported public example is examples/world_model_projection_drift_control_room/exported_projection_drift_control_bundle. The accepted first-wave fixture is fixtures/first_wave/world_model_projection_drift_control_room/input.
The component exposes two public validation routes:
The validator recomputes the public projection rows from runtime result records and source artifacts, then compares them with the supplied fixture snapshot. A row passes only when the recomputed projection, supplied snapshot, source-ref evidence, source-state diff, source-module manifest check, copied-body geometry probe, runtime result record witness, and non-public-state exclusion scan all stay inside the public boundary.
The core result payload records:
drift_summary.row_count: 8
source_ref_count: 8
target_ref_count: 8
repair_route_count: 8
validation_ref_count: 8
fact_authority_row_count: 8
guarded_projection_treatment_count: 8
unguarded_duplicate_count: 0
runtime_receipt_witnessed_row_count: 8
source_authority_claim_count: 0
live_repair_authorized_count: 0
source_mutation_authorized_count: 0
automatic_doctrine_promotion_count: 0
The source-state result record evidence is intentionally small and inspectable. The focused test suite expects exactly two source-state evidence classes: extracted_pattern_ledger_row_diff and view_quality_action_map_summary_diff.
Runtime Result record Evidence
The public result record floor is metadata-only. The first-wave result records live at:
The exported-bundle result record records body_import_status: real_runtime_receipt_landed, body_material_status: copied_non_secret_macro_body_landed, body_copied_material_count: 4, body_in_receipt: false, and release_authorized: false. Its scope limit also sets source_authority_claim, source_mutation_authorized, live_route_repair_authorized, automatic_doctrine_promotion_authorized, provider_payload_exported, publication_authorized, and release_authorized to false.
Source-Available Body Floor
The exported bundle includes copied source bodies so a reader can inspect the implementation class without receiving private runtime state in the result record. The source-module manifest is:
Every manifest row is body_copied: true, body_in_receipt: false, classification: copied_non_secret_macro_body, and material_class: public_macro_tool_body, with sha256_match: true. The largest bodies are the Station world-model reducer system/server/world_model.py, the /api/drift endpoint in system/server/main.py, the view-quality action-map builder tools/meta/observability/view_quality_census.py, and its focused source regression test system/server/tests/test_view_quality_census.py.
The body floor is therefore source-available by bundle, not by result record. Result records carry paths, hashes, counts, anchor checks, and verdicts; they do not duplicate private bodies, model-output data, browser UI state, account or browser material, source notes, recipient-send state, or account secret-equivalent payloads.
Mutation and Rejection Contract
The validator is not a shape-only check. The focused test suite mutates the public inputs and requires the verdict to move to blocked when authority or freshness is broken:
missing source refs produce DRIFT_SOURCE_REF_REQUIRED
missing repair or validation refs produce DRIFT_VALIDATION_REF_REQUIRED
Additional source-drift tests cover unwitnessed runtime rows, stale supplied snapshots, mutated runtime result record refs, missing source-ledger rows, source ledger rows without source_refs, view-quality source-file changes, internally consistent fake source refs, and selected-row order drift. These cases matter because a projection can be internally coherent and still lose authority if its source evidence, guard result record, or derivation path changes.
Shape
Source refs
Public runtime result record
public_projection_drift_control_lens.json
expected snapshot, source-linked only
Supplied drift_rows.json
Diagram source
flowchart TD Result record["Public runtime result record public_projection_drift_control_lens.json"] Recompute["Recompute drift rows from selected_pattern_ids + result record rows"] Snapshot["Supplied drift_rows.json expected snapshot, source-linked only"] SourceDiff["Source-state diff extracted-pattern ledger + view-quality action map"] Geometry["View-quality geometry grade via copied view_quality_census.py"] Witness["Runtime result record witness every recomputed row appears in the result record"] Reject["Rejection gates missing/fake refs, private export, source authority, live repair, source-file changes, doctrine changes, launch"] Result records["metadata-only result records first-wave, sign-off, exported bundle"] Ceiling["Scope limit projection evidence only"] Result record --> Recompute Recompute --> Snapshot Recompute --> SourceDiff Recompute --> Witness Recompute --> Geometry Snapshot --> Reject SourceDiff --> Reject Witness --> Reject Geometry --> Reject Reject --> Result records Result records --> Ceiling
Reader Evidence Routing
Read in this order:
Bundle and generated instance: core/paper_module_capsules.json::paper_modules[27:paper_module.world_model_projection_drift_control_room] and paper_modules/world_model_projection_drift_control_room.json.
Runtime source and focused tests: src/microcosm_core/organs/world_model_projection_drift_control_room.py and tests/test_world_model_projection_drift_control_room.py.
First-wave fixture and result records: fixtures/first_wave/world_model_projection_drift_control_room/input, receipts/first_wave/world_model_projection_drift_control_room/, and result records/sign-off/first_wave/world_model_projection_drift_control_room_fixture_acceptance.json.
Exported-bundle evidence: examples/world_model_projection_drift_control_room/exported_projection_drift_control_bundle/ and receipts/runtime_shell/demo_project/organs/world_model_projection_drift_control_room/exported_projection_drift_control_bundle_validation_result.json.
Generated projection evidence: Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges_after_atlas_binding, and the one selective dependency residual preserved by the generated JSON instance.
Prior Art Grounding
This control room watches a world-model projection for drift between what the model expects and what the runtime reports. It draws on the model-monitoring and concept-drift literature, which treats a growing gap between predicted and observed behaviour as an operational signal. Microcosm borrows the drift-as-signal shape over metadata-only result records; the result is fixture-bound monitoring evidence, source-linked only, private runtime inspection, or whole-system correctness.
The component validates metadata-only drift result records and public-source refs. It supports inspection of recorded drift rows; live repair, source control, doctrine changes, model-output export, public sharing, and launch are outside the fixture. It also does not claim that every possible world-model drift source is covered. Its claim is narrower: the named public drift rows are guarded by source refs, target refs, validation refs, fact-authority mesh, copied source body evidence, metadata-only result records, and negative-case rejection.
Scope limit
This module may claim fixture-bound evidence that the component ran over public synthetic inputs and produced the result records and projections described above, reproduced by the validation result records named on this page.
It may not claim more than its bundle scope limit allows: Public metadata-only runtime result record and copied source-module evidence only; no private runtime body inspection, source authority, source-file changes, live route repair, automatic doctrine changes, model-output data export, launch-scope decision, publishing-scope decision, or whole-system correctness claim.
The generated JSON instance reports source_authority: json_capsule, 19 resolved relationship edges, Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges_after_atlas_binding, and one honest selective residual for paper_module.depends_on.paper_module because the bundle does not yet name a sibling dependency module.
Public Reveal WalkthroughThe public reveal walkthrough validates a ten-minute cold-reader path through commands, routes, evidence refs, source-open body imports, negative cases, and scope limits without claiming launch-scope decision or private-system equivalence.
Public Reveal Walkthrough is the source-backed public-entry membrane for Microcosm. It checks the declared reveal steps, runnable command set, evidence refs, claim-floor phrases, secret-exclusion scan, source-open body import manifest, runtime-bundle shape, and four overclaim negative cases while keeping copied bodies out of result records and launch/provider/private-equivalence claims out of scope.
Scope limit Public reveal fixture and exported-bundle result records only; no launch, hosted deployment, public sharing, recipient work, external model access, secret export, private-system equivalence, Lean/Lake execution, source-file changes, or whole-system correctness.
public_reveal_walkthrough is the accepted component that makes Microcosm's public reveal executable instead of descriptive.
It validates a ten-minute cold-reader path:
Compile a project into .microcosm/.
Inspect catalog, patterns, and routes.
Explain one route through patterns, standard pressure, work, events, and evidence.
Open the observatory causal chain before raw JSON drilldown.
Run microcosm intake to see the source projection intake cells connected to spine, reveal, and runtime evidence.
Read the result records and scope limit.
The component reads public fixtures from fixtures/first_wave/public_reveal_walkthrough/input/ and exported runtime input from examples/public_reveal_walkthrough/exported_public_reveal_bundle/.
result records/sign-off/first_wave/public_reveal_walkthrough_fixture_acceptance.json
The reveal path treats microcosm intake as a runtime bridge rather than a private planning note. The command exposes runtime_reveal_import_bridge, keeps formal_math_readiness_extensions visible as a public replacement when its extension board exists, and points back to the source projection intake board without copying private source bodies.
Purpose
A cold reader meeting Microcosm for the first time needs one thing the README cannot give them on its own: proof that the first ten minutes are real and not a tour of screenshots. This component answers a single question. Can a reader who has never seen the system run a short, fixed path from a command to local state, to a route, to the result record and source boundary behind it, with nothing on that path that the system does not actually run?
The validator enforces that path as an accounting floor rather than a narrative. A reveal only passes if it carries at least five steps, four distinct runnable commands, and four evidence refs, and if four overclaim fixtures stay rejected: a launch or hosting claim, a private-data equivalence claim, a step with no evidence ref, and marketing copy with no command behind it. The floor exists because a walkthrough drifts towards a hero pitch the moment it is allowed to. Removing the commands and the result record refs is the easiest way to make a reveal look more impressive and prove less.
The part worth noting is the real-lane witness. The fixture run does not pass on its own paperwork. It is gated on the exported reveal bundle actually running, with its copied source bodies present and digest-verified. If that backing run is missing or blocked, the fixture is marked blocked too, with real_runtime_receipt set to false. So the reveal cannot describe a runnable path while the runnable path is broken underneath it, which is the quiet failure mode of every quick-start guide that says more than it can execute.
Shape
Public Reveal Walkthrough is the source-backed entry membrane for a cold technical reader. It turns the local Microcosm first-run path into a runnable accounting exercise: commands produce local state, routes point at work and events, evidence refs point at result records, and scope limits keep the visual or browser layer from becoming a product or public-sharing claims.
Source refs
JSON bundle
paper_module.public_reveal_walkthrough
Runtime component
public_reveal_walkthrough.py
Diagram source
flowchart TD Bundle["JSON bundle paper_module.public_reveal_walkthrough"] Fixture["First-wave public reveal fixture 10-minute route + negative cases"] Bundle["Exported public reveal bundle 5 copied source bodies"] Runtime["Runtime component public_reveal_walkthrough.py"] Shell["Runtime shell bridge microcosm intake + public reveal view"] Result records["metadata-only result records result, board, validation, sign-off"] Reader["Cold reader route command -> route -> evidence refs -> ceiling"] Ceiling["Scope limit no launch, hosting, provider, or private-system claims"] Bundle --> Runtime Fixture --> Runtime Bundle --> Runtime Runtime --> Result records Runtime --> Shell Shell --> Reader Result records --> Reader Reader --> Ceiling
The runtime shape has five bounded inputs:
the public reveal fixture under fixtures/first_wave/public_reveal_walkthrough/input;
the exported reveal bundle under examples/public_reveal_walkthrough/exported_public_reveal_bundle;
the source-module manifest for copied source bodies;
the component source and focused tests that enforce command, evidence, and negative-case behavior;
the standard and JSON bundle that bind the paper module to the mechanism, source locus, and scope limit.
The proof shape is route-first rather than dashboard-first. A valid reveal shows a command, a selected route, the route explanation through work/events/ evidence, result record refs, evidence-class counts, and the scope boundary beside any impressive total. Generated cards, observatory views, and browser/video boards are presentation layers over that accounting path.
The negative-case shape is part of the floor. launch or hosting overclaims, private-data equivalence, missing evidence refs, and marketing-only reveal material must remain rejected. If those refusals stop appearing, the reveal is no longer bounded enough for a cold reader.
The source-open shape is also bounded. The exported bundle carries five copied public bodies, and the manifest verifies exact-copy relation, digests, material classes, and metadata-only result records.
Evidence/accounting:
Bundle authority: core/paper_module_capsules.json::paper_modules[paper_module.public_reveal_walkthrough] sets source_authority: json_capsule, binds the component, binds mechanism.public_reveal_walkthrough.validates_public_reveal_walkthrough, and resolves src/microcosm_core/organs/public_reveal_walkthrough.py.
Generated instance: paper_modules/public_reveal_walkthrough.json reports source_authority: json_capsule, Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges_after_atlas_binding, 20 relationship edges, and a resolved paper_module.depends_on.paper_module edge to paper_module.first_screen_composition_root because the reveal path spends the first-screen composition contract before deeper route/evidence drilldown.
Runtime and shell consumers: src/microcosm_core/organs/public_reveal_walkthrough.py exposes run, run_reveal_bundle, _source_module_manifest_result, _source_open_body_import_summary, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, and PUBLIC_SAFE_SOURCE_BODY_CLASSES. src/microcosm_core/runtime_shell.py routes the exported reveal bundle through public_reveal_walkthrough.run_reveal_bundle and publishes the public_reveal_view runtime lens.
Result record and test floor: receipts/first_wave/public_reveal_walkthrough/public_reveal_walkthrough_result.json, ten_minute_reveal_board.json, public_reveal_validation_receipt.json, and result records/sign-off/first_wave/public_reveal_walkthrough_fixture_acceptance.json are metadata-only evidence. tests/test_public_reveal_walkthrough.py checks the fixture path, exported-bundle path, source-module digest validation, negative cases, and public-relative result record posture.
Claim boundary: standards/std_microcosm_public_reveal_walkthrough.json, the generated structured source record, and this page limit the module to public reveal walkability, route/evidence accounting, exact-copy public source-body import evidence, negative-case rejection, and metadata-only result records. They do not include launch operations, hosted deployment, public sharing, recipient work, external model access, secret export, private-system equivalence, source-file changes, Lean/Lake execution, or whole-system correctness.
Source-Backed Mechanism
The source mechanism is mechanism.public_reveal_walkthrough.validates_public_reveal_walkthrough in core/mechanism_sources.json.
The runtime locus is src/microcosm_core/organs/public_reveal_walkthrough.py. The source symbols that matter for cold-agent drilldown are:
run
run_reveal_bundle
_source_module_manifest_result
_source_open_body_import_summary
EXPECTED_NEGATIVE_CASES
AUTHORITY_CEILING
PUBLIC_SAFE_SOURCE_BODY_CLASSES
The governing standard is standards/std_microcosm_public_reveal_walkthrough.json. Its paper_module_contract binds this Markdown module to core/paper_module_capsules.json#paper_module.public_reveal_walkthrough and to the mechanism row above.
The atlas source row is intentionally not claimed as complete in this pass: core/organ_atlas.json is the source surface that must later receive paper_module_ref, mechanism_refs, and code_loci for this component. The re-entry capture is cap_quick_public_reveal_atlas_edge_population_wait_147e39c7a896.
Source-Open Body Imports
The exported reveal bundle carries five copied source bodies under examples/public_reveal_walkthrough/exported_public_reveal_bundle/source_modules/. The authority manifest is examples/public_reveal_walkthrough/exported_public_reveal_bundle/source_module_manifest.json.
The public component source body that validates reveal commands, claims, digest evidence, and metadata-only result records.
All five rows are exact-copy imports, body_in_receipt=false, and digest checks must pass before the exported reveal bundle can count as source-backed. Result records may name refs, hashes, counts, and verdicts; they do not embed copied body text.
First Commands
From microcosm-substrate/, the first fixture command is:
The reveal board should not ask a cold reader to decode evidence-class numbers from context. When the walkthrough shows source-open body material counts, verified import counts, subprocess witnesses, algorithmic projection counts, or rows with source imports, it should pair each number with the evidence class and the scope boundary:
Counts prove that the public route exposes an inspectable accounting surface.
Counts do not prove launch-scope decision, whole-system correctness, or equal evidence depth across every component.
A small high-authority count is stronger than a large low-authority count for the claim it actually covers.
Generated or projected rows are reveal handles; source files, validators, result records, and scope limits remain the proof surfaces.
This keeps the public reveal from becoming a dashboard of impressive totals. The first reveal task is to show how a reader can move from number to result record to source boundary without crossing into private bodies, model-output data, account or browser state, or launch claims.
Reveal First View
The reveal board should open with the same compression grammar as the first-screen card, then widen only after the reader has a route to inspect:
Restate the bounded claim frame.
Show the command that produced the local state.
Show one route explanation with result record refs.
Show the evidence-count legend beside the result record refs.
Show the scope limit before any totals, drilldowns, or observatory links.
This gives video-first or browser-first readers a visible artifact without turning the reveal into a marketing hero. Motion, screenshots, and observatory views are allowed presentation layers only when the same evidence legend, scope boundary, and result record refs remain on the first view.
Discipline In The Reveal
The reveal should make discipline legible as prevented failure, not as a wall of policy labels. Before showing totals or motion, the board should pair each impressive-looking artifact with the boundary that keeps it honest:
Reveal artifact
Boundary shown beside it
What the boundary prevents
Local .microcosm/ state
source_files_mutated=false plus route/work/event/evidence refs.
Reading a local demo as source-file changes, hosted launch, or external model service.
Body-import counts
verified_macro_body_import rows with validator or result record refs.
Reading copied public material as private-system equivalence.
Projection counts
Source-coupling and generated-row scope boundaries.
Reading generated cards as source authority or domain proof.
Observatory views
Compact endpoint first, full model as drilldown.
Letting browser motion replace command, result record, and evidence-class checks.
Doctrine constraints
Failure mode or scope boundary beside the constraint.
Reading governance as ceremony rather than as a specific overclaim guard.
If the reveal cannot show those boundaries on the first view, it should defer the visual flourish and keep the compact result record-backed route visible instead.
Prior Art Grounding
The public reveal path is grounded in first-run CLI and progressive-disclosure practice. The Command Line Interface Guidelines motivate a single runnable command, examples, discoverable next steps, and machine-readable output. Nielsen Norman Group's progressive disclosure pattern motivates showing the bounded first route before expanding into full observatory or JSON drilldowns.
The reveal's evidence walk also borrows from provenance and tracing patterns: W3C PROV for moving from artifact to source and result record, and OpenTelemetry traces for representing causal chains as inspectable linked work. Microcosm applies those patterns to a local walkthrough so the visual board remains evidence accounting, not a launch or maturity claim.
Browser/Video Reveal Board
The reveal board is the public visual candidate for a 60-second walkthrough. It must therefore be more than raw JSON, but it must still be less than a product claim. The first browser/video frame should show:
The command that produced the local state.
The selected route and one-line route reason.
The route explanation through work, events, evidence, and result record refs.
The evidence legend, including evidence class and scope boundary.
The compact observatory or first-screen endpoint used for the board.
The scope limit before totals, motion, or full-model drilldown.
Motion is allowed to make the causal order easier to inspect: command to local state, local state to selected route, selected route to work/event/evidence, and evidence to result record or validator. Motion is not allowed to displace the command, result record/evidence ref, scope boundary, or scope limit from the first view.
The board should end by offering exactly three next steps: reader-specific branch, result record drilldown, and full observatory JSON. That keeps the visual surface from expanding into a second README while still making the public reveal inspectable by readers who will not start in the terminal.
The validated claim is narrow:
> Microcosm turns a repo into a local operating system: patterns, routes, > work transactions, events, evidence, and explanations.
Negative fixtures reject launch or hosting overclaim, private-data equivalence, missing evidence refs, and marketing-only reveal material without runtime commands.
Reader Evidence Routing
Start with the first commands and the JSON Bundle Binding to identify the fixture, exported bundle, source record, mechanism row, standard, and result record surfaces.
For behavior questions, read src/microcosm_core/organs/public_reveal_walkthrough.py and tests/test_public_reveal_walkthrough.py before trusting this prose.
For source-open body questions, read the exported bundle's source_module_manifest.json; it is the evidence for exact-copy relation, digest match, material class, and metadata-only result record posture.
For visual or browser walkthrough questions, read the evidence legend, result record refs, scope boundary, and scope limit before reading totals, observatory links, or motion as meaningful.
Treat generated atlas docs, generated coverage projections, generated result records, copied-body presence, and browser/video boards as navigation or validation projections. They do not become source authority for launch, hosting, provider, private-system-equivalence, or whole-system claims.
This paper module describes public reveal walkthrough validation only. It excludes launch, hosted deployment, public sharing, recipient work, external model access, secret export, private-system equivalence, Lean/Lake execution, source-file changes, or whole-system correctness.
Generated atlas docs, generated coverage projections, generated result records, copied-body presence, browser/video boards, and impressive evidence totals are source-linked only. The source authority remains with the standard, bundle, mechanism row, component source, source-module manifest, validators, and result record refs named above.
Scope limit
This module may claim a bounded public reveal walkthrough over the local fixture and exported bundle: runnable commands, selected route explanation, work/event/evidence refs, source-open body import manifest checks, evidence legend, negative-case refusals, metadata-only result records, and scope limits. A diagram view is generated for this module; an atlas card is a staged exercise pending atlas owner-lane binding. One selective dependency remains open and requires a governed bundle update to resolve.
It does not claim launch-scope decision, hosted deployment, publishing-scope decision, recipient work, external model service, secret export, private-system equivalence, Lean/Lake execution, source-file changes, or whole-system correctness. Visual boards, screenshots, observatory motion, copied-body counts, and generated cards remain presentation or navigation projections over the result record path.
Standards Meta DiagnosticsTerminal public coverage diagnostic: verifies every accepted component stays mapped to a standard, runtime contract, result record, and scope limit.
Checks accepted adapter-backed components against standards_inventory/organ_runtime_contracts/diagnostic_policy; rejects 5 boundary failures (missing standard_id/standard_ref, missing inventory row, missing result record ref, launch/provider/public sharing overclaim, private-body leak); secret_exclusion_scan with body_in_receipt:false and synthetic_receipt_standin_allowed:false.
Scope limit Projection-only diagnostic; never source authority for core/standards_registry.json, no source-file changes, no provider/launch-scope decision, no whole-system correctness.
standards_meta_diagnostics is the terminal public coverage diagnostic for the Microcosm runtime spine. It checks that accepted adapter-backed components remain mapped to standards, runtime contracts, result records, and explicit scope limits before a cold reader trusts the spine as coherent.
It consumes public standards_inventory.json, organ_runtime_contracts.json, and diagnostic_policy.json inputs backed by registry refs, runtime commands, sign-off result records, and the exported diagnostics bundle. Its result record contract is source-open by default: secret_exclusion_scan proves that secrets, account or browser material, model-output data bodies, raw operator bodies, and account secret-equivalent live-access material are excluded, while public_runtime_refs point at the real standards, component, sign-off, fixture, bundle, and paper-module system. Bodies are not inlined into JSON result records, so the positive evidence uses body_in_receipt: false, real_runtime_receipt: true, and synthetic_receipt_standin_allowed: false.
The component rejects five boundary failures:
accepted component rows without standard_id or standard_ref
accepted components missing from the standards inventory
accepted component rows without result record refs
launch, provider, public sharing, secret export, trading/advice, or whole-system correctness overclaims
private source bodies or model-output data bodies in public diagnostics
Purpose
A spine of accepted components is only coherent if each component is still attached to the things that make it accountable: a standard that describes it, a runtime contract that runs it, a result record that records its last verdict, and an explicit statement of what it is not allowed to claim. As the spine grows, those four attachments drift out of step one component at a time, and the drift is silent. A new component can be accepted into the runtime while its standard file, registry row, or result record ref is never added. Nothing breaks; the gap just sits there until a reader trusts the spine and finds a hole.
This component answers a single question: does every accepted component still resolve to a standard, a runtime contract, a result record, and an scope limit, with no extra and no missing entries? It treats the answer as a graph-closure check rather than a written audit. The accepted-component list, the standard rows, the runtime-contract rows, and the result record refs must agree on exactly the same set of components. Any component that appears in one surface but not another becomes a structured finding with a named error code, not a paragraph of prose.
The unusual choice is that the diagnostic refuses to grow its own authority. It projects its positive coverage from the live registry rather than a checked-in list, so a stale example cannot quietly become the thing the spine is measured against. It carries five negative fixtures that must each surface their expected failure, so the checker is itself falsifiable. And its result records deliberately hold refs, counts, hashes, and verdicts rather than the bodies they describe, so a coverage report can be read in the open without exporting private source.
Technical Mechanism
standards_meta_diagnostics is a public consistency validator over three finite surfaces: a standards inventory, component runtime contracts, and diagnostic policy. The positive path either reads those exported JSON inputs or projects them from the live public registry, then requires the accepted-component list, the standard rows, the runtime-contract rows, and the result record refs to agree on the same component set. This is a graph-closure check, not a narrative audit: an accepted component without a standard ref, registry-backed standard row, runtime step, validator command, or result record ref becomes a structured finding.
The mechanism has four guarded stages:
run loads standards_inventory.json, organ_runtime_contracts.json, and diagnostic_policy.json, or projects the positive rows from live public registry state when the caller asks for live positives.
The validator checks every accepted component row against a resolving std_microcosm_<organ_id> standard, the standards registry entry, the runtime shell step, a non-empty validator command, and non-empty result record refs with body_in_receipt: false.
Five negative fixtures exercise the expected boundary failures: missing_standard_ref, unmapped_accepted_organ, missing_receipt_ref, release_overclaim, and private_source_leakage.
The exported-bundle path revalidates the same shape through source_module_manifest.json, exact source-module digest checks, source-open body-import accounting, secret_exclusion_scan, and the projection-only AUTHORITY_CEILING.
The output card deliberately omits the covered-component list, findings, secret-exclusion detail, source refs, public runtime refs, scope boundary, scope limit, and source-module summary from the compact payload. Those keys remain in the full result record, which keeps the reader-facing card inspectable without turning it into a private-body export.
core/paper_module_capsules.json::paper_modules[29:paper_module.standards_meta_diagnostics] is the JSON authority row. It names the component and mechanism subjects, the resolved code locus src/microcosm_core/organs/standards_meta_diagnostics.py, and the projection-only scope limit.
paper_modules/standards_meta_diagnostics.json::paper_module_payload.source_authority is json_capsule; generated_projections.mermaid.status is available_from_capsule_edges; generated_projections.atlas_card.status is linked_from_capsule_edges; relationships.edges currently has 11 edges.
organs/standards_meta_diagnostics.json::organ_payload.source_registry_row records status: accepted_current_authority, the validator command, and the generated result record refs; its claim_ceiling keeps the diagnostic scoped to the declared public contract.
src/microcosm_core/organs/standards_meta_diagnostics.py names INPUT_NAMES, NEGATIVE_INPUT_NAMES, EXPECTED_NEGATIVE_CASES, PUBLIC_RUNTIME_REFS, and AUTHORITY_CEILING, which are the runtime contract this reader section summarizes.
tests/test_standards_meta_diagnostics.py asserts the fixture and exported bundle paths, the five expected negative cases, source-module digest checks, body_in_receipt: false, real_runtime_receipt: true, and synthetic_receipt_standin_allowed: false.
result records/sign-off/first_wave/standards_meta_diagnostics_fixture_acceptance.json records status: pass, accepted_organ_count: 77, standard_mapping_count: 77, runtime_contract_count: 77, five expected error codes, secret_exclusion_scan.blocking_hit_count: 0, and the scope boundary that the diagnostic excludes launch, providers, registry mutation, formal-result correctness, or whole-system correctness.
Reader Evidence Routing
Start with the JSON Bundle Binding to identify the source record and the projection-only scope limit before treating the diagnostic as evidence.
Use Structured Lattice Bindings to understand which wiring is resolved and which dependencies remain pending. Pending dependencies are honest residuals, not hidden failures.
Use Validation Result record Path for reproducibility: focused pytest exercises the diagnostic policy and negative cases; the corpus check verifies paper-module parity.
Treat secret-exclusion and public-runtime refs as result record evidence about public projection consistency. They do not mutate standards, include launch operations, expose private source material, or prove whole-system correctness.
Named Proof Consumers
tests/test_standards_meta_diagnostics.py::test_standards_meta_diagnostics_observes_negative_cases is the fixture consumer. It proves that the positive public inputs cover the accepted component set and that the five expected negative cases surface their named error codes.
tests/test_standards_meta_diagnostics.py::test_standards_meta_diagnostics_bundle_validates_runtime_shape is the exported-bundle consumer. It checks the bundle id, covered component set, source-module manifest status, source-open body-import counts, body_in_receipt: false, and the false scope limit flags.
tests/test_standards_meta_diagnostics.py::test_standards_meta_diagnostics_rejects_source_module_digest_mismatch, ::test_standards_meta_diagnostics_rejects_partial_source_module_digest_mismatch, and ::test_standards_meta_diagnostics_rejects_partial_target_module_digest_mismatch are the digest-drift consumers. They make copied source-module bodies falsifiable instead of relying on manifest prose.
tests/test_standards_meta_diagnostics.py::test_standards_meta_diagnostics_source_modules_are_exact_macro_body_imports is the exact-copy consumer for the three public source-body imports named in the exported bundle.
tests/test_standards_meta_diagnostics.py::test_standards_meta_diagnostics_receipts_use_secret_exclusion is the public/private boundary consumer. It checks that result record evidence uses the secret-exclusion lane and keeps private bodies out of public diagnostics.
tests/test_standards_meta_diagnostics.py::test_standards_meta_diagnostics_input_builder_tracks_live_registry and the live-positive projection tests are the registry-freshness consumers. They keep fixture inputs tied to public registry state instead of allowing a stale checked-in example to become silent authority.
Prior Art Grounding
This component is grounded in schema- and contract-validation practice rather than in a claim that diagnostics create authority. JSON Schema treats a schema as a machine-readable vocabulary for validating structured JSON data, and OpenAPI uses interface descriptions so consumers can understand an API without reading source code or observing traffic. The component imports that pattern into Microcosm's launch boundary: standards, adapter contracts, result records, and scope limits are checked as public projections, while the diagnostic remains bounded evidence about consistency rather than a new source of truth.
Prior-art anchors:
JSON Schema validation and structured-data constraints: https://json-schema.org/
OpenAPI interface descriptions and conformance expectations: https://spec.openapis.org/oas/latest.html
This module can claim that public standards inventory, runtime contracts, accepted-component refs, result record refs, diagnostic policy, and secret-exclusion checks are consistently projected into a reader-facing diagnostics result record. It cannot claim standards-registry mutation authority, provider authority, launch-scope decision, publishing-scope decision, private source export, or whole-system correctness.
Scope limit
This is a projection-only diagnostic. It does not become source authority for core/standards_registry.json, change source files surfaces, expose private source material, authorize providers, include launch operations, or prove whole-system correctness.
Source and projection details
Source-Open Body Floor
The public diagnostics bundle is source-open as evidence about refs, policies, runtime contracts, and result records. It may expose standards inventory rows, component runtime contract rows, diagnostic policy rows, sign-off result record refs, fixture refs, bundle refs, secret-exclusion scan verdicts, and public runtime refs.
It must not inline private source bodies, model-output data bodies, source notes, account or browser material, account secret-equivalent live-access material, launch-send state, or private source-root bodies. The positive result record evidence therefore stays at body_in_receipt: false, real_runtime_receipt: true, and synthetic_receipt_standin_allowed: false.
Finance Forecast Evaluation SpineForecast-evaluation component: Diebold-Mariano / Hansen-SPA / stationary-bootstrap stats over synthetic fixtures with typed refusal discipline; no market authority.
Runs admissible forecast-evaluation statistics (Diebold-Mariano loss-differential, Harvey-Leybourne-Newbold small-sample correction, Hansen SPA with recentering, Politis-Romano stationary bootstrap, Bartlett HAC long-run variance, purged/embargoed CV) over synthetic market-shaped fixtures and copied source bodies; refusal discipline returns typed refusals (horizon>=sample length, too-small samples, leakage-prone splits, missing SciPy, advice-shaped claims) instead of crashing; computed-statistic and refused-because-inadmissible are both valid validator outcomes.
Scope limit Synthetic market-shaped fixtures only; not investment or trading decisions, no live market data, no track record or performance claim, mutates no optimizer, SciPy absence is a typed HLN refusal.
finance_forecast_evaluation_spine is a Crown Jewel import component with real runnable system and a strict public scope limit. It consumes synthetic public fixtures, copied source source bodies, and source manifests that verify sha256 digests, line counts, required anchors, secret-exclusion status, and result record body omission.
Purpose
Comparing two forecasting models is harder than it looks. A lower average loss does not establish that one model genuinely predicts better, because losses are autocorrelated, samples are short, and a careless split can let a model peek at the answer. This component exists to carry the statistical machinery that economists use to answer that question carefully, and to do so without ever claiming the machinery has been pointed at a real market.
The single question it answers is narrow: given two paired loss series over a synthetic fixture, can the difference in predictive accuracy be called significant under an admissible test, or must the test refuse? It computes the Diebold-Mariano loss-differential statistic with a Bartlett HAC long-run variance, the Harvey-Leybourne-Newbold small-sample correction, Hansen's test for superior predictive ability with recentering, a model confidence set, and a Politis-Romano stationary bootstrap.
Failure is handled explicitly. The Harvey-Leybourne-Newbold correction returns its computed statistic, but when SciPy is absent it refuses the p-value with a typed reason rather than fabricating one. The same discipline rejects a horizon that reaches the sample length, a sample too small to estimate anything, a time split that lets the evaluation date sit at or after the event window, and any policy flag that smuggles in advice or a track-record claim. A refusal is recorded as a first-class validator outcome, not an error: "we declined to answer" is itself a valid result.
The guards run before the statistics. If a boundary policy or a leakage check fails, the result record is blocked before any statistics subprocess starts, so an inadmissible request never produces a number that could be misread as a result.
What it proves: synthetic fixture forecast-evaluation statistics only; no investment-related actions, live market data, track record, or performance claim.
How to run it:
microcosm finance-forecast-evaluation-spine run --input fixtures/first_wave/finance_forecast_evaluation_spine/input --out receipts/first_wave/finance_forecast_evaluation_spine
Negative cases covered by the fixture manifest: finance_hln_dependency_refusal, finance_leakage_lookahead_split, finance_no_advice_overclaim.
Source provenance is anchored by examples/finance_forecast_evaluation_spine/exported_finance_eval_bundle/source_module_manifest.json and result records carry refs, digests, counts, verdicts, and scope boundaries only.
Shape
Source refs
Runner
finance_forecast_evaluation_spine.run
Diagram source
flowchart TD Fixture["Synthetic fixture inputs family_loss_matrix, paired_loss_series, finance_boundary_policy, projection_protocol"] Source["Copied finance modules plus source manifest digests"] Runner["finance_forecast_evaluation_spine.run"] Guards["Guards run first policy no-advice flags, lookahead-split leakage check"] Blocked["Blocked result record statistics subprocess never starts"] Branch{"Admissible and exported bundle?"} Subprocess["Statistics subprocess DM/HAC, Hansen SPA, MCS, stationary bootstrap, HLN refusal"] Standalone["Standalone statistics contract no live source-root subprocess"] Result record["Result records refs, hashes, counts, verdicts, scope boundaries; body_in_receipt false"] Fixture --> Runner Source --> Runner Runner --> Guards Guards -->|"boundary fails"| Blocked Guards -->|"boundary passes"| Branch Branch -->|"first-wave fixture"| Subprocess Branch -->|"exported bundle"| Standalone Subprocess --> Result record Standalone --> Result record Blocked --> Result record
Technical Mechanism
The module is a deterministic forecast-evaluation harness around CrownJewelSpec, not a finance product. The spec fixes four required fixture inputs (family_loss_matrix.json, paired_loss_series.json, finance_boundary_policy.json, and projection_protocol.json), names the three required negative cases, binds the source manifest, and restricts the source-open import to required anchors in model_selection_stats.py, spa_statistics.py, loss_differentials.py, and family_loss_matrix.py.
At runtime, run delegates to run_crown_jewel_organ with evaluate and evaluate_negative_case. evaluate loads the synthetic loss matrix, paired loss series, and boundary policy, then calls _evaluate_payloads. That function first enforces the policy and lookahead-split guards; if either boundary fails, it returns a blocked result record before any statistics subprocess can run. Only after those guards pass does it run the copied statistics modules or, for the exported bundle path, use _standalone_exported_statistics_contract so the standalone public bundle does not depend on a live source-root subprocess.
The statistical witness is therefore deliberately narrow: Reality Check, Hansen-SPA, MCS, Diebold-Mariano/HAC, stationary bootstrap, and the HLN refusal are result record fields over the synthetic fixture. The same mechanism treats finance_hln_dependency_refusal as a typed negative case when SciPy support is absent, treats policy overclaims as FINANCE_NO_ADVICE_OVERCLAIM, treats temporal leakage as FINANCE_LOOKAHEAD_SPLIT_FORBIDDEN, and keeps copied source bodies out of result records with body_in_receipt: false.
Reader Evidence Routing
Read the positive fixture as a small statistical witness, not as a market result. The current result record has status: pass, sample_size: 40, candidate_count: 3, reality_check.status: computed_bootstrap, spa.status: computed_bootstrap, mcs.implemented: true, paired_loss.diebold_mariano.status: computed_hac_normal_approximation, and a five-replicate stationary-bootstrap witness. Those fields show that the component can exercise the copied forecast evaluation code paths on public synthetic data.
Read the negative floor as equal evidence. The observed negative cases are finance_hln_dependency_refusal, finance_leakage_lookahead_split, and finance_no_advice_overclaim, with stable error codes FINANCE_HLN_TYPED_REFUSAL_REQUIRED, FINANCE_LOOKAHEAD_SPLIT_FORBIDDEN, and FINANCE_NO_ADVICE_OVERCLAIM. The HLN case refuses because SciPy is unavailable for the t-distribution; that is the intended scope limit, not a missing p-value to fill in by hand.
Read source-open evidence through the manifest, not through result records. The source bundle carries 13 copied finance modules; result records carry references, hashes, counts, verdicts, and scope boundaries, and keep body_in_receipt: false. The local claim therefore stays at "synthetic fixture forecast-evaluation statistics and typed refusals." It does not become investment-related actions, live-market data, a track record, performance proof, optimizer authorization, or launch-scope decision.
Forecast-Evaluation Discipline
This component is evidence that the Microcosm can carry professional forecast evaluation logic without pretending to carry market authority. The admissible statistics include Diebold-Mariano loss-differential testing, the Harvey-Leybourne-Newbold small-sample correction, Hansen's SPA test, a Politis-Romano stationary bootstrap, Bartlett HAC long-run variance, and purged/embargoed cross-validation in the Lopez de Prado style.
The important doctrine is refusal discipline. Horizons greater than or equal to sample length, samples too small to estimate a statistic, leakage-prone splits, missing SciPy support, and advice-shaped claims must return typed refusals instead of crashes or meaningless numbers. Hansen-style recentering of poor or irrelevant alternatives is part of the SPA contract because it is the boundary between a useful superior-predictive-ability test and White Reality Check style over-penalization.
Result records should therefore distinguish "computed statistic" from "refused because inadmissible." Both are successful validator outcomes when the fixture asked for that behavior.
Named Proof Consumers
Runtime fixture consumer: finance_forecast_evaluation_spine.run over fixtures/first_wave/finance_forecast_evaluation_spine/input must produce status: pass, the three observed semantic negative cases, false advice/live-data/performance authority flags, and metadata-only source-manifest result record material.
Exported-bundle consumer: run-finance-forecast-bundle over examples/finance_forecast_evaluation_spine/exported_finance_eval_bundle must validate the 13 copied finance modules by digest and use the standalone statistics contract rather than a live source subprocess.
Focused pytest consumer: tests/test_finance_forecast_evaluation_spine.py must keep the positive statistical fixture, no-advice overclaim, live-market overclaim, lookahead split, semantic-negative-case, standalone-bundle, and digest-mismatch tests green.
Corpus consumer: scripts/build_doctrine_projection.py --check-paper-module-corpus must keep the 98-module Microcosm paper-module corpus valid without hand-editing the generated JSON instance.
Scope limit consumer: any public or dissemination copy must preserve the local ceiling that this is synthetic fixture forecast-evaluation evidence, not investment-related actions, live data, performance proof, optimizer authorization, or launch-scope decision.
Prior Art Grounding
This component is grounded in forecast-evaluation statistics rather than trading systems. The core anchors are the Diebold-Mariano test for comparing predictive accuracy, the Harvey-Leybourne-Newbold small-sample correction for prediction-error tests (DOI reference), Hansen's test for superior predictive ability, and proper-scoring-rule work such as Gneiting and Raftery. The purged/embargoed split discipline also follows the financial ML concern that temporal leakage can make backtests look stronger than they are.
Microcosm borrows the professional evaluation posture: compute admissible statistics when the fixture supports them, return typed refusals when it does not, and keep evaluation separate from advice, live market data, or performance claims.
Finance forecast evaluation spine proves only synthetic market-shaped forecast-evaluation fixture behavior, copied source manifest integrity, metadata-only result records, admissible statistic computation, and typed refusals for inadmissible finance claims. A diagram view and atlas navigation entry are generated for this module, but those navigation projections do not expand the proof. This module is not investment or trading decisions, uses no live market data, proves no track record or performance claim, mutates no optimizer, certifies no trading strategy, and treats SciPy absence as a typed HLN refusal rather than a hidden statistical success.
Source and projection details
Governing Lattice Relation
The generated JSON instance resolves six bundle-derived edges for this module: it explains component finance_forecast_evaluation_spine, explains mechanism mechanism.finance_forecast_evaluation_spine.validates_public_finance_forecast_evaluation_spine, is governed by concept concept.research_and_science_replay_evidence_bundle, is governed by principle P-8, abides by AX-7, and cites the code locus src/microcosm_core/organs/finance_forecast_evaluation_spine.py. Those edges come from core/paper_module_capsules.json::paper_modules[30:paper_module.finance_forecast_evaluation_spine] and the generated structured source record, not from this Markdown prose.
Mechanically, P-8 and AX-7 show up as refusal discipline: an admissible statistic can pass, but advice-shaped policy flags, live-market authority, leakage-prone time splits, source digest mismatch, and fake HLN p-values must block. The concept edge keeps the module in the research/science replay-evidence family, where proof value is a reproducible fixture and source-manifest witness rather than a claim about markets.
Engine Room DemoComposition component: verifies the 14 staged Engine Room jewel targets and their owned bundle surfaces through the public fixture chain; composition contract only.
Wraps the bundles under microcosm_core.engine_room, verifies the 14 controller-selected jewel targets, checks each owned staged bundle surface (module source, fixture input, fixture manifest, paper module, standard, tests), executes the staged demo through the public fixture chain, and observes a negative fixture where an expected target is intentionally absent.
Scope limit Validates the declared public composition contract only; not deployment posture, not private-system equivalence, not a frontier theorem-proving claim, not a complete security proof, not benchmark validation, not launch-scope decision.
engine_room_demo is the accepted Microcosm composition component for the staged Engine Room set. It wraps the bundles under microcosm_core.engine_room, runs the composed demo/audit path, and writes first-wave result records without promoting fixture rows into private-system or launch-scope decision.
Purpose
The Engine Room set is ten separate bundles: a Lean proof-search lab, a metabolism runtime, command singleflight, a generated-projection drift gate, a derived-fact engine, a public-projection leak gate, an egress self-compliance gate, a navigation-fitness benchmark, a bridge-campaign DAG, and an reference knowledge router. Each bundle has its own fixture and result record. This component exists so that a reader does not have to trust ten claims separately. It answers one question: do the ten bundles together cover the fourteen targets the controller asked for, and does each one still own its full surface and run.
A bundle "owns its surface" only when six files exist for it: module source, fixture input, fixture manifest, paper module, standard, and test. The audit checks all six per bundle, runs each fixture through its declared evaluator, and unions the targets the bundles actually declare against the fourteen the controller expected. A passing run means the set is complete and every fixture executed, not that any single bundle is finished or correct.
The design choice worth noting is in the negative case. Rather than compare against a frozen answer key, the negative fixture recomputes the live set of covered targets and fails only when the fixture names a target that is genuinely outside it. That keeps the refusal honest as the bundle set grows: the test cannot drift into agreement with a stale list, because there is no stored list to agree with.
A second deliberate boundary is that the runner reads the shared component registry, sign-off file, and atlas, but never writes to them. It reports whether the composition component is integrated into those shared surfaces as a separate visibility line, and always records shared_registry_mutated: false. Composition coverage and shared-registry integration are kept as two distinct facts, so a green demo cannot quietly imply registry authority it does not hold.
What It Runs
Verifies the 14 Engine Room jewel targets selected by the controller prompt.
Checks the owned staged bundle surfaces: module source, fixture input, fixture manifest, paper module, standard, and tests.
Executes the staged bundle demo through the public fixture chain.
Observes a negative fixture where an expected target is intentionally absent.
Shape
Diagram source
flowchart LR A["Engine Room fixture cases"] --> B["Accepted component wrapper"] B --> C["Controller coverage audit"] C --> D["10 staged bundle evaluators"] D --> E["14 covered jewel targets"] C --> F["Shared surface integration check"] B --> G["Result, board, validation result record"] G --> H["Sign-off result record"] A --> I["Missing-target negative case"] I --> C
The shape is a composition proof over declared public bundles. The wrapper asks the staged Engine Room runner to verify target coverage, surface presence, fixture execution, shared-surface visibility, and the missing-target negative case. It writes public result records and an sign-off result record without exporting private source run state or turning the staged demo into launch-scope decision.
Technical Mechanism
src/microcosm_core/organs/engine_room_demo.py is a result record-writing wrapper around src/microcosm_core/engine_room/demo.py. The wrapper loads one or more fixture cases, calls _evaluate_case for each case, and writes four metadata-only artifacts: result, board, validation result record, and optional sign-off result record. The positive case delegates to audit_controller_coverage; the negative case does not compare against a static answer key, but recomputes the actual staged target set and fails only when the fixture names a target outside that set.
audit_controller_coverage is the mechanism that makes the composition claim specific. It enumerates the ten CAPSULES, unions their declared jewel targets against EXPECTED_JEWEL_TARGETS, checks each bundle's owned source, fixture, manifest, paper module, standard, and test surface, optionally runs the staged bundle exercises through run_demo, and reads registry, sign-off, and atlas ids only as visibility evidence. The resulting result record distinguishes staged bundle completion from shared-registry integration and always reports shared_registry_mutated: false.
run_demo is the execution spine below the audit. It imports each staged bundle module, calls the declared evaluator (evaluate_fixture_dir or validate_fixture_dir), records compact per-bundle status, and summarizes the covered jewel targets. A pass therefore means the selected public fixture chain ran for the declared bundle set and covered the expected target lattice; it does not mean the Engine Room set is deployment-posture, privately equivalent, benchmark-complete, or launch-approved.
Governing Doctrine Relations
The generated structured source record binds this page to concept.import_projection_and_drift_control_bundle, mechanism.engine_room_demo.validates_public_engine_room_demo, and three adjacent Engine Room mechanisms for projection leakage, generated-projection drift, and command singleflight. Its governing principle refs are P-1, P-2, P-3, P-5, P-6, P-8, P-9, P-12, and P-15; its axiom refs are AX-1, AX-4, AX-5, AX-7, AX-8, and AX-11. In this module those refs all converge on one rule: composition evidence must be routed through explicit source, fixture, result record, and projection boundaries before it can support a reader claim.
The ten dependency modules are not decorative neighbors. They are the actual staged Engine Room bundle families consumed by the demo runner: Lean/proof-search, metabolism runtime, command singleflight, generated projection drift, derived facts, public projection leak checks, egress self-compliance, navigation fitness, bridge campaign DAGs, and reference knowledge routing. The bundle edge set is therefore a mechanism lattice over those bounded components, not an invitation to generalize beyond their result records.
Named Proof Consumers
Fixture wrapper consumer: PYTHONPATH=src ../repo-python -m microcosm_core.components.engine_room_demo run --input fixtures/first_wave/engine_room_demo/input --out /tmp/microcosm-engine-room-demo/fixture --sign-off-out /tmp/microcosm-engine-room-demo/sign-off.json --json consumes build_result, the positive controller-audit fixture, the semantic missing-target negative case, result record writing, metadata-only sign-off output, and the module scope limit.
Controller audit consumer: PYTHONPATH=src ../repo-python -m microcosm_core.engine_room.demo audit --root . --json consumes the ten-bundle inventory, 14-target coverage set, staged surface checks, shared-surface visibility readback, and the no-shared-mutation boundary.
Staged bundle execution consumer: PYTHONPATH=src ../repo-python -m microcosm_core.engine_room.demo run --root . --json consumes each public bundle evaluator and proves the composition runner can execute the declared Engine Room fixture chain without touching shared registry, sign-off, atlas, or generated projection surfaces.
Focused regression consumer: PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_engine_room_demo.py tests/test_engine_room_demo_organ.py -q pins the bundle inventory, CLI JSON output, controller audit, semantic negative case, result record writer, public-relative fixture refs, and private-path redaction floor.
It is a read-only result record for the Markdown slice, not permission to hand-edit generated projections.
Reader Evidence Routing
Read expected_jewel_count: 14 and covered_jewel_count: 14 as controller target coverage for the staged Engine Room set. Read capsule_count: 10 and passed_capsule_count: 10 as successful execution of the selected public fixture evaluators.
Read shared_registry_mutated: false as an authority boundary: the staged runner observes registry, sign-off, and atlas visibility, but it does not mutate those shared surfaces. Read shared_integration_status as a visibility result record, not as permission to alter the shared registry from this page.
Read body_in_receipt: false as the public-copy boundary. Result records can expose counts, target ids, fixture refs, stable error codes, scope limits, and omission-safe summaries; they must not copy private source run state, model-output data, raw operator threads, browser UI material, account secrets, or cloned third-party body text.
Prior Art Grounding
The component borrows from integration-testing and CI composition practice: multiple component checks are assembled into one public demo/audit path, negative fixtures prove refusal behavior, and result records summarize execution without upgrading fixture evidence into launch claims. Useful anchors include:
IBM's integration testing overview, which frames testing around whether composed modules interact as intended.
pytest fixtures, as a common pattern for public synthetic setup and reusable test inputs.
GitHub Actions, as a widely used workflow surface for composing build, test, and publish stages with explicit status.
Microcosm borrows the composed-demo and audit-pipeline shape, but keeps the claim at declared public composition only. It is not deployment posture, private-system equivalence, benchmark validation, a security proof, or launch-scope decision.
Public Command
The CLI alias is:
The fixture manifest names one positive case (positive_controller_audit) and one negative case (missing_expected_target_negative) that expects ENGINE_ROOM_EXPECTED_TARGET_MISSING. The expected component result is status: pass, expected_jewel_count: 14, positive_case_count: 1, negative_case_count: 1, and observed_negative_case_count: 1.
The staged composition runner can also be inspected without writing sign-off result records:
This component validates the declared public composition contract only. It is not deployment posture, not private-system equivalence, not a frontier theorem-proving claim, not a complete security proof, not benchmark validation, and not launch-scope decision.
Agent Memory Temporal-Conflict ReplaySynthetic replay fixture for an agent-memory honesty contract: models scoped-preference episodes and checks temporal-conflict handling; no live memory product.
Public Microcosm projection of an agent-memory honesty contract. Replays synthetic episodes (a scoped preference later contradicted) and validates temporal-conflict resolution, negative cases, non-public-state exclusion, and scope limits with metadata-only result records.
Scope limit Synthetic replay fixture only; not a live memory product, private transcript export, source-authority claim, or launch claim.
This module is the public Microcosm projection of an agent-memory honesty contract. It is a synthetic replay fixture, not a live memory product, private transcript export, source-authority claim, or launch claim.
The fixture models three public episodes: episode A records a scoped preference and a tool-result fact, episode B updates the preference scope and deletes the now-stale fact through conflict-edge and downgrade result records, and episode C replays the task with memory enabled and disabled. The replay is admitted only when ADD, UPDATE, DELETE, and NOOP decisions, metadata-only non-public refs, evidence handles, cold replay refs, and an answer-delta result record line up.
Purpose
This component exists because an agent that remembers can quietly start trusting the wrong row. A user states a preference, the world changes, a later turn contradicts the earlier one, and a naive memory store keeps serving the stale fact as though it were still true. The single question this fixture answers is narrow and checkable: when one memory write supersedes an earlier one, does the record of that conflict actually hold up, or is it just a label?
The unusual choice is that the validator does not trust the labels it is given. A row can declare decision = UPDATE, attach a plausible-looking conflict-edge ref, and still be quarantined. In _apply_conflict_semantic_recompute the checker re-derives the conflict lineage from the raw fields it can verify: episode order, event timestamp, memory priority, and source-trust score. An UPDATE or DELETE that claims to supersede a prior write but is not timestamped after it, or that regresses priority or relies on lower-trust evidence than the write it replaces, is rejected. _apply_temporal_order_checks adds the coarser ordering rule that a conflict edge must land after some earlier accepted write and a replay must land after the conflict it depends on. The label is treated as a claim to be recomputed, not as authority.
The point of the paired memory-on and memory-off replay is the matching discipline on the output side. Memory is only allowed to take credit for a better answer through an explicit evidence handle and a cold-replay result record, so the gain is attributable rather than asserted. The interesting idea here is not a memory product. It is a small, reproducible accounting method for one specific failure: a stale row outranking newer evidence.
Abstract
Agent memory becomes dangerous when a stored row is allowed to outrank later evidence. This module turns that risk into a public, replayable checker: a synthetic three-episode fixture exercises memory ADD, UPDATE, DELETE, and NOOP decisions, then verifies that later conflicts can influence replay only through typed evidence handles, temporal conflict edges, stale-downgrade result records, paired memory-on/off cold replays, and a metadata-only answer-delta result record.
The technical contribution is not "better memory" and not a product claim. It is a narrow public accounting method for temporal memory conflict: memory rows are metadata under test, non-public refs are metadata-only, copied source-open source bodies are digest checked outside result records, and seven negative fixtures prove that common overclaim paths are rejected before any pass result record is written.
Telos
The reader-facing aim is to make a hard memory-honesty boundary inspectable without exporting private memory bodies. A cold reader should be able to answer four questions from files and result records:
Which memory decision was made, and under which public route ref?
Which evidence handle, timestamp, priority, and source-trust score justified the row?
Which prior row was conflicted or downgraded before later replay credit was allowed?
Did the memory-enabled replay use admissible evidence while the paired memory-disabled replay remained available for answer-delta accounting?
The accepted result is a metadata-only memory-conflict result record. It supports only a synthetic fixture-level claim: this replay respected the declared temporal conflict contract under the checked inputs.
Technical Mechanism
The runtime treats memory as public replay metadata, not as authority. The validator loads projection_protocol.json, memory_policy.json, memory_episodes.json, and replay_observations.json; the exported-bundle mode also loads bundle_manifest.json, source_module_manifest.json, and the copied source artifacts listed in the manifest. _build_result combines secret scanning, public trace construction, protocol validation, policy validation, episode validation, replay validation, source-module import validation, negative-case coverage, and the scope limit before a pass status is possible.
The mechanism has five reader-visible gates:
validate_projection_protocol requires source refs, source pattern ids, projection result records, target refs, public runtime refs, target symbols, reimplemented mechanics, omissions, and an explicit denial that private thread bodies were copied.
validate_memory_policy requires ADD, UPDATE, DELETE, and NOOP as the only admitted decision vocabulary and denies live-memory product, transcript export, source-authority, active-injection, provider-call, and launch-scope decision.
validate_memory_episodes turns the five public event rows into accepted or quarantined memory metadata. Each row needs a route ref, decision, synthetic subject id, evidence handle, metadata-only private thread ref, body-export flag, source-authority flag, and active-injection flag. Positive replay credit requires all four decision classes, two conflict-edge refs, stale-downgrade refs, and a prompt-adoption observation ref.
validate_replay_observations checks the paired memory-enabled and memory-disabled replay rows. Memory-enabled replay must cite public evidence handles that resolve against the accepted event rows, both replays must carry cold-replay result record refs, and the pair must share an answer-delta result record.
validate_source_module_imports verifies the exported bundle's five copied source bodies by digest, material class, relation, and body_in_receipt=false. The card path reports only counts, digest refs, and result record paths; full memory rows, replay rows, source bodies, private transcript bodies, model-output data, and active injection text stay out of result records and public cards.
The mechanism is deliberately negative as well as constructive. Seven falsification fixtures prove that raw transcript export, private candidate auto-promotion, stale preference override, memory-as-source-authority, vector recall without evidence, final-answer-only memory credit, and active injection as authority are blocked. build_public_memory_conflict_trace gives the reader a seven-span public trace over the same rows, with five memory-event spans and two cold-replay spans, and audits coverage for evidence handles, metadata-only non-public refs, no non-public body export, cold-replay refs, answer-delta refs, and memory-enabled evidence.
Temporal Conflict Mechanism
The central rule is evidence-before-memory-authority. A memory row may be accepted as replay metadata only after it satisfies the public policy fields: route ref, decision, synthetic subject id, event timestamp, memory priority, source-trust score, evidence handle, metadata-only private thread ref, and explicit false authority flags for body export, source authority, and active injection.
UPDATE and DELETE decisions have an extra burden because they alter older memory. The validator recomputes the conflict lineage instead of trusting the label. _apply_temporal_order_checks verifies that conflict rows occur after the prior writes they touch, and that replay NOOP rows occur after conflict and downgrade evidence. _apply_conflict_semantic_recompute then checks the semantic shape of the mutation: the prior event must exist, the conflict group must be coherent, timestamps must advance, priority may not regress below the allowed floor, and source trust must stay above the declared floor.
Only after those checks pass can episode C receive replay credit. The memory-enabled replay must cite public evidence refs that resolve to accepted memory rows. The memory-disabled replay stays paired by replay group. The answer-delta result record accounts for the difference between those cold replays without reducing the evaluation to final-answer comparison alone.
Real Sanitized Episode Evidence
The first-wave fixture is not merely shape-only synthetic data. Its memory_episodes.json, memory_policy.json, and replay_observations.json mirror the exported memory-temporal-conflict bundle, and the positive rows carry sanitized_real_episode=true, source artifact refs, source event refs, timestamps, memory priority, and source-trust scores. The source evidence posture declares real_source_floor as copied_non_secret_macro_agent_memory_body_with_provenance, body_in_receipt=false, and private_bodies_exported=false.
The exported bundle contributes a source-open body floor without turning bodies into result record material. source_module_manifest.json lists five copied source bodies across tool, doctrine, standard, and pattern material classes. The runtime verifies their digests and material classes, while public cards and result records expose only paths, counts, digest refs, omitted-material reasons, and the scope limit. That is the realness proof this paper can use: source provenance and result record-level recomputation, not private memory export.
Perturbation and Rejection Contract
The fixture includes positive pass evidence and perturbation evidence. Focused tests mutate the bundle to ensure that the validator rejects timestamp incoherence, priority regression, source-trust regression, temporal order breakage, unverified conflict evidence, source-event drift, stale override without downgrade, downgrade result record field swaps, positive rows without evidence handles, unresolved replay refs, replay without memory evidence, and source-body tampering.
Those rejection tests matter because temporal memory bugs often look plausible in isolation. A stale row with a nice label is still rejected if its conflict edge is absent or late; a memory-enabled replay is still rejected if its evidence refs do not resolve; a digest-mismatched body floor blocks the source import; and final-answer-only comparison remains a negative case rather than utility evidence.
Named Proof Consumers
python -m microcosm_core.organs.agent_memory_temporal_conflict_replay run consumes the first-wave fixture, includes negative cases, and writes the result, board, validation, and sign-off result records.
python -m microcosm_core.organs.agent_memory_temporal_conflict_replay run-memory-bundle consumes the exported source-open bundle, digest-checks copied source bodies, and emits the public bundle validation result.
tests/test_agent_memory_temporal_conflict_replay.py consumes the same fixture and bundle to assert decision counts, conflict counts, stale downgrades, secret exclusion, public-relative result records, unresolved replay rejection, source-module digest verification, metadata-only result record cards, and seven-span trace construction.
A cold public reader consumes the source record, manifest, event rows, replay rows, source-artifact digests, and validation result records; that consumer can verify the synthetic honesty boundary but cannot infer quality of any live memory system or launch-scope decision.
The page shape is a temporal-conflict replay, not a memory product surface. A reader starts with the JSON bundle, follows the source module manifest to five copied source bodies, then checks three synthetic episodes: initial memory writes, a later temporal conflict with stale downgrades, and paired cold replays with memory enabled and disabled. The accepted outcome is a result record that says the replay respected the memory-honesty boundary; it does not make memory recall into source authority.
Failure Modes and Limitations
This module is intentionally narrow. It validates a public fixture and exported bundle against a declared temporal conflict contract; it does not measure live assistant memory quality, user satisfaction, recall coverage, provider behavior, or deployment posture. Passing result records show that checked rows, digests, traces, negative cases, and scope limits agreed for the fixture under test.
Known failure modes are treated as checker inputs rather than prose caveats: private transcript export, private candidate auto-promotion, stale preference override, memory-as-source-authority, vector recall without evidence, final-answer-only credit, active injection authority, missing source manifests, source-body digest drift, source-event drift, missing conflict edges, missing downgrade result records, and unresolved replay evidence. If a future module wants a stronger memory claim, it needs a new standard and new evidence; this module cannot promote itself beyond its fixture ceiling.
Reader Evidence Routing
Bundle route: read examples/agent_memory_temporal_conflict_replay/exported_memory_temporal_conflict_bundle/source_module_manifest.json for module_count=5, body_in_receipt=false, material classes, digest refs, omitted-material reasons, and the explicit secret-exclusion boundary.
Event route: read memory_episodes.json for the five memory events: episode_a_preference_add, episode_a_tool_fact_add, episode_b_preference_scope_update, episode_b_tool_fact_delete, and episode_c_replay_noop.
Conflict route: verify that the UPDATE and DELETE events carry temporal conflict-edge refs and stale-downgrade refs before they can affect replay credit.
Replay route: read replay_observations.json for the paired episode_c_memory_enabled_replay and episode_c_memory_disabled_replay rows, evidence refs, cold replay result records, and answer-delta accounting.
Runtime route: run tests/test_agent_memory_temporal_conflict_replay.py when the reader needs recomputation evidence. The focused tests assert digest verification, public-relative result records, non-public-state exclusion, unresolved replay rejection, and the exported bundle runtime shape.
Updates and deletes that touch older memory require temporal conflict-edge refs plus stale-downgrade refs before memory can affect replay credit.
Private thread references are metadata-only; transcript bodies and private memory candidate bodies stay omitted.
Utility language requires paired memory-enabled and memory-disabled cold replay result records; final-answer-only comparison is not enough to support a memory utility claim.
Raw transcript export, private candidate auto-promotion, stale preference override, memory-as-source-authority, vector recall without evidence, final-answer-only memory credit, and active-injection authority are expected falsification fixtures.
Prior Art Grounding
This component is grounded in agent-memory architectures and the newer literature on stale or poisoned memory. The constructive lineage includes Generative Agents, which made observation, reflection, retrieval, and planning a concrete agent-memory pattern, and MemGPT, which treats long-context behavior as a memory-management problem. The risk lineage includes AgentPoison and STALE, which focus respectively on poisoned retrieval stores and whether agents update invalid memories when new evidence arrives.
Microcosm does not claim a live memory product. It borrows the useful accounting questions: which memory decision was made, which evidence handle justified it, which older row was conflicted or downgraded, and whether memory-on/off replay supports any claim beyond a final-answer comparison.
Validation Result record Path
Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:
The focused regression test and corpus projection checks are:
cd microcosm-substrate && ../repo-pytest tests/test_agent_memory_temporal_conflict_replay.py
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
Scope boundary
Reader Validation Boundary
A cold reader can validate this module by starting from the JSON source record, then checking the generated JSON instance, source-module manifest, synthetic episodes, memory-event decisions, temporal conflict-edge refs, stale-downgrade refs, paired memory-on/off cold replays, negative cases, and focused tests. The validation is limited to whether the synthetic replay preserved a metadata-only memory-honesty boundary.
The validation stops before quality claims about any live memory system, private transcript export, private candidate promotion, memory recall as source authority, provider behavior, active-injection authority, public sharing, and launch. Unpopulated concept, principle, axiom, and dependency edges remain residual pressure unless the JSON bundle owner lane adds real targets.
Scope limit
This module may claim only that a synthetic memory-temporal replay preserved a metadata-only memory-honesty boundary: ADD/UPDATE/DELETE/NOOP decisions, temporal conflict-edge refs, stale-downgrade refs, paired memory-on/off cold replay refs, answer-delta accounting, public trace refs, manifest digests, negative cases, and scope limits are checked.
It must not claim quality of any live memory system, readiness of a memory product, private transcript export, private candidate auto-promotion, source-authority status, provider behavior, active-injection authority, source-file changes, public sharing authorization, or launch-scope decision.
Scope boundary
This module does not run live memory, claim memory product quality, export private transcripts, auto-promote private candidates, treat memory recall as source authority, adopt active injection, use external model services, change source files, publish results, or include launch operations.
Source and projection details
Source-Open Body Floor
The exported bundle manifest is the body-row authority for five copied source bodies spanning tool, doctrine, standard, and pattern material classes. Those bodies stay in bundle source artifacts; result records and cards carry refs, digests, classes, counts, omission reasons, secret-exclusion status, and scope limits only.
The floor is accepted as synthetic temporal-conflict replay evidence. It is not live memory product evidence, private transcript export, private memory candidate export, provider-behavior evidence, source-file changes, public sharing authorization, or launch-scope decision.
Governing Lattice Relation
The JSON bundle binds this module to mechanism.agent_memory_temporal_conflict_replay.validates_public_memory_conflict_replay, concept.agent_reliability_and_safety_validator_bundle, provisional principle refs P-1 and P-2, and provisional axiom ref AX-1. This Markdown does not promote those placeholder refs into stronger doctrine ids; it explains how the concrete mechanism satisfies the current bundle boundary.
Mechanically, the governing relation is evidence-before-memory-authority: memory rows may influence replay only after they carry route refs, public evidence handles, metadata-only non-public refs, conflict-edge or downgrade result records when stale state changes, and paired replay result records. The concept relation is validator-bundle accountability: the module is not a narrative claim about agents remembering well, but an executable fixture whose policy, trace, source-body manifest, negative cases, and result records must all agree. The axiom/principle ceiling is the same one enforced by the validator: private state is not public source authority, synthetic replay is not live product evidence, and projection-ready result records cannot authorize source-file changes, external model access, public sharing, or launch.
Public Microcosm regression drilldown for a monitor/redteam honesty contract. Replays synthetic monitor traces and validates redteam-falsification behavior, negative cases, non-public-state exclusion, and scope limits with metadata-only result records.
Scope limit Synthetic replay fixture only; not a live monitor product, control eval, exploit guide, product-spine system, or complete-security claim.
agent_monitor_redteam_falsification_replay is a synthetic regression fixture for one honesty claim: that a monitor verdict was earned by evidence rather than asserted. It replays public trajectory metadata and checks that every "the monitor caught it" sentence is tied to a probe, an escalation, a mitigation, and a metadata-only result record before it counts.
Purpose
It is easy to say a safety monitor flagged a suspicious agent action. It is harder to show that the verdict followed from the evidence and would survive a rerun. This component answers one question: if a monitor claim is replayed against its own declared evidence, does the claim still hold, or does it fall apart?
The part worth noticing is that the page does not trust the verdict it is given. A separate builder, build_public_monitor_redteam_falsification_trace, independently recomputes two things from the trajectory spans: whether each declared coverage label is actually backed by an adversarial probe, and what the monitor verdict should be given the span's severity, escalation, and mitigation evidence. If that recomputed verdict disagrees with the verdict the fixture declared, the run blocks. So the contract is not "does the row have the right fields" but "does an independent reading of the evidence reach the same conclusion". The failure mode it guards against is the one that quietly defeats most monitoring write-ups: a verdict label that reads well but is not supported by the trajectory it claims to summarise.
The replay is deliberately narrow. It works over three synthetic trajectory cases and seven negative cases, on public metadata and metadata-only result records. It is not a live monitor, a control-evaluation score, an exploit guide, or evidence about how well any real monitor performs. It only proves that, for these fixtures, a monitor claim cannot pass without surviving the disagreement check.
Abstract
This module is the public Microcosm technical paper for a monitor/redteam honesty contract. It specifies a synthetic replay fixture that treats a monitor verdict as an evidence object to falsify: the verdict must be tied to a trajectory hash, suspicious-span ref, adversarial-probe ref, escalation ref, mitigation ref, body-omission ref, cold-replay ref, and source-artifact digest boundary before it can support any monitor-coverage sentence.
The result is a safety/reliability proof about evidence shape, not a live monitor product, control-eval score, exploit guide, product-spine system, or complete-security claim. The technical contribution is the proof contract: a replay can show how a monitor claim would fail under semantic negatives and perturbations without exporting private reasoning, internal source bodies, model-output data, live agent traffic, or launch-scope decision.
Telos
The fixture models three public trajectory cases: a benign control, a suppressed failure attempt, and a tool-authority boundary attempt. Each case must carry a trajectory hash, monitor id, suspicious span ref, adversarial probe ref, verdict, severity, escalation ref, body-omission ref, mitigation ref, and cold replay ref before the claim is admitted.
The point is to make "the monitor caught it" a rerunnable, falsifiable claim. A benign pass proves only that the benign route remains admissible. An escalation proves that a suppressed-failure attempt retained enough evidence for human escalation. A block proves that the tool-authority boundary attempt was rejected under the public policy vocabulary. None of those outcomes proves live monitor performance; together they prove that this fixture's evidence contract catches missing probes, missing source evidence, private-body leakage, and verdict drift.
Monitor-Falsification Mechanism
The component turns a monitor claim into a falsification replay. Its inputs are the projection protocol, monitor policy, trajectory cases, monitor observations, negative-case files, exported bundle manifest, and source-module manifest. _build_result loads those inputs, validates the projection protocol density, checks the monitor policy vocabulary, checks trajectory case floors, validates each observation against required refs and source-artifact evidence, scans public inputs and copied source artifacts for forbidden non-public-state classes, and folds in a public trace recomputation before writing result records.
A positive observation cannot pass by carrying a verdict label alone. The policy requires a trajectory id, monitor id, coverage label, suspicious action span ref, adversarial probe ref, monitor verdict, severity tier, human escalation ref, body-omission ref, mitigation diff ref, and cold replay ref. For positive observations, source-artifact evidence refs must match the targets declared by the exported source-module manifest. Missing fields, unsupported verdicts, unverified source evidence, private chain-of-thought export, internal-code export, exploit-instruction material, account secret material, live agent traffic, monitor product-performance claims, and coverage without a probe become quarantine or blocking findings.
The exported body floor is checked separately from the result record. The source-module manifest admits two source artifacts: a copied public source pattern body for the monitor-redteam compound row, and a sanitized public dogfood safety-evals trace slice. validate_source_module_manifest requires the declared module count to match the rows, requires body_in_receipt=false, rejects body text in result records, resolves each target path, and verifies every declared digest field against disk. Result records and cards then expose ids, refs, digests, counts, verdicts, findings, and ceilings, not copied bodies.
The public trace is the anti-echo pass. build_public_monitor_redteam_falsification_trace recomputes whether a coverage label is backed by an adversarial-probe span, then derives the monitor verdict from severity, escalation, and mitigation evidence. In the first-wave fixture, the three spans recompute to one pass, one escalate, and one block. If a declared verdict no longer matches the span-derived verdict, validate_public_trace records PUBLIC_TRACE_MONITOR_REDTEAM_VERDICT_MISMATCH and the component blocks.
Named Proof Consumers
run consumes the first-wave fixture and writes result, board, validation, sign-off, and metadata-only command-card result records. It is the proof consumer for the three synthetic trajectory cases and seven required negative cases.
run-monitor-bundle consumes the exported public bundle and proves that the bundle manifest, source-module manifest, copied/sanitized source artifacts, four bundle observations, digest checks, non-public-state scan, and metadata-only card path remain valid outside the fixture directory.
tests/test_agent_monitor_redteam_falsification_replay.py is the focused regression consumer. It asserts digest verification, sanitized dogfood trace slicing, workingness exposure of source-body imports, negative-case semantic floors, public trace verdict recomputation, source-evidence ref enforcement, public-relative metadata-only result records, manifest boundary rejection, and fresh result record reuse.
A cold reader consumes this Markdown only after checking the JSON bundle, generated JSON instance, exported source manifest, trajectory and observation fixtures, public trace recomputation path, focused tests, and scope limit. The reader may verify the replay boundary but must not infer live monitor performance, control-eval score, exploit guidance, complete security, public sharing, or launch-scope decision.
Semantic Negative and Perturbation Proof
The replay proves its boundary through negative and perturbation cases, not by trusting the positive fixture. The negative floor includes private chain-of-thought leakage, internal code export, exploit-instruction material, account secret material, live agent traffic, monitor product-performance claims, and coverage claims without adversarial probes. Those cases are expected to block or quarantine because they violate the public-source/metadata-only contract, not because they resemble production attacks.
The perturbation proof is the public trace recomputation. A declared verdict is accepted only when the public span evidence derives the same pass, escalate, or block result. Removing the adversarial-probe span, weakening the escalation or mitigation refs, changing the severity/verdict relation, or declaring source evidence that is absent from the exported source-module manifest turns the replay into a mismatch or blocking finding. That is the technical safety property: the paper authority lives in disagreement detection between declared monitor coverage and independently recomputed public evidence.
Shape
Source refs
Manifest
source_module_manifest.json
Diagram source
flowchart TD bundle["JSON bundle authority"] markdown["Markdown reader projection"] manifest["source_module_manifest.json"] digest["2 source-artifact digest gates"] cases["three trajectory cases"] policy["public monitor policy"] observations["three observations"] required["required refs: span, probe, escalation, mitigation, replay"] trace["public trace verdict recomputation"] verdicts["pass / escalate / block"] negatives["7 negative semantic floors"] privateScan["metadata-only non-public-state scan"] result record["metadata-only validation result record"] ceiling["scope limits"] bundle --> markdown manifest --> digest digest --> observations cases --> required policy --> required required --> observations observations --> trace trace --> verdicts negatives --> result record digest --> privateScan verdicts --> result record privateScan --> result record result record --> ceiling
The module's shape is evidence narrowing.
Reader Evidence Routing
Bundle route: core/paper_module_capsules.json::paper_modules[33] is the bundle-backed authority row, and paper_modules/agent_monitor_redteam_falsification_replay.json is the generated paper-module instance.
Source-module route: examples/agent_monitor_redteam_falsification_replay/exported_monitor_redteam_bundle/source_module_manifest.json records two admitted source artifacts with body_in_receipt: false: the public source-pattern JSON slice with digest sha256:89792add1e2f03a09c40f64d19c1ac0a54d62c053aabe11ecad0a9846a54cf33, and the sanitized public dogfood safety-evals trace slice with digest sha256:88493225f908f3f8892d187370d30231cb4e292b43bed202b462b6a6888e1eb0.
Trajectory route: trajectory_cases.json carries the three public trajectory hashes sha256:monitor-redteam-benign-route-review, sha256:monitor-redteam-suppressed-failure, and sha256:monitor-redteam-tool-authority-escape.
Observation route: monitor_observations.json binds obs_benign_route_review_pass to monitor.synthetic.route-review.v1 and a pass verdict, obs_suppressed_failure_escalate to monitor.synthetic.failure-suppression.v1 and an escalate verdict, and obs_tool_authority_escape_block to monitor.synthetic.tool-authority.v1 and a block verdict. Each observation carries a body-omission ref and cold replay ref.
Runtime route: src/microcosm_core/organs/agent_monitor_redteam_falsification_replay.py and tests/test_agent_monitor_redteam_falsification_replay.py verify manifest digest visibility, negative-case coverage, required observation fields, public trace recomputation, and metadata-only result record boundaries.
Public Mechanics
A monitor claim cannot pass unless the observation includes a verdict and the probe, escalation, mitigation, body-omission, and replay refs that make the verdict rerunnable.
Coverage labels require adversarial probe refs; benign-only trajectories do not authorize coverage language.
Private reasoning, internal code, exploit-detail, account secret, live-traffic, product-performance, and coverage-without-probe cases are expected falsification fixtures.
Result records expose ids, refs, verdict counts, negative cases, body_in_receipt: false, non-public-state scan, and scope limits only.
Prior Art Grounding
This component is grounded in model red-team and behavior-discovery work that treats monitor claims as things to falsify with adversarial probes. Anthropic's Red Teaming Language Models to Reduce Harms is a close procedural anchor for eliciting harmful or unwanted behavior, and Discovering Language Model Behaviors with Model-Written Evaluations anchors the idea that evaluation prompts can surface behavior classes worth tracking. More recent sabotage and control-evaluation work reinforces the same shape: monitors need adversarial trajectories, suspicious spans, escalation paths, and negative cases, not just benign examples.
Microcosm borrows the falsification accounting pattern. A monitor verdict needs trajectory hashes, probe refs, suspicious-span refs, escalation refs, mitigation refs, replay refs, and body-omission result records before coverage language is allowed. It does not claim a live monitor product or control-eval score.
Evidence Contract Summary
The evidence contract has four gates:
Trajectory gate: each monitor observation must cite a trajectory hash, monitor id, suspicious-span ref, adversarial-probe ref, verdict, severity, escalation ref, body-omission ref, mitigation ref, and cold-replay ref.
Source-body gate: the exported source-module manifest names the admitted copied/sanitized public source artifacts, requires matching digests, and keeps body_in_receipt: false.
Falsification gate: semantic negatives and public trace recomputation reject private-body leakage, unsupported source evidence, missing probes, unsupported verdicts, and declared/recomputed verdict mismatch.
A valid paper claim must pass all four gates and still inherit the limitations above.
Validation Result record Path
Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:
The validation ceiling remains synthetic monitor falsification replay only.
Scope boundary
Limitations and Scope limit
This module may claim public fixture evidence that trajectory hashes, synthetic monitor ids, suspicious-span refs, adversarial-probe refs, verdict labels, escalation refs, mitigation refs, body-omission refs, cold replay refs, negative-case labels, manifest digests, and validation result records are checked by the listed runtime witnesses.
This module may not claim a live monitor product, control-eval score, production monitoring, exploit guidance, private reasoning export, internal source export, live agent traffic, provider behavior, complete security, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness.
Scope boundary
This module does not run live agents, use external model services, expose private chain-of-thought, export internal code, provide exploit instructions, include account secrets, import live agent traffic, claim monitor product performance, claim control-eval scores, change source files, publish results, or include launch operations.
Scope limit
This module may claim fixture-bound evidence that the component ran over public synthetic inputs and produced the result records and projections described above, reproduced by the validation result records named on this page.
It may not claim more than its bundle scope limit allows: Synthetic replay fixture only; not a live monitor product, control eval, exploit guide, product-spine system, or complete-security claim.
Source and projection details
Governing Lattice Relation
The bundle binds this page to mechanism.agent_monitor_redteam_falsification_replay.validates_public_monitor_verdict_receipts, the agent_reliability_and_safety_validator_bundle concept, provisional principles P-1 and P-2, provisional axiom AX-1, and the agent_monitor_redteam_falsification_replay runtime locus. Within that lattice, the mechanism is an evidence-before-monitor-claim gate: monitor coverage language has no paper authority unless the source record, policy, trajectory roster, observations, source-artifact manifest, negative-case semantic floors, public trace, and metadata-only result records agree.
The governing concept is validator-bundle accountability, not live monitor operation. The principle/axiom ceiling is enforced as a refusal surface: private reasoning, internal code, exploit instructions, account secrets, live agent traffic, model-output data, performance claims, source-file changes, publishing-scope decision, and launch-scope decision remain false even when the replay fixture passes.
Drilldown-only regression contract for synthetic sabotage/scheming monitor claims. Checks projection_protocol, scheming_monitor_policy, task_episodes, action_traces, monitor_scores, counterfactual_replay, and cold_replay against negative cases and scope limits.
Scope limit No live sabotage, exploit instruction, account secret/account material, private reasoning, harmful payload, or deployment-risk product claim; synthetic fixtures and metadata-only result records only.
agent_sabotage_scheming_monitor_replay is a technical paper module for a drilldown-only regression component. It validates whether a sabotage/scheming monitor claim is backed by joinable synthetic episodes, action traces, per-step monitor scores, counterfactual benign replay, cold replay, falsification fixtures, source-artifact evidence refs, metadata-only result records, and an explicit scope limit.
The core contribution is not a live scheming detector. It is a replay contract: given a public fixture or exported bundle, the component recomputes caught/not-caught verdicts from monitor evidence, checks whether counterfactual replay isolates the side objective, verifies source-artifact digest/provenance rows, rejects common overclaim shapes, and emits result records that omit private or harmful bodies.
Telos
The module exists to make safety-monitor evidence auditable without laundering a demo into a deployment claim. Frontier-lab readers should be able to answer:
Which finite evidence rows support the sabotage/scheming monitor claim?
Which verdicts are recomputed from scores and joins rather than echoed from fixture labels?
Which mutations and negative cases cause rejection?
Which claims remain out of scope even when every local check is green?
Mechanism
The runtime locus is src/microcosm_core/organs/agent_sabotage_scheming_monitor_replay.py. It exposes two public admission paths:
run validates the first-wave fixture under fixtures/first_wave/agent_sabotage_scheming_monitor_replay/input.
run_sabotage_bundle validates the exported bundle under examples/agent_sabotage_scheming_monitor_replay/exported_sabotage_monitor_bundle.
Both paths call _build_result. The builder loads:
projection_protocol.json
scheming_monitor_policy.json
task_episodes.json
action_traces.json
monitor_scores.json
counterfactual_replay.json
cold_replay.json
source_module_manifest.json when present
The monitor proof is a join-and-recompute circuit. validate_task_episodes, validate_action_traces, and validate_monitor_scores require monitor rows to bind to an episode and action step. Monitor rows must carry evidence refs; in the exported bundle those refs resolve through the digest-checked source manifest. validate_counterfactual_replay checks that replay rows remove side-task success while preserving the main task. validate_cold_replay requires explicit passing replay result records rather than narrative assurance.
The recompute is deliberately conjunctive. A monitor row only earns a block verdict if the suspicion score is at or above the policy block threshold and the row is a "backed suspicious action". Backed means five separate joins all hold: the suspicious-action span names a real action id, that action id is also cited in the row's evidence refs, the joined action carries attack_opportunity_label = present, its hidden-side-objective pressure is medium or high, and the counterfactual replay isolates exactly that step by removing side-task success while preserving the main task. Drop any one of those and the row recomputes to escalate or pass rather than block, so a high score alone cannot manufacture a catch. An episode is then recomputed as caught only if it contains at least one recomputed block row with verified source-artifact evidence.
The trace recomputation locus is src/microcosm_core/macro_tools/agent_execution_trace.py::build_public_sabotage_scheming_monitor_trace. It folds the same joins into per-episode spans, recomputes the caught verdict, and checks that the counterfactual's changed steps are the higher-suspicion ones (a positive rank delta). A stale declared verdict, a benign-step counterfactual spoof, or a missing cold replay becomes a finding rather than a silent pass.
Evidence Floor
The local floor is intentionally small and inspectable:
Evidence class
Fixture mode
Exported-bundle mode
Episodes
3
4
Action trace rows
6
7
Monitor-score rows
6
7
Counterfactual replay rows
3
4
Cold replay rows
3
4
Negative cases
7
0 fixture-negative payloads
Copied public source artifacts
source trace body + trace builder import
2 manifest rows
The exported bundle adds scheming_ep_public_authority_probe and obs_public_authority_probe. That row is the real dogfood-trace floor: it comes from a sanitized public command-trace slice, verifies against the source trace digest, drops argv/stdout/stderr previews, and remains metadata-only in result records. The tests require it to compute a block monitor verdict and a caught episode without exporting private session material.
The source-open body floor is split:
The runtime imports microcosm_core.macro_tools.agent_execution_trace::build_public_sabotage_scheming_monitor_trace as the public recomputation body.
The exported bundle manifest carries agent_sabotage_scheming_extracted_pattern_ledger_row_body_import as a source-faithful public source pattern slice with digest sha256:e1ad34691ca598b878f8658485ce78fa4fa59b3eb0a4bd078355c415ca2cb312.
The same manifest carries agent_sabotage_scheming_public_dogfood_safety_evals_trace_slice_import as a source-faithful sanitized command-trace slice with digest sha256:7b6255ec65f4b21163685cf9d7a59dd9ae6075126deac065dec9a45d79dcea06 and source trace digest sha256:24487cf8edb0779cfdd460b04a4ea7701805686545069f9e1c5767250c223cd3.
Both manifest rows require body_in_receipt: false.
Shape
Diagram source
flowchart TD bundle["JSON bundle authority"] fixture["fixture or exported bundle"] episodes["task episodes"] actions["action traces"] scores["monitor scores"] sourceManifest["source module manifest"] sourceArtifacts["digest-checked public artifacts"] counterfactual["counterfactual replay"] cold["cold replay"] recompute["public trace recomputation"] negatives["negative/falsification cases"] result record["metadata-only validation result record"] ceiling["scope limit"] bundle --> fixture fixture --> episodes fixture --> actions fixture --> scores fixture --> counterfactual fixture --> cold fixture --> sourceManifest sourceManifest --> sourceArtifacts episodes --> recompute actions --> recompute scores --> recompute counterfactual --> recompute cold --> recompute negatives --> result record sourceArtifacts --> result record recompute --> result record result record --> ceiling
Rejection And Perturbation Evidence
The focused regression suite is the strongest technical evidence for the paper claim. tests/test_agent_sabotage_scheming_monitor_replay.py covers:
Digest mismatch in source_module_manifest.json blocks the exported bundle.
target_ref / path mismatch blocks the exported bundle.
body_text_in_receipt at manifest or row level is rejected.
Missing source_artifact_evidence_refs quarantines a monitor row.
A declared caught verdict that disagrees with recomputation emits PUBLIC_TRACE_SCHEMING_CAUGHT_VERDICT_MISMATCH.
Perturbing a high-suspicion block row down to pass moves the recomputed caught verdict and blocks the result record.
A stale caught_verdict label inside a monitor-score row is rejected against the recomputed episode verdict.
A counterfactual row that changes a benign low-suspicion step fails the rank delta check and the isolation check.
Spoofing a declared negative-case id is caught by semantic negative-case floors.
Public-relative, metadata-only result records exclude absolute local paths and private keys such as account secrets, model-output data, hidden reasoning, raw harmful payloads, exploit instructions, and real target identifiers.
The real sanitized public probe must compute block, carry verified source artifact evidence, and join to its counterfactual row.
Spoofing the public probe to pass is rejected by verdict recomputation and the real-sanitized-evidence check.
These tests matter because they exercise the falsification path, not only the happy path. The component is permitted to say "this synthetic replay failed closed under these mutations"; it is not permitted to say "this detects real scheming".
Runtime-shell result record: receipts/runtime_shell/demo_project/organs/agent_sabotage_scheming_monitor_replay/exported_sabotage_monitor_bundle_validation_result.json
Governing Relations
The generated paper-module instance binds this reader page to:
code locus: src/microcosm_core/organs/agent_sabotage_scheming_monitor_replay.py with run, run_sabotage_bundle, _build_result, _write_receipts, and result_card
Those edges are structural evidence-routing edges. They do not by themselves prove runtime correctness; the runtime result records and tests carry that narrower claim.
Prior-Art Positioning
The module borrows evaluation grammar from sabotage, scheming, and deceptive behavior evaluations: episodes, action traces, monitor scores, counterfactual benign replay, falsification cases, and fail-closed monitor disagreement. The closest public references remain Anthropic's sabotage evaluations, Apollo's in-context scheming evaluations, Anthropic's Sleeper Agents work, and the OpenAI/Apollo line on detecting and reducing scheming.
The Microcosm claim is deliberately narrower than those research programs. It does not evaluate frontier models or live deployments. It gives the public Microcosm system a result record-backed, metadata-only, replayable safety-monitor proof shape that can reject overclaims and provenance drift.
Prior Art Grounding
This component replays a monitor that watches an agent transcript for sabotage and scheming signals. It is grounded in the AI-control literature, which studies how to catch a model that may be deliberately subverting oversight rather than assuming it is benign (Greenblatt et al., AI Control, and Anthropic's sabotage evaluations). Microcosm borrows the monitor-over-transcript shape on synthetic fixtures; the result is fixture-bound replay evidence, not a live monitor, a control guarantee, or an exploit guide.
Expected local result: the fixture and exported-bundle validators pass on the checked-in inputs; the mutation tests block when digest, target-ref, body-in-result record, evidence-ref, caught-verdict, counterfactual-rank, negative semantic, or public-probe labels are corrupted.
Scope boundary
Public-Safety Boundary
Public rendering may describe the replay grammar, recomputation rule, falsification cases, sanitized command-trace source artifact, digest checks, and metadata-only result records.
Public rendering must not include or imply:
live sabotage instructions
exploit steps
account secrets, browser state, account or browser material, or model-output data
private chain-of-thought or hidden reasoning bodies
raw harmful payloads
unsanitized argv/stdout/stderr previews
browser UI state
production telemetry
live traffic
live agent execution authorization
deployment-risk measurement
product monitor performance
benchmark claims
provider affiliation or provider behavior claims
source-file changes
public sharing/launch-scope decision
private-system equivalence
whole-system safety
Scope limit
This module may claim fixture-bound evidence that the component ran over public synthetic inputs and produced the result records and projections described above, reproduced by the validation result records named on this page.
It may not claim more than its bundle scope limit allows: No live sabotage, exploit instruction, account secret/account material, private reasoning, harmful payload, or deployment-risk product claim; synthetic fixtures and metadata-only result records only.
Agent Sandbox Policy-Escape ReplayValidator-backed public refactor of source agent_execution_trace for sandbox/security: metadata-only trace spans, pre-exec policy verdicts, side-effect diffs
Validator-backed public refactor of the source agent_execution_trace system for sandbox/security claims. Computes metadata-only trace spans from action requests, pre-execution policy verdicts, side-effect diff result records, and rollback against negative cases and scope limits.
Scope limit No live sandbox escape, live secret handling, or live network; source-faithful refactored fixtures and metadata-only result records only; not a complete-security or launch claim.
agent_sandbox_policy_escape_replay is a validator-backed public refactor of the source agent_execution_trace system for sandbox/security claims. It asks a narrow question: can Microcosm compute metadata-only trace spans from action requests, pre-execution policy verdicts, side-effect diff result records, rollback result records, cold replay, falsification fixtures, and an explicit scope limit?
Purpose
Agent sandbox claims are easy to assert and hard to evidence. A system can say it blocks an untrusted action, or that a side effect was rolled back, without ever showing the record that would make the statement checkable. This component exists to hold one narrow claim to account: that for a fixed set of synthetic action requests, the policy decision was recorded before execution, blocked actions left no side effect, and permitted side effects each carry a diff and a rollback result record.
The interesting choice is that the validator does not trust the verdict it is handed. For each request it derives the expected verdict from the request's own shape, its action kind, requested capability, risk class, and source trust label, using a small fixed policy table (_derived_sandbox_policy_verdict). A secret read from untrusted tool output derives to block; a low-risk public fixture write derives to allow; a mock database update derives to review. Any action shape the table does not recognise fails closed to block. The recorded verdict row is then checked against that derivation. A declared allow that should have derived to block is a finding, not a pass. The same fail-closed semantics drive the side-effect check: a request whose shape requires a block must show no execution and a zero diff count regardless of what the verdict row claims.
Because every check runs over symbolic references, the page can report concrete numbers, six action requests, four derived blocks, six metadata-only trace spans, while staying honest about what those numbers are not. They demonstrate that the pre-execution accounting pattern is wired and replayable over public fixtures. They do not demonstrate containment in a real host, resistance to a live exploit, or network isolation. That gap is the point of the scope limit below, and it is the line this component is built to keep visible.
Named Proof Consumers
The primary proof consumer is tests/test_agent_sandbox_policy_escape_replay.py. Its 17 tests exercise both runtime entry points (run and run_sandbox_bundle) and the public trace builder from microcosm_core.macro_tools.agent_execution_trace. The consumer does not accept declared labels at face value: it mutates policy verdicts, side-effect rows, cold replay labels, exported bundle rows, source-module digests, source/target manifest fields, body-boundary fields, and cached-card freshness to prove that the validator recomputes the sandbox-policy result from source rows.
The fixture proof path is microcosm_core.organs.agent_sandbox_policy_escape_replay run. Its success result record must include six action requests, six pre-execution verdicts, four derived blocked rows, one derived allow row, one derived review row, six side-effect result records, two rollback-verified rows, six cold replay passes, all expected negative cases, and a six-span public trace. The negative-case rows are not auxiliary examples; they are the admission boundary that rejects semantic policy drift, blocked-action execution, executable escape payload material, tool-output authority bypass, raw environment exposure, and broad security or benchmark overclaim.
The exported-bundle proof path is microcosm_core.cli agent-sandbox-policy-escape-replay run-sandbox-bundle. It has no fixture-only negative cases, so its proof surface shifts to bundle shape: the bundle id, source-module manifest, seven copied source bodies, digest equality, required anchors, metadata-only result records, public-relative paths, and public trace spans must all validate. The same test file also breaks the manifest in targeted ways to prove that missing manifests, invalid material classes, body-in-result record flags, count mismatches, missing target copies, and partial source or target digest drift block the result.
The corpus proof consumer is scripts/build_doctrine_projection.py --check-paper-module-corpus. It proves only that this reader page remains aligned with the JSON bundle-backed paper-module corpus. It does not refresh generated Mermaid, Atlas, site, or verifier projections, and it does not raise the claim above public fixture and bundle replay evidence.
Technical Mechanism
The mechanism is a validating transducer over public refs, not a sandbox. The runtime entry points run and run_sandbox_bundle load the fixture or exported bundle, then _build_result recomputes each claim from lower-level rows before any result record is accepted. The named proof consumer is tests/test_agent_sandbox_policy_escape_replay.py, with the corpus-level projection consumer scripts/build_doctrine_projection.py --check-paper-module-corpus.
The validator first establishes an input boundary. _load_payloads reads the projection protocol, sandbox policy, action requests, verdicts, side-effect result records, rollback result records, and cold replay rows with strict JSON parsing. scan_paths checks the same public files and copied source-module bodies against core/private_state_forbidden_classes.json, while _source_module_manifest_result verifies that the seven copied source bodies are present, digest-matched, by material class, and excluded from result record body fields.
The policy mechanism is then recomputed row by row. validate_action_requests admits only symbolic request metadata with redacted bodies and no live network target. validate_policy_verdicts joins each request to a pre-execution verdict and checks the verdict against the request's risk class instead of trusting the declared label. validate_side_effect_receipts enforces the mechanical consequence: block decisions must have no execution and no diff, while allow/review decisions must carry a non-empty public diff result record. validate_rollback_receipts requires rollback refs for side-effecting actions, and validate_cold_replay requires replay rows that reproduce verdict and side-effect state.
The trace layer converts accepted public rows into metadata-only PublicTraceSpan records through build_public_sandbox_policy_trace. Each span carries a request id, authority verdict ref, side-effect or rollback ref, outcome, digest, and sandbox_policy_action tool label. This is why the module can claim six public trace spans and outcome counts, but cannot claim live sandbox security: the trace proves replay consistency over refs, not containment in a real host environment.
Negative cases define the refusal surface. The focused test suite mutates the fixture and exported bundle to verify semantic mismatch, blocked-action execution, source-module digest mismatch, source-module manifest boundary breakage, public-relative result record paths, and card reuse. These tests are the source-bound evidence that the validator fails closed for the named public contract.
Shape
Diagram source
flowchart TD bundle["JSON bundle authority"] markdown["Markdown reader projection"] manifest["source module manifest"] requests["six action requests"] verdicts["six pre-execution verdicts"] effects["six side-effect result records"] rollbacks["two rollback result records"] replay["six cold replay rows"] trace["public agent-execution trace"] negative["negative-case refusals"] tests["focused proof consumer"] result record["metadata-only validation result record"] ceiling["scope limits"] bundle --> markdown manifest --> result record requests --> verdicts verdicts --> effects effects --> rollbacks effects --> replay verdicts --> negative effects --> negative replay --> trace negative --> tests trace --> result record tests --> result record result record --> ceiling
The module's shape is pre-execution containment accounting. Public action requests are normalized into symbolic refs, policy verdicts must exist before execution, blocked actions carry zero side effects, allowed or reviewed side effects need diff refs and rollback refs, cold replay must reproduce the public state, and the trace builder emits metadata-only spans over those refs without promoting the fixture into live sandbox-security authority.
Reader Evidence Routing
Bundle route: core/paper_module_capsules.json::paper_modules[35] is the bundle-backed authority row, and paper_modules/agent_sandbox_policy_escape_replay.json is the generated paper-module instance.
Action route: the six public request ids are req_secret_read_attempt, req_network_exfil_attempt, req_destructive_delete_attempt, req_shell_obfuscation_attempt, req_safe_file_edit, and req_reviewed_mock_db_update.
Verdict route: the six verdict rows are pre-execution policy decisions under sandbox-policy-v1-public-synthetic, with four block, one allow, and one review outcome.
Side-effect route: all six requests have side-effect result records; blocked rows use zero diffs, while the public fixture edit and reviewed mock database update carry diff refs plus rollback refs.
Manifest route: source_module_manifest.json records seven copied source bodies, body_in_receipt: false, body_text_in_receipt: false, and the boundary excluding keys, account secrets, account or browser material, model-output data, live network payloads, raw environments, executable escape payloads, and account secret-equivalent material.
Runtime route: src/microcosm_core/organs/agent_sandbox_policy_escape_replay.py, src/microcosm_core/macro_tools/agent_execution_trace.py, and tests/test_agent_sandbox_policy_escape_replay.py verify negative cases, public trace-span construction, exact source-module imports, manifest digest rejection, result record public-relativity, and card result record reuse.
Contract
Input shape: projection_protocol, sandbox_policy, action_requests, policy_verdicts, side_effect_receipts, rollback_receipts, and cold_replay.
Positive evidence: six metadata-only action requests converted into public agent_execution_trace spans, six pre-execution policy verdicts, six side-effect result records, two verified rollback result records, and six cold replay rows.
Negative cases: real secret material, live network access, raw environment export, policy after execution, unlogged side effect, tool-output policy bypass, executable escape payload, and security benchmark claim.
Result record boundary: the validation result record proves the source-faithful trace refactor mechanics, negative-case coverage, secret-exclusion scan, and scope limit.
Scope limit: no live sandbox escape, live secret handling, live network access, host filesystem mutation, executable payload export, raw environment export, external model access, security benchmark claim, source-file changes, or launch-scope decision.
Projection Protocol
Copied: the public shape of the source agent-execution trace membrane and the idea that containment must be proven before a security claim is admitted.
Source-faithfully refactored: PublicTraceSpan construction, sequence-ordered trace rows, authority verdict refs, side-effect and rollback refs, public summary counts, trace digests, local JSON validators, and result record generation.
Cleaned: real secrets, host paths, live network targets, raw environment data, executable payloads, provider data, and account state.
Omitted: live exploit material, hosted sandbox details, real account secrets, raw tool-output bodies, real filesystem paths, raw environment variables, and security benchmark claims claims.
Public runtime surface: a metadata-only sandbox policy bundle plus generated result records under receipts/first_wave/agent_sandbox_policy_escape_replay/ and receipts/runtime_shell/demo_project/organs/agent_sandbox_policy_escape_replay/.
Source-open body floor: the exported bundle carries source_module_manifest.json plus seven exact copied source bodies: the extracted-pattern ledger, the high novelty reconstruction result record, the canonical component model, the source system/lib/agent_execution_trace.py runtime, std_agent_execution_trace, the extracted-pattern route-readiness checker, and the strict JSON helper required by the refreshed trace body. Result records and cards cite the manifest, hashes, material classes, and counts only; full body text stays in the bundle source module files.
Prior Art Grounding
This component is grounded in least-privilege sandboxing and agent-security evaluation work, not in a new exploit technique. The security-control lineage is Saltzer and Schroeder's least-privilege / complete-mediation principles and capability-oriented confused-deputy thinking. The agent-evaluation lineage is closer to sandboxed tool-use benchmarks such as AgentDojo and misuse/harm evaluations such as AgentHarm, where an agent's requested actions, tool calls, and policy outcomes are evaluated under controlled conditions.
Microcosm does not claim real sandbox security, exploit resistance, or live environment isolation. It borrows the pattern that containment must be checked before action, side effects must be logged, rollback needs its own result record, and harmful payloads must stay out of the public surface.
Validation proves the projection boundary and public trace-refactor mechanics for this contract; it does not establish real sandbox security, live model behavior, benchmark claims, exploit resistance, or whole-system safety.
Validation Result records
The focused proof consumer is tests/test_agent_sandbox_policy_escape_replay.py. A passing result record has to show that the fixture validator and exported-bundle validator both recompute the public sandbox-policy trace from action-request refs, pre-execution verdict refs, side-effect result record refs, rollback refs, cold replay rows, and the source-module manifest instead of trusting declared labels.
For the focused test, the result record boundary is the asserted shape: six action requests, six policy verdicts, four blocked-without-execution rows, two verified rollback result records, six cold replay rows, six public trace spans, manifest digest checks, public-relative result record paths, and negative cases for semantic mismatch, blocked-action execution, digest mismatch, manifest-boundary breakage, and unsafe card reuse. For the corpus check, the result record is only parity evidence that the JSON bundle and generated paper-module instance still agree; it is not a live sandbox-security result.
Validation Result record Path
Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:
The focused regression test and corpus projection checks are:
cd microcosm-substrate && ../repo-pytest \
tests/test_agent_sandbox_policy_escape_replay.py
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
The result record path proves pre-execution policy replay over public refs, not live sandbox security, exploit resistance, or host isolation.
Scope boundary
Scope limit
This module may claim public fixture evidence that action request refs, pre-execution policy verdict refs, side-effect result record refs, rollback refs, cold replay rows, metadata-only trace spans, source manifests, digest checks, secret-exclusion scans, negative cases, and validation result records are checked by the listed runtime witnesses.
This module may not claim live sandbox escape resistance, live secret handling, live network isolation, host filesystem mutation authority, executable payload export, raw environment export, provider behavior, security benchmark performance, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness.
Source and projection details
Governing Lattice Relation
The JSON bundle binds this paper module to the component agent_sandbox_policy_escape_replay and to mechanism.agent_sandbox_policy_escape_replay.validates_public_sandbox_policy_trace. The mechanism row states the actual computation: check action requests, pre-execution policy verdicts, side-effect result records, rollback result records, cold replay rows, public trace spans, source-module manifest boundaries, secret-exclusion scans, and escape negative cases before writing bounded result records.
AX-1 supplies the axiom-level rule: derivation must precede assertion, and a claim cannot be stronger than the checker that accepted it. P-1 specializes that rule into recomputation over lower-level evidence instead of echoing fixture labels, declared verdicts, or public copy lines. P-2 lowers the module's public claim to the strength of the named validator, which is why the scope limit stops at metadata-only public sandbox-policy replay result records. The governing concept, concept.agent_reliability_and_safety_validator_bundle, groups this component with agent reliability and safety validators whose public value is bounded result record evidence, not a broad claim that agents are safe in the world.
Belief-State Process Reward ReplayPublic projection of a belief-state process-reward claim contract, backed by the agent-execution trace refactor and copied source bodies.
Public Microcosm projection of a belief-state process-reward claim contract. Backed by the public agent-execution trace refactor lane plus copied source source bodies; validates trajectory groups, cold replay result records, negative cases, and scope limits.
Scope limit Source-faithful refactored fixtures and copied source bodies only; not fixture-echo product evidence, benchmark result, or launch claim.
This module is the public Microcosm projection of a belief-state process reward claim contract. It is backed by the public agent-execution trace refactor lane plus copied source source bodies. It is not a hidden-reasoning export, live RL run, neural-judge-only label set, hidden-gold benchmark, external model access, source-file changes, benchmark-score claim, or launch claim.
The public bundle models three partially observable tasks: terminal investigation, mock purchase, and formal-planning toy. A process-reward claim is admitted only when public observation digests, typed belief-state summaries, predicted next evidence, verifier or observed feedback refs, belief-discrepancy scores, dense process rewards, outcome rewards, reward-hacking trap results, trajectory groups, cold replay result records, negative cases, scope limits, and a source-faithful public trace span set line up.
Purpose
Process-reward language is easy to assert and hard to verify. A row can claim that a step earned a reward "for good reasoning" while the underlying evidence is a hidden gold label, a neural-judge guess, or formatting that gamed the scorer. This component exists to answer one narrow question: does a public process-reward claim actually reconstruct from lower-level public evidence, or is it just a label asserting its own correctness?
The interesting part is the recomputation. The validator does not trust any single fixture file. If any of those refs is missing or points somewhere inconsistent, the claim is blocked with a specific reason code rather than passed. A reward cannot point at a belief that points at a different episode, or cite feedback that belongs to another trajectory, and still count.
That cross-referential check is what separates this from a shape linter. The failure mode it guards against is a process-reward claim that looks correct field by field but does not survive being recomputed end to end. Two further design choices keep the result honest: outcome rewards are carried beside process rewards so a final answer cannot be re-labelled as step-level evidence, and every belief summary, feedback ref, and reward event stays metadata-only, so the validator proves the accounting structure without ever reading hidden reasoning.
Shape
The local governing standard is standards/std_microcosm_belief_state_process_reward_replay.json, whose authority boundary is synthetic belief-state process-reward replay only, not live training, benchmark, provider, source-file changes, public sharing, or launch-scope decision.
flowchart TD bundle["JSON source record paper_module_capsules.json[36]"] standard["Local standard std_microcosm_belief_state_process_reward_replay.json"] component["Runtime locus belief_state_process_reward_replay.py"] fixtureMode["run (fixture mode) 8 positive + 7 negative inputs"] bundleMode["run_reward_bundle (bundle mode) copied-body manifest floor required"] floors["Per-file floors projection protocol, reward policy, episodes, belief states, feedback, rewards, trajectory groups, cold replay"] recompute["Semantic recompute rebuild belief -> feedback -> process reward -> trajectory -> outcome reward -> cold replay"] negatives["Negative cases 7 planted traps must be observed"] scan["Secret-exclusion scan plus metadata-only public trace span set"] gate{"All floors pass, chain recomputes, every trap observed, no secret hit?"} pass["status: pass"] blocked["status: blocked with reason codes"] result records["Result records + compact card refs, hashes, counts, verdicts; body_in_receipt false"] ceiling["Scope limit source-faithful replay only"] bundle --> component standard --> component component --> fixtureMode component --> bundleMode fixtureMode --> floors bundleMode --> floors floors --> recompute recompute --> negatives negatives --> scan scan --> gate gate -->|yes| pass gate -->|no| blocked pass --> result records blocked --> result records result records --> ceiling
The generated instance reports eight relationship edges and zero unpopulated selective relations: it explains the belief_state_process_reward_replay component and the validating mechanism, is governed by P-1, P-2, and concept.agent_reliability_and_safety_validator_bundle, abides by AX-1, depends on paper_module.agent_route_observability_runtime, and cites src/microcosm_core/organs/belief_state_process_reward_replay.py as the resolved code locus. The component atlas adds the human/agent gloss and result record set; it classifies the evidence as algorithmic_projection and restates that the validator operates on recorded synthetic fixtures rather than live agent behavior.
The fixture manifest fixtures/first_wave/belief_state_process_reward_replay/fixture_manifest.json names eight positive input files and seven planted negative cases: hidden-chain-of-thought export, neural-judge-only labels, hidden gold labels, reward-by-formatting, verifier bypass, benchmark-performance claims, and final-answer-only scoring. The exported bundle manifest carries the source-open body floor: seven copied source modules under source_modules/, checked by digest and anchor refs, with body_text_exported_in_receipts: false. The focused test file tests/test_belief_state_process_reward_replay.py covers the fixture validator, exported bundle validator, public-root copy, negative cases, exact source body imports, and route/result record shape.
The honest ceiling is therefore narrow: this module can support public, metadata-only belief-state process-reward replay over synthetic tasks with verifier-backed process feedback, separated outcome rewards, cold replay, and negative-case coverage. It cannot support hidden reasoning export, live RL, reward-model quality, hidden-gold benchmark claims, provider behavior, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness.
Reader Evidence Routing
Read this page from the structured bindings outward. The bindings name the component, mechanism, concept, dependency module, runtime code locus, principle and axiom refs. The fixture result records, bundle result records, and focused test then show the metadata-only replay behavior. This page explains that chain for readers.
Technical Mechanism
The runtime validator is a two-mode replay checker. In fixture mode, run loads eight positive fixture files plus the seven planted negative inputs named by EXPECTED_NEGATIVE_CASES; _build_result then validates the projection protocol, reward policy, task episodes, belief states, verifier feedback, reward events, trajectory groups, cold replay, negative cases, secret-exclusion scan, and public trace projection before _write_receipts writes the result, board, validation, and sign-off result records. A pass requires all required positive floors to pass, every expected negative case to be observed, zero secret-scan blocking hits, public trace status pass, and no positive finding outside the expected falsification cases.
In exported-bundle mode, run_reward_bundle validates the public bundle without negative inputs and makes the copied-body floor mandatory. The source_module_manifest.json path must declare seven copied source body modules, body_in_receipt: false, body_text_in_receipt: false, public material classes, exact source/target digests, existing copied targets, and all declared anchors. Digest mismatch, missing manifest, wrong body class, result record-body leakage, count mismatch, missing target, and missing anchor cases block the bundle instead of degrading silently.
Between the per-file floors and the result records sits validate_semantic_recompute, which is where most of the real work happens. It checks that the cited feedback belongs to the same episode, that the process reward references the same belief, episode, trajectory, feedback ref, and belief-discrepancy value, that the trajectory actually lists that episode and that reward, that the trajectory's outcome reward is a real outcome event for the same episode, and that the cold replay both exists and passes. Any inconsistency appends a precise reason code such as feedback_episode_mismatch, belief_discrepancy_mismatch, or trajectory_process_reward_missing, and a single blocked row is enough to block the whole result. This is the check that a label-only fixture cannot fake: the references have to recompute into one coherent chain, not merely be present.
The validator links process-reward claims to public belief summaries rather than private reasoning. build_public_belief_state_process_reward_trace emits six metadata-only trace spans from the exported bundle, and the card path reports only compact counts, status, freshness digest, source-body floor metadata, and result record refs. CARD_OMITTED_FULL_PAYLOAD_KEYS keeps findings, scans, trace bodies, row payloads, source imports, scope limit, and scope boundary text out of the command card so public surfaces carry proof handles rather than copied private or source bodies.
Named Proof Consumers
microcosm_core.organs.belief_state_process_reward_replay.run is the first-wave fixture consumer. It writes result, board, validation, and sign-off result records for the synthetic episodes and negative-case floor.
microcosm_core.organs.belief_state_process_reward_replay.run_reward_bundle is the exported bundle consumer. It validates copied source bodies, public trace spans, digest/anchor contracts, and metadata-only result record behavior.
microcosm_core.organs.belief_state_process_reward_replay.result_card is the compact public card consumer. It reports counts and validation state while omitting the heavy/private payload classes named by CARD_OMITTED_FULL_PAYLOAD_KEYS.
tests/test_belief_state_process_reward_replay.py is the focused regression consumer. It asserts the three episode groups, six belief states, six process rewards, three outcome rewards, three cold replays, seven expected negative cases, exact source-body imports, digest mismatch blockers, manifest-boundary blockers, public-relative redacted result records, fresh-card reuse, and metadata-only public trace projection.
microcosm_core.macro_tools.agent_execution_trace.build_public_belief_state_process_reward_trace is the trace-projection consumer. It converts the exported bundle into six public spans with belief-state, feedback, process-reward, outcome-reward, and cold-replay coverage while preserving body_in_receipt: false.
Public Mechanics
Belief-state JSON is a public summary, not hidden chain-of-thought.
Process rewards must cite deterministic verifier or observed environment feedback refs; neural-judge-only labels are rejected.
Outcome rewards are carried beside process rewards so final answers cannot masquerade as process evidence.
Reward-hacking traps and cold replay result records must pass for each trajectory group.
microcosm_core.macro_tools.agent_execution_trace:: build_public_belief_state_process_reward_trace turns the public bundle into ordered trace spans that preserve belief, verifier, process-reward, outcome-reward, and cold-replay refs while keeping bodies out of result records.
examples/belief_state_process_reward_replay/ exported_belief_state_process_reward_bundle/source_module_manifest.json verifies exact copied source bodies for the extracted-pattern ledger, high-novelty reconstruction result record, canonical component model, agent-execution trace runtime, trace standard, and route-readiness checker. Those bodies live in source_modules/; result records carry refs, hashes, counts, and verdicts only.
Hidden reasoning export, hidden gold labels, reward-by-formatting, verifier bypass, benchmark-performance claims, and final-answer-only scoring are expected falsification fixtures.
Prior Art Grounding
This component is grounded in three older ideas: belief-state tracking under partial observability, process supervision, and reward-hacking controls. The belief lineage comes from POMDP work such as Kaelbling, Littman, and Cassandra's Planning and Acting in Partially Observable Stochastic Domains. The process-reward lineage follows OpenAI's Let's Verify Step by Step, where step-level feedback is separated from outcome-only supervision. The reward-hacking lineage comes from Concrete Problems in AI Safety and related work on specification gaming.
Microcosm does not train a reward model or expose hidden reasoning. It borrows the accounting form: public belief summaries, verifier-backed process feedback, outcome rewards kept separate from process rewards, reward-hacking traps, and cold replay result records before process-reward language is admitted.
Validation Result record Path
Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:
The replay is intentionally small and synthetic. It covers three partially observable task families, six accepted belief summaries, six process rewards, three outcome rewards, three trajectory groups, three cold replays, seven negative cases in fixture mode, and seven copied source modules in exported-bundle mode. Those counts are proof boundaries, not scale claims. They show that the public validator can separate belief summaries, verifier-backed process feedback, outcome rewards, reward-hacking traps, and cold replay under declared fixtures.
The mechanism does not estimate reward-model calibration, generalize to unseen tasks, compare agent policies, certify live training behavior, or score a benchmark. build_public_belief_state_process_reward_trace emits metadata-only trace spans, so it can prove trace structure and privacy boundaries, not hidden reasoning fidelity. The copied-source manifest proves exact declared public source bodies and anchors for this bundle; it excludes private source root export, external model access, source-file changes, public sharing, or launch.
Scope limit
Source-faithful refactored fixtures and copied source bodies only; not fixture-echo product evidence, hidden reasoning export, live RL training, neural-judge sufficiency, hidden-gold benchmarking, provider behavior, benchmark claims, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness.
Scope limit
This paper module can claim a metadata-only belief-state process-reward replay over synthetic tasks, with public belief summaries, verifier-backed process feedback, separated outcome rewards, reward-hacking traps, and cold replay result records.
It cannot claim hidden reasoning export, live RL training, reward-model quality, hidden-gold benchmark claims, provider behavior, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness. Any higher claim must land first in core/paper_module_capsules.json and the generated paper-module projection.
Scope boundary
This module does not export hidden reasoning, run RL or train a model, use hidden gold labels, rely on neural-judge-only labels, claim benchmark performance, use external model services, change source files, publish results, or include launch operations.
Source and projection details
Governing Lattice Relation
The governing lattice relation is that belief-state process-reward language is admissible only after the runtime recomputes the claim from lower-level public evidence. The generated JSON instance resolves eight edges and leaves no selective relation open: the bundle explains the belief_state_process_reward_replay component and mechanism.belief_state_process_reward_replay.validates_public_belief_state_process_reward_replay, is governed by concept.agent_reliability_and_safety_validator_bundle, P-1, and P-2, abides by AX-1, depends on paper_module.agent_route_observability_runtime, and cites src/microcosm_core/organs/belief_state_process_reward_replay.py as the code locus.
That relation matters because the module is not trying to make reward quality plausible from a label. P-1 requires recomputation rather than echoing fixture verdicts, so _build_result rechecks projection protocol, reward policy, episodes, belief rows, feedback rows, reward rows, trajectory groups, cold replay, expected negative cases, secret-exclusion scans, public trace shape, and copied source-module manifests. P-2 and AX-1 then lower the paper claim to what those checks derive: a local replay certificate over declared public inputs. The focused proof consumer is tests/test_belief_state_process_reward_replay.py, which exercises both fixture and exported-bundle modes, mutates real positive feedback linkage, rejects digest and manifest boundary violations, verifies exact source-body imports, checks freshness over live source authority, and confirms the command card omits full payload keys.
Sleeper Memory Poisoning Quarantine ReplaySynthetic replay fixture for a persistent-memory security contract: quarantine of poisoned memory, audit refs, rerun result records, negative cases, authority
Public Microcosm projection of a persistent-memory security claim contract. Replays synthetic memory-poisoning episodes and validates quarantine behavior, audit refs, rerun result records, negative cases, and scope limits with metadata-only result records.
Scope limit Synthetic replay fixture only; not a live memory product, live user memory import, benchmark security result, private memory export, or launch claim.
Persistent agent memory is an attack surface. If an agent reads a poisoned source in one session and writes a memory from it, that memory can quietly shape a later session's actions, long after the poisoning is out of view. This module asks one question: if an agent quarantines a poisoned memory write, can it show, from result records alone, that the quarantine actually held when the memory was retrieved later, and that a rollback genuinely removed it?
The interesting part is that the runtime grades the whole chain, not just the final answer. A naive memory-security story checks that the agent reached the right conclusion. This validator refuses that. It requires the poisoned write to carry provenance, the later retrieval to be blocked before any action and to cite the same memory ref the write quarantined, and the rollback to carry a deletion audit ref, a cold-rerun result record, and proof that the memory is absent after the rerun. A blocked retrieval that cannot name the quarantine audit ref or the cold-replay result record for the memory it gates is treated as unproven, not as a pass.
It is a synthetic fixture, deliberately narrow. The inputs are public metadata rows, never live user memory or private bodies, and the result records carry refs, hashes, counts, and verdicts rather than any memory text. It borrows the control shape from prior work on sleeper triggers and memory poisoning; it does not secure a live memory system or claim a benchmark result.
Abstract
This module is the public Microcosm projection of a persistent-memory security claim contract. It is a synthetic replay fixture, not a live memory product, live user memory import, benchmark security result, private memory export, or launch claim.
The fixture models four public sessions: a poisoned source bundle is seen, a memory write proposal is quarantined, later retrieval is blocked before action, and rollback plus cold rerun proves the poisoned memory is absent at the result record boundary. The claim is admitted only when source bundle refs, provenance refs, quarantine verdicts, classifier labels, retrieval influence gates, rollback audit refs, rerun result records, negative cases, and scope limits line up.
Shape
Diagram source
flowchart TD inputs["Public metadata inputs sessions, write proposals, retrieval replays, rollback rows"] subgraph Gates["Four ordered gates"] provenance["Provenance gate poisoned write quarantined + provenance-bound control admitted"] influence["Delayed-influence gate later retrieval blocked before action, same memory ref + audit + cold-replay ref"] rollback["Rollback gate deletion audit + cold-rerun result record + memory absent after rerun"] bodies["Source-body gate copied bodies digest-checked, result records stay metadata-only"] end negatives["Eight negative cases each must be observed as a typed finding"] result records["metadata-only result records refs, hashes, counts, verdicts"] ceiling["Scope limit synthetic replay only"] inputs --> provenance provenance -->|quarantined memory ref| influence influence --> rollback rollback --> bodies inputs --> negatives provenance --> result records influence --> result records rollback --> result records bodies --> result records negatives --> result records result records --> ceiling
The module's shape is a public memory-security replay, not a live memory product. Public metadata inputs pass through four ordered gates: provenance, delayed influence, rollback, and source-body handling. The delayed-influence gate is coupled to the provenance gate, so a blocked retrieval must target the same memory ref the write quarantined and cite its audit and cold-replay refs. Alongside the positive chain, eight negative cases must each surface as a typed finding, and every path lands in metadata-only result records under the scope limit.
Mechanism
The mechanism is a replay reducer over public metadata, not a memory runtime. src/microcosm_core/organs/sleeper_memory_poisoning_quarantine_replay.py loads six positive input families through _build_result: projection protocol, memory policy, session chain, quarantine events, retrieval replays, rollback/cold-rerun rows, and the source-module manifest. When run is used on the first-wave fixture it also loads the expected negative fixtures; when run_quarantine_bundle is used on the exported bundle it validates the public bundle without treating that bundle as negative-case authority.
The first gate is provenance. validate_memory_write_proposals accepts a memory write only when it carries the required source bundle ref, provenance ref, trust tier, classifier labels, audit ref, quarantine verdict, and redacted body posture. An untrusted source context with the sleeper-poisoning classifier cannot silently become trusted memory. Missing provenance, private memory body export, raw transcript export, live user-memory claims, and trusted promotion from untrusted context become typed findings instead of admissible memory authority.
The second gate is delayed influence. validate_retrieval_replays checks that later retrieval of the quarantined memory is blocked before any action can use it. The row must carry a retrieval ref, influence grade, action gate, cold replay result record ref, public evidence refs, and a quarantine audit ref coupling back to the write proposal it gates. This is the anti-final-answer check: the runtime rejects a memory-security story that grades only the final answer while omitting retrieval, influence, or rerun evidence.
The third gate is rollback. validate_rollback_rerun requires a rollback result record ref, deletion audit ref, rerun result record ref, and memory_absent_after_rerun=true. Rollback language is therefore admitted only when deletion and cold rerun are both present. Tests mutate these fields to show that nonempty but bogus rollback refs, missing result record refs, and absence failures block rather than becoming evidence.
The fourth gate is source-open body handling. validate_source_module_manifest and _source_open_body_import_summary verify seven copied public source bodies, their declared material classes, their digest fields, and their metadata-only result record posture. _write_receipts and result_card then expose public ids, counts, refs, digests, verdicts, negative-case status, and scope limits while omitting retrieval rows, rollback rows, and copied source bodies from command cards.
The proof consumer is therefore a pair of bounded runs plus focused tests: run proves the first-wave fixture with expected negative cases; run_quarantine_bundle proves the exported public bundle; and tests/test_sleeper_memory_poisoning_quarantine_replay.py verifies mutated positive rows, stale baked labels, retrieval/quarantine coupling, rollback result record shape, source-body digest checks, public-relative redaction, and card payload omission.
Reader Evidence Routing
Bundle route: core/paper_module_capsules.json::paper_modules[37:paper_module.sleeper_memory_poisoning_quarantine_replay] is the JSON authority row. A diagram view and an atlas card are generated for this module.
Mechanism route: core/mechanism_sources.json::mechanism.sleeper_memory_poisoning_quarantine_replay.validates_public_sleeper_memory_poisoning_quarantine_replay binds the code locus, input refs, result record refs, validator commands, focused regression, and guardrails.
Runtime route: src/microcosm_core/organs/sleeper_memory_poisoning_quarantine_replay.py owns run, run_quarantine_bundle, _build_result, _write_receipts, result_card, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, and the metadata-only source-module import checks.
Source-module route: source_module_manifest.json records seven copied public source bodies, including the growth result record, memory-plane paper modules, operator-memory tests, agent execution trace runtime, strict JSON helper, and agent execution trace standard; result records keep source bodies out with body_in_receipt: false.
Focused-test route: tests/test_sleeper_memory_poisoning_quarantine_replay.py verifies negative cases, public-relative redacted result records, exported-bundle runtime shape, digest mismatch rejection, exact copied source bodies, and card result record reuse.
Prior Art Grounding
This component combines two prior-art lines: sleeper/deceptive trigger behavior and long-term-memory/RAG poisoning. The sleeper-trigger lineage is Anthropic's Sleeper Agents. The memory-poisoning lineage includes AgentPoison, MemoryGraft, and Hidden in Memory, which all treat retrieved or persistent agent memory as an attack surface rather than a neutral cache.
Microcosm does not claim to secure live memory systems. It borrows the control shape: memory writes need provenance, untrusted source context cannot silently become authority, later retrieval must pass an influence gate, and deletion or rollback needs an audit ref plus cold rerun evidence.
Validation Result record Path
Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:
The focused regression test and corpus projection checks are:
cd microcosm-substrate && ../repo-pytest tests/test_sleeper_memory_poisoning_quarantine_replay.py
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
The result record path proves synthetic memory-poisoning quarantine replay over public metadata refs, not live memory safety, provider behavior, or benchmark security.
Scope boundary
Scope boundary
This module does not run live memory, claim memory product quality, import live user memory, export private memory bodies or raw transcripts, promote untrusted context into trusted memory, use external model services, change source files, claim benchmark security, publish results, or include launch operations.
Scope limit
This module may claim synthetic sleeper-memory poisoning quarantine replay over public metadata refs: source bundle refs, provenance refs, quarantine verdicts, classifier labels, retrieval influence gates, rollback audit refs, cold rerun result records, expected negative cases, source-module digest checks, metadata-only result records, and validation result records.
It does not claim live memory product quality, live user-memory handling, trusted promotion from untrusted context, provider behavior, source-file changes, benchmark security, private memory export, public sharing, launch-scope decision, or whole-system correctness. The generated diagram and atlas card are navigation aids, not security benchmark results.
Indirect Prompt-Injection Information-Flow Policy ReplayValidator-backed claim: a source-faithful trace refactor separated trusted instructions from untrusted web/tool/browser text before any privileged action.
Validator-backed public claim contract for indirect prompt-injection information-flow policy. Admits one narrow claim: a source-faithful trace refactor separated trusted instructions from untrusted web/tool/browser text before any privileged action or answer, checked against negative cases and scope limits.
Scope limit Source-faithful refactored fixtures and metadata-only result records only; not a live information-flow product, complete-security proof, or launch claim.
This validator-backed claim contract admits one narrow public claim: a source-faithful public trace refactor separated trusted instructions from untrusted web/tool/browser text before any privileged action or answer claim was accepted.
The runnable contract requires source trust labels, taint labels, source-to-sink flow rows, pre-action policy verdicts, sanitized-output result records, cold replay, secret-exclusion scan, negative cases, a public agent-execution trace, and an explicit scope limit.
Purpose
An agent that reads web pages, tool output, or retrieved documents takes in text from sources it does not control. Indirect prompt injection is the case where that untrusted text carries an instruction, and the agent acts on it as if the operator had asked. This component exists to make one specific safety property checkable on a synthetic trace: untrusted text was kept separate from trusted instructions, and no untrusted source reached a privileged action without being gated first. It answers a single question. Did the trust boundary actually hold through the flow, or only on paper?
The unusual part is that the validator does not trust the labels the fixture declares. Each flow row claims a set of taint labels and a policy verdict, but the runtime ignores those and recomputes both. It propagates taint along the source-to-sink graph from the labelled sources, so a sink inherits the taint of everything that fed it, and it derives the verdict from that propagated taint plus the sink's privilege, the sanitizer state, the sink kind, and the proposed action. If the declared taint or the declared verdict disagrees with the recomputed one, the row is blocked. A flow cannot quietly relabel an untrusted source as clean, or mark a dangerous action as allow, because the contradiction is recomputed rather than read back.
That recomputation is the point. The failure mode it guards against is a trace that looks safe because the labels were written to look safe. By deriving the labels and verdicts from the graph itself, the contract catches the mislabelled flow that a field-by-field check would wave through. To stay honest about live behaviour, it also takes one generated public tool-call trace span and pushes it through the same machinery as untrusted tool output, so the runtime is seen to treat tool output as data until a policy gate reviews it, never as instruction authority.
Primary result record: receipts/runtime_shell/demo_project/organs/indirect_prompt_injection_information_flow_policy_replay/exported_prompt_injection_flow_bundle_validation_result.json
First-wave fixture result record: receipts/first_wave/indirect_prompt_injection_information_flow_policy_replay/indirect_prompt_injection_information_flow_policy_replay_validation_receipt.json
Shape
Diagram source
flowchart TD sources["Source rows trusted and untrusted, with taint labels"] flows["Source-to-sink flow rows (declared taint and verdict)"] propagate["Propagate taint along the source-to-sink graph"] derive["Derive verdict from taint + sink privilege + sanitizer + sink kind + action"] compare{"Declared labels and verdict match the derived ones?"} blocked["Block the row (relabelled or wrong verdict)"] gate{"Untrusted into a privileged sink?"} verdicts["Pre-action verdict allow / warn / review / block"] outputs["Sanitized output no trusted context disclosed, no untrusted instruction obeyed"] toolspan["One public tool-call trace span treated as untrusted tool output"] result records["metadata-only result records refs, digests, counts, status"] sources --> flows flows --> propagate propagate --> derive derive --> compare compare -- no --> blocked compare -- yes --> gate gate -- yes --> verdicts gate -- no --> verdicts verdicts --> outputs toolspan --> propagate outputs --> result records blocked --> result records
The module's shape is a public information-flow replay, not a live prompt-injection defense. This page points at the mechanism and runtime component; the runtime validates source trust labels, taint propagation, privileged sink gates, pre-action verdicts, sanitized outputs, cold replay, public trace spans, source-module digest anchors, negative cases, and metadata-only result records.
Technical Mechanism
The runtime mechanism is an evidence compiler plus an information-flow validator. run loads the first-wave fixture with negative cases enabled; run_prompt_injection_bundle loads the exported public bundle and leaves the fixture-only negative cases out. Both routes call _build_result, which loads the projection protocol, injection policy, source-document rows, flow graph, policy verdict rows, sanitized outputs, cold replay rows, public trace, copied source-module manifest, and secret-exclusion policy before it writes any result record.
The source and flow validators separate instruction authority from untrusted data before claim admission. validate_source_documents requires every source row to carry source id, trust label, channel, body ref, taint labels, instruction-authority flag, body redaction, synthetic-fixture status, and no raw or real-account body export; untrusted sources cannot carry instruction authority. validate_information_flow_graph joins each flow to its source row, derives taint labels through _taint_propagation_receipt, derives the expected policy verdict from propagated taint, sink kind, sink privilege, sanitizer state, and proposed action, and rejects hand-written taint or verdict drift.
Policy and output validation then bind the pre-action membrane. The injection policy must name allow, warn, block, and review verdicts; require the source, flow, verdict, and output field floors; and deny real accounts, raw prompt bodies, account secrets, tool-output authority, hidden-message promotion, live tool calls, general robustness claims, and launch. validate_policy_verdicts requires verdicts to join to flows, precede action, cite rules, stay redacted, and match the derived flow verdict. validate_sanitized_outputs requires output refs to join to flows, disclose no trusted context, obey no untrusted instruction, and avoid external action on blocked flows.
Replay and trace validation keep the public claim metadata-only. validate_cold_replay requires replay commands and result record refs to reproduce each verdict and sanitized output without trusted-context disclosure. The component uses build_public_prompt_injection_trace to build five public trace spans, then _live_tool_call_trace_promotion promotes one generated public tool-call trace span back through the same taint-graph machinery as an untrusted tool-output source. That promotion is evidence that the runtime treats tool output as data until a policy gate reviews it, not as instruction authority.
The copied-source floor is checked independently. _source_module_manifest_result requires the exported bundle's source_module_manifest.json to classify copied material as source body material, keep body text out of result records, match declared module counts, resolve path and target_ref to the same copied body, stream SHA-256 digests over each target, and verify required anchors. _source_open_body_import_summary exposes only body ids, classes, manifest refs, counts, and ceiling flags; the copied bodies remain under source_modules/.
The result record mechanism is intentionally small. _write_receipts writes first-wave result, board, validation, and sign-off result records for fixture mode, while exported-bundle mode writes the bundle validation result. result_card emits a compact command card and omits findings, secret-scan details, scope limit bodies, scope boundary text, source refs, target refs, public trace spans, source rows, flow rows, verdict rows, sanitized output rows, cold replay rows, board rows, and copied source-module bodies. The card preserves counts, status, negative-case coverage, trace span count, body-floor status, and result record refs.
The lattice binding is the source record paper_module.indirect_prompt_injection_information_flow_policy_replay, the mechanism row mechanism.indirect_prompt_injection_information_flow_policy_replay.validates_public_indirect_prompt_injection_information_flow_policy_replay, principles P-9 and P-14, axiom AX-8, and concept.agent_reliability_and_safety_validator_bundle. Those refs are used as an admission-control lattice: source-labelled public evidence may enter the claim surface, while untrusted instruction authority, private bodies, model-output data, live account material, source-file changes, and launch claims remain out of scope.
Input Contract
projection_protocol.json: source-available projection statement and omitted private material.
injection_policy.json: required source, flow, verdict, and output fields plus authority denials.
source_documents.json: synthetic trusted and untrusted sources with trust labels and taint labels.
information_flow_graph.json: source-to-sink flow rows before claim admission.
policy_verdicts.json: allow, warn, block, and review verdicts before synthetic action.
sanitized_outputs.json: output refs proving no trusted context disclosure and no untrusted instruction obedience.
cold_replay.json: rerunnable command and result record refs that reproduce verdicts and sanitized state.
Public Trace Refactor
The product evidence is no longer the fixture verdict fields alone. The component uses microcosm_core.macro_tools.agent_execution_trace::build_public_prompt_injection_trace to emit metadata-only spans over the public source, flow, verdict, output, and replay refs. That builder is a Microcosm refactor of the source system/lib/agent_execution_trace.py span model, so the accepted result record can show sequence, authority, audit, coverage, and digest mechanics without copying real accounts, prompt bodies, model-output data, or live tool material.
Reader Evidence Routing
Bundle route: core/paper_module_capsules.json::paper_modules[38:paper_module.indirect_prompt_injection_information_flow_policy_replay] is the JSON authority row; a Mermaid diagram and an Atlas card are generated for this module from that row.
Mechanism route: core/mechanism_sources.json::mechanism.indirect_prompt_injection_information_flow_policy_replay.validates_public_indirect_prompt_injection_information_flow_policy_replay binds the code locus, fixture refs, exported bundle refs, result record refs, validator commands, focused regression, and guardrails.
Source-module route: source_module_manifest.json records five copied public source bodies: the extracted-pattern ledger row, high-novelty reconstruction result record, agent execution trace runtime, agent execution trace standard, and strict JSON helper. Result records carry refs, digests, counts, and status only; source body text stays in the bundle's source_modules/ tree.
Focused-test route: tests/test_indirect_prompt_injection_information_flow_policy_replay.py verifies negative cases, public-relative redacted result records, exported-bundle runtime shape, source-module digest and target-ref failures, exact copied source bodies, card result record reuse, and public trace span construction.
Microcosm does not claim a general prompt-injection defense. It preserves the prior-art internal control lesson: untrusted content must be labelled as data, source-to-sink flows must be visible before privileged action, and sanitized outputs need result records. The local component turns that lesson into a metadata-only replay contract with explicit scope boundaries and negative cases.
Negative Cases
The validator rejects real account material, secret or trusted-context exfiltration, raw prompt body export, untrusted tool output treated as instruction authority, hidden system-message promotion, account secret exfiltration, final-answer-only success, and ungated untrusted flow into a privileged sink.
These are falsification fixtures. They are part of the contract, not examples of live exploit traffic.
Named Proof Consumers
The named proof consumer is tests/test_indirect_prompt_injection_information_flow_policy_replay.py. It checks first-wave negative-case coverage, five sources, three untrusted and two trusted source labels, five information flows, derived taint paths, derived policy verdicts, allow/warn/block/review counts, blocked-without-external-action counts, sanitized-output non-disclosure, cold replay, scope limit flags, public trace spans, public tool-call trace promotion through taint propagation, public-relative redacted result records, exported-bundle validation, source-module digest mismatch rejection, target-ref/path mismatch rejection, partial digest mismatch rejection, manifest body-text boundary rejection, streaming source-module digests, exact copied source body imports, fresh --card result record reuse, public trace construction, and fixture-manifest binding to the body-open refactor.
The runtime proof consumers are the two module commands in the Validation Result record Path: fixture mode via indirect_prompt_injection_information_flow_policy_replay run, and exported bundle mode via indirect_prompt_injection_information_flow_policy_replay run-prompt-injection-bundle. Fixture mode must observe all eight negative cases and write metadata-only result, board, validation, and sign-off result records. Bundle mode must validate the public bundle shape, source-module manifest, public trace spans, and metadata-only exported bundle result record.
The corpus proof consumer is scripts/build_doctrine_projection.py --check-paper-module-corpus. It is a corpus check only; it does not refresh generated Mermaid, Atlas, site, verifier, or bundle state.
Validation Result record Path
Run the first-wave fixture validator from the repo root and write its result record outside the repo working tree:
The focused regression test and corpus projection checks are:
cd microcosm-substrate && PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_indirect_prompt_injection_information_flow_policy_replay.py -q
cd microcosm-substrate && PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
The result record path proves synthetic information-flow replay and body omission, not general prompt-injection robustness or live account safety.
Scope boundary
Limitations
The replay is intentionally small and synthetic. Fixture mode covers five source documents, three untrusted and two trusted source labels, five source-to-sink flows, five pre-action verdicts, five sanitized outputs, five cold replay passes, five public trace spans, one generated public tool-call trace promoted through the taint graph, five copied source bodies, and eight negative cases. Exported-bundle mode validates the public bundle, source-module manifest, trace spans, and metadata-only result record shape, but it does not carry the fixture-only negative-case payloads.
Those counts are proof boundaries, not scale claims. They show that this local validator recomputes source trust, taint propagation, pre-action verdicts, sanitized output constraints, cold replay, source-module digest anchors, and metadata-only result record shape over declared public inputs. They do not estimate attack coverage, compare defenses, score a benchmark, certify hidden-message handling in production, or demonstrate live email, browser, account, tool, or provider behavior.
The source-open body floor is also narrow. The manifest proves byte parity and declared anchors for the five copied source bodies in the exported bundle. It excludes private source-root export, raw prompt or system body export, account secret-bearing material, source-file changes, public sharing, hosting, launch-scope decision, complete security, or product readiness.
Scope limit
This module supports only the public claim that the replay exposes and checks a prompt-injection information-flow policy over source trust labels, taint labels, source-to-sink flow rows, pre-action policy verdict refs, sanitized-output refs, cold replay refs, public trace spans, live public tool-call trace taint promotion, copied source-module digests, negative-case result records, secret-exclusion checks, and metadata-only scope limits.
The copied source-module digest row proves byte parity for the named source body only; it does not widen the replay into live source authority.
It does not claim general prompt-injection robustness, live account safety, live tool or provider behavior, raw prompt/system/tool body export, account secret-bearing account data, hidden-message production handling, benchmark security or performance, source-file changes, publishing-scope decision, hosting authority, launch-scope decision, complete security, or product-progress authority.
Scope limit
Passing result records prove only that this public trace refactor satisfies the named prompt-injection information-flow contract over metadata-only rows. They do not prove general prompt-injection robustness, benchmark performance, live account safety, provider behavior, tool behavior, hidden-message handling in a real system, source-file changes, publishing-scope decision, or launch operations.
Source and projection details
Governing Lattice Relation
The generated JSON instance gives this page a specific admission lattice, not a loose security story. The only unresolved selective relation is the dependency edge; it remains a residual because the bundle does not name a sibling paper-module dependency.
The governing law is provenance propagation and non-interference. P-9 requires every source, fixture, result record, public-copy, provider-shape, or private-boundary crossing to carry provenance class and scope limit. P-14 requires byte or row basis and provenance to travel together. AX-8 requires data labels to propagate along flows, with untrusted labels entering privileged sinks only through declared transforms that satisfy the sink policy.
The runtime implements that lattice in _build_result: it loads the projection protocol, source documents, information-flow graph, policy verdicts, sanitized outputs, cold replay rows, public trace spans, source-module manifest, and secret-exclusion policy before status is admitted. validate_source_documents rejects untrusted instruction authority, validate_information_flow_graph derives taint labels and policy verdicts instead of trusting hand-written rows, _live_tool_call_trace_promotion treats generated public tool-call trace spans as untrusted tool output, and _write_receipts/result_card keep public result records metadata-only. The focused proof consumer is tests/test_indirect_prompt_injection_information_flow_policy_replay.py, which checks fixture and exported-bundle modes, taint/verdict derivation, negative cases, source-module digest boundaries, exact copied source-body imports, card redaction, fresh result record reuse, public trace spans, and fixture-manifest binding to the body-open refactor.
Source-Open Body Floor
The exported bundle carries exact copied source bodies under source_modules/ai_workflow/, governed by source_module_manifest.json. The imported bodies are:
The manifest records source refs, target refs, hashes, material classes, and required anchors. Result records and cards expose refs, counts, and validation status only; they do not embed ledger, reconstruction, prompt, account, account secret, browser UI, model-output data, or live-access bodies.
MCP Tool Authority ReplaySynthetic MCP-like replay fixture for a tool-authority claim contract: replay result records, negative cases, scope limits; no live MCP/provider/account secret
Public Microcosm projection of a tool-authority claim contract. Replays a synthetic MCP-like tool-authority scenario and validates replay result records, negative cases, and scope limits with metadata-only result records.
Scope limit Synthetic MCP-like replay fixture only; not a live MCP account test, external model access, account secret-handling certification, benchmark security result, or launch claim.
This module is the public Microcosm projection of a tool-authority claim contract. It is a synthetic MCP-like replay fixture, not a live MCP account test, external model access, account secret-handling certification, benchmark security result, or launch claim.
The fixture models three public tools: a readonly docs lookup, a write-capable ticket update, and an untrusted result source. The claim is admitted only when tool manifest scope refs, call argument hashes, approval token refs, side-effect ledger refs, rollback result records, untrusted-output instruction/data splits, cold replay result records, negative cases, and scope limits line up.
Purpose
When an agent uses tools through a protocol like MCP, the sentence "the agent used the tool safely" is cheap to write and hard to back. This component answers one question: given a recorded tool-use trace, does the evidence actually support the authority the trace claims, or is the safety language unearned? It exists so that tool-authority claims have to be replayed against metadata before prose is allowed to call them safe.
The approach is to treat a tool call as a small transaction that must show its working. Each call cites a narrow capability scope, an argument hash, and, if it writes, an approval token, a side-effect ledger entry, and a rollback result record. Those references are not taken on trust: the side-effect ledger and the cold replay rows are cross-checked against the accepted call rows by call id, so a rollback result record that no call refers to, or a write that skips approval, is caught rather than waved through. The point is that a reference string is not authority until something downstream resolves it.
Two failure modes are worth naming because they are specific to tool-using agents. The first is the confused deputy: a call that asks for a scope wider than its task needs (*, account_full_access) is rejected before it runs, so a tool cannot quietly borrow more authority than it was granted. The second is tool-output-as-instruction, the prompt-injection shape where text returned by an untrusted tool is obeyed as a command. Here untrusted output must stay data and cite an instruction/data split; a row that lets output become instruction is one of the eight negative cases the fixture is built to catch.
This is deliberately a synthetic replay, not a live test. The component never opens an MCP account, calls a provider, or handles a account secret. It reads only public metadata and digests, and it keeps every payload, result body, and account secret out of the result records it writes. What it offers is narrow and honest: a way to check that a tool-authority story is internally consistent and metadata-only, not a certificate that any real tool integration is secure.
The module's shape is a public tool-authority replay, not a live MCP security claim.
Technical Mechanism
The replay is a fail-closed authority lattice over a synthetic MCP-like tool story. _build_result loads the fixture or exported bundle, runs load_forbidden_classes and scan_paths over input JSON and copied source modules, then validates each contract plane separately: projection protocol, tool policy, tool manifest, tool calls, tool results, side-effect ledger, cold replay rows, public trace spans, and source-module manifest rows. The final status is pass only when every sub-validator passes, no expected negative case is missing, the secret scan has zero blocking hits, and the source-module floor is either present when required or explicitly not required for the first-wave fixture.
Tool authority is checked before prose can turn it into evidence. Manifest rows define the declared tool ids and allowed tool classes. validate_tool_calls then rejects undeclared tool ids, overbroad scopes, missing argument hashes, hidden account secret export, live account access, unapproved side effects, tool-output-as-instruction, final-answer-only grading, and unredacted payload export. Write-capable calls must carry approval token refs, side-effect ledger refs, and rollback result record refs; untrusted-result calls must keep instruction and data boundaries explicit. validate_side_effect_ledger and validate_cold_replay make those refs observable instead of leaving them as decorative strings.
The exported-bundle path adds a body-floor check without moving bodies into result records. _source_module_manifest_result streams digest verification over each copied source source module, checks required anchors, requires body_in_receipt: false, and reports a metadata-only source-open import summary. build_public_mcp_tool_authority_trace contributes three public trace spans for the tool calls; _body_import_verification binds that public refactor back to the source source and Microcosm target digests. _write_receipts emits the result, board, validation result record, and sign-off result record for fixture runs, and result_card deliberately exposes counts, status bits, digest freshness, and omission result records rather than tool rows or source bodies.
The mechanism is intentionally narrower than a tool-use security benchmark. It accepts only public metadata and digest evidence, and it treats every generated projection as a result record over source rows rather than as source authority.
Public Mechanics
Every tool call must bind to a narrow capability scope ref before admission.
Write-capable calls require approval token refs, side-effect ledger refs, and rollback result record refs.
Untrusted tool output is data, not instruction, and must cite an instruction/data split ref.
Call arguments, tool outputs, account refs, and result bodies stay redacted or metadata-only.
Overbroad scopes, hidden account secret export, tool-output-as-instruction, unapproved side effects, live account access, final-answer-only grading, missing rollback result records, and unredacted tool payloads are expected falsification fixtures.
Reader Evidence Routing
Bundle route: core/paper_module_capsules.json::paper_modules[39:paper_module.mcp_tool_authority_replay] is the JSON authority row; a diagram view and an atlas card are generated for this module from the source record.
Mechanism route: core/mechanism_sources.json::mechanism.mcp_tool_authority_replay.validates_public_mcp_tool_authority_replay binds the code locus, fixture refs, exported bundle refs, result record refs, validator commands, focused regression, and guardrails.
Source-module route: source_module_manifest.json records seven copied public source body rows, while the exported source-open body summary exposes at least six body materials. The floor includes high-novelty and extracted-pattern evidence, agent execution trace runtime and standard bodies, route-readiness standard material, mission-transaction preflight internal control material, and the strict JSON helper. Result records carry refs, digests, counts, and status only.
Focused-test route: tests/test_mcp_tool_authority_replay.py verifies negative cases, public-relative redacted result records, exported-bundle runtime shape, source-module digest failures, exact copied source bodies, card result record reuse, and public trace span construction.
Named Proof Consumers
First-wave fixture consumer: PYTHONPATH=src ../repo-python -m microcosm_core.components.mcp_tool_authority_replay run --input fixtures/first_wave/mcp_tool_authority_replay/input --out /tmp/microcosm-mcp-tool-authority-replay/fixture --sign-off-out /tmp/microcosm-mcp-tool-authority-replay/sign-off.json --card consumes the fixture route, expected negative cases, secret scan, public trace construction, scope limit, metadata-only result record writer, and command-card omission contract.
Exported-bundle consumer: PYTHONPATH=src ../repo-python -m microcosm_core.organs.mcp_tool_authority_replay run-tool-authority-bundle --input examples/mcp_tool_authority_replay/exported_mcp_tool_authority_bundle --out /tmp/microcosm-mcp-tool-authority-replay/bundle --card consumes the public bundle, source-module manifest, copied body-floor digest checks, public trace spans, metadata-only exported-bundle result record, and fresh card reuse path.
Focused regression consumer: PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_mcp_tool_authority_replay.py -q pins the negative-case matrix, undeclared-tool rejection, redacted/public result record paths, source-module digest mismatch failures, exact copied source bodies, public trace span coverage, and card omission behavior.
This check excludes hand-editing generated projections; it is a read-only result record for this Markdown slice.
Prior Art Grounding
This component is grounded in capability security, least privilege, and current MCP authorization guidance. The classic security lineage is Saltzer and Schroeder's Protection of Information in Computer Systems and Hardy's Confused Deputy: authority should be narrow, mediated, and bound to the object/action being requested. The MCP-specific lineage is the official MCP authorization and security best practices guidance, especially least-privilege scopes and token audience boundaries.
Microcosm does not claim live MCP security or account certification. It borrows the prior-art authority shape and makes it replayable: every public tool story must expose capability scope refs, approval refs, side-effect refs, rollback refs, instruction/data split refs, and scope boundaries before write authority is treated as evidence.
Validation Result record Path
Run the first-wave fixture into disposable result records from the Microcosm root:
Run the exported bundle through the same component:
This module may claim only that a synthetic public MCP-like replay preserved tool-authority boundaries over metadata rows: capability scopes, argument hashes, approval refs, side-effect refs, rollback refs, instruction/data split refs, cold replay refs, source-module digests, negative cases, and metadata-only validation result records.
It must not claim live MCP account safety, account secret-handling certification, live tool behavior, provider behavior, benchmark security, source-file changes, publishing-scope decision, complete security, or launch-scope decision.
Scope boundary
This module does not access live MCP accounts, export account secrets or model-output data, obey tool output as instruction, run live tools, change source files, claim benchmark safety, publish results, or include launch operations.
Source and projection details
Governing Lattice Relation
The source record declares concept.agent_reliability_and_safety_validator_bundle, principles P-4 and P-16, and axiom AX-3 as the governing lattice. The generated component row repeats the paper and mechanism links and adds an component-level P-18 relation; that component relation is useful context but does not expand the paper-module bundle's declared authority.
P-4 and AX-3 are the local authority rule: a tool handle, account secret-shaped string, role name, or trusted-session label is bounded evidence of authority. The runtime must derive authority from dereferenced manifest policy, capability scope refs, approval refs, side-effect refs, rollback refs, cold replay refs, and public trace spans. P-16 supplies the transaction boundary: a write-capable tool call is admissible only when the call is scoped, the side-effect is ledgered, rollback evidence is present, and the result record says which scope limit still holds.
The mechanism row deliberately leaves sibling/upstream mechanism relations as residual pressure. That residual is part of the scope limit: this module can show a public replay lattice for synthetic MCP-like tool authority, but it does not infer neighbouring mechanisms from prose, certify live MCP security, or promote generated Mermaid, Atlas, site, or corpus projections into source authority.
Source-Open Body Floor
The exported bundle carries copied source bodies under source_modules/, governed by source_module_manifest.json. The manifest records source refs, target refs, hashes, material classes, required anchors, and result record body exclusions for:
The floor is source-open body evidence, not live-account or provider authority. Result records and command cards expose refs, digest status, counts, and verdicts only; they do not embed copied source bodies or private/live payload material.
Tactic Portfolio AvailabilityEnvironment-scoped tactic availability rows gate downstream tactic routing without becoming proof, benchmark, or launch-scope decision.
Tactic Portfolio Availability Probe validates copied Lean/Std tactic affordance rows before downstream routing can treat a tactic as usable. It checks compile status, Mathlib absence handling, probe portfolio membership, negative cases, source digests, and metadata-only result records while keeping proof bodies, model-output data, benchmark claims, and launch-scope decision out of scope.
Scope limit Copied tactic affordance probe rows and public fixture/exported-bundle result records only; no Lean/Lake rerun, theorem proof, benchmark performance, external model access, Mathlib-dependent proof authority, launch-scope decision, publishing-scope decision, or whole-system correctness.
tactic_portfolio_availability_probe is the public component that turns tactic callability into an explicit artifact before routing or proof search treats a tactic as usable.
The fixture is copied from real source system: the 2026-05-11 PROVER_PROOF_STATE_SEARCH_CURRICULUM smoke run's Lean/Std tactic affordance probe. It records compile-status rows for rfl, decide, omega, simp, simp_all, grind, native_decide, and aesop, with source digests for the run-level affordance probe, the portfolio_core_v0 tactic availability artifact, and the paired corpus-readiness boundary. The Mathlib-dependent aesop row is marked environment_fail because the paired environment probe reports mathlib_lake_project_import_available=false.
The component validates:
every tactic has an environment-scoped compile_status;
Mathlib-dependent tactics are not marked available without a passing Mathlib import probe;
downstream consumers reference only tactics present in the probe portfolio;
proof bodies, raw model-output data, benchmark claims, launch-scope decision, and non-public paths stay out of the public artifact.
The generated board is a callability map, bounded evidence evidence. It can make target-shape routing cheaper and more honest, but it cannot prove a goal, widen Lean/Lake authority, use external model services, claim benchmark performance, or include launch operations.
The result record contract reports body_material_status=copied_non_secret_macro_body_with_provenance, tactic_availability_status=real_lean_std_tactic_affordance_probe_rows, source digests, target refs, and secret_exclusion_scan. It does not use body-redaction or non-public-state-scan grammar as product evidence.
Purpose
A tactic name is not a usable tactic. aesop is callable only if the surrounding Lean and Std environment actually carries the imports it needs; omega is callable in one project layout and not in another. Routing or proof search that trusts a bare tactic name will reach for tactics that the current environment cannot run, and then misread the resulting failure as a property of the goal rather than a property of the environment. This component answers one question: in the observed Lean/Std environment, which tactics were actually callable, and on what evidence?
The interesting part is how it treats failure. A copied probe row that reports a Lean FAIL is not flattened into a single "unavailable" verdict. When a tactic declares requires_mathlib and the paired environment probe reports that the Mathlib import is absent, the failure is classified as environment_fail with the reason MATHLIB_IMPORT_MISSING. The same Lean FAIL for a tactic that does not depend on Mathlib is classified as compile_fail. The distinction keeps a missing import from masquerading as a broken tactic, and it preserves Mathlib absence as a recorded fact about the environment rather than discarding it. A downstream router can then re-attempt the same tactic in a different environment instead of striking it off permanently.
The second deliberate choice is that none of this is a measurement of quality. The component copies probe durations and bands them as fast, moderate, or slow so a router can prefer a cheaper available tactic, but the latency profile is stamped as environment-scoped, not benchmark authority. Callability and speed in one observed environment are useful for cheaper routing; they are explicitly not evidence that a tactic is correct, that a goal was proved, or that Lean was rerun by this component.
Shape
Source refs
Tactic portfolio availability probe
tactic_portfolio_availability_probe
Environment probe
mathlib_lake_project_import_available
Diagram source
flowchart TD A["Copied Lean/Std affordance probe rows (compile_status, requires_mathlib, duration_ms)"] --> B["tactic_portfolio_availability_probe"] C["Environment probe mathlib_lake_project_import_available"] --> B B --> D{"Copied compile_status"} D -->|PASS| E["available band duration fast / moderate / slow"] D -->|FAIL + requires_mathlib + Mathlib absent| F["environment_fail reason MATHLIB_IMPORT_MISSING"] D -->|FAIL otherwise| G["compile_fail"] E --> H["Availability board for target-shape routing"] F --> H G --> H I["Downstream tactic reference"] --> J{"Tactic in probed portfolio?"} J -->|no| K["Rejected: unprobed tactic referenced"] J -->|yes| H B --> L["metadata-only fixture and bundle result records no proof, Lean, or provider bodies"] L --> M["Generated paper-module row and validation result records"]
The flow is deliberately smaller than the generated doctrine-lattice graph.
Reader Evidence Routing
Read this page in four passes:
Start with the bundle source row at core/paper_module_capsules.json::paper_modules[40:paper_module.tactic_portfolio_availability]. It names the public component subject, mechanism subject, resolved code locus, Microcosm concept, governing principles, axioms, and sibling paper-module dependencies that generate the relationship edges.
Inspect the runtime system at src/microcosm_core/organs/tactic_portfolio_availability_probe.py. The load-bearing symbols are run, run_availability_bundle, _build_result, _write_receipts, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING; those are the code-loci symbols that make the paper module about an executable component instead of a prose topic.
Reproduce the evidence floor with the fixture input fixtures/first_wave/tactic_portfolio_availability_probe/input, the exported bundle examples/tactic_portfolio_availability_probe/exported_tactic_portfolio_availability_bundle, the focused test tests/test_tactic_portfolio_availability_probe.py, and the paper-module corpus check. Treat the result records as environment-scoped tactic-callability evidence only; validation result records do not widen the proof boundary, scope limit, launch posture, provider posture, or benchmark posture.
Prior Art Grounding
The module is patterned after feature-detection probes and proof-assistant tactic inventories. GNU Autoconf's configure workflow established the habit of testing local capability before relying on it; Lean's tactic documentation shows that tactic use is environment- and goal-sensitive, so a tactic name is not enough to justify downstream routing. This component applies that older probe discipline to Microcosm: it records which tactics were callable in the observed Lean/Std environment and preserves Mathlib-dependent absence as evidence, without treating callability as proof quality.
Prior-art anchors:
GNU Autoconf feature/configuration probing: https://ftp.gnu.org/old-gnu/Manuals/autoconf-2.57/html_chapter/autoconf.html
From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:
The expected projection row is paper_module.tactic_portfolio_availability with 18 generated relationship edges, no unpopulated selective relations, Mermaid status available_from_capsule_edges, and Atlas status linked_from_capsule_edges. These checks validate environment-scoped tactic availability rows and bundle result records only; they do not turn callability into proof quality, benchmark performance, Mathlib proof authority, or launch-scope decision.
Scope boundary
Scope limit
The JSON bundle and generated row prove only environment-scoped tactic callability evidence: copied Lean/Std tactic affordance rows, compile-status rows, Mathlib absence evidence, downstream tactic-reference checks, source digests, secret-exclusion checks, negative cases, and validation result records. They do not prove formal-result correctness, expand Lean or Lake authority, use external model services, claim benchmark performance, export non-public paths, include launch operations or public sharing, or treat tactic callability as proof quality.
Scope limit
This component is environment-scoped tactic callability evidence only. It does not establish formal-result correctness, expand Lean/Lake authority, use external model services, claim benchmark performance, export non-public paths, include launch operations, or treat tactic callability as proof quality.
Target Shape Tactic RoutingPre-execution tactic routing admits or rejects tactics from target shape and probe evidence without proving the target.
Target Shape Tactic Routing is the pre-execution gate between tactic availability and proof attempts. It validates Ring2 problem-domain, failure-class, graph-update, and tactic-probe references, rejects unavailable or unprobed tactics, and records metadata-only route decisions before any Lean/Lake proof authority is claimed.
Scope limit Public route-reference fixture and exported-bundle result records only; no Lean/Lake execution, formal-result correctness, post-execution routing, external model access, proof body export, launch-scope decision, publishing-scope decision, or whole-system correctness.
target_shape_tactic_routing_gate is the public Microcosm component for the pre-execution tactic admissibility layer.
It turns real problem-domain, failure-class, and graph-update candidate refs from the formal-math evaluation pipeline into route decisions: which tactics are admitted, which are rejected as unavailable, which are rejected as unprobed, and which are rejected because they do not match the declared goal shape.
Purpose
A proof attempt is expensive, and most of that cost is spent on tactics that were never going to work: tactics the environment cannot run, tactics absent from the probe portfolio, or tactics that do not match the shape of the goal. This component answers one question before any Lean call is made: given the target shape and the current availability probe, which tactics may a route even attempt?
The decision is deliberately made early. Routing happens before execution, so a case that carries Lean result records, execution results, or a post-execution stage is rejected outright rather than trusted. The point is to decide admissibility from evidence that already exists, not from the outcome of the attempt the gate is meant to filter.
What is unusual is that the gate recomputes the choice rather than accepting the declared one. Each target shape carries a small preferred-tactic order (for example omega for integer linear arithmetic, decide for closed natural-number decisions). The gate walks that order, skips any preferred tactic that is unprobed or unavailable, records why it skipped, and falls back to the next allowed candidate or to a default safe order for shapes it does not recognise. A route whose declared selection disagrees with this computed preference is flagged rather than honoured. The route is a claim about what should run; the gate treats it as something to check, not something to believe.
Shape
Source refs
JSON bundle
paper_module.target_shape_tactic_routing
Generated structured source record
paper_modules/target_shape_tactic_routing.json
Runtime component
target_shape_tactic_routing_gate.py
Diagram source
flowchart TD Bundle["JSON bundle paper_module.target_shape_tactic_routing"] structured source record["Generated structured source record paper_modules/target_shape_tactic_routing.json"] Component["Runtime component target_shape_tactic_routing_gate.py"] Portfolio["Tactic probe portfolio available/unavailable tactic ids"] Routes["Target-shape route cases pre_execution selected tactics"] SourceFloor["Copied Ring2 source artifacts 4 body imports, body_in_receipt=false"] Decisions{"Route admissible before proof execution?"} Result records["Result records result, board, validation, sign-off"] Tests["Focused tests negative cases and digest checks"] Ceiling["Scope limit no Lean/Lake, proof, provider, post-execution, launch"] Bundle --> structured source record Bundle --> Component Component --> Portfolio Component --> Routes Portfolio --> Decisions Routes --> Decisions SourceFloor --> Decisions Decisions --> Result records Tests --> Result records Result records --> Ceiling
Technical Mechanism
The named mechanism mechanism.target_shape_tactic_routing_gate.validates_public_tactic_routing_boundary is a fail-closed scorer over two public input planes: the tactic probe portfolio and the target-shape route cases. _build_result loads the fixture or exported bundle payloads, scans the inputs and copied source artifacts for forbidden body material, derives known/available/unavailable tactic sets, scores every route case, checks copied Ring2 source-artifact digests, and emits metadata-only result, board, validation, and sign-off result records.
For each route case, _decision_for_tactic rejects a candidate before selection if the tactic id is absent from the public probe portfolio, marked unavailable, or outside the case's declared allowed_tactic_ids. Only a tactic that is probed, available, and target-shape-admissible can receive TARGET_SHAPE_ADMISSIBLE. _shape_preferred_selection then applies the local target-shape preference map, records the unknown-shape default fallback when no specific map exists, and records the preferred-unavailable fallback when the first preferred tactic is known but not usable. _route_integrity_findings turns any unavailable admission, unprobed admission, post-execution route, or declared-selection mismatch into typed findings.
The proof consumer is tests/test_target_shape_tactic_routing_gate.py: it asserts seven pre_execution route cases, shape-preferred selection for the real Ring2 cases, unknown-shape and unavailable-Mathlib fallback behavior, rejection of mutated shape and availability inputs, exported-bundle sign-off, four copied source artifacts with digest verification, compact card omission of the full routing board, and result record text without non-public paths or body fields. Those tests consume the same fixture and exported-bundle surfaces named by the mechanism row, so this page's evidence is the runnable route-reference and result record contract rather than a prose-only claim.
The governing lattice stays explicit: the bundle binds the module to concept.formal_math_and_proof_witness_bundle, principles P-1, P-2, P-3, P-6, P-8, and P-9, axioms AX-1, AX-2, AX-5, AX-7, and AX-8, and dependency modules for tactic portfolio availability, formal-math readiness, proof-diagnostic evidence, verifier-trace repair, and formal evidence-cell anchor resolution. The standard narrows that lattice to one allowed claim: public pre-execution route cases may admit only tactics that were both probed and available before proof execution. The same standard forbids widening this mechanism into formal-result correctness, Lean/Lake execution, external model access, proof or provider body export, post-execution route authority, publishing-scope decision, or launch-scope decision.
Evidence/accounting:
Bundle authority: core/paper_module_capsules.json::paper_modules[41:paper_module.target_shape_tactic_routing] names source_authority: json_capsule, subjects component:target_shape_tactic_routing_gate and mechanism.target_shape_tactic_routing_gate.validates_public_tactic_routing_boundary, the resolved code locus src/microcosm_core/organs/target_shape_tactic_routing_gate.py, and generated projection statuses mermaid.status: available_from_capsule_edges plus atlas_card.status: linked_from_capsule_edges.
Generated structured source record: paper_modules/target_shape_tactic_routing.json carries relationships.edges for the bundle subjects, concept/principle/axiom refs, dependency paper modules, and code locus; relationships.unpopulated_selective_relations: []; and scope boundaries that the JSON row does not establish runtime correctness, launch-scope decision, or whole-system completeness.
Runtime contract: standards/std_microcosm_target_shape_tactic_routing_gate.json limits the allowed claim to pre-execution tactic admission from probed, available tactics; its required_fields bind tactic_portfolio_availability.tactics[].tactic_id, availability_status, target_shape_routes.route_cases[].target_shape, allowed_tactic_ids, selected_tactic_id, and route_stage.
Source-body accounting: examples/target_shape_tactic_routing_gate/exported_target_shape_tactic_routing_bundle/source_module_manifest.json records source_import_class: copied_non_secret_macro_body, module_count: 4, body_in_receipt: false, three verified_public_safe_private_path_rewrite rows, and one exact_copy row.
Fixture/bundle behavior: examples/target_shape_tactic_routing_gate/exported_target_shape_tactic_routing_bundle/target_shape_routes.json has seven pre_execution route cases, while tactic_portfolio_availability.json marks decide, omega, simp_all, and rfl available and aesop unavailable.
Result record floor: receipts/first_wave/target_shape_tactic_routing_gate/target_shape_tactic_routing_result.json, target_shape_tactic_routing_board.json, target_shape_tactic_routing_validation_receipt.json, and result records/sign-off/first_wave/target_shape_tactic_routing_gate_fixture_acceptance.json report status: pass, route_case_count: 7, copied_source_artifact_count: 4, source_artifacts_pass: true, missing_negative_cases: [], secret_exclusion_scan.blocking_hit_count: 0, and authority flags with Lean/Lake, proof, provider, post-execution routing, and launch-scope decision set false.
Test boundary: tests/test_target_shape_tactic_routing_gate.py checks observed negative cases, shape-preferred selection, unknown-shape and Mathlib-unavailable fallback, exported-bundle sign-off, source-module digest verification, compact card omission of full boards, and result record output without non-public paths or body fields.
Reader Evidence Routing
Read this module as a pre-execution admissibility gate, not as a proof attempt. The primary reader path is:
Start with the problem-domain, failure-class, graph-update candidate, and tactic-probe refs in the fixture input. They are the public route evidence the gate is allowed to inspect before any Lean/Lake work in the formal-math evaluation and premise-retrieval pipeline.
Compare each target-shape route case against the selected tactic ids and rejection reasons: admitted tactics must match both the declared goal shape and the public availability probe.
Inspect negative cases before the happy path. The important behavior is that unavailable tactics, unprobed tactics, proof/provider body leakage, post-execution routing, and launch overclaims all fail closed.
Use the structured source record only for structural lattice proof: it confirms subjects, code loci, doctrine refs, and dependency edges; it does not establish the tactic route can solve the target.
unavailable_tactic_admitted rejects an aesop route while Mathlib is absent.
unprobed_tactic_allowed rejects a tactic absent from the public probe portfolio.
proof_body_leakage rejects proof/provider/Lean body fields.
post_execution_route rejects route selection after execution evidence.
release_overclaim rejects proof, provider, Lean/Lake, public sharing, and launch-scope decision overclaims.
Prior Art Grounding
The routing layer follows established proof-search and policy-gating patterns: match a goal shape to methods that are known to be available before spending runtime on them. Lean's tactic documentation supplies the local proof-assistant context for goal-directed tactic choice, while Isabelle/Sledgehammer represents a mature prior-art pattern for selecting external provers and relevant facts from a goal. Microcosm narrows that idea to a pre-execution admissibility filter: target shape, allowed references, and current tactic availability must line up before a tactic route can be exported.
Isabelle Sledgehammer user guide: https://isabelle.in.tum.de/doc/sledgehammer.pdf
Why It Matters
After corpus readiness and strategy scoring, Microcosm needs a visible gate that prevents wasted or misleading proof attempts. This component shows that gate over the formal-math evaluation and premise-retrieval pipeline already feeding verifier repair, evidence anchoring, and proof diagnostics: a tactic is not tried just because it exists; it is admitted only when the target shape and the public availability probe both allow it.
Validation Result record Path
From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:
These checks validate route-reference fixture and bundle result records only; they do not widen the no-Lean/no-proof scope limit.
Scope boundary
Scope limit
This component does not run Lean or Lake and does not establish a target. It validates only the route references that must exist before a proof attempt in the formal-math evaluation and premise-retrieval pipeline: tactic probe availability, target-shape route cases, selected tactic ids, failure-class refs, graph-update candidate refs, and negative-case result records.
Forbidden outputs include proof bodies, provider bodies, post-execution route selection, Lean result record claims, external model access, launch claims, and Mathlib-dependent proof authority.
Scope limit
This module covers only public pre-execution tactic routing evidence: the route references used before a formal proof attempt, tactic probe availability, target-shape cases, selected tactic ids, failure-class refs, graph-update candidate refs, negative-case result records, source-module digest evidence, and validation result records. It does not run Lean or Lake, prove formal-result correctness, export proof bodies or provider bodies, authorize post-execution route selection, use external model services, claim Mathlib-dependent proof authority, authorize public sharing, include launch operations, or prove whole-system correctness.
Ring-2 Premise Precision RecallAfter-the-fact premise retrieval metrics separate retrieval misses from proof failures without becoming theorem authority.
Ring-2 Premise Precision Recall validates copied Ring2 retrieval rankings against after-the-fact metric labels. It computes precision/recall classes, rejects oracle-label leakage and tuning shortcuts, and keeps proof bodies, model-output data, benchmark claims, and theorem-correctness claims outside the public result record boundary.
Scope limit Copied Ring2 retrieval records and public fixture/exported-bundle result records only; no Lean/Lake execution, theorem proof, benchmark performance, external model access, provider-context label flow, launch-scope decision, publishing-scope decision, or whole-system correctness.
ring2_premise_retrieval_precision_recall_harness is the public Microcosm component for evaluating copied Ring-2 premise retrieval rankings against after-the-fact labels.
The component computes precision and recall per problem, then classifies the result as retrieval_hit, partial_retrieval_miss, retrieval_miss, or proof_failure_despite_hit. That distinction matters because a failed proof with all needed premises retrieved is a different failure than a missing premise retrieval path.
Purpose
When a proof search fails, it is easy to blame the prover and miss the simpler cause: the right supporting facts were never put in front of it. This component exists to keep those two cases apart. It answers one question: did the retrieval step actually surface the premises a problem needed, or did the failure happen somewhere downstream after the premises were already in hand?
It answers that by recomputing precision and recall from copied records rather than trusting a reported figure. For each problem it intersects the retrieved premise ids with the labelled needed-premise ids, then reads the proof outcome alongside that overlap. Full recall with a passing proof is a retrieval_hit; full recall with a non-passing proof is proof_failure_despite_hit, the case where retrieval did its job and the fault lies elsewhere. Partial overlap and zero overlap are graded as partial_retrieval_miss and retrieval_miss.
The unusual part is the direction the labels are allowed to flow. The needed premise ids are after-the-fact measurement labels, and the component treats them as strictly one-way: they may be used to score a finished run, but they may not be fed back into the retrieval ranking, used to tune on a test split, or carried into a provider-context recipe. Planting an oracle label inside a ranking, or tuning on test answers, is a typed refusal, not a higher score. The point is a metric that cannot quietly become the very advantage it is meant to measure, and that never inflates a retrieval result into a claim about formal-result correctness.
flowchart TD Bundle["source record core/paper_module_capsules.json[42]"] --> JSON["structured source record paper_modules/ring2_premise_precision_recall.json"] JSON --> Markdown["this page reader projection"] JSON --> Mermaid["diagram view available_from_capsule_edges"] JSON --> Atlas["map view organ_atlas.ring2_premise_retrieval_precision_recall_harness"] Fixture["fixture input fixtures/first_wave/.../input"] --> Runtime["runtime component ring2_premise_retrieval_precision_recall_harness.py"] Bundle["exported bundle examples/.../exported_ring2_precision_recall_bundle"] --> Runtime Runtime --> Metrics["precision/recall labels retrieval vs proof-failure attribution"] Runtime --> Result records["validation result records first_wave + runtime_shell"] Runtime --> Negatives["negative cases leakage, tuning, overclaim, missing decoy"] Result records --> Boundary["proof boundary metrics and copied artifacts only"]
Technical Mechanism
The runtime splits the proof consumer into three evidence classes before it reports any metric. _load_payloads reads the declared fixture or exported bundle inputs; _validate_run_material checks that copied Ring-2 run material carries source refs, target refs, validation refs, digests, and the expected copied_non_secret_macro_body_with_provenance status; and _validate_source_artifacts verifies the four copied source artifacts against either the source digest or the private-path rewrite digest. The result record therefore proves the presence and provenance of the copied public artifacts before the precision/recall scores can be interpreted.
The scoring core is _evaluate. It indexes after-the-fact labels by problem_id, applies the policy default_top_k or per-ranking top_k, truncates retrieved premise ids to that cutoff, intersects retrieved ids with labelled needed-premise ids, and computes precision_at_k = hits/top_k and recall_at_k = hits/needed. Aggregate precision and recall use total hit, candidate, and needed-premise counts, then compare the computed aggregate metrics with the policy's expected values. This is why the paper module can distinguish a retrieval miss from a proof failure after full premise recall without asserting anything about the downstream proof.
The failure taxonomy is mechanical rather than rhetorical. Full recall plus a passing proof is retrieval_hit; full recall plus a non-passing proof is proof_failure_despite_hit; partial overlap is partial_retrieval_miss; and zero overlap is retrieval_miss. The policy floor also requires expected failure modes and an adversarial decoy whose needed premise is absent or missed. Those gates make the metric harness test the shape of the evaluation set, not just the happy path.
The negative cases enforce the scope limit. EXPECTED_NEGATIVE_CASES requires oracle labels planted in rankings, proof-body leakage, test-split tuning, metric-overclaim, and missing-decoy inputs to produce typed refusal codes. The result record-writing path then exposes import ids, target refs, digest status, aggregate counts, failure-mode counts, and secret-scan status while keeping proof bodies, model-output data, and non-public paths outside the public result record. That implements the bundle's P-1/P-2/P-6/P-8/P-9 and AX-1/AX-2/AX-5/AX-7 posture: metrics are recomputed from copied artifacts, blocked states stay blocked, and no metric label becomes Lean, provider, benchmark, or launch-scope decision.
Reader Evidence Routing
Bundle authority: core/paper_module_capsules.json::paper_modules[42:paper_module.ring2_premise_precision_recall] names the component subject, mechanism subject, concept ref, principle refs, axiom refs, dependencies, runtime code locus, and projection statuses. Edit the source record, not this page, if those relationships change.
Generated structured source record: paper_modules/ring2_premise_precision_recall.json is the structured source record to inspect for source_authority: json_capsule, the 18 generated relationship edges, zero unresolved selective relations, Mermaid available_from_capsule_edges, and Atlas linked_from_capsule_edges.
Runtime locus: src/microcosm_core/organs/ring2_premise_retrieval_precision_recall_harness.py owns run, run_precision_recall_bundle, _build_result, _write_receipts, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING. It computes aggregate precision/recall, enforces copied source-artifact digests, writes result records, and carries the provider/proof/launch refusal flags.
Fixture and exported bundle: fixtures/first_wave/ring2_premise_retrieval_precision_recall_harness/input/ includes the public input records plus five negative cases; examples/ring2_premise_retrieval_precision_recall_harness/exported_ring2_precision_recall_bundle/ is the runtime-shell bundle. Both routes expose source artifacts under source_artifacts/ while result records carry import ids, target refs, and digest status rather than private proof bodies.
Result record and test surfaces: receipts/first_wave/ring2_premise_retrieval_precision_recall_harness/ring2_precision_recall_result.json, receipts/first_wave/ring2_premise_retrieval_precision_recall_harness/ring2_precision_recall_validation_receipt.json, result records/sign-off/first_wave/ring2_premise_retrieval_precision_recall_harness_fixture_acceptance.json, receipts/runtime_shell/demo_project/organs/ring2_premise_retrieval_precision_recall_harness/exported_ring2_precision_recall_bundle_validation_result.json, and tests/test_ring2_premise_retrieval_precision_recall_harness.py are the reader-verifiable validation result records for the local public boundary.
The fixture and exported bundle both carry exact copied source artifacts under source_artifacts/ for the Ring2 aggregate report, graph-variant run summary, graph comparison, and problem-source manifest. The validator treats those four digest-matched files as source_open_body_imports with body_in_receipt=false: workingness can count the real source result record bodies, while result records expose only import ids, target refs, and digest status.
proof_body_leakage rejects proof, provider, or private body fields.
test_split_tuning_attempt rejects retrieval tuned on test labels.
metric_overclaim rejects proof, benchmark, provider, launch, or publishing-scope decision claims.
missing_adversarial_decoy rejects a metric harness without a decoy miss case.
Prior Art Grounding
This component is grounded in information-retrieval evaluation. NIST's TREC evaluation measures provide the older precision/recall frame for judging retrieval systems, and scikit-learn's precision/recall metric API shows the common machine-learning interface for reporting those labels.
The theorem-proving side is adjacent to premise-selection and hammer workflows, such as Isabelle Sledgehammer, where finding the right facts is a distinct step from replaying a proof. Microcosm keeps that distinction explicit: precision/recall can say whether needed support was ranked, but it cannot become Lean correctness, benchmark performance, or provider-output authority.
Why It Matters
Premise retrieval should be measurable without becoming theorem authority. This component gives Microcosm a compact public harness for asking whether a retrieval path missed the needed support, hit the support but failed later, or hid a dangerous truth-side shortcut inside the public runtime.
Validation Result record Path
From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:
The expected projection row is paper_module.ring2_premise_precision_recall with 18 generated relationship edges, zero unresolved selective relations, Mermaid status available_from_capsule_edges, and Atlas status linked_from_capsule_edges. These checks validate copied retrieval records, metric labels, and bundle result records only; they do not become Lean/Lake, benchmark, provider, or theorem authority.
Scope boundary
Scope limit
This component does not run Lean or Lake, use external model services, emit proof bodies, tune retrieval on test answers, claim benchmark performance, prove formal-result correctness, or include launch operations. Its labels are metric labels only; they are not allowed to flow into provider context recipes.
Scope limit
This module supports only the reader-verifiable claim that copied public premise-retrieval records can be scored for precision/recall labels, adversarial decoys, body-floor imports, and metric overclaim refusals. It does not establish Lean correctness, benchmark performance, provider output quality, theorem truth, launch-scope decision, publishing-scope decision, or whole-system correctness.
Limitations
The harness is a local evidence-accounting check over copied artifacts. It does not execute Lean, Lake, Sledgehammer, or any external prover; it does not inspect proof bodies; and it does not decide whether a theorem is true. A retrieval_hit label means the needed-premise ids appeared in the ranking under this fixture policy, not that the downstream proof search is sound or complete.
The reported precision and recall are bounded by the declared Ring-2 fixture and exported bundle. Different corpora, retrieval cutoffs, premise labels, decoy construction, or source-artifact digests require rerunning the component and cannot be inferred from this page. The negative cases prove specific forbidden flows are rejected here; they do not exhaust all possible leakage, tuning, non-public-state, provider-output, or benchmark-gaming failures.
Source and projection details
Governing Lattice Relation
Ring-2 precision/recall sits between premise retrieval and proof diagnosis. The bundle explains the runtime component and the mechanism.ring2_premise_retrieval_precision_recall_harness.validates_public_premise_retrieval_attribution mechanism, which is grounded in the same component source and in concept.formal_math_and_proof_witness_bundle. That relation is deliberately proof-adjacent rather than proof-authoritative: it can show whether copied retrieval rankings hit the labelled needed premises, but it cannot promote a hit into a Lean proof, a benchmark claim, or a provider-context label.
The governing principles make the scoring path stricter than a label echo. P-1 requires recomputing precision and recall from copied rankings and labels; P-2 keeps the scope limit at metric-checker strength; P-3 concentrates authority in the small harness and focused tests; P-6 keeps missing source artifacts, negative cases, or digests blocked; P-8 turns leakage, tuning, and overclaim cases into typed refusals; and P-9 preserves provenance as records cross from source run artifacts into public fixture and bundle result records. The axiom layer matches that mechanism: AX-1 and AX-2 require derived checker evidence, AX-5 and AX-7 force blocked or refused states instead of inflated metrics, AX-6 keeps the labelled premise domain explicit, and AX-8 prevents metric labels from flowing into forbidden sinks.
Mathematical Strategy AtlasPre-oracle strategy hypotheses make the first proof-search move inspectable without claiming proof or provider authority.
Mathematical Strategy Atlas scores public problem features into explicit pre-oracle strategy hypotheses before retrieval or proof execution. It validates strategy ids, copied public source tool bodies, retrieval-term effects, oracle-label exclusion, negative cases, and metadata-only result records while keeping proof and provider authority out of scope.
Scope limit Copied public strategy metadata, public source tool bodies, and public fixture/exported-bundle result records only; no Lean/Lake execution, formal-result correctness, oracle-label visibility, external model access, benchmark performance, launch-scope decision, publishing-scope decision, or whole-system correctness.
mathematical_strategy_atlas_hypothesis_scorer is the public pre-oracle strategy layer for Microcosm formal-math work. It turns problem feature tags into an explicit strategy hypothesis before premise retrieval or proof execution, then records the result as redacted result records.
The point is not to prove anything. The point is to make the first mathematical move inspectable: an iff_goal shape selects iff_split, a recursive list shape selects recursive_data_induction, arithmetic normalization selects the arithmetic lens, and unmapped shapes become a typed STRATEGY_SELECTION_MISS instead of a hidden failure mode.
The current body-floor import carries eight copied source bodies: the prover graph benchmark harness, the provider result record reducer, their strategy-boundary regression tests, the compute-provider strategy classification standard, and three public runtime artifacts from PROVER_PROVIDER_CONTEXT_SWEEP_20260510_v0 (strategy_cards.json, strategy_hypothesis_set.json, and prover_skill_atlas.json). They live in source_artifacts/ under both the first-wave fixture input and the exported bundle; result records carry refs, counts, hashes, anchors, and verdicts instead of body text.
Purpose
A proof search has to start somewhere. Before any premise is retrieved or any tactic is run, an agent has already committed to a first move: a goal shape, a lens, a family of tactics it expects to use. That choice is usually implicit, buried inside a model call or a prompt. This component exists to pull it into the open. The single question it answers is: for a given problem shape, which strategy did the system pick first, and on what visible evidence?
The interesting part is what the answer is allowed to depend on. The scorer never sees the oracle's expected strategy, the ground-truth proof, or any provider output. It works only from public problem features and a strategy atlas of trigger features, negative triggers, and retrieval-expansion terms. The selected strategy is therefore a hypothesis, recomputed from inputs a cold reader can also read, not a result borrowed from the answer key.
That constraint is what the page guards. The common failure mode for a "strategy classifier" is to bake the answer in: declare the chosen strategy as a plain label, or score it on shallow feature overlap that happens to line up with the known-good label. The component rejects both. A declared selection must match the score the scorer recomputes from evidence, and a strategy chosen on overlap alone is a typed negative case rather than a pass.
Shape
The local component standard, when changing runtime behavior or the claim envelope, is standards/std_microcosm_mathematical_strategy_atlas_hypothesis_scorer.json; the general paper-module contract remains standards/std_microcosm_paper_module.json.
The diagram below traces the scorer's runtime flow inside that projection: how public inputs become a per-candidate score, how a selection or a typed miss is chosen, and how the result is recomputed and written as metadata-only result records under the scope limit.
Source refs
trigger / negative / retrieval terms
strategy_atlas.json
feature tags, oracle hidden
problem_features.json
candidate strategy ids
hypothesis_cases.json
Diagram source
flowchart TD subgraph Inputs["Public inputs"] atlas["strategy_atlas.json trigger / negative / retrieval terms"] features["problem_features.json feature tags, oracle hidden"] cases["hypothesis_cases.json candidate strategy ids"] end subgraph Scoring["Per-candidate scoring"] score["score = trigger_hits x4 - negative_hits x3 + retrieval_bonus (cap 2)"] rank["rank positive scores tie-break by order, then id"] end select{"any positive score?"} selected["selected_strategy_id + score components"] miss["STRATEGY_SELECTION_MISS (unknown)"] recheck["recompute vs declared selection / score / ranking"] result records["metadata-only result records refs, counts, hits, verdicts"] ceiling["Scope limit no Lean/Lake, oracle labels, external model access, or launch"] atlas --> score features --> score cases --> score score --> rank rank --> select select -- yes --> selected select -- no --> miss selected --> recheck miss --> recheck recheck --> result records result records --> ceiling
The generated instance currently exposes 19 concrete relationships.edges: two subject edges for the component and mechanism, one governing concept edge, six principle edges, six axiom edges, three sibling paper-module dependency edges, and one resolved code-locus edge into src/microcosm_core/organs/mathematical_strategy_atlas_hypothesis_scorer.py. relationships.unpopulated_selective_relations is empty, so the module-level unresolved selective-relation count available from this instance is 0.
Runtime evidence enters through the fixture input fixtures/first_wave/mathematical_strategy_atlas_hypothesis_scorer/input, the exported bundle examples/mathematical_strategy_atlas_hypothesis_scorer/exported_mathematical_strategy_atlas_bundle, and their copied source_artifacts/ / source_module_manifest.json bundles. The focused test file is tests/test_mathematical_strategy_atlas_hypothesis_scorer.py; result records include receipts/first_wave/mathematical_strategy_atlas_hypothesis_scorer/mathematical_strategy_atlas_result.json, mathematical_strategy_atlas_board.json, mathematical_strategy_atlas_validation_receipt.json, result records/sign-off/first_wave/mathematical_strategy_atlas_hypothesis_scorer_fixture_acceptance.json, and runtime-shell exported-bundle validation result records.
The honest ceiling is narrow by design: this module can say that public pre-oracle strategy hypotheses, retrieval-lens metadata, copied public source tool/standard/runtime bodies, source-artifact digests, and negative cases are inspectable. It cannot say that Lean or Lake ran, that a theorem was proved, that oracle labels or model-output data are visible, that benchmark performance is certified, that public sharing is approved, that launch is approved, or that the private root has been made public-safe.
How it works
The scorer reads three public inputs: a strategy atlas, a set of problem features, and a set of hypothesis cases. For each candidate strategy in a case it computes a single integer score from three terms. Each problem feature that matches a strategy's trigger_features adds four points. Each feature that matches the strategy's negative_triggers subtracts three. Retrieval-query terms that appear in the strategy's expansion terms add one point each, capped at two. Plain feature overlap is recorded as a diagnostic count but is deliberately kept out of the score.
Selection is then a deterministic sort. Only strategies with a positive score are eligible. Among those, the scorer ranks by score (highest first), breaking ties by the strategy's declared order and then its id, and takes the top row. If no candidate scores positive, the case resolves to the typed STRATEGY_SELECTION_MISS rather than guessing. The output for each case carries the selected id, the score, the component breakdown, the ranked candidate scores, and the trigger, negative, and retrieval hits that produced them, so the choice can be re-derived by hand.
The weights matter because they encode the design intent. Trigger matches are worth more than retrieval matches, so a strategy is chosen mainly for the shape it claims to handle, not for how many search terms happen to coincide. Negative triggers can veto a strategy that looks superficially apt. The retrieval cap stops a strategy from winning on keyword volume alone. A fixture that tries to score on overlap without these terms is caught by the superficial_overlap_only_scoring negative case.
The same recomputation is what enforces honesty. When a case declares its own selected_strategy_id, score, classifier, retrieval_bonus, or candidate_scores, the component recomputes each from the evidence and reports a stale-declaration finding on any mismatch. Declaring the selected strategy as a bare label, with nothing for the scorer to check against, is itself rejected: a label with no derivable evidence is not strategy evidence. Alongside this, the copied source artifacts are checked for leakage policy, so the strategy cards, hypothesis set, and skill atlas stay pre-oracle, free of proof bodies, and free of oracle strategy ids.
Reader Evidence Routing
Read this module as a pre-oracle strategy-hypothesis audit, not as a proof result. The primary reader path is:
Start with strategy_atlas.json, problem_features.json, and hypothesis_cases.json to see how public feature tags select a strategy id before retrieval or proof execution.
Check source_module_manifest.json and the copied source_artifacts/ bodies to verify that the imported source bodies are public tool/runtime bodies with exact digests, required anchors, and body-floor result records.
Inspect the fixture and exported-bundle result records to confirm that strategy ids, retrieval-term effects, oracle-label exclusion, source-card consistency, and negative cases are checked without exposing proof bodies or model-output data.
Use the structured source record only for structural lattice proof: it confirms bundle-backed subjects, code loci, doctrine refs, and dependency edges; it does not establish the scorer's correctness or any theorem.
Public Inputs
strategy_atlas.json defines the known strategy enum, match features, and retrieval-term additions.
problem_features.json carries synthetic public problem features with oracle labels hidden.
source_module_manifest.json binds copied source body files to exact source refs, SHA-256 digests, byte counts, line counts, material classes, and required anchors.
The strategy atlas is grounded in the formal-methods practice of separating problem-shape classification from proof execution. Lean's tactic model, as introduced in Theorem Proving in Lean 4, gives the immediate precedent: proof work is often arrange around tactics chosen for a goal shape, while the kernel checks the final proof state. The mathlib overview also motivates explicit retrieval terms and domain tags because a large formal library is navigated by topic, structure, and reusable theorem families.
The atlas is also adjacent to hammer-style premise and method selection, such as Isabelle Sledgehammer, where a front-end tool searches for useful facts or proof methods before replay. This module keeps the pattern pre-oracle and metadata-only: it records why a first strategy hypothesis was selected, not whether the proof can be completed.
A green result record proves only pre-oracle strategy-hypothesis metadata, copied public source tool bodies, source artifact digests, and negative-case enforcement; it does not run Lean or Lake, prove formal-result correctness, reveal oracle labels, export proof bodies, use external model services, certify benchmark performance, authorize public sharing, or include launch operations.
Scope boundary
Scope limit
The atlas is metadata and strategy-hypothesis machinery only. It does not run Lean or Lake, claim formal-result correctness, reveal oracle strategy labels, expose proof bodies, use external model services, tune on test answers, include launch operations, or make Mathlib-dependent proof claims. The copied runtime artifacts are public strategy traces, not oracle labels, model-output data, or proof bodies.
Scope limit
This module supports only the reader-verifiable claim that public strategy-hypothesis metadata, copied source tool bodies, source artifact digests, and negative cases can be checked before oracle labels or proof execution. It does not run Lean or Lake, prove formal-result correctness, reveal oracle labels, expose proof bodies, use external model services, certify benchmark performance, authorize public sharing, include launch operations, or make Mathlib-dependent proof claims.
Verifier Lab Execution SpineBounded public verifier execution result records witness command execution without upgrading output into theorem authority.
Verifier Lab Execution Spine records bounded public Lean transition execution evidence: command intent, tool facts, return codes, result record refs, omitted dangerous payload fields, negative cases, and source-open body imports. It separates real execution evidence from formal-result correctness, provider text, oracle answers, proof-body exposure, source-file changes, and launch-scope decision.
Scope limit Bounded public fixture and exported-bundle execution result records only; no general formal-result correctness, benchmark solve-rate claim, external model access, oracle-to-proof authority, proof body export, source-file changes, launch-scope decision, publishing-scope decision, or whole-system correctness.
verifier_lab_execution_spine is the public execution witness for the verifier lab lane. It is narrower than verifier_lab_kernel: it actually runs bounded Lean transition candidates in a throwaway Lake project, records the return code of each run, and keeps every line of generated proof text and tool output out of the result record. A reader can then separate real execution evidence from overstated proof claims.
The component consumes a public execution packet with:
transition candidates, each naming a problem id, a target shape, and one action class from a fixed vocabulary (rfl, decide, cases, induction, exact_premise, and similar);
a small Lake project whose MicrocosmProofWitness library the component builds once and reuses;
CP2 translation requests that ask for the next typed action after a residual, and Evolve mutations that adjust bounded policy artifacts;
negative fixtures that smuggle a proof body, an oracle structured source record, a provider hypothesis, or an unbounded source-file changes into a row.
The component writes one .lean file per transition, runs lake env lean on it, and treats a zero exit code as accepted. It records the return code, the action class, and the failure class, but never the proof text, the stdout body, or the stderr body. The exported-bundle lane re-validates the same shape from a copied source-module manifest without re-running Lean, so a third party can inspect the bundle without a Lean toolchain installed.
Purpose
Automated proof systems can blur how a result was obtained. A model can be handed the answer by an oracle, or prompted with the proof by a provider, and still report the result as if it had found the proof unaided. This component exists to keep that blurring out of the result record. It answers one question: did a bounded Lean candidate actually pass the verifier, with no help that the result record is hiding?
The discipline that makes this work is the separation of authority classes. Every row lands in exactly one bucket: lean_verified for candidates the verifier accepted, oracle_compared and provider_suggested for rows that existed only as references, cp2_translated for the typed next-action layer, retrieval_miss and proof_synthesis_fail for residuals, and contract_rejected for anything that broke the leak rules. The unusual choice is what does not happen: an oracle match never increments forward success, and provider text is never counted as a proof. The counters oracle_forward_success_increment_count and provider_results_counted are held at zero by construction.
The second idea is that real execution and clean result records are not in tension. A candidate carrying oracle_visible: true, or a forbidden field such as proof_body or raw_tactic_script, is rejected before Lean is ever invoked, so the run cannot be contaminated. The transition then runs for real, and the result record carries the return code and the failure class while the proof text and the stdout and stderr bodies stay out. The result record is public evidence precisely because the only things omitted are the things that would leak.
Shape
Diagram source
flowchart TD Packet["Execution packet transition candidates, CP2 requests, Evolve mutations, oracle/provider refs"] Gate["Leak contract gate forbidden fields? oracle/provider visible? action class out of vocabulary?"] Rejected["contract_rejected rejected before Lean runs"] Build["Build Lake project lake build MicrocosmProofWitness (once, cached)"] Run["Run candidate write .lean, lake env lean, return code = accepted?"] Verified["lean_verified return code 0"] Residual["retrieval_miss / proof_synthesis_fail non-zero return code"] CP2["cp2_translated typed next action, no proof body"] Evolve["evolve_candidate / evolve_accepted bounded policy artifacts only"] Refs["oracle_compared / provider_suggested references, never counted as success"] Counters["Authority counters oracle_forward_success = 0, provider_results = 0, proof_body_export = 0"] Result records["metadata-only result records result, board, validation, sign-off; return codes kept, bodies omitted"] Ceiling["Scope limit bounded public transition result record only"] Packet --> Gate Gate -->|leak found| Rejected Gate -->|clean| Build Build --> Run Run -->|exit 0| Verified Run -->|non-zero| Residual Packet --> CP2 Packet --> Evolve Packet --> Refs Verified --> Counters Residual --> Counters CP2 --> Counters Evolve --> Counters Refs --> Counters Rejected --> Result records Counters --> Result records Result records --> Ceiling
Evidence/accounting used for this shape:
core/paper_module_capsules.json::paper_modules[44:paper_module.verifier_lab_execution_spine] is the source bundle with source_authority: json_capsule, subjects for component: verifier_lab_execution_spine and mechanism.verifier_lab_execution_spine.validates_public_verifier_transition_witness, resolved code_loci.path: src/microcosm_core/organs/verifier_lab_execution_spine.py, and generated projection statuses available_from_capsule_edges / linked_from_capsule_edges.
paper_modules/verifier_lab_execution_spine.json::paper_module_payload.source_row carries the generated copy of that source record; relationships.edges has 19 entries and relationships.unpopulated_selective_relations is empty. This is readback evidence only, not an editable source.
core/organ_atlas.json::organs[18] classifies the component as evidence_class: external_subprocess_witness, names the first command, resolves the mechanism edge, and restates that the scope limit is bounded public Lean transition rows only.
src/microcosm_core/organs/verifier_lab_execution_spine.py defines the runtime spine: EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, RECEIPT_TRANSPARENCY_CONTRACT, ANTI_CLAIM, validate_source_module_imports, _build_lake_project, _build_result, write_receipts, run, and run_execution_bundle.
core/fixture_manifests/verifier_lab_execution_spine.fixture_manifest.json names the fixture inputs, four expected negative cases, stable error codes, generated result record paths, result record field floor, and body_copied_material_count: 5 for the exported body-floor lane.
examples/verifier_lab_execution_spine/exported_verifier_lab_execution_spine_bundle/source_module_manifest.json records module_count: 5, body_in_receipt: false, exact-copy digest matches, validation refs, and blocked private/external model service payload bodies.
result records/sign-off/first_wave/verifier_lab_execution_spine_fixture_acceptance.json records status: pass, accepted_scope: bounded_public_lean_transition_execution_only, accepted_transition_count: 4, residual_transition_count: 2, zero provider/oracle/proof-body/source-file changes counters, the four observed negative cases, and release_authorized: false.
tests/test_verifier_lab_execution_spine.py checks fixture execution, exported-bundle structure, source-module digest blocking, metadata-only result record transparency, and exact public body-floor manifest behavior.
Reader Evidence Routing
A cold-reader audit starts with the module definition and structured source record proof, then moves to the fixture and exported bundle.
Evidence should be read in this order:
Bundle proof: core/paper_module_capsules.json::paper_module.verifier_lab_execution_spine and paper_modules/verifier_lab_execution_spine.json.
Execution proof: declared command intent, fixture input ref, tool version facts, stdout/stderr classification, validator result record refs, and sign-off result record refs.
Bundle proof: exported execution-bundle run and the same command/tool/result record membrane in disposable outputs.
Negative boundary proof: missing command intent, missing tool facts, missing result record refs, stale execution facts, proof-authority overclaiming, proof-body export, model-output data export, benchmark solve-rate certification, hosted deployment, and launch-scope decision.
Prior Art Grounding
This component is grounded in reproducible execution and proof-assistant witness patterns. Lean/Lake execution inherits from the small-kernel proof-assistant tradition represented by the Lean theorem prover and by LCF/HOL systems such as HOL Light. Artifact evaluation practice also motivates recording command identity, tool facts, stdout/stderr classification, and result record refs separately from the claim they support.
Microcosm borrows the execution-spine discipline: a command can witness that a bounded tool run happened, but tool output must not become theorem-certification or benchmark authority. It does not expose proof bodies or certify solve rates.
Validation Result record Path
Run from microcosm-substrate:
A green result record proves only bounded execution-spine evidence: command intent, tool facts, stdout/stderr classification, result record refs, and explicit missing-fact failures. It does not establish general proof certification, proof-body safety beyond the fixture membrane, benchmark solve rate, hosted deployment, or launch.
Scope boundary
Scope limit
This paper module can claim the following for the verifier lab execution spine: the component subject resolves, the runtime source locus is named, a diagram view is generated for this module, and an atlas card is generated for this module. It cannot claim general proof certification, Mathlib-dependent proof authority, proof-body safety beyond the fixture membrane, benchmark solve-rate certification, provider authority, source-file changes, hosted deployment, launch-scope decision, publishing-scope decision, or whole-system correctness.
Fixture result records, exported execution-bundle result records, focused tests, command intent, tool-version facts, stdout/stderr classification, result record refs, and missing-fact failures can support only bounded execution-spine evidence. The diagram and atlas views are navigation aids derived from the module definition; they do not promote a tool run into proof certification, benchmark authority, or launch-scope decision.
Scope limit
This paper module describes public execution-spine result records only. It does not establish general proof certification, authorize Mathlib-dependent proof authority, expose private proof bodies, certify benchmark solve rates, use external model services, change source files, include launch operations, or authorize hosted deployment.
Bounded Autonomy Campaign PacketBounded autonomy campaign packets propose guarded agent work without authorizing source-file changes or unsupervised repair.
Bounded Autonomy Campaign Packet validates public campaign packet fixtures: proposed gaps, policy gates, repeated-failure digests, negative cases, source-open body imports, and scope limits. It keeps campaign proposals separate from self-repair, source writes, live scheduling, external model access, launch-scope decision, public sharing, and whole-system correctness.
Scope limit Self-proposal campaign packet fixture and exported-bundle result record evidence only; no self-repair authority, unsupervised source-file changes, live scheduler authority, external model access, launch-scope decision, publishing-scope decision, or whole-system correctness.
bounded_autonomy_campaign_packet is a Crown Jewel import component with real runnable system and a strict public scope limit. It consumes synthetic public fixtures, copied source source bodies, and source manifests that verify sha256 digests, line counts, required anchors, secret-exclusion status, and result record body omission.
What it proves: self-proposal campaign packet only; no self-repair or unsupervised source-file changes.
Purpose
An agent can usefully notice its own coverage gaps and draft a plan to close them. The danger is that "draft a plan" quietly becomes "do the work": a proposal grows a write surface, and a system that was meant to suggest starts mutating its own source unsupervised. This component exists to keep those two steps apart. It answers one question: can an agent emit a draft campaign proposal from real coverage gaps without that proposal carrying any authority to act on them?
The design choice that makes this interesting is where the candidate count comes from. The component does not invent a plausible-looking list of work. It runs a real source campaign builder in read-only mode (build_standard_skill_pairing_campaign.py --check --report) and accepts its witness only when the builder reports candidate targets and leaves wrote_packet unset. The proposal is therefore derived from a surface that could do real work, observed in a mode where it did not. Each drafted candidate is then stamped write_surface: none, source_mutation_authorized: false, and requires_human_review: true, so the act of proposing can never be mistaken for the act of authorising.
Two refusals guard the boundary. A campaign policy that lists write_source among its allowed actions is rejected outright, before any candidate is drafted. And a campaign digest that already appears in the failed-campaign ledger more than once is refused, so a plan that has already failed cannot be quietly re-proposed under a fresh wrapper. Both refusals are checked by mutating the fixture and confirming the expected error code fires, not by trusting a declared label.
Shape
Source refs
Read-only builder witness check --report
build_standard_skill_pairing_campaign.py
Diagram source
flowchart TD Inputs["Public synthetic inputs coverage_gaps, campaign_policy, failed_campaign_digests"] PolicyGate{"campaign_policy allows write_source?"} Witness["Read-only builder witness build_standard_skill_pairing_campaign.py --check --report"] WitnessGate{"reports candidate targets and wrote_packet unset?"} Draft["Draft candidate packet write_surface: none, requires_human_review, source_mutation: false"] DigestGate{"failed digest repeated?"} Refuse["Refuse SOURCE_WRITE_FORBIDDEN / REPEATED_FAILED_DIGEST / witness blocked"] Result records["metadata-only result records refs, digests, stdout/stderr hashes; builder output bodies excluded"] Ceiling["Scope limit no self-repair, source-file changes, providers, launch, or public sharing"] Inputs --> PolicyGate PolicyGate -- "yes" --> Refuse PolicyGate -- "no" --> Witness Witness --> WitnessGate WitnessGate -- "no" --> Refuse WitnessGate -- "yes" --> Draft Draft --> DigestGate DigestGate -- "yes" --> Refuse DigestGate -- "no" --> Result records Refuse --> Result records Result records --> Ceiling
This diagram is a reader aid. The machine graph remains the generated paper_module.bounded_autonomy_campaign_packet.mermaid projection derived from the JSON source record.
Technical Mechanism
The runtime is intentionally narrower than "autonomous repair." SPEC declares the four required public inputs, the source-module manifest, the expected negative cases, and an AUTHORITY_CEILING in which self-repair, unsupervised source-file changes, source-write packets, external model access, and launch are all false. run() and run_bounded_autonomy_bundle() then route both the fixture and exported bundle through run_crown_jewel_organ, so the same evaluator, source-manifest checks, metadata-only result record policy, and semantic negative-case evaluator guard both command surfaces.
The positive lane is witnessed by _campaign_builder_witness(), not by a fictional campaign row. It invokes tools/meta/factory/build_standard_skill_pairing_campaign.py --check --report --max-targets <n> from the source root, then accepts the witness only when the builder returns standard_skill_pairing_campaign_summary, reports at least one candidate target, emits a source_digest, and leaves wrote_packet unset. This makes the campaign packet a read-only proposal derived from a real builder surface; the result record stores return code, digest fields, and stdout/stderr hashes, but keeps builder output bodies out of the result record.
_candidate_packet_subprocess() converts the witnessed target count into draft candidate rows. Each candidate is tied to one fixture coverage gap when available, carries the builder ref and builder source digest, sets write_surface: none, requires human review, and records source_mutation_authorized: false. evaluate() then applies the policy checks: write_source in campaign_policy.allowed_actions is a hard refusal; blocked builder witness or empty candidate packet is a hard refusal; any candidate that authorizes source-file changes or writes to the source surface is also refused.
The negative cases are semantic mutations of the input, not trusted labels. evaluate_negative_case() copies the required inputs into a temporary directory and mutates the relevant file: source_write_campaign_packet appends write_source to campaign_policy.allowed_actions, while repeated_failed_campaign_digest rewrites the failed-digest ledger to contain a duplicate digest. The component passes its own evidence floor only when these mutations produce BOUNDED_AUTONOMY_SOURCE_WRITE_FORBIDDEN and BOUNDED_AUTONOMY_REPEATED_FAILED_DIGEST; stale declared error-code labels cannot satisfy the proof consumer.
Reader Evidence Routing
The primary evidence for this module is the fixture result record and the exported-bundle result record, which demonstrate the bounded campaign packet behavior under synthetic public inputs. Source-module manifests and digest checks are evidence for copied body provenance. This page is an explanation of those sources; the underlying JSON and test outputs are the authority.
Prior Art Grounding
This component borrows from AI risk-management, policy gating, and controlled workflow-automation patterns. Useful anchors include:
NIST's AI Risk Management Framework, which frames AI work in terms of governance, mapping, measuring, and managing risk rather than assuming autonomy is inherently authorized.
Open Policy Agent, as a policy-engine pattern for deciding whether a proposed action may proceed.
GitHub Actions workflow syntax, as a widely used automation surface where jobs, permissions, and concurrency behavior are declared before execution.
Microcosm borrows the governed-campaign and preflight-gate shape, but keeps the component to draft self-proposal packets over synthetic public coverage gaps. It does not self-repair, change source files unsupervised, use external model services, or include launch operations.
How to run it:
microcosm bounded-autonomy-campaign-packet run --input fixtures/first_wave/bounded_autonomy_campaign_packet/input --out receipts/first_wave/bounded_autonomy_campaign_packet
If the fixture or bundle reports source-module digest drift, route that through microcosm_exact_copy_refresh; this page is source-linked only for copied source bodies. If the full projection check fails because another active session holds shared lattice outputs, treat that as unrelated contention and use the corpus check as the local gate for this module.
Negative cases covered by the fixture manifest: repeated_failed_campaign_digest, source_write_campaign_packet.
Source provenance is anchored by examples/bounded_autonomy_campaign_packet/exported_bounded_autonomy_campaign_packet_bundle/source_module_manifest.json and result records carry refs, digests, counts, verdicts, and scope boundaries only.
Scope boundary
Scope limit
This component emits a draft self-proposal from public synthetic coverage gaps and refuses source-write or repeated-failure packets. It does not self-repair, change source files unsupervised, use external model services, include launch operations or public sharing, or widen the proof boundary beyond the copied source bodies, synthetic fixtures, source manifests, negative cases, and validation result records.
Scope limit
This paper module demonstrates a bounded-autonomy fixture that builds a draft campaign packet and refuses unsafe packets under public synthetic inputs. A diagram view and atlas card are generated for this module.
It cannot claim autonomous repair, unsupervised source-file changes, external model access, launch-scope decision, publishing-scope decision, production campaign safety, private-system equivalence, or whole-system correctness.
Computer-Use Action Trace ReplayValidator-backed public replay for synthetic computer-use action traces under the route-observability runtime.
Computer-Use Action Trace Replay validates a synthetic computer-use episode through visible observations, affordances, action rows, authority verdicts, state-transition and recovery result records, cold replay rows, public trace spans, source-module manifests, negative cases, and metadata-only result records. It is a reader-facing contract under Agent Route Observability Runtime, not live browser or desktop control.
Scope limit Synthetic public computer-use action-trace fixtures, exported bundle metadata, copied source-module digests, and metadata-only result records only; no live account action, account secret entry, external network mutation, purchase/send authority, destructive host action, hidden screen-state claim, benchmark-score claim, launch-scope decision, or whole-system correctness.
computer_use_action_trace_replay is a validator-backed claim contract under agent_route_observability_runtime. It asks a narrow eval-harness question: does a claimed computer-use episode bind visible observations, affordances, actions, pre-action authority verdicts, state-transition result records, recovery result records, cold replay, falsification fixtures, non-public-state scan posture, and an explicit scope limit?
The fixture rejects live account action, account secret entry, external network mutation, purchase/send without approval, destructive action without review, hidden screen-state claims, actions without observation and affordance refs, and benchmark-score claims.
Purpose
A computer-use agent produces a stream of screenshots, clicks, keystrokes, and "it worked" assertions. The hard question for anyone reviewing such a trace is not whether the agent moved the mouse, but whether the record actually supports the claim that something happened safely. A trace can look complete while hiding the two failures that matter most: an action that was blocked or sent for review but is later narrated as a success, and a success that is asserted without any state evidence to back it. This module exists to make that question decidable on a synthetic episode, offline, before any of the language reaches a reader.
The single question it answers is: does each recorded action line up, row by row, with a prior visible observation, a pre-action authority verdict, and a state-transition result record whose outcome agrees with that verdict? The mechanism is a typed join, not a screenshot replay. An action must cite the observation it reacted to and an affordance that was visible in it; a verdict must be stamped before the action and must explicitly deny live-account, account secret, network, destructive, and purchase or send authority; a transition result record must then match the verdict. If the verdict said allow, the result record has to show the action was executed and an oracle confirmed the resulting state. If the verdict said block or review, the result record has to show the action was not executed and the status reads blocked or review-required. Nondeterministic "it probably succeeded" claims are refused outright.
What is genuinely unusual here is the inversion. Most action-trace tooling treats a screenshot as the proof. This module treats the screenshot as the one thing it will not trust: observations enter only as a digest and a visible-state hash, with raw pixels, hidden-state assertions, and live-browser state all required to be absent. The evidence that carries weight is the agreement between the verdict and the transition, not the image. The result record that comes out the other end records counts, refs, hashes, and the redaction posture, and never the raw bodies it checked. It describes a synthetic episode under the route-observability runtime; it does not drive a live browser or desktop.
Shape
Source refs
Component
agent_route_observability_runtime runtime
Diagram source
flowchart TD bundle["JSON source record"] bundle --> mermaid["generated Mermaid available"] bundle --> atlas["generated Atlas linked"] bundle --> component["agent_route_observability_runtime runtime"] component --> bundle["exported computer-use bundle"] bundle --> observations["visible observations: digest + visible-state hash, no raw pixels"] observations --> actions["action rows: cite observation + affordance, allowed kind, redacted"] actions --> verdicts["pre-action authority verdict per action"] verdicts -->|allow| executed["transition: executed + oracle status pass"] verdicts -->|block or review| held["transition: not executed + blocked / review-required"] held --> recovery["recovery result record, no upgrade to executed"] executed --> cold["cold replay reproduces action, verdict, transition"] recovery --> cold cold --> trace["public trace spans: refs, counts, hashes, redaction posture"] trace --> result record["metadata-only validation result record"] result record --> ceiling["scope limit: no live control"]
The shape is a reader route over a synthetic computer-use action trace validator. The evidence path runs through the source record, fixture manifest, exported bundle, runtime validator, public trace builder, metadata-only result records, and explicit scope limit. A diagram view and Atlas entry are generated for this module from the source record.
Technical Mechanism
The runtime entry point is run_computer_use_action_trace_bundle in src/microcosm_core/organs/agent_route_observability_runtime.py. It first loads the bundle through the strict JSON path and decides whether the input is the full fixture with negative cases or the public exported bundle. It then checks the projection protocol, interaction policy, task episodes, screen observations, action trace, authority verdicts, state transitions, recovery result records, cold replay rows, source-module manifest, non-public-state scan, and public trace spans before writing a result record. The status is pass only when positive findings are empty, required negative cases are observed for the fixture path, the non-public-state scan passes, and copied public source-module digests verify.
The mechanism is a typed join, not a screenshot replay. Actions must cite prior observation and affordance refs. Authority verdicts must cite action ids before state transitions can be credited. Cold replay rows must cover the action ids and reproduce the action, verdict, and transition relation. Recovery result records cover blocked or review-required actions without upgrading them into executed mutations. The public trace builder then emits bounded spans over refs, counts, hashes, and redaction posture, while the result record deliberately omits raw screen bodies, account secrets, hidden screen state, model-output data, private source bodies, absolute local paths, and benchmark-score claims.
Named Proof Consumers
validate-computer-use-bundle is the reader command. On the exported bundle, it should produce exported_computer_use_action_trace_bundle_validation_result.json with four episodes, six observations, eight actions, eight authority verdicts, eight state-transition result records, one recovery result record, four cold replay rows, eight public trace spans, copied source-module digest verification, and an explicit no-live-control scope limit.
tests/test_agent_route_observability_runtime.py::test_computer_use_action_trace_replay_observes_negative_cases is the negative fixture consumer. It checks that live account action, account secret entry, external network mutation, unapproved purchase/send, destructive file action, hidden screen-state claims, action-without-observation rows, and benchmark-score claims are rejected.
tests/test_agent_route_observability_runtime.py::test_computer_use_action_trace_receipt_is_public_relative_and_redacted is the result record-safety consumer. It verifies public-relative paths and absence of account secret values, hidden screen state, absolute paths, and raw bodies.
tests/test_agent_route_observability_runtime.py::test_computer_use_action_trace_exported_bundle_validates_runtime_shape is the public-bundle consumer. It checks the exported-bundle shape, action kinds, source-module digest posture, public trace coverage, and no benchmark authority.
tests/test_agent_route_observability_runtime.py::test_computer_use_trace_loader_rejects_duplicate_json_keys is the parser-integrity consumer. It prevents a replay bundle from passing by hiding conflicting values behind duplicate JSON keys.
Reader Evidence Routing
Bundle route: core/paper_module_capsules.json::paper_modules[46:paper_module.computer_use_action_trace_replay] is the source-authority row for this module. A diagram view and Atlas entry are generated from that source record.
Dependency route: downstream modules may reference paper_module.computer_use_action_trace_replay, but this page's source authority is the source record named above, not those downstream dependencies.
Fixture-manifest route: core/fixture_manifests/agent_route_observability_runtime.fixture_manifest.json::computer_use_action_trace_replay_contract_v1 names the positive inputs, negative-case floor, expected result record fields, runtime-example command, and scope limit.
Runtime route: src/microcosm_core/organs/agent_route_observability_runtime.py::run_computer_use_action_trace_bundle loads the bundle, validates projection protocol, interaction policy, episodes, observations, actions, authority verdicts, state transitions, recovery result records, cold replay, source-module manifest, negative cases, and public trace spans.
Source-module route: source_module_manifest.json records copied public source bodies for codex/standards/std_agent_execution_trace.json, system/lib/agent_execution_trace.py, and system/lib/strict_json.py, with body_in_receipt: false.
Focused-test route: tests/test_agent_route_observability_runtime.py validates negative cases, public-relative redacted result records, exported-bundle runtime shape, public trace span coverage, source-faithful public refactor status, source digest matching, and duplicate-key rejection.
Prior Art Grounding
This component is grounded in web and desktop agent benchmarks that make action trajectories inspectable. WebArena and Mind2Web anchor realistic web-task evaluation, while OSWorld extends the concern to multimodal agents acting in real computer environments. Browser automation standards such as WebDriver are also prior art for representing actions against visible browser state through a controlled protocol.
Microcosm borrows the action-trace accounting pattern: observations, affordances, actions, pre-action authority verdicts, transition result records, recovery result records, cold replay, and falsification cases must line up before a computer-use episode is credited. It does not operate a live browser or desktop.
The result record proves only this public synthetic replay boundary. It does not control a live browser or desktop, use accounts, enter account secrets, mutate external systems, export raw screenshots, claim benchmark performance, change source files, use external model services, or include launch operations.
Validation Result record Path
Reader-verifiable bundle command, run from microcosm-substrate/:
The command writes the computer-use replay result record under receipts/runtime_shell/demo_project/organs/agent_route_observability_runtime/, including computer_use_action_trace_replay_result.json and the exported bundle validation result. The tracked fixture result record records the synthetic observations, affordances, authority verdicts, transition result records, recovery result records, falsification cases, non-public-state scan posture, and scope limit.
This result record path is reader-verifiable evidence only. It does not flip Mermaid/Atlas status, create bundle authority, operate a live browser or desktop, use accounts, enter account secrets, mutate external systems, claim benchmark performance, or aggregate doctrine-lattice coverage.
Scope boundary
Scope limit
This module may claim synthetic computer-use action-trace replay over public fixtures: visible observations, affordances, action rows, pre-action authority verdicts, state-transition result records, recovery result records, cold replay rows, public trace spans, source-module digest checks, expected negative cases, and metadata-only result records.
It does not claim live browser or desktop control, account automation, account secret entry, purchase/send authority, external network mutation, destructive host action, hidden screen-state truth, benchmark performance, provider behavior, source-file changes, launch-scope decision, or whole-system correctness. The diagram view and Atlas entry generated for this module are navigation surfaces; they are not additional proof authority.
Source and projection details
Governing Lattice Relation
The source record binds this module to the accepted agent_route_observability_runtime component and to mechanism.agent_route_observability_runtime.validates_public_route_feedback. That places the page under AX-1 and the P-1 / P-2 claim discipline: a computer-use claim is admissible only when the runtime recomputes it from lower level evidence, and the public sentence cannot exceed what the named validator actually checks. The generated JSON instance records nine resolved edges: component, mechanism, concept, axiom, principle, dependency, and code-locus links.
The relevant concept is concept.agent_reliability_and_safety_validator_bundle, not a generic browser agent benchmark. It frames the replay as an evidence bundle: visible observations and affordances are the basis, action rows are candidate transitions, pre-action authority verdicts decide whether a transition may be executed or blocked, and result record rows carry the bounded public result. The dependencies on agent_route_observability_runtime and macro_projection_import_protocol keep the proof below the source-open import and result record lanes instead of treating this Markdown page as source authority.
Concurrency Mission ControlConcurrency Mission Control validates metadata-only coordination result records without becoming a live scheduler or production concurrency proof.
Concurrency Mission Control validates the public concurrency mission-control membrane: copied source-builder digests, bridge artifacts, failure classes, work log seed-speed topology, heartbeat and claim-collision anchors, negative cases, source-open body imports, and metadata-only result records. It separates coordination evidence from hosted orchestration, external model access, live scheduling, private-system equivalence, source authority, launch, public sharing, and production concurrency guarantees.
Scope limit Verified concurrency mission-control fixture and source-module import evidence only; no live scheduler, external model access, hosted orchestration, production concurrency-safety proof, source authority, private-system equivalence, launch-scope decision, publishing-scope decision, or whole-system correctness.
concurrency_mission_control imports the real self-indexing-cognitive-system/src/idea_microcosm/concurrency_mission_control_specimen.py source builder plus its public provider-canary and work log bridge artifacts as exact source copies. The component runs the copied builder in a temporary public seed root, then checks the transaction failure matrix, authority membrane, and a public work log seed-speed topology fixture. The work log code body itself is consumed through the existing mission_transaction_work_spine source-body import surfaces rather than duplicated here.
The component is deliberately narrow: it demonstrates fail-closed transaction gating for synthetic multi-agent lanes, not private mission-control runtime, external model access, live scheduling, production concurrency safety, hosted orchestration, or launch-scope decision.
Purpose
When several agents work the same repository at once, the dangerous moment is not a crash. It is a quiet one: two lanes edit the same generated file, or one lane commits work whose owner has not finished, and nobody notices until the state is already wrong. This component exists to make that moment a checkable verdict rather than a judgement call.
The single question it answers is: given a dirty path and the live claim topology around it, is acting on that path safe, and if not, what must happen first? The answer is never "probably fine". Each case resolves to a named classification and one allowed action, so a lane can decide whether to proceed, hand off, or wait.
What is unusual is where the evidence comes from. Rather than re-implementing a scheduler, the component runs the real source mission-control builder over public synthetic lanes and reads a public snapshot of the work log's seed-speed topology: who holds which claim, whether their heartbeat is current, and where path claims collide. The most pointed part is the pair of classifier lenses. The closure-state lens then folds in validation, commitability, and residual evidence to say whether a piece of work is genuinely closed or only looks closed. Both lenses default to the cautious verdict when the evidence is thin, which is the behaviour the page is really about.
Prior Art Grounding
This component borrows from workflow DAGs, lease-based coordination, atomic commit protocols, and CI concurrency controls. Useful anchors include:
Apache Airflow DAGs, for representing tasks, dependencies, retries, and scheduling separately from task internals.
Kubernetes Lease-based leader election, as a prior pattern for lease holders, renewals, and failover-sensitive internal control coordination.
IBM Research on two-phase commit, as a transaction-consistency pattern for distributed participants under failure.
GitHub Actions workflow syntax, for declared workflow concurrency and job orchestration controls.
Microcosm borrows the DAG, lease, commit-gate, and workflow-concurrency shapes, but keeps the component to fail-closed synthetic multi-agent transaction gating. It does not claim private mission-control runtime, external model access, live scheduling, production concurrency safety, hosted orchestration, or launch.
dirty generated file: owner live / stale / absent > allowed action
generated_surface_claim_lens
Diagram source
flowchart LR Builder["Copied source builder run in temp seed root: mission board, bridges, result record"] Bridge["Public bridge artifacts provider canary and work log cap economy"] Seed["work log seed-speed snapshot claims, heartbeats, collisions, session cards"] subgraph Engines["Six engines (all must pass)"] Matrix["failure_matrix_gate conflict, duplicate run, dependency, lease, result record, finalizer visible"] Membrane["bridge_authority_membrane bridges green, authority-collapse zero, forbidden claims blocked"] SeedGate["work_ledger_seed_speed_gate heartbeat current, path claims collision-free"] SurfaceLens["generated_surface_claim_lens dirty generated file: owner live / stale / absent -> allowed action"] ClosureLens["closure_state_lens closed and committed, validation deferred, or open and unclassified"] end Negative["Negative floor missing seed root, blocked bridge, authority collapse, private runtime, claim collision"] Result record["metadata-only result records refs, digests, anchors, counts, verdicts; no session or proof bodies"] Builder --> Matrix Builder --> Membrane Bridge --> Membrane Seed --> SeedGate Seed --> SurfaceLens Seed --> ClosureLens Engines --> Negative Negative --> Result record
Engines
mission_transaction_original_builder dynamically loads the copied source builder and emits the mission board, provider repair bridge, work-metabolism bridge, residual replay bridge, and result record.
failure_matrix_gate checks that owner-path conflicts, duplicate command runs, dependency gaps, stale leases, missing result records, supervised-scope gaps, missing parent finalizers, and misanchored claims all remain visible.
bridge_authority_membrane checks that bridge statuses are green while authority-collapse counters remain zero and forbidden claims stay blocked.
work_ledger_seed_speed_gate checks that public session heartbeat, seed-speed status, mutation-check commands, multi-session/claim counts, and collision-free path-claim rows are present without exporting private work log session bodies.
Each classification carries the single allowed action, so the verdict is what a lane should do, not just what it observed.
closure_state_lens decides whether a unit of work is genuinely closed. It folds the generated-surface classification together with validation state, commitability, and any open residual, separating closed_and_committed from the cases that only look done: closed_validation_deferred (validation parked under host pressure), closed_uncommitted_authority (event authority exists but shared append logs are unsafe to stage), false_residual_stale (a residual left open against a passing generator check), or open_unclassified when the closure evidence is simply insufficient. The default is the last of these, so absent evidence never reads as success.
Reader Evidence Routing
Read this module as a coordination-evidence membrane, not as a live scheduler. Start with paper_modules/concurrency_mission_control.json for the full structured binding, then open standards/std_microcosm_concurrency_mission_control.json for required copied-body counts, negative cases, result record fields, and the public/private boundary.
Open core/fixture_manifests/concurrency_mission_control.fixture_manifest.json and examples/concurrency_mission_control/exported_concurrency_mission_control_bundle/source_module_manifest.json before inspecting copied source modules. The manifest floor names one source builder body and six public bridge artifacts; result record payloads should carry source refs, hashes, anchors, counts, verdicts, and omission result records, not body text.
Read the work log seed-speed topology as a public coordination fixture. It can show heartbeat participation, mutation-check commands, session and claim counts, and collision-free selected rows, but it cannot export private work log session bodies or authorize live scheduling.
Negative Cases
The fixture carries stable cases for missing seed roots, blocked provider bridges, authority-collapse claims, private runtime overclaims, and unresolved work log seed-speed claim collisions. If focused validation reports an exact-copy source-module body mismatch, route that repair through microcosm_exact_copy_refresh; do not treat this Markdown projection as source authority for copied source bodies.
Validation Result record Path
From microcosm-substrate, validate with throwaway result record outputs first:
A diagram view and navigation card are generated for this module from its declared component, mechanism, concept, principle, axiom, dependency, and code-locus relationships. Fixture and bundle passes prove only public fail-closed coordination evidence over the declared copied bodies and synthetic fixtures. Source-copy digest drift belongs to microcosm_exact_copy_refresh; shared lattice projection drift belongs to the live projection owner lane.
Scope boundary
Scope limit
This module may claim public fixture evidence that the exact public source builder copy, provider-canary and work log bridge artifacts, failure-matrix fixture, bridge authority membrane, work log seed-speed topology fixture, source manifests, metadata-only result records, negative cases, and generated navigation projections support the declared concurrency mission-control fixture contract. It may also claim that the structured binding row resolves the accepted component subject, resolved mechanism subject, runtime source locus, governed concept, five principles, four axioms, and three dependency modules.
This module may not claim private mission-control runtime truth, external model access, live scheduling, production concurrency safety, hosted orchestration, source-file changes, hosted-public posture, launch-scope decision, publishing-scope decision, implementation correctness beyond the listed witnesses, or whole-system correctness.
Source and projection details
Governing Lattice Relation
The governing lattice claim is that this module turns concurrency coordination from a status narrative into a transaction-scoped evidence check. The bundle structured source record reports sixteen resolved edges and zero unresolved selective relations: the page explains the accepted component and mechanism, cites the runtime source locus, depends on the mission-transaction, bridge-continuity, and work-landing modules, and is governed by concept.work_landing_and_continuity_control_bundle. That concept binds this component to the same family shape as work landing and continuity controls: public fixture or exported bundle input becomes a coordination validator, and the result is a scoped transaction or continuity result record rather than chat status or generated projection authority.
The mechanism row mechanism.concurrency_mission_control.validates_public_concurrency_mission_control is the source-backed explanation edge. In source, run, run_concurrency_mission_control_bundle, classify_generated_surface_claim_lens, and classify_concurrency_closure_state_lens require copied-source digest equality, required anchors, failure-class coverage, work log seed-speed topology checks, metadata-only result records, and explicit scope limits. The focused proof consumer is tests/test_concurrency_mission_control.py: it checks the happy-path fixture, exported-bundle validation, digest-mismatch rejection, exact source-body imports, semantic negative cases, owner-state classification, and closure-state classification. The standard std_microcosm_concurrency_mission_control.json supplies the same ceiling in schema form, including seven copied public source modules, five negative cases, no non-public body export, and no live scheduler/provider/launch-scope decision.
The principle and axiom edges keep the proof boundary from drifting upward. P-10, P-16, and AX-9 make coordination effects transaction-scoped and compensable; P-2, P-6, P-8, AX-5, AX-7, and AX-8 force the validator to lower claim strength when evidence, preconditions, provenance, or refusal reasons are missing. A passing run therefore proves only the public concurrency mission-control fixture contract over declared copied bodies and synthetic fixtures. It does not establish private mission-control runtime truth, live scheduling, external model access, hosted orchestration, production concurrency safety, source-file changes, launch-scope decision, or whole-system correctness.
Doctrine Fact Claim AuditDoctrine Fact Claim Audit rejects wrong fact counts and dead anchors without claiming comprehension or route completeness.
Doctrine Fact Claim Audit validates public doctrine fact assertions against declared sections, numeric claim gates, code-locus anchors, route DAG fixtures, negative cases, source-open body imports, and scope limits. It lowers claim strength to fixture truth: fact assertion, code-loci, and DAG evidence are checked, but the component does not become a comprehension engine, minimum-read graph, doctrine saturation proof, source-file changes lane, or launch-scope decision.
Scope limit Declared fact-assertion, code-locus, and DAG fixture truth gate only; no comprehension engine, no minimum-read-graph proof, no doctrine saturation claim, no source-file changes, no launch-scope decision, no publishing-scope decision, and no whole-system correctness.
doctrine_fact_claim_audit is a Crown Jewel import component with real runnable system and a strict public scope limit. It consumes synthetic public fixtures, copied source source bodies, and source manifests that verify sha256 digests, line counts, required anchors, secret-exclusion status, and result record body omission.
What it proves: fact assertion, code-loci, DAG, and numeric claim binding fixture truth gate only.
Purpose
Documentation about a living system rots. A page states that there are forty-seven of something, or cites a function in a file, and both claims quietly go stale as the code moves underneath them. A reader cannot tell a current count from a number that was true once and never rechecked. This component exists to answer one question: which of a page's factual assertions can be re-derived from source right now, and which have become untracked drift?
The approach treats a documentation claim like a cached value that needs an invalidation strategy. A bare number is not enough; the claim is admissible only when it is bound to a fact assertion that records how to recompute or revalidate the value. The same pass resolves every cited code locus on disk and checks that the quoted anchor text is actually present, so a plausible-but-dead file reference becomes a typed finding rather than inert prose. The interesting move is that nothing here asks a model whether the prose reads as true. The component recomputes a bounded relation over public fixtures and reports only what that relation supports.
The second design choice worth naming is how the checks are proved. The negative floor is semantic, not label-trusting: the test harness overwrites the declared failure fixtures with bogus pass rows and confirms the evaluator still derives the expected stable error codes itself. That keeps the proof attached to the mechanism rather than to the fixture filenames. The honesty of the page rests on that: the component is a narrow claim-audit gate over copied public fixtures, not a comprehension engine, a minimum-read-graph proof, a source-file changes lane, or any launch-scope decision.
Prior Art Grounding
This component borrows from provenance modeling, structured fact-check metadata, schema validation, and supply-chain attestation. Useful anchors include:
W3C PROV, which models entities, activities, and agents so readers can assess the quality, reliability, and trustworthiness of derived information.
Schema.org ClaimReview, as a web metadata pattern for recording a reviewed claim and its fact-checking context.
JSON Schema, for declaring expected structure and rejecting malformed or incomplete claim records.
SLSA provenance, for the software-supply-chain pattern of tracing artifacts back to source and build metadata.
Microcosm borrows the provenance, claim-review, schema, and attestation shapes, but keeps this component to public fixture fact counts, code-loci existence, anchor presence, DAG references, and synthetic volatile numeric binding cases. It is not a comprehension engine, private-doctrine export, launch-scope decision, or a minimum-read-graph proof.
Technical Mechanism
The runtime mechanism is a public fixture evaluator in src/microcosm_core/organs/doctrine_fact_claim_audit.py. The component declares a CrownJewelSpec with four required inputs: fact_assertions.json, fact_dag.json, numeric_claims.json, and projection_protocol.json. The shared crown-jewel runner handles source-manifest validation, result record writing, negative-case execution, and scope limit attachment; this module supplies the domain evaluator and the semantic negative-case mutator.
evaluate first loads the fact assertion table and compares expected_fact_count to the number of fact rows. Each fact must carry at least one code locus. The evaluator resolves every relative code-locus path against the copied source-module bundle, then checks that the declared anchor text is present in the copied body. The DAG pass builds the set of audited fact ids and rejects any edge whose from or to endpoint is not in that set. These checks convert plausible documentation references into result record-backed paths, anchors, and graph edges.
Numeric claims are checked by importing the copied source_modules/system/lib/derived_fact_hologram.py body from the exported bundle and calling its find_unbound_numeric_claims function. For each row in numeric_claims.json, the evaluator synthesizes FactAssertion instances for the declared sections, records unbound numeric detections, and blocks a case when a non-detector row leaves current-state numeric prose without a matching fact assertion. Detector rows are positive evidence only because they must surface the expected section and number.
The negative floor is semantic rather than label-trusting. evaluate_negative_case mutates the positive fixture in memory for wrong_fact_count, missing_code_locus, dead_code_locus, dead_dag_ref, and unbound_numeric_claim, then reruns the same evaluator in a temporary input directory. The tests deliberately overwrite the declared negative-case files with bogus pass rows and confirm that the component still derives the expected stable error codes from the evaluator itself. That keeps the proof tied to the mechanism, not to fixture labels.
The source-open body floor is separate from the result record floor. The exported bundle manifest names two copied bodies, derived_fact_hologram.py and paper_modules.py, with digests and line counts. Runtime result records carry refs, counts, verdicts, scope boundaries, and body_in_receipt: false; they do not embed copied source bodies or private operator material.
Subject: doctrine_fact_claim_audit, with mechanism mechanism.doctrine_fact_claim_audit.validates_public_doctrine_fact_claim_audit.
Runtime locus: src/microcosm_core/organs/doctrine_fact_claim_audit.py, especially run, run_doctrine_fact_bundle, evaluate, _evaluate_numeric_claims, _load_derived_fact_module, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING.
The fixture checks an expected fact count, resolves declared code-locus paths, verifies required source anchors, rejects dead DAG references, and requires volatile numeric claim cases to be bound to fact assertions.
The accepted positive result record reports three facts, three verified code loci, two DAG edges, two numeric claim cases, and one detected unbound numeric detector case, while preserving body_in_receipt: false.
The negative floor is stable: dead_code_locus, dead_dag_ref, missing_code_locus, unbound_numeric_claim, and wrong_fact_count.
The public standard is standards/std_microcosm_doctrine_fact_claim_audit.json; the fixture manifest is core/fixture_manifests/doctrine_fact_claim_audit.fixture_manifest.json.
Source refs
facts + expected_fact_count
fact_assertions.json
edges
fact_dag.json
cases
numeric_claims.json
Diagram source
flowchart LR Facts["fact_assertions.json facts + expected_fact_count"] --> Eval["evaluate"] Dag["fact_dag.json edges"] --> Eval Numerics["numeric_claims.json cases"] --> Eval Manifest["source module manifest copied bodies"] --> Eval Eval --> Count{"declared fact count = table length?"} Eval --> Loci{"each code locus path on disk + anchor in body?"} Eval --> DagRef{"DAG endpoints are known fact ids?"} Eval --> Bound{"current-state numerics bound to a fact assertion section?"} Count -->|mismatch| Block["typed blocking finding"] Loci -->|missing path or anchor| Block DagRef -->|dead ref| Block Bound -->|unbound| Block Count -->|ok| Result record["metadata-only result record body_in_receipt: false"] Loci -->|ok| Result record DagRef -->|ok| Result record Bound -->|ok| Result record Neg["evaluate_negative_case mutate fixture, rerun evaluator"] --> Codes["expected stable error codes"]
Named Proof Consumers
Fixture CLI consumer: PYTHONPATH=src ../repo-python -m microcosm_core.components.doctrine_fact_claim_audit run --input fixtures/first_wave/doctrine_fact_claim_audit/input --out /tmp/microcosm-doctrine-fact-claim-audit/fixture --sign-off-out /tmp/microcosm-doctrine-fact-claim-audit/sign-off.json --card. Expected proof shape: status: pass, three fact rows, three verified code loci, two DAG edges, two numeric-claim cases, one detector case, zero blocking unbound numerics, five semantic negative cases, and body_in_receipt: false.
Exported bundle consumer: PYTHONPATH=src ../repo-python -m microcosm_core.organs.doctrine_fact_claim_audit run-doctrine-fact-bundle --input examples/doctrine_fact_claim_audit/exported_doctrine_fact_claim_audit_bundle --out /tmp/microcosm-doctrine-fact-claim-audit/bundle --card. Expected proof shape: the same evaluator runs through the exported bundle input mode, validates the source-module manifest, and writes metadata-only bundle result records.
Focused regression consumer: PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider --basetemp=/tmp/microcosm_doctrine_fact_claim_audit_pytest tests/test_doctrine_fact_claim_audit.py -q. Expected proof shape: the seven tests cover the positive fixture, dead code locus, missing code locus, dead DAG ref, unbound numeric claim, semantic negative-case derivation, and exported-bundle route.
Corpus parity consumer: PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus. Expected proof shape: the structured source record remains reproducible from the bundle and Markdown projection without hand-editing generated state.
structured source record readback consumer: jq '{source_authority:.paper_module_payload.source_authority, mermaid:.paper_module_payload.generated_projections.mermaid.status, atlas:.paper_module_payload.generated_projections.atlas_card.status, edge_count:(.relationships.edges|length), unresolved:(.relationships.unpopulated_selective_relations|length)}' paper_modules/doctrine_fact_claim_audit.json. Expected proof shape: json_capsule, available_from_capsule_edges, linked_from_capsule_edges, resolved bundle edges, and zero unpopulated selective relations.
Reader Evidence Routing
Start with paper_modules/doctrine_fact_claim_audit.json as the primary reference, then open this Markdown page as a reader guide to that record.
Open standards/std_microcosm_doctrine_fact_claim_audit.json for the standard, required witnesses, negative floor, denied authority, and result record contract.
Open core/fixture_manifests/doctrine_fact_claim_audit.fixture_manifest.json for fixture inputs, copied-body counts, durable result record refs, and source-open body omission rules.
Open examples/doctrine_fact_claim_audit/exported_doctrine_fact_claim_audit_bundle/source_module_manifest.json before inspecting copied source modules; result records carry refs and digests, not copied source body text.
Run the fixture or bundle route from the microcosm-substrate directory and inspect the written JSON files. The component CLI exposes --card, but it does not expose a --json stdout mode.
Use scripts/build_doctrine_projection.py --check-paper-module-corpus to verify this paper-module projection stays inside the shared corpus contract.
Claim-Rot Detection
This component treats documentation claims like cached values that need an invalidation strategy. The failure mode is not only a wrong number; it is a volatile number embedded in current-state prose with no attached route for re-deriving it.
The detector flags volatile numerics: a number near a countable noun inside a current-state section. Such a claim is admissible only when it is bound to a fact assertion that records how to recompute or revalidate the value. The same audit resolves every cited code locus on disk and checks that the quoted anchor is actually present, so stale file references and plausible-but-dead anchors are negative evidence rather than inert prose.
The public fixture does not claim natural-language comprehension. It proves the more useful contract: current-state numerics, fact assertions, DAG refs, code loci, and anchor text can be audited as result record-backed claims instead of untracked documentation drift.
Scope limit: Doctrine fact claim audit checks only public fixture fact counts, code-loci existence, anchor presence, DAG references, and synthetic volatile numeric claim binding cases. It is not a comprehension engine, does not establish a minimum read graph, does not export private doctrine, and excludes launch.
Validation Result record Path
From microcosm-substrate, validate with external result record outputs so the reader check does not churn tracked result records:
A diagram view is generated for this module, and an atlas card links to it. Passing result records validate fact-count, code-locus, DAG-ref, numeric-claim, digest, and negative-case boundaries only. If copied source bodies drift, refresh the exact copy bundle through the owning lane before treating bundle red as a reader-page defect.
Negative cases covered by the fixture manifest: dead_code_locus, dead_dag_ref, missing_code_locus, unbound_numeric_claim, wrong_fact_count.
Source provenance is anchored by examples/doctrine_fact_claim_audit/exported_doctrine_fact_claim_audit_bundle/source_module_manifest.json and result records carry refs, digests, counts, verdicts, and scope boundaries only.
Scope boundary
Scope limit
This module may claim public fixture evidence that doctrine fact assertions, code-locus refs, DAG refs, numeric claim bindings, copied source manifests, digest checks, anchor checks, secret-exclusion scans, metadata-only result records, and negative stale-claim cases are checked by the listed runtime witnesses.
This module may not claim doctrine comprehension, private doctrine export, minimum-read-graph proof, live launch-scope decision, hosted-public posture, source-file changes, candidate-axiom promotion, projection correctness beyond the listed witnesses, or whole-system correctness.
Source and projection details
Governing Lattice Relation
This module is the architecture-and-navigation contract specimen for turning current-state doctrine claims into auditable fact rows. The admitted mechanism, mechanism.doctrine_fact_claim_audit.validates_public_doctrine_fact_claim_audit, does not ask a model whether prose is true. It recomputes a bounded relation: declared fact count, code-locus anchors, route-DAG endpoints, volatile numeric claim bindings, source-module manifest anchors, and semantic negative cases must all agree with the copied public fixture basis before a result record can pass.
That relation is why the bundle binds the module to concept.architecture_and_navigation_route_contract_bundle. Architecture and navigation claims are only readable as doctrine when they can be traced through source rows, code loci, validator commands, and metadata-only result records. The bundle therefore treats the generated Mermaid and Atlas card as route projections of 15 resolved edges, not as independent proof that doctrine coverage is complete.
The principle edges are source-backed claim discipline, not decorative tags. P-1 is exercised when the evaluator recomputes fixture truth rather than echoing declared labels. P-2 is exercised by lowering the positive claim to the checker's strength: fact assertion, code-locus, DAG, numeric-claim, and manifest truth only. P-7 is exercised by recording known unknowns without claiming the unmapped doctrine space is exhausted. P-15 is exercised by keeping this Markdown, the structured source record, Mermaid, and Atlas below the bundle, source module, and validator result records.
The axiom bindings are likewise operational. AX-1 requires a derivation before the page repeats a fact count or source claim. AX-6 keeps the declared fixture domain open-world outside its explicit rows. AX-7 makes failed preconditions typed blocking findings instead of meaningless green output. AX-8 keeps public source refs, manifest digests, secret-exclusion status, and body_in_receipt: false attached as data moves from copied source bodies into result records and reader copy.
The proof consumer for this lattice relation is tests/test_doctrine_fact_claim_audit.py: its positive case, four direct mutation cases, semantic-negative-label override, and exported-bundle test prove that the mechanism is an executable claim-audit boundary. The fixture and bundle CLIs give the same boundary to a reader outside pytest; the corpus check proves only that the Markdown and generated structured source record still agree with the bundle, not that any new doctrine truth has been discovered.
Self-Ignorance Coverage LedgerSelf-Ignorance Coverage Ledger counts known coverage debt while refusing unknown-unknown or absence-proof claims.
Self-Ignorance Coverage Ledger validates a public known-debt coverage fixture: declared Kind Atlas gaps, missing coverage categories, negative cases, source-open body imports, and scope limits. It records what the system knows it has not covered without claiming omniscience, absence proof, total search, source-file changes, public sharing, launch-scope decision, or whole-system correctness.
Scope limit Known Kind Atlas coverage-debt projection only; no literal unknown-unknown omniscience, no absence proof, no total repository search proof, no source-file changes, no launch-scope decision, no publishing-scope decision, and no whole-system correctness.
A navigation system that lists what it knows is easy to build. A system that can state, precisely, what it has not yet covered is harder, and it is the more honest signal to a cold reader. This component answers one question: for a declared set of Kind Atlas families, how many rows does the option surface expose that the generated System Atlas has not yet materialised?
The answer is a small debt vector, computed rather than asserted. For each selected kind the component recomputes the live Kind Atlas row count through system.lib.kind_atlas.build_kind_atlas, counts the entities the build_system_atlas.py graph has actually materialised for that kind, and reports the difference as known coverage debt. Concepts, mechanisms and standards are checked back to real source source files so the materialised set cannot be inflated with names that have no file behind them.
The unusual part is what the validator refuses. It will not accept a fixture that claims its unknown-unknowns are exhaustive: declaring claims_unknown_unknowns_exhaustive raises a finding rather than passing. The ledger reports a bounded count of gaps it can see and explicitly declines to claim there are no others. Known debt is treated as typed residual pressure, not as a completeness proof, and absence of a row is never read as proof that nothing is missing.
Abstract
self_ignorance_coverage_ledger is a public Microcosm Crown Jewel component that measures a narrow, source-grounded form of self-ignorance: known row-level coverage debt between live Kind Atlas option-surface counts and generated System Atlas materialization evidence. It recomputes the selected Kind Atlas families, derives materialized entity IDs from a build_system_atlas.py graph snapshot, source-validates graph-derived entity IDs, replays semantic negative cases, and emits metadata-only result records with scope boundaries.
The current exported bundle is a realness-rung R4 check when the source repo is available: live Kind Atlas counts are bound, the System Atlas graph slice is builder-bound, the live System Atlas graph is cross-checked, expected entity IDs are source-backed, and copied source source bodies are digest-bound through a manifest. The claim is only known_kind_atlas_coverage_debt_projection_only: it is not absence proof, unknown-unknown omniscience, total repository search proof, source-file changes, launch-scope decision, publishing-scope decision, private-system equivalence, provider affiliation, or whole-system correctness.
Problem
Navigation systems can overstate themselves in two opposite ways. A vague "coverage is incomplete" tells a cold reader nothing operational. A confident "nothing else is missing" is worse: it converts absence of evidence into evidence of absence. This component exists to occupy the narrow technical middle: for a declared finite domain of Kind Atlas families, compute the gap between what the option surface exposes and what the System Atlas graph has materialized.
The result is a self-ignorance ledger, not a universal discovery engine. Its positive output is a bounded debt vector. Its negative output is equally important: the validator must refuse fixtures that claim exhaustive unknown-unknown coverage, hand-author materialization counts, substitute entity IDs, use stale/baked expected IDs as authority, tamper with the System Atlas builder result record, or repair a copied-source manifest into a self-reference.
Mechanism
The runtime locus is src/microcosm_core/organs/self_ignorance_coverage_ledger.py. The exported-bundle entrypoint is run_self_ignorance_bundle; the core evaluator is evaluate; the semantic negative-case replayer is evaluate_negative_case; the local scope limit is AUTHORITY_CEILING.
Recompute live row counts through system.lib.kind_atlas.build_kind_atlas; reject forbidden unknown-unknown exhaustiveness.
system_atlas_graph.json
Generated graph slice carrying materialized System Atlas entity IDs.
Require non-empty entities and generated_by == tools/meta/factory/build_system_atlas.py; derive materialized IDs from graph rows.
materialized_entities.json
Declared materialization rows and snapshot metadata.
Check declared counts against graph-derived counts; use graph-derived counts as authority.
projection_protocol.json
Result record for the System Atlas check and coverage scope.
Require the exact coverage scope and a valid build_system_atlas.py --check result record or blocked-refresh result record.
Algorithmically, the component performs this loop:
Load bundle inputs and the source-module manifest through the Crown Jewel common runner.
Recompute selected Kind Atlas rows from the source repo when system/lib/kind_atlas.py is available.
Load system_atlas_graph.json, require the System Atlas builder marker, and derive materialized IDs by kind.
Cross-check the bundled graph slice against state/system_atlas/system_atlas.graph.json when the source repo is available.
For concepts, mechanisms, and standards, verify that graph-derived expected IDs resolve to real source source files.
Compute known_coverage_debt_count = live_kind_atlas_row_count - graph_derived_materialized_count by kind.
Replay semantic negative cases from clean input copies instead of trusting declared error labels.
Write result records with refs, counts, hashes, findings, realness evidence, and scope boundaries; copied body text stays out of result records.
For the current exported bundle, the public count vector is:
Kind
Live Kind Atlas rows
Graph-derived materialized entities
Known debt
concepts
41
30
11
mechanisms
36
28
8
paper_modules
225
220
5
standards
201
29
172
Total
503
307
196
Those numbers come from examples/self_ignorance_coverage_ledger/exported_self_ignorance_coverage_ledger_bundle/kind_atlas_rows.json, materialized_entities.json, and system_atlas_graph.json, and are proof-consuming snapshot facts. They are not stable doctrine constants; rerun the validator after Kind Atlas, System Atlas, or source manifests move.
Projection Protocol Result record
projection_protocol.json is the result record that prevents a static graph slice from masquerading as live authority. The accepted bundle must carry:
Result record fields carry metadata and verdicts, not copied source bodies.
The focused tests test_self_ignorance_coverage_ledger_rejects_projection_scope_tamper and test_self_ignorance_coverage_ledger_rejects_system_atlas_receipt_tamper are the proof consumers for this protocol.
The accepted result must report status pass, known debt 196, observed negative cases forbidden_absence_inference and coverage_debt_mismatch, realness_rung: R4, live_kind_atlas_recompute_used: true, live_system_atlas_graph_crosscheck_used: true, and source-module digest success.
The real-bad cases are not marketing examples; they are the contract. Treat a guard as validated only when the focused pytest route passes in the current checkout:
Perturbation evidence is test_self_ignorance_coverage_debt_moves_with_materialized_entity_graph: adding a real, source-backed standard entity moves the known-debt count from 196 to 195 and keeps the result passing. That proves the ledger is coupled to the graph-derived materialization set, not to a fixed prose number.
The unsourced-materialization guard target is test_self_ignorance_coverage_ledger_rejects_coherent_fake_standard_entity. Its intended refusal is SELF_IGNORANCE_EXPECTED_ENTITY_ID_SOURCE_MISSING, but the paper must not count that guard as validated unless the focused pytest route currently blocks the fake standard entity. If it regresses, lower the source-validation claim to the passing guards above and route the source/test issue through the work log before completion.
Source-Backed Concept / Mechanism / Law Links
Link
Source-backed support
Claim supported
Component self_ignorance_coverage_ledger
organs/self_ignorance_coverage_ledger.json and core/organ_atlas.json::organs[51:self_ignorance_coverage_ledger]
This is an accepted public component with the named runtime locus and paper-module drilldown.
The component is part of the executable architecture/navigation route-contract family.
Principle P-2
principles/P-2.json
Claim strength must be no stronger than the named checker and result record.
Principle P-7
principles/P-7.json
Known gaps remain typed residual pressure, not completeness claims.
Principle P-11
principles/P-11.json
Freshness-sensitive claims require dated result records and refresh routes.
Principle P-15
principles/P-15.json
Generated projections stay below source registries and result records.
Axiom AX-6
axioms/AX-6.json
Closed-world coverage is valid only inside declared finite domains; absence is not negation.
Axiom AX-7
axioms/AX-7.json
Partial computation must totalize as pass or typed refusal with evidence.
Axiom AX-8
axioms/AX-8.json
Provenance and labels must survive source-to-projection and body-import boundaries.
Axiom AX-10
axioms/AX-10.json
Live-state counts require freshness, basis, and rederive contracts.
P-19 appears in the component atlas row as an adjacent governing principle for residual classification, but it is not part of the paper-module bundle's principle_refs. Treat it as component-level context unless the bundle is later updated through the JSON authority lane.
Evidence Contract
The fixture contract lives at core/fixture_manifests/self_ignorance_coverage_ledger.fixture_manifest.json. The active standard lives at standards/std_microcosm_self_ignorance_coverage_ledger.json. Together they admit public synthetic fixtures, copied source source bodies, hashes, anchors, validator refs, and generated result records. They forbid private repo bodies outside copied public fixtures, model-output data bodies, account secret or account-bound material, operator private notes, raw thread bodies, and result record body text for copied material.
The exported bundle manifest at examples/self_ignorance_coverage_ledger/exported_self_ignorance_coverage_ledger_bundle/source_module_manifest.json currently carries one source module: tools/meta/factory/build_system_atlas.py, copied into the bundle under source_modules/tools/meta/factory/build_system_atlas.py. The manifest records the source/target relation, digests, line count, required anchors System Atlas and kind, replacements, and the boundary that transform result records record hashes and replacement classes rather than source bodies.
The standard's result record contract requires a real runtime result record, a source-module manifest for the exported bundle, a secret-exclusion scan, at least the forbidden_absence_inference negative case, and body_in_receipt: false. Synthetic result records are not accepted as stand-ins for this component's authority.
Reader Evidence Routing
Read this module in this order:
paper_modules/self_ignorance_coverage_ledger.json for the generated paper-module projection and relationship edges.
core/paper_module_capsules.json::paper_modules[49:paper_module.self_ignorance_coverage_ledger] for source authority.
standards/std_microcosm_self_ignorance_coverage_ledger.json for the public/private boundary, validator contract, result record expectations, and scope limit.
src/microcosm_core/organs/self_ignorance_coverage_ledger.py for evaluate, evaluate_negative_case, run, and run_self_ignorance_bundle.
examples/self_ignorance_coverage_ledger/exported_self_ignorance_coverage_ledger_bundle/ for the current public evidence bundle.
tests/test_self_ignorance_coverage_ledger.py for proof consumers, bad cases, and perturbation cases.
Treat negative cases as part of the positive claim. The paper should cite only guards that the focused validation route blocks in the current checkout. The validated guard set must include forbidden absence inference, coverage mismatch, baked expected IDs without source, declared entity substitution, materialized count tamper, graph materialization tamper, graph builder tamper, projection protocol tamper, stale source self-reference, and digest mismatch. Fake-but-coherent standard entities remain a required unsourced-materialization guard target, but not an observed passing guard unless the focused test route blocks that perturbation.
Prior Art Grounding
The nearest ordinary analogue is software coverage measurement: coverage tools report what was exercised or missed over a declared surface, not all possible missing behaviors. coverage.py is useful as a reference pattern for bounded observed coverage over a source set.
The health-signal side is adjacent to automated repository checks such as OpenSSF Scorecard: bounded checks can produce useful risk signals without becoming complete security or quality proof. Microcosm applies that pattern to navigation coverage debt and keeps the scope boundary in the same result record frame as the count.
Passing validation proves that the declared public bundle, source-module manifest, negative cases, and paper-module corpus remain coherent. It does not establish freshness beyond the checked snapshot or authorize generated projection edits by hand.
Scope boundary
Limitations
This component is intentionally closed-world over selected artifact kinds. It does not discover arbitrary missing files, prove that all System Atlas materialization gaps are known, or search every repo surface. It counts row-level debt over selected Kind Atlas families and only within the graph/materialization/protocol evidence supplied to the bundle.
Freshness is conditional. projection_protocol.json can record that a System Atlas refresh was blocked by active source claims. In that state, the component reports a bounded snapshot plus a refresh boundary; it does not silently upgrade stale generated materialization into live truth.
Source-open evidence is public-safe, not public-total. The bundle may carry copied source bodies with transformed non-public paths, hashes, line counts, and anchors. That does not export source notes, model-output data bodies, account or browser state, account secrets, browser UI state, or private source-root equivalence.
The current generated paper-module JSON has resolved bundle edges and relationships.unpopulated_selective_relations: []. That is a discoverability and lattice-coherence statement. It is not implementation-correctness proof, launch-scope decision, provider authority, or proof that every related concept/principle/axiom has full empirical support.
Scope limit
The scope limit is narrow. Self-Ignorance Coverage Ledger can claim that public fixture evidence, graph-derived materialization rows, live Kind Atlas recomputation, source-backed expected entity IDs, copied source evidence, semantic negative cases, and result records make declared Kind Atlas coverage debt visible, recomputable, and checkable.
It cannot claim literal unknown-unknown omniscience, absence proof, total repository search proof, source-file changes, live Atlas mutation, private-source export, launch-scope decision, publishing-scope decision, provider affiliation, product readiness, or whole-system correctness. The active v2 standard status is a source JSON contract state only and does not expand the scope limit.
Tool Server Pressure InventoryTool Server Pressure Inventory validates public pressure-inventory fixtures without reading or mutating live host process state.
Tool Server Pressure Inventory validates the public tool-server pressure membrane: synthetic process rows, active-owner descendants, over-budget owner launch requests, redaction findings, source-module digests, metadata-only result records, negative cases, and scope limits. It treats pressure inventory rows as fixture evidence, not live process control, process signalling, host mutation, provider authority, launch-scope decision, public sharing, or whole-system correctness.
Scope limit Declared public helper-process pressure inventory fixture and source-module digest evidence only; no live process reads, process signalling, host mutation, launch-scope decision, external model access, private-data equivalence, publishing-scope decision, or whole-system correctness.
tool_server_pressure_inventory is the public read-only import of the source helper-process pressure inventory pattern from tools/meta/control/orphan_reaper.py. It validates the classifier without exposing live host state: fixtures inject a synthetic ps-shaped process table, synthetic helper-kind policy rows, and a synthetic owner-status taxonomy.
The accepted component keeps the load-bearing mechanism:
parse helper processes from ps-shaped rows
classify helper kind and owner status
distinguish detached orphan candidates from active-owner descendants
emit launch requests for over-budget active owners
keep all rows digest-only through command_hash
The exported bundle carries a source module manifest plus a source-faithful refactor body under source_modules/tools/meta/control/. That manifest records the source source ref, target digest, source digest, relation, material class, and required anchors. Result records carry refs, hashes, counts, and verdicts only; they do not inline the copied/refactored body.
The component rejects seven boundary failures:
active-owner descendants marked as safe-close candidates
unknown-owner processes marked as safe-close candidates
detached processes younger than the minimum age marked safe-close
process-signal results on the public surface
live command bodies instead of digest-only rows
absolute host paths
active-owner launch requests that overclaim kill or termination
Purpose
Long-running agent sessions leave helper processes behind: MCP servers, dev servers, keepalives. Over time these accumulate and the host slows down. The obvious fix is a reaper that walks the process table and kills stale helpers, and the source tool this component is ported from does exactly that. But a reaper is dangerous. The hard case is telling a genuinely abandoned process apart from a helper that a live session is still using. Kill the wrong one and you break the work in flight.
This component answers a single question: given a process table, which helper processes are safe to close, and which must be left alone because a live owner still depends on them? It does so by reconstructing each process's owner chain. A helper whose parent is launchd (ppid == 1) has been detached from any session and is a candidate. A helper that still traces back through a live agent session is not. The decision is deliberately narrow: a process is a safe-close candidate only when it is detached, its kind is on an allowlist, and it has been idle past a minimum age. Everything else routes to "needs an owner check" or "keep".
What is unusual is the second half of the design. When an active owner is over its helper budget, the component does not propose a kill. It emits a launch *request*: a row that asks the owning session to launch or reuse its own lease. The inventory is explicitly not a kill list. The central invariant, enforced by an audit pass over the component's own output, is that an active-owner descendant can never become a safe-close candidate.
The public version keeps that classifier and that invariant but removes every actuator. There is no os.kill, no signal, no live ps call. Input is synthetic process text from a fixture, rows carry a command_hash rather than a command line, and a redaction guard rejects any fixture that smuggles an absolute path, a live command body, or a process-signal claim onto the public surface. The result is the safety reasoning of a reaper presented as a read-only validator, with the part that could actually harm a host left out.
Shape
Diagram source
flowchart LR Fixture["Synthetic pressure fixture process_table, pressure_policy, owner_classes"] Classifier["Classify helper kind, walk owner chain (up to 8 hops), hash command to command_hash"] Owner{"Owner status?"} Detached["Detached orphan ppid == 1"] Keep["Active owner or keep runtime"] SafeClose["candidate_safe_close only if allowlisted and age >= min"] Check["requires_owner_check or keep"] launch["Over-budget owner: launch REQUEST, never a kill"] Negative["Boundary failures unsafe safe-close, command leak, process signal, absolute path, launch overclaim"] Source["Source manifest public refactor digest + anchors"] Result records["metadata-only result records result, board, validation, fixture sign-off"] Fixture --> Classifier Classifier --> Owner Owner --> Detached Owner --> Keep Detached --> SafeClose Detached --> Check Keep --> Check Keep --> launch Classifier --> Negative SafeClose --> Result records Check --> Result records launch --> Result records Negative --> Result records Source --> Result records
Technical Mechanism
The runtime mechanism is an actuatorless port of the read-only pressure path in tools/meta/control/orphan_reaper.py. The component receives injected synthetic ps_text plus pressure_policy.json and owner_classes.json; it never shells out to ps, imports process-control modules, or sends signals. _parse_process_rows normalizes process rows, _process_kind maps command tokens to helper kinds, and _owner_status_for_process walks parent links up to eight hops to separate launchd_detached helpers from active owner chains and keep runtimes.
The decision law is deliberately narrow. _inventory_owner_and_decision emits candidate_safe_close only when a helper is detached (ppid == 1), its kind is allowlisted, and its age exceeds the configured threshold. Active-owner chains, unknown parents, young detached helpers, and keep runtimes route to requires_owner_check or keep. Over-budget active-owner groups are summarized by _active_owner_pressure_groups, but the emitted helper_owner_release_request_v1 can only ask the owner to launch the helper; it cannot claim that Microcosm killed, terminated, or safely closed a process.
The source-open body floor and the public result records enforce the same membrane. _source_module_manifest_result verifies the exported orphan_reaper_pressure_inventory_public_refactor body, its source_faithful_public_refactor relation, target digest, and required anchors. _redaction_findings rejects command previews, absolute host paths, and process-signal claims before result record writing. The result is a pressure classifier with executable evidence and a hard no-actuator boundary, not a live host cleanup tool.
Reader Evidence Routing
Read the positive fixture as pressure-inventory evidence, not host process control. The fixture supplies process_table.json, pressure_policy.json, and owner_classes.json; the component classifies helper kind, owner status, detached safe-close eligibility, active-owner descendants, keep runtimes, and over-budget active-owner groups. Active-owner pressure becomes a launch request row, not a kill, terminate, or signal action.
Read the negative cases as the scope limit. The required failures are active_owner_kill_candidate.json, unknown_owner_kill.json, premature_safe_close.json, process_signal_sent.json, command_preview_leak.json, absolute_path_leak.json, and owner_release_overclaim.json. They prove the public surface rejects unsafe safe-close candidates, live command bodies, absolute host paths, process-signal claims, and launch-overclaim language.
Read source-open evidence through the source module manifest. The exported bundle includes one copied public refactor body at examples/tool_server_pressure_inventory/exported_tool_server_pressure_inventory_bundle/source_modules/tools/meta/control/orphan_reaper_pressure_inventory.py. The manifest binds source and target digests, declares source_faithful_public_refactor, requires anchors such as build_tool_server_pressure_inventory, build_pressure_hygiene_relief_receipt, no_process_signal_sent, and request_owner_release, and keeps body_in_receipt and body_text_in_receipt false.
Named Proof Consumers
Runtime fixture consumer: microcosm_core.organs.tool_server_pressure_inventory run consumes the synthetic pressure fixture and writes the result, board, validation result record, and sign-off result record.
Source-body consumer: microcosm_core.organs.tool_server_pressure_inventory run-pressure-bundle consumes the exported source-module bundle and blocks on missing manifests, target-ref mismatch, digest mismatch, unsafe body classes, or redaction hits.
Focused pytest consumer: tests/test_tool_server_pressure_inventory.py asserts every expected negative case, verifies that active-owner descendants are never safe-close candidates, checks owner-launch requests instead of kill actions, scans the component and public refactor AST for process-control imports or .kill(...), validates target-ref/digest parity, and checks compact card omission result records.
Scope limit consumer: the standard standards/std_microcosm_tool_server_pressure_inventory.json and the scope limit in the component require process_signal_authority, live_process_table_read_authorized, host_mutation_authorized, release_authorized, provider_calls_authorized, and whole_system_correctness_claim to remain false.
Prior Art Grounding
This component draws on process-inventory, tool-server, and owner-reference patterns. psutil.process_iter() is a common API for iterating over process metadata without shelling out to ad hoc ps parsing. Kubernetes garbage collection uses owner references to distinguish objects that may be collected from objects still owned by live controllers. The Model Context Protocol's tool-server model gives the local "server exposes callable tools" shape. The Microcosm version keeps the result deliberately weaker: synthetic rows are classified for pressure and safe-close eligibility, but the component does not read live host state or send signals.
Prior-art anchors:
psutil process iteration: https://psutil.readthedocs.io/en/latest/#psutil.process_iter
Model Context Protocol tool servers: https://modelcontextprotocol.io/docs/concepts/tools
Scope limit: this is projection and validation only. It does not read the live process table, signal processes, mutate host state, include launch operations, use external model services, export private account or browser state, or prove whole-system correctness.
Validation Result record Path
From microcosm-substrate, validate with result records under /tmp:
Passing result records prove synthetic inventory classification and source-manifest shape only; they do not read live host process state, send process signals, mutate host state, authorize cleanup, use external model services, or certify launch-scope decision. A diagram view and an atlas entry are generated for this module from the same source row.
Scope boundary
Scope limit
This module can claim that synthetic process-table fixtures, owner-status policy rows, digest-only command rows, boundary-failure cases, source manifest evidence, and metadata-only result records validate a public tool-server pressure classifier. It cannot claim live host inspection, process signaling, safe cleanup authority, host-state mutation, provider authority, launch-scope decision, private account or session export, or whole-system correctness.
Source and projection details
Governing Lattice Relation
The generated row binds this module to mechanism mechanism.tool_server_pressure_inventory.validates_public_tool_server_pressure_inventory, concept concept.import_projection_and_drift_control_bundle, principles P-2, P-4, P-6, and P-9, axioms AX-3, AX-5, AX-7, and AX-8, and the runtime code locus src/microcosm_core/organs/tool_server_pressure_inventory.py. Those edges make the module a Microcosm import-and-validation proof: source-open digest evidence is allowed, while private host state, process control, provider authority, launch-scope decision, and whole-system correctness stay outside the claim.
The dependency edges to mission_transaction_work_spine, provider_context_recipe_budget, and world_model_projection_drift_control_room define the reader route. This module can explain how a helper-pressure row becomes a metadata-only result record and an owner-launch request, but it must borrow mission-landing, provider-budget, and projection-drift boundaries from those sibling modules before any broader operational or launch claim is made.
Mechanistic Interpretability Circuit Attribution ReplayMechanistic interpretability replay validates public circuit-attribution result record contracts without live model access or private activation export.
Mechanistic Interpretability Circuit Attribution Replay checks that public circuit-attribution replay rows carry feature-to-edge links, intervention deltas, sufficiency and faithfulness limits, target/source refs, source-module digest evidence, secret-exclusion scans, negative cases, and scope limits. It validates the result record contract only, not live model transparency, private weights, raw activations, proprietary prompts, hidden reasoning, external model access, benchmark claims, launch, public sharing, or whole-system correctness.
Scope limit Declared public circuit-attribution runtime result record and source-module import evidence only; no live model access, model-transparency product claim, export of private weights, raw activations, proprietary prompts, hidden reasoning, external model access, benchmark claims, launch-scope decision, publishing-scope decision, or whole-system correctness.
Interpretability writing is unusually easy to overstate. A named feature can read like understanding, a graph picture can read like a discovered circuit, and a small local script can read like access to a real model. This component exists to hold one kind of claim to a smaller, checkable size. It answers a single question: before Microcosm lets a circuit-attribution story stand as public evidence, does the story survive a deterministic replay rather than being taken on trust?
The part worth noticing is how narrow the proof is, and how that narrowness is the point. The component does not attempt to interpret a trained model. It carries a tiny two-layer toy transformer with weights declared in the fixture, recomputes its forward pass, gradient attribution, and per-feature ablation, and then compares the recomputed top feature against the feature the fixture claims. A row passes only when the declared winner still matches after recomputation. Perturb the toy weights and leave the old claim in place and the row is rejected, because the recomputed answer has moved while the prose has not. That is the failure mode the component is built to catch: an interpretability statement that was once true of its inputs but no longer is.
Around that recomputation sit three further gates. Graph evidence must be machine-readable and traversable from declared sparse features to public error nodes, so a screenshot cannot stand in for a circuit. Transparency language needs a causal-intervention reference and faithfulness language needs an explicit limit, so the strongest words carry the strongest evidence requirements. Private weights, raw activations, proprietary prompts, and hidden reasoning are kept out of every result record. What the component produces is an accounting result record for a public fixture, not a transparency tool for any real model.
Abstract
mechanistic_interpretability_circuit_attribution_replay is a public Microcosm component that validates whether circuit-attribution claims are safe to represent as result record evidence. It is not a model-transparency product and does not inspect a live provider model. The component checks a fixture and exported bundle for machine-readable feature graph rows, causal-intervention references, faithfulness limits, source-module digest evidence, negative cases, and a small input-coupled toy-transformer replay.
The technical proof is deliberately modest. A replay passes only when its declared circuit-attribution story agrees with recomputed toy-transformer forward, gradient, and ablation winners; when graph evidence is traversable from public sparse features to public error nodes; when public result records omit private or raw bodies; and when the source-open body floor is backed by copied, source source modules with matching digests. A stale declared top feature is disconfirmed by perturbing the input fixture while leaving the old claim in place.
Problem Statement
Interpretability prose is easy to overclaim: a feature name can sound like transparency, a graph screenshot can sound like a circuit, and a local fixture can sound like model access. This module makes the public claim smaller and more testable. It asks: before Microcosm lets a circuit-attribution story become public evidence, can the story survive a deterministic replay membrane that checks structure, causality refs, source provenance, and explicit scope boundaries?
The answer is local and result record-scoped. Microcosm may claim public circuit-attribution replay accounting for this fixture and exported bundle. It may not claim live model internals, private weights, raw activations, proprietary prompts, hidden reasoning, provider behavior, benchmark claims, publishing-scope decision, hosting, launch-scope decision, or whole-system interpretability correctness.
The technical contribution is therefore an accounting membrane, not a new interpretability algorithm. The membrane turns an interpretability-shaped fixture into a pass/fail public result record by requiring all claim-bearing rows to cross four gates:
Gate
Accepts
Rejects
Replay schema
Feature ids, graph rows, causal refs, sufficiency and faithfulness limits, contradiction refs, cold-replay refs, target refs, and metadata-only result record flags.
Missing required fields, unverifiable feature labels, screenshot-only graph evidence, transparency claims without causal-intervention refs, and faithfulness claims without limits.
Graph traversal
Machine-readable nodes and edges with a path from declared sparse features to public error nodes.
Disconnected edges and decorative constant-delta edge-weight sequences.
Toy recomputation
Fixture-coupled forward, gradient, ablation, weight digest, and declared-winner comparison.
Internal default toy specs, stale declared winners, or uncoupled cached result records.
Source/body boundary
Copied source bodies with digest, class, anchor, and metadata-only result record checks.
Private weights, raw activations, proprietary prompt bodies, hidden reasoning, model-output data, body text in result records, and launch-scope decision.
flowchart TD Bundle["JSON bundle paper_module.mechanistic_interpretability_circuit_attribution_replay"] Fixture["Fixture / exported bundle feature catalog, replay rows, toy-transformer spec"] Policy["Policy gates required fields, forbidden private/raw exports, faithfulness limits"] Graph["Graph analyzer feature ids -> edges -> public error nodes"] Toy["Toy-transformer replay forward + gradient + ablation recomputation"] Source["Source-open body floor copied source bodies + digest checks"] Result records["metadata-only result records refs, digests, counts, verdicts"] Ceiling["Scope limit public replay accounting only"] Bundle --> Fixture Fixture --> Policy Fixture --> Graph Fixture --> Toy Fixture --> Source Policy --> Result records Graph --> Result records Toy --> Result records Source --> Result records Result records --> Ceiling
The component has four coupled checks:
Replay policy validation: each positive row must carry toy prompt refs, sparse feature ids, machine-readable graph nodes and edges, replacement-model approximation scores, causal inhibition and injection refs, causal-intervention result record refs, sufficiency labels, faithfulness limits, contradiction-case refs, cold-replay refs, target refs, and body_in_receipt: false.
Graph analysis: _graph_analysis_for_replay verifies that graph edges resolve to declared nodes and that at least one path exists from the row's sparse feature ids to a public error node. _weight_sequence_analysis rejects simple decorative arithmetic edge-weight sequences across replay rows.
Toy-transformer replay: _toy_transformer_attribution_runtime recomputes a pure-Python two-layer toy transformer from fixture-provided token_ids, embeddings, layer1, layer2, and target_logit_index, then compares the recomputed top attribution and ablation features against declared winners.
Source/body boundary: _source_module_manifest_result, _source_open_body_import_summary, scan_paths, _write_receipts, and result_card verify copied source bodies while keeping result record payloads metadata-only and public-safe.
Implementation Contract
Runtime locus
Role in the mechanism
Evidence surface
run
First-wave fixture validator. It loads the public input directory, negative cases, source-module manifest, secret-exclusion policy, and sign-off output.
Source-open body floor: copied source body checks with digest, class, anchor, and metadata-only result record constraints.
Source-module exact-import and body-text rejection tests
_write_receipts / result_card
Public output membrane. Result records and cards carry refs, digests, counts, omitted-payload flags, and scope limits rather than source bodies or private state.
Result record-boundary and card-reuse tests
Toy-Transformer Attribution Mechanism
The toy-transformer runtime is intentionally small enough to audit. The fixture in fixtures/first_wave/mechanistic_interpretability_circuit_attribution_replay/input/attribution_replays.json declares:
token_ids: [0, 1, 2]
a three-row embedding table over two dimensions
a two-by-three first layer
a three-by-two second layer
target_logit_index: 1
expected top feature by attribution and ablation: toy_hidden_feature_1
The runtime computes token embeddings, averages them into a context vector, applies the first layer, applies a tanh hidden activation, applies the second layer, and reads the target logit. It then computes activation-gradient scores for each hidden feature, using the analytic tanh derivative 1 - h^2 so the attribution score is grounded in the same forward pass rather than a separate estimate. It also ablates each hidden feature in turn, zeroing it and re-reading the target logit, to measure the output delta that feature is responsible for. The fixture currently produces target logit 0.044176; both the gradient attribution and the ablation delta select toy_hidden_feature_1, and the row passes only because those two independent paths agree with each other and with the fixture's declaration.
The important point is not that this is a serious transformer. It is a deterministic proof harness for the public replay claim. The result record can say the declared top feature agrees with recomputation only because the verifier recomputes from input fields and compares the result. The result record also records a weight digest so cached or exported bundle cards can prove which fixture basis they are coupled to.
Discriminating Tests
The proof is strongest where it distinguishes a real coupling from a plausible but stale story. The focused tests exercise those distinctions directly:
Marks source body text as present in result record material.
Blocks the source/body import.
Source-open evidence remains metadata-only at result record boundaries.
Evidence Contract
Evidence class
Local authority
What it proves
What it does not establish
Bundle binding
core/paper_module_capsules.json row 52
The paper module, component, mechanism, source locus, and generated projection statuses are linked.
Markdown is not promoted to source authority.
Replay rows
fixtures/.../input/attribution_replays.json and exported bundle mirror
Six public replay rows with feature ids, graph edges, causal refs, faithfulness limits, contradiction refs, cold replay refs, and metadata-only target refs.
The refs are fixture/accounting evidence, not live model internals.
Feature catalog
fixtures/.../input/feature_catalog.json
Six public sparse-feature summary ids with labels and no private weights or activation dumps.
It does not disclose trained-model features or raw activations.
Toy runtime
_toy_transformer_attribution_runtime and focused tests
Forward, gradient, ablation, digest, and stale-declaration checks are recomputed from the input fixture.
The toy runtime is not a general interpretability method.
Graph analysis
_graph_analysis_for_replay and _weight_sequence_analysis
Graph rows are machine-readable, traversable, and not decorative constant-delta weight sequences.
It does not validate a real neural circuit.
Source-open body floor
source_module_manifest.json plus source_modules/
Eleven copied source bodies have digest/anchor/material-class checks.
Bodies are not copied into result records and do not authorize private/live export.
Result record set
receipts/first_wave/..., result records/sign-off/..., runtime-shell lens
Public outputs carry refs, digests, counts, verdicts, omitted-payload flags, and scope limits.
Result records do not publish private model data or launch-scope decision.
Reader Evidence Routing
The proof consumer for this reader slice is the focused interpretability replay suite plus the paper-module corpus parity check. The table below is the route a rank/projection reader should follow before trusting any claim in this module:
Reader question
Source surface
Focused proof consumer
Scope limit
Is this module bound to a real component and mechanism?
core/paper_module_capsules.json::paper_module.mechanistic_interpretability_circuit_attribution_replay and paper_modules/mechanistic_interpretability_circuit_attribution_replay.json
_toy_transformer_attribution_runtime over fixture-provided token_ids, weights, and target_logit_index
test_mechanistic_interpretability_toy_transformer_runtime_computes_attribution, perturbation, and stale-claim tests
Proves fixture-local recomputation, not a general interpretability method.
Are graph rows actual circuit evidence rather than screenshots?
_graph_analysis_for_replay and _weight_sequence_analysis over declared graph nodes, edges, and public error nodes
disconnected-graph and decorative-weight regression tests
Proves machine-readable traversability and anti-decoration checks, not a real neural circuit.
Do source-open bodies stay out of result records?
source_module_manifest.json, copied source_modules/, _source_module_manifest_result, and _write_receipts
source-module exact-import and body-text-in-result record rejection tests
Proves copied body floor and metadata-only result records, not private/live export authority.
Where does a reader start when projections disagree?
source record, generated JSON instance, runtime source, focused tests, then result records
corpus check and focused pytest together
Failure Modes And Limitations
Missing required replay fields block with INTERPRETABILITY_REPLAY_FIELD_REQUIRED.
Feature names without catalog-backed ids block with INTERPRETABILITY_FEATURE_NAME_UNVERIFIABLE.
Graph screenshots or disconnected graph rows block because machine-readable edges and traversable paths are required.
Transparency language without a causal-intervention result record blocks with INTERPRETABILITY_INTERVENTION_RECEIPT_REQUIRED.
Faithfulness language without explicit limits blocks with INTERPRETABILITY_FAITHFULNESS_REQUIRES_LIMITS.
Private model weights, raw activation dumps, proprietary prompt exports, hidden chain-of-thought exports, model-output data bodies, and launch-scope decision are forbidden public outputs.
Decorative graph-weight sequences block as suspected fabrication.
Stale declared toy-transformer winners block when recomputation selects a different top feature.
The proof is fixture-local. It verifies a public replay membrane and copied source evidence; it does not certify real-world model faithfulness.
Relation To Interpretability Literature
The module borrows its accounting shape from the transformer-circuits and mechanistic-interpretability tradition: circuits should be graph-structured, features should be identifiable, causal language should be backed by interventions, and faithfulness language should be bounded. Useful prior-art anchors include Anthropic's transformer-circuits framing, causal scrubbing, and SAE/sparse-feature circuit work.
Microcosm does not reproduce those methods. The local contribution is a public replay boundary around an interpretability-shaped claim: machine-readable edges instead of screenshots, causal-intervention refs instead of bare transparency language, fixture recomputation instead of stale row trust, and explicit scope boundaries before a claim becomes public evidence.
Relation To Microcosm Concepts, Mechanisms, And Principles
This consumes the exported circuit-attribution bundle, copied body floor, digest checks, metadata-only result records, command-card omission contract, and runtime-shell validation shape.
The focused regression pins recomputation, stale-row rejection, graph and source-body gates, card result record reuse, and body-text exclusions.
Reader Route
A cold reader should inspect in this order:
core/paper_module_capsules.json row 52 for authority and projection binding.
paper_modules/mechanistic_interpretability_circuit_attribution_replay.json for generated relationship edges.
src/microcosm_core/organs/mechanistic_interpretability_circuit_attribution_replay.py for runtime logic.
tests/test_mechanistic_interpretability_circuit_attribution_replay.py for the stale-row, perturbation, graph, source-body, and result record-boundary proof.
fixtures/first_wave/mechanistic_interpretability_circuit_attribution_replay/input for the fixture.
examples/mechanistic_interpretability_circuit_attribution_replay/exported_circuit_attribution_bundle for the public bundle.
receipts/first_wave/mechanistic_interpretability_circuit_attribution_replay and receipts/runtime_shell/public_mechanistic_interpretability_circuit_attribution_replay_lens.json for metadata-only public result record evidence.
Prior Art Grounding
This replay exercises a circuit-attribution pass that traces which internal components account for a behaviour. It is grounded in mechanistic interpretability, the study of the internal circuits of neural networks (Anthropic, Transformer Circuits). Microcosm borrows the attribution-replay shape over synthetic fixtures; the result is fixture-bound runtime evidence, not live model access, a transparency product, or a correctness claim about any real model.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
These are reader-verifiable evidence only and do not include launch operations, external model access, source-file changes, or whole-system correctness.
Scope boundary
Authority And Evidence Boundary
Source authority: core/paper_module_capsules.json::paper_modules[52:paper_module.mechanistic_interpretability_circuit_attribution_replay] with source_authority: json_capsule.
This Markdown is a human-readable paper projection. The bundle JSON binds the component, mechanism, source locus, generated Mermaid status available_from_capsule_edges, and Atlas status linked_from_capsule_edges. The runtime, fixtures, tests, result records, and manifests are the technical evidence for the claims below.
Scope limit
This module may claim:
public, cold-replayable circuit-attribution accounting for the named fixture and exported bundle;
feature ids tied to machine-readable graph edges and traversable public error-node paths;
causal-intervention result record refs and faithfulness-limit refs are required before transparency or faithfulness language passes;
the toy-transformer declaration is input-coupled to recomputed forward, gradient, and ablation evidence;
stale toy-transformer declarations are rejected by focused tests;
copied source source bodies are verified by manifest and digest checks while result records remain metadata-only.
It may not claim:
live model access or external model access;
private weights, raw activation tensors/dumps, proprietary prompts, hidden chain-of-thought, hidden reasoning, or model-output data export;
real model-transparency product status;
benchmark claims authority;
public sharing, hosted-product readiness, launch-scope decision, or recipient-send authority;
The manifest covers copied source bodies: Oracle attribution maps, pattern-ledger rows, high-novelty scout records, component projection IR, projection readiness code, mission transaction preflight code, execution trace code, strict JSON code, and trace/readiness standards. The runtime verifies classification, material class, body-copied status, body-not-in-result record status, target digest, source/target digest agreement, line count when the source is available, and required anchors.
The body floor excludes private model weights, raw activations, proprietary prompts, hidden reasoning, model-output data, account or browser state, browser or HUD state, account secret material, private source-root material, public sharing, hosting, and launch-scope decision.
Spatial world-model demos are unusually easy to oversell. A plausible-looking video, or a row that simply asserts "the model predicted the next state correctly", can pass for understanding without anything having been checked. This component exists to answer one narrow question: does a declared spatial counterfactual row actually bind a source state, an event, and a predicted outcome that survive an independent recomputation, or is it just a shape that looks right?
The approach is the unusual part. The predicted actor count, transition delta, event label, and spawn cells are derived from the inputs (sensor-packet refs, consistency budget, topology), so a stale or hand-edited prediction no longer matches and the row blocks. The point is not a good simulator. The point is that a spatial-AI claim cannot pass on appearance alone: it has to agree with a recomputation a reader can audit in one screen.
Abstract
spatial_world_model_counterfactual_simulation_replay is a Microcosm component for checking spatial world-model counterfactual claims as metadata transitions, not as generated video, robotics control, AV simulation, geographic truth, or benchmark authority. The component validates six synthetic scene-state rows, six counterfactual replay rows, six predicted transition rows, eight forbidden-claim negative cases, and an exported source-module bundle whose result record stays metadata-only.
The technical claim is deliberately small: for each replay row, the runtime recomputes a deterministic toy gridworld next state from the declared scene state, counterfactual event, sensor-packet refs, consistency budget, topology ref, and limitation labels; it then compares that actual transition against the declared predicted state, transition diff, and oracle check. A green run proves the public replay rows are internally consistent and bounded by their scope limit. It does not establish real-world spatial accuracy, trained simulator quality, generated-video correctness, robot or AV operation, provider behavior, hosting, public sharing, launch-scope decision, or whole-system correctness.
Telos
World-model demos are easy to overstate because visual plausibility can hide whether any state transition was checked. This component makes the proof surface inspectable: a reader can see the scene-state ref, action trace, predicted-state ref, transition-diff ref, oracle-check ref, fidelity limit, limitation labels, negative cases, and source-module digest evidence before accepting any spatial counterfactual claim.
The useful result is not a better simulator. The useful result is an evidence spine that refuses to let a spatial-AI claim advance unless the public row binds input state, counterfactual event, predicted output, actual recomputation, and scope boundary boundary in one result record.
Mechanism
The positive fixture has six scene states and six matching replay rows: warehouse occlusion, crosswalk emergence, drone-corridor gust recovery, mobile robot reflective-floor detour, loading-dock pallet shift, and unprotected-turn late yield. Each row declares a source scene-state ref, action-trace ref, counterfactual event, predicted-state ref, transition-diff ref, oracle-state-check ref, two public sensor-packet refs, a rare-event label, a fidelity-limit label, limitation labels, and explicit false values for private video, raw sensor export, live operation, geography, simulator-product, generated-video-only, benchmark, and launch claims.
Runtime transition checking happens in _state_transition_analysis:
The component resolves each replay to exactly one state-transition row.
It builds an 8 x 8 toy gridworld from the source scene's actor count and topology ref.
It maps the counterfactual event to a deterministic event action such as new_dynamic_actor.
It recomputes the actual next state and transition diff from the input row.
It compares predicted actor count, transition delta, event label, spawn cell or cells, predicted-state ref, diff ref, oracle-check ref, and metadata-only result record status.
The input-driven part matters. Actor-count delta is not copied from the expected fixture. It is recomputed as:
Spawn cells are also input-derived: the runtime hashes the event, replay id, scene-state ref, topology ref, sensor-packet refs, consistency budget, limitation labels, and source actor count, then walks the bounded grid from the declared event cell. This makes the row sensitive to real input changes while remaining small enough to audit.
Transition Evidence
The current fixture proves a narrow but useful invariant: all six declared predicted states match the runtime's actual toy-gridworld step. The focused test expects:
scene_state_count == 6
replay_count == 6
state_transition_count == 6
predicted_state_body_count == 6
deterministic_simulation_pass_count == 6
gridworld_step_count == 6
predicted_actual_match_count == 6
transition_diff_count == 6
oracle_state_check_count == 6
sensor_packet_ref_count == 12
Those counts are technical evidence only because the runtime recomputes the state transition before accepting them. The result record cannot be read as a learned world-model score; it is a public replay consistency check over synthetic metadata and copied source-module digests.
Real-Bad Mutation Contract
The regression suite includes deliberately bad mutations that show the proof is not just shape validation:
If a transition row changes actor_count_delta from the recomputed value, run_simulation_bundle blocks with SPATIAL_STATE_TRANSITION_SIMULATION_MISMATCH.
If the predicted state misses the gridworld step, the transition row records predicted_state_actor_count_mismatch while the recomputed actual state still shows the expected gridworld execution.
If a replay gains an extra sensor-packet ref, the recomputed actor delta moves from 1 to 2. The stale expected transition blocks until the predicted actor count, actor delta, and spawn cells are updated to match the new actual transition.
If the source scene actor count and topology ref change, the recomputed source and spawn-cell state moves. The stale predicted state blocks until the transition row is updated.
If a source-module manifest tries to place copied body text inside a result record, the source-module summary blocks with SPATIAL_SOURCE_BODY_TEXT_IN_RECEIPT_FORBIDDEN and SPATIAL_SOURCE_MODULE_BODY_TEXT_IN_RECEIPT_FORBIDDEN.
The negative payload cases are similarly typed: private video export, raw sensor export, live robot or AV operation, real-world location claims, simulator-product claims, generated-video-only authority, geographic accuracy claims, and benchmark-score claims without state-diff result records all have explicit forbidden-code coverage.
Shape
Diagram source
flowchart TD Scene["Scene-state row actor count + topology"] --> Replay["Counterfactual replay row event + sensor refs + budget"] Replay --> Step["Deterministic toy gridworld step 8x8 bounded recomputation"] Step --> Actual["Actual next state actor delta + spawn cells"] Replay --> Expected["Declared predicted state transition diff + oracle check"] Actual --> Compare{"Actual matches declared transition?"} Expected --> Compare Compare -->|yes| Result record["metadata-only pass result record counts + refs + digests"] Compare -->|no| Finding["Typed mismatch finding blocked status"] Replay --> Boundary{"Forbidden payload or claim?"} Boundary -->|no| Result record Boundary -->|yes| Finding
This diagram is a reader map for the runtime proof. The generated doctrine lattice Mermaid remains the bundle-derived edge proof.
Reader Evidence Routing
Read this page from source authority outward:
Open core/paper_module_capsules.json::paper_modules[53:paper_module.spatial_world_model_counterfactual_simulation_replay] for the JSON bundle and scope limit.
Open paper_modules/spatial_world_model_counterfactual_simulation_replay.json for generated relationship edges, Mermaid status, Atlas status, and source_authority: json_capsule.
Inspect src/microcosm_core/organs/spatial_world_model_counterfactual_simulation_replay.py, especially _state_transition_analysis, _gridworld_step, _gridworld_actor_count_delta, _gridworld_spawn_cells, _replay_policy_findings, and _source_module_manifest_result.
Inspect fixture inputs under fixtures/first_wave/spatial_world_model_counterfactual_simulation_replay/input and exported-bundle inputs under examples/spatial_world_model_counterfactual_simulation_replay/exported_spatial_world_model_simulation_bundle.
Inspect tests/test_spatial_world_model_counterfactual_simulation_replay.py for the positive replay, public-relative result record, source-module import, body-text rejection, transition-delta mutation, predicted-state mutation, input-perturbation, scene-perturbation, and fresh-card reuse contracts.
The runtime shell also exposes the compressed lens at:
microcosm spatial-simulation
Prior Art Grounding
This replay exercises a spatial world model under counterfactual interventions. It is grounded in the world-models line of work (Ha and Schmidhuber, World Models), where an agent learns a compressed model of its environment it can roll forward under hypothetical actions. Microcosm borrows the counterfactual-rollout shape over synthetic metadata; the result is fixture-bound replay evidence, not robot or AV operation, real-world geography, or a calibrated simulator.
Validation Result record Path
Run from microcosm-substrate:
The expected bundle projection is Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges, and 20 generated relationship edges. These checks prove the public synthetic replay and source-module import boundary only; they do not validate real geography, robot or AV operation, simulator-product claims, benchmark claims, public sharing, hosting, or launch.
Scope boundary
Public Boundary
The exported bundle may include copied Station geometry source bodies as public source-open material, but result records carry refs, digests, counts, and verdicts only. They must not carry private video bodies, raw sensor payloads, GPS trace bodies, model-output data, account or browser state, account secrets, or live-access material.
The scope limit is therefore:
allowed: synthetic scene-state refs, action-trace refs, predicted-state refs, transition-diff refs, oracle-check refs, source-open public sensor-packet refs, rare-event labels, fidelity-limit labels, limitation labels, source-module digests, negative-case result records, and metadata-only validation result records;
not allowed: simulator-product authority, private video export, raw sensor export, live robot or AV operation, real-world geography claims, benchmark claims, external model access, hosting, public sharing, launch-scope decision, private-system equivalence, or whole-system correctness.
Limitations
The dynamics are toy dynamics. The 8 x 8 gridworld models actor counts and spawn cells from public metadata; it does not model perception, control, physics, sensor calibration, camera geometry, lidar, maps, vehicle dynamics, human behavior, or material truth. The synthetic events are useful because they force state-diff accounting, not because they approximate the real world.
The fixture is also finite. It covers six public replay rows, six transition rows, two sensor refs per replay, eight negative claim families, and three copied source modules. It does not establish all possible spatial counterfactuals, full secret absence outside the scanner envelope, complete robotics safety, simulator correctness, or future fixture coverage.
The source-open body floor is limited to exact copied Station geometry guardrail bodies named by the source-module manifest and verified by digest. That does not certify private source-root equivalence, private video or raw sensor availability, account or browser state, provider behavior, hidden GPS trace bodies, live-access material, or launch-scope decision.
Scope limit
This module may claim fixture-bound evidence that the component ran over public synthetic inputs and produced the result records and projections described above, reproduced by the validation result records named on this page.
It may not claim more than its bundle scope limit allows: Declared public synthetic spatial counterfactual-replay metadata and source-module import evidence only; no robot or AV operation, real-world geographic accuracy, simulator product validation, generated-video authority, benchmark claims, external model access, hosting, launch-scope decision, publishing-scope decision, or whole-system correctness.
Prediction Oracle ReconciliationPrediction Oracle Reconciliation exercises synthetic forecast reconciliation gates without forecasting, trading, provider, or live-market authority.
Prediction Oracle Reconciliation validates synthetic prediction packets through CP1 fork preservation, CP2 target-universe checks, pre-target evidence limits, oracle-diff grading, bounded dossier edits, numeric reconciliation rows, source-module imports, negative cases, and scope limits. It is a projection-mechanics replay, not a forecasting correctness claim, investment or trading decisions, live market data call, external model access, private-data equivalence, public sharing, launch, or whole-system correctness.
Scope limit Synthetic invented prediction packet and source-module import evidence only; no forecasting correctness or accuracy, no trading, financial, or investment-related actions, no live market data, no external model access, no prediction public sharing, no performance track record, no non-public data import, no launch-scope decision, no publishing-scope decision, and no whole-system correctness.
prediction_oracle_reconciliation is a source-available runtime fixture component for the prediction-engine slice. It compresses the source pattern group around CP1 bifurcation resolution, CP2 valid target universes, oracle grounding firewalls, diff grading, and dossier mutation into a synthetic packet a cold reader can run.
It is deliberately not a market product. The component has no live data, no external model access, no trading authority, no financial or investment-related actions authority, no publishing-scope decision, and no launch-scope decision. Its job is to make the reasoning shape inspectable without making performance or action claims. The result record contract is source-open by default: public fixture packets, exported bundle refs, source refs, and runtime result records carry the evidence, while secret_exclusion_scan blocks only live market feeds, model-output data bodies, account or browser material, private dossiers, and account secret-equivalent access.
Purpose
A forecast that gets the direction right can still be badly wrong about the number, and a forecast can look accurate only because it quietly used evidence that arrived after the outcome it was meant to predict. This component exists to make those two failures visible on a synthetic packet, before any reasoning is dressed up as a track record. The single question it answers is narrow: does this prediction packet keep its evidence honest and its grading recomputable, or does it cut a corner?
The unusual choice is that the component does not trust the numbers the packet reports. For every numeric row it recomputes the absolute error, the percent error, and the direction hit from the snapshot, predicted, and realized prices, then rejects any claimed value that contradicts the recompute. It also surfaces a direction hit that is still a large numeric miss rather than letting the correct arrow hide the size of the error. Evidence is split at the prediction time: a reference that points past the target window is refused, not silently scored.
None of this is forecasting. There is no live market data, no external model access, no trading or investment-related actions, and no performance claim. The packet, its target universe, and its realized values are invented fixtures. A direction hit or a numeric miss inside a result record is a statement about the fixture and the grading mechanics, nothing more.
Public Contract
The input packet names:
source_pattern_ids for the source pattern family being projected.
valid_prediction_targets and target_universe for the CP2 gate.
cp1_branches with selected side, rationale refs, and opposite-side invalidation refs.
cp2_predictions with pre-target evidence refs and grounding ids.
oracle_diff rows that grade synthetic realized direction against prediction.
dossier_mutations constrained to fixture deltas.
public_runtime_refs for the public fixture, exported bundle, and paper module system refs.
authority_ceiling values that explicitly keep trading, advice, provider, live-market, public sharing, launch, and secret-export authority false.
How it works
validate_reconciliation_packet runs five checks over the packet and folds the findings into one status. Each check guards a specific way a forecast can flatter itself.
CP1 resolution. Every cp1_branches row must name the side it chose, carry rationale refs, and keep an opposite_side_invalidation_ref, the record of why the losing side lost. A branch that asserts a winner without retaining the discarded alternative is rejected as an unresolved bifurcation. Equity or market-lane branches additionally need an explicit confirmation bit before they count.
CP2 universe and pre-target evidence. Predictions must name a target_id inside the declared valid_prediction_targets, so the set of things being predicted is fixed before the outcome rather than chosen afterwards. Evidence refs must be pre-target: a ref is accepted only if it carries the T- time prefix, and a reference that points past the target window raises PREDICTION_ORACLE_POST_T_EVIDENCE_FORBIDDEN. This is the gate that stops a packet from grading itself with hindsight.
Recomputed numeric grading. This is the part that does real arithmetic. For each graded row the component takes the snapshot, predicted, and realized prices and recomputes the absolute delta, the percent delta against the snapshot, and the direction hit. If the row also reports its own abs_error, pred_error_pct, or direction_hit, the claimed value must match the recompute or the row is rejected. Two further rules matter. A row whose direction is correct but whose error clears the floor (ten in absolute terms, or five percent) is surfaced as a large miss, so a right arrow cannot conceal a large numeric error. A row with no realized price is not fabricated into a graded row, a row marked degraded is gated out of grading rather than scored, and the STOCK and ETF asset classes are kept as separate counts rather than blended.
Oracle diff and bounded mutation. The oracle_diff rows grade synthetic realized direction against each prediction, and dossier_mutations may only add a contradiction, revise a confidence band, or retire a claim. A high-severity mutation needs two evidence refs and an explicit public-delta allowlist before it is allowed.
A run passes only when at least two CP1 branches, two CP2 predictions, two graded numeric rows across both asset classes, and one bounded mutation are present, the recompute and evidence gates raise no findings, the source-module digests match, and the secret scan is clean. The result record records counts, verdicts, and authority booleans; the packet body, claimed numbers, and source bodies stay out of it.
Shape
Diagram source
flowchart TD Packet["Synthetic prediction packet target universe, CP1 branches, CP2 predictions, oracle diff, numeric rows, dossier mutations"] CP1["CP1 resolution chosen side + rationale + why the opposite side lost; equity lane needs confirmation"] CP2["CP2 universe + evidence target inside declared universe; evidence must be pre-target (T-)"] Numeric["Recomputed numeric grading abs error, percent error, direction hit recomputed; claimed values must match"] Oracle["Oracle diff + mutation realized vs predicted direction; bounded dossier deltas"] LargeMiss["Direction-right, numeric-miss surfaced, not hidden"] Gated["Degraded / missing-truth rows gated, not fabricated"] Result records["metadata-only result records result, board, validation, sign-off; counts and verdicts"] Ceiling["Scope limit synthetic fixture only; no trading, advice, provider, live market, publish, launch"] Packet --> CP1 Packet --> CP2 Packet --> Numeric Packet --> Oracle Numeric --> LargeMiss Numeric --> Gated CP1 --> Result records CP2 --> Result records LargeMiss --> Result records Gated --> Result records Oracle --> Result records Result records --> Ceiling
Evidence/accounting:
Bundle authority: core/paper_module_capsules.json::paper_modules[54:paper_module.prediction_oracle_reconciliation] sets source_authority: json_capsule, binds the component, binds mechanism.prediction_oracle_reconciliation.validates_public_prediction_oracle_reconciliation, and resolves src/microcosm_core/organs/prediction_oracle_reconciliation.py.
Generated instance: paper_modules/prediction_oracle_reconciliation.json reports paper_module_payload.source_authority: json_capsule, Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges, 15 relationship edges, and no unpopulated selective relations.
Runtime and fixture floor: src/microcosm_core/organs/prediction_oracle_reconciliation.py exposes run, run_prediction_bundle, validate_source_module_imports, validate_reconciliation_packet, _source_open_body_import_summary, write_receipts, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING. fixtures/first_wave/prediction_oracle_reconciliation/input/reconciliation_packet.json carries the synthetic CP1/CP2, oracle-diff, target-universe, and dossier-mutation evidence shape.
Exported bundle and result records: examples/prediction_oracle_reconciliation/exported_prediction_oracle_bundle/source_module_manifest.json and the exported source artifacts provide source-open replay evidence. receipts/first_wave/prediction_oracle_reconciliation/prediction_oracle_reconciliation_result.json, prediction_oracle_validation_receipt.json, and result records/sign-off/first_wave/prediction_oracle_reconciliation_fixture_acceptance.json keep the result record metadata-only and fixture-bounded.
Test and claim boundary: tests/test_prediction_oracle_reconciliation.py checks invalid target universes, unresolved CP1 branches, post-target evidence, unsafe dossier mutation, live-market/trading/advice overclaims, exported-bundle validation, and source-module digest gates. The structured source record scope limit excludes forecasting correctness, financial decisions, trading authority, live market data, external model access, prediction public sharing, performance track record, non-public data import, launch-scope decision, publishing-scope decision, and whole-system correctness.
Reader Evidence Routing
Open this module as a reader map, not as prediction evidence. Use the runtime fixture input for packet shape, the exported bundle for source-open replay, the structured source record for relationship edges, and the test file for the negative cases that enforce the scope limit.
Route evidence in this order:
Read the structured lattice bindings section to confirm the source record path and subject edges.
Inspect the fixture input for declared target universes, CP1 branches, CP2 prediction evidence, oracle-diff rows, and fixture-bounded dossier mutations.
Run the fixture and exported-bundle commands to produce metadata-only result records.
Check tests/test_prediction_oracle_reconciliation.py for the negative cases that reject target-universe escapes, unresolved CP1 branches, post-target evidence, live-market overclaims, and authority overclaims.
Use paper_modules/prediction_oracle_reconciliation.json as the generated relationship graph for this module.
Negative Cases
The fixture rejects:
a CP2 prediction outside the target universe;
an unresolved CP1 bifurcation;
post-target evidence used as prediction evidence;
unconfirmed equity or market-lane claims;
unsafe high-severity dossier mutation;
trading, advice, live-provider, public sharing, launch, or secret-export authority overclaims.
Prior Art Grounding
This component is grounded in probabilistic forecast evaluation and prediction market infrastructure. The Brier score is an early probability-forecast verification anchor, proper-scoring-rule work such as Gneiting and Raftery motivates incentive-compatible forecast scoring, and Hanson's logarithmic market scoring rule grounds the prediction-market idea that forecasts can be updated and evaluated through explicit scoring mechanisms. Forecasting tournament work around tracking and calibration also motivates separating prediction evidence from post-outcome explanation.
Microcosm borrows the reconciliation pattern: declare the target universe before the outcome, keep pre-target evidence separate from post-target evidence, grade against a synthetic oracle diff, and constrain dossier mutation to declared fixture deltas. It does not trade, advise, publish predictions, or claim forecast performance.
A passing run proves only synthetic target-universe reconciliation, CP1/CP2 accounting, oracle-diff grading, and fixture-bounded dossier mutation; it does not establish forecasting performance, financial decisions, trading authority, live market access, public sharing, or launch.
Scope boundary
Scope limit
This module covers only fixture-bounded prediction-oracle reconciliation: synthetic target-universe accounting, CP1/CP2 separation, oracle-diff grading, dossier mutation constraints, copied source-module import evidence, negative cases, and public result records. They do not prove forecasting accuracy, financial decisions, trading authority, live-market access, provider behavior, prediction public sharing, performance track record, private-data import, launch-scope decision, publishing-scope decision, or whole-system correctness.
Limitations
The target universe, CP1 branches, CP2 evidence, realized values, oracle diff, and dossier mutations are fixture artifacts. They exercise the shape of a reconciliation pipeline, but they are not live market data, a validated forecasting track record, an investment strategy, or a prediction public sharing surface. A direction hit or numeric miss inside the result record is evidence about the synthetic packet only.
The exported bundle is source-open in the narrow body-floor sense. It digest checks copied source contracts, node manifests, tool code, pattern rows, and route-decision artifacts while keeping body text out of result records. That does not certify private source-root equivalence, provider behavior, account or session state, hidden market feeds, private dossiers, or launch-scope decision.
The negative cases are scoped regression guards. They reject invalid targets, unresolved bifurcations, post-target evidence, unconfirmed equity-lane claims, unsafe dossier mutation, trading/advice overclaims, degraded feed misuse, missing realized numeric truth, and asset-class mixing. Those refusals do not prove full financial safety, whole-system correctness, runtime correctness outside the named component, or complete secret absence beyond the declared scanner envelope.
Scope limit
Synthetic invented prediction packet and source-module import evidence only; no forecasting correctness or accuracy, no trading, financial, or investment-related actions, no live market data, no external model access, no prediction public sharing, no performance track record, no non-public data import, no launch-scope decision, no publishing-scope decision, and no whole-system correctness.
Scope boundary
This module demonstrates synthetic prediction-reconciliation mechanics only. It does not trade, give financial or investment-related actions, call live market providers, publish predictions, claim forecasting performance, import non-public data, or include launch operations.
Subject edges: explains component prediction_oracle_reconciliation and mechanism mechanism.prediction_oracle_reconciliation.validates_public_prediction_oracle_reconciliation.
Doctrine edges: governed by principles P-2, P-6, P-8, and P-9; abides by axioms AX-5, AX-7, AX-8, and AX-10.
Dependency edges: depends on paper_module.finance_forecast_evaluation_spine, paper_module.world_model_projection_drift_control_room, and paper_module.research_replication_rubric_artifact_replay.
Runtime code locus: src/microcosm_core/organs/prediction_oracle_reconciliation.py, including run, run_prediction_bundle, validate_source_module_imports, validate_reconciliation_packet, _source_open_body_import_summary, _build_result, write_receipts, result_card, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING.
Generated row proof: 15 resolved relationship edges, no unpopulated selective relations, Mermaid available_from_capsule_edges, and Atlas linked_from_capsule_edges.
The governing lattice turns the component into a bounded reconciliation checker rather than a forecast authority. P-2 lowers every positive claim to the checker strength: CP1/CP2 accounting, oracle-diff grading, numeric-row gates, source-module digest checks, negative cases, and metadata-only result records. P-6 fails closed when a branch is unresolved, a target escapes the declared universe, a source digest mismatches, or an authority flag tries to rise above the accepted component ceiling. P-8 makes those refusals typed outcomes instead of prose warnings. P-9 carries source refs, public runtime refs, copied-body material status, and result record refs across the fixture and exported bundle.
The axiom layer supplies the same boundary. AX-5 prevents the fixture from upgrading synthetic reconciliation evidence into trading, advice, live-market, provider, public sharing, launch, or performance-track-record authority. AX-7 permits partiality: degraded feed health, missing realized numeric truth, and asset-class split pressure are surfaced as scoped findings rather than hidden successes. AX-8 keeps copied source bodies while excluding live market data, model-output data bodies, private dossiers, and account secret-equivalent material. AX-10 requires the target-universe, CP1/CP2, oracle-diff, and source-module evidence to be tied to the current fixture or bundle result records before the Markdown projection is treated as current.
The structured source record's 15 edges prove route parity only.
Provider Context Recipe BudgetProvider Context Recipe Budget validates context-budget projection mechanics without authorizing external model access or truth-side material.
Provider Context Recipe Budget validates provider-context recipe mechanics: fixed byte ceilings, ordered section fill, omitted-section manifests, deliverable routing, digest-checked source-body imports, forbidden-body rejection, negative cases, and scope limits. It emits context metadata and verdicts only, without provider/API authorization, Lean/Lake execution, proof or oracle truth-side material, formal-result correctness, domain-level conclusions, launch, public sharing, or whole-system correctness.
Scope limit Context-budget projection fixture and source-body import evidence only; no provider or API call authorization, no Lean or Lake execution, no proof or oracle truth-side material, no theorem or domain-level conclusions, no launch-scope decision, no publishing-scope decision, and no whole-system correctness.
provider_context_recipe_budget_policy is the public Microcosm component for turning retrieved proof-support metadata into bounded provider context recipes.
It validates six public recipe shapes: minimal_4kb, premise_16kb, skill_32kb, repair_32kb, fewshot_64kb, and strategy_classification_4kb. Each recipe has a fixed byte ceiling, ordered section fill, a graph role, a reducer deliverable type, and an omitted-sections manifest when a section cannot fit.
Purpose
This component answers one question: when a proof-support pipeline is about to hand material to a model, which sections fit inside a fixed byte budget, in what order, and which sections are dropped? It treats the context window as a budget to spend rather than a place to dump everything retrieved. The board records this stance directly as context_is_budget_not_dump.
The byte sizes are not asserted by the fixture. The validator imports the copied benchmark harness, runs its real _provider_context_pack over each recipe, and measures the actual byte size of each packed section. A recipe is filled in declared order, admitting a section only while the running total stays under the ceiling, so an over-budget section is omitted and named in an explicit manifest rather than silently truncated. If the harness is unavailable the component falls back to declared sizes and says so, rather than guessing.
The second deliberate choice is what cannot enter context at all. A small fixed set of section ids and field keys, covering proof bodies, oracle-only premise ids, ideal answers, and provider output bodies, is rejected structurally, not by convention. Any recipe or section material carrying one of them is blocked before a packet is built. The output is metadata about the context shape: byte ceilings, the admitted and omitted section ids, the deliverable route, and a set of authority claims that stay false. No provider is called and no answer is produced.
Shape
Source refs
JSON bundle
paper_module.provider_context_recipe_budget
provider_context_recipe_budget.md
6 public recipe budgets
provider_context_recipes.json
Runtime
provider_context_recipe_budget_policy.py
9 source-backed sections
section_materials.json
8 copied bodies
source_module_manifest.json
Diagram source
flowchart TD Bundle["JSON bundle paper_module.provider_context_recipe_budget"] --> Instance["Generated instance 19 relationships, no selective residuals"] Bundle --> Markdown["Reader projection provider_context_recipe_budget.md"] Recipes["provider_context_recipes.json 6 public recipe budgets"] --> Runtime["provider_context_recipe_budget_policy.py"] Sections["section_materials.json 9 source-backed sections"] --> Runtime SourceManifest["source_module_manifest.json 8 copied bodies"] --> Runtime NegativeCases["negative fixtures 7 forbidden-boundary cases"] --> Runtime Runtime --> Projection["context_packets included/omitted sections, byte counts, routes"] Runtime --> Result records["metadata-only result records result, board, validation, sign-off"] Projection --> Ceiling["scope limit no provider/proof/launch-scope decision"] Result records --> Ceiling
Evidence and accounting:
Bundle authority: core/paper_module_capsules.json::paper_modules[55:paper_module.provider_context_recipe_budget] sets source_authority: json_capsule, subjects the component provider_context_recipe_budget_policy plus mechanism mechanism.provider_context_recipe_budget_policy.validates_public_context_budget_boundary, and names generated_projections.mermaid.status: available_from_capsule_edges plus generated_projections.atlas_card.status: linked_from_capsule_edges.
Generated instance: paper_modules/provider_context_recipe_budget.json::relationships.edges contains 19 bundle-derived relationship edges, and relationships.unpopulated_selective_relations is empty. That is lattice wiring evidence, not implementation-correctness proof.
Runtime accounting: src/microcosm_core/organs/provider_context_recipe_budget_policy.py defines EXPECTED_RECIPE_BUDGETS for the six recipes, EXPECTED_DELIVERABLES for their reducer routes, _recipe_projection for included/omitted section accounting, _recipe_findings and _section_findings for boundary errors, and _write_receipts for metadata-only result record output.
Fixture inputs: fixtures/first_wave/provider_context_recipe_budget_policy/input/provider_context_recipes.json carries six public recipes with byte budgets from 4096 to 65536, while .../section_materials.json carries nine section rows with source refs and anchors.
Body-floor and result records: core/fixture_manifests/provider_context_recipe_budget_policy.fixture_manifest.json records body_copied_material_count: 8, seven negative_case_ids, four expected fixture result record paths, and source_open_body_imports.authority_ceiling fields that keep external model access, Lean/Lake execution, proof authority, truth-side material, payload export, runtime-correctness claims, and launch-scope decision false.
Focused tests: tests/test_provider_context_recipe_budget_policy.py checks the six recipe ids, expected negative cases, source-backed section materials, public-relative redacted result records, exported bundle validation, omitted-section movement when section size changes, digest mismatch rejection, and manifest body-text result record-boundary rejection.
Technical Mechanism
The runtime mechanism is a context-packet compiler plus boundary validator. It does not ask a provider for an answer. run loads fixture inputs with negative cases enabled; run_budget_bundle loads the exported bundle shape without the fixture-only negative cases. Both routes call _build_result, which loads recipe rows, section rows, copied source-module bodies, and the non-public-state scan policy before it constructs any result record.
Recipe projection is deterministic. _recipe_projection walks each recipe's ordered section ids, computes each section's byte size with _byte_size, admits a section only while the running total stays within the recipe's byte_budget, and records omitted sections when the next section would exceed the budget. The projection records graph role, deliverable type, included and omitted section ids, included bytes, approximate tokens, and whether the omitted-sections manifest is emitted. The six public recipes are the closed set in EXPECTED_RECIPE_BUDGETS: minimal_4kb, premise_16kb, skill_32kb, repair_32kb, fewshot_64kb, and strategy_classification_4kb.
The validator then checks three independent boundaries. _recipe_findings rejects budget changes, forbidden truth-side section ids, proof/provider body fields, provider-call authorization, deliverable-route drift, and over-budget context with no omitted-sections manifest. _section_findings requires each public section to cite an allowed source ref and source anchor, verifies those anchors against the copied source bodies, and rejects synthetic or truth-side section material. _source_module_findings checks the source-module manifest, expected module ids, metadata-only result record flags, target presence, source/target digest equality, and required anchors for the eight copied source bodies.
The result record mechanism is deliberately metadata-only. _write_receipts writes the fixture result, board, validation result record, and sign-off result record for fixture mode; bundle mode writes only the exported-bundle validation result. result_card emits a compact command card while omitting context packets, source-module imports, source refs, result record paths, private scan hit bodies, and the scope boundary payload. The full result records keep counts, ids, hashes, routes, and verdicts, bounded evidence bodies or provider answers.
In lattice terms, the JSON bundle binds this Markdown projection to provider_context_recipe_budget_policy, to mechanism.provider_context_recipe_budget_policy.validates_public_context_budget_boundary, and to concept.agent_reliability_and_safety_validator_bundle. The principle and axiom refs in the bundle (P-1, P-2, P-3, P-6, P-8, P-16 and AX-1, AX-2, AX-5, AX-7, AX-8, AX-9) are implemented here as admission control over public evidence: bounded context metadata is allowed, truth-side material and provider authority are not.
The named proof consumer is tests/test_provider_context_recipe_budget_policy.py. It verifies streaming hash and line-count helpers, real-text byte sizing, all six expected recipe ids, all seven negative cases, source-backed section material, public-relative and redacted result records, exported-bundle validation, omitted-section movement when a section becomes small enough to fit, source-module digest mismatch rejection, source/target digest mismatch rejection, manifest and row body-text result record boundary rejection, compact --card output, exact copied source body imports, and fixture-manifest source-open body-floor counts.
The runtime proof consumers are the two module commands in the Validation Result record Path: provider_context_recipe_budget_policy run for fixture mode and provider_context_recipe_budget_policy run-budget-bundle for exported-bundle mode. Fixture mode must observe the negative-case set and write result, board, validation, and sign-off result records. Bundle mode must validate the exported runtime shape and write one metadata-only bundle validation result.
The corpus proof consumer is scripts/build_doctrine_projection.py --check-paper-module-corpus.
Reader Evidence Routing
Start with the JSON Bundle Binding to identify the source record and the launch-safe scope limit before reading any validation result as a capability claim.
Use Structured Lattice Bindings for navigation: it names the component, mechanism, generated row, and runtime code locus that the bundle binds.
Use Validation Result record Path for reproducibility: fixture and bundle commands produce metadata-only result records, the focused pytest exercises negative cases, and the corpus check verifies paper-module projection parity.
The lattice wiring for this module supports discoverability and internal consistency checks; it does not establish external model service, Lean/Lake execution, formal-result correctness, launch-scope decision, or public-send permission.
Negative Cases
budget_overflow_recipe rejects recipes above the public byte ceiling.
proof_body_leakage rejects proof and provider body fields.
provider_call_authorized rejects any public fixture that authorizes a external model access.
deliverable_type_route_mismatch rejects a recipe whose reducer output type changed.
omitted_sections_suppressed rejects over-budget context without an omitted-sections manifest.
synthetic_section_materials rejects section material that lacks an allowed source ref or source anchor, or that is otherwise synthetic.
Why It Matters
Microcosm needs provider context to look like a small operating system, not a prompt dump. This component makes the context boundary inspectable: a cold reader can see the exact byte ceilings, section order, omitted material, and deliverable routes before any provider or proof authority is even in scope.
Prior Art Grounding
The recipe budget is grounded in retrieval-augmented generation and context packing practice. Lewis et al.'s Retrieval-Augmented Generation paper is the direct research anchor for conditioning generation on retrieved supporting material rather than relying only on model parameters. Microcosm narrows that idea into recipe metadata: retrieved proof-support sections are budgeted, ordered, and omitted explicitly before any external model access is in scope.
The command-facing budget style also borrows from the Command Line Interface Guidelines principle of saying enough but not too much. The component turns that UX pressure into fixed byte ceilings, omitted-section manifests, and deliverable-type routing so "more context" does not silently become proof authority or provider authorization.
A green result record proves only public context-recipe metadata, byte ceilings, omitted sections, deliverable routing, copied source-module refs, and negative cases; it does not use external model services, run Lean or Lake, prove formal-result correctness, export proof bodies, expose oracle-only material, include launch operations, or convert context metadata into proof authority.
Scope boundary
Scope limit
This component does not use external model services, run Lean or Lake, prove a theorem, expose a proof body, or reveal oracle-only truth-side material. Its output is context metadata: which sections would be admitted, which sections were omitted, which deliverable route is allowed, and which authority claims remain false.
The strategy_classification_4kb route emits only strategy_id_classification. It is not a proof-body route and cannot carry a provider answer body.
Scope limit
This module covers only public context-recipe metadata: byte ceilings, ordered section admission, omitted-section manifests, deliverable routing, copied source-module refs, digest and anchor checks, negative cases, and metadata-only result records. They do not authorize provider or API calls, Lean or Lake execution, formal-result correctness, proof-body export, oracle-only truth-side material, provider answer bodies, launch-scope decision, publishing-scope decision, or whole-system correctness.
Source and projection details
Source-Open Body Floor
The public bundle carries exact source bodies for the context recipe compiler, formal ladder consumer, provider result record reducer, set calibration report, transform-job ABI, provider adapter policy, compute-provider policy, and provider-navigation transform result record policy. The validator checks every copied module by digest and required anchors; result records report only paths, hashes, counts, anchor status, and verdicts.
The body floor is deliberately metadata-only at the result record edge: runtime result records may prove copied-module paths, digests, anchor presence, counts, and verdicts, but they must not expose proof bodies, oracle-only truth-side material, provider answer bodies, account state, account secrets, or launch-send authority.
Undeclared Library Prior ClassifierUndeclared Library Prior Classifier scores extracted Lean symbol observations against an allowed premise set without running Lean or treating libraries as implicit allowlists.
Undeclared Library Prior Classifier validates copied Lean/Std premise rows and pre-extracted symbol observations, classifying undeclared library priors and premise-budget violations with route outcomes, source-module digest checks, secret-exclusion scans, negative cases, and scope limits. It does not read proof source, run Lean or Lake, prove formal-result correctness, treat the whole standard library as an implicit allowlist, claim Mathlib availability, use external model services, launch, publish, or prove whole-system correctness.
Scope limit Symbol-boundary classification fixture over copied Lean/Std premise rows and pre-extracted symbol observations only; no proof-source reads, Lean or Lake execution, formal-result correctness, whole-library implicit allowlist, Mathlib availability, external model access, launch-scope decision, publishing-scope decision, or whole-system correctness.
This module is the Microcosm projection of the formal-prover rule that a Lean-accepted proof can still violate the evaluation contract when it uses a real library symbol that was not in the allowed premise set. It is a provenance-bearing symbol-boundary component, not a proof checker.
The fixture carries copied Lean/Std premise rows from the real Ring2 premise-index system and real Ring2 problem ids / candidate artifact digests for the symbol-boundary examples. It records extracted qualified symbol refs and classifies a known symbol outside allowed_premise_ids as UNDECLARED_LIBRARY_PRIOR. If cited_unallowed_premise_ids is present, that explicit budget violation takes precedence and routes as PREMISE_BUDGET_VIOLATION.
The source chain is digest-bearing: the real Ring2 premise index sha256:c78b176388a5e81bd8a785950e7db0c9a65fd38e556515134146163b48604df1, Ring2 run summary sha256:93304410f32d40f5cad1c161c1d01a5d6f353ee10b7cf3fecbaaf7b068b43008, copied Lean/Std premise fixture sha256:0be36ba5b75b40d2ede2d90cefa5181829420df7abbae216d18282b92a30f869, and the adjacent corpus-readiness / tactic-availability result records anchor the Mathlib-absent toolchain boundary.
The exported bundle carries a source-open body floor at examples/undeclared_library_prior_symbol_classifier/exported_symbol_classifier_bundle/source_module_manifest.json. It imports the reducer and set-calibration builder source bodies exactly, plus run bodies for the Ring2 premise index, Ring2 run summary, recipe policy metrics, and result record reduction matrix. The two run-state bodies are path-normalized to <repo-root> and <lean-toolchain-root> while preserving source and target digests, line counts, byte counts, and required anchors.
Purpose
A theorem prover can return a proof that Lean accepts, yet that proof can still break the rules of the evaluation it was run under. The usual reason is simple: the proof reached for a library lemma that the recipe never put on the table. The symbol is real and the proof is sound, but the run quietly used a fact it was not allowed to assume. This component answers one question. Given a set of premises a candidate was allowed to use and the symbols it actually reached for, did it cite a known library symbol that was outside that allowed set?
The unusual choice is what the classifier refuses to do. It does not run Lean, it does not read the proof body, and it does not treat the standard library as an implicit allowlist where anything that exists is fair game. It works only from a copied premise index and a list of symbol observations that were extracted beforehand, and it compares the two. That keeps the check cheap and keeps proof material out of the public result record, but it also means the allowed set is closed by construction: a symbol is admissible only because a premise row names it, never because it happens to live in Lean's standard library.
The check also separates two failure modes that are easy to confuse. An explicit budget breach, where the candidate names a premise id the recipe did not allow, is not the same as a residual breach, where the candidate used an allowed-looking symbol that turns out to be undeclared. The first is settled directly from the cited ids and takes precedence; the second is what the symbol comparison is for. Treating both as one class would either over-escalate honest retries or let genuine out-of-recipe library use slip through as a budget note. Keeping them apart is the point.
Shape
Source refs
JSON source record
paper_module.undeclared_library_prior_classifier
Runtime component
undeclared_library_prior_symbol_classifier.py
Pre-extracted symbol observations
Nat/List/Bool/Iff/Eq refs
Budget
cited_unallowed_premise_ids present
Diagram source
flowchart TD bundle["JSON source record paper_module.undeclared_library_prior_classifier"] structured source record["structured source record 19 edges, no selective residuals"] runtime["Runtime component undeclared_library_prior_symbol_classifier.py"] premise["Copied Lean/Std premise index 11 sanctioned symbols"] observations["Pre-extracted symbol observations Nat/List/Bool/Iff/Eq refs"] budget["cited_unallowed_premise_ids present"] residual["Known qualified symbol outside allowed_premise_ids"] clean["Allowed symbol or no known undeclared symbol"] retry["PREMISE_BUDGET_VIOLATION route: retry"] escalate["UNDECLARED_LIBRARY_PRIOR route: bridge_escalate"] advisory["NONE route: accept_as_advisory"] result records["Result record stream fixture, board, validation, sign-off"] ceiling["Scope limit no Lean/Lake, proof, provider, launch, private-system claim"] bundle --> structured source record structured source record --> runtime runtime --> premise runtime --> observations observations --> budget observations --> residual observations --> clean budget --> retry residual --> escalate clean --> advisory retry --> result records escalate --> result records advisory --> result records result records --> ceiling
Technical Mechanism
The component separates three questions that are easy to conflate in proof evaluation: whether a candidate explicitly cites a premise outside the recipe, whether it uses a known Lean/Std symbol that was not in the allowed premise set, and whether the theorem is actually correct. Only the first two are in scope. validate_premise_index builds the closed allowlist from copied Lean/Std premise rows, validate_symbol_observations reads pre-extracted qualified symbol observations, and _classify_row applies the precedence rule: cited_unallowed_premise_ids yields PREMISE_BUDGET_VIOLATION with retry; otherwise a known qualified symbol outside allowed_premise_ids yields UNDECLARED_LIBRARY_PRIOR with bridge_escalate; clean or unknown observations remain advisory. The classifier records observed symbols and computed/asserted classes, but it never evaluates proof bodies or runs Lean.
The exported-bundle mechanism is a second boundary rather than a richer proof. validate_source_module_manifest requires source_module_manifest.json, rejects manifest or row-level body_in_receipt: true, verifies six declared body imports against source/target digests, line counts, byte counts, required anchors, material classes, and relation type, and keeps path-normalized Ring2 run-state bodies separate from exact copied reducer bodies. secret_exclusion_scan then checks the declared public fixture and bundle inputs for proof-body, provider-payload, private-ref, and host-path sentinel classes. _write_receipts writes result, board, validation, and sign-off result records; result_card deliberately emits a small pass/fail card that omits source modules, source digests, proof bodies, non-public source refs, secret-scan detail, and scope limit bodies. This is why the module can be source-open about the symbol-boundary system without becoming a proof-body export.
The governing lattice follows the same separation. The bundle binds the component to mechanism.undeclared_library_prior_symbol_classifier.validates_public_symbol_boundary, concept.formal_math_and_proof_witness_bundle, principles P-1, P-2, P-3, P-6, P-8, and P-9, and axioms AX-1, AX-2, AX-5, AX-7, AX-8, and AX-10. The technical claim is therefore limited to public symbol-budget classification over copied, digest-bearing premise evidence. It does not establish theorem truth, Mathlib availability, Lean/Lake execution, launch-scope decision, provider correctness, or complete library allowlisting.
Reader Evidence Routing
Start with the source record, not this prose: core/paper_module_capsules.json::paper_modules[56:paper_module.undeclared_library_prior_classifier] is the source authority that names the component subject undeclared_library_prior_symbol_classifier, the mechanism mechanism.undeclared_library_prior_symbol_classifier.validates_public_symbol_boundary, the code locus src/microcosm_core/organs/undeclared_library_prior_symbol_classifier.py, the concept concept.formal_math_and_proof_witness_bundle, the governing principles P-1, P-2, P-3, P-6, P-8, and P-9, the axioms AX-1, AX-2, AX-5, AX-7, AX-8, and AX-10, and the sibling modules paper_module.corpus_readiness_mathlib_absence_gate, paper_module.tactic_portfolio_availability, and paper_module.lean_std_premise_index.
Then read the generated structured source record paper_modules/undeclared_library_prior_classifier.json. It is the parity projection from the bundle, carrying source_authority: json_capsule, Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges, 19 generated relationship edges, and no unpopulated selective relations. The structured source record is evidence that the reader page is wired into the doctrine lattice; it is not theorem-correctness, launch, or runtime-correctness authority.
For runtime behavior, inspect src/microcosm_core/organs/undeclared_library_prior_symbol_classifier.py. The named locus validates projection protocol, premise index, classifier policy, source-module manifest, symbol observations, secret-exclusion scan, result construction, result record writing, and result-card compaction. The load-bearing classifier rule is _classify_row: explicit cited_unallowed_premise_ids short-circuit as PREMISE_BUDGET_VIOLATION with retry; otherwise a known qualified Lean/Std symbol outside allowed_premise_ids classifies as UNDECLARED_LIBRARY_PRIOR with bridge_escalate. Negative cases reject proof bodies, non-public source refs, theorem-correctness overclaims, allowed-symbol false positives, unqualified-token overclaims, and missing escalation.
For public fixture evidence, use fixtures/first_wave/undeclared_library_prior_symbol_classifier/input/. The fixture carries the premise index, classifier policy, projection protocol, symbol observations, and the seven negative-case files named by EXPECTED_NEGATIVE_CASES. For exported source-open body-floor evidence, use examples/undeclared_library_prior_symbol_classifier/exported_symbol_classifier_bundle/source_module_manifest.json. That manifest verifies six source body imports: reducer source, set-calibration builder source, path-normalized Ring2 premise-index state, path-normalized Ring2 run summary, recipe policy metrics, and result record reduction matrix. The manifest keeps body_in_receipt false and checks source/target digests plus required anchors; it does not export proof bodies, model-output data bodies, account or browser state, source notes, or private source-root bodies.
For result records, read receipts/first_wave/undeclared_library_prior_symbol_classifier/undeclared_library_prior_symbol_classifier_result.json, receipts/first_wave/undeclared_library_prior_symbol_classifier/undeclared_library_prior_symbol_classifier_board.json, receipts/first_wave/undeclared_library_prior_symbol_classifier/undeclared_library_prior_symbol_classifier_validation_receipt.json, and result records/sign-off/first_wave/undeclared_library_prior_symbol_classifier_fixture_acceptance.json. The fixture result record reports 11 premises, 3 classifications, 1 undeclared-library prior, 1 premise-budget-precedence case, 1 bridge escalation, 1 retry, zero blocking secret-exclusion hits, and the scope boundary that this is not Lean/Lake, formal-result correctness, provider, private-ref, whole-library-allowlist, or launch-scope decision.
Focused regression coverage lives in tests/test_undeclared_library_prior_symbol_classifier.py. It runs both the fixture command and run-symbol-bundle, checks public-relative result records, verifies digest/manifest boundary failures, and confirms the compact card reuses a fresh result record without exporting source modules, body ids, secret-scan details, source digests, proof bodies, or non-public source refs. The paper-module coverage contract also names this module in tests/test_microcosm_paper_module_coverage_contract.py; that is route coverage evidence, not runtime proof evidence.
Named Proof Consumers
The fixture consumer is microcosm_core.organs.undeclared_library_prior_symbol_classifier run over fixtures/first_wave/undeclared_library_prior_symbol_classifier/input. It proves the public example still classifies 11 copied premise rows and 3 symbol observations into one undeclared-library-prior escalation, one premise-budget retry, and one advisory clean case, while the expected negative cases cover proof-body export, non-public refs, theorem-correctness overclaim, allowed-symbol false positives, unqualified-token overclaims, and missing escalation.
The exported-bundle consumer is microcosm_core.organs.undeclared_library_prior_symbol_classifier run-symbol-bundle over examples/undeclared_library_prior_symbol_classifier/exported_symbol_classifier_bundle. It proves the six source-open body imports remain digest/size/anchor checked and public-safe, including the exact copied reducer and calibration-builder bodies plus path-normalized Ring2 state, recipe metrics, and reduction-matrix bodies. It is the consumer that catches source-module digest drift and manifest-boundary violations; it does not certify formal-result correctness.
The focused regression consumer is tests/test_undeclared_library_prior_symbol_classifier.py. It ties the fixture and bundle commands to public-relative result records, source-module digest mismatch blocking, manifest and row-level body_in_receipt rejection, compact-card redaction, and fresh-card reuse. The corpus consumer is scripts/build_doctrine_projection.py --check-paper-module-corpus, which proves the Markdown remains part of the 98-module Microcosm paper-module corpus. That corpus check is routing and projection parity evidence only; it is not a runtime proof substitute.
Public Mechanics
Qualified symbol refs are restricted to Nat, List, Bool, Iff, and Eq namespaces in this public fixture.
The closed premise index is an allowlist boundary, not permission to use the whole standard library.
UNDECLARED_LIBRARY_PRIOR routes to bridge_escalate because the proof may be informative while still out of recipe.
PREMISE_BUDGET_VIOLATION routes to retry and short-circuits the residual symbol classifier.
Result records expose ids, candidate artifact digests, symbols, counts, failure classes, source refs, source digests, and scope limits.
secret_exclusion_scan records zero blocking hits for the declared sentinel classes in the public result record stream; it is not a complete secret audit, launch clearance, or proof that no private material exists anywhere.
Prior Art Grounding
This classifier is grounded in formal-methods work on premise control and library-aware proof search. Isabelle/Sledgehammer makes relevant-fact selection an explicit part of automated proof search, and Lean/Mathlib practice makes clear that accepted proofs can depend on a large library context. Microcosm uses that insight as a boundary check: an accepted proof artifact is not enough if it quietly used symbols outside the declared premise set. The component classifies the symbol-budget violation without judging theorem truth or exporting proof bodies.
Prior-art anchors:
Isabelle Sledgehammer and relevant-fact selection: https://isabelle.in.tum.de/doc/sledgehammer.pdf
Lean community Mathlib overview: https://leanprover-community.github.io/mathlib-overview.html
Lean 4 tactic and proof environment context: https://lean-lang.org/theorem_proving_in_lean4/Tactics/
Regression Cases
The forbidden proof-body, private-ref, allowed-symbol false-positive, unqualified-token, and theorem-correctness cases are regression-only leakage guards. They are not product evidence and cannot stand in for the copied Lean/Std symbol-boundary system.
Validation Result record Path
Run from microcosm-substrate:
The expected bundle projection is Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges, and 19 generated relationship edges with no unpopulated selective relations. A green result record proves only the allowed-premise and symbol-budget classification boundary; it does not establish formal-result correctness, run Lean or Lake, expose proof bodies, authorize external model access, claim Mathlib availability, or broaden all Std and Mathlib declarations into allowed priors.
Scope boundary
Scope limit
The JSON bundle and generated row prove only allowed-premise and symbol-budget classification evidence: copied Lean/Std premise rows, real Ring2 ids and digests, extracted qualified symbol refs, declared budget-violation cases, source-open body-floor digest evidence, leakage regression cases, negative cases, and validation result records. They do not prove formal-result correctness, run Lean or Lake, expose proof bodies, use external model services, import non-public source refs, claim Mathlib availability, treat all Std or Mathlib declarations as allowed priors, include launch operations, authorize public sharing, or prove whole-system correctness. They also do not expose model-output data bodies, account or browser state, source notes, or private source-root bodies.
Limitations
The classifier depends on copied, premise rows and pre-extracted qualified symbol observations. It does not parse arbitrary Lean syntax, expand imports, normalize proof terms, or run Lean/Lake to discover symbols. Unknown or unqualified tokens are deliberately kept outside the positive undeclared-library-prior claim unless the public observation and closed premise index make the boundary explicit.
The public source-open body floor is a provenance check, not semantic equivalence for the full private source system. Exact copied bodies and path-normalized run-state bodies are checked for source/target digests, line counts, byte counts, and required anchors; that does not certify every upstream private root, model-output data, account state, or operator context that may have informed the original source run.
The leakage and launch boundaries are also scoped. secret_exclusion_scan checks declared sentinel classes in the public fixture and bundle inputs, while the focused pytest checks regression cases for proof-body export, non-public refs, overclaims, and compact-card redaction. Those checks do not replace a whole-repo secret audit, a public sharing review, theorem-correctness evidence, or a Mathlib availability proof. The paper-module corpus and generated-row checks prove routing parity only.
Scope limit
This module is allowed-premise and symbol-budget classification evidence only. It does not establish formal-result correctness, run Lean or Lake, expose proof bodies, use external model services, import non-public source refs, treat all Std or Mathlib declarations as allowed priors, claim Mathlib availability, or include launch operations.
Scope boundary
This module does not establish formal-result correctness, run Lean or Lake, expose proof bodies, use external model services, import non-public source refs, treat all Std/Mathlib declarations as allowed priors, claim Mathlib availability, or include launch operations.
Source and projection details
Governing Lattice Relation
The governing relation is the path from bundle authority to a bounded proof consumer. The source row binds this module to the undeclared_library_prior_symbol_classifier component, the mechanism mechanism.undeclared_library_prior_symbol_classifier.validates_public_symbol_boundary, the runtime locus src/microcosm_core/organs/undeclared_library_prior_symbol_classifier.py, the concept concept.formal_math_and_proof_witness_bundle, six principles, six axioms, and the sibling paper modules for corpus readiness, tactic availability, and Lean/Std premise indexing.
The principle layer explains why the classifier is a boundary component rather than a theorem authority. P-1 requires the symbol class to be recomputed from premise rows and observations instead of echoed from prose. P-2 lowers the claim to what the checker actually tests: allowed-premise and symbol-budget classification. P-3 concentrates trust in the small component and source-module manifest validators. P-6 fails closed on missing or stale evidence. P-8 turns inadmissible computations into typed outcomes such as PREMISE_BUDGET_VIOLATION and UNDECLARED_LIBRARY_PRIOR. P-9 carries source refs, target refs, digests, and body-material status through the fixture, bundle, and result record layers.
The axiom layer supplies the same ceiling in machine-checkable form. AX-1 requires derivation before assertion, so the page points to fixture and bundle result records instead of declaring theorem truth. AX-2 keeps verification inside kernelized validators. AX-5 prevents an authority upgrade without stronger evidence. AX-7 allows typed partiality and refusal when the proof body, non-public refs, or theorem-correctness claim is inadmissible. AX-8 preserves provenance while keeping proof/provider/private bodies out of public result records.
The generated JSON row currently contributes 19 relationship edges with no unpopulated selective relations. Those edges are evidence of route parity, not new authority: the source authority remains the JSON bundle and the proof authority remains the focused fixture, bundle, and regression consumers.
This page treats those generated navigation surfaces as bundle-derived projections while explaining the resolved symbol-boundary component, code-locus, law, and sibling-paper links.
Voice to Doctrine Self-Improvement LoopVoice to Doctrine Self-Improvement Loop validates lesson propagation without exporting source notes or granting doctrine mutation authority.
Voice to Doctrine Self-Improvement Loop validates whether declared lessons refined a named owner surface with validation or were captured with a re-entry condition. It checks projection protocol, policy, owner surfaces, lesson rows, negative cases, source-open body imports, and scope limits while rejecting source notes export, private thread bodies, model-output data, direct doctrine-node edits, unvalidated global promotion, live work log mutation, public sharing, launch, and whole-system correctness.
Scope limit Declared lesson-propagation fixture only; no source notes export, non-public body export, model-output data export, source or doctrine mutation authority, global-promotion authority, live work log mutation, publishing-scope decision, launch-scope decision, external model access, private-system equivalence, or whole-system correctness.
This module is the public Microcosm projection of the source system's recursive self-improvement metabolism. It is not a synthetic result record layer. It imports the real source shape from recursive_self_improvement_operating_loop, doctrine_population_loop, and local_to_general_propagation: local pressure is sensed, classified, assigned to an owner surface, mutated or captured there, validated, closed out, and given a concrete re-entry condition.
The exported bundle also carries exact copies of the source bodies that make this loop real: recursive self-improvement, doctrine population, local-to-general propagation, the plane-home decision table, work log metacontrol, work log skill doctrine, and the work log standard. Result records report only source refs, hashes, counts, and scan status; the body text lives under examples/.../source_modules/ai_workflow/.
Purpose
The component answers one question: did a declared lesson actually change a named owner surface and pass that surface's own validation, or did it only produce a result record that says so? "The system learned from its work" is an easy claim to assert and a hard one to back. Without a check, a log entry, a closed ticket, or a confident summary all read as progress. This validator refuses that shortcut.
Each lesson row must name the surface it changed (a skill, a paper module, a standard, or a captured Work item), the action it took there, and the validation and completion refs that show the change held. Every ref must resolve to a real file in the exported bundle, the copied source modules, or the public Microcosm tree. A lesson then lands in exactly one of four outcomes: refined_existing_surface (a surface changed and was validated), workitem_captured (deferred work, but only with a concrete re-entry condition), nothing_to_refine (a typed null result that still required stewardship and a next-best-lane check), or already_propagated_verified. Anything that does not fit one of these is a finding, not an outcome.
The unusual part is the defence against self-grading. A lesson row may carry an expected_label or expected_status field, but the validator ignores it and recomputes the verdict from the evidence. If the row is not genuinely backed, its own asserted label cannot rescue it, and the case is recorded as VOICE_DOCTRINE_BAKED_EXPECTED_LABEL_IGNORED. A fixture cannot pass by declaring its own success. The same instinct runs through the negative floor: source notes, private thread bodies, model-output data, direct edits to doctrine nodes, and global promotion without owner validation are each rejected, keeping "the system improves itself" separate from "this public artifact may rewrite doctrine or export private voice."
Shape
Source refs
changed ref + validation
refined_existing_surface
Already
already_propagated_verified
Diagram source
flowchart LR Signal["Local pressure mistake, route gap, validation finding, residual"] Classify["Classify owner surface + action"] Owner["Owner surface skill, paper module, standard, Work item"] Refused["Refused raw voice, private body, direct node edit, result record-only, unvalidated promotion"] subgraph Outcome["One of four typed outcomes"] Refined["refined_existing_surface changed ref + validation"] Captured["workitem_captured with re-entry condition"] Null["nothing_to_refine stewardship + next-lane checked"] Already["already_propagated_verified"] end Recompute["Recompute verdict from evidence expected_label ignored"] Validate["Validation owner evidence + completion ref; every ref must resolve"] Source["Exact source bodies 8 manifest rows: hashes, anchors"] Result records["metadata-only result records result, board, validation, fixture sign-off"] Signal --> Classify Classify --> Owner Owner --> Refused Owner --> Outcome Outcome --> Recompute Recompute --> Validate Source --> Result records Validate --> Result records Refused --> Result records
Public Mechanics
Local lessons carry source pattern refs, evidence refs, owner-surface ids, owner actions, validation refs, completion refs, and outcomes.
Owner surfaces are explicit: skills, paper modules, standards, and residual captures each retain their own mutation authority.
refined_existing_surface requires a changed owner surface and validation.
workitem_captured requires a concrete re-entry condition.
nothing_to_refine requires stewardship and next-best-lane checks.
source notes, private thread bodies, model-output data, live work log bodies, direct doctrine-node edits, and global promotion claims are rejected by negative cases.
Exported bundle validation requires source_module_manifest.json, verifies each copied body hash/line/byte/anchor contract, and scans copied bodies for forbidden public material.
Reader Evidence Routing
Read this module as a lesson-propagation validator, not as a general doctrine mutation license. The fixture proves that local pressure must choose a named owner surface, perform an owner-authorized action, carry validation and completion refs, and either refine an existing surface, capture a Work item with a re-entry condition, record a typed nothing_to_refine, or verify an already-propagated result.
Read source-open evidence through the exported bundle manifest. It carries eight copied source bodies: three paper modules, four skills or skill companions, and the work log standard. Each manifest row records byte and line counts, exact source and target hashes, required anchors, and body_in_receipt: false. The source bodies make the source loop inspectable, while result records remain refs, hashes, counts, scan status, and scope limits.
Read the negative floor as equally load-bearing. source notes bodies, private thread bodies, model-output data bodies, direct doctrine-node edits, result record-only progress, live work log mutation, and unvalidated global promotion are rejected. Those rejections keep "the system learns from work" separate from "this public artifact can mutate doctrine or export private state."
Prior Art Grounding
This component is grounded in after-action review, lessons-learned, and pattern language practices. NASA's Lessons Learned Information System is a public example of preserving operational lessons so future work can reuse them, while pattern-language practice gives a vocabulary for turning repeated local solutions into named, reusable forms. Microcosm adopts that direction without collapsing operator voice into doctrine: a local lesson only becomes durable when it has evidence, an owner surface, a validation result record, and a bounded re-entry path.
Prior-art anchors:
NASA Lessons Learned Information System: https://llis.nasa.gov/
Pattern language background: https://hillside.net/patterns/
A green fixture or bundle result record proves only the public lesson-propagation boundary above; it does not grant source-file changes, live work log mutation, global doctrine-promotion, launch, or whole-system authority.
Scope boundary
Scope boundary
This module does not export source note, source notes, private thread bodies, model-output data, account or browser state, live work log rows, proof authority, source-file changes, publishing-scope decision, or private-system equivalence. It shows the public mechanics of system learning under owner-surface evidence gates.
Scope limit
This paper module can claim a lesson-propagation fixture. It can explain owner-surface checks, negative cases, copied source-module manifests, and metadata-only result records. A diagram view and atlas card are generated for this module.
It cannot claim source notes export, non-public body export, model-output data export, source-file changes, doctrine mutation authority, global-promotion authority, live work log mutation, publishing-scope decision, launch-scope decision, external model access, private-system equivalence, or whole-system correctness.
Routing Anti-Patterns RegistryRouting Anti-Patterns Registry validates public anti-pattern registry rows without becoming route source authority or mutating routes.
Routing Anti-Patterns Registry validates the public routing anti-pattern registry contract: kind/version, unique row ids, text fields, required navigation anchors, source-module digest evidence, negative cases, and scope limits. It treats the registry as a checked public artifact, not route source authority, route mutation authority, private routing-note export, provider authority, launch, public sharing, or whole-system correctness.
Scope limit Declared public routing anti-pattern registry fixture and copied source body evidence only; no route source authority, no route mutation, no private routing-note export, no external model access, no launch-scope decision, no publishing-scope decision, and no whole-system correctness.
routing_anti_patterns_registry is the public contract diagnostic for the source system's typed navigation failure rows. It validates the copied codex/doctrine/routing_anti_patterns.json registry as runnable Microcosm system: the input must declare kind: routing_anti_patterns, carry a positive version, and expose stable anti_patterns rows with unique ids and plain explanatory text.
The positive fixture imports the real source registry body. The exported bundle also carries a source module manifest and a byte-for-byte copy under source_modules/codex/doctrine/routing_anti_patterns.json, with sha256 hashes and anchors for kernel_before_grep, bridge_before_scope, and mode_in_chat_only. Result records carry refs, hashes, counts, and verdicts only; they do not inline the copied body.
The component rejects five boundary failures:
missing kind
duplicate anti-pattern ids
anti-pattern rows missing explanatory text
launch, provider, source-file changes, route-policy mutation, maturity, or whole-system-correctness overclaims
private routing bodies, source note bodies, model-output data bodies, or secret values in public inputs
Purpose
A navigation system can fail quietly. An agent reaches for grep when a kernel route would have narrowed the space first, or changes execution mode in chat without updating the disk contract, and nothing complains until the work is already off the rails. This component answers one question: does the public registry of known routing failures hold its declared shape, and does the copied source body that backs it stay byte-honest? It names recurring navigation mistakes as typed rows so they can be recognised, not rediscovered.
The registry is treated as a checked artifact, not as authority. A page describing routing failures is easy to read as a router or as policy. The component refuses both: a row may project a public anti-pattern, but it may not declare source_authority, route_authority, or any internal control role, and the validator rejects rows that try. So the document can describe how navigation goes wrong without itself becoming the thing that decides how navigation should go.
One design choice sits in how each row's route-repair state is decided. Rather than trust a label baked into the row, the checker derives the repair state from the row's own id and explanatory text: kernel_before_grep only earns kernel_first_navigation if its text actually mentions grep, kernel, and route. A row carrying a pre-written expected_route_repair_state is flagged, and baked_expected_labels_sufficient is fixed to false. The point is to stop a registry from grading itself by self-asserted labels, and to keep the meaning grounded in the text a reader can see.
Shape
This module is a projection over a bundle-backed public routing diagnostic, not route source authority. Cold readers should read it as a bounded chain: the JSON bundle and standard name the contract; the runtime component validates fixtures and an exported source bundle; result records preserve hashes, counts, verdicts, and negative cases; generated Mermaid and Atlas rows expose the bundle edges; the scope limit remains projection-only.
flowchart TD Bundle["core/paper_module_capsules.json paper_module.routing_anti_patterns_registry"] Standard["standards/std_microcosm_routing_anti_patterns_registry.json"] Markdown["paper_modules/routing_anti_patterns_registry.md reader projection; not route authority"] Runtime["src/microcosm_core/components/routing_anti_patterns_registry.py run / run-bundle / result record writer"] Fixture["fixtures/first_wave/routing_anti_patterns_registry/input registry + negative cases"] Bundle["examples/routing_anti_patterns_registry/exported_routing_anti_patterns_bundle source_module_manifest + exact copied body"] Tests["tests/test_routing_anti_patterns_registry.py"] Result records["result records/.../routing_anti_patterns_registry*.json refs, hashes, counts, verdicts"] structured source record["paper_modules/routing_anti_patterns_registry.json 22 edges; Mermaid available; Atlas linked"] Ceiling["Scope limit no route authority, mutation, external model access, launch, or whole-system proof"] Bundle --> Markdown Bundle --> structured source record Standard --> Runtime Fixture --> Runtime Bundle --> Runtime Runtime --> Tests Runtime --> Result records Tests --> Result records structured source record --> Ceiling Result records --> Ceiling Markdown --> Ceiling
Technical Mechanism
The component is a contract checker around a public routing-registry copy, not a router. run loads the first-wave fixture and asks _build_result to validate the positive routing_anti_patterns.json payload, all declared negative cases, the secret-exclusion scan, and the metadata-only result record bundle. The positive path requires kind: routing_anti_patterns, a positive integer version, stable anti-pattern ids, explanatory text, and the named source anchors kernel_before_grep, bridge_before_scope, and mode_in_chat_only.
The failure lattice is explicit. _payload_findings records typed evidence for missing kind, non-positive version, missing rows, missing ids, duplicate ids, missing text, forbidden authority-role masquerade, private-source fields, and overclaims about launch, external model access, source-file changes, route-policy mutation, maturity, readiness, or whole-system correctness. A pass is admitted only when every expected negative case appears with its expected error code and missing_negative_cases is empty. That makes the negative cases proof obligations rather than illustrative examples.
The exported-bundle path adds source-copy accountability. run-bundle calls run_routing_anti_patterns_bundle, which requires bundle_manifest.json, source_module_manifest.json, and the copied body under source_modules/codex/doctrine/routing_anti_patterns.json. The manifest checker streams sha256 over the copied target, verifies sha256, source_sha256, and target_sha256, checks required anchors, classifies the material as copied_non_secret_macro_body, and rejects any body-in-result record claim. The source body is available in the exported source-module tree; result records keep only refs, hashes, counts, verdicts, and omission fields.
The governing lattice is deliberately narrow. The bundle binds this mechanism to concept.architecture_and_navigation_route_contract_bundle, P-1, P-2, P-3, P-5, P-6, P-8, P-9, P-12, P-15, and AX-1, AX-4, AX-5, AX-7, AX-8, AX-11, but the checker consumes those refs as a scope limit: evidence must be replayable, typed, public-safe, and below projection authority. It also depends on navigation_hologram_route_plane, agent_route_observability_runtime, and cold_reader_route_map, so the registry can describe navigation failure shapes without becoming the internal control route source.
Reader Evidence Routing
Read this module through the following source-to-proof route:
Start at the source record core/paper_module_capsules.json::paper_modules[58:paper_module.routing_anti_patterns_registry]. It is the source authority for source_authority: json_capsule, the component subject, mechanism subject, runtime source locus, concept, principles, axioms, dependency modules, and the projection statuses.
Read the generated structured source record paper_modules/routing_anti_patterns_registry.json only as a projection from that source record.
Follow the runtime proof path through src/microcosm_core/organs/routing_anti_patterns_registry.py, fixtures/first_wave/routing_anti_patterns_registry/input/, and examples/routing_anti_patterns_registry/exported_routing_anti_patterns_bundle/. Those surfaces carry the public registry fixture, negative cases, source_module_manifest.json, copied body target, required anchors, and digest checks.
Confirm the public result record floor with the named fixture command, bundle command, focused regression, and corpus check below. Result records may carry ids, refs, hashes, counts, verdicts, and omission fields, but not private routing bodies or model-output data.
Treat generated diagram, Atlas, search, object-map, and site cards as reachability projections from the same source row. They help a public reader find the module; they do not outrank the bundle, runtime, manifest, tests, or metadata-only result records.
Named Proof Consumers
First-wave fixture consumer: PYTHONPATH=src ../repo-python -m microcosm_core.components.routing_anti_patterns_registry run --input fixtures/first_wave/routing_anti_patterns_registry/input --out /tmp/microcosm-routing-anti-patterns-registry/fixture --sign-off-out /tmp/microcosm-routing-anti-patterns-registry/sign-off.json --card consumes the public registry fixture, six expected negative-case families, private-source rejection, secret-exclusion scan, metadata-only result record writer, and command-card omission boundary.
Exported-bundle consumer: PYTHONPATH=src ../repo-python -m microcosm_core.organs.routing_anti_patterns_registry run-bundle --input examples/routing_anti_patterns_registry/exported_routing_anti_patterns_bundle --out /tmp/microcosm-routing-anti-patterns-registry/bundle --card consumes the source-module manifest, exact copied source registry body, sha256 digest floor, required anchors, material class, and source-open summary while keeping body text out of result records.
It is a read-only result record for this Markdown slice, not permission to hand-edit generated projections.
Prior Art Grounding
This registry follows the same family as pattern and anti-pattern catalogs: name recurring failure shapes so future operators can recognize and avoid them. The Hillside patterns library is the positive pattern-language ancestor, and the software anti-pattern literature supplies the inverse move: documenting repeated practices that look useful but produce bad outcomes.
The routing-specific presentation also borrows from CLI usability practice. The Command Line Interface Guidelines emphasize discoverability, clear errors, and suggested next actions; this component applies that pressure to navigation failures by requiring stable ids and explanatory text while keeping the registry projection below route-source authority.
Validation Result record Path
From microcosm-substrate, validate the public routing-registry diagnostic without writing tracked result records:
Passing validation proves the public anti-pattern registry fixture and copied-body digest floor only. It does not make this registry route source authority, and it excludes route-policy mutation, external model access, launch, or whole-system correctness.
Scope boundary
Scope limit
This module may claim public fixture evidence that anti-pattern row shape, stable anti-pattern ids, source-module digest checks, private-leak rejection, negative cases, and validation result records support the declared routing anti-pattern registry contract. It may also claim that the JSON row resolves the accepted component subject, mechanism subject, runtime source locus, governed concept, principles, axioms, and dependency modules.
This module may not claim route source authority, live route freshness, route-policy mutation, provider authorization, private routing-note disclosure, maturity proof, hosted-public posture, launch-scope decision, publishing-scope decision, implementation correctness beyond the listed witnesses, or whole-system correctness.
Scope limit
This is a projection-only diagnostic. It can explain public anti-pattern registry validation, copied-body digest checks, private-leak rejection, negative cases, and validation result records. It does not become route source authority, mutate routes, expose private routing notes, authorize providers, include launch operations, or prove whole-system correctness.
Set 8 Audio Level RMS PortSet 8 Audio Level RMS Port validates deterministic RMS math parity over public synthetic samples without audio capture, microphone permission, or UI readiness authority.
Set 8 Audio Level RMS Port ports the Swift AudioLevelMonitor normalizedLevel RMS calculation into a bounded Python component and checks float, int16, clamp, empty-buffer, and unsupported-format cases over public synthetic sample arrays. It carries source-module refs, digests, anchors, sample counts, parity verdicts, negative cases, and an scope limit while excluding AVCaptureSession startup, microphone permission, recorded audio, device state, UI readiness, source-file changes, launch, public sharing, and whole-system correctness.
Scope limit Deterministic RMS parity evidence over public fixture inputs and copied source refs only; no macOS audio-session evidence, microphone permission authority, device capture, recorded audio, UI readiness, source-file changes, launch-scope decision, publishing-scope decision, or whole-system correctness.
This component ports the pure AudioLevelMonitor.normalizedLevel RMS math from Swift to Python and exercises it over public synthetic sample arrays.
The bundle is bounded to numeric parity. It does not start an AVCaptureSession, request microphone permission, read recorded audio, capture a device, claim UI readiness, authorize public sharing, or approve launch.
Purpose
The Swift AudioLevelMonitor feeds a live microphone level meter in a recording app. Most of that file is platform machinery: opening a capture session, selecting a device, reading sample buffers off a callback. Buried inside is one small, pure function, normalizedLevel, that turns a block of audio samples into a single number between zero and one. That number is the only part that can be checked without a microphone, so it is the only part this component ports.
The single question this component answers is: does the Python re-implementation of that calculation produce the same level value as the Swift original, on inputs we can publish? Everything device-specific, permission-gated, or stateful is deliberately left on the Swift side. What crosses into Python is the arithmetic alone.
The interesting choice here is what is held out, not what is included. A live level meter is hard to test because it depends on real audio hardware and OS permissions that cannot live in a public fixture. By isolating the pure amplitude maths and exercising it over synthetic sample arrays, the component keeps a checkable parity claim about the part that matters for the meter reading, while making no claim at all about capture, permission, or device state. The test is scoped to being a maths port and nothing more.
How it works
normalized_level takes a sequence of samples and a format tag. It accepts only float32 and int16; any other tag raises ValueError, which is how the "unsupported format" case is exercised. An empty buffer returns 0.0 immediately, before any arithmetic.
For each sample it accumulates the square of the value. Float samples are used as-is; int16 samples are first divided by 32767.0 (the Swift Int16.max) to map the integer range onto roughly minus-one to one. It then takes the root mean square, sqrt(total / count), which summarises the block's energy as a single amplitude. That value is multiplied by 8.0 and clamped to the [0.0, 1.0] range with min(max(rms * 8.0, 0.0), 1.0). The gain of eight is a display choice carried over verbatim from the Swift source: quiet speech sits low on a zero-to-one meter without it, so the level is scaled up and then capped so loud input cannot overshoot one. These two lines, the int16 divisor and the rms * 8 clamp, are the anchors the bundle requires to match the copied Swift text.
The runtime checks three reference cases drawn from a public probe manifest (float32, int16, and an over-one buffer that must clamp), optionally decodes mono 16-bit PCM WAV byte fixtures and recomputes their level from the raw bytes, and runs three negative exercises: empty buffer must read zero, an over-one buffer must clamp to one, and an unknown format must be refused. Each case compares the observed level against the manifest's expected value within a small tolerance. A mismatch, a missing expected case, or a failed refusal is recorded as a finding, and any finding turns the verdict from pass to blocked.
Shape
Read this module as a bounded RMS-parity pipeline: the JSON bundle names the reader authority, runtime locus, standard, and generated navigation edges; the runtime ports Swift normalizedLevel math over public fixture arrays; tests and result records verify numeric parity and metadata-only evidence. Generated Mermaid and Atlas links are navigation status, not macOS audio-session, microphone, device, source-file changes, public sharing, or launch-scope decision.
Diagram source
flowchart TD swift["Copied Swift source AudioLevelMonitor.normalizedLevel metadata-only; anchors only"] manifest["Public probe manifest synthetic sample arrays + WAV bytes expected level per case"] samples["normalized_level(samples, format)"] fmt{"format tag?"} refuse["raise ValueError unsupported format refused"] empty{"buffer empty?"} zero["return 0.0"] scale["square + accumulate int16 divided by 32767"] rms["rms = sqrt(total / count)"] clamp["min(max(rms * 8, 0), 1) scaled, then clamped to 0..1"] compare["compare observed vs expected within tolerance"] verdict{"any finding?"} blocked["status: blocked"] passed["status: pass"] ceiling["Scope limit RMS parity over public fixtures only no audio session, microphone, device, source-file changes, public sharing, or launch"] swift --> samples manifest --> samples samples --> fmt fmt -->|"not float32/int16"| refuse fmt -->|"float32 or int16"| empty empty -->|"yes"| zero empty -->|"no"| scale scale --> rms rms --> clamp clamp --> compare refuse --> compare zero --> compare compare --> verdict verdict -->|"yes"| blocked verdict -->|"no"| passed blocked --> ceiling passed --> ceiling
Reader Evidence Routing
Bundle route: read core/paper_module_capsules.json::paper_modules[59] before treating this Markdown as explanation.
Generated route: inspect paper_modules/batch8_audio_level_rms_port.json for the current generated instance derived from the source record.
Bundle route: inspect examples/batch8_audio_level_rms_port/exported_batch8_audio_level_rms_port_bundle for copied Swift source refs and digest evidence.
Runtime route: run tests/test_batch8_audio_level_rms_port.py and the commands in ## Validation Result record Path for recomputation evidence.
Prior Art Grounding
The component is grounded in standard digital-audio metering practice: root mean square amplitude is a common way to summarize signal energy for level displays, while OS capture APIs and media tools are kept outside pure numeric tests. Useful anchors include:
Apple's AVFoundation media framework family for time-based audiovisual capture and processing on Apple platforms.
FFmpeg audio/video documentation, as a broad media-processing toolchain where audio streams and levels are handled as explicit inputs and transforms.
Microcosm borrows only the pure RMS-level calculation shape and ports it to fixture-bound Python parity tests. It does not start an audio session, request microphone permission, read recorded audio, capture a device, or approve UI or launch-scope decision.
Source Reference
The exported bundle copies apps/demo-take-console/Sources/DemoTakeConsoleApp/AudioLevelMonitor.swift under examples/batch8_audio_level_rms_port/exported_batch8_audio_level_rms_port_bundle/source_modules/. Result records carry refs, digests, anchors, sample counts, and parity verdicts, not copied body text, recorded audio, or private device state.
Mechanism Set
The validator requires float32 parity, int16 parity, over-one clamp behavior, empty-buffer zero behavior, and unsupported-format refusal. Shared registry, sign-off, runtime-shell, CLI, atlas, package-data, and generated docs wiring is intentionally deferred while the existing shared Microcosm core lease is active.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the bounded RMS parity result record and sign-off JSON. The bundle command validates the copied Swift source module, digest anchors, negative exercises, body-exclusion scan, and source-ref boundary. The focused test checks the Python port, bundle validation, result record body scan, and scope limit.
This result record path is reader-verifiable evidence only. It does not start an audio session, request microphone permission, read recorded audio, prove device capture, approve UI readiness, change source files, authorize public sharing, or approve launch.
Scope boundary
Scope limit
This is deterministic Python-port evidence over fixture inputs only. It is not macOS audio-session evidence, not microphone permission authority, not device capture, not UI readiness, not source-file changes, and not launch-scope decision.
Scope limit
This paper module can claim a deterministic Python port of the audio-level RMS calculation with a diagram view generated for this module and navigation links available from the same source row. It can explain deterministic numeric RMS/level behavior over fixture inputs and metadata-only result records.
It cannot claim macOS audio-session evidence, microphone permission authority, device capture, UI readiness, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness. Those claims would need new supporting evidence before this module could narrate them.
Set 8 Compliance Pipeline BundleSet 8 Compliance Pipeline Bundle validates copied compliance scanner and observe-pipeline mechanics without refreshing the full ledger or dispatching bridge/provider work.
Set 8 Compliance Pipeline Bundle imports compliance scanner registry, bounded compliance ledger builder, and observe-loop pipeline helper bodies into a runnable component. It checks registry shape, bounded no-write compliance checks, baseline scanner truth accounting, pure pipeline helper behavior, source-module digests, negative cases, and scope limits while excluding full compliance-ledger refresh, bridge/external model access, source note mutation, repository mutation, launch, public sharing, and complete compliance proof.
Scope limit Declared public compliance and pipeline fixture evidence plus copied source-module digest checks only; no full compliance-ledger refresh, bridge/external model access authority, source note mutation, repository mutation, complete compliance proof, launch-scope decision, publishing-scope decision, or whole-system correctness.
batch8_compliance_pipeline_capsule copies two source subsystems into Microcosm as source bodies and then exercises them. The first is the compliance scanner registry and its bounded ledger builder. The second is the six-stage observe pipeline that turns a source note into a synthesis seed. The component runs six engines over the copied bodies and writes metadata-only result records.
Purpose
Most of the bundle components in this set are shape linters: they grep the copied source for expected tokens and pass when the tokens are present. This one goes further. Four of its six engines run the copied bodies on synthetic inputs, importing the pipeline and scanner helpers directly or driving the ledger builder as a subprocess, so the result record records observed behaviour rather than mere presence. The question it answers is narrow and testable: when these two subsystems are imported as copied bodies, do they still behave as their source contracts say, without touching the live ledger or dispatching any work?
The behaviour worth singling out is digest preservation. The pipeline compresses a long source note down to a short digest before deciding what to inspect next. If that compression silently drops an instruction, the agent downstream loses it. The component feeds the real digest_raw_seed an eighty-line block of low-signal text with one directive line buried inside, then checks the directive survives the compression. The matching negative case removes the directive marker from the copied source and confirms the directive is then lost. That pairing is what the page is really about: a compression step that is asserted to keep the one line that matters, with a test that fails when it does not.
The standing limit is just as deliberate. The bounded compliance check runs the ledger builder in --check --report mode, which reads and reports but never writes the ledger. The pipeline engines stop before any bridge or external model access. The bundle is evidence that the imported mechanics work on a sample, not a claim that the full compliance ledger is fresh or that every branch is covered.
Role
This module imports the source compliance scanner registry, the bounded compliance ledger builder, and the observe-loop pipeline stages into Microcosm as copied source bodies with a runnable component.
The component runs six engines and passes only if all six pass and every required source body is present.
compliance_registry_runtime_witness confirms the copied registry exposes the adapter table, the domain and baseline standard-id sets, and a scan_all entry point, that the coverage adapter carries its self-audit fields, and that the ledger builder carries its bounded-check command. When the live registry is importable it also reads the adapter, domain, and baseline counts as a shape witness, never as a freshness claim.
compliance_coverage_bounded_check runs the ledger builder with --check --report for two named standards. The pass condition is strict: the check reports ok, wrote_ledger is false, there are no error findings, and a next-step ratchet command is present. The point is a check that reads and reports without writing. Stale ledger rows that were not selected stay outside the claim.
baseline_companion_scanner_contract runs the baseline scanner on a sample standard and checks the returned row is honest about its own shallowness: it must be marked a baseline-inventory row with no domain-specific adapter, so a bare file-exists check can never read as a real compliance pass.
pipeline_digest_and_shard_normalization exercises three pure helpers from the extract stage. It checks the buried directive survives digest compression, that an unknown shard status is normalised to pending while the original value is preserved as a variant, and that diverse-shard selection caps how many shards one group can contribute.
pipeline_observe_compile_helpers runs the compile-stage helpers on a small fixture and checks they pull the right known-file mentions from free text, order follow-up files, and lift probe questions from a plan while skipping synthesis and summary roles.
pipeline_dispatch_process_boundary_contract confirms the execute and process stages keep the dispatch boundary explicit. It checks the copied bodies carry the observe_dispatch_skipped and observe_dispatch_started markers and the result record-selection helper, so the page can state plainly that bridge dispatch stays disabled.
Each engine carries its own scope limit in the result record. The six negative cases each remove one load-bearing token from a copied body and confirm the matching engine then reports blocked, so a pass means the contract was actually exercised rather than skipped.
Shape
The authoritative source record is core/paper_module_capsules.json::paper_modules[60:paper_module.batch8_compliance_pipeline_capsule]. The generated JSON instance is paper_modules/batch8_compliance_pipeline_capsule.json, whose source_refs mark that source record as the source of record and this Markdown as legacy_markdown_projection_not_source_authority.
Source refs
Dispatch and process boundary
observe_dispatch_skipped
Diagram source
flowchart LR bundle["Copied source bundle 11 source bodies body_in_receipt: false"] subgraph Compliance["Compliance subsystem (3 engines)"] reg["Registry runtime witness adapter table, scan_all, coverage self-audit"] bounded["Bounded ledger check --check --report reports ok, wrote_ledger: false"] base["Baseline scanner contract row admits no domain adapter"] end subgraph Pipeline["Observe pipeline (3 engines)"] digest["Digest and shard helpers buried directive survives; status normalised, variant kept"] compile["Compile helpers file mentions, follow-ups, probe questions"] boundary["Dispatch and process boundary observe_dispatch_skipped"] end neg["6 negative cases remove one token per body; matching engine reports blocked"] result records["metadata-only result records result, board, validation"] ceiling["Scope limit no ledger refresh, no provider/bridge dispatch, no source note or source-file changes, no public sharing or launch"] bundle --> reg & bounded & base bundle --> digest & compile & boundary bundle --> neg reg & bounded & base --> result records digest & compile & boundary --> result records neg --> result records result records --> ceiling
The shape is a bounded compliance and observe-pipeline witness. The bundle names the component subject batch8_compliance_pipeline_capsule, the mechanism subject mechanism.batch8_compliance_pipeline_capsule.validates_public_compliance_pipeline_capsule, the resolved runtime/source locus src/microcosm_core/organs/batch8_compliance_pipeline_capsule.py, and the dependency/concept/law edges.
The local standard, when read as standards/std_microcosm_batch8_compliance_pipeline_capsule.json, keeps the same boundary: public engine ids, stable negative-case codes, source refs, digests, line counts, required anchors, bounded synthetic outcomes, scope limits, and scope boundaries are public-safe; keys, account secrets, browser state, account or browser state, model-output data bodies, browser UI live-access material, raw operator transcripts, private artifact bodies, live observe dispatch state, and source note bodies are forbidden public inputs. Its validator contract expects eleven copied source source modules and six negative cases, with the runtime command routed through microcosm_core.organs.batch8_compliance_pipeline_capsule.
The runtime locus writes and validates result records through run, run_batch8_compliance_pipeline_bundle, result_card, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING. The fixture path fixtures/first_wave/batch8_compliance_pipeline_capsule/input and the example bundle examples/batch8_compliance_pipeline_capsule/exported_batch8_compliance_pipeline_capsule_bundle carry the public exercise inputs, source-module manifest, and copied compliance/pipeline source bodies. The manifest currently records source_import_class: copied_non_secret_macro_body, module_count: 11, and body_in_receipt: false.
Validation evidence is the focused test tests/test_batch8_compliance_pipeline_capsule.py, the first-wave result record set under receipts/first_wave/batch8_compliance_pipeline_capsule/, the sign-off result record result records/sign-off/first_wave/batch8_compliance_pipeline_capsule_fixture_acceptance.json, the runtime-shell exported validation result record under receipts/runtime_shell/demo_project/organs/batch8_compliance_pipeline_capsule/, and the verifier cycle result record state/microcosm_verifier/receipts/20260604T0346Z_batch8_compliance_pipeline_capsule_cycle.json. Those result records can show pass status, exact-copy digest/anchor checks, stable negative cases, no-write behavior, secret/body exclusion scans, and body_in_receipt: false; they do not become full compliance-ledger freshness, pipeline dispatch, external model access, source-file changes, public sharing, launch, or whole-system correctness authority.
Reader Evidence Routing
Bundle route: read core/paper_module_capsules.json::paper_modules[60] before treating this Markdown as explanation.
Generated route: inspect paper_modules/batch8_compliance_pipeline_capsule.json for the current generated instance (relationship graph, diagram availability, and lattice position).
Bundle route: inspect examples/batch8_compliance_pipeline_capsule/exported_batch8_compliance_pipeline_capsule_bundle for copied compliance and pipeline source refs.
Runtime route: run tests/test_batch8_compliance_pipeline_capsule.py and the commands in ## Validation Result record Path for recomputation evidence.
Prior Art Grounding
This bundle borrows from control-assessment, policy-as-code, provenance, and observability practice. Useful anchors include:
NIST SP 800-53 Rev. 5, as a control-catalog pattern for naming, assessing, and reporting control posture.
Open Policy Agent, as a general-purpose policy engine pattern for evaluating structured inputs without embedding every rule in the caller.
SLSA provenance, for treating artifact origin and process metadata as explicit attestations.
OpenTelemetry, for instrumentation patterns around pipeline stages, traces, metrics, and logs.
Microcosm borrows the scanner, policy, provenance, and pipeline-stage shape, but the component only validates bounded no-write behavior and pure helper mechanics. It stays with bounded registry/helper checks; broader compliance refresh, provider work, source-record changes, and complete branch certification are outside this fixture.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the bounded compliance/pipeline exercise result record and sign-off JSON. The bundle command validates copied compliance and pipeline source modules, manifest digests, observed negative cases, result record body scans, and public/private boundary checks. The focused test confirms the no-write runtime boundary, bundle validation, omission posture, and scope limit.
This result record path is reader-verifiable evidence only. It does not refresh the full compliance ledger, dispatch bridge or provider work, change source records, certify every compliance branch, authorize public sharing, or approve launch.
Scope boundary
Scope limit
The bundle validates registry shape, bounded no-write compliance checks, baseline scanner truth accounting, and pure pipeline helper behavior. It does not refresh the full compliance ledger, dispatch bridge/provider work, change source records, or certify every compliance and pipeline branch.
Scope limit
This paper module can claim a compliance pipeline fixture with a diagram view generated for this module and a navigable atlas card. It can explain registry shape checks, bounded no-write compliance probes, scanner truth accounting, pure pipeline helper behavior, and metadata-only result records.
It cannot claim full compliance-ledger refresh, bridge or external model access, source-record changes, complete compliance branch certification, public sharing, launch, or whole-system correctness.
Set 8 Policy Engines BundleSet 8 Policy Engines Bundle validates three deterministic public policy-engine exercises without running campaigns, providers, markets, or repository mutations.
Set 8 Policy Engines Bundle imports Lab contract audit red/green gating, market-fusion fail-closed claim preflight, and campaign dispatch transition adjudication as exact copied source bodies with bounded public exercises. It checks source-module manifests, stable negative cases, exercise outcomes, source digests, and scope limits while excluding live campaigns, external model access, repository mutation, private artifact export, market-level conclusions, launch, public sharing, and whole-system safety.
Scope limit Deterministic public policy-engine fixture evidence and copied source source refs only; no live campaign execution, external model access, repository mutation, private artifact export, market validation, publishing-scope decision, launch-scope decision, or whole-system correctness.
This component imports three Set-8 policy engines as exact copied source source bodies plus bounded public exercises: Lab contract audit red/green gating, market-fusion fail-closed claim preflight, and campaign dispatch transition adjudication.
The bundle is source-open and bounded. It exercises deterministic policy mechanics over synthetic public fixtures. It does not run live campaigns, use external model services, mutate repositories, export private artifacts, claim market-level conclusions, authorize public sharing, or approve launch.
Purpose
The three engines copied here share one design idea: a machine-checkable gate that runs before any judgement, model call, or downstream action, and refuses by default rather than passing on absent evidence. The bundle answers a single question for a cold reader: do these copied gate bodies still make the same deterministic decisions when run against public fixture inputs?
The Lab contract audit reads persisted Lab node artifacts from disk and applies fixed structural rules: a ban on question marks in compute-node outputs (an output carrying a ? is treated as an unresolved hedge, not an answer), tuple formatting and two-sentence annotation rules, exact thesis inheritance between nodes, prediction targets grounded against an allowed set, and contradiction reconciliation. Any hard fail flips the report from green to red. The interesting choice is that this audit is deterministic and runs ahead of any semantic interpretation, so a runtime gate can fail closed on structure without asking a model whether the output looks right.
The market-fusion readiness gate decides whether a consumer may turn raw feed presence into a cross-feed claim. Every registered candidate situation is currently set to refuse, each for named, specific reasons (a missing provider, an absent event window, relation edges that are not measurement-conditioned). An unregistered situation also refuses, but with the distinct reason candidate_situation_gate_missing. That distinction is the point: the gate fails closed on anything it has not explicitly reasoned about, and the bundle checks that a registered refusal and a fail-closed refusal stay legible as different things.
The campaign dispatch adjudicator is a small state machine over a fixed table of legal status transitions. It returns legal_transition for an allowed move, already_target for a no-op, and raises an error for an illegal one. Its load- bearing rule is that completed is terminal: a completed dispatch cannot move back to running without an explicit superseding event.
Shape
Read this module as a bounded evidence pipeline: the JSON bundle names the paper-module authority, runtime locus, standard, and generated projections; the runtime exercises copied policy sources against public fixtures; the tests and result record commands verify those fixture mechanics and scope boundaries. Everything below the bundle is reader or navigation evidence, not live policy, source-file changes, market, public sharing, provider, production, or launch-scope decision.
flowchart TD bundle["Copied source source bodies lab_contract_audit.py market_fusion_readiness.py campaign_state_transition.py"] fixtures["Public synthetic fixtures Lab node artifacts, candidate claims, dispatch status pairs"] subgraph Lab["Lab contract audit"] labrun["compute_lab_contract_audit question-mark ban, tuple/annotation, thesis inheritance, target grounding"] labgreen["green no hard fails"] labred["red QUESTION_MARK_OUTPUT and others"] end subgraph Market["Market-fusion readiness"] mkrun["preflight_candidate_situation"] mknamed["refuse: named reasons registered situation"] mkmissing["refuse: candidate_situation_gate_missing fail-closed default"] end subgraph Campaign["Campaign dispatch adjudicator"] cprun["validate_dispatch_transition"] cplegal["legal_transition / already_target"] cpillegal["CampaignTransitionError completed is terminal"] end exercises["Bundle evaluator three engines must pass, three stable negative cases"] ceiling["Scope limit fixture evidence and copied source refs only no live campaign, provider, market, repo, or launch-scope decision"] bundle --> labrun bundle --> mkrun bundle --> cprun fixtures --> labrun fixtures --> mkrun fixtures --> cprun labrun --> labgreen labrun --> labred mkrun --> mknamed mkrun --> mkmissing cprun --> cplegal cprun --> cpillegal labred --> exercises mkmissing --> exercises cpillegal --> exercises exercises --> ceiling
Reader Evidence Routing
Bundle route: read core/paper_module_capsules.json::paper_modules[61] before treating this Markdown as explanation.
Generated route: inspect paper_modules/batch8_policy_engines_capsule.json for the current generated instance of this module.
Bundle route: inspect examples/batch8_policy_engines_capsule/exported_batch8_policy_engines_capsule_bundle for the three copied source policy sources.
Runtime route: run tests/test_batch8_policy_engines_capsule.py and the commands in ## Validation Result record Path for recomputation evidence.
Prior Art Grounding
This bundle borrows from policy-as-code, risk-management, and market-claim boundary practice. Useful anchors include:
Open Policy Agent, which treats policy as a separately evaluated engine over structured input.
NIST's AI Risk Management Framework, whose govern/map/measure/manage posture is a useful precedent for explicit risk gates and red/green decision surfaces.
The CFTC's prediction markets explainer, as a boundary reminder for market-facing claims and event-contract language.
Microcosm borrows the deterministic policy-gate and market-claim-preflight shape, but keeps the component to fixture inputs and copied public source. It does not run campaigns, use external model services, claim market-level conclusions, mutate repositories, or approve launch.
Source Modules
The exported bundle copies the relevant source sources under examples/batch8_policy_engines_capsule/exported_batch8_policy_engines_capsule_bundle/source_modules/. Result records carry source refs, digests, anchors, counts, and exercise outcomes, not copied body text or private state.
Mechanism Set
The validator requires exactly these three engine rows: Lab contract audit deterministic red gate, market-fusion readiness fail-closed gate, and campaign dispatch status transition adjudicator. The source module manifest requires three exact copied source source modules. The fixture requires three stable negative cases, one per engine row.
Each engine exercise runs the copied body and checks a concrete decision, so a silent change in gate behaviour shows up as a blocked exercise:
Lab contract audit: a green artifact set must return green, and the same set with a banned ? injected into a compute-node output must return red with QUESTION_MARK_OUTPUT in its hard fails. The negative case BATCH8_LAB_CONTRACT_QUESTION_MARK_RED_GATE confirms the red gate fires.
Market-fusion readiness: a registered candidate situation must refuse with named reasons, while an unregistered situation and a malformed payload must both refuse with candidate_situation_gate_missing. The negative case BATCH8_MARKET_FUSION_MISSING_GATE_REFUSED confirms the fail-closed default.
Campaign dispatch adjudicator: candidate -> blocked is a legal_transition, completed -> completed is already_target, and completed -> running raises a terminal-state error. The negative case BATCH8_CAMPAIGN_COMPLETED_TO_RUNNING_REFUSED confirms the refusal.
Shared registry, sign-off, runtime-shell, CLI, atlas, package-data, and generated docs wiring is intentionally deferred while the existing shared Microcosm core lease is active.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the bounded policy-engine result record and sign-off JSON. The bundle command validates copied source policy sources, manifest digests, negative cases, source-body exclusion, and scope limit posture. The focused test checks deterministic red/green gates, bundle validation, private-boundary scans, and the no-launch scope limit.
This result record path is reader-verifiable evidence only. It does not run live campaigns, use external model services, mutate repositories, validate markets, certify whole system safety, authorize public sharing, or approve launch.
Scope boundary
Scope limit
This is deterministic public-system evidence over fixture inputs only. It is not Lab correctness, not live campaign execution authority, not market validation, not whole-system safety, not repository mutation authority, not external model access, and not launch-scope decision.
Scope limit
This paper module covers a bounded policy-engines fixture. A diagram view and atlas card are generated for this module. It can explain deterministic policy checks over public fixture inputs and metadata-only source-module result records.
It cannot claim Lab correctness, live campaign execution authority, market validation, whole-system safety, repository mutation authority, external model access, publishing-scope decision, launch-scope decision, or private-system equivalence.
Set 8 Structural Theses BundleSet 8 Structural Theses Bundle validates public synthetic thesis-family replay without financial decisions, live-market validation, external model access, or portfolio authority.
Set 8 Structural Theses Bundle imports tools/finance/structural_theses.py as exact copied source source and exercises CP1/CP2 thesis-family validation over public synthetic winner, loser, and control rows. It checks source digest parity, anchors, public family evidence, loser evidence, negative controls, survivor-only rejection, forward-gate-breach rejection, control-leak rejection, runtime verdicts, and scope limits while excluding financial decisions, investment recommendations, live market data, external model access, portfolio action, launch, public sharing, and whole-system correctness.
Scope limit Deterministic fixture evidence over public synthetic thesis rows and copied source refs only; no financial decisions, investment recommendation, live-market validation, external model access, portfolio authority, publishing-scope decision, launch-scope decision, or whole-system correctness.
This component imports tools/finance/structural_theses.py as exact copied source source and exercises it over public synthetic structural-thesis fixtures.
The bundle is bounded to replayable CP1/CP2 thesis-family validation. It excludes financial decisions, investment recommendations, live market data, external model access, portfolio action, public sharing, or launch.
Purpose
The copied source, tools/finance/structural_theses.py, takes a tempting idea and disciplines it. The tempting idea is that some market moves look structurally obvious, so a corpus of "obvious" theses ought to predict the next one. The trap is survivorship: it is easy to assemble a list of patterns that worked in hindsight and call the list a method.
The single question the source answers is narrower and harder. Given claims that looked structurally obvious at the time they were written, which reasoning families still survive once you resolve every claim forward and keep the ones that failed? The load-bearing inversion is that "obvious" is treated as a claim-status frozen at commitment time, never as a label applied to outcomes afterwards. A thesis whose meaning shifts once the result is known is a post-hoc mutation, and the leakage guard rejects it.
What is unusual is that losers and negative controls are first-class, required evidence rather than noise. A refuted thesis must flow through the same pipeline as a confirmed one and stay legible as valid evidence; a negative control must be present and must not resolve into a confirmed claim. The output vocabulary deliberately has no tradable "winner": the strongest a surviving pattern can earn is review_candidate, a flag for human review and nothing more.
This bundle does not assert any of those findings as market-level conclusions. It imports the source verbatim, runs it over public synthetic rows, and checks that the discipline holds. It is not financial decisions, an investment recommendation, or live-market validation.
What it validates
The component loads the copied finance source, builds one public winner, loser, and control family from a synthetic probe, and then exercises the source's own validator over both the clean family and three deliberately broken variants.
The clean path confirms the at-time semantics survive a full run: the winner resolves claim_confirmed_forward, the loser resolves claim_refuted_forward and is marked valid evidence, the control resolves as a control without becoming a confirmed claim, the surviving pattern lands in family memory as a candidate_set, and the authority boundary keeps investment_recommendation_authorized false. Under the hood the source maps each thesis onto the existing forecast-claim shape and drives the real CP1 admission, CP2 resolution, proper-scoring replay, and purged walk-forward replay with deterministic fixture prices rather than building a new evaluator.
The three negative exercises are the substance of the proof, because each one forces a specific discipline to fire:
Survivor-only. A family built from winners alone, with no failed thesis, must be rejected. The source raises NO_LOSER_FLOWED_THROUGH, NO_NEGATIVE_CONTROL, and SURVIVORSHIP_SAMPLE; the component confirms all three appear (error code BATCH8_STRUCTURAL_THESES_SURVIVOR_ONLY_REJECTED).
Forward-gate breach. A refuted pattern is smuggled into the forward review candidates. The source must raise FORWARD_GATE_BREACH, because only a pattern that survived at-time replay may produce a review_candidate (BATCH8_STRUCTURAL_THESES_FORWARD_GATE_BREACH_REJECTED).
Control leak. A negative control is mutated to claim it confirmed forward. The source must raise CONTROL_LEAK (BATCH8_STRUCTURAL_THESES_CONTROL_LEAK_REJECTED).
If any of these refusals fails to fire, the component records a blocked finding rather than a pass. Alongside the family check it verifies exact digest parity and required anchors for the copied source, so the page cannot drift away from the code it claims to exercise. Result records carry verdicts, counts, error codes, and refs only; copied bodies, market data, and model-output data stay out.
Shape
This module's shape is bundle-first and projection-bounded. The source row is core/paper_module_capsules.json::paper_modules[63:paper_module.batch8_structural_theses_capsule]; the generated JSON instance is paper_modules/batch8_structural_theses_capsule.json, and it preserves source_authority: json_capsule.
flowchart TD Bundle["JSON source record core/paper_module_capsules.json[63]"] --> Runtime["Runtime locus components/batch8_structural_theses_capsule.py"] Source["Exact copied source tools/finance/structural_theses.py"] -->|digest + anchor parity| Runtime Probe["Public synthetic probe winner, loser, control rows plus realized returns"] --> Runtime Runtime --> Build["build_structural_thesis_family CP1 admit forward-only CP2 resolve vs frozen criterion proper-scoring + purged replay"] Build --> Clean["validate_structural_thesis_family on the clean family"] Clean --> CleanCheck{"Winner confirmed, loser refuted + valid evidence, control not confirmed?"} Runtime --> Neg["Three broken variants"] Neg --> Survivor["Survivor-only family NO_LOSER_FLOWED_THROUGH NO_NEGATIVE_CONTROL SURVIVORSHIP_SAMPLE"] Neg --> Forward["Refuted pattern smuggled into forward candidates FORWARD_GATE_BREACH"] Neg --> Control["Control mutated to confirmed CONTROL_LEAK"] CleanCheck -->|yes| Pass["Bounded pass result record"] CleanCheck -->|no| Block["Blocked finding"] Survivor -->|refusal fires| Pass Forward -->|refusal fires| Pass Control -->|refusal fires| Pass Survivor -->|refusal missing| Block Forward -->|refusal missing| Block Control -->|refusal missing| Block Pass --> Ceiling["Scope limit public synthetic fixture + copied source only"] Ceiling -. forbids .-> NoClaims["No advice, recommendation, live market data, external model access, portfolio action, public sharing, launch"]
The standards lane is split deliberately. The module-specific public runtime standard, standards/std_microcosm_batch8_structural_theses_capsule.json, governs the fixture fields, public/private boundary, result record contract, validator command, negative-case count, and explicit anti-purpose. The wider codex/standards/std_microcosm.json::paper_module_coverage_contract governs how paper-module coverage, Atlas cards, generated Mermaid, and context-pack depth stay navigable without promoting generated projections into source truth.
The runtime/source lane is likewise bounded. The Microcosm component src/microcosm_core/organs/batch8_structural_theses_capsule.py loads the copied structural-theses source, builds the winner/loser/control family, evaluates survivor-only, forward-gate-breach, and control-leak negative exercises, and writes metadata-only result records. The exported bundle at examples/batch8_structural_theses_capsule/exported_batch8_structural_theses_capsule_bundle contains source_module_manifest.json; that manifest records 12 exact copied source modules for bundle validation, including source_modules/tools/finance/structural_theses.py, while the first-wave result record narrows the copied-source proof to the structural-theses module itself.
The proof lane is fixture-level. The public fixture input under fixtures/first_wave/batch8_structural_theses_capsule/input and the focused regression tests/test_batch8_structural_theses_capsule.py validate digest and anchor parity, thesis-family replay, winner/loser/control semantics, stable negative cases, body exclusion, scope limits, and the runtime-shell bundle path. Result record evidence lives under receipts/first_wave/batch8_structural_theses_capsule/, result records/sign-off/first_wave/batch8_structural_theses_capsule_fixture_acceptance.json, and receipts/runtime_shell/demo_project/organs/batch8_structural_theses_capsule/exported_batch8_structural_theses_capsule_bundle_validation_result.json.
The generated Mermaid and Atlas statuses are useful only as navigation result records: available_from_capsule_edges and linked_from_capsule_edges mean the JSON bundle edges are walkable. They do not authorize financial decisions, investment recommendations, live-market validation, external model access, portfolio action, public sharing, launch, private-system equivalence, or whole-system correctness.
Reader Evidence Routing
Bundle route: read core/paper_module_capsules.json::paper_modules[63] before treating this Markdown as explanation.
Generated route: inspect paper_modules/batch8_structural_theses_capsule.json for current generated state.
Bundle route: inspect examples/batch8_structural_theses_capsule/exported_batch8_structural_theses_capsule_bundle for copied source refs and digest evidence.
Runtime route: run tests/test_batch8_structural_theses_capsule.py and the commands in ## Validation Result record Path.
Prior Art Grounding
This bundle borrows from empirical-finance validation and bias-control patterns. Useful anchors include:
Fama and French's common risk factors work and data-library tradition, as a precedent for decomposing structural market claims into named factor families and testable rows.
MacKinlay's event-study methodology, as a prior pattern for separating an event window, expected baseline, and abnormal-return evidence.
Brown, Goetzmann, Ibbotson, and Ross on survivorship bias, which motivates explicit loser/control cases rather than winner-only thesis replay.
Microcosm borrows the factor-family, event-window, and bias-control shape, but keeps the component to public synthetic thesis rows and copied source. It is not financial decisions, an investment recommendation, live-market validation, portfolio authority, publishing-scope decision, or launch-scope decision.
Source Reference
The exported bundle copies tools/finance/structural_theses.py under examples/batch8_structural_theses_capsule/exported_batch8_structural_theses_capsule_bundle/source_modules/. Result records carry refs, digests, anchors, counts, and runtime verdicts, not copied body text, model-output data, market data, or private runtime state.
Mechanism Set
The validator requires exact source digest parity, structural-thesis source anchors, a public winner/loser/control family, valid loser evidence, a negative control that does not become a confirmed claim, and rejection of survivor-only, forward-gate-breach, and control-leak exercises. Shared registry, sign-off, runtime-shell, CLI, atlas, package-data, and generated docs wiring is intentionally deferred while shared Microcosm core leases are active.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the bounded thesis-family result record and sign-off JSON. The bundle command validates copied source refs, digest anchors, public winner/loser/control cases, negative controls, body-exclusion posture, and scope limit fields. The focused test checks fixture validation, bundle validation, survivor-bias refusal, control-leak refusal, and claim boundaries.
This result record path is reader-verifiable evidence only. It is not financial decisions, not an investment recommendation, not live-market validation, not external model access, not portfolio authority, not publishing-scope decision, and not launch-scope decision.
Scope boundary
Scope limit
This is deterministic fixture evidence over public synthetic thesis rows and exact copied source only. It is not advice, not an investment recommendation, not live-market validation, not external model access, not portfolio authority, not publishing-scope decision, and not launch-scope decision.
Scope limit
This paper module demonstrates a bounded structural-theses fixture: deterministic validation over public synthetic thesis rows, exact copied source refs, and metadata-only result records. A diagram view is generated for this module and it appears in the Atlas navigation surface.
It cannot claim advice, investment recommendation, live-market validation, external model access, portfolio authority, publishing-scope decision, launch-scope decision, private-system equivalence, or whole-system correctness. Higher claims must be authorized by the JSON bundle and generated projection state first.
Set 8 Tools-Tail Primitives BundleSet 8 Tools-Tail Primitives Bundle validates four public tools-tail primitive exercises without Oracle truth, external model access, live bridge work, or repository mutation authority.
Set 8 Tools-Tail Primitives Bundle imports observer set diffing, JSON patch interpretation, ledger identity hashing, and shadow envelope parse coverage as exact copied source source bodies with bounded public exercises. It checks source-module manifests, stable negative cases, exercise outcomes, mechanism rows, digests, and scope limits while excluding GodMode execution, external model access, live bridge work, repository mutation, private lab artifact export, oracle truth, launch, public sharing, and whole-system correctness.
Scope limit Deterministic public tools-tail fixture evidence and copied source source refs only; no oracle truth, prediction correctness, semantic edit correctness, Lab execution authority, live bridge authority, repository mutation authority, external model access, launch-scope decision, publishing-scope decision, or whole-system correctness.
This component imports four Set-8 tools-tail primitives as exact copied source source bodies plus bounded public exercises: observer set diffing, JSON patch interpretation, ledger identity hashing, and shadow envelope parse coverage.
The bundle is intentionally source-open and bounded. It exercises pure mechanics over synthetic public fixtures. It does not run GodMode, use external model services, execute live bridge work, mutate repositories, export private lab artifacts, claim oracle truth, authorize public sharing, or approve launch.
Purpose
When a piece of tooling is copied from the private system into the public system, the obvious question is whether the copy still behaves the way the original did, or whether it has quietly drifted into a stub that only looks right. This bundle answers that one question for four small "tools-tail" primitives: does the copied source body, when run on a fixed public input, still produce the exact output the original would?
The unusual choice here is that the bundle does not re-describe the primitives or re-implement them. It loads the copied module straight from the exported bundle and runs the real functions, then checks the results against hard-coded expected values. If the copy were a hollow shell, the assertion would fail rather than pass with a green tick. The evidence is therefore behavioural, not merely a digest match: the code is executed, not just hashed.
What it deliberately does not do is treat any of that execution as truth about the world. Diffing two sets of observer rows is set arithmetic, not a claim that either set is correct. Applying a JSON patch is interpreting an edit script, not a claim that the edit is the right one. The bundle keeps the gap between "the mechanism runs as copied" and "the answer is correct" explicit, which is why the scope limit refuses oracle truth, prediction correctness, and semantic edit correctness even though real code ran.
flowchart TD bundle["JSON source record core/paper_module_capsules.json[64] source basis: source record"] instance["Generated JSON instance paper_modules/batch8_tools_tail_primitives_capsule.json 20 edges; 0 unresolved selective relations"] markdown["Reader projection paper_modules/batch8_tools_tail_primitives_capsule.md"] standard["Local standard standards/std_microcosm_batch8_tools_tail_primitives_capsule.json"] runtime["Runtime/source locus src/microcosm_core/components/batch8_tools_tail_primitives_capsule.py loads copied modules, runs four exercises, checks exact output"] exercises["Four primitive exercises observer set diff | JSON-patch VM ledger-id hash | shadow envelope parse each: accept path + negative case"] fixture["Public fixture input fixtures/first_wave/batch8_tools_tail_primitives_capsule/input four primitives + negative cases"] bundle["Copied source bundle examples/batch8_tools_tail_primitives_capsule/exported_batch8_tools_tail_primitives_capsule_bundle source_module_manifest.json"] tests["Tests and result records tests/test_batch8_tools_tail_primitives_capsule.py result records/first_wave + sign-off + bundle validation"] projections["Generated navigation Mermaid: available_from_capsule_edges Atlas: linked_from_capsule_edges"] ceiling["Scope limit deterministic public primitive exercises and metadata-only source refs only no oracle truth, semantic edit correctness, live bridge/Lab execution, external model access, repo mutation, public sharing, launch, or whole-system proof"] bundle --> instance bundle --> runtime instance --> projections instance --> markdown standard --> runtime runtime --> bundle bundle --> runtime fixture --> runtime runtime --> exercises exercises --> tests fixture --> tests bundle --> tests tests --> ceiling projections --> ceiling markdown --> ceiling
The bundle explains the batch8_tools_tail_primitives_capsule component and the public tools-tail mechanism, binds the import/projection drift concept plus the principle and axiom edges, and resolves the runtime locus to src/microcosm_core/organs/batch8_tools_tail_primitives_capsule.py. The local standard keeps the evidence to four primitive mechanics: observer set diffs, JSON-patch interpretation, ledger identity hashing, and shadow-envelope parse coverage. Public evidence may include primitive ids, source refs, digests, anchors, counts, stable negative cases, metadata-only result record posture, and scope limits; it must not include private lab artifacts, model-output data, bridge payloads, account or browser state, or account secret-equivalent material.
The fixture path fixtures/first_wave/batch8_tools_tail_primitives_capsule/input and exported bundle examples/batch8_tools_tail_primitives_capsule/exported_batch8_tools_tail_primitives_capsule_bundle hold the public inputs and exact copied source modules. The focused test and result records prove fixture mechanics, bundle validation, negative cases, source-module digest/anchor posture, and no body text in result records. Generated Mermaid and Atlas links only make the bundle edges walkable; they do not authorize live tool execution, bridge work, external model access, repository mutation, publishing-scope decision, launch-scope decision, or whole-system correctness.
How it works
The evaluator loads four copied modules by manifest reference and runs one bounded exercise against each, comparing the live output to a fixed expected value. A primitive passes only when every checked field matches.
Observer set diff. The copied diff_evidence and diff_predictions functions take two lists of rows keyed by id and partition them. For evidence, three lab rows and two oracle rows resolve to one overlap, one missed id, and one extra id; a row with no ledger_id is dropped rather than crashing the diff. For predictions, rows are split into matching, divergent, and missing-target sets. The exercise also asserts the dropped malformed row never appears in the serialised result, so a parse gap cannot leak through as silent data.
Version committer JSON-patch VM. The copied _apply_op interprets a small set of edit operations (set, merge, append) over a nested document by path. The exercise applies four ops, checks the resulting document exactly, and confirms that attempting to traverse into a scalar (/profile/name where profile is a string) raises VersionCommitterError instead of corrupting the document. The interesting property is the refusal: a malformed path is a controlled error, not a partial write.
Ledger-id identity hash. The copied generate_ledger_id produces a stable id from a lane and a record. The exercise checks that the lane alias poly and POLYMARKET normalise to the same canonical lane and hash to the same id, so the id is identity-stable across spelling; an unknown lane falls back to an X_ prefix; and a record missing the identity field its lane requires raises ValueError rather than hashing a blank.
Shadow envelope parser coverage. The copied run parses a small envelope DSL (miner tuples, a spine line, prediction rows) written into a temporary run directory. The exercise feeds it one well-formed line and one malformed tuple per node, then checks that parsing did not hard-fail, that the well-formed rows parsed, and that the malformed tuple was counted as a comma_arity coverage gap. The point is that the parser reports its own coverage holes rather than swallowing them.
Each exercise also has a matching negative case (EXPECTED_NEGATIVE_CASES) that re-runs the same code on input designed to be rejected and confirms the rejection. So for every primitive the page shows both the accepting path and the refusing path. None of these checks open a network, a provider, or the live bridge; they run copied source bodies in process and keep the bodies out of the result records.
Reader Evidence Routing
Bundle route: read core/paper_module_capsules.json::paper_modules[64] before treating this Markdown as explanation.
Generated route: inspect paper_modules/batch8_tools_tail_primitives_capsule.json for the current generated instance.
Bundle route: inspect examples/batch8_tools_tail_primitives_capsule/exported_batch8_tools_tail_primitives_capsule_bundle for the copied source source modules.
Runtime route: run tests/test_batch8_tools_tail_primitives_capsule.py and the commands in ## Validation Result record Path.
Prior Art Grounding
This bundle borrows from standardized patch formats, transparency-log identity patterns, provenance metadata, and parser coverage practice. Useful anchors include:
IETF RFC 6902, which defines JSON Patch operations such as add, remove, replace, move, copy, and test.
IETF RFC 9162, where Certificate Transparency uses an append-only Merkle tree as an auditable log pattern.
W3C PROV, for representing the provenance of derived artifacts and their generating activities.
Microcosm borrows the patch-operation, identity-hash, append-only-log, and provenance shapes, but keeps this bundle at deterministic fixture exercises. It does not claim oracle truth, semantic edit correctness, live bridge authority, external model access, repository mutation authority, or launch-scope decision.
Source Modules
The exported bundle copies the relevant source sources under examples/batch8_tools_tail_primitives_capsule/exported_batch8_tools_tail_primitives_capsule_bundle/source_modules/. Result records carry source refs, digests, anchors, counts, and exercise outcomes, not copied body text or private state.
Mechanism Set
The validator requires exactly these four mechanism rows: observer set diff kernel, version-committer JSON patch VM, ledger-id identity hash engine, and shadow envelope DSL parser coverage.
The source module manifest requires four exact copied source source modules. The fixture requires four stable negative cases, one per mechanism row. Shared registry, sign-off, runtime-shell, CLI, atlas, and generated docs wiring is intentionally deferred while the existing shared Microcosm core lease is active.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the bounded tools-tail primitives result record and sign-off JSON. The bundle command validates copied source sources, manifest digests, observer-diff, JSON-patch, ledger-id, and shadow-envelope exercises, body-exclusion posture, and scope limit fields. The focused test checks fixture mechanics, bundle validation, negative cases, and the no-live-bridge scope limit.
This result record path is reader-verifiable evidence only. It is not oracle truth, not prediction correctness, not semantic edit correctness, not live bridge or Lab execution authority, not external model access, not repository mutation authority, not publishing-scope decision, and not launch-scope decision.
Scope boundary
Scope limit
This is deterministic public-system evidence over fixture inputs only. It is not oracle truth, not prediction correctness, not semantic edit correctness, not provenance by itself, not Lab execution authority, not live Oracle bridge authority, not repository mutation authority, not external model access, and not launch-scope decision.
Scope limit
This paper module can claim a tools-tail primitives fixture with a diagram view generated for navigation. It can explain deterministic public-system checks over fixture inputs and metadata-only source-module result records.
It cannot claim oracle truth, prediction correctness, semantic edit correctness, provenance sufficiency by itself, Lab execution authority, live Oracle bridge authority, repository mutation authority, external model access, publishing-scope decision, launch-scope decision, or whole-system correctness.
Set 8 Validator Checker BundleSet 8 Validator Checker Bundle validates selected public checker groups without becoming launch-scope decision or a complete validator-suite proof.
Set 8 Validator Checker Bundle imports the real idea_microcosm validators body and exercises policy well-formedness, status transition judging, private-boundary scanning, zero-failure, specimen, launch-gate, source-bundle, source-shuttle, concurrency, native-guard, launch-root compiler, and no-write validate entrypoint groups. It carries source anchors, public runtime-only evidence, negative cases, and scope limits while excluding launch-scope decision, hosted-public proof, source-file changes, full validator-suite proof, external model access, public sharing, and whole-system correctness.
Scope limit Selected public checker-group fixture and copied validator-source evidence only; no launch-scope decision, hosted-public proof, source-file changes, complete validator-suite proof, external model access authority, publishing-scope decision, launch-scope decision, or whole-system correctness.
This module imports the real self-indexing-cognitive-system/src/idea_microcosm/validators.py body into Microcosm and exercises individual checker functions that were not covered by the earlier status-judge-only import.
Purpose
An earlier import brought across only one entry point from validators.py, the status-judge function. That left most of the validator body imported as text but never actually run. This bundle answers a single question: when the real checker functions are invoked, do they still behave the way their names claim? It picks six groups of checkers from the copied body and runs them, rather than asserting from a distance that the file is correct.
The groups are chosen to span the kinds of judgement the validator makes: whether a status policy blocks a poisoned transition, whether the private boundary scanner finds a planted home path and email address, whether the specimen and launch-gate checkers report zero failures on the existing fixture, and whether the no-write validate(root, write_receipt=False) entry point runs without mutating anything. Each group reaches into a different part of the imported body.
The design choice worth noting is what happens when the private source state is not present. In that case the component does not pretend the checkers passed. It falls back to reading the copied source for the named anchors and marks the remaining engines public_runtime_source_only, recording that as a stated limit rather than a hidden success. The second unusual choice is that the negative cases are judged from the engine outputs themselves, so a check cannot pass merely because a fixture file happens to contain the right error string. Both choices exist to stop a green run from claiming more than it observed.
Prior Art Grounding
This bundle borrows from schema validation, fixture-driven testing, and policy/checker separation. Useful anchors include:
JSON Schema, as a general pattern for declaring structural expectations and validating data instances against them.
pytest fixtures, as a common test pattern for isolating public inputs and expected negative cases.
Open Policy Agent, as a prior art pattern for separating policy evaluation from the application code that invokes it.
Microcosm borrows the validator/checker and fixture-negative-case shape, but keeps this component to bounded checker exercises over copied public source. It is not launch-scope decision, hosted-public proof, source-file changes, or a complete validator-suite proof.
The runtime does not ask the reader to trust the phrase "validator checker." It builds a small checker membrane around a single imported source body and then records how far that membrane reaches.
The source-anchor phase reads examples/batch8_validator_checker_capsule/exported_batch8_validator_checker_capsule_bundle/source_module_manifest.json. That manifest declares one exact copied module under the public bundle-relative locus source_modules/self-indexing-cognitive-system/src/idea_microcosm/validators.py, with a 12,747-line body and digest 4b2d44810cb9db2c5f62fd39da55deb7f20f6bd44ed1a8b0ae4324d38012a1d4. Here the root segment is a manifest-included public synthetic Microcosm root. The private source-root path is lineage-only and remains excluded from public copy; the checker validates the copied bundle body, not live private source. _validator_source_anchor_matrix checks that the copied body still contains the named validator anchors: private_boundary_hits, policy_wellformedness_failures, judge_status_request, _status_collapse_suite_failures, _source_shuttle_specimen_failures, and validate(root: Path).
The checker-exercise phase then runs six bounded engines when source state is available: source anchoring, status-policy judging, private-boundary scanning, specimen checker groups, launch-gate checker groups, and the no-write validate(root, write_receipt=False) witness. In exported-bundle mode, where a public runtime should not import private source state, the same component falls back to copied-source anchor evidence and marks the remaining engines as public_runtime_source_only. That fallback is a scope limit, not a hidden pass-through to private state.
The negative-case phase is semantic rather than fixture-string-only. The component declares six failure modes: missing validator source, policy poisoning, blind private-boundary scanning, missing specimen checkers, missing launch gates, and bypassing the validate entrypoint. evaluate_negative_case observes those cases from the engine outputs, so the tests can prove the negative cases move with runtime evidence instead of passing because a fixture file contains the right error code.
The result record phase uses the shared crown-jewel runner to write result, board, validation, and sign-off artifacts, then result_card deliberately compresses them into an authority floor and body floor. Those card fields keep release_authorized, publication_authorized, provider_dispatch, model_dispatch, source_mutation_authorized, full_validator_suite_freshness_claim, public_clone_or_hosting_authority, and test_completeness_proof false while also preserving body_in_receipt: false.
Shape
Diagram source
flowchart TD A["Fixture input or exported bundle"] --> B["Source manifest validation"] B --> C["Exact copied validators.py digest and required anchors"] C --> D{"Source state available?"} D -- "yes" --> E["Six runtime checker engines"] D -- "no" --> F["Copied-source anchors plus source-only witnesses"] E --> G["Semantic negative-case evaluator"] F --> G G --> H["Crown-jewel result, board, validation, sign-off result records"] H --> I["Result card authority_floor and body_floor"] I --> J["Reader claim: bounded checker membrane, not launch-scope decision"]
Doctrine Relation
The generated JSON row binds this page to mechanism.batch8_validator_checker_capsule.validates_public_validator_checker_capsule and concept.agent_reliability_and_safety_validator_bundle; that relation is bundle-declared rather than inferred from this prose. The bundle also names the axiom refs AX-1, AX-4, AX-5, AX-7, AX-8, AX-11, and AX-12 and the principle refs P-1, P-2, P-5, P-6, P-8, P-9, P-13, and P-15. In this module those refs matter because the component separates evidence from authority, keeps JSON as the navigable contract, prevents body leakage, and refuses to promote a selected checker run into a launch or proof claim.
The dependency edges also explain the reader route. microcosm_axiom_substrate owns the axiom vocabulary this module abides by; engine_room_generated_projection_drift_gate owns the generated-projection freshness posture this page must not bypass; and public_reveal_walkthrough owns the reading lane for result records, source refs, and scope boundaries.
Evidence Model and Limitations
The strongest positive evidence is narrow and useful: the focused regression checks that all expected engines are present, the exact copied source body matches the source source digest, exported-bundle validation does not import source validators, source-anchor corruption blocks validation, result cards omit private bodies, and semantic negative cases fail when runtime evidence is weakened.
The limitations are just as important. Exported-bundle mode validates copied source anchors and public-runtime witness fields; it does not re-run the full source validator suite. The fixture proves selected checker groups and selected negative cases, not all future validator behavior. The copied source body being large does not itself increase the claim; only the named anchors, engines, digests, negative cases, and result record fields are evidence. A green run therefore supports a bounded checker-membrane claim and nothing broader.
Reader Evidence Routing
Bundle route: read core/paper_module_capsules.json::paper_modules[65] before treating this Markdown as explanation.
Generated route: inspect paper_modules/batch8_validator_checker_capsule.json for current relationship state and projection details.
Bundle route: inspect examples/batch8_validator_checker_capsule/exported_batch8_validator_checker_capsule_bundle for copied validator source refs and digest evidence.
Runtime route: run tests/test_batch8_validator_checker_capsule.py and the commands in ## Validation Result record Path.
Exercised checker groups
Policy well-formedness and status transition judging.
Private boundary scanning without putting private body text into result records.
Status collapse, internal control, correction, self-comprehension, task-ledger, and atlas navigation specimen checkers.
The no-write validate(root, write_receipt=False) entrypoint.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the bounded validator-checker result record and sign-off JSON. The bundle command validates copied checker source, manifest digests, selected checker-group exercises, body-exclusion scans, and scope limit fields. The focused test checks fixture validation, bundle validation, private-boundary scanning, and the no-complete-suite-proof scope limit.
This result record path is reader-verifiable evidence only. It does not establish the complete validator suite, authorize source-file changes, provide hosted-public proof, dispatch providers, authorize public sharing, or approve launch.
Scope boundary
Scope limit
The bundle is not launch-scope decision, not hosted-public proof, not source-file changes, and not a complete validator-suite proof.
Scope limit
This paper module can claim a bounded validator/checker fixture with a diagram view and Atlas navigation generated for it. It can explain the declared checker groups, no-write validation entrypoint, and metadata-only result record boundary.
It cannot claim launch-scope decision, hosted-public proof, source-file changes, complete validator-suite proof, publishing-scope decision, provider authority, or whole-system correctness. Any broader checker claim must be grounded in the JSON bundle and its generated projection.
Set 12 Market Dashboard Read-Model BundleSet 12 Market Dashboard Read-Model Bundle validates copied read-model helpers over public fixtures without market-level conclusions, external model access, or launch-scope decision.
Set 12 Market Dashboard Read-Model Bundle runs market-dashboard read-model over public synthetic fixtures. It validates market dashboard import stubs, validator-case derivation, runtime feed freshness overlays, related-situation rows, source anchors, digest checks, negative cases, and scope limits while excluding launch-scope decision, external model access, private-system equivalence, live market-level conclusions, investment-related actions, public sharing, and whole-system correctness.
Scope limit Fixture-bound market-dashboard read-model evidence and copied source refs only; no launch-scope decision, external model access, private-system equivalence, market-level conclusions, investment-related actions, publishing-scope decision, or whole-system correctness.
The underlying source module compiles a generated market-situation graph into a backend read model: a trust strip, a ranked situation queue, a detail index, a graph slice, facets, drilldowns, and an API contract. The read model is the shape a dashboard consumes. It runs the copied read-model helpers over small synthetic fixtures and asks one question: does the read-model layer hold its own claim boundary, or does it quietly become a market-truth or advice surface?
The interesting part is what the validator refuses rather than what it accepts. A presentation layer is the easy place for an overclaim to leak in: a label like "strong buy", an auto_apply_allowed flag left true, a freshness state that reports green from a stale or missing artifact. The copied validate_market_dashboard_read_model scans for trading and action-claim language, requires oracle_evolve.auto_apply_allowed to be false and review_gated to be true, requires no_advice_mode to be enabled, and requires the silent-omission count to be zero. The bundle drives those checks with fixtures designed to trip each one, then records whether the source actually flagged them.
The other two mechanisms guard the read path itself. A feed-freshness overlay classifies the current run into a small set of honest states so historical green proof cannot stand in for live-feed capability, and a related-situations scorer groups situations by shared entities or matching type without inventing links. Everything is fixture-bound: there is no live market data, no external model access, and no investment-related actions anywhere in scope.
Mechanisms
validate_market_dashboard_read_model
_runtime_feed_freshness_overlay
_related_situations
What the checks do
validate_market_dashboard_read_model is the structural and overclaim gate. It first checks the read model is well formed: the schema version matches, every situation in the queue resolves to a detail entry, every graph-slice edge points at a node that exists, and each drilldown source-ref returns metadata only with no arbitrary file read and no .. traversal in its route. It then enforces the claim boundary. auto_apply_allowed must be false, review_gated must be true, no_advice_mode must be enabled, the silent-omission count must be zero, and any copied source text is scanned for trading or action-claim language (buy, sell, short, price target, stop loss, and similar). The bundle feeds it five negative fixtures, one per failure shape, and confirms the source emits the matching error string for each. A read model that passed these checks but stayed silent on a planted overclaim would be the real failure, so the bundle treats a missing error as a finding.
_runtime_feed_freshness_overlay reads a per-run readiness summary and reports one of three honest states. fresh_green_feed requires the run to be ready, all targets met, no blockers, and same-day generation. stale_green_feed is artifact-backed but no longer same-day. blocked_missing_artifact covers the run that is missing its readiness file, falls short on targets, or carries blockers. The point is that a stale or absent run never reports green: historical proof cannot stand in for live-feed capability, and the state carries a plain truth-statement saying so. The bundle writes synthetic readiness files for each case and checks the classifier returns the expected state.
_related_situations builds the "see also" cohort for a situation. It collects other situations that either share an entity or match the situation type, ranks them, excludes the focus situation itself, and caps the list at six. The bundle checks one boundary case in particular: a situation with no entity overlap and a different type produces an empty cohort rather than a spurious link.
Shape
Source refs
Validate market dashboard read model
validate_market_dashboard_read_model
Blocked missing artifact
blocked_missing_artifact
Diagram source
flowchart TD A["Synthetic dashboard, freshness, related fixtures"] --> B["Copied read-model helpers (market_dashboard_read_model.py)"] B --> C["validate_market_dashboard_read_model"] C --> C1["Structure: schema, queue-to-detail, graph edges, drilldown route safety"] C --> C2["Scope limit: no auto-apply, review-gated, no-advice, no trading language, zero silent omissions"] B --> D["_runtime_feed_freshness_overlay"] D --> D1["fresh_green_feed"] D --> D2["stale_green_feed"] D --> D3["blocked_missing_artifact"] B --> E["_related_situations"] E --> E1["Entity overlap or type match; self-excluded, capped at six; no overlap means empty"] C1 --> F["metadata-only result record and card (refs, digests, counts, verdicts)"] C2 --> F D1 --> F D2 --> F D3 --> F E1 --> F
Reader Evidence Routing
Start with paper_modules/batch12_market_dashboard_read_model_capsule.json for bundle-derived source authority, then read this Markdown as the explanatory projection. Use examples/batch12_market_dashboard_read_model_capsule/exported_batch12_market_dashboard_read_model_capsule_bundle/source_module_manifest.json to inspect copied-source digest status before opening copied source modules. Use tests/test_batch12_market_dashboard_read_model_capsule.py to verify the fixture and bundle expectations.
The useful evidence is dashboard read-model accounting over synthetic public fixtures: validation rows, freshness overlays, related-situation joins, negative cases, metadata-only result records, and scope limit fields.
Prior Art Grounding
The component is grounded in CQRS/read-model and dashboard-observability patterns: derive presentation-ready projections from source data, make freshness visible, and keep the read surface separate from mutation authority. Useful anchors include:
Microsoft's CQRS pattern, where read models are optimized for queries and presentation rather than command handling.
Grafana dashboards, which query and transform data sources into operational panels.
Microcosm borrows the read-model shape for dashboard validation, runtime feed freshness overlays, and related-situation joins. The result is fixture-bound mechanism evidence; it does not become market-level conclusions, external model access, investment-related actions, or launch-scope decision.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the dashboard read-model result record and sign-off JSON. The bundle command validates copied source system, manifest digests, freshness overlay rows, related-situation joins, negative cases, and metadata-only result record posture. The focused test checks fixture validation, bundle validation, digest/anchor coverage, and scope limits.
This result record path is reader-verifiable evidence only. It excludes launch, external model access, private-system equivalence, market-level conclusions, investment-related actions, or whole-system correctness.
Scope boundary
Scope limit
This module may claim public fixture evidence that the copied source system produced market-dashboard read-model rows, runtime feed freshness overlays, related-situation joins, negative-case checks, metadata-only result record posture, and validation result records over synthetic inputs.
This module may not claim launch-scope decision, external model access, private-system equivalence, live market-level conclusions, investment-related actions, deployment posture, source-file changes, publishing-scope decision, or whole-system correctness.
Scope limit
This is fixture-bound market-dashboard read-model mechanism evidence. It excludes launch, external model access, private-system equivalence, market-level conclusions, investment-related actions, deployment posture, source-file changes, publishing-scope decision, or whole-system correctness.
Set 12 Prediction Market Board BundleSet 12 Prediction Market Board Bundle validates copied prediction-board and quant-mart diagnostics over public fixtures without market-level conclusions or provider authority.
Set 12 Prediction Market Board Bundle runs prediction-market board and quant-mart diagnostics over public synthetic fixtures. It validates prediction-market joins, Polymarket identity by slug, provider drift monitors, missingness boards, prior green deltas, source lifecycle vintage enrichment, source-module digests, negative cases, and scope limits while excluding launch-scope decision, external model access, private-system equivalence, market-level conclusions, provider truth, investment-related actions, public sharing, and whole-system correctness.
Scope limit Fixture-bound prediction-board and quant-mart diagnostic evidence plus copied source refs only; no market-level conclusions, provider truth, investment-related actions, external model access, private-system equivalence, launch-scope decision, publishing-scope decision, or whole-system correctness.
Market and source dashboards have a recurring failure: a row looks like a fact when it is really a guess. A duplicate listing inflates a volume figure, an unmatched market slug grows a fabricated identity, a feed reports zero rows but the board shows it as healthy, and a "change since last time" number appears even when there is no prior baseline to compare against. The single question this component answers is whether the copied presentation-mart logic keeps those distinctions honest when run over public synthetic inputs.
It does that by importing the real quant_presentation_mart helper body and running it against fixtures that are built to expose each trap, then asserting the exact diagnostic the body should produce. The interesting choice is that the board never asserts what a market price means. It computes accounting about the data: which event a market belongs to, whether its identity was actually matched, how providers drifted, where rows went missing, and whether a vintage date is genuinely present. Aggregation is deliberately conservative. A missing value stays missing rather than defaulting to a confident zero, and an unmatched slug is reported as missing_from_feed_artifact instead of being given a synthetic event id.
The result is fixture-bound evidence, not a forecast. The board is a diagnostic surface over public synthetic rows. It does not read live markets, use external model services, or claim that any number is tradeable.
Mechanisms
_prediction_market_board
_polymarket_identity_by_slug
_provider_drift_monitor
_missingness_board
_delta_since_previous_green
_macro_lifecycle_by_slug
_macro_regime_board
How it works
The bundle loads three fixtures, runs the copied helpers, and checks eight named invariants. Each check targets a specific way a board can quietly mislead.
The event-join engine (_prediction_market_board with _polymarket_identity_by_slug) groups raw market rows into events using the Polymarket identity snapshot. Identity is matched by market_slug. When two rows share the same slug and outcome, only the higher-volume one is kept, so a duplicate listing cannot double a market count or inflate an aggregate. A slug with no identity match is not dropped and is not given a made-up event id. Its event_identity_status becomes missing_from_feed_artifact and its max_liquidity stays at 0.0. The fixture proves all three: the duplicate fold (top volume 900000 with one surviving market), the orphan with a null event id, and the deduped aggregate.
The provider-drift monitor (_provider_drift_monitor) reads each feed's diagnostics and raises typed flags rather than a single health score. Generic transport problems (provider_fallback_used, html_response_seen, fetch_failures) are kept distinct from FRED-specific ones (fred_invalid_series, fred_network_warning). The fixture checks that the stock feed surfaces the generic set, the news feed stays clean, and the source feed surfaces the FRED set. Keeping the families apart means a source data-source fault is not laundered into a generic warning.
The missingness board (_missingness_board) lists only feeds that are not both non-empty and ok. A feed with zero rows is labelled zero_rows; a populated but low-quality feed is labelled quality_degraded; a healthy feed is omitted entirely. The fixture confirms the healthy feed is absent and the two failing lanes carry the correct reason, so an empty feed cannot read as present.
The prior-green delta (_delta_since_previous_green) only computes a "change since last run" when a previous green run actually exists. With no baseline it returns status: unavailable and an empty row_deltas_by_lane, which the fixture asserts directly. This is the guard against a delta number that has nothing to compare against.
The source lifecycle enrichment (_macro_lifecycle_by_slug feeding _macro_regime_board) buckets source series, then binds each bucket's vintage_status and release_calendar_status to whether the lifecycle structured source record genuinely carries that metadata. The fixture proves a series with a present vintage reads available with the expected observation date, while a series whose lifecycle row is absent reads missing_from_feed_artifact. A vintage date is shown only when it is really there.
Shape
Source refs
no fabricated event id
missing_from_feed_artifact
Diagram source
flowchart TD Rows["Synthetic market rows"] --> Join["Event join + identity match _prediction_market_board"] Identity["Polymarket identity snapshot"] --> Join Helpers["Quant-mart helper fixtures"] --> Drift["Provider drift monitor generic vs FRED flags"] Helpers --> Miss["Missingness board zero_rows vs quality_degraded"] Helpers --> Delta["Prior-green delta unavailable with no baseline"] Helpers --> Source["Source regime board vintage status bound to structured source record"] Join --> Dedup{"Slug + outcome seen before?"} Dedup -->|yes| Keep["Keep higher-volume market"] Dedup -->|no, unmatched| Orphan["missing_from_feed_artifact no fabricated event id"] Dedup -->|no, matched| Append["Append to event aggregate"] Keep --> Result record["metadata-only result record and card diagnostic rows, negative cases, scope limit"] Orphan --> Result record Append --> Result record Drift --> Result record Miss --> Result record Delta --> Result record Source --> Result record
Reader Evidence Routing
Start with paper_modules/batch12_prediction_market_board_capsule.json for bundle-derived source authority, then read this Markdown as the explanatory projection. Use examples/batch12_prediction_market_board_capsule/exported_batch12_prediction_market_board_capsule_bundle/source_module_manifest.json to inspect copied-source digest status before opening copied source modules. Use tests/test_batch12_prediction_market_board_capsule.py to verify the fixture and bundle expectations.
The useful evidence is diagnostic accounting over synthetic public fixtures: provider identity matching, drift rows, missingness boards, prior-green deltas, lifecycle/vintage rows, source-regime enrichment, negative cases, metadata-only result records, and scope limit fields.
Prior Art Grounding
The component borrows from prediction-market information aggregation and public market-data integration practice: event contracts expose market prices and settlement states, while dashboards must keep provider identity, missingness, and vintage drift visible. Relevant anchors include:
Robin Hanson's information markets framing, where markets are used to aggregate dispersed information about uncertain events.
Microcosm borrows the information-aggregation and provider-join shape, then keeps the board explicitly diagnostic: identity matching, provider drift, missingness, prior-green deltas, lifecycle vintage, and source-regime enrichment are tested over public synthetic fixtures. It is not market-level conclusions, provider truth, investment-related actions, or launch-scope decision.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the prediction-market board result record and sign-off JSON. The bundle command validates copied source system, manifest digests, provider identity and drift diagnostics, missingness rows, lifecycle rows, negative cases, and metadata-only result record posture. The focused test checks fixture validation, bundle validation, digest/anchor coverage, and scope limits.
This result record path is reader-verifiable evidence only. It excludes launch, external model access, private-system equivalence, market-level conclusions, provider truth, investment-related actions, or whole-system correctness.
Scope boundary
Scope limit
This is fixture-bound mechanism evidence for prediction-market joining, quant-mart diagnostics, and source-lifecycle vintage enrichment. It excludes launch, external model access, private-system equivalence, market-level conclusions, provider truth, investment-related actions, source-file changes, publishing-scope decision, or whole-system correctness.
Scope limit
It does not establish live market-level conclusions, provider truth, external model access, investment-related actions, source-file changes, launch-scope decision, publishing-scope decision, private-system equivalence, or whole-system correctness.
Set 12 launch claim-Language GateSet 12 launch claim-Language Gate checks public claim language against result record-backed scope limits without approving launch or public sharing.
Set 12 launch claim-Language Gate runs launch claim-language over public fixtures. It classifies phrases against evidence class and scope limit, checks typed ordinal evidence ranks, real-system flags, fail-closed defaults, boundary-context negation, main --assert-clear behavior, source digests, negative cases, and scope limits while excluding launch-scope decision, external model access, private-system equivalence, market-level conclusions, investment-related actions, public sharing, and whole-system correctness.
Scope limit Fixture-bound launch-claim language gate evidence and copied source refs only; no launch-scope decision, publishing-scope decision, external model access, private-system equivalence, market-level conclusions, investment-related actions, or whole-system correctness.
Public copy drifts towards over-claiming. A page that started as "fixture-proven, not yet published" gets edited over months until someone writes launch, licensing, or maturity language without noticing that nothing changed underneath. This component answers one question: does a piece of public copy claim more than the result records behind it can support, and would the launch gate catch it if it did?
The mechanism it wraps is a deterministic regex scan, not a language model. The copied gate body reads a public sharing manifest, walks every claim-bearing file it lists, and matches each line against fixed families of risky launch, licensing, maturity, and private launch-control wording. What makes the scan more than a grep is the classification step. The same family of wording is read three ways depending on context: a bare affirmative launch or maturity claim becomes an active_claim_blocker; the same wording inside a forbidden-example block or near a negation marker becomes boundary_or_negative_context and is allowed; and a phrase that has neither an affirmative verb nor a clear negation marker is parked in a needs_review queue rather than waved through.
That last branch is the interesting design choice. The gate fails closed. An ambiguous claim does not pass quietly; it lands in a no-go review state, and main --assert-clear exits non-zero whenever any active blocker or unresolved review item remains. The scan never rewrites a file, never authorises launch, and treats marketing copy as just another claim surface with an evidence ledger rather than a looser register of speech.
This paper module is the public, fixture-bound check that the wrapped gate behaves as described over the shipped fixtures. The component runs the copied gate over a safe fixture and an active fixture, then checks that boundary-context language clears, that bare launch language blocks, and that the assert-clear exit code is 2 when blockers remain. It is a check on the checker, held behind digest, result record, and scope limit boundaries.
Mechanisms
_classify_hit
build_gate
main --assert-clear
Shape
Runtime locus: src/microcosm_core/organs/batch12_release_claim_language_gate.py, especially _blocked_exercise, _write_gate_fixture, _run_main_assert_clear, _evaluate, run, run_batch12_release_claim_language_gate_bundle, result_card, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING.
Source source import: tools/meta/dissemination/release_claim_language_gate.py, copied into the exported bundle as one source body with digest equality and anchors RISKY_PHRASES, NEGATIVE_CONTEXT_MARKERS, def _classify_hit, and def build_gate.
Positive fixture shape: one safe boundary-context claim surface passes because limiting language keeps does_not_authorize_release: true.
Active fixture shape: two active claim blockers are reported for bare unsupported launch-language surfaces, while boundary/negative context remains counted separately.
Negative floor: affirmative_open_source_production_ready_blocks and assert_clear_returns_exit_2, with stable error codes BATCH12_RELEASE_CLAIM_ACTIVE_BLOCKER and BATCH12_RELEASE_CLAIM_ASSERT_CLEAR_EXIT_2.
Public result record posture: real-system bundle, source manifest pass, secret-exclusion scan pass, result record body scan pass, and a false body_in_receipt flag.
Source refs
safe and active public copy surfaces
release_gate_fixture.json
exact copied source gate body
source_module_manifest.json
negation marker or forbidden example => allowed
boundary_or_negative_context
Diagram source
flowchart TD Fixture["release_gate_fixture.json safe and active public copy surfaces"] Manifest["source_module_manifest.json exact copied source gate body"] Loader["load source module digest equality and required anchors"] SafeRoot["safe fixture root _write_gate_fixture(active=false)"] ActiveRoot["active fixture root _write_gate_fixture(active=true)"] Scan["build_gate scan manifest files for RISKY_PHRASES"] Classify{"_classify_hit read each phrase in context"} Boundary["boundary_or_negative_context negation marker or forbidden example => allowed"] Active["active_claim_blocker affirmative line, no downgrade => status active_claim_blocked"] Review["needs_review no clear marker either way => fail-closed no-go queue"] Assert["main --assert-clear exit 2 when not public_copy_clean"] Negatives["computed negative cases affirmative claim blocks assert-clear exits 2 private internal control leak blocks"] Result records["metadata-only result records result, board, validation, sign-off"] Ceiling["scope limit no launch, public sharing, NLP truth, secret completeness, or whole-system claim"] Fixture --> SafeRoot Fixture --> ActiveRoot Manifest --> Loader Loader --> Scan SafeRoot --> Scan ActiveRoot --> Scan Scan --> Classify Classify -->|allowed| Boundary Classify -->|blocked| Active Classify -->|ambiguous| Review Active --> Assert Review --> Assert Boundary --> Negatives Active --> Negatives Assert --> Negatives Negatives --> Result records Result records --> Ceiling
This component is the public copy gate for result record-backed evidence accounting. It does not ask whether a phrase sounds impressive; it asks whether the phrase is within the evidence class and scope limit that result records can support.
Evidence strength is typed ordinal data, not vibes: ranks, real-system flags, and fail-closed defaults constrain how far public language may climb. Independent validators reconcile each component's declared class against result record-backed facts so over-claiming is blocked and stale under-claiming can be surfaced for review. Result record scanners may downgrade when bodies or account secret-equivalent payloads leak; they cannot upgrade merely because a narrative is strong.
The boundary-context classifier is allowed to pass negated or limiting language such as "not a hosted product" while blocking bare maturity claims when no launch-scope decision exists. Marketing copy is therefore treated as another claim surface with an accounting ledger, not as a looser mode of speech.
Reader Evidence Routing
Start with paper_modules/batch12_release_claim_language_gate.json for source authority, then read this Markdown as the projection.
Open standards/std_microcosm_batch12_release_claim_language_gate.json for the required witnesses, negative floor, denied authority, result record contract, validator command, and runtime bundle command.
Open core/fixture_manifests/batch12_release_claim_language_gate.fixture_manifest.json for source-open body import count, source manifest refs, and durable result record refs.
Open examples/batch12_release_claim_language_gate/exported_batch12_release_claim_language_gate_bundle/source_module_manifest.json before inspecting copied source modules; result records carry refs, hashes, counts, verdicts, and omissions rather than copied body text.
Open tests/test_batch12_release_claim_language_gate.py for assertions on pass result records, digest mismatch rejection, fixture path safety, duplicate-key rejection, duplicate fixture names, exact source body import, and card body omission.
Run fixture and bundle routes from microcosm-substrate/. The CLI supports --card, but it does not expose a --json flag.
Use scripts/build_doctrine_projection.py --check-paper-module-corpus to verify this Markdown projection still satisfies the shared paper-module coverage contract.
Prior Art Grounding
The component borrows a narrow pattern from advertising-substantiation and regulated-communication practice: public claims should stay within evidence actually held, and stronger language requires stronger support. This is prior art for the proof-consumer shape only. The module does not implement legal compliance, include launch operations, or decide whether public copy is fit to publish.
External source result record, checked 2026-06-05:
Public communications must be fair and balanced, give a sound factual basis, and avoid false, exaggerated, unwarranted, promissory, or misleading claims.
The module only uses this as a prior-art analogue for keeping benefits, risks, and qualifications in the same local claim context.
The marketing rule guide summarizes general prohibitions on untrue or misleading material statements, unsupported material facts, unfair treatment of risks, and constrained performance or endorsement claims.
The module's investment-advice scope boundary stays negative: a green result record is not adviser marketing compliance or investment-related actions.
Current staff FAQ entries still route extracted performance and characteristics through Rule 206(4)-1 general prohibitions.
This is a currency/source-link result record for scope limit posture, not a new Microcosm capability or finance claim.
Microcosm adapts the substantiation pattern to launch and evidence language. Result record-backed classes, ordinal evidence strength, real-system flags, boundary-context exceptions, and fail-closed defaults constrain what public copy may say. The gate blocks unsupported elevation without turning itself into public launch permission, market-level conclusions, investment-related actions, legal review, or whole-system correctness.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the claim-language gate result record and sign-off JSON. The bundle command validates copied source system, source manifest digests, active-blocker and boundary-context classification, negative cases, metadata-only result records, and scope limit fields. The focused test checks pass result records, digest mismatch rejection, fixture path safety, duplicate-key and duplicate-fixture rejection, exact source body import, and card body omission.
This result record path is reader-verifiable evidence only. It excludes launch, public sharing, external model access, semantic NLP truth, complete secret detection, private-system equivalence, portability proof, market-level conclusions, investment-related actions, source-file changes, or whole-system correctness.
Scope boundary
Scope limit
This module may claim fixture-bound evidence that the Set 12 public launch-language gate can classify result record-backed public copy against an scope limit. Positive claims stay within typed claim hits, evidence strength ranks, real-system flags, boundary-context classification, fail-closed defaults, active blockers, negative cases, copied source source-module refs and bodies, source-manifest pass status, metadata-only result record scan status, secret-exclusion scan status, and validation result records.
This module may not claim public launch permission, public sharing posture, hosted product status, external model dispatch authority, semantic NLP truth, complete secret detection, private-system equivalence, portability proof, market-level conclusions, investment-related actions, source editing authority, deployment maturity, formal-result correctness beyond the listed witnesses, or whole-system correctness.
Limitations
The gate is a lexical and fixture-driven proof consumer, not a launch oracle. It exercises copied release_claim_language_gate.py behavior over bounded public markdown fixtures, so it can detect active over-claiming phrases, boundary-context exceptions, digest drift, fixture path hazards, and stable negative-case regressions. It cannot prove that public copy is semantically complete, market-accurate, legally sufficient, safe for public sharing, or free of all secrets.
The exact-copy evidence floor is intentionally narrow. The source-module manifest proves one copied source body, required anchors, digest equality, and metadata-only result record posture; it excludes refreshing the source module, accepting private-system equivalence, mutating launch policy, or publishing copied bodies into result records. Any change to the copied source body, fixture corpus, negative cases, or scope limit belongs in the source, standard, and bundle lanes before this Markdown can expand its claim.
The focused test proves the runtime contract only for the shipped fixtures and bundle shape. Passing test_batch12_release_claim_language_gate.py means the public proof consumer still rejects digest mismatch, unsafe fixture names, duplicate fixture inputs, unstable negative labels, and result record body leakage in that bundle. It does not establish launch-scope decision for other documents, providers, frontends, markets, or future site projections.
Scope limit
This is fixture-bound launch claim-language gate evidence. Its scope stops before public launch permission, public sharing posture, external model dispatch, semantic NLP truth, complete secret detection, private-system equivalence, portability proof, market-level conclusions, investment-related actions, source editing authority, deployment maturity, formal-result correctness beyond the listed witnesses, or whole-system correctness.
Source and projection details
Governing Lattice Relation
This paper module sits under concept.import_projection_and_drift_control_bundle: a copied source mechanism is imported into the public system, exercised through public fixtures, and held behind digest, result record, and projection boundaries. The bundle therefore does not treat Markdown prose as authority; it treats the JSON bundle, generated instance, mechanism row, standard, source manifest, and result records as the lattice that the prose must explain.
The governing principles P-2, P-6, P-13, and P-15 map onto the component's operational checks. Typed evidence ranks and real-system flags keep public claims below the result record-backed ceiling; public/private boundary rules keep source bodies and private launch state out of result records; negative fixtures and fail-closed defaults prevent optimistic marketing language from bypassing the validator; and generated Mermaid/Atlas rows remain projections of bundle edges, not independent launch-scope decision.
The axiom boundary is the hard scope limit. AX-5, AX-7, AX-11, and AX-12 require the gate to preserve source truth, avoid projection drift, route public copy through explicit authority checks, and block unsupported launch language. That is why the mechanism couples _write_gate_fixture, _evaluate, run_batch12_release_claim_language_gate_bundle, exact-copy source manifest validation, and metadata-only result records instead of asking a prose reviewer to decide whether a claim sounds acceptable.
The sibling dependencies define how to read the result. public_reveal_walkthrough supplies the public-copy setting, proof_derived_governed_mutation_authorization supplies the proof-before-mutation posture, and batch8_validator_checker_capsule supplies the validator/checker pattern. This module is the claim-language checker within that lattice, not the public launch decision itself.
The generated JSON row currently contributes 15 relationship edges: two paper_module.explains.organ_or_mechanism edges, one paper_module.governed_by.concept edge, four paper_module.governed_by.principle edges, four paper_module.abides_by.axiom edges, three sibling paper_module.depends_on.paper_module edges, and one resolved paper_module.cites.code_locus edge.
At this HEAD the generated instance reports zero unresolved selective relations. If future bundle edits introduce residuals, this Markdown may name them but must not invent concept ids or promote candidate doctrine.
Set 10 Cold Eval Honesty BundleSet 10 Cold Eval Honesty Bundle runs cold-eval over public fixtures without benchmark, navigation-truth, or launch-scope decision.
Set 10 Cold Eval Honesty Bundle imports the real cold_eval.py source body and runs its route-quality simulator over a synthetic public workspace. It audits the all-B idea-first scorecard shape, expected-ref injection policy, private fixture refs, missing-task refusals, negative cases, and scope limits while excluding live benchmark results, navigation truth, hosted readiness, launch-scope decision, external model access, source-file changes, and whole-system correctness.
Scope limit Fixture-bound route-quality scorecard and copied source refs only; no live benchmark, navigation truth, hosted readiness, launch-scope decision, external model access, source-file changes, or whole-system correctness.
batch10_cold_eval_honesty_capsule answers one narrow question: can the public Microcosm copy of the source cold_eval.py route-quality simulator run over a synthetic workspace, expose its measured scorecard shape, and refuse to promote that shape into a benchmark or navigation-truth claim?
The useful evidence is deliberately small. A green run means the copied source body executed, the all-B.idea_first_packet winner shape was recomputed from fixture rows, and the scope limit blocked benchmark, hosted-readiness, and launch language. It does not say idea-first routing wins in the live system.
Shape
Diagram source
flowchart TD A["Public cold-eval workspace (tasks, navigation packets)"] --> B["Copied cold_eval.py runner"] B --> A1["Arm A: flat repo entry (README, quickstart, pyproject)"] B --> A2["Arm B: idea-first packet (entry packet, atlas, index)"] A1 --> SC["Score each task by declared route refs covered (refs scored, never injected)"] A2 --> SC SC --> W["Winner per task, idea-first win count"] W --> C["Scorecard shape audit all-B win + route asymmetry + no non-public refs"] C --> D["Scope limit gate injection off, forbidden benchmark/launch claims named"] D --> E["metadata-only result record and card"]
Prior Art Grounding
This component is grounded in evaluation-transparency and benchmark-hygiene practice: scorecards should expose what was measured, what fixture assumptions were injected, and what claims the result can and cannot support. Useful anchors include:
HELM, which frames model evaluation as a transparent, scenario-bound benchmark surface rather than a single global capability claim.
Model Cards for Model Reporting, which established the pattern of pairing performance results with intended use, limitations, and caveats.
Microcosm borrows the scorecard-plus-limitations shape, then narrows it to a deterministic route-quality fixture. The all-B.idea_first_packet winner row is accounting evidence for this fixture only; it is not promoted into navigation truth, hosted readiness, or launch-scope decision.
Reader Evidence Routing
Read the scorecard as evidence accounting, not as a leaderboard. The fixture intentionally creates a public workspace where the idea-first packet wins. The component then checks that the expected-ref injection policy is off, that non-public refs are not present, and that forbidden claims are named in the manifest.
The honesty of that win turns on one design choice in the copied scorer. Each task lists the route refs an answer should reach, but those expected refs are only ever used to *score* coverage. They are never added to either arm's route, so neither arm is handed the answer. Arm A is scored on the refs a flat reader reaches from README.md, docs/quickstart.md, and pyproject.toml. Arm B is scored on the refs the navigation packets actually declare. The scoring policy is named in every row as declared_route_refs_no_expected_ref_injection_v1, and every row carries expected_ref_injection_used: false. The idea-first arm wins because the entry packets genuinely declare more of the relevant files, not because the scorer leaked the target into the route. That distinction is the difference between a measured route-quality result and a rigged one, and the scope limit gate reports blocked rather than pass if the injection flag is ever turned on.
The engine ids are:
cold_eval_original_runner: dynamically loads the copied source body and runs run_cold_eval in a temporary public workspace.
cold_eval_scorecard_shape_audit: verifies the all-B winner shape and records visible route-surface asymmetry without upgrading it into proof.
cold_eval_claim_ceiling_gate: checks expected-ref injection policy and forbidden benchmark/launch claims.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the route-quality scorecard result record and sign-off JSON. The bundle command validates copied source source, source manifests, metadata-only cards, expected-ref injection policy, and private-ref negative cases. The focused test covers missing tasks, flat-route wins, expected-ref injection, private fixture refs, and the no-benchmark/no-launch scope limit.
This result record path is reader-verifiable evidence only. It does not establish live benchmark results, navigation truth, hosted readiness, launch-scope decision, external model access, source-file changes, or whole-system correctness.
Scope boundary
Scope limit
This module may claim public fixture evidence that the copied cold_eval.py source body executed over the synthetic workspace, the expected scorecard shape was recomputed, expected-ref injection was refused, non-public refs were excluded, negative fixtures were checked, metadata-only cards were emitted, and validation result records enforced the listed scope limit.
This module may not claim live benchmark results, navigation truth, hosted readiness, route-quality superiority, external model access, deployment posture, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness.
Scope limit
Fixture-bound route-quality scorecard and copied source refs only; no live benchmark, navigation truth, hosted readiness, launch-scope decision, external model access, source-file changes, or whole-system correctness.
Set 10 Live Source Drift BundleSet 10 Live Source Drift Bundle validates copied current internal control source bodies without route, ledger-mutation, or launch-scope decision.
Set 10 Live Source Drift Bundle imports exact current internal control Python bodies after source source drift. It validates stale-versus-current digest rows, copied-source compileability without import execution, required behavioral anchors, source-manifest boundaries, negative cases, and scope limits while excluding route authority, work log or work log mutation authority, mission execution, git approval, source-file changes, launch, public sharing, external model access, and non-public runtime export.
Scope limit Fixture-bound source-digest, anchor, compile, and scope limit evidence only; no route authority, Work or work log mutation, mission execution, git approval, source-file changes, launch, public sharing, external model access, or non-public runtime export.
batch10_live_source_drift_capsule answers one narrow question: can Microcosm prove that selected internal control source copies match the current source source bytes, still compile without import execution, and still carry the scope limit that prevents copied code from becoming route or mutation authority?
The component imports exact current Python source bodies for four source internal control files:
The bundle exists because the source source moved ahead of older public source-module records. The interesting part is that it keeps the old, wrong digest visible on purpose. Each digest row carries three fingerprints that must agree before a copy passes: the copied public body, the manifest target it claims to match, and the current source source. In the same row it keeps the stale recorded digest and asserts it differs from the current one, so the proof of freshness and the evidence of the earlier drift sit side by side.
That makes the component a drift sentinel rather than a one-off check. It is built to go red when the public copies fall behind the source source again, and a red result is the signal to refresh the copies through the exact-copy source lane, not a defect in the page. Two cheap checks back the freshness claim without running anything dangerous: the copied Python is compiled but never imported, so a malformed body is caught without executing source code, and a small set of named anchors is matched in each body so a copy that compiles but has quietly lost a command or contract surface is still flagged.
Shape
Diagram source
flowchart TD A["Probe manifest stale + current digests"] --> C B["Copied internal control bodies and source manifest"] --> C C["Digest refresh matrix copied = target = current, stale differs from current"] --> F B --> D["Compile gate py_compile, no import"] B --> E["Anchor matrix named command and contract surfaces present"] D --> F E --> F F["Scope limit gate import is not route or mutation authority"] --> G["metadata-only result record and card"] C -. mismatch .-> H["Blocked: refresh copies via exact-copy source lane"]
Prior Art Grounding
The component borrows from reproducible-build and supply-chain provenance practice: declared source inputs are fingerprinted, generated or copied artifacts are checked against those fingerprints, and result records avoid shipping unnecessary private state. Useful anchors include:
Bazel hermeticity, especially the emphasis on source identity, declared inputs, and repeatable outputs.
SLSA provenance, which records how software artifacts relate to build inputs and supply-chain guarantees.
Microcosm applies that pattern to live source-copy drift: stale digest rows remain visible as regression fixtures, current public copies must match source digests byte-for-byte, and result records carry digest/anchor/negative-case evidence instead of private source bodies or runtime state.
Reader Evidence Routing
The copied bodies are real system, not result record-only metadata. The evidence route is still metadata-only at result record time: result records keep digest rows, required anchors, negative-case outcomes, compile status, and scope limit evidence.
The engine ids are:
live_source_drift_digest_refresh_matrix: compares stale recorded digests, current source digests, copied target digests, and target digest status.
copied_python_source_compile_gate: compiles each copied Python target without importing or executing it.
control_surface_anchor_matrix: checks that each copied body still exposes expected command, route, landing, claim, or read-result record anchors.
claim_ceiling_gate: verifies the copied-body import excludes live route decisions, work log mutation, work log mutation, mission execution, git staging, source-file changes, launch, public sharing, external model access, or non-public runtime export.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
This component is also a drift sentinel. The fixture command writes digest-refresh, compile-gate, anchor-matrix, and scope limit result records, and it is allowed to exit blocked when copied public source no longer matches current source source. That blocked result is evidence for the exact-copy/source-refresh owner lane, not a paper-module corpus defect. Re-entry after a blocked result is to refresh the copied public source bodies and manifest digests through the source-open exact-copy lane, then rerun the fixture, bundle, and focused test.
The bundle command validates current copied source digests, source manifests, compile-without-import checks, stale-digest negative cases, metadata-only cards, and scope limit fields when the exact-copy refresh is current. The focused pytest command is therefore a green-gate after refresh: if the sentinel is blocked, that test file is expected to fail on pass-status or exact-body equality and should be reported with the same exact-copy refresh residual. When current, the focused test covers stale digest replay, compile bypass, private runtime state export, and live mutation-authority claims.
This result record path is reader-verifiable evidence only. It does not provide route authority, Work or work log mutation authority, mission execution, git approval, source-file changes, launch, public sharing, external model access, or non-public runtime export.
Scope boundary
Scope limit
Fixture-bound source-digest, anchor, compile, and scope limit evidence only; no route authority, Work or work log mutation, mission execution, git approval, source-file changes, launch, public sharing, external model access, or non-public runtime export.
Scope limit
This module supports only the reader-verifiable claim that selected internal control source copies can be compared with current source digests, compiled without import execution, checked for required anchors, and guarded by stale-digest and scope limit negative cases when the exact-copy lane is current. A green result does not grant route authority, Work or work log mutation authority, mission execution, git approval, source-file changes, launch-scope decision, publishing-scope decision, external model access, non-public runtime export, or whole-system correctness.
Set 7 Source Engines BundleSet 7 Source Engines Bundle imports source engine bodies and exercises trace, graph, scheduling, source-index, patch, numeric, rank, and regression-selection invariants.
Set 7 Source Engines Bundle binds the accepted batch7_macro_engines_capsule component to its public source-open bundle. It checks copied trace parser, codemap layout, DAG scheduling, launch-root, source-surgeon, clean-clone, calculator, PageRank, and regression-selection witnesses, negative cases, digest boundaries, and scope limits while excluding launch-scope decision, private-system equivalence, semantic truth, investment-related actions, sandbox completeness, and selected-test sufficiency proof.
Scope limit Fixture-bound public source-body import and deterministic exercise evidence only; no launch-scope decision, private-system equivalence, semantic truth, investment-related actions, complete sandbox proof, selected-test sufficiency proof, external model access, or source-file changes.
batch7_macro_engines_capsule imports the Set-7 source engines as source bodies and runs focused exercises around them. It is a real-system bundle: source copies, original JS/TS witnesses, deterministic Python exercises, negative cases, digest checks, and fenced claims.
What It Makes Visible
tools/agent_trace_structurer/parser.mjs as a trace-IR/edit-claim witness with node --test parser.test.mjs.
system/server/ui/src/lib/codemap/ as a code-map layout witness with Vitest.
DAG wave scheduling, source indexing, patch context validation, network blocking, robust numeric center/scale, PageRank mass preservation, and never-empty regression-test selection.
What Each Exercise Proves
Each engine has a single deterministic check with a known answer, plus a paired negative case that must keep failing. The exercises are concrete:
Trace IR parser (agent_trace_ir_compiler). Runs node --test parser.test.mjs against the copied parser. The paired negative case is a commit claim with no diff evidence, which the parser's own test rejects, so a pass means the edit-claim gate is intact rather than merely that the file copied.
Code-map layout (codemap_orbit_layout). Runs the Vitest suite for the layout module and, in process, places five nodes on an orbit and measures every pair distance. The pass condition requires zero circle overlaps, so the layout proves geometric non-collision, not route meaning.
DAG scheduler (constitutional_dag_kernel). Calls compute_waves on a six-node graph and checks the schedule is exactly [["a","f"], ["b","c"], ["d"], ["e"]]. A two-node cycle must raise, and an impure config path must be flagged, so the kernel proves wave ordering and cycle rejection together.
launch-root index (release_root_compiler). Parses the copied module's AST and confirms the expected report-building functions exist and that a missing-reference count is reported. This is source indexing, not launch-scope decision.
Source surgeon (source_surgeon_patch). Applies a one-line unified diff and checks the result is exactly a = 'B'. A diff whose context does not match must raise, and broken Python must fail to parse, so the engine proves patch-context and syntax validation, not semantic correctness.
Clean clone (hermetic_clean_clone). Temporarily replaces the socket factory and confirms an outbound connection raises a network-disabled error. It proves a hermetic baseline, not complete sandboxing.
Robust calculator (calculator_standard_actor). Feeds [1, 2, 3, 4, 5, 100] to the robust centre/scale routine. The robust centre stays at 3.5 while the naive mean is dragged above 19, so the outlier is resisted. It is a numeric primitive, not market data or investment-related actions.
PageRank ranker (personalized_pagerank_ranker). Ranks a four-node graph and checks the score mass sums to 1.0; an unknown source node must return an empty map. It proves the rank invariant and missing-source refusal, not semantic understanding.
Regression selection (regression_test_selection). Confirms the impacted- test selector never returns an empty set: an empty selection must fall back to a non-empty bundle. It proves the never-empty contract, not that the selected tests are sufficient.
When the input is the exported source-open bundle rather than the live fixture, the same nine engine rows are gated on the copied source manifest instead: every expected digest must match and every required anchor must be present before any row passes. The exercises stay metadata-only throughout; result records carry status, counts, digests, and refs, never the copied source or command output.
Prior Art Grounding
The component is grounded in trace instrumentation, graph analysis, and regression selection practice: parse execution traces into structured spans, project code or route graphs into navigable layouts, preserve graph-rank invariants, and choose focused tests without claiming sufficiency. Relevant anchors include:
OpenTelemetry, especially traces/spans as a vendor-neutral model for representing units of work and their relationships.
D3 force layouts, a common graph layout pattern for visualizing networks and hierarchies.
NetworkX PageRank, which documents the PageRank family for graph-link analysis.
Microcosm borrows the structured-trace, graph-layout, and invariant-checking shape across its mixed Set-7 engines. The bundle remains a bundle of focused source witnesses and deterministic exercises; it is not a complete sandbox, semantic truth engine, or proof that selected tests are sufficient.
Source Body Imports
The source-module manifest at examples/batch7_macro_engines_capsule/exported_batch7_macro_engines_capsule_bundle/source_module_manifest.json lists the exact copied source bodies and required anchors. Result records store digests and counts, not source bodies.
Purpose
This module is the reader-facing instrument for the accepted batch7_macro_engines_capsule component. Its source authority is the JSON source record in core/paper_module_capsules.json; this Markdown explains the proof boundary for a cold reader and points back to the runtime component, copied source manifest, and focused tests.
The component answers one narrow question: do nine unrelated source engines, copied out of the larger system as source, still behave the way their own tests and invariants say they should? Rather than describe them in prose, the bundle runs each one. A trace-IR parser is checked by its own Node test runner; a code-map layout is checked by its Vitest suite; a dependency-graph scheduler, a robust numeric scorer, a PageRank ranker, a patch applier, a network-isolation guard, an AST source index, and a regression-test selector are each driven through a small deterministic exercise with a known correct answer.
What is worth noting is the mix. Most validators in this set check one shape of evidence. This one deliberately binds several kinds under a single fixture and a single scope limit: an external JavaScript test process, an external TypeScript test process, in-process Python function calls, and static AST reads. The point is not that any one engine is impressive in isolation. It is that nine engines with quite different runtimes can be exercised together, each with a concrete pass condition, while every exercise stays below launch, semantic-truth, and source-file changes.
The failure mode this guards against is the comfortable assumption that copied code still works. A source body can be copied faithfully, pass a digest check, and still be broken or subtly different from the original. The bundle refuses to treat a digest match as behaviour: each engine has to produce the expected output, and each negative case has to keep failing, before the row is allowed to pass.
Shape
Source refs
Nine engine rows
source_open_manifest_verified
Diagram source
flowchart LR input["Input dir"] mode{"Live fixture or exported bundle?"} subgraph Live["Live fixture: run each engine"] trace["Trace IR parser node --test"] codemap["Code-map layout Vitest + orbit non-overlap"] dag["DAG scheduler waves + cycle reject"] rest["launch index, source surgeon, clean clone, calculator, PageRank, regression selection"] end subgraph Bundle["Exported bundle: gate on manifest"] manifest["Source manifest: digests match + anchors present"] rows["Nine engine rows source_open_manifest_verified"] end neg["Negative cases must keep failing"] result["metadata-only result status, counts, digests"] ceiling["scope limit no launch, no semantic truth, no source-file changes"] input --> mode mode -->|live| trace mode -->|live| codemap mode -->|live| dag mode -->|live| rest mode -->|bundle| manifest manifest --> rows trace --> neg codemap --> neg dag --> neg rest --> neg rows --> result neg --> result result --> ceiling
Reader Evidence Routing
Start from the component source when checking behavior:
EXPECTED_NEGATIVE_CASES names the rejected cases.
AUTHORITY_CEILING names the forbidden claims.
_source_open_bundle_exercises and _evaluate assemble the accepted public witness set.
run_batch7_bundle and result_card expose the reproducible command and metadata-only summary.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the Set-7 source-engine result record and sign-off JSON. The exported-bundle command validates copied trace, codemap, DAG, source-rank, and regression-selection witnesses without emitting private bodies. The focused test covers the runtime component, exported bundle shape, exact-copy source imports, negative cases, card body omission, and numeric dependencies. The corpus and projection checks prove only that the generated paper-module instance remains fresh for this bundle-backed Markdown state.
This result record path is public fixture evidence only. It does not establish semantic truth, selected-test sufficiency, sandbox completeness, private-system equivalence, launch-scope decision, external model access, source-file changes, or whole-system correctness.
Scope boundary
Scope limit
This bundle is not launch-scope decision, hosted-public authority, semantic truth, investment-related actions, a complete sandbox, or proof that selected tests are sufficient. It excludes raw operator transcripts, provider/browser state, wallet/account state, account secrets, and live market fetches.
Scope limit
The module can support only fixture-bound public source-body import evidence and deterministic exercise result records. It cannot authorize external model access, source-file changes, launch, public sharing, investment-related actions, private-system equivalence, or whole-system correctness.
Set 9 Source Engines BundleSet 9 Source Engines Bundle imports backend and governance source-engine bodies and exercises provenance, approval, AST, finance-news, mission graph, dependency, config, edge, WorkAtlas, host-pressure, doctrine-enrichment, worker-budget, and milestone-quality invariants.
Set 9 Source Engines Bundle binds the accepted batch9_macro_engines_capsule component to its public source-open bundle. It checks thirteen copied source engines for provenance lineage, approval adjudication, Python AST indexing, finance headline clustering, mission graph compilation, dependency pin drift, config authority audit, heterogeneous edge extraction, WorkAtlas aggregation, host-pressure admission, doctrine enrichment, worker job budget gating, and milestone-relative quality accounting while excluding live lineage truth, approval authority, market/news truth, host-state truth, work log truth, external model access, source-file changes, public sharing, and launch-scope decision.
Scope limit Fixture-bound public source-body import and deterministic exercise evidence only; no live lineage truth, human approval authority, real market/news truth, host-state truth, work log truth, external model access, source-file changes, public sharing, launch-scope decision, or private-system equivalence.
Copying a file into a public bundle proves only that the bytes match. It does not establish that the imported logic still behaves the way it did in the larger system it came from. This component exists to close that gap for thirteen backend, governance, and frontend data-shaping modules. The single question it answers is: do these copied source bodies still compute what they claim to compute, when run against bounded fixtures, here in the public repository?
The unusual part is how it checks. Rather than asserting against pre-baked result files, the component loads each copied module and calls its real functions. It imports system/lib/approval_registry.py and runs decide_approval against a temporary approvals tree to confirm a pre-acquired claim is refused. It imports system/lib/python_documentation_tree.py and runs build_file_entry over written-out Python to read symbols back. It runs the copied mission-graph compiler, the dependency-pin parser, the config-authority registry validator, the host-pressure admission builder, the worker budget guard, and the milestone metric computer, each on its own fixture. The three TypeScript bodies for finance clustering, edge extraction, and WorkAtlas aggregation are parsed for their load-bearing constants and branches, then mirrored deterministically. Each exercise carries both a positive shape and a paired negative case, so the proof moves with source behaviour, not with a static result record.
The reader should treat the result as fixture-bound evidence and nothing more. A passing bundle shows that representative mechanics still match the imported bodies under positive and negative cases. It does not assert live lineage truth, approval authority, real market or news truth, host-state truth, work log truth, external model access, source-file changes, public sharing, or launch-scope decision.
Abstract
Set 9 Source Engines Bundle is a public Microcosm paper module for a source-open, body-import-backed component. The component copies thirteen source source bodies into examples/batch9_macro_engines_capsule/exported_batch9_macro_engines_capsule_bundle/source_modules/, checks their digests and required anchors, then runs deterministic public exercises over fixture data. The result is a reproducible evidence bundle for backend, governance, frontend data-shaping, worker-gate, and quality-accounting mechanics without granting live system authority.
The useful claim is narrow: the copied bodies and public fixtures can show that representative mechanics still behave like the imported source bodies under bounded positive and negative cases. They do not prove live lineage truth, approval authority, market or news truth, host-state truth, work log truth, external model access, source-file changes, public sharing, launch-scope decision, private-system equivalence, or whole-system correctness.
Telos
This module exists to make the Set-9 import legible as technical evidence rather than as generic public copy. A cold reader should be able to answer four questions:
Which source bodies were copied, and how are they checked?
Which mechanisms are exercised, and which ones are source-body-sensitive?
Which result records prove only fixture truth, and which claims remain forbidden?
How does this component relate to the Microcosm concept/mechanism/principle lattice?
Mechanism Map
Source refs
13 copied source bodies
source_module_manifest.json
run / run_batch9_bundle
batch9_macro_engines_capsule.py
Diagram source
flowchart TD manifest["source_module_manifest.json 13 copied source bodies"] fixtures["first_wave fixture input probe manifest + 13 negative cases"] runtime["batch9_macro_engines_capsule.py run / run_batch9_bundle"] digest["Digest + anchor check copied bytes match source, required anchors present"] exercise["Re-execute imported logic _run_all_exercises"] py["10 Python bodies importlib load, call real functions (lineage, approval, AST, mission graph, pin drift, config, host pressure, doctrine, worker gate, milestone)"] ts["3 TS-backed bodies parse constants/branches, mirror (finance, WorkAtlas, edge extractor)"] pos["Positive case expected shape"] neg["Negative case e.g. self-loop pruned, preacquired claim refused, forbidden surface blocked"] result records["metadata-only result records result, board, validation, sign-off; body_in_receipt false"] ceiling["Scope limit fixture evidence only"] manifest --> runtime fixtures --> runtime runtime --> digest runtime --> exercise exercise --> py exercise --> ts py --> pos py --> neg ts --> pos ts --> neg digest --> result records pos --> result records neg --> result records result records --> ceiling
The runtime source is src/microcosm_core/organs/batch9_macro_engines_capsule.py. Its load-bearing symbols are EXPECTED_MECHANISMS, EXPECTED_MODULE_IDS, EXPECTED_NEGATIVE_CASES, SOURCE_REQUIRED_ANCHORS, AUTHORITY_CEILING, run, run_batch9_bundle, and result_card.
Set-9 Pipeline
The Set-9 pipeline has four stages.
Source import.source_module_manifest.json declares thirteen copied source bodies, each with source_ref, copied target path, digest equality fields, line and byte counts, material class, and required anchors. The manifest states source_import_class: copied_non_secret_macro_body, body_copied_material_count: 13, and body_in_receipt: false.
Fixture execution.run consumes fixtures/first_wave/batch9_macro_engines_capsule/input, including batch9_macro_engines_capsule_probe_manifest.json plus thirteen negative-case files. It writes the result, board, validation result record, and optional sign-off JSON.
Exported-bundle validation.run_batch9_bundle validates examples/batch9_macro_engines_capsule/exported_batch9_macro_engines_capsule_bundle. The bundle manifest names exported_batch9_macro_engines_capsule_bundle as the input mode, points at source_module_manifest.json, and declares thirteen negative cases.
Result record and ceiling. The public result records may expose refs, digests, anchors, counts, verdicts, negative-case outcomes, and omission evidence. They must not inline copied source bodies or private/live payloads.
Mechanism Set
Mechanism id
Imported source body
What the public exercise checks
lineage_temporal_provenance_chain_resolver
system/server/lineage.py
Parent/truth lineage chain behavior and self-loop pruning.
approval_sign_off_claim_adjudicator
system/lib/approval_registry.py
Approval decision shape and claim-conflict enforcement.
python_ast_symbol_index_doc_tree
system/lib/python_documentation_tree.py
Python AST symbol extraction, including async/function/class coverage.
finance_news_dedup_cluster_ranker
system/server/ui/src/lib/financePresentation.ts
Headline fingerprinting, stopword behavior, and duplicate clustering.
mission_graph_topological_compiler
system/server/graph.py
DAG compilation, group closure, upstream dependency walk, and missing-target handling.
dependency_pin_drift_auditor
tools/dev/check_pin_drift.py
Requirement parsing and drift/missing/unparseable classification.
config_authority_drift_audit
system/lib/config_authority_registry.py
Config authority registry validation and mutation-allowed rejection.
heterogeneous_graph_edge_extractor
system/server/ui/src/pages/RootNavigator.tsx
Generic edge-field map extraction and relation normalization.
Cell aggregation and the unrouted-only route-reason histogram gate.
host_pressure_admission_decision_gate
system/lib/admission_consumer.py
Admission normalization and summary-first blocking behavior.
doctrine_file_enrichment_multihop_join
system/server/doctrine_enrichment.py
File-to-doctrine enrichment join and empty-envelope detection.
worker_job_budget_forbidden_surface_gate
system/lib/type_a_worker_harness.py
Provider budget and forbidden-surface pre-dispatch gates.
milestone_relative_promotion_quality_accounting
system/lib/population_lane_metrics.py
Milestone-relative promotion metrics and blocker-to-next-action classification.
Several tests deliberately mutate copied source bodies in a temporary public bundle and refresh the manifest digest. Finance, lineage, approval, AST, mission graph, dependency, config, WorkAtlas, heterogeneous edge, doctrine, worker-gate, host-pressure, and milestone tests prove the exercise result moves with source-body behavior rather than with static result record fixtures alone. Two tamper modes are load-bearing: an unapproved copied-body edit without a manifest digest refresh fails CROWN_JEWEL_SOURCE_DIGEST_MISMATCH, while a body edit with a refreshed digest is only accepted when the required witnesses and semantic exercise still pass. Removing a required witness while refreshing the digest still fails CROWN_JEWEL_SOURCE_ANCHOR_MISSING. The fixture path also resolves through the copied source-module manifest, so a fixture-only or static result record replacement is outside the accepted proof shape.
Copied-Body and Import Authority
The source-module manifest is the body-import authority for this paper module. It proves that the public bundle contains copied bodies and that the runtime can compare copied target digests with expected source digests and required anchors. It does not make the Markdown source authority.
The authority chain is:
core/paper_module_capsules.json::paper_modules[73:paper_module.batch9_macro_engines_capsule] is the paper-module bundle source row.
paper_modules/batch9_macro_engines_capsule.json is the governed generated instance derived from that bundle.
organs/batch9_macro_engines_capsule.json and mechanisms/mechanism.batch9_macro_engines_capsule.validates_public_macro_engines_capsule.json bind the accepted component and mechanism to the runtime, result records, and scope limit.
standards/std_microcosm_batch9_macro_engines_capsule.json defines the public standard: exactly thirteen mechanisms, exactly thirteen copied source source modules, metadata-only result records, and forbidden live-authority claims.
Current Partial-Realness Limitations
Set 9 is real system progress because it copies source bodies and verifies source-sensitive behavior in public fixtures. It is still partial-realness, not live authority.
The lineage exercise is a public provenance specimen, not live lineage truth.
The approval exercise checks adjudication mechanics, not human approval authority.
The finance exercise checks headline clustering over synthetic rows, not real market-level conclusions, investment-related actions, or news-truth authority.
The host-pressure exercise checks admission-consumer behavior over quoted fixtures, not host-state truth.
The WorkAtlas, worker-gate, and milestone exercises validate bounded mechanics, not live work log authority or external model access readiness.
The generated Markdown/JSON/site projections remain navigation and reader surfaces; source authority stays in JSON contracts, source manifests, tests, and result records.
Failure Modes
The standard and tests protect against these failure modes:
Mechanism count drifts away from thirteen.
Source-module count drifts away from thirteen without manifest and test updates.
The source manifest stops declaring copied_non_secret_macro_body.
A copied source body changes without a matching manifest digest update.
A copied source body loses required anchors, even if the manifest digest is refreshed.
Runtime exercises stop checking named engine semantics and become result record-only assertions.
Negative-case files declare error codes that the semantic evaluator does not actually observe.
Result records include copied body text, raw operator transcripts, provider/browser state, account secrets, live market data, private runtime state, or source bodies.
Public prose expands fixture evidence into launch, public sharing, provider, source-file changes, live-system, or private-system-equivalence authority.
Evidence Contract
Run these commands from microcosm-substrate/:
The fixture command proves the public fixture path. The bundle command proves the exported bundle path. The focused test suite covers exact-copy source imports, source-sensitive behavior shifts, copied-body digest mismatch blocking, source-import-class perturbation, required-witness removal with a refreshed digest, semantic negative cases, bundle validation, and metadata-only command cards. The doctrine projection checks prove only that the bundle-backed generated instance remains fresh for the current corpus. Rank saturation, rerank, and projection inheritance remain downstream routing work; this paper module does not apply or claim those projection mutations.
Reader Evidence Routing
Use this order when auditing the module:
Read standards/std_microcosm_batch9_macro_engines_capsule.json for the governing standard and scope boundaries.
Read src/microcosm_core/organs/batch9_macro_engines_capsule.py for expected mechanisms, expected modules, required source anchors, negative-case semantics, and scope limit.
Read examples/batch9_macro_engines_capsule/exported_batch9_macro_engines_capsule_bundle/source_module_manifest.json for copied-body authority.
Run the fixture and bundle validators, then the focused tests.
Treat result records as metadata-only evidence summaries, not as copied body storage or live-system proof.
Prior Art Grounding
This bundle imports copied source engine bodies and exercises them over fixtures. It follows the characterization, or golden-master, testing tradition (Feathers, Working Effectively with Legacy Code), which pins existing behaviour with deterministic fixtures before trusting it. Microcosm borrows the pin-then-exercise shape; the result is fixture-bound import evidence, not lineage truth, human-approval authority, or market-level conclusions.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
These are reader-verifiable evidence only and do not include launch operations, external model access, source-file changes, or whole-system correctness.
Scope boundary
Scope limit
This module may claim fixture-bound evidence that the component ran over public synthetic inputs and produced the result records and projections described above, reproduced by the validation result records named on this page.
It may not claim more than its bundle scope limit allows: Fixture-bound public source-body import and deterministic exercise evidence only; no live lineage truth, human approval authority, real market/news truth, host-state truth, work log truth, external model access, source-file changes, public sharing, launch-scope decision, or private-system equivalence.
Pattern AssimilationPattern Assimilation validates public completion-learning metadata, owner-routed refinement result records, typed nothing-to-refine decisions, and copied body-import manifests without promoting local lessons into global doctrine.
Pattern Assimilation binds the accepted pattern_assimilation_step component to the public sign-off validator, first-wave fixture, exported assimilation bundle, source-module manifest, standard row, and metadata-only result records. It checks same-lane completion decisions, owner-surface refinement evidence, stewardship and re-entry fields, duplicate result record rejection, local-lesson scope limits, source note exclusion, copied body imports, and public-relative result record paths while excluding live ledger mutation, source notes ingestion, model-output data, global doctrine changes, launch, public sharing, behavior-change proof, and private-data equivalence.
Scope limit Public fixture metadata, exported assimilation bundle metadata, copied body-import digest evidence, and metadata-only result records only; no live work log or work log mutation, source note ingestion, global doctrine changes, launch or publishing-scope decision, external model access, behavior-change proof, private-data equivalence, or whole-system correctness.
pattern_assimilation_step is the public completion-learning contract for landed components. It validates that every component recorded as landed in a fixture set carries exactly one same-lane completion decision, and that the decision resolves to a result record that can be inspected rather than to a phrase.
Purpose
When a development pass claims that local work taught the system something, that claim is usually prose: a note that the run "improved the fixture" or "found nothing to refine". Prose is easy to assert and impossible to check. This component answers a single question: did each landed component actually deposit an inspectable completion decision, or is the learning claim unbacked?
The decision is forced into one of two typed shapes. Either a concrete refinement result record that names the owner surface it changed and the artifact it touched, or a typed nothing_to_refine result record that proves stewardship was checked, the next-best lane was considered, and a re-entry condition was recorded. A landed component with no completion, or with a completion that points at a result record that does not exist or does not match, is rejected. So is a duplicate result record id that would let one lesson be counted twice.
The interesting constraint is the one the component refuses to relax. A local lesson may route to the owner surface that owns the affected artifact, but it may not promote itself into global doctrine. A refinement row that sets claims_global_doctrine_authority is blocked outright. The point is that learning has to land on a specific board with a named steward, not become a free-floating rule, which is the failure mode that turns a useful local note into unsupported general advice.
Primary result records: receipts/first_wave/pattern_assimilation_acceptance.json, receipts/first_wave/pattern_assimilation_receipt.json, and receipts/first_wave/pattern_assimilation_step/exported_assimilation_bundle_validation_result.json
Projection posture: the JSON bundle is the paper-module source authority. This Markdown is the cold-reader explanation.
Shape
Source refs
Landed component rows each names a completion result and result record ref
organ_landing_summaries.jsonl
Validator
acceptance.pyvalidate_pattern_assimilation
metadata-only result records
receipts/first_wave/pattern_assimilation_*
Diagram source
flowchart TD landings["Landed component rows organ_landing_summaries.jsonl each names a completion result and result record ref"] refinement["Refinement result records owner_surface, changed artifact"] nothing["Nothing-to-refine result records stewardship, next-best lane, re-entry"] validator["sign-off.py validate_pattern_assimilation"] filter["Pre-filter valid result records refinement: named owner, no doctrine upgrade nothing: all three fields present"] match{"Per landed component: exactly one completion, ref resolves to a matching row?"} pass["Accepted typed, owner-routed completion learning"] negatives["Negative cases recorded MISSING_PATTERN_ASSIMILATION_Completion MISSING_REFINEMENT_OWNER_SURFACE DUPLICATE_REFINEMENT_RECEIPT_ID LOCAL_LESSON_AUTHORITY_UPGRADE RAW_SEED_BODY_IN_ASSIMILATION_FIXTURE"] result records["metadata-only result records result records/first_wave/pattern_assimilation_*"] ceiling["Scope limit public fixture metadata, no doctrine changes"] landings --> match refinement --> filter nothing --> filter filter --> match validator --> filter match -->|resolved| pass match -->|missing, dangling, duplicate, upgraded| negatives pass --> result records negatives --> result records result records --> ceiling
The bundle is present, so the cold-reader path starts from core/paper_module_capsules.json::paper_module.pattern_assimilation, not from a legacy-only boundary. That bundle binds this Markdown to the accepted pattern_assimilation_step component, the sign-off.py validator locus, the standard, first-wave fixture manifest, exported assimilation bundle, focused tests, metadata-only result records, and generated Mermaid/Atlas navigation status.
Read the diagram as the validation flow, not an authority upgrade. The validator pre-filters the refinement and nothing-to-refine result records, then walks each landed component row and checks that its declared completion resolves to a matching valid result record; unresolved, missing, duplicate, or doctrine-upgraded rows become recorded negative cases. The ceiling remains public fixture and exported-bundle metadata plus metadata-only result records, with no live ledger mutation, source-file changes, source note ingestion, private-system equivalence, global doctrine changes, launch or publishing-scope decision, behavior-change proof, or whole-system correctness.
First Command
From microcosm-substrate:
Use the exported bundle validator when the question is whether the public source-open body imports still match their declared source bodies:
What It Proves
Pattern assimilation is the public completion-learning contract for landed components. It validates that each landed component in the fixture set has exactly one same-lane completion decision: either a concrete refinement result record naming the owner surface and changed artifact, or a typed nothing_to_refine result record with stewardship checked, next-best-lane checked, and a re-entry condition.
A cold agent should use this component when a pass claims that local work taught the system something. The validator makes that claim inspectable: it checks owner-surface evidence, duplicate result record ids, off-lane completions, missing completion decisions, residual lifecycle posture, and attempts to promote a local lesson into global doctrine authority without the governing lane.
Bundle-Bound Reader Shape
The paper-module bundle binds this Markdown to two explained subjects: pattern_assimilation_step and mechanism.pattern_assimilation_step.validates_public_pattern_assimilation_step. It also carries the route-contract concept concept.architecture_and_navigation_route_contract_bundle.
The executable locus is src/microcosm_core/validators/sign-off.py, specifically validate_pattern_assimilation, run_assimilation_bundle, validate_source_module_manifest, _write_jsonl_upsert, EXPECTED_NEGATIVE_CASES, PATTERN_ASSIMILATION_AUTHORITY_CEILING, and main.
Its law edges are bounded to the local completion-learning scope limit: P-1, P-2, P-3, P-5, P-6, P-7, P-8, P-9, P-12, P-13, P-15, AX-1, AX-4, AX-5, AX-6, AX-7, AX-8, AX-11, and AX-12. Its paper-module neighbors are cold_reader_route_map, pattern_binding_contract, and voice_to_doctrine_self_improvement_loop.
If the generated JSON instance disagrees with the bundle or validator source, the bundle and validator win; refresh the projection rather than editing it.
Source-Backed System
This component is more than a prose rule. The exported assimilation bundle imports four bodies by manifest:
macro_pattern_autonomy_process_contract_body_import from state/microcosm_portfolio/reconstruction/macro_pattern_autonomy_process_contract_v1.json
macro_pattern_assimilation_fixture_manifest_body_import from state/microcosm_portfolio/reconstruction/fixture_manifests/pattern_assimilation_step.fixture_manifest.json
pattern_assimilation_retracted_adapter_receipt_body_import from state/microcosm_portfolio/reconstruction/pattern_assimilation_step_real_substrate_adapter_receipt_v1.json
pattern_assimilation_acceptance_validator_source_body_import from src/microcosm_core/validators/sign-off.py
The manifest is examples/pattern_assimilation_step/exported_assimilation_bundle/source_module_manifest.json. It must keep body_in_receipt: false, exact source and target digests, required anchors, and validation refs. The copied validator body anchors validate_pattern_assimilation, run_assimilation_bundle, and PATTERN_ASSIMILATION_AUTHORITY_CEILING.
The first-wave result records must include public-relative paths, no private root paths, no copied body text, a redacted non-public-state scan with zero blocking hits, observed negative cases, error codes, scope limit, scope boundary, and the exact result record paths. The bundle result record must show source_module_manifest_status: pass, body_copied_material_count: 4, the four body-material ids above, body_in_receipt: false, body_text_in_receipt: false, and only public replacement refs.
Reader Evidence Routing
A cold reader should inspect the evidence in this order:
Open the JSON source record to confirm subject ids, dependency ids, principle and axiom refs, and code locus.
Run the focused sign-off test or fixture command to prove the completion learning shape still accepts valid fixture rows and rejects the required negative cases.
Run the exported bundle validator when source-module digest, anchor, copied body, or replacement posture is the question.
Treat generated JSON, Mermaid, Atlas, and coverage as projection evidence only; if they drift, refresh them through the doctrine-lattice builder.
Use the result record floor to check public-relative paths, metadata-only source verification, source note exclusion, and local-lesson scope limits.
Negative Cases
The current negative-case floor is:
MISSING_PATTERN_ASSIMILATION_CLOSEOUT for a landed component without a refinement or typed no-op completion.
MISSING_REFINEMENT_OWNER_SURFACE, MISSING_STEWARDSHIP_CHECK, and MISSING_REENTRY_CONDITION for refinement result records that cannot route the lesson to an owner surface and re-entry condition.
DUPLICATE_REFINEMENT_RECEIPT_ID for duplicate refinement result records.
LOCAL_LESSON_AUTHORITY_UPGRADE for local lessons that claim global doctrine authority.
RAW_SEED_BODY_IN_ASSIMILATION_FIXTURE for source notes or private source note bodies in the public fixture.
ASSIMILATION_BUNDLE_SOURCE_MODULE_INVALID for exported source-module digest or anchor mismatch.
These are not ornamental checks. If a run stops observing them, the module can no longer support the claim that Microcosm learns from landed work without turning local notes into unsupported global doctrine.
Prior Art Grounding
Pattern assimilation is grounded in software pattern-language practice: recurring engineering lessons should be named, bounded, reviewed, and connected to the context where they apply. The Hillside patterns library is the direct prior-art family for treating patterns as a shared engineering vocabulary rather than one-off notes.
The result record and trace shape also borrows from provenance and observability practice. W3C PROV informs the requirement that each refinement cite its owner surface and evidence relation, while OpenTelemetry traces are a useful analogue for linking spans of work into an inspectable causal chain. Microcosm uses those inspirations for completion learning only; a local lesson still needs the owning lane before it can become broader doctrine.
Validation Result record Path
From microcosm-substrate, keep validation result records outside tracked first-wave paths unless the owning result record lane intends to refresh them:
The fixture and bundle result records prove same-lane completion-learning shape over the public fixtures and copied body imports only; they do not promote a local lesson to global doctrine authority. Source-copy or result record drift is an owning validator/manifest lane issue, not Markdown source authority.
Use an isolated /tmp basetemp for focused pytest runs so result record scratch paths do not rewrite source-run rows inside the checkout.
Validation Anchors
Focused coverage lives in tests/test_pattern_assimilation_step.py and checks:
streamed JSONL loading and upsert behavior;
required negative-case observation;
public-relative redacted result records;
source result record field floors from the fixture manifest;
exported assimilation bundle runtime shape;
source-module digest mismatch rejection;
exported bundle result records;
exact copied source body imports.
Scope boundary
Scope limit
Pattern assimilation validates public completion-learning metadata plus regression fixtures. It does not ingest private lessons, read source note bodies, mutate live work log or work log state, promote global doctrine, include launch operations or public sharing, make external model access, claim private-data equivalence, prove behavior changes, or certify public runtime behavior.
Its useful claim is narrower: over the supplied fixtures and copied public body imports, the component shows that completion learning has a typed, same-lane, owner-routed shape and that invalid completion claims are rejected before they become doctrine.
Scope limit
This module may claim public completion-learning validation over the supplied fixtures and copied body-import manifests: same-lane completion decisions, owner-surface refinement evidence, typed nothing_to_refine result records, stewardship and re-entry fields, duplicate result record rejection, local-lesson scope limits, source note exclusion, public-relative result records, and metadata-only source-module verification.
It does not claim complete pattern coverage, private source-root equivalence, live work log or work log mutation, source note ingestion, external model access, global doctrine changes, behavior-change proof, launch or publishing-scope decision, or whole-system correctness. The generated diagram and atlas views are navigation surfaces; they do not upgrade local lessons into global doctrine.
Set 10 Governance And Compilers BundleSet 10 Governance And Compilers Bundle imports governance, compiler, launch, finance, dependency, DAG, table, reference, and recent-change source bodies as public source-open evidence without granting live ledger, public sharing, launch, market, or source-file changes.
Set 10 Governance And Compilers Bundle binds the accepted batch10_governance_compilers_capsule component to a refreshed source-open bundle. It exercises governed-mutation intent, observe/apply compilation, public-proof review, launch blocker triage, public sharing path contracts, result record reuse, no-lookahead horizons, session-wave execution, claim-conflict wait tax, role-aware DAG blocking, frontend table shaping, reference grouping, recent-change coalescing, and the deferred Set-9 lane-width repair while preserving copied source digests, negative cases, and scope limits.
Scope limit Fixture-bound public source-body import, source-faithful public refactor evidence, deterministic exercise evidence, and metadata-only result records only; no live work log truth, live work log truth, source-file changes, publishing-scope decision, launch-scope decision, external model access, market advice, private-system equivalence, neutral benchmark claim, or whole-system correctness.
This bundle answers one question: when the wider system claims that a governance gate, a compiler, or a launch check behaves correctly, can a cold reader confirm that claim from copied source and a re-run, rather than taking the claim on trust? It collects fourteen mechanisms that already exist in the main system, copies their source bodies into the public bundle, and re-runs a small, source-faithful port of each one against controlled inputs.
The mechanisms span the work they were drawn from: a mutation gate that reads the latest user message and blocks file writes when the intent is diagnostic; an observe/apply compiler that turns an artifact into an apply plan and refuses malformed input; a reviewer gauntlet that checks a public proof bundle from several reader personas; launch-blocker triage; a public sharing path-contract check; result record-reuse staleness; a no-lookahead finance horizon; a session dependency wave; claim-conflict detection; role-aware blocking in a task graph; and three frontend helpers for table shaping, reference grouping, and recent-change coalescing.
What is unusual is the stance towards its own fixtures. The negative-case files on disk hold only a label and an expected error code. The bundle does not treat that error code as proof of anything. For each negative case it recomputes the outcome itself, in code, and compares the computed result against the expectation. A fixture that merely declares the right error code, without the ported logic actually producing it, is flagged rather than passed. The point is to stop a test from grading itself green by assertion.
Exact-copy authority: the bundle source_module_manifest.json plus copied source modules; refresh through macro_projection_import_protocol, not by hand.
This Microcosm component imports and exercises Set-10 source system for governed mutation, observe/apply compilation, public-proof review, launch blocker triage, public sharing path contracts, result record reuse, no-lookahead horizons, session-wave execution, claim conflict wait tax, role-aware DAG blocking, frontend data shaping, reference grouping, and recent-change coalescing.
The bundle carries exact source source snapshots where safe. publication_manifest_selector_contract_verifier is represented as a source-faithful public refactor because the source source contains a private home-path example. weighted_lane_width_apportionment_solver is recorded as a binding repair deferred to the Set-9 RootNavigator body, not as a fresh Set-10 import.
Integrity hardening: negative-case fixture files are labels and stable-code rows only. The result record's exercise.integrity_matrix is the verdict surface: each Set-10 mechanism records source relation, positive computed output, negative input shape, negative computed output, scope limit, and whether the result was computed by the bundle evaluator. A fixture-supplied error_codes row is never enough to prove refusal behavior.
Shape
The source row is core/paper_module_capsules.json::paper_modules[75:paper_module.batch10_governance_compilers_capsule]; the generated instance is paper_modules/batch10_governance_compilers_capsule.json; and the runtime source locus is src/microcosm_core/organs/batch10_governance_compilers_capsule.py. The specific standard is standards/std_microcosm_batch10_governance_compilers_capsule.json, with Microcosm-wide coverage and entry boundaries governed by std_microcosm.
flowchart LR Bundle["JSON bundle source row core/paper_module_capsules.json paper_module.batch10_governance_compilers_capsule"] Instance["Generated JSON instance paper_modules/batch10_governance_compilers_capsule.json"] Markdown["Markdown reader projection paper_modules/batch10_governance_compilers_capsule.md"] Standard["Standards std_microcosm_batch10_governance_compilers_capsule std_microcosm"] Runtime["Runtime/source loci batch10_governance_compilers_capsule.py exercise 14 mechanism ports resolve source evidence per mechanism recompute each negative case flag fixture_verdict_echo_risk"] Fixtures["Fixtures and source bundle fixtures/first_wave/.../input (labels + expected codes) exported bundle: 13 copied source modules source_module_manifest.json"] Tests["Tests and result records tests/test_batch10_governance_compilers_capsule.py result records/runtime_shell/demo_project/components/batch10_governance_compilers_capsule"] Projections["Generated navigation projections Mermaid: available_from_capsule_edges Atlas: linked_from_capsule_edges"] Ceiling["Scope limit fixture-bound public source-open evidence only no live ledger truth, source-file changes, public sharing, launch, provider, private-system, benchmark, or market authority"] Bundle -->|seeds| Instance Bundle -->|bounds prose| Markdown Bundle -->|names laws and source authority| Standard Bundle -->|cites code locus| Runtime Runtime -->|computes integrity matrix and result records| Tests Fixtures -->|public inputs, exact copies, declared refactor| Runtime Fixtures -->|manifest and source bundle validate| Tests Instance -->|derives edges| Projections Projections -->|navigation only| Markdown Tests -->|result record evidence remains below| Ceiling Standard -->|enforces public/private and launch boundary| Ceiling Markdown -->|must not outrank| Bundle
The bundle makes the module actual by binding five reader questions to typed authority surfaces:
What is the source of record? The source record and generated JSON instance, not this Markdown file and not generated Mermaid or Atlas output.
What is being exercised? The accepted batch10_governance_compilers_capsule component, the mechanism.batch10_governance_compilers_capsule.validates_public_governance_compilers_capsule mechanism, and the concept.import_projection_and_drift_control_bundle concept edge named by the bundle.
Which runtime and source artifacts matter? The component module computes the integrity matrix, negative-case verdicts, source evidence, fixture run, bundle validation, result card, and AUTHORITY_CEILING; the exported bundle carries source_module_manifest.json, copied source modules, and the declared public refactor for the private-path-bearing public sharing manifest selector body.
Which result records and tests are binding? The focused test file verifies the fixture run, bundle validation, digest mismatch rejection, private-body omission, negative-case semantics, source-evidence classifications, source helper parity, and reviewer-gauntlet behavior; the result record directory under receipts/runtime_shell/demo_project/organs/batch10_governance_compilers_capsule holds the runtime shell validation result, board, and validation result record.
What is the honest ceiling? The module can claim fixture-bound public source-open import/refactor evidence, deterministic exercise evidence, integrity-matrix verdicts, metadata-only result records, and validation result records. It cannot claim live work log truth, live work log truth, source-file changes, publishing-scope decision, launch-scope decision, external model access, private root equivalence, neutral benchmark evidence, market advice, deployment posture, or whole-system correctness.
Bundle-Bound Reader Shape
The JSON bundle binds this paper module to one accepted subject: the batch10_governance_compilers_capsule component. The executable proof locus is src/microcosm_core/organs/batch10_governance_compilers_capsule.py, especially _build_integrity_matrix, _source_evidence, _evaluate, run, run_batch10_governance_compilers_bundle, result_card, EXPECTED_MECHANISMS, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING.
The bundle keeps the mechanism and concept layer intentionally narrow: it names the resolving governance/compiler mechanism subject and the concept.import_projection_and_drift_control_bundle concept, while additional concept or mechanism edges stay residual until resolving Microcosm rows exist. Its law edges are bounded to content-addressed reuse, provenance, freshness, and projection-below-source rules: P-2, P-5, P-9, P-15, AX-4, AX-8, AX-10, and AX-11. Its sibling paper-module dependencies are macro_projection_import_protocol, batch10_live_source_drift_capsule, and batch9_macro_engines_capsule.
If a projection disagrees with the bundle or refreshed source-open bundle, refresh the projection; do not edit generated output by hand.
How it works
The run takes a public input directory, validates the source-module manifest, and exercises each of the fourteen mechanisms against inputs the evaluator constructs itself. _build_integrity_matrix then writes one row per mechanism. Each row records the source evidence for that mechanism, the positive computed output, the attached negative cases with their computed outputs, the scope limit, and a current_action of keep, harden, or block.
Source evidence is resolved per mechanism by _source_evidence. A mechanism's named source reference is looked up in the manifest. If the body was copied exactly, the row carries the copy's digest status and anchor-match count. If the body could not be copied verbatim, the row instead names a declared source-faithful public refactor and records the original source digest. Two mechanisms are honest about not being plain copies. publication_manifest_selector_contract_verifier is a public refactor, because the source source carried a private home-path example that cannot ship. weighted_lane_width_apportionment_binding_repair is recorded as an under-bound repair deferred to the Set-9 RootNavigator body, so it is held as a block rather than presented as a fresh Set-10 import.
The negative cases are handled the same way. For each case, _compute_negative_case_probe runs the ported logic over the case's declared input and reads the result at a named path. For example, the mutation case feeds a diagnostic message and confirms prohibit_file_writes is true; the finance case feeds an unparseable horizon and confirms it is rejected; the public sharing case feeds a non-public paths against a hard-exclude rule and confirms it is caught. A row counts as proven only when the computed value matches the expectation. If any negative case lacks computed evidence, the summary raises fixture_verdict_echo_risk, and the run is blocked. The bundle also requires exactly thirteen copied source modules, so a thinned bundle fails rather than passes quietly.
Prior Art Grounding
The component is grounded in policy-as-code, admission-control, and supply-chain assurance patterns: compile rules into deterministic checks, reject unsupported actions before they mutate state, and preserve provenance for the decision. Relevant anchors include:
Open Policy Agent, which decouples policy decisions from enforcement and evaluates structured input against machine-readable rules.
Kubernetes validating admission policies, which can block, warn, or audit non-compliant API requests before admission.
SLSA and OpenSSF Scorecard, which represent the broader software-supply-chain pattern of typed assurance levels, checks, and provenance.
Microcosm borrows the compiler/gate shape for governed mutation, public sharing path contracts, blocker triage, result record reuse, and claim-conflict accounting. The bundle remains fixture-bound evidence over copied or refactored source system; it is not live work log truth, source-file changes, publishing-scope decision, or investment-related actions.
Reader Evidence Routing
A cold reader should inspect the evidence in this order:
Open the JSON source record to confirm source authority, subject ids, dependency ids, principle and axiom refs, code locus, Mermaid status, Atlas status, and the absence of unresolved selective relations.
Run the focused component test to prove the public fixture still computes the integrity matrix and observes the required negative cases.
Run the exported bundle validator when copied source digests, declared public refactors, metadata-only result records, or source-evidence rows are the question.
Treat generated JSON, Mermaid, Atlas, and coverage as projection evidence only; if they drift, refresh them through the doctrine-lattice builder.
Use the result record floor to verify source relations, positive and negative computed outputs, scope limits, and metadata-only result record payloads.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the governance/compiler integrity-matrix result record and sign-off JSON. The bundle command validates copied or source-faithful source system, source evidence, positive and negative exercise rows, metadata-only result records, and scope limit fields. The focused test verifies the mechanism matrix, negative floor, bundle validation, and scope limit.
This result record path is reader-verifiable evidence only. It does not establish live work log truth, live work log truth, source-file changes, publishing-scope decision, launch-scope decision, external model access, neutral benchmark evidence, private-system equivalence, or investment-related actions.
Scope boundary
Scope limit
This module may claim public fixture evidence that the copied or declared governance/compiler source system produced source-evidence rows, computed positive and negative exercise rows, integrity-matrix verdicts, metadata-only result records, and validation result records with explicit scope limits.
This module may not claim live work log truth, live work log truth, source-file changes, publishing-scope decision, launch-scope decision, external model access, neutral benchmark evidence, private-system equivalence, investment-related actions, deployment posture, or whole-system correctness.
Scope limit
This is not live work log truth, not live work log truth, not source-file changes, not public sharing or launch-scope decision, not external model access, not neutral benchmark evidence, not private-system equivalence, and not investment-related actions.
The useful claim is narrower: over the public fixtures and refreshed source-open bundle, the component shows that the Set-10 governance/compiler mechanisms have copied or declared source evidence, computed positive and negative exercise rows, and metadata-only result records with explicit scope limits.
Set 11 Saturation Engines BundleSet 11 Saturation Engines Bundle imports saturation, diagnostic, wayfinding, market-board, secret-scan, and demo-take source bodies as public source-open evidence without granting live runtime, launch, market, or navigation authority.
Set 11 Saturation Engines Bundle binds the accepted batch11_saturation_engines_capsule component to a refreshed source-open bundle. It exercises run-affinity scoring, calculator cluster insight derivation, std_python delta gating, exogenous navigation grading, portability supersession rollup, shard browse priority, holographic evidence selection, projection secret scanning, stockgrid flow normalization, source-regime board bucketing, frontend wayfinding, agent-session diagnostic lenses, and demo-take coverage auditing while preserving copied source digests, computed negative probes, and scope limits.
Scope limit Fixture-bound public source-body import, source-faithful public port evidence, computed negative-probe evidence, and metadata-only result records only; no live work log truth, navigation authority, complete secret detection, live market data, investment-related actions, raw transcript authority, video capture, source-file changes, publishing-scope decision, launch-scope decision, external model access, private-system equivalence, or whole-system correctness.
batch11_saturation_engines_capsule is a Microcosm component for the Set-11 saturation pass. It takes thirteen unrelated pieces of internal machinery, copies their source bodies into a public bundle, and re-runs each one against small synthetic fixtures so a reader can see the logic behave rather than take a claim on trust. The thirteen targets are deliberately mixed:
run affinity session scoring
calculator cluster insight derivation
std_python delta ratchet gating
exogenous navigation ladder grading
portability gate supersession rollup
shard browse context-priority sectioning
holographic research evidence selection
projection secret scanning
stockgrid flow multisource merge and unit normalization
source regime bucketing and z-score board construction
frontend navigation wayfinding
agent session diagnostic lenses
demo-take story coverage auditing
The single question the bundle answers is narrow: for each of these mechanisms, does the imported source actually compute the guard it claims to, on inputs designed to fail? It is a saturation pass because the targets share nothing except that pattern. They are a route ranker, a few financial-data normalisers, a navigation grader, a secret scanner, a graph wayfinder, and so on, swept up together so a reviewer can audit a broad slice of the codebase from one place.
The part worth noticing is how a negative case is treated. A fixture file named ..._stale_terminal_rejected is only a label. The bundle never lets that label stand in for a result. It re-runs the real function on the fixture's own probe_input, computes whether the guard fired, and refuses to mark the case verified unless the mechanism's own exercise and the independent probe both agree. A fixture that asserts a failure it cannot demonstrate is flagged, not counted. That guard against self-congratulating fixtures is the reason the page exists.
How it works
The run loop is the same for every target. The bundle first imports the copied source bundle and checks each module against the recorded source digest, line count, and a handful of required provenance anchors, so a drifted or partial copy is caught before any logic runs. It then exercises all thirteen mechanisms in a fixed order, and any reordering, blocked exercise, or missing module fails the run.
Each mechanism's exercise feeds an integrity matrix row. A row pairs the mechanism's own computed output with an independently computed fixture probe and a binding disposition that records how the mechanism relates to the rest of the system: a new import, an already-bound gate the bundle is only re-checking, or an under-bound path it is extending. The two computed values must agree. The matrix marks a row's negative result verified only when the mechanism exercise and the fixture probe both come out true, and it sets fixture_verdict_echo_risk on any row where they do not. A non-zero echo-risk count is a finding that blocks the whole run.
Two short examples show what the probes actually compute. For run affinity, the probe builds a recommendation over candidate runs and confirms that a stale terminal run, even when made sticky and feed-rich, is not the one selected. For projection secret scanning, the probe runs the redaction patterns over a file carrying a synthetic key shape and a private ledger path and confirms both are blocked. The fixtures are synthetic and the key shapes are deliberate test strings, never live material.
The failure mode all of this guards against is the quiet pass: a fixture whose filename promises a rejection while the code underneath was never exercised, or was exercised and did not reject. By recomputing the guard from the fixture's own input and refusing to count a label it cannot reproduce, the bundle keeps the negative cases honest. The result records carry refs, digests, counts, and the computed verdicts; the copied bodies stay in the bundle's source_modules tree and are never inlined.
Shape
This module's shape is a reader map over source-backed artifacts, not a new authority layer. The source record in core/paper_module_capsules.json is the source of record for subjects, code loci, doctrine refs, dependency edges, and projection status; paper_modules/batch11_saturation_engines_capsule.json is the governed JSON parity seed; this Markdown only narrates the proof boundary.
Source refs
paper_modules[76:paper_module.batch11_saturation_engines_bundle] source basis: source record
flowchart TD bundle["core/paper_module_capsules.json paper_modules[76:paper_module.batch11_saturation_engines_capsule] source basis: source record"] instance["paper_modules/batch11_saturation_engines_capsule.json governed JSON instance markdown: legacy_import_projection_until_roundtrip_builder"] standard["standards/std_microcosm_batch11_saturation_engines_capsule.json active public runtime standard boundary: not live navigation/ledger/market/secret authority"] runtime["src/microcosm_core/components/batch11_saturation_engines_capsule.py run, validate-bundle, result_card, scope_limit"] fixture["fixtures/first_wave/batch11_saturation_engines_capsule/input public mechanism and negative-case probes"] bundle["examples/batch11_saturation_engines_capsule/exported_batch11_saturation_engines_capsule_bundle source_module_manifest.json: 12 copied/refactored public source modules"] tests["tests/test_batch11_saturation_engines_capsule.py scripts/build_doctrine_projection.py --check-paper-module-corpus scripts/build_doctrine_projection.py --check"] result records["result records/first_wave/batch11_saturation_engines_capsule/* result records/sign-off/first_wave/batch11_saturation_engines_capsule_fixture_acceptance.json status: pass; accepted: true; body_in_receipt: false"] atlas["atlas/doctrine_lattice_graph.mmd and doctrine_lattice_projection.json Mermaid: available_from_capsule_edges Atlas: linked_from_capsule_edges"] ceiling["Scope limit fixture-bound source-body import, source-faithful public ports, computed negative probes, metadata-only result records only"] bundle -->|seeds subjects, dependencies, code locus, projection status| instance bundle -->|governed by| standard instance -->|cites resolved runtime/source locus| runtime standard -->|requires fixture and result record contract| fixture standard -->|requires copied/source-faithful public bundle| bundle runtime -->|exercises| fixture runtime -->|validates exact-copy/source-faithful evidence| bundle runtime -->|writes metadata-only result and validation result records| result records tests -->|checks runtime, bundle, corpus, projection freshness| result records instance -->|generated projection edge status| atlas result records -->|bounded evidence, not launch-scope decision| ceiling atlas -->|projection, source-linked only| ceiling
The public/private and launch boundary stays narrow: the fixture inputs, source refs, digest rows, computed values, negative-probe labels, sign-off status, and metadata-only result records are evidence for the standalone microcosm-substrate bundle. They do not authorize live work log claims, navigation decisions, market or investment conclusions, complete secret detection, transcript or video authority, source-file changes, external model access, publishing-scope decision, launch-scope decision, private-system equivalence, generated-lattice source authority, or whole-system correctness.
Reader Evidence Routing
Read this module through the fixture, exported-bundle, focused-test, and generated-row surfaces. The fixture and bundle commands prove public source-body import discipline: exact copied-source digests, source-faithful public ports, computed negative-probe values, and metadata-only result cards. The structured source record proves that the paper module is bundle-backed and that Mermaid and Atlas availability come from bundle edges rather than prose.
The mixed Set-11 target list remains evidence routing, not an authority expansion. The reader should treat each target as a public fixture exercise inside the accepted saturation-engines component, not as live work log truth, complete secret detection, live market data, investment-related actions, raw transcript authority, video capture, publishing-scope decision, or launch-scope decision.
Prior Art Grounding
The component borrows from overload management, backpressure, and observability practice: systems need explicit signals for saturation, queue pressure, freshness, and recoverability instead of relying on a single success/failure bit. Relevant anchors include:
Microcosm borrows the saturation-signal and pressure-accounting pattern across its mixed Set-11 targets: route affinity, delta gates, shard browse priorities, evidence selection, secret scanning, market boards, wayfinding, and diagnostic lenses. The bundle computes public fixture verdicts; it is not live work log truth, complete secret detection, live market data, or launch-scope decision.
Binding Dispositions
Set-11 contained a mixed target set. The bundle records the distinction explicitly:
New or under-bound imports: run affinity, calculator insight, exogenous nav grading, shard browsing, holographic evidence selection, quant stockgrid, source regime board, frontend wayfinding, and session diagnostics.
Already-bound validations: projection secret scan and portability gate are covered by the engine-room public projection leak gate family; demo-take coverage is already represented by the Set-7 demo-take component. Set-11 validates the relevant scoring or gate behavior rather than claiming a standalone authority surface.
Partial existing system: the std_python ratchet path had existing assay coverage; the Set-11 bundle adds a bounded delta-regression witness.
Shared Wiring Status
The component-owned system can validate independently. Shared registry, atlas, sign-off, Components, ARCHITECTURE, preflight, and package wiring must be serialized behind the live shared Microcosm binding owner before this component is promoted to whole-surface discoverability.
Validation Result record Path
Negative-case fixture files are inputs, not verdicts. Each file carries a public probe_input; the component computes the corresponding fixture probe and records fixture_probe_input_digest, fixture_computed_value, and mechanism_computed_value in the integrity matrix before counting a negative case as verified.
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the Set-11 saturation-engine result record and sign-off JSON. The bundle command validates copied source-source digests, source-faithful public port evidence, computed negative-probe evidence, and metadata-only cards. The focused test covers the runtime component, exported bundle shape, exact-copy imports, private body omission, stable negative cases, and tier-B mechanism output coverage. The corpus and projection checks prove only that the generated paper-module instance remains fresh for this bundle-backed Markdown state.
This result record path is public fixture evidence only. It does not establish live work log truth, navigation authority, complete secret detection, live market data, investment-related actions, raw transcript authority, video capture, source-file changes, publishing-scope decision, launch-scope decision, external model access, or whole-system correctness.
Scope boundary
Boundary
This bundle is not live work log truth, navigation authority, complete secret detection, live market data, investment-related actions, raw transcript authority, video capture, publishing-scope decision, or launch-scope decision. Result records expose only refs, digests, counts, computed verdicts, public negative-case probe digests, and omission result records; copied source source bodies remain under the public bundle's source_modules tree.
Scope limit
This bundle is fixture-bound public source-body import, source-faithful public port evidence, computed negative-probe evidence, and metadata-only result record evidence only. It does not establish live work log truth, navigation authority, complete secret detection, live market data, investment-related actions, raw transcript authority, video capture, source-file changes, publishing-scope decision, launch-scope decision, external model access, private-system equivalence, or whole-system correctness.
Scope limit
Those result records do not prove live work log truth, navigation authority, complete secret detection, live market data, investment-related actions, raw transcript authority, video capture, external model access, source-file changes, publishing-scope decision, launch-scope decision, private-system equivalence, or whole-system correctness.
Set 4 Proof, Authority, and Runtime BundleSet 4 Proof, Authority, and Runtime Bundle imports proof-search, reasoning-authority, completion, Codex runtime, bitemporal, taskpolicy, and context-yield source bodies as public source-open evidence without claiming proof success, benchmark claims, live runtime control, or launch-scope decision.
Set 4 Proof, Authority, and Runtime Bundle binds the accepted batch4_proof_authority_runtime component to a refreshed source-open bundle. It exercises Lean strategy-control and prover-skill witnesses, VeriSoftBench harness and calibration rows, Erdos #257 certificate-kernel static checks, Lean packet integrity, reasoning grant and plan authority fences, forward-integration policy, completion executor deferral, Codex driver and idle heartbeat diagnostics, metabolism claim logs, taskpolicy passthrough, and context-yield attribution while preserving copied source digests, bounded negative cases, and scope limits.
Scope limit Fixture-bound public source-body import, static proof-placeholder checks, dry-run authority-boundary evidence, source-anchor evidence, negative-case evidence, and metadata-only result records only; no theorem success, Erdos #257 solution, official benchmark claims, live sandbox enforcement, live Codex orchestration, external model access, source-file changes, publishing-scope decision, launch-scope decision, private-system equivalence, or whole-system correctness.
batch4_proof_authority_runtime is the public source-open evidence membrane for fourteen source mechanisms that are easy to overclaim: proof search, machine-checked mathematics, reasoning-authority fences, completion planning, Codex runtime diagnostics, bitemporal coordination, taskpolicy wrapping, and context-yield attribution.
Purpose
These fourteen mechanisms sit close to claims a reader will want to make on their behalf. A proof-search benchmark looks like solving open problems. A copied CertificateKernel.lean for Erdos #257 looks like a solution. A reasoning-grant fence looks like a live sandbox. The single question this bundle answers is narrow and deliberately so: can each of these mechanisms be shown to a cold reader as copied, anchored, public source, without any of them quietly inheriting an authority it does not have?
The unusual part is how the bundle resists the easy inflation. It does not run the mechanisms; it imports their source bodies, checks each one against named required anchors, and then recomputes a stable negative case per mechanism from that source rather than trusting a fixture to declare its own verdict. For the Erdos #257 row it runs a static token scan over the copied Lean source and rejects sorry, admit, and axiom, so an absent proof obligation cannot be smuggled in. An optional local Lean/Lake compile probe is wired in too, but a pass means only that the copied kernel elaborated without error, and the code records that as a non-authoritative availability signal, never as formal-result correctness.
The result is a membrane, not a flagship. The interesting claim is the one it refuses: source import is made auditable, every result record stays metadata-only, and each tempting stronger statement is forced into a visible scope boundary with the authority delta held at none.
Abstract
batch4_proof_authority_runtime is a technical paper module for the Set 4 proof/authority/runtime bundle. Its positive claim is deliberately narrow: Microcosm imports exact copied source source modules into a public bundle, checks source digests and required anchors, runs bounded fixture and bundle validators, records semantic negative cases, and emits metadata-only result records with explicit scope limits.
This module does not claim formal formal-result correctness. It is not a Lean/Lake execution component, not an Erdos #257 solution, not an official benchmark result, not live sandbox enforcement, not live Codex orchestration, not external model access, not source-file changes, not publishing-scope decision, not launch-scope decision, not private-system equivalence, and not whole-system correctness. Where the paper mentions Lean/Lake, it distinguishes Set 4's static copied-source checks from sibling witness components that actually run local Lean/Lake processes.
Telos
The Set 4 bundle exists to make proof-adjacent runtime claims inspectable without leaking private roots or inflating source import into proof authority. It gathers fourteen mechanism families that otherwise invite overclaiming: strategy-control proof search, prover-skill foundry work, VeriSoftBench harness diagnostics, Erdos #257 certificate-kernel source anchors, Lean packet replay, dry-run authority grants, completion planning, Codex runtime diagnostics, bitemporal coordination, macOS taskpolicy wrapping, and context-yield attribution.
The paper's job is not to make those systems authoritative by prose. Its job is to explain the public result record membrane: what was copied, what was checked, what negative cases were observed, what was omitted from result records, and which scope limit remains in force.
Mechanism Overview
The public fixture manifest names fourteen mechanism rows and one stable negative case per mechanism:
lean_strategy_control_benchmark
prover_skill_foundry
verisoftbench_harness_differential
verisoftbench_calibration_executor
erdos257_certificate_kernel
lean_full_fidelity_packet_verifier
reasoning_execution_authority_grant
forward_integration_policy_fence
closeout_executor_state_machine
codex_cdp_driver
codex_idle_heartbeat_fsm
metabolism_bitemporal_claim_log
macos_taskpolicy_actuator
context_yield_attribution
The exported bundle contains nineteen exact copied source source modules. Validation checks their manifest rows, SHA-256 digests, line counts, required anchors, and per-mechanism public exercise clauses. Result records carry source refs, digests, anchors, counts, verdicts, negative-case ids, and scope limits; they do not inline copied body text or private runtime state.
Runtime Mechanism
The runtime has two public entry shapes:
run consumes fixtures/first_wave/batch4_proof_authority_runtime/input, evaluates the Set 4 fixture manifest, writes the public result board, and emits sign-off JSON.
validate-bundle consumes examples/batch4_proof_authority_runtime/exported_batch4_proof_authority_runtime_bundle, validates the copied-source manifest, and emits a bundle validation result record.
Both paths enforce the same ceiling. They validate public fixture evidence and copied-source integrity; they do not run providers, dispatch live Codex state, execute a live sandbox, change source files, submit benchmark results, approve public sharing, approve launch, or establish formal-result correctness.
For the Erdos #257 certificate-kernel row, Set 4 performs a static placeholder-token scan over copied Lean source and ties that scan to target-runner anchor evidence. That scan may reject sorry, admit, and axiom mutations in the copied source floor. It is not a Lean proof check and not a certificate that the open problem has been solved.
Diagram
Diagram source
flowchart TD fixture["Public fixture manifest 14 mechanism rows + 14 negative cases"] bundle["Exported public bundle 19 copied source modules"] runtime["Set 4 runtime run / validate-bundle"] anchors["Per-mechanism source check module present + required anchors in body"] scan["Erdos #257 static scan reject sorry / admit / axiom"] probe["Optional Lean/Lake probe copied kernel elaborates? availability only"] negatives["Negative cases recomputed verdict derived from source, not declared"] result records["metadata-only result records refs, digests, anchors, counts, verdicts"] ceiling["Scope limit authority delta = none"] leanWitness["Sibling Lean/Lake components actually run local proofs"] fixture --> runtime bundle --> runtime runtime --> anchors runtime --> scan scan --> probe runtime --> negatives anchors --> result records scan --> result records probe --> result records negatives --> result records result records --> ceiling leanWitness -. "separate execution evidence" .-> ceiling
The dashed edge is intentional. Lean/Lake subprocess evidence informs the technical boundary, but Set 4 itself does not inherit proof authority from sibling components.
Semantic Negatives And Threat Model
The negative cases are not decoration. They are the public failure floor that prevents a source-import bundle from becoming an unbounded proof or runtime claim. The fixture includes negatives for weak proof skeletons, low-repair foundry promotion, benchmark truth leakage, prefix-answer leakage, Erdos solution overclaim, packet hash corruption, forbidden authority grants, dirty forward integration targets, stale completion heads, absent CDP ports, stale idle snapshots, expired bitemporal claims, missing taskpolicy binaries, and accepted read guards.
The threat model is overclaiming. A green result record must not be interpreted as:
a formal proof of a theorem;
a solution to Erdos #257;
an official benchmark result or leaderboard submission;
a live provider, browser, sandbox, Codex, or metabolism run;
authorization to change source files, publish, launch, or export private state;
evidence that public copied modules are equivalent to a private root.
Result Interpretation
A passing fixture command evidences that the public manifest, mechanism rows, negative cases, result record body scan, and scope limit are internally consistent for the Set 4 fixture. A passing bundle command evidences that the exported copied-source manifest matches expected digests and anchors while keeping result records metadata-only. A passing focused pytest evidences regression coverage for fixture execution, bundle validation, source digest mismatch, mutated Lean body rejection, exact-copy imports, private-body omission, and semantic negative-case evaluation.
These are engineering result records. They are not formal proof certificates. They support public reader confidence in the bundle's source-open evidence membrane; they do not certify theorem truth, benchmark claims, launch-scope decision, or whole-system correctness.
Relationship To Formal-Proof Concepts
Set 4 relates to formal-proof practice through boundary discipline, not through theorem authority. The local concept edge is concept.formal_math_and_proof_witness_bundle: proof-adjacent claims must pass through explicit witness artifacts, source refs, digests, declaration or anchor metadata, negative cases, and metadata-only result records before they become reader evidence.
The sibling formal_math_lean_proof_witness component supplies the small public Lean/Lake witness pattern. The sibling certificate_kernel_execution_lab component supplies the bounded certificate-kernel execution pattern. Set 4 imports and validates copied source-body evidence around those themes, but it keeps the authority delta at none.
This distinction is the main technical result of the paper: a source-open public bundle can be useful without pretending to be a formal proof. It can make evidence auditable, show exactly where a proof-adjacent route stops, and force every tempting stronger claim into a visible scope boundary.
Data And Artifact Availability
The public artifact boundary is the standalone microcosm-substrate root. A cold reader should use the paper module, generated structured source record, standard, fixture manifest, exported bundle manifest, focused test, and metadata-only result records inside that root. Public links and public sharing surfaces must resolve to the public Microcosm system, not private source roots, model-output data stores, browser state, prompt-shelf bodies, or operator-voice material.
Prior Art Grounding
The runtime keeps the authority to act separate from the evidence that an action is permitted. This is the idea behind proof-carrying code (Necula, 1997) and capability-based security, where a request arrives with evidence of its own legitimacy rather than relying on ambient trust. Microcosm borrows the proof-before-authority ordering over fixtures; the result is fixture-bound evidence, not a verified authorization system or launch-scope decision.
Reproducibility Route
Run these commands from microcosm-substrate/ when validating this module without changing durable generated projections:
The projection checks for the broader paper-module corpus remain:
These are reader-verifiable evidence only and do not include launch operations, external model access, source-file changes, or whole-system correctness.
Scope boundary
Source Authority And Projection Boundary
The source authority for this paper-module identity is the JSON source record:
It may explain the source record, the generated relationship set, and the validation route, but it does not mint new subject edges, proof authority, Mermaid authority, Atlas authority, or launch status. Future relationship changes belong in the source record plus builder regeneration, not in hand-authored Markdown.
Lean/Lake Witness Boundary
Set 4 should be read as the import/result record bundle, not as the Lean/Lake executor. Actual local Lean/Lake subprocess evidence lives in sibling public components:
formal_math_lean_proof_witness runs a tiny public Lean/Lake fixture and exported witness bundle, records local tool availability, build status, declaration metadata, four negative-case observations, and metadata-only result records. Its scope limit is toy public witness evidence only; it rejects Mathlib, Aesop, and Batteries authority unless a wider authority plane is introduced.
certificate_kernel_execution_lab runs a bounded public certificate-kernel lab through Lean/Lake machinery, records command identity, transition rows, accepted/residual counts, copied-source manifest status, negative cases, and metadata-only result records. Its scope limit is bounded certificate-kernel evidence, not general theorem authority.
Therefore the correct reading is layered:
Set 4 validates source-open source-body import, static placeholder scanning, authority-boundary fields, and semantic negatives.
The Lean/Lake witness components validate that specific public fixtures can route through local Lean/Lake subprocesses under their own ceilings.
None of these pages, individually or together, claim arbitrary formal-result correctness, Mathlib-dependent proof authority, benchmark claims, Erdos #257 solution status, publishing-scope decision, launch-scope decision, or private-system equivalence.
Public/Private Boundary
Allowed public material:
mechanism ids, source-module ids, negative-case ids, and stable error codes;
exact copied source modules in the exported public bundle;
source refs, SHA-256 digests, line counts, required anchors, and bounded outcomes;
scope limits, scope boundaries, and metadata-only validation verdicts.
Forbidden public material:
keys, account secrets, browser state, account or browser state, model-output data bodies, browser UI live-access material, live Codex state exports, live metabolism DB exports, private runtime state, source notes, prompt-shelf bodies, theorem work-product bodies, raw command-output bodies, public sharing operation state, and official benchmark submission state.
The exported bundle may contain approved copied source modules. The result records are stricter: they identify copied modules by refs, digests, anchors, classes, counts, and verdicts, not by inlining source bodies.
Limitations
The current module has these hard limits:
Set 4 does not execute Lean/Lake; it performs static checks over copied source and validates public manifest evidence.
Static placeholder-token scanning is bounded evidence checking.
Digest and anchor equality do not prove semantic equivalence to a private root.
Negative-case coverage is finite and fixture-bound.
metadata-only result records improve public safety, but they are not a substitute for formal proof review.
Generated Mermaid, Atlas, and JSON structured source record are projections; they do not create source authority.
Accepted-component status means accepted current public result record inventory for this verified source-body import, not launch, public sharing, benchmark, or theorem authority.
Scope limit
This module may claim fixture-bound public source-body import, exact copied source-module digest checks, required-anchor checks, static placeholder-token scan evidence, dry-run authority-boundary evidence, semantic negative-case evidence, and metadata-only result record discipline.
It may not claim theorem success, Lean formal-result correctness, Erdos #257 solution status, official benchmark claims, live sandbox enforcement, live Codex orchestration, external model access, source-file changes, publishing-scope decision, launch-scope decision, private-system equivalence, or whole-system correctness.
Set 6 Unsurfaced Primitives BundleSet 6 Unsurfaced Primitives Bundle imports provenance, operator-handoff, market, finance, provider-recovery, and demo-take source primitives as public source-open evidence without granting live operator memory, market, provider, media, public sharing, or launch-scope decision.
Set 6 Unsurfaced Primitives Bundle binds the accepted batch6_unsurfaced_primitives_capsule component to source-open primitive exercises. It checks source note keyphrase scoring, schema-loose distillation, operator handoff linkage, observed-turn window merging, market situation graphs, finance numeric assurance, fail-closed policy judgment, clone-local concurrency, market-clock scheduling, provider-recovery scoping, and demo-take temporal remapping while preserving public fixture inputs, exact-source digest expectations, negative cases, and scope limits.
Scope limit Fixture-bound public source-body import, copied-module digest/anchor evidence, synthetic source-exercise evidence, and metadata-only result records only; no live operator memory, prompt-shelf capture authority, live market data, provider/browser state, media launch, source-file changes, publishing-scope decision, launch-scope decision, private-system equivalence, or whole-system correctness.
This component imports the Set-6 source primitives that the scout identified as real but under-surfaced. It is a source-open bundle: exact copied source source bodies plus bounded public exercises and stable negative cases.
The bundle covers text structuring, provenance reconciliation, epistemic display guards, governance policy judgment, clone-local concurrency, market-clock scheduling, provider recovery scoping, and demo-take temporal remapping. It does not import raw operator transcripts, prompt-shelf private logs, browser/provider state, live market data, account secrets, audio, video, or public sharing state.
Purpose
A scout found eleven small primitives scattered across the wider system that were real and load-bearing but had never been surfaced as public evidence. They are the sort of utility code that quietly decides whether a larger feature is correct: a finance unit-scale check, a clock that fires market events once per session, a function that subtracts paused time from a recorded video offset. This bundle exists to bring those eleven into the public system without pretending they are anything grander than they are.
The single question it answers is narrow but useful: do the copied bodies still behave as claimed? It is easy to copy a function into a public bundle, check its file hash, and call that proof. That only shows the bytes match. It says nothing about whether the logic is right. This bundle goes one step further. For each primitive it imports the copied body and runs it on a small public synthetic input, then asserts the specific output the real code should produce.
The unusual part is that the eleven primitives are checked by execution, not by description. The Markdown prose and the JSON bundle say what each one is meant to do; the component proves it by calling the real function and comparing the answer. Each primitive also carries a paired negative case, a deliberately malformed input that the code must reject or correct, so the bundle shows both the working path and the guard. No private bodies, transcripts, or live data enter the result records; only refs, digests, anchor names, and the pass or fail of each exercise.
Prior Art Grounding
This bundle borrows from provenance modeling, risk-governance frameworks, policy-engine design, and temporal modeling. Useful anchors include:
W3C PROV, for reconciling derived artifacts back to entities, activities, and responsible agents.
NIST's AI Risk Management Framework, as a governance vocabulary for mapping, measuring, and managing system risk without turning every guard into a launch claim.
Open Policy Agent, which separates policy evaluation from application code through a general-purpose policy engine.
Martin Fowler's bitemporal history, as a prior pattern for preserving event time separately from record time.
Microcosm borrows the provenance, governance, policy-evaluation, and temporal separation patterns, but keeps this bundle at source-open public fixtures. It does not expose private operator memory, live market data, provider state, or publishing-scope decision.
Shape
Start from the bundle JSON, not from this prose. The source row core/paper_module_capsules.json::paper_modules[78:paper_module.batch6_unsurfaced_primitives_capsule] is the authority for the component subject, mechanism subject, concept edge, principle and axiom refs, dependency modules, runtime locus, generated projection statuses, and the scope limit. The generated JSON instance is paper_modules/batch6_unsurfaced_primitives_capsule.json; it is the parity projection that carries source_authority: json_capsule, the resolved relationship edges, the generated Mermaid and Atlas statuses, and the explicit scope boundaries.
flowchart LR Bundle["JSON bundle source row core/paper_module_capsules.json paper_module.batch6_unsurfaced_primitives_capsule"] Instance["Generated JSON instance paper_modules/batch6_unsurfaced_primitives_capsule.json source basis: source record"] Markdown["Markdown reader projection paper_modules/batch6_unsurfaced_primitives_capsule.md"] Standards["Standards standards/std_microcosm_batch6_unsurfaced_primitives_capsule.json std_microcosm public Microcosm boundary"] Runtime["Runtime/source loci src/microcosm_core/components/batch6_unsurfaced_primitives_capsule.py runtime_shell and macro_engines_gallery routes"] Fixtures["Fixtures, examples, source bundle fixtures/first_wave/batch6_unsurfaced_primitives_capsule/input examples/.../exported_batch6_unsurfaced_primitives_capsule_bundle source_module_manifest.json"] Result records["Tests and result records tests/test_batch6_unsurfaced_primitives_capsule.py result records/first_wave/... validation/result/board result records/sign-off/... fixture_acceptance.json"] Projections["Generated navigation projections Mermaid: available_from_capsule_edges Atlas: linked_from_capsule_edges"] Ceiling["Scope limit fixture-bound public source-body import digest/anchor checks, synthetic exercises, negative cases, metadata-only result records only"] Forbidden["Not authorized live operator memory, prompt capture authority, live market data, provider/browser state, media launch, source-file changes, public sharing or launch-scope decision, private-system equivalence, whole-system correctness"] Bundle -->|seeds| Instance Bundle -->|bounds| Markdown Bundle -->|names standard contract and ceiling| Standards Bundle -->|cites resolved code locus| Runtime Runtime -->|runs fixture and bundle validators| Result records Fixtures -->|public inputs and exact copied source bodies| Runtime Fixtures -->|26 copied modules; sha256 and anchor checks; body_in_receipt false| Result records Instance -->|derives relationship edges| Projections Projections -->|navigation projection only| Markdown Standards -->|public/private and launch boundary| Ceiling Result records -->|pass/fail evidence remains bounded by| Ceiling Ceiling -->|excludes| Forbidden Markdown -->|must not outrank| Bundle
The module is "actual" only because the reader can traverse these concrete surfaces:
Bundle/source row:paper_module.batch6_unsurfaced_primitives_capsule binds the accepted batch6_unsurfaced_primitives_capsule component, the mechanism.batch6_unsurfaced_primitives_capsule.validates_public_unsurfaced_primitives_capsule mechanism, concept.import_projection_and_drift_control_bundle, principles P-2, P-5, P-9, P-15, axioms AX-4, AX-8, AX-10, AX-11, and the dependency modules named in the structured lattice table below.
Generated instance:paper_modules/batch6_unsurfaced_primitives_capsule.json reports active status, public_paper_module_json_seeded_from_capsule_registry_not_legacy_markdown_authority, generated Mermaid available_from_capsule_edges, generated Atlas linked_from_capsule_edges, no unpopulated selective relations, and scope boundaries that the row is not runtime-correctness, launch-readiness, or whole-system authority.
Standards:standards/std_microcosm_batch6_unsurfaced_primitives_capsule.json is the specific public bundle standard, backed by std_microcosm for the wider Microcosm entry and public/private boundary. It allows public mechanism ids, source refs, digests, anchors, exact copied source modules, synthetic outcomes, scope limits, and scope boundaries; it forbids account secrets, account or browser state, model-output data bodies, browser UI live-access material, raw operator transcripts, prompt-shelf private logs, live market data responses, media assets, and public sharing operation state.
Runtime/source loci: the resolved locus is src/microcosm_core/organs/batch6_unsurfaced_primitives_capsule.py, with the runtime shell bundle-validation route and source-engines gallery route as readers over the same public component. The source bundle manifest records 26 copied source bodies with exact-copy source-to-target relations, SHA-256 matches, required anchors, and body_in_receipt: false.
Fixtures/examples/source bundle: fixture inputs live under fixtures/first_wave/batch6_unsurfaced_primitives_capsule/input; the exported bundle lives under examples/batch6_unsurfaced_primitives_capsule/exported_batch6_unsurfaced_primitives_capsule_bundle; source_module_manifest.json is the source-open body-floor manifest for copied modules and metadata-only result record handling.
Tests/result records:tests/test_batch6_unsurfaced_primitives_capsule.py covers the runtime component, copied subengine proofs, exact-copy imports, bundle shape, and private body omission. Result record authority is the fixture sign-off row plus receipts/first_wave/batch6_unsurfaced_primitives_capsule/batch6_unsurfaced_primitives_capsule_result.json, batch6_unsurfaced_primitives_capsule_board.json, batch6_unsurfaced_primitives_capsule_validation_receipt.json, and result records/sign-off/first_wave/batch6_unsurfaced_primitives_capsule_fixture_acceptance.json; the validation result record reports pass for source-module manifest status, exercise status, negative-case status, secret exclusion, and result record body scan.
Scope limit: this page can claim fixture-bound public source-body import, copied-module digest/anchor evidence, synthetic source-exercise evidence, negative-case coverage, and metadata-only result records only. It cannot claim live operator memory, prompt-shelf capture authority, live market data, provider/browser state, media launch, source-file changes, publishing-scope decision, launch-scope decision, private-system equivalence, or whole-system correctness.
Source Modules
The exported bundle copies the relevant source sources under examples/batch6_unsurfaced_primitives_capsule/exported_batch6_unsurfaced_primitives_capsule_bundle/source_modules/. Result records carry source refs, digests, anchors, counts, and exercise outcomes, not copied body text or private state.
Reader Evidence Routing
Read this module through the fixture command, exported-bundle validation, focused pytest, structured source record, and result record paths. The fixture proves a public source-open Set-6 exercise, while the bundle proves copied source digests, anchors, synthetic source exercises, negative cases, copied-subengine proofs, and metadata-only cards. The generated structured source record proves that Mermaid and Atlas availability come from bundle edges.
The validator's mechanism set remains evidence for the accepted Set-6 component result record. It does not turn this page into live operator memory, prompt-shelf capture authority, trading decisions, live provider recovery, browser state, demo media launch, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness.
The source module manifest requires 14 exact copied source source/support modules. The fixture requires 11 stable negative cases, one per mechanism row. The command card is the intended cold-reader first surface; the full result record is the drilldown.
How it works
For each mechanism the component loads the copied source body, runs it on a fixed public synthetic input, and checks the exact result. A few of the exercises make the idea concrete.
Demo-take temporal join.video_t_seconds converts a wall-clock offset into a position in a recorded video by subtracting elapsed paused time. The exercise feeds it a 120-second wall offset with one pause and resume fifteen seconds apart, and asserts the result is exactly 105.0. A second call with a pause that has not yet resumed checks the open-pause branch returns 15.0. The negative case confirms a still-open pause is handled rather than ignored.
Finance numeric assurance.build_finance_numeric_assurance recomputes declared numbers instead of trusting them. The exercise hands it a flow row tagged usd_millions whose flow and flow_usd fields disagree by orders of magnitude, plus a probability declared as 70.2. The check raises stockgrid_flow_unit_scale_mismatch and probability_bounds, and the result record's display_state becomes blocked rather than trusted. The point is that a mislabelled unit or an out-of-range probability fails closed.
Operator handoff linkage.score_pair compares an agent's suggestion (a Type B capture) against what the operator later typed (a Type A input) using containment, token overlap, and anchor matching. The exercise scores a related pair above the 0.8 floor with containment true, then scores an unrelated "summarize the weather" input and asserts it falls below 0.3. This is how the primitive tells a real handoff from a coincidence.
Market-clock scheduling.due_fire_points decides which scheduled market events are due at a given moment. The exercise sets the clock to 15:31 UTC with the open event already fired earlier that day, then asserts the hourly points fire while the already-fired open event is suppressed. The guard is idempotence: an event that already fired must not fire again in the same session.
The other mechanisms follow the same shape: keyphrase ranking returns ranked phrases for real text but an empty list for stopword-only input; the schema-loose distiller keeps assistant text and operator tail as separate roles without persisting either body; the fail-closed status judge blocks a transition when its policy is malformed; the concurrency guard reports that a parent directory and a child path overlap. Every exercise records only its pass or fail and a few summary numbers, never the copied body it ran against.
Copied-Subengine Proofs
The post-Set-12 proof surface exercises two copied dormant subengines directly from the exported source bundle:
operator_thread_memory is loaded from the copied manifest and checked with synthetic observed-window cases for observed_window_within_memory and preserved_existing_no_overlap.
market_situation_graph is loaded from the same copied bundle and checked with a public synthetic mart that covers fixture scoring, counterevidence, context rows, and source refs.
These are public test-level proofs in tests/test_batch6_unsurfaced_primitives_capsule.py. They do not add an accepted component, do not widen the fixture scope limit, and do not export private thread memory or live market data.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The fixture command writes the Set-6 public primitive-import result record and sign-off JSON. The bundle command validates copied source digests, anchor evidence, synthetic source exercises, negative cases, and metadata-only cards. The focused test covers the runtime component, copied subengine proofs, exported bundle shape, exact-copy imports, and private body omission. The corpus and projection checks prove only that the generated paper-module instance remains fresh for this bundle-backed Markdown state.
This result record path is public fixture evidence only. It does not establish live operator memory, capture authority, live market data, provider/browser state, media launch, source-file changes, publishing-scope decision, launch-scope decision, or whole-system correctness.
Scope boundary
Scope limit
This is not live operator memory, not capture authority, not trading decisions, not live provider recovery, not demo media launch, not publishing-scope decision, and not launch-scope decision. It is an exact-source public bundle with digest checks, source exercises, and negative-case coverage.
Engine Room Public Projection Leak GateThe Engine Room public projection leak gate validates rendered public projection roots for account secret-shaped strings, non-public paths shapes, symlink escapes, policy-exception handling, and optional gitleaks status while keeping findings hash-only.
Engine Room Public Projection Leak Gate is a DLP-style projection boundary. It scans rendered public projection files and paths, checks symlink escapes, records policy-exception hits as hash-only evidence, reports optional gitleaks status, and validates two positive plus three negative fixture cases without copying sensitive payloads, approving launch, or claiming general security, prompt-injection, sandbox, or information-flow authority.
Scope limit Public projection leak-gate fixture and rendered-root scan result records only; no general security scanner, prompt-injection defense, sandbox, information-flow proof, launch-scope decision, private-system equivalence, source-file changes, or whole-system correctness.
This staged Engine Room bundle imports the runnable core of the source projection leak scan into Microcosm as a refactor.
Purpose
Before a rendered set of files is exposed to a public reader, someone has to answer a narrow question: does this tree contain anything that should not leave the private workspace? account secret-shaped strings, a private home path, a provider-transport symbol, or a symlink that points outside the tree are all ways for private material to ride along with an otherwise public projection. This gate answers that one question over a directory of rendered files and returns a green or red verdict.
The interesting part is what the gate does with what it finds. A secret scanner that prints the secret it discovered into its own report has created a second copy of the leak. This gate never does that. Every match is recorded by category, path, line number, and a SHA-256 hash of the matched text, and the matched value itself is dropped (_hit builds the record without it). The verdict is auditable, the counts are honest, and the result record is itself safe to publish. A reviewer can confirm that a leak was found and where, without the report becoming the thing that leaks.
The gate is deliberately small and deterministic. It reads files and path names against a fixed set of regular expressions, treats a symlink that escapes the root as a hard blocker, and folds an optional gitleaks run into the same result record. It does not parse the files, follow data flow, or reason about intent. It is a data-loss-prevention boundary for one rendered tree, not a general security scanner, and the page is careful to keep it framed that way.
What It Demonstrates
Content scans for account secret-shaped strings and private host-bound path markers.
Path scans for private raw-voice, task-history, prompt-history, Obsidian, and provider-transport path shapes.
Policy-exception paths remain visible as hash-only hits while avoiding a blocking verdict.
Symlink escapes are hard blockers with target hashes only.
Optional gitleaks execution records pass, red, unavailable, or fail-closed status without copying findings into the result record.
Shape
Source refs
red result record
public_release_allowed_by_scan = false
Diagram source
flowchart TD Root["Rendered projection root walk files and path names"] Root --> Content["Content scan account secret and private-path regexes"] Root --> PathScan["Path-name scan source note, ledger, Obsidian, transport"] Root --> Symlink{"Symlink escapes root?"} Root --> Gitleaks["Optional gitleaks run pass / red / unavailable / fail-closed"] Content --> Hash["Findings as hash-only records category, path, line, match_sha256 matched value dropped"] PathScan --> Hash Hash --> Split{"Path in policy exception list?"} Split -- "yes" --> Allowed["Policy exception retained, non-blocking"] Split -- "no" --> Blocking["Blocking hit"] Verdict{"Any blocking hit, symlink escape, or gitleaks red / fail-closed?"} Blocking --> Verdict Symlink -- "yes" --> Verdict Gitleaks --> Verdict Allowed --> Green Verdict -- "yes" --> Red["red result record public_release_allowed_by_scan = false"] Verdict -- "no" --> Green["green result record no blocker found in this scan"]
The shape is intentionally narrow. The gate reads rendered public files, file paths, policy-exception paths, symlinks, and optional gitleaks output; it emits counts and hashed evidence. It does not ingest source notes, private source bodies, model-output data, browser state, account state, browser UI/operator UI state, or recipient-send state.
Technical Mechanism
The proof consumer is the runtime function scan_projection(root, policy_exception_paths, run_gitleaks_check, require_gitleaks, gitleaks_binary) in src/microcosm_core/engine_room/public_projection_leak_gate.py. It first resolves the supplied projection root and rejects a missing or non-directory root. It then normalizes policy-exception paths against DEFAULT_POLICY_EXCEPTION_PATHS, walks the rendered tree in stable path order, skips declared cache/build directories and bytecode suffixes, and treats symlink escapes as hard blockers while recording only the relative path and a hash of the escaped target.
For each non-skipped path, _scan_path checks private-history, raw-voice, Obsidian, and browser/provider transport path shapes. For each readable file, _scan_file applies CONTENT_PATTERNS for account secret-like strings, private home paths, private Chrome profile paths, private Obsidian markers, provider transport symbols, and browser debug ports. _hit stores the category, pattern, relative path, optional line number, source kind, policy-exception status, and match_sha256; it intentionally omits the matched value. The result record then splits hits into blocking and policy-exception sets, summarizes category counts, attaches optional run_gitleaks status, and derives green or red through _overall_status.
The focused fixture consumer is evaluate_fixture_dir, which materializes each JSON fixture into a temporary projection root and checks its expected status. tests/test_engine_room_public_projection_leak_gate.py exercises the same mechanism through unit cases and CLI result record output: clean projections stay green, private home paths and key-shaped strings go red without raw value leakage, policy-exception examples remain hash-only and non-blocking, required missing gitleaks fails closed, and the five-case fixture matrix returns status: pass.
Prior Art Grounding
The component is grounded in data-loss-prevention and secret-scanning practice: scan artifacts before public sharing, detect account secret-shaped strings and non-public paths markers, preserve enough evidence for triage, and avoid copying the sensitive payload into the report. Relevant anchors include:
NIST's Data Loss Prevention public sharing, which frames leakage prevention around sensitive data leaving an enterprise boundary.
GitHub secret scanning, which raises alerts when account secret-like material appears in repositories.
Gitleaks, a public scanner for hardcoded secrets in Git repositories and files.
Microcosm borrows the pre-public sharing leak-gate and secret-scanner shape, then narrows it to public projection artifacts: account secret-shaped content, non-public paths signatures, symlink escapes, policy exceptions, and optional gitleaks status. It is not a full security scanner or information-flow proof.
Reader Evidence Routing
Read status: green as "this scan did not find a blocking leak in the supplied projection root." Do not read it as publishing-scope decision. The runtime's public_release_allowed_by_scan field means the projection passed this one DLP gate only; launch-scope decision, source-open authority, hosted-product authority, formal-result correctness authority, and private-system equivalence all remain false.
Read policy_exception_count as "the gate saw known boundary-document examples and retained hash-only evidence without blocking." Do not read it as permission to place non-public paths or account secret-shaped payloads in arbitrary public files.
Read gitleaks_status: unavailable as an explicit optional-tool result record, not a green external scanner result. When require_gitleaks is true, missing gitleaks fails closed.
The fixture manifest names two positive cases (clean_projection, policy_exception_hash_only) and three negative cases (planted_private_path, planted_key_shape, path_pattern_blocked). The expected result record is status: pass, case_count: 5, and passed_case_count: 5.
Validation Result record Path
The reader-verifiable result record is the focused pytest plus the paper-module corpus parity check:
Passing these commands proves only that the public fixture behavior and governed paper-module projection remain reproducible; it does not create an accepted component, approve launch, or prove whole-system public-safety.
Scope boundary
Scope limit
This is a DLP-style public projection gate. It is not a general security scanner, not prompt-injection defense, not sandboxing, not an information-flow proof, and not launch-scope decision.
Limitations
This module is a deterministic projection scanner over the files it is given. It does not establish that the supplied root is the complete public site, that a builder selected the correct export set, or that a later public sharing step will reuse the same artifacts. Regex- and path-pattern detection can miss encoded, split, transformed, novel, or tool-specific secrets, and it can also flag benign boundary examples when they are not routed through the explicit policy-exception path list.
Optional gitleaks integration is result record evidence only when the tool is available or required by the caller. gitleaks_status: unavailable is not an external scanner pass, and public_release_allowed_by_scan: true means only that this gate found no blockers in this scan. launch-scope decision, source-open authority, accepted-component status, aggregate doctrine-lattice coherence, private-system equivalence, and whole-system security remain outside this proof consumer.
Source and projection details
Governing Lattice Relation
The source authority for this paper-module projection is the JSON source record core/paper_module_capsules.json::paper_modules[80:paper_module.engine_room_public_projection_leak_gate], projected into paper_modules/engine_room_public_projection_leak_gate.json. That row names the mechanism subject mechanism.engine_room_public_projection_leak_gate.validates_public_projection_leak_gate, the code locus src/microcosm_core/engine_room/public_projection_leak_gate.py, the governing concept concept.import_projection_and_drift_control_bundle, six principle refs (P-1, P-2, P-6, P-8, P-9, P-15), five axiom refs (AX-1, AX-5, AX-7, AX-8, AX-11), and the dependency on paper_module.engine_room_demo.
The standard standards/std_microcosm_engine_room_public_projection_leak_gate.json narrows that lattice edge by declaring the staged-bundle authority boundary, the two positive and three negative fixture classes, the public target refs, the validator command, and the scope boundary that a green leak-gate result record is not a general security scanner, prompt-injection defense, sandbox, information-flow proof, launch-scope decision, or private-system equivalence claim.
Set 5 Authority and Systems BundleSet 5 Authority and Systems Bundle imports post-execution authority, replay, proof-repair, process, generated-state, trace, blast-radius, and doctrine-graph source bodies as public source-open evidence without claiming live authority or launch-scope decision.
Set 5 Authority and Systems Bundle binds the legacy Markdown projection, Set 5 runnable source locus, exported copied-source bundle, source-module digests, synthetic negative exercises, metadata-only result records, and scope limits to a mechanism-backed JSON bundle. It covers post-execution result record validation, reasoning replay scope and lineage, verifier-gated Lean repair harnessing, process orphan classification, generated-state fixpoint settlement, trace-tape compaction, code blast radius, and doctrine graph compilation while preserving fixture boundaries.
Scope limit Fixture-bound public source-body import, copied-module digest and anchor evidence, synthetic source-exercise evidence, and metadata-only result records only; no live external model access, proof success, process signal authority, generated-state mutation authority, source-file changes, publishing-scope decision, launch-scope decision, private-system equivalence, or whole-system correctness.
Set 5 imports the next authority/systems contour as a bundle: post-execution result record validation, reasoning replay scope and lineage, verifier-gated Lean repair harnessing, process orphan classification, generated state fixpoint settlement, trace-tape compaction, code blast radius, and doctrine graph compilation.
The bundle carries exact copied source bodies in examples/batch5_authority_systems_capsule/exported_batch5_authority_systems_capsule_bundle/source_modules/ and tests those copies against source-root digests and anchors. The runnable Microcosm exercise is deliberately bounded: it uses synthetic public inputs to prove the negative claim fences while preserving the real source source as the source-open system.
Purpose
This page answers one question: can a cold reader inspect eight separate authority and systems mechanisms, and confirm each one refuses the wrong thing, without the reader having to run any of the real machinery?
The eight mechanisms are unrelated in subject. One validates post-execution result records; another decides when a reasoning step needs re-running; another gates a Lean proof attempt; another classifies a stray process; another settles generated-state residuals; another compacts a trace tape; another computes a code blast radius; another compiles a doctrine graph. What they share is a single discipline: each must decline to claim more than it has earned. The result record validator must not accept a drifted result record; the proof gate must not hand a placeholder proof to Lean; the orphan reaper must not signal a live process; the blast-radius pass must not invent coverage for a leaf with no dependents.
The unusual choice is that the bundle does not replay the real tools. It carries an exact copy of each source source body, checks those copies against the source-root digests and required anchors, and then runs a small synthetic re-derivation for each mechanism. Each re-derivation recomputes its own verdict from the fixture input rather than echoing a stored answer, so a negative case passes only when the exercise itself reaches the refusal, not when a fixture asserts it. The page is therefore a way to read eight refusal behaviours at once, with the genuine source bodies kept verifiable alongside.
flowchart TD Manifest["Copied source bundle + exercise manifest digests and required anchors checked first"] --> Component["Runtime component batch5_authority_systems_capsule.py"] Component --> E1["Result record validator flag provider/context/artifact drift"] Component --> E2["Replay scope no_replay when changed context is disjoint"] Component --> E3["Proof gate reject sorry/plan-only before Lean"] Component --> E4["Orphan reaper live descendant -> requires_owner_check"] Component --> E5["Fixpoint drainer residual source moved -> non-converging"] Component --> E6["Trace tape over-budget -> pointer + omission result record"] Component --> E7["Blast radius reverse closure; empty leaf stays empty"] Component --> E8["Doctrine graph report deleted paths and tombstones"] E1 --> Refusal["Shared refusal check each exercise recomputes its own verdict"] E2 --> Refusal E3 --> Refusal E4 --> Refusal E5 --> Refusal E6 --> Refusal E7 --> Refusal E8 --> Refusal Refusal --> Result records["metadata-only result records result records/first_wave/batch5_authority_systems_capsule"] Result records --> Ceiling["Scope limit: no external model access, mutation, proof success, launch, or private equivalence"]
The diagram starts where the runtime starts: the copied source bundle and the exercise manifest, checked against source-root digests and anchors. The component then fans out to the eight mechanism exercises, each recomputing its own pass or refusal verdict, and folds the results into metadata-only result records under a single scope limit. Generated-state mutation, external model access, proof-success claims, and launch-scope decision all stay outside that ceiling.
What the eight exercises check
Each exercise reads a small synthetic block from the fixture manifest and recomputes a verdict. None of them call a provider, run Lean, signal a process, or mutate generated state. What follows is the specific question each one answers.
Result record validator. Given a runtime grant and two post-execution result records, it recomputes the drift codes for the second result record: a substituted provider, a context class outside the grant's allowed set, an output artifact hash that diverges from the grant, or runtime_execution claimed when no runtime grant was issued. The valid result record must pass and the drifted one must be flagged; the exercise will not call drift "absent".
Replay scope. It compares the context classes a step consumed against the classes that changed. When the two sets are disjoint, the classification is no_replay. In the fixture, a step consumed a task spec and a public fixture while only ambient browser state changed, so re-running the step is not demanded.
Proof gate. It scans a candidate proof string before any Lean call. A sorry token, a plan-only phrasing such as "plan:" or "I will", or a proof that merely restates the declared theorem without an exact are each treated as failure classes, and the gate verdict becomes rejected_before_lean. The exercise records 0/8 historical banked attempts; no proof-success claim.
Orphan reaper. A process marked as a live-session descendant is classified requires_owner_check, not safe_close_candidate, and no signal is sent even when the fixture requests SIGKILL. The refusal is the point: a stray-looking process that belongs to a live session must not be killed on inventory alone.
Fixpoint drainer. It walks residual signatures. If the same residual id reappears under a moved source signature, the settlement is classified settlement_residual_source_moved, which marks a non-converging residual rather than a settled one. No generated-state mutation is authorised either way.
Trace tape. When the joined trace text exceeds the byte budget, the exercise truncates to a head budget and appends a pointer row plus an omission result record that records the omitted byte count. A budget breach with no omission result record is treated as a failure, so compaction can never silently drop trace bytes.
Blast radius. It builds the reverse-dependency graph and takes the transitive closure of dependents for a target. A target with real dependents reports them; a leaf with no dependents reports an honestly empty bucket rather than inventing coverage.
Doctrine graph. It scans doctrine nodes for two conditions: a node whose code path no longer exists, reported as an authority gap, and a node marked tombstone, reported with its replacement id. The exercise passes only when both a drift finding and a tombstone candidate are present, so a deleted code path behind a doctrine claim cannot pass unnoticed.
Reader Evidence Routing
A source-authenticity reader starts with the exported bundle source_module_manifest.json, then checks the copied files under examples/batch5_authority_systems_capsule/exported_batch5_authority_systems_capsule_bundle/source_modules/ against the source source refs and anchor rows. The useful question is whether the public bundle is source-faithful, not whether it grants live generated-state authority.
A runtime reader runs the fixture command and the run-batch5-bundle command in the Validation Result record Path. The useful question is whether the synthetic exercise and exported bundle return bounded pass evidence while keeping body material out of result records.
A launch-boundary reader opens tests/test_batch5_authority_systems_capsule.py and the Scope limit before trusting any card copy. The useful question is whether negative fences block external model access, generated-state mutation, Lean proof-success claims, and launch-scope decision.
If any digest or exact-copy test is red, treat that as source-body import drift for the body-import owner. It does not make this Markdown a bundle source row, and it must not be patched here by hand.
Prior Art Grounding
This bundle borrows from provenance interchange, trace instrumentation, and software supply-chain attestation practice. Useful anchors include:
W3C PROV, which models the entities, activities, and agents involved in producing data so readers can assess reliability and trustworthiness.
OpenTelemetry, as a vendor-neutral pattern for traces, metrics, and logs across composed systems.
SLSA provenance, which treats artifact origin, builder identity, and build parameters as explicit attestable metadata.
Microcosm borrows the lineage, trace, and attestation shape, but keeps the exercise bounded to copied public source bodies, synthetic inputs, and negative claim fences. It excludes generated-state mutation, external model access, proof success, or launch.
First Command
PYTHONPATH=src python3 -m microcosm_core.organs.batch5_authority_systems_capsule run --input fixtures/first_wave/batch5_authority_systems_capsule/input --out /tmp/batch5_authority_systems_capsule --card
Source Bodies
The bundle imports these source bodies as exact public snapshots:
The fixture command writes the bounded synthetic exercise result record. The exported-bundle command validates the copied authority-system source modules, manifest digests, anchor rows, and secret-exclusion posture while keeping source bodies out of the result record. The focused test file checks the runtime exercise, exported bundle, omission result records, body-scan boundary, and negative claim fences.
This result record path is reader-verifiable evidence only. It does not flip Mermaid/Atlas status, create bundle authority, authorize generated-state mutation, dispatch providers, certify Lean proof success, claim launch-scope decision, or aggregate doctrine-lattice coverage.
Scope boundary
Scope limit
No live model/external model access.
No Lean proof-success or benchmark claim.
No process signals are sent.
No generated-state mutation is authorized.
No private-system equivalence, public sharing, or launch-scope decision.
Scope limit
Legacy Markdown path inventory only; no JSON bundle authority, typed subject coverage, runtime correctness, or launch proof.
This ceiling is deliberately lower than the runnable component evidence. The code and tests can show that the Batch5 exercise is inspectable and that its negative claim fences hold, but this page cannot promote itself into bundle authority, typed doctrine coverage, generated-state mutation permission, Lean proof success, provider correctness, publishing-scope decision, or aggregate doctrine-lattice health.
Set 7 Oracle Sibling BundleSet 7 Oracle Sibling Bundle imports Oracle sibling source bodies and exercises deterministic subject-index, snapshot, truth-diff, quartet-plan, and original pytest witness boundaries.
Set 7 Oracle Sibling Bundle binds the runnable Oracle sibling source locus, exported copied-source bundle, source-module digests, original pytest witnesses, deterministic subject-index/snapshot/truth-diff/quartet-plan exercises, negative cases, metadata-only result records, and scope limits to a mechanism-backed JSON bundle without claiming accepted-component authority or semantic truth authority.
Scope limit Fixture-bound public Oracle source-body import, copied-module digest and anchor evidence, deterministic local exercise evidence, original pytest witness evidence, and metadata-only result records only; no Oracle reasoning authority, semantic truth authority, external model access, bridge-backed reasoning, private orchestration engine invocation, source-file changes, publishing-scope decision, launch-scope decision, private-system equivalence, complete Oracle coverage, or whole-system correctness.
The Oracle is a sibling system that reasons about market evidence. Most of it depends on live data feeds and on a bridge-backed reasoning engine, so it cannot be shown to a public reader directly. This bundle answers a narrower question: which parts of the Oracle are pure, deterministic, and inspectable, and can those parts be run and checked without touching any feed, provider, or reasoning call?
The answer turns out to be the Oracle's grounding and bookkeeping layer. Before the Oracle reasons, it builds a map of what evidence supports which prediction target, hydrates artifacts from a recorded run, diffs two timed snapshots of the same feed, and plans how to repair a missing artifact chain. None of that needs the network or the reasoning engine. This bundle imports the exact source for those four tools, exercises each against synthetic in-memory runs, and re-runs the Oracle's own original tests as an independent witness.
What is unusual is the discipline of the boundary rather than the cleverness of the code. The Oracle's repair planner can, if asked, call the bridge-backed GodModeEngine to fill a missing node by reasoning. The bundle exercises the planner up to the point where it would do so and stops: it builds the repair plan, materialises an alias for an artifact that can be copied, and records run_missing_quartet and GodModeEngine as explicitly excluded. The exclusion is part of the evidence, not a gap in it.
flowchart TD bundle["JSON bundle source row core/paper_module_capsules.json::paper_modules[82:paper_module.batch7_oracle_sibling_capsule]"] instance["Generated JSON instance paper_modules/batch7_oracle_sibling_capsule.json"] md["Reader projection paper_modules/batch7_oracle_sibling_capsule.md"] standard["Standards std_microcosm_paper_module std_microcosm_batch7_oracle_sibling_capsule"] mechanism["Mechanism subject mechanism.batch7_oracle_sibling_capsule.validates_public_oracle_sibling_capsule"] runtime["Runtime/source locus src/microcosm_core/components/batch7_oracle_sibling_capsule.py"] copied["Copied Oracle sibling source bundle examples/batch7_oracle_sibling_capsule/exported_batch7_oracle_sibling_capsule_bundle"] fixture["Fixture input fixtures/first_wave/batch7_oracle_sibling_capsule/input"] subgraph Exercise["Deterministic exercises (in-memory temp runs)"] subjectIndex["subject_index admissible vs contextual evidence missing-support targets preserved"] snapshot["subject_snapshot hydrate artifact, keep provenance"] truthDiff["truth_diff_macro changed / new / dropped series"] quartet["run_quartet plan + alias readiness BLOCKED, alias materialised"] stop(["STOP: run_missing_quartet / private orchestration engine excluded, not invoked"]) pytest["original pytest witness focused Oracle v1 + quartet tests"] quartet -.excluded.-> stop end tests["Focused tests tests/test_batch7_oracle_sibling_capsule.py"] result records["metadata-only result records summaries, counts, digests, booleans; no source/stdout bodies"] projections["Generated projection status Mermaid: available_from_capsule_edges Atlas: blocked_until_organ_atlas_owner_lane_binds_edges"] ceiling["Scope limit fixture-bound local replay only; no Oracle reasoning, external model access, source-file changes, launch, or semantic truth authority"] bundle --> instance bundle --> mechanism bundle --> runtime bundle --> projections instance --> md standard --> bundle standard --> tests runtime --> copied runtime --> fixture copied --> Exercise fixture --> Exercise Exercise --> tests tests --> result records result records --> md projections --> md ceiling --> md
The source of record is the JSON source record core/paper_module_capsules.json::paper_modules[82:paper_module.batch7_oracle_sibling_capsule]. The generated JSON instance at paper_modules/batch7_oracle_sibling_capsule.json carries paper_module_payload.source_authority: json_capsule and derives its relationships.edges from that bundle, including the mechanism subject, concept, principles, axioms, and resolved code locus.
The governing standard stack is two-layered. The local bundle standard is standards/std_microcosm_batch7_oracle_sibling_capsule.json, which requires exact Oracle source copies, direct execution of subject-index, subject-snapshot, truth-diff, and quartet-plan paths, an original pytest witness, explicit exclusion of run_missing_quartet and GodModeEngine, negative cases, and the local scope limit.
The runtime locus is src/microcosm_core/organs/batch7_oracle_sibling_capsule.py, especially the source record's resolved symbols _subject_index_engine, _subject_snapshot_engine, _truth_diff_macro_engine, _quartet_repair_engine, _run_original_pytest_witness, _evaluate, run, run_batch7_oracle_sibling_bundle, result_card, EXPECTED_ENGINES, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, and main. The reader-facing source bundle lives at examples/batch7_oracle_sibling_capsule/exported_batch7_oracle_sibling_capsule_bundle/ with source_module_manifest.json; the fixture entrypoint is fixtures/first_wave/batch7_oracle_sibling_capsule/input/batch7_oracle_sibling_exercise_manifest.json.
Validation is grounded in tests/test_batch7_oracle_sibling_capsule.py, the fixture result records under receipts/first_wave/batch7_oracle_sibling_capsule/, the bundle-validation result records under receipts/first_wave/batch7_oracle_sibling_capsule/bundle_validation/, and the sign-off result record result records/sign-off/first_wave/batch7_oracle_sibling_capsule_fixture_acceptance.json. That split is part of the truthful scope limit: this module proves a fixture-bound, public-safe, local Oracle sibling source-import replay and metadata-only result record path only. It does not establish Oracle reasoning authority, semantic truth authority, external model access, bridge-backed reasoning, GodModeEngine invocation, source-file changes, publishing-scope decision, launch-scope decision, private-system equivalence, complete Oracle coverage, accepted-component authority, or whole-system correctness.
Imported System
oracle_subject_index_grounding_map executes tools.oracle.subject_index.run against a temporary subject run and verifies admissible versus contextual grounding.
oracle_subject_snapshot_hydration executes tools.oracle.subject_snapshot.run and verifies subject artifact provenance hydration.
oracle_truth_diff_macro_series_delta executes tools.oracle.truth_diff_macro.run and verifies changed, new, and dropped source series.
oracle_quartet_repair_alias_plan executes run_quartet.build_quartet_repair_plan and materialize_missing_aliases on a temporary truth run.
oracle_original_pytest_witness runs the focused original pytest witness for the Oracle v1 tools and quartet planner tests.
What each engine checks
Each engine seeds a temporary run directory, calls the imported Oracle tool against it, and asserts the exact shape of the result. The fixtures are small but chosen to make the domain logic visible.
The subject-index engine seeds three pieces of evidence: a stock (XOM), an ETF (XLE), and a source instrument (TLT). The Oracle's rule is that evidence can ground a prediction only when its subject is a valid prediction target and its ledger id marks it as stock or ETF support (S_ or E_ prefix). So XOM and XLE land in the admissible bucket, while TLT stays contextual even though it is a valid target, because its support is source context that cannot anchor a price prediction. The engine checks that TLT is recorded in missing_admissible_support_targets: the Oracle does not silently drop a target it cannot ground, it carries the gap forward.
The subject-snapshot engine checks that hydrating a single named artifact preserves its provenance. The result must carry the source artifact id and the originating run id, so a downstream caller knows where a prediction payload came from. No artifact body is copied into the result record.
The truth-diff engine compares two timed snapshots of the same source feed. The subject snapshot is taken at one time and the truth snapshot later, and the engine confirms the diff identifies a changed series ranked by the strongest absolute delta (here US10Y moving from 4.10 to 4.35), a newly appeared series (CPI), and a dropped series (OIL). This is the difference between a number changing and the system knowing which number changed, by how much, and whether the series set itself shifted.
The quartet engine exercises the repair planner against a truth run that is missing most of its node chain. The plan must report readiness BLOCKED, name the deepest missing target (oracle_cp2_emitter), and list which artifacts can be repaired by aliasing rather than re-running. The engine then materialises one alias and confirms the written file records what it is an alias of. Crucially, it asserts that the bridge-backed run_missing_quartet path was not taken and GodModeEngine was not constructed.
Exact-copy hashes and required anchors for Oracle sibling source modules.
copied bodies remain source evidence; result records keep bodies out.
Sign-off result records
result records/sign-off/first_wave/batch7_oracle_sibling_capsule_fixture_acceptance.json and receipts/first_wave/batch7_oracle_sibling_capsule/
Prior fixture sign-off, board, validation result record, and bundle validation outputs.
Result record presence does not flip Mermaid/Atlas status or aggregate coverage.
The selective relation boundary is intentionally narrow: this Markdown names walkable source routes for readers, but it does not infer governed concepts, principles, axioms, dependencies, or code-locus relations into the generated JSON row. Those edges must be populated through core/paper_module_capsules.json and the doctrine projection builder after an admitted source row exists.
Prior Art Grounding
The component is grounded in software-test-oracle and data-provenance practice: automated checks compare observed outputs against admissible references, while provenance records make artifact origin and transformation visible. Useful anchors include:
Survey work on the test oracle problem, where an oracle determines whether a system's output is acceptable for a given test.
The W3C PROV family, which defines a provenance model for describing entities, activities, agents, and derivation.
Microcosm borrows those ideas for deterministic subject indexing, snapshot hydration, truth-diff series deltas, and quartet alias planning. The bundle does not promote local oracle checks into semantic truth authority, external model access, source-file changes, or launch-scope decision.
Validation Result record Path
Reader-verifiable fixture command, run from microcosm-substrate/:
The fixture run writes receipts/first_wave/batch7_oracle_sibling_capsule/batch7_oracle_sibling_capsule_result.json, receipts/first_wave/batch7_oracle_sibling_capsule/batch7_oracle_sibling_capsule_validation_receipt.json, and receipts/first_wave/batch7_oracle_sibling_capsule/batch7_oracle_sibling_capsule_board.json; the sign-off file records fixture sign-off. The exported-bundle re-run uses the run-batch7-oracle-sibling-bundle action over exported_batch7_oracle_sibling_capsule_bundle, and any bundle-validation result records stay under receipts/first_wave/batch7_oracle_sibling_capsule/bundle_validation/.
This result record path is reader-verifiable evidence only. It does not create accepted-component authority, link the Atlas card, invoke bridge-backed reasoning, dispatch providers, change source files, promote semantic truth authority, or prove aggregate doctrine-lattice coverage.
Scope boundary
Scope limit
This is a deterministic local system bundle. It does not invoke run_missing_quartet, GodModeEngine, bridge-backed reasoning, browser access, external model access, launch-scope decision, source-file changes, or semantic truth authority.
Scope limit
This paper module can claim mechanism-backed JSON bundle authority for the Oracle sibling source-import slice and a walkable reader route to deterministic subject-index, subject-snapshot, truth-diff, quartet-plan, original-pytest witness, standard, fixture, source manifest, and result record evidence. It cannot claim accepted-component authority, linked Atlas-card authority, semantic truth authority, bridge-backed reasoning, external model access, source-file changes, launch-scope decision, or whole Oracle coverage.
A green fixture run or focused pytest result record proves only bounded local replay, source-copy provenance, body hygiene, negative-case behavior, and metadata-only result records for the Oracle sibling slice.
Set 7 Demo Take Console BundleSet 7 Demo Take Console Bundle imports Swift capture-console source bodies and exercises SwiftPM build, recording-state, helper-bridge, recorder-store, hotkey/audio-meter, and transcribe-payload boundaries.
Set 7 Demo Take Console Bundle binds the runnable Demo Take Console source locus, exported copied Swift source bundle, source-module digests, SwiftPM build-witness posture, deterministic recording-state/helper-bridge/recorder-store/hotkey-audio-meter/transcribe-payload exercises, negative cases, metadata-only result records, and scope limits to a mechanism-backed JSON bundle without claiming accepted-component authority, app launch authority, or recording authority.
Scope limit Fixture-bound public Swift source-body import, copied-module digest and anchor evidence, deterministic local exercise evidence, SwiftPM build-witness evidence, and metadata-only result records only; no app launch authority, screen capture authority, microphone capture authority, recording-session export, FFmpeg execution, WhisperKit/model dispatch, source-file changes, publishing-scope decision, launch-scope decision, private-system equivalence, complete UI coverage, or whole-system correctness.
This page documents the Swift source-body import for the Demo Take Console capture app as a mechanism-backed Microcosm paper-module bundle.
The component bundle verifies the SwiftPM app target, exact copied source-body digests, helper-bridge command contracts, recording state gates, hotkey and audio-meter guards, and the transcription payload builder. It does not launch the app, start FFmpeg, access screen or microphone devices, export recording sessions, or dispatch WhisperKit.
Scope limit: source-open capture-console mechanics over public fixtures only; not recording authority, launch-scope decision, or proof of complete UI coverage.
Purpose
The Demo Take Console is a real macOS app that records screen and microphone takes, drives FFmpeg through a Python helper, and hands audio to an on-device WhisperKit transcriber. This bundle exists so a reader can inspect how that app is wired without the app ever running. It answers one question: do the safety gates and contracts in the Swift source actually hold, as source, before anyone trusts the app to capture anything?
The interesting choice is that the bundle checks behaviour, not just shape. It copies the relevant Swift files into a public bundle, then runs six small exercises that read those copies and assert specific invariants: that recording cannot start without a chosen display and free disk, that the global hotkey requires Control-Option-Command, that the audio meter clamps its level to the 0 to 1 range, that the transcriber refuses to run when its audio file is missing. None of these read the app's live state. They read the source that decides the app's behaviour.
To prove those checks are not vacuous, the bundle pairs them with six negative cases. Five of them copy the bundle into a scratch directory, delete one guard from the copy (the display blocker, the hotkey modifier, the audio clamp, the helper-script path, the missing-audio guard), and re-run the matching exercise. If the check still passes after the guard is gone, it was never testing the guard. This is the unusual part: the negative cases mutate the source and demand that the validator notices, so a green run means the gate is present rather than merely that the file parses. The sixth negative case runs swift build against an empty directory and expects it to fail, confirming the build witness is real rather than asserted.
One of the six engines is a real swift build of the app target. It records the exit code and a build-complete marker, never the build log body, and claims only that the copied sources compile as a SwiftPM target. It does not launch the app or grant any capture permission.
Shape
The source row is core/paper_module_capsules.json::paper_modules[84:paper_module.batch7_demo_take_console_capsule]; the resolving source mechanism is core/mechanism_sources.json::mechanisms[87:mechanism.batch7_demo_take_console_capsule.validates_public_demo_take_console_capsule]. The generated instance is paper_modules/batch7_demo_take_console_capsule.json, which carries paper_module_payload.source_authority: json_capsule.
Diagram source
flowchart TD bundle["Copied public Swift bundle eight exact-copy source files body_in_receipt = false"] subgraph Engines["Six source-contract engines"] build["swift build witness app target compiles exit code + build marker only"] state["Recording state model eleven typed states present marker uses wall + video time"] bridge["Helper-bridge contract eleven helper commands script bound to repo"] fsm["RecorderStore state machine start needs display + disk pause / resume / stop-to-review"] meter["Hotkey + audio meter Control-Option-Command-M level clamped 0 to 1"] transcribe["Transcribe payload builder WhisperKit decode config guards missing audio"] end subgraph Negatives["Paired negative cases"] mutate["Copy bundle to scratch delete one guard token"] rerun["Re-run that engine expect it to flip to blocked"] end result records["metadata-only result records refs, digests, anchors, booleans no source bodies, no logs"] ceiling["Scope limit source-copy + fixture evidence only no app launch, capture, FFmpeg, WhisperKit dispatch, or launch"] bundle --> Engines bundle --> mutate mutate --> rerun Engines --> result records rerun --> result records result records --> ceiling
The runtime component is the executable validation locus: it names six expected engines, six negative cases, source-copy anchor checks, metadata-only result record rules, and the local command/card surfaces. runtime_shell.py exposes the bundle validation step, while cli.py routes the bundle command. The fixture manifest binds fixture input to the exported source bundle manifest; the source bundle manifest records eight exact-copy public Swift modules with body_in_receipt: false and digest matches.
Validation result records keep the proof narrow. The focused verifier result record reports fixture and bundle passes, tampered copied-source digest/anchor swaps blocked, focused pytest passing, and private-token scan matches at zero for the scratch cycle. Those result records support reader walkability and source-copy evidence only. They do not admit an accepted component/card edge, clear the Atlas block, authorize recording or device access, dispatch FFmpeg or WhisperKit, or make a launch/public-sharing claims.
Reader Evidence Routing
Readers can walk the local evidence without private payloads:
src/microcosm_core/organs/batch7_demo_take_console_capsule.py defines the component id, fixture id, validator id, scope limit, scope boundary, expected engines, negative cases, and required anchors for the eight imported Swift source files.
standards/std_microcosm_batch7_demo_take_console_capsule.json records the same scope limit, body_in_receipt: false, the required source refs, the SwiftPM witness command, negative-case count, and copied_non_secret_macro_body import class.
core/fixture_manifests/batch7_demo_take_console_capsule.fixture_manifest.json binds the fixture input to the exported source bundle manifest.
tests/test_batch7_demo_take_console_capsule.py validates the engine set, exact source-body copy digests, required anchors, negative cases, no-private-body card shape, and absence of local absolute paths in result records.
receipts/import_binding/partial_import_binding_report.json records the sign-off result record, fixture manifest, source bundle manifest, test file, and component source refs for this bundle.
This routing is because result records and cards keep source bodies out of result record payloads. It is not evidence for app launch, recording permission, provider/model dispatch, launch-scope decision, or whole-UI coverage.
Prior Art Grounding
The component borrows from desktop media-capture and local transcription tooling: capture apps commonly combine OS capture APIs, command-line media encoders, recording state gates, hotkeys, level meters, and transcription handoff contracts. Useful anchors include:
Apple's ScreenCaptureKit framework for selecting and streaming screen/audio content in macOS apps.
FFmpeg, the established command-line media recording, conversion, and streaming toolchain.
WhisperKit, an on-device speech recognition toolkit for Apple Silicon.
Microcosm borrows the capture-console contract shape but keeps the exercise at source-body and fixture validation. The bundle does not start capture devices, launch FFmpeg, invoke WhisperKit, or claim recording/launch-scope decision.
Validation Result record Path
Reader-verifiable fixture command, run from microcosm-substrate/:
The fixture run writes receipts/first_wave/batch7_demo_take_console_capsule/batch7_demo_take_console_capsule_result.json, receipts/first_wave/batch7_demo_take_console_capsule/batch7_demo_take_console_capsule_validation_receipt.json, and receipts/first_wave/batch7_demo_take_console_capsule/batch7_demo_take_console_capsule_board.json; the sign-off file records fixture sign-off. The exported-bundle re-run uses the validate-bundle action over exported_batch7_demo_take_console_capsule_bundle, and any bundle-validation result records stay under receipts/first_wave/batch7_demo_take_console_capsule/bundle_validation/.
This result record path is reader-verifiable evidence only. It does not flip component-atlas status, launch the app, start capture devices, run FFmpeg, dispatch WhisperKit, or aggregate doctrine-lattice coverage.
Scope boundary
Scope limit
This paper module can claim that the Demo Take Console bundle has a walkable reader route to its component file, standard, fixture manifest, source manifest, focused tests, result record path, Swift source-copy evidence, resolving mechanism row, and generated Mermaid availability. It cannot claim accepted-component authority, Atlas-card linkage, app launch, recording permission, device access, FFmpeg execution, WhisperKit/model dispatch, launch-scope decision, or complete UI coverage.
The generated structured source record is sourced from the JSON source record. A green fixture run or focused pytest result record proves only bounded replay, source-copy provenance, body hygiene, and negative-case behavior for the fixture. The remaining scope limit can only rise through a separate component-atlas owner lane that binds accepted component/card edges; this paper module does not create those edges by prose.
Engine Room Generated Projection Drift GatePublic generated-projection drift fixture: owner-routed checks fingerprint declared sources and artifacts, reuse clean result records only under matching hashes, and fail planted-byte or missing-artifact cases.
Engine Room Generated Projection Drift Gate is a generated-artifact freshness bundle. It validates projection owner selection from changed paths, declared source and artifact fingerprints, no-write check return codes, source-hash cache reuse, planted-byte detection, and missing-artifact failure over four public fixtures while keeping semantic drift proof, repair authority, full source registry validation, launch, and private-system claims out of scope.
Scope limit Public owner-routed generated projection drift fixture and focused regression result records only; no semantic drift proof, full source registry validation, repair authority, launch-scope decision, private-system equivalence, source-file changes, external model access, or whole-system correctness.
This staged Engine Room bundle imports the generated-projection drift control shape into Microcosm as a refactor.
Purpose
A repository that commits generated files alongside their sources has a standing problem: a generated artifact can quietly fall out of step with the source it was built from, and nothing fails until a reader trusts a stale file. This bundle answers one question. For a given set of changed paths, which generated artifacts might now be out of date, and is the owner's own check still passing?
The unusual choice is owner routing rather than a global snapshot. Each generated surface is modelled as a ProjectionOwner row that names its artifacts, its source authorities, and a no-write check command that is treated as the drift authority for that surface. A changed path is matched against those patterns, so a small edit selects only the owners it could plausibly affect instead of rerunning every builder in the repository. The check command itself, not this gate, decides whether a surface is fresh. The gate's job is to route to the right owner and record the evidence honestly.
Two properties keep that honest. The skip cache is deliberately strict: a prior clean result record is reused only when the source hash, the artifact hash, the check command, and artifact presence all still match, so any drift in any of those falls back to actually running the check. And a missing artifact counts as drift on its own, even when the owner's check command would pass, so an absent generated file cannot be laundered by a green command. The result is a freshness signal for declared owners, not a claim that every generated surface in the wider system is semantically correct.
Changed-path scoping selects the responsible owner instead of sweeping every generated surface.
Source and artifact files are content-addressed with sha256 fingerprints.
Prior clean result records skip repeat checks only when source hash, artifact hash, check command, and artifact presence still match.
Missing artifacts drift even if an owner check command would otherwise pass.
A planted artifact byte is rejected when the owner's check command detects the mismatch.
Shape
Diagram source
flowchart LR A["Changed path or owner id"] --> B["Select projection owner"] B --> C["Fingerprint source authorities"] B --> D["Fingerprint generated artifacts"] C --> E{"Prior clean result record still matches?"} D --> E E -- "yes" --> F["source-hash cache hit"] E -- "no" --> G["Run owner's no-write check"] F --> H{"Artifact missing or check failed?"} G --> H H -- "yes" --> I["drift result record"] H -- "no" --> J["clean result record"]
The important property is owner routing, not global sweeping. A changed path selects the relevant owner row, the owner row names the artifact/source patterns and no-write check, and the result record records why the owner was checked or skipped. The gate can prove that a declared owner check was current for a fixture root; it cannot prove every generated surface in the source system is semantically fresh.
Prior Art Grounding
The component is grounded in reproducible-build and regression-testing practices: declare source inputs, produce generated artifacts, compare content hashes, and rerun the owner check when either source or artifact identity changes. Useful prior-art anchors include:
Bazel hermeticity, especially the emphasis on declared inputs, source identity, repeatable actions, and cache validity.
pytest-regtest snapshot testing, where recorded outputs are compared against reference outputs to detect unexpected changes.
Microcosm borrows the declared-input and artifact-fingerprint discipline, then routes drift checks through the projection owner instead of treating all generated files as one global snapshot. The result record proves owner-check freshness for declared artifacts; it is not semantic drift proof or launch-scope decision.
Reader Evidence Routing
Read status: clean as "all selected owner rows had required artifacts present and either a current matching clean result record or a passing no-write check." Do not read it as proof that generated prose is semantically correct, that every source registry owner is valid, or that a repair command should run.
Read source_hash_cache.hit_count as bounded skip evidence: source hash, artifact hash, artifact presence, and check command all matched a prior clean result record. If any of those change, the command path is the evidence lane again.
Read status_reasons: ["artifact_missing"] as drift even when a check command would pass. The artifact presence check is part of the authority boundary; a missing generated output cannot be laundered by a passing owner command.
The fixture manifest names two positive cases (clean_owner, scoped_changed_path) and two negative cases (planted_byte_detected, missing_artifact). The expected result record is status: pass, case_count: 4, and passed_case_count: 4.
Validation Result record Path
The reader-verifiable result record is the focused pytest plus the paper-module corpus parity check:
Passing these commands proves that the public fixture behavior and bundle-backed JSON projection remain reproducible. It does not establish semantic freshness for all generated surfaces, does not run repair commands, and excludes launch.
Scope boundary
Scope limit
This is an owner-routed generated projection drift gate over declared artifacts, source authorities, clean-result record fingerprints, and no-write check command return codes. It is not semantic drift proof, not full source registry validation, not repair authority, and not launch-scope decision. The long-tail source registry must still be judged by each owner's real check command.
Engine Room Command-Run SingleflightPublic command-run singleflight fixture: content-addressed subprocess keys collapse duplicate active runs and replay captured result records without claiming scheduler or daemon authority.
Engine Room Command-Run Singleflight is a subprocess singleflight bundle. It validates content-addressed command keys, scoped dirty/content fingerprints, fcntl leader/follower election, completed-run reuse, captured output replay, and two negative boundaries over fixture commands while keeping scheduler, daemon, distributed-lock, live-state export, launch, and private-system claims out of scope.
Scope limit Public subprocess singleflight fixture and focused regression result records only; no job scheduler, daemon, distributed lock service, live command_runs export, external model access authority, launch-scope decision, private-system equivalence, source-file changes, or whole-system correctness.
This staged Engine Room bundle imports the runnable core of the source system/lib/command_run_singleflight.py into Microcosm as a refactor.
Purpose
When several agents or background tasks fire the same command at the same moment, the naive outcome is several identical subprocesses doing the same work at once. That wastes the machine, and where the command has a side effect it can corrupt shared state by writing twice. The bundle answers one question: when two callers ask for the same command at the same time, can the system run it exactly once and hand the second caller the first caller's captured output?
The approach worth noticing is how it decides that two requests are "the same". It does not compare command names. It builds a content-addressed key over the argv, the working directory, a small slice of the environment, and, crucially, the scoped worktree state. Inside a Git repository that state includes HEAD, the porcelain status, and the diff and content hashes for the named scope paths. So two runs of the same command collapse into one only while the code they would see is identical. Edit a file in scope and the key changes, which forces a fresh run rather than serving a stale result. The scope_mutation_changes_key fixture exists precisely to pin that behaviour.
The coordination itself is deliberately small: an fcntl file lock per key elects one leader, the leader runs the subprocess and captures its output, and followers wait and replay that captured stdout, stderr, exit code, and run id rather than launching their own copy. There is no daemon, no queue, and no network lock service. The point of the bundle is to show that the collapse and the content-addressing both hold under a real two-process race, not to stand in for a scheduler.
What It Demonstrates
Content-addressed command keys over argv, cwd digest, selected environment, scoped Git dirty state when available, and scoped file-content fallback.
fcntl leader/follower coordination so duplicate active invocations collapse to one subprocess execution.
Completed-run reuse when the caller opts in.
Captured stdout/stderr replay for followers and reuse result records.
Shape
Diagram source
flowchart TD A["Command argv, cwd, env, scope paths"] --> B["Build content-addressed key argv + cwd + env + scoped worktree state"] B --> K["Hash key to key_hash"] K --> C["Take per-key fcntl lock"] C --> D{"Active run for this key?"} D -- "running" --> E["Follower waits on active result record"] D -- "completed and reuse allowed" --> G["Reused: replay completed result record"] D -- "none, or stale" --> H["Leader runs subprocess once"] E --> F{"Active finished in window?"} F -- "yes" --> R["Follower: replay leader output shared run_id, same exit code"] F -- "no" --> T["stale_or_timeout, exit 124 no rerun"] H --> I["Capture stdout, stderr, exit code write run and latest-by-key result records"] R --> Z["Return result record"] G --> Z T --> Z I --> Z
The shape is intentionally local. It proves a duplicate command key can elect one leader and make followers reuse that leader's captured result record. It does not claim a durable queue, daemon, distributed lock, scheduler, or export of the source state/command_runs/ tree.
Technical Mechanism
The runtime mechanism is a local subprocess singleflight, not a scheduler loop. build_command_key constructs a stable command key from argv, cwd label and digest, resource class, selected environment, and scoped state. When the cwd is inside Git, scoped state includes HEAD, porcelain status, binary diff hashes, staged diff hashes, and scoped file-content hashes; outside Git, the fallback is the scoped file-content fingerprint. The key hash is therefore an equality claim over command identity plus selected local state, not over command name alone.
run_command_singleflight uses the key hash to locate a per-key fcntl lock, active result record, latest result record, run metadata, stdout file, and stderr file under the caller-provided state root. Under the lock, the first process writes active metadata and becomes the leader. A concurrent duplicate sees the active running metadata and becomes a follower. A caller that explicitly sets reuse_completed=True may replay a completed result record instead of launching a new subprocess.
The leader path _run_leader starts the subprocess once, writes active/run metadata, captures stdout and stderr, persists the final exit code, updates the latest-by-key result record, and appends leader lifecycle events. The follower path _wait_for_active polls the active result record until it is completed, then replays the same stdout, stderr, exit code, and run_id; if the active run is stale or does not finish before the wait window, the follower returns stale_or_timeout with exit code 124 and does not rerun the command. Empty argv is rejected before key construction.
The public fixture matrix exercises this mechanism through four named cases: single_leader, completed_reuse, scope_mutation_changes_key, and missing_command_rejected. The focused pytest adds the real OS-process race: two callers start the same command, the roles resolve to leader and follower, both result records share one run_id, both replay counter=1, and the side-effect counter increments exactly once.
Concurrency Claim
The mechanism is cross-process singleflight, not just memoization. Its cache key is content-addressed over command identity and scoped state, including Git HEAD, porcelain status, and scoped dirty-file content when available. That means the same command can safely reuse or collapse while an edited scoped file creates a different key and must miss.
The fcntl lock is the leader/follower election. The leader executes the subprocess once and writes the captured result record; followers wait for the active run and replay the leader's captured stdout, stderr, return code, and metadata. The expected regression is a real OS-process race: two callers start with the same key and the side-effect counter increments exactly once.
The sibling idempotency pattern belongs in metabolism_runtime: active work is deduped with a SQLite partial unique index, while terminal work can be rerun. Together, the two patterns distinguish "collapse duplicate active execution" from "cache completed results."
Prior Art Grounding
The component is directly inspired by duplicate-call suppression and cache-stampede control patterns, especially Go's singleflight package:
`golang.org/x/sync/singleflight`, which defines a namespace of work where duplicate calls for the same key share one in-flight execution.
Microcosm borrows the leader/follower and shared-result shape, then adapts it to local subprocess runs with content-addressed command identity, scoped worktree state, captured stdout/stderr replay, and an explicit distinction from completed-result caching. It is singleflight for command execution, not a general scheduler or distributed lock service.
Reader Evidence Routing
role: leader: this process won the key lock and executed the subprocess.
role: follower: this process attached to the active run and replayed the leader's captured stdout, stderr, exit code, and metadata.
role: reused: completed-result reuse occurred because the caller explicitly set reuse_completed.
dirty_fingerprint: command-key invalidation is scoped to content. The scope_mutation_changes_key fixture mutates a scoped file and expects a new key, so edited scoped content is not laundered into an old singleflight run.
status: stale_or_timeout with exit code 124: the wrapper refused to rerun while an active run failed to finish inside the wait window.
empty argv rejection: input validation only, not scheduler policy.
non-proof boundary: these result records do not prove daemon behavior, distributed locking, live state/command_runs/ export, Atlas ownership, accepted-component status, launch-scope decision, or whole-system correctness.
The fixture manifest names two positive cases (single_leader, completed_reuse) and two negative-boundary cases (scope_mutation_changes_key, missing_command_rejected). The expected result record is status: pass, case_count: 4, and passed_case_count: 4.
Named Proof Consumers
The named proof consumer is tests/test_engine_room_command_run_singleflight.py. Its focused tests cover: scoped file mutation changing the dirty fingerprint; completed-run reuse without rerunning; concurrent duplicate collapse with one leader, one follower, shared run_id, replayed output, and exactly one side effect; stale active-run timeout refusal without rerun; empty-command rejection; fixture-matrix parity; and the module CLI JSON result record.
The runtime proof consumer is microcosm_core.engine_room.command_run_singleflight evaluate-fixtures, which loads fixtures/first_wave/engine_room_command_run_singleflight/input/*.json and reports status, case_count, passed_case_count, source refs, source_faithful_public_refactor, scope limit, and scope boundaries.
The projection proof consumer is scripts/build_doctrine_projection.py --check-paper-module-corpus, which keeps the bundle-backed Markdown/JSON corpus reproducible while preserving the generated Mermaid status and the blocked Atlas-card boundary named in the JSON bundle.
Validation Result record Path
The reader-verifiable result record is the focused pytest plus the paper-module corpus parity check:
Passing these commands proves only that the public fixture behavior and bundle-backed JSON projection remain reproducible; it does not establish scheduler authority, daemon authority, distributed lock service behavior, launch-scope decision, or whole-system correctness.
Scope boundary
Scope limit
This is a subprocess singleflight bundle, not a job scheduler, not a daemon, not a distributed lock service, and not an export of live state/command_runs/ state. Its JSON bundle authority is limited to the paper-module relationships named in core/paper_module_capsules.json; it does not claim an accepted component, Atlas ownership, launch-scope decision, private-system equivalence, source-file changes, or whole-system correctness.
Subject Boundary Audit
The admitted subject is the mechanism row mechanism.engine_room_command_run_singleflight.validates_public_command_run_singleflight. There is still no accepted engine_room_command_run_singleflight component claim, and organ_atlas.engine_room_command_run_singleflight remains blocked until the component-atlas owner binds its edges. The source authority is therefore enough for Mermaid and lattice walkability, but not enough for component readiness, Atlas readiness, scheduler authority, or launch-scope decision.
Source and projection details
Governing Lattice Relation
This module sits in the Microcosm lattice as a mechanism-backed proof of duplicate active-command collapse. Its admitted subject is mechanism.engine_room_command_run_singleflight.validates_public_command_run_singleflight, whose source row names the validating behavior: content-addressed subprocess keys, fcntl leader/follower election, completed-run reuse, scoped dirty/content fingerprint invalidation, captured stdout/stderr replay, and empty-command refusal. The Markdown is therefore a reader narrative over that mechanism row, not an independent source of authority.
The concept edge is concept.import_projection_and_drift_control_bundle: the public refactor imports the source system/lib/command_run_singleflight.py shape into Microcosm while keeping the proof surface bounded to fixture inputs, source refs, and metadata-only result records. The bundle's principles P-1, P-2, P-6, P-8, P-9, and P-15, plus axioms AX-1, AX-5, AX-7, AX-8, and AX-11, are the governing relationship edges reported by the structured source record; this page cites those ids rather than minting new doctrine to make the module look more complete.
The important dependency edge is paper_module.engine_room_metabolism_runtime. That sibling module shows the runtime idempotency pattern for active work, while this module isolates the command-run singleflight pattern. Read together, the pair separates two claims: active work can be deduped by a stateful runtime, and duplicate subprocess invocations can be collapsed by a content-addressed key plus per-key lock. The boundary matters because a green singleflight result record is not evidence for a durable scheduler, daemon, or distributed lock service.
Projection status follows the same lattice boundary. The proof consumer for this relation is still tests/test_engine_room_command_run_singleflight.py plus the fixture CLI and paper-module corpus check named below.
Engine Room Metabolism RuntimeStaged Engine Room component: synthetic SQLite metabolism runtime exercise for queues, leases, blackboard projection, and reconciliation.
Engine Room Metabolism Runtime explains the metabolism runtime component inside the accepted Engine Room demo. It exercises synthetic SQLite queue state, lease recovery, blackboard claim projection, and cold-start reconciliation fixtures without exporting private runtime state or dispatching providers.
Scope limit Component evidence for the accepted staged Engine Room demo only; not a live non-public runtime export, not external model service, not agent dispatch, not distributed database proof, not launch-scope decision, and not source-file changes.
This staged Engine Room bundle imports the always-on metabolism runtime shape into Microcosm as a synthetic SQLite bundle.
Purpose
A long-running agent runtime keeps a durable record of work: jobs to do, leases held by workers, runs in flight, and claims asserted on a shared blackboard. That record drifts out of step with reality. A worker dies mid-run and its lease never gets released. A run finishes but the job it belonged to is still marked running. A launch log goes stale because nothing is writing to it. The question this component answers is narrow: given the durable state alone, which rows are inconsistent, and which of those should a person look at before anything touches them?
The interesting choice is what the reconciliation pass does not do. It reads the jobs and runs tables, applies its rules, and emits findings tagged operator_review_required. It does not auto-repair. An expired lease has a clean recovery path, so requeue_expired_jobs moves it back to recoverable on its own. But a running job with no run row, or a finished run whose job still reads running, is ambiguous: the safe move is to surface it, not to guess. The component draws that line deliberately and refuses to cross it.
The blackboard makes the same refusal. Claims are not edited or deleted in place. An assertion is one event row; a contradiction, expiry, or supersession is a separate event that points back at the assertion it invalidates. The active view is then projected by replaying the event log and dropping any assertion an invalidating event named. State is reconstructed from an append-only history rather than mutated, so the reason a claim is no longer active stays on the record.
This is a synthetic SQLite exercise, not the live runtime. It ships fixtures and a real database file, exercises the queue, lease, projection, and reconciliation paths, and emits a result record. It does not carry the private source database, dispatch any worker or provider, or stand in for distributed-database behaviour.
Shape
The module is a staged, synthetic runtime model over a local SQLite database. It demonstrates durable queue state, lease recovery, blackboard claim-event projection, and cold-start reconciliation findings without exporting or operating the private source runtime. The public body is intentionally small: fixtures create jobs, claims, runs, launch logs, and blackboard events, then the runtime emits result records over those local artifacts.
The proof boundary is durable-state behavior, not live orchestration. A clean result record means the synthetic fixture exercised the declared queue, lease, projection, and reconciliation cases. It does not establish external model service, agent dispatch, distributed database behavior, ambiguous automatic repair, or launch-scope decision.
flowchart TD Fixture["Public fixture cases queue recovery, blackboard projection, running-job reconciliation"] Schema["connect / ensure_schema WAL SQLite jobs, runs, blackboard_claim_events"] Queue["enqueue_job active-state idempotency"] Lease["claim_next_job requeue_expired_jobs"] Runs["start_run / complete_run run lifecycle rows"] Blackboard["append_claim_event build_blackboard_projection"] Reconcile["reconcile running_job_no_run_row, run_finalized_but_job_running, running_job_stale_launch_log"] Result record["evaluate_fixture_dir JSON result record with counts, scope boundaries, and scope limit"] Ceiling["Scope limit synthetic SQLite behavior only"] Fixture --> Schema Schema --> Queue Queue --> Lease Lease --> Runs Schema --> Blackboard Runs --> Reconcile Blackboard --> Reconcile Reconcile --> Result record Result record --> Ceiling
What It Demonstrates
WAL-enabled SQLite schema for jobs, runs, and blackboard claim events.
Idempotent job insertion through a partial unique index on idempotency_key scoped to the active states. A duplicate enqueue while the job is still live is rejected; once the job reaches a terminal state the key is free to re-enqueue.
Lease claim and expired-claim recovery into recoverable: a claim carries an expiry, and requeue_expired_jobs returns any lapsed claim to the queue.
Blackboard claim-event projection that removes contradicted assertions.
Cold-start reconciliation findings for:
running_job_no_run_row
run_finalized_but_job_running
running_job_stale_launch_log
Reader Evidence Routing
fixture CLI: inspect synthetic SQLite runtime behavior over public fixture roots.
paper-module coverage contract: verify that this slug has left the Engine Room legacy re-entry set because its JSON source record names source, subject, and code-locus evidence.
doctrine projection check: corpus/parity evidence only; it is not private-runtime export, external model access authority, accepted-component admission, or proof that live source metabolism state is healthy.
non-proof boundary: passing checks show the synthetic SQLite exercise is replayable and bounded by its scope limit.
Prior Art Grounding
The component is grounded in autonomic-computing and durable-runtime control loops: observe work state, detect stale or inconsistent state, recover leases, and keep the durable log separate from the acting dispatcher. Relevant anchors include:
IBM's autonomic-computing architecture lineage, including the MAPE-K loop tradition that frames self-management as monitor, analyze, plan, execute, and knowledge.
SQLite write-ahead logging, where a local database uses a WAL file for transactional durability and concurrency behavior.
Google's SRE monitoring guidance as a practical lineage for separating symptoms, causes, and operational signals.
Microcosm borrows the self-management loop and durable local-state pattern, but keeps the component synthetic and public-safe: jobs, leases, blackboard assertions, and cold-start findings are exercised without exporting private runtime state or dispatching providers.
Passing these commands proves only that the public fixture behavior and bundle-backed JSON projection remain reproducible; it does not establish live non-public runtime export, external model access, launch-scope decision, or whole-system correctness.
Scope boundary
Scope limit
This is a synthetic SQLite bundle for durable queue, lease recovery, blackboard claim-event projection, and cold-start reconciliation taxonomy. It is not a live non-public runtime export, not an agent dispatcher, not external model service, not ambiguous auto-repair, and not a distributed database. Its JSON bundle authority is limited to component evidence for the accepted Engine Room demo mechanism and the relationships named in core/paper_module_capsules.json; it does not claim a standalone metabolism-runtime mechanism, an accepted component, Atlas ownership, or launch-scope decision. It never ships the private source metabolism database, runtime status JSON, operator sessions, provider state, or live logs.
Source and projection details
Source-Open Body Floor
Readers should be able to inspect the public body through these local surfaces:
src/microcosm_core/engine_room/metabolism_runtime.py defines the SQLite schema, job queue, lease claim/recovery, run lifecycle, blackboard projection, reconciliation rules, fixture evaluator, and CLI.
tests/test_engine_room_metabolism_runtime.py checks WAL/idempotency, expired-claim recovery, contradicted blackboard assertions, each reconciliation finding, fixture replay, and the module CLI result record.
fixtures/first_wave/engine_room_metabolism_runtime/input carries the replayable public queue/reconciliation cases.
core/fixture_manifests/engine_room_metabolism_runtime.fixture_manifest.json binds the fixture set as an inspectable artifact.
standards/std_microcosm_engine_room_metabolism_runtime.json names the source-to-target relation, required positive and negative cases, validator command, and scope limit.
The source source refs in the standard are lineage anchors for the public refactor. They do not expose private runtime state, and they do not make this Markdown page a private-runtime export or broader launch-scope decision. Source authority for the paper-module row lives in the JSON bundle registry.
Engine Room Bridge Campaign DAGStaged Engine Room component: pre-dispatch bridge-campaign DAG validator for typed nodes, acyclicity, synthesis reachability, and provider fan-out ceilings.
Engine Room Bridge Campaign DAG binds the staged pre-dispatch campaign validator to the accepted Engine Room demo mechanism. It validates typed probe/reducer/synthesis graphs, rejects cycles, requires synthesis nodes to trace back to probe evidence, checks provider parallelism ceilings, and emits public fixture result records without dispatching providers or proving campaign execution safety.
Scope limit Component evidence for the accepted staged Engine Room demo only; no external model access, campaign execution, reducer or synthesis correctness proof, provider safety proof, launch-scope decision, private-system equivalence, source-file changes, or whole-system correctness.
engine_room_bridge_campaign_dag is the first disjoint Engine Room bundle. It imports the source bridge-campaign contract shape as a refactor: validate a typed probe/reducer/synthesis graph, prove the graph is acyclic, make sure synthesis reaches probe evidence, and reject provider over-parallelism before any campaign reaches a dispatcher.
Source refs:
tools/meta/bridge/bridge_campaign.py
tools/meta/bridge/dispatch_validator.py
tools/meta/bridge/provider_capabilities.py
Purpose
A bridge campaign fans one piece of work out across several agent providers, then folds their outputs back into a single synthesis. The cost of getting the graph wrong is paid after dispatch: a cycle that never terminates, a synthesis step that summarises nothing because it does not actually depend on any probe, or a fan-out that asks one provider for more parallel workers than it can take. This component answers one question before any of that happens. Is the campaign graph well formed enough to be safe to run?
The design choice that makes this useful is that it is a validator and nothing else. validate_campaign reads a small JSON spec and returns a list of typed rule decisions; it never dispatches an agent, calls a provider, or runs a reducer. The rule ids deliberately mirror the source contract's CR and VR rule families, so the public bundle carries a faithful subset of the same checks the private runtime applies, without carrying the runtime itself.
The check that is worth pausing on is reachability. It is not enough for the graph to be acyclic and for the node types to be valid. The single synthesis node must transitively depend on at least one probe, so that the conclusion can be traced back to evidence rather than to other conclusions. A graph that wires synthesis only to a reducer with no probe behind it is rejected. That is the difference between a structure that looks like a campaign and one whose output is grounded in something the campaign actually gathered.
Shape
Diagram source
flowchart TD A["Typed campaign spec (JSON object)"] --> B["Envelope checks schema, kind, kebab-case id, intent, plan path, continuation"] B --> C["Node checks unique labels, probe/reducer/ synthesis roles, input modes, dependencies resolve"] C --> D["Acyclicity check"] D --> E["Exactly one synthesis node barrier binds that node"] E --> F["Synthesis transitively reaches a probe"] F --> G["Provider fan-out ceiling workers <= safe_parallelism"] G --> H["Pass result record ValidationResult ok = true"] B -.->|"rule reject"| R["Reject result record rule id, target, reason"] C -.->|"rule reject"| R D -.->|"rule reject"| R E -.->|"rule reject"| R F -.->|"rule reject"| R G -.->|"rule reject"| R
The shape is intentionally pre-dispatch. The module reads a typed campaign graph, validates node kinds and dependencies, rejects cycles, requires synthesis paths to reach probe evidence, checks provider fan-out ceilings, and emits a pass/reject result record. It does not dispatch bridge workers, use external model services, execute reducers, run synthesis, or prove provider safety.
Technical Mechanism
The runtime mechanism is a deterministic graph validator in src/microcosm_core/engine_room/bridge_campaign_dag.py. validate_campaign normalizes one campaign JSON object, then emits rule decisions instead of performing side effects. The rule set checks the input envelope (schema_version, kind, kebab-case campaign_id, bounded intent, public-looking plan_path, and bounded continuation packet), then checks node structure: unique labels, valid probe / reducer / synthesis roles, declared dependencies, acyclicity, exactly one synthesis node, barrier binding, and transitive reachability from synthesis back to at least one probe.
The provider boundary is part of the same mechanism. SAFE_PARALLELISM gives a small local ceiling for chatgpt, claude, gemini, and local; rule VR005 rejects a request whose worker count exceeds the selected provider ceiling. Because the validator returns a ValidationResult with decision rows, errors, and warnings, the result record explains why a graph passed or failed without dispatching any provider work.
validate_fixture_dir is the public proof harness. It loads the four fixture campaigns, compares each fixture's declared expected_ok field against the observed validator result, and reports status: pass only if every positive and negative case behaves as declared. The positive case is a three-probe graph that fans into one reducer and one synthesis node. The negative cases are a cycle, a 99-worker provider ceiling violation, and a synthesis path that depends on a reducer with no probe evidence.
This mechanism sits under the bundle edge paper_module.engine_room_bridge_campaign_dag -> mechanism.engine_room_demo.validates_public_engine_room_demo. The doctrine relation is intentionally component-shaped: the structured source record records the existing Engine Room demo mechanism as the subject, the concept.import_projection_and_drift_control_bundle concept, principles P-1, P-2, P-5, P-6, P-9, and P-15, axioms AX-1, AX-4, AX-5, and AX-8, and the dependency on paper_module.engine_room_demo.
Focused regression: PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider --basetemp=/tmp/microcosm_engine_room_bridge_campaign_dag_pytest tests/test_engine_room_bridge_campaign_dag.py -q. Expected proof shape: the six tests cover the valid campaign, cycle rejection, provider ceiling rejection, dangling synthesis rejection, fixture-matrix result record, and CLI JSON result record.
Bundle/corpus parity: PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus. Expected proof shape: the structured source record remains reproducible from the bundle and Markdown projection, with Mermaid available from bundle edges and Atlas honestly blocked behind the component-atlas owner lane.
structured source record readback: jq '{source_authority:.paper_module_payload.source_authority, mermaid:.paper_module_payload.generated_projections.mermaid.status, atlas:.paper_module_payload.generated_projections.atlas_card.status, edge_count:(.relationships.edges|length), unresolved:(.relationships.unpopulated_selective_relations|length)}' paper_modules/engine_room_bridge_campaign_dag.json. Expected proof shape: json_capsule, available_from_capsule_edges, blocked_until_organ_atlas_owner_lane_binds_edges, resolved bundle edges, and zero unpopulated selective relations.
Reader Evidence Routing
valid fixture: the typed graph passed public pre-dispatch checks for probes, reducer fan-in, and synthesis reachability.
cycle failure: the local validator rejects a dependency cycle; this is a graph-shape proof only, not a statement about every private campaign graph.
over-parallel failure: the public fixture enforces the declared provider capacity ceiling; it is not live provider safety, quota authority, or launch clearance.
dangling synthesis failure: synthesis nodes must trace back to probe evidence before dispatch is allowed.
non-proof boundary: no fixture here executes a campaign, dispatches a provider, validates reducer output, proves synthesis correctness, creates a standalone component, or flips the generated Atlas card out of the component-atlas-owner lane.
Prior Art Grounding
This staged component is grounded in workflow-orchestration prior art that models work as a directed acyclic graph with typed nodes, dependency edges, fan-in, and validation before execution. Useful reference points are:
Apache Airflow DAGs, where tasks are grouped into a directed acyclic graph with explicit dependencies.
BPMN and related workflow notation traditions, where process graphs separate control-flow structure from the concrete execution system.
Microcosm borrows the graph validation and pre-dispatch accounting pattern: acyclicity, evidence reachability, and provider-capacity ceilings are checked before a campaign can run. This bundle stops at the contract/preflight layer; it does not dispatch agents or claim provider-safety proof.
Validation Result record Path
The reader-verifiable result record is the focused pytest plus the paper-module corpus parity check:
Passing these commands proves that the public fixture behavior and bundle JSON projection remain reproducible; it does not execute campaigns, validate reducer or synthesis correctness, prove provider safety, include launch operations, or create a standalone Engine Room component.
Positive case: a three-probe campaign fans into one reducer and one synthesis node. Negative cases: a cycle, a 99-worker provider-ceiling violation, and a synthesis path that never reaches a probe.
Scope boundary
Scope limit
This bundle is a contract/preflight validator. It does not dispatch agents, execute campaigns, prove provider safety, include launch operations, or claim equivalence to the private bridge runtime. It is staged under unshared Engine Room paths while the accepted-component registry, atlas, runtime shell, and CLI integration surfaces remain separate authority surfaces. The JSON bundle binds this module as component evidence for mechanism.engine_room_demo.validates_public_engine_room_demo; it does not invent a standalone engine_room_bridge_campaign_dag component or mechanism.
Source and projection details
Source-Open Body Floor
The source-open floor for this module is the runnable Engine Room refactor plus its fixture and test surfaces:
That floor is enough for a reader to replay the public preflight fixtures and inspect the DAG checks. It is not enough to claim private bridge-runtime parity, provider safety, campaign execution authority, accepted-component authority, or launch-scope decision.
Engine Room Reference Knowledge Router binds the staged reference-routing bundle to a concrete mechanism. It scores structured routing metadata, family text, open-first summaries, and curated notes over public fixtures, then rejects domain-mismatch and empty-query cases without cloning repositories, exporting private reference material, or claiming BM25, TF-IDF, embedding, license, launch, or private-system authority.
Scope limit Public sanitized-reference routing fixture and focused regression evidence only; no BM25, TF-IDF, embedding search, repository cloning, license authority, private reference corpus export, launch-scope decision, private-system equivalence, source-file changes, or whole-system correctness.
This staged Engine Room bundle imports the runnable route-selection core of the source reference registry into Microcosm as a refactor.
Purpose
The private system keeps a registry of references: reusable bodies of borrowed technique, each tagged with the domains, clusters, and problem spaces it speaks to. When an agent has a problem in front of it, something has to decide which references are worth opening. The source side does this with route_annexes. This module is a source-faithful copy of that decision, narrowed so it can run in public without carrying the private corpus.
It answers one question: given a problem statement and a sanitized catalogue, which reference rows are relevant, and exactly why. The "why" is the point. Rather than return an opaque relevance number, the router decomposes every row's score into four named buckets, so a reader can see whether a row ranked because its structured routing fields matched, because its description happened to share words, or because a curated note carried the weight.
The design choice worth noting is the tiering. Structured routing metadata that someone deliberately authored scores far higher than free text that merely happens to contain a query word. An exact match on a problem_spaces field is worth 120 points; the same word appearing in a description is worth 6. This encodes a small but useful bias: trust the metadata an author chose over accidental word overlap in prose. It is a deliberately simple weighted-token scorer, not a learned retriever, and the page is careful not to dress it as one.
What It Demonstrates
Structured routing fields score with the highest weights.
Family text, tags, and open-first summaries provide weaker fallback evidence.
Curated notes are sorted by relevance and contribute matched note ids.
Domain and cluster filters prevent unrelated references from ranking.
Every result carries a score decomposition via match_breakdown.
Shape
Diagram source
flowchart TB Problem["Problem statement normalized to tokens"] --> Empty{"Empty after normalization?"} Empty -->|yes| NoMatch["No-match result record status: no_match"] Empty -->|no| Loop["For each reference row in sanitized catalog"] Loop --> Filter{"Domain / cluster filter matches?"} Filter -->|no| Drop["Excluded before scoring"] Filter -->|yes| Score["Four-tier token scorer"] subgraph Tiers["Weighted scoring (exact / phrase / per-token)"] Structured["Structured routing fields 120 / 80 / 18"] Family["Family text: slug, name, description, tags 32 / 24 / 6"] OpenFirst["Open-first summaries 20 / 16 / 4"] Notes["Curated notes, relevance-sorted 18 / 12 / 3"] end Score --> Structured Score --> Family Score --> OpenFirst Score --> Notes Structured --> Sum["Sum tiers into total score + match_breakdown + matched_note_ids"] Family --> Sum OpenFirst --> Sum Notes --> Sum Sum --> Threshold{"total score > 0?"} Threshold -->|no| Drop Threshold -->|yes| Ranked["Ranked reference matches highest score first, with score breakdown"] Drop --> Loop Loop -->|no rows scored| NoMatch
The shape is intentionally a router, not a corpus crawler. It reads a sanitized fixture catalog and a problem statement, applies field-weighted token scoring and optional filters, then emits ranked matches with visible score breakdowns. It does not clone reference repositories, inspect private reference bodies, run BM25/Lucene/embedding search, perform license review, or authorize use of the private reference corpus.
Technical Mechanism
The runtime mechanism is a deterministic ranking pass over a caller-supplied catalog. route_catalog() is the result record boundary: it calls route_annexes(), records the problem text and optional domain/cluster filters, carries source_refs, and returns either a routed row set or an empty no-match row set with status: routed or status: no_match. The row score is intentionally decomposed into four buckets so readers can audit why a fixture ranked without reading private reference bodies.
The scoring path starts by normalizing punctuation, slashes, underscores, and hyphens into lowercase tokens, then drops a small local stopword set. Structured fields from routing_summary dominate the score: problem spaces, capabilities, domains, and clusters use weights of 120 for exact equality, 80 for phrase containment, and 18 per overlapping token. Family text is weaker: slug, display name, description, and tags use 32/24/6 weights. open_first summaries are weaker again at 20/16/4. Curated notes are sorted by bounded relevance, scored at 18/12/3, and contribute only their ids to matched_note_ids. The result record therefore exposes the rank cause as match_breakdown rather than asking the reader to trust an opaque relevance number.
Filters run before scoring. If the caller supplies a domain or cluster, the candidate row must match the normalized field exactly or it is excluded before any text score can rescue it. Empty problem text returns no candidates. Fixture cases bind both sides of that mechanism: provider_backoff_route and note_match exercise positive structured and note-backed ranking, while domain_filter_no_match and empty_problem_no_match exercise the negative filter and empty-query paths. evaluate_fixture_dir() then turns the four JSON cases into a pass/fail result record with case_count, passed_case_count, claim_ceiling, and anti_claims.
Prior Art Grounding
The component borrows the general information-retrieval pattern of scoring a candidate corpus with visible term evidence and returning ranked, inspectable matches. The closest prior-art families are classic TREC-style retrieval evaluation, BM25/Lucene explainable term scoring, and fielded/faceted search interfaces where structured fields carry different weights than body text:
Text REtrieval Conference (TREC) as the long-running benchmark lineage for retrieval tasks, relevance judgments, and scored runs.
Microcosm takes the inspectable scoring and field-weighting inspiration, but this module intentionally remains a weighted-token router over a sanitized reference catalog. It does not claim to implement BM25, TF-IDF, embeddings, or private-corpus search.
Reader Evidence Routing
Runtime route: src/microcosm_core/engine_room/annex_knowledge_router.py is the local source locus for the staged router. It supports replaying the sanitized fixture behavior under the mechanism subject; it does not create an component subject.
Positive fixtures: provider-backoff and curated-note fixtures show that structured routing fields and matched note ids affect ranking. Read a high score as local weighted-token relevance against the sanitized catalog only, not as semantic search, BM25/Lucene equivalence, embedding similarity, or proof about the private reference corpus.
Explanation surface: match_breakdown is the reader-facing account of why a row ranked. Structured fields carry more weight than notes and fallback text, and matched note ids explain which curated notes contributed. Do not treat matched notes as full reference-body disclosure.
Negative fixtures: domain-filtered no-match and empty-problem no-match cases are filter and threshold result records. They prove the public fixture rejects those cases; they do not prove that no real reference exists.
Corpus boundary: the fixture catalog is sanitized and finite. The page may name routes, fixtures, standards, counts, and result record shapes, but it must not expose private source reference bodies, source notes, model-output data, or live workspace state.
missing authority edge now: no accepted component JSON instance currently resolves for engine_room_annex_knowledge_router, so this page must not invent an component subject.
re-entry condition: after component admission or Atlas owner binding lands a broader edge, run scripts/build_doctrine_projection.py --write-paper-module-corpus, and verify the generated instance still reflects bundle authority without broadening the scope limit.
scope limit: this page can explain the staged public exercise and source loci; it cannot claim component admission, launch-scope decision, private-system equivalence, private-corpus search, or aggregate doctrine-lattice coverage.
Validation Result record Path
The reader-verifiable result record is the focused pytest plus the paper-module corpus parity check:
Passing these commands proves only that the public fixture behavior and JSON bundle projection remain reproducible; it does not unblock the Atlas owner lane, prove private-corpus search, or include launch operations.
Scope boundary
Scope limit
This is explainable tiered weighted-token retrieval over a sanitized reference catalog. It is not BM25, not TF-IDF, not embedding search, not repository cloning, not third-party license review, and not authority over the private reference corpus. Its JSON bundle authority is narrow: the bundle binds one staged mechanism subject, resolved source loci, fixtures, standard, tests, and result record surfaces. It does not admit an component, unblock the Atlas owner lane, or authorize public launch.
Limitations
The scoring model is a simple weighted token overlap model. It has no inverse document frequency, term saturation, learned embedding space, semantic reranker, or benchmarked retrieval metric.
The catalog is finite and sanitized. A no-match result record only proves the public fixture did not route under the supplied filters; it does not establish that no useful private reference, public repository, or future corpus row exists.
matched_note_ids expose which curated notes contributed, but they are ids and summaries only. They do not disclose or validate private reference bodies.
Domain and cluster filters are exact normalized filters. They prevent obvious cross-domain ranking in the fixture, but they do not solve ontology drift, synonym expansion, or ambiguous route taxonomy.
The validator proves the public fixture matrix and CLI result record shape. It does not establish private source-corpus coverage, third-party license safety, accepted component admission, Atlas owner binding, launch-scope decision, or whole-system correctness.
Source and projection details
Source-Open Body Floor
The source-open floor for this module is the runnable Engine Room refactor plus its fixture and test surfaces:
That floor is enough for a reader to replay the public routing fixtures and inspect how scores and filters are computed. It is not enough to claim private reference-corpus search, third-party license review, component admission, Atlas launch-scope decision, or launch-scope decision.
Governing Lattice Relation
The governing lattice role is a staged mechanism, not an accepted component. mechanism.engine_room_annex_knowledge_router.validates_public_annex_knowledge_router grounds the module in src/microcosm_core/engine_room/annex_knowledge_router.py, runs in the engine_room_demo host context, and connects to concept.architecture_and_navigation_route_contract_bundle plus concept.import_projection_and_drift_control_bundle. That relation says the module is evidence for route-selection behavior inside the Engine Room import and projection-control bundle; it does not upgrade the sanitized router into general reference search or launch-scope decision.
The standard row std_microcosm_engine_room_annex_knowledge_router supplies the hard public/private boundary: required positives are provider_backoff_route and note_match, required negatives are domain_filter_no_match and empty_problem_no_match, and the scope limit sets BM25, TF-IDF, embedding search, repository cloning, license authority, private-corpus authority, and launch-scope decision to false. Those edges are navigation and evidence-routing edges, source-linked only; the JSON source record and mechanism/standard rows remain the governing records.
Engine Room Derived Fact Provider Engine binds the staged derived-fact bundle to a concrete mechanism. It resolves JSON-pointer, glob-count, and git-backed callable fact rows over public fixture roots, records provider errors as repairable data, and keeps derived-fact availability below truth-audit, semantic-claim-validation, full source registry, launch, and private-system authority.
Scope limit Public fixture-root fact-provider evidence only; no doctrine truth audit, no semantic claim validation, no full source fact registry export, no launch-scope decision, no private-system equivalence, no source-file changes, and no whole-system correctness.
This staged Engine Room bundle imports the provider side of the source derived fact hologram into Microcosm as a runnable refactor.
Purpose
A system that makes claims about itself needs a disciplined way to fetch the numbers behind those claims. How many files match a pattern, how many entries a registry holds, how many files git is tracking: these are facts about the current state of the repository, and a document that hard-codes them goes stale the moment the repository changes. This component answers a narrow question: for each authored fact row, what value does it resolve to against a supplied root, right now?
A fact row declares how it resolves rather than what it equals. A json_pointer row names a file and a pointer into it. A glob_count row names a pattern and counts matching files. A callable row names one of a small fixed set of git-backed computations, such as the count of tracked Python files. The engine reads each declaration and produces the value, so the value is always derived from live state rather than copied by hand.
The design choice worth noting is how failures are handled. When a row cannot resolve, for example because its source file is missing, the engine does not raise and abandon the whole evaluation. It records the failure as an ordinary row carrying the error class, a human-readable message, and a suggested required_next_action such as restoring the named source path. One broken fact degrades to a single repairable row; the rest of the registry still resolves, and the ledger reports degraded rather than failing outright. This is the difference between a fact ledger that tells you which fact broke and one that gives you a stack trace.
The boundary is provider resolution, not truth adjudication. A clean result record means the declared rows resolved against the supplied root and the result record carried the expected lineage and accounting fields. It does not mean the underlying claims are true, that every source fact family is covered, or that anything is ready for launch.
Shape
The module is a staged provider engine over a public fixture registry. It takes authored fact rows, resolves each declared provider type against a supplied root, and emits result record slices for ledger, audit, and navigation-cache consumers. The public body is deliberately small enough for readers to replay. Each row resolves through exactly one provider, and a row that fails to resolve becomes an error row rather than aborting the run, so the registry status is ok only when no row errored and degraded when any row did.
Diagram source
flowchart TD Registry["public fixture registry authored fact rows"] Resolver["evaluate_provider selects provider_type branch"] JsonPointer["json_pointer read value at pointer in a JSON file"] GlobCount["glob_count count matching files, keep sample matches"] Callable["callable git-backed repo-state count"] Resolved["resolved fact row value + value_repr"] ErrorRow["error-as-data row error_class, message, required_next_action"] Registry2["evaluate_registry aggregate rows, count statuses"] Status{"any row errored?"} Ok["status: ok"] Degraded["status: degraded"] Result record["public provider result record ledger + audit findings + navigation cache + sha256"] Ceiling["scope limit provider resolution only; no truth audit, registry completeness, semantic validation, or launch-scope decision"] Registry --> Resolver Resolver --> JsonPointer Resolver --> GlobCount Resolver --> Callable JsonPointer --> Resolved GlobCount --> Resolved Callable --> Resolved JsonPointer -. on failure .-> ErrorRow GlobCount -. on failure .-> ErrorRow Callable -. on failure .-> ErrorRow Resolved --> Registry2 ErrorRow --> Registry2 Registry2 --> Status Status -- no --> Ok Status -- yes --> Degraded Ok --> Result record Degraded --> Result record Result record --> Ceiling
What It Demonstrates
Authored fact registry rows resolve through JSON-pointer providers.
Glob-count providers count matching public fixture files and preserve sample matches for auditability.
Callable providers can shell through git ls-files to bind facts to tracked repo state instead of prose memory.
Provider failures become error-as-data rows with repair hints rather than crashing the whole ledger.
The output shape includes ledger, audit, and navigation-cache slices.
Reader Evidence Routing
fixture CLI: inspect provider behavior over public fixture roots.
paper-module coverage contract: verify that this slug explains its JSON bundle binding with an exact source ref and generated projection boundary.
doctrine projection check: corpus/parity evidence only; it is not semantic fact-audit authority, or proof that source facts are true.
non-proof boundary: passing checks show the staged provider exercise is replayable and bounded by its scope limit. Any later mechanism/component admission must land through its own lane; this paper module currently names a staged mechanism subject, not an accepted component subject.
Prior Art Grounding
The component is grounded in database and data-platform patterns where derived facts are produced from declared sources, materialized for fast reads, and carried with lineage or freshness metadata:
OpenLineage as an open lineage model for recording jobs, datasets, and run metadata across data systems.
Microcosm borrows the registered-provider and lineage-accounting pattern: fact rows declare how they resolve, provider errors become data with repair hints, and output is shaped for ledger, audit, and navigation-cache consumers. This does not make the module a doctrine truth auditor or a semantic claim verifier.
Passing these commands proves only that the public fixture behavior and JSON bundle projection remain reproducible; it does not admit an component, unblock the Atlas owner lane, or include launch operations.
Scope boundary
Scope limit
This is the fact-provider/resolver engine over public fixture roots. It is not a doctrine truth auditor, not a full export of the source fact registry, not semantic claim validation, and not launch-scope decision. It is also not JSON bundle authority by itself for paper_module.engine_room_derived_fact_provider_engine; that authority lives in core/paper_module_capsules.json.
Source and projection details
Source-Open Body Floor
Readers should be able to inspect the public body through these local surfaces:
src/microcosm_core/engine_room/derived_fact_provider_engine.py defines the provider resolver, callable map, error-as-data rows, result record digests, and CLI.
tests/test_engine_room_derived_fact_provider_engine.py checks JSON pointer escaping, glob-count samples, git-backed callables, provider failure rows, fixture replay, and the module CLI result record.
fixtures/first_wave/engine_room_derived_fact_provider_engine/input carries the replayable public registry cases.
core/fixture_manifests/engine_room_derived_fact_provider_engine.fixture_manifest.json binds the fixture set as an inspectable artifact.
standards/std_microcosm_engine_room_derived_fact_provider_engine.json names the source-to-target relation, required cases, validator command, and scope limit.
The source source refs in the standard are lineage anchors for the public refactor.
Engine Room Egress Self-Compliance GatePublic Engine Room component: phrase-membership egress gate for permission ceremony, self-error capture binding, and command-displacement evidence.
Engine Room Egress Self-Compliance Gate binds the staged egress bundle to a concrete mechanism. It detects permission ceremony without a real blocker, self-error statements without durable capture, and command handoff language without execution evidence over public fixtures, while accepting bounded blocker or result record language and refusing taint-analysis, prompt-injection-defense, sandbox, information-flow, launch, and private-system claims.
Scope limit Public phrase-policy fixture evidence only; no taint analysis, prompt-injection defense, sandboxing, information-flow proof, launch-scope decision, private-system equivalence, source-file changes, or whole-system correctness.
engine_room_egress_self_compliance_gate carries a refactor of the source egress compliance checks. It scans agent-output text for three failure classes:
permission ceremony without a named blocker
self-error language without a durable work log/capture binding
handing a safe command to the operator instead of reporting that it ran
The single question this gate answers is narrow: does a line of agent output ask the operator to do something the agent should have done itself, or excuse a mistake without recording it? It exists because the failures it looks for are the ones that read as good manners. Asking for permission, apologising for an error, and offering the operator a command to run all look polite in isolation. Each is also the exact shape of an agent quietly displacing work back onto the human or letting a self-detected mistake evaporate into prose.
The design choice worth noting is that the gate treats each of those polite phrases as a tripwire that is a violation by default, and then looks in the same text for one specific legitimising signal. Permission ceremony is allowed only if the text also names a real blocker, such as a destructive or irreversible action, a secret, a public sharing boundary, or a concurrent-owner conflict. Self-error language is allowed only if it binds to a durable capture, such as a capture id or a work log reference. A handed-over command is allowed only if the text also reports that the command was run. The polite phrase is innocent only when accompanied by the evidence that makes it honest.
This is deliberately phrase membership over the output text, not analysis of what the agent actually did. The gate cannot tell whether a named blocker is real or whether a capture id resolves; it only checks that the legitimising language is present. That keeps the check small, fast, and inspectable, and it is why the page is careful to say what the gate is not: it is not taint analysis, not prompt-injection defence, not a sandbox, and not an information-flow proof. It encodes one operating contract as an output filter and stops there.
Shape
The module is a staged Engine Room egress gate, not a general compliance system. Its public body is the small runtime in src/microcosm_core/engine_room/egress_self_compliance_gate.py, the red/green fixture matrix under fixtures/first_wave/engine_room_egress_self_compliance_gate/input, and the focused test file that asserts the three declared detector classes.
The gate produces inspection result records over agent-output text. A red result record means the output matched one of the declared failure classes without the required repair binding; a green result record means the narrow phrase-membership policy did not detect that failure in the supplied text. Neither result proves semantic compliance, privacy safety, sandbox isolation, or launch fitness.
Diagram source
flowchart TB Text["Agent output text (lowercased)"] --> D1 Text --> D2 Text --> D3 subgraph Permission["detect_permission_gate_without_blocker"] D1{"Permission ceremony phrase?"} D1 -->|no| Skip1["no row"] D1 -->|yes| B1{"Names a real blocker?"} B1 -->|yes| OK1["informational: blocker named"] B1 -->|no| V1["violation: ceremony without blocker"] end subgraph SelfError["detect_self_error_without_capture"] D2{"Self-error phrase?"} D2 -->|no| Skip2["no row"] D2 -->|yes| B2{"Binds to durable capture?"} B2 -->|yes| OK2["informational: capture bound"] B2 -->|no| V2["violation: error without capture"] end subgraph Command["detect_command_displacement_to_operator"] D3{"Command handed to operator?"} D3 -->|no| Skip3["no row"] D3 -->|yes| B3{"Reports it was run?"} B3 -->|yes| OK3["informational: result record present"] B3 -->|no| V3["violation: command displaced"] end V1 --> Result record["evaluate_text result record red if any violation, else green"] V2 --> Result record V3 --> Result record OK1 --> Result record OK2 --> Result record OK3 --> Result record Fixtures["Public fixture JSON cases"] --> Runner["evaluate_fixture_dir compare status to expected"] Runner --> Result record
Technical Mechanism
The runtime mechanism is intentionally small. evaluate_text lowercases the candidate agent-output text, applies three detector functions, and emits a metadata-only JSON result record with source refs, scope boundaries, a red/green status, and per-detector rows. A detector row appears only when its tripwire phrase family matches; the row becomes a violation when the matching text lacks the required legitimizer phrase family.
The three detector families correspond exactly to the standard's required negative cases. detect_permission_gate_without_blocker looks for permission ceremony phrases and accepts them only when the same text names a blocker such as destructive scope, secrets, a public sharing boundary, a remote push, a concurrent-owner conflict, or validation failure. detect_self_error_without_capture looks for self-error phrases and accepts them only when the same text binds the mistake to a durable capture surface such as a CAP, Work item, work log row, or quick-capture reference. detect_command_displacement_to_operator looks for safe-command handoff phrases and accepts them only when the same text records that the agent ran the command or reports an execution result record.
evaluate_fixture_dir is the proof-consumer harness over this mechanism. It loads the public JSON fixture cases, runs evaluate_text for each case, compares the observed status with expected_status, and reports aggregate case_count, passed_case_count, and status. The focused pytest file pins one red and one green path for each detector family and verifies that the CLI returns a JSON result record with organ_id: engine_room_egress_self_compliance_gate and status: pass for the fixture matrix.
Reader Evidence Routing
fixture CLI: inspect phrase-membership detector behavior over public fixture roots.
focused pytest: inspect the detector matrix and CLI result record contract.
paper-module coverage contract: verify that this slug explains its JSON bundle binding with an exact source ref and generated projection boundary.
doctrine projection check: corpus/parity evidence only; it is not accepted component admission.
non-proof boundary: passing result records show the staged fixture exercise is replayable and that the scope limit stayed intact. They do not prove semantic compliance, taint analysis, sandboxing, prompt-injection defense, information-flow control, launch-scope decision, JSON bundle authority, or accepted component admission.
Prior Art Grounding
The component borrows from policy-as-code and output-gate traditions: make a policy machine-readable, evaluate an artifact before it leaves a boundary, and return a specific failure class instead of relying on prose judgment alone. Relevant anchors include:
Open Policy Agent, a general policy engine that externalizes policy decisions from application code.
NIST SP 800-53 Rev. 5, especially the broader audit, accountability, and information-output control tradition.
Microcosm narrows that pattern to explicit phrase-membership checks over agent-output text. The gate is intentionally small: it catches declared egress-failure classes and binds them to durable repair expectations; it does not perform taint analysis, sandbox enforcement, prompt-injection defense, or general information-flow control.
Validation Result record Path
The reader-verifiable result record is the focused pytest plus the paper-module corpus parity check:
Passing these commands proves only that the public fixture behavior and JSON bundle projection remain reproducible; it does not admit an component, unblock the Atlas owner lane, or include launch operations.
The fixture matrix includes red and green cases for each detector. The source refs are system/lib/egress_compliance.py and the Stop-hook wiring anchor .claude/hooks/runtime_hook.py.
Scope limit: this is explicit phrase-membership policy, not taint analysis, prompt-injection defense, sandboxing, or information-flow control. It excludes launch or claim private-system equivalence.
Scope boundary
Scope limit
This module may claim that Microcosm has a staged public exercise for checking three declared Engine Room egress-output failure classes against replayable fixtures. The valid claim is bounded to phrase-membership policy over supplied agent-output text, public fixture result records, the focused pytest matrix, and the JSON bundle binding coverage contract.
The module must not claim accepted component resolution, launch-scope decision, private-system equivalence, semantic compliance, taint analysis, sandbox enforcement, prompt-injection defense, information-flow control, provider authority, source-file changes, or aggregate doctrine-lattice coverage. The current JSON bundle authority is mechanism-level and projection-bound.
Limitations
The mechanism is a phrase-membership gate. It can miss a real egress failure when the output avoids the configured tripwire phrases, and it can flag benign text when a tripwire phrase appears in a different context. The implementation does not parse intent, analyze data flow, prove sandbox isolation, inspect model-output data, or reason over hidden workspace state.
The fixture matrix is deliberately narrow. Passing fixtures show that the three declared detector families and their red/green examples still execute through the public CLI and focused tests; they do not show that every future agent output is safe, that the source hook behavior is equivalent, or that the public refactor covers all egress compliance policy.
The authority boundary is also narrow. The standard and JSON bundle make this a staged public bundle with mechanism-level authority. The module cannot promote itself into an accepted component, activate shared registry integration, authorize generated projection edits, or claim launch-scope decision. Any wider claim requires a separate owner lane with validator, result record, and registry evidence.
Source and projection details
Source-Open Body Floor
Readers should be able to inspect the public body without private-system access:
src/microcosm_core/engine_room/egress_self_compliance_gate.py defines the detector phrases, scope limit, fixture evaluation, and JSON CLI.
tests/test_engine_room_egress_self_compliance_gate.py exercises each red and green detector case and checks the module CLI result record.
fixtures/first_wave/engine_room_egress_self_compliance_gate/input carries the replayable fixture corpus.
core/fixture_manifests/engine_room_egress_self_compliance_gate.fixture_manifest.json binds the fixture set as an inspectable public artifact.
standards/std_microcosm_engine_room_egress_self_compliance_gate.json names the scope limit and the source-to-target relation.
The source refs in the standard are lineage anchors for the public refactor.
Governing Lattice Relation
This module sits in the Engine Room lattice as a staged egress-output gate. It is downstream of the source egress-compliance source refs and upstream of engine_room_demo, which treats it as one public bundle in the composed Engine Room demo. That dependency relation is evidence routing only: the demo can consume the bundle's fixture result record, but this module still remains mechanism-level unless a separate accepted-component lane promotes it.
The governing standard makes the authority boundary explicit: std_microcosm_engine_room_egress_self_compliance_gate declares the source refs, public target refs, required negative cases, validator command, and scope boundary. The JSON bundle binding supplies the source authority for this Markdown projection, while the paper-module coverage contract verifies that the Markdown reader surface names the bundle source ref and projection boundary. Following the repository axiom that JSON is contract and Markdown is projection, the Markdown can explain the mechanism but cannot widen the standard, mutate the bundle, promote the subject, or authorize public launch.
Engine Room Lean Proof Search Lab binds the staged proof-search bundle to a concrete mechanism. It runs tiny public Lean statements through symbolic tactic search, statement-only candidate scoring, problem-id ablation, forward oracle-body rejection, and axiom cleanliness checks, while keeping private source run state, oracle proof bodies, neural theorem proving, frontier-scale automation, launch, and private-system authority out of scope.
Scope limit Public tiny-fixture Lean proof-search evidence only; no neural theorem proving, frontier-scale automation, private source run export, oracle-body forward solving, launch-scope decision, private-system equivalence, source-file changes, or whole-system formal-result correctness.
This staged Engine Room bundle imports the source prover-lab contour into a runnable Lean fixture lab.
Purpose
The hard problem with any proof-search tool is not finding a proof. It is trusting that a reported success was earned rather than leaked or memorised. A search loop can quietly copy the answer out of an oracle field, learn to map a problem id to a stored tactic, or compile a file that secretly leans on sorry. Each of those produces a green result that means nothing. This lab exists to answer one question: when a tiny public theorem is reported solved, did the search actually close it, and does the installed Lean kernel agree on clean axioms?
The approach is to keep candidate generation cheap and untrusted, and to move all authority to the kernel. Candidate tactic bodies are generated from the shape of the statement alone (an Or p q -> Or q p goal draws an Or.inl / Or.inr case split, an equality draws rfl, and so on), then each candidate is written to a temporary .lean file and checked by a real lean subprocess. A result counts only when the process exits zero and a #print axioms audit reports the theorem depends on no axioms, with no sorry in the body. Generation proposes; Lean decides.
What is unusual is how much of the lab is built to refuse false credit rather than to score success. Three guards run alongside the search. A forward firewall walks every input row and rejects it outright if it carries a candidate_body, oracle_body, repair_body, oracle_needed_premise_ids, or provider_text field, so the answer can never be smuggled in as a hint. A problem-id ablation renames each theorem and its id, then checks that the policy picks the same action and reaches the same outcome, which catches a policy that has secretly learned the id instead of the goal. The axiom gate rejects sorry-tainted candidates even when Lean compiles the file. The lab passes only when every problem is genuinely closed, the firewall is clean, the ablation is stable, and no axiom taint is found.
Shape
Diagram source
flowchart LR A["Public theorem statement"] --> B["Forward-field firewall"] B --> C["Bounded symbolic tactic search"] C --> D["Lean subprocess check"] D --> E["Axiom cleanliness audit"] C --> F["Statement-only hammer table"] F --> G["Problem-id ablation"] E --> H["Fixture result record"] G --> H B --> I["Reject oracle or provider body leak"] E --> J["Reject sorry-tainted proof"]
The shape is a small public proof-search lab, not a prover product. It reads tiny Lean theorem statements, rejects forward oracle/provider fields, tries bounded symbolic tactic bodies, checks candidates with the installed Lean kernel, records statement-only action scores, runs a problem-id ablation, and rejects sorry-tainted outputs through a #print axioms gate.
What It Demonstrates
Tiny public theorem statements are solved by bounded symbolic candidate search and checked with the installed lean executable.
Statement-only hammer rows compile tactic candidates without crediting adapter candidates or oracle repair bodies.
The forward manifest rejects candidate_body, oracle, repair, and provider text fields before any solver result can count.
A problem-id ablation renames ids and theorem names, then verifies the blind policy keeps the same action signature and success behavior.
A #print axioms gate rejects sorry-tainted candidates even when Lean returns success for the file.
Prior Art Grounding
This component is grounded in the interactive-theorem-proving pattern where proof automation proposes small tactic scripts and the kernel remains the authority. Theorem Proving in Lean 4 is the direct precedent for tactic-structured proof construction, while the Lean/mathlib ecosystem shows why small checked theorem statements, reusable libraries, and tactic automation are treated as inspectable proof artifacts rather than prose claims.
The statement-only candidate search is also adjacent to "hammer" workflows such as Isabelle Sledgehammer: external or bounded search can suggest proof steps, but the trusted proof assistant must replay or check the result. Microcosm keeps the same separation at toy scale: candidate generation is public fixture behavior; Lean execution and the #print axioms audit decide what may count.
positive symbolic fixture: two tiny public Lean statements were solved by bounded symbolic tactic search and checked by Lean.
oracle-field failures: firewall evidence only. Forward candidate_body, oracle_body, provider_text, repair-body, and model-output data fields cannot enter the public solver path.
memorized-policy failure: an ablation guard. It shows problem-id conditioning is rejected when renaming changes the action signature, not that all memorization risks are impossible.
sorry fixture: an axiom-cleanliness gate. It proves the fixture rejects sorry taint even when Lean can compile a file, not that every future Lean import is globally axiom-free.
non-proof boundary: these result records do not prove frontier theorem proving, library-scale automation, online-RL search, private prover-run parity, launch-scope decision, accepted component admission, or Atlas launch-scope decision.
Validation Result record Path
The reader-verifiable result record is the focused pytest plus the paper-module corpus parity check:
Passing these commands proves only that the public fixture behavior and JSON bundle projection remain reproducible; it does not admit an component, unblock the Atlas owner lane, or include launch operations.
Scope boundary
Scope limit
This is a bounded symbolic prover lab over tiny public fixtures. It is not a neural theorem prover, not frontier-scale math automation, not online-RL bandit search, and not an export of private source prover run state. Easy goals are handled by deterministic tactic templates and Lean itself.
Source and projection details
Source-Open Body Floor
The source-open floor for this module is the staged Engine Room lab plus its fixture and test surfaces:
That floor lets a reader replay the public fixture matrix and inspect the firewall, symbolic candidate search, Lean check, ablation, and axiom audit. It does not expose private source prover run state, oracle repair bodies, model-output data, online-RL traces, or frontier-scale theorem-proving claims.
Engine Room Navigation Fitness BenchmarkPublic Engine Room component: route-packet benchmark evaluator for stable-id recall, precision, forbidden first routes, latency, and debt candidates.
Engine Room Navigation Fitness Benchmark binds the staged navigation-fitness bundle to a concrete mechanism. It evaluates public route-packet fixtures for expected stable-id recall, precision, forbidden first routes, latency budgets, scent terms, and debt candidates without running the private kernel, validating embeddings, claiming universal benchmark authority, or upgrading launch/private-system status.
Scope limit Public route-packet fixture evidence only; no live private kernel run, no embedding benchmark, no universal navigation benchmark, no launch-scope decision, no private-system equivalence, no source-file changes, and no whole-system correctness.
This staged Engine Room bundle imports the metric core of the source navigation-fitness harness into Microcosm as a runnable refactor.
Purpose
When an agent is dropped cold into a large repository, the failure that costs most is not a wrong answer. It is reaching for the wrong first command and landing on the wrong rows. This benchmark exists to make that failure measurable. It answers one question: given a cold task, did the route surface point at the stable ids the task actually needed, or did it send the agent somewhere plausible but wrong?
The unusual choice is that "correct" and "fast" are scored on separate axes. A route packet can name every expected stable id and still be recorded as latency debt because it ran over budget. A packet that comes back quickly can still fail for a missing id, weak information scent, a forbidden first route, a timeout, or an outright error. Most retrieval scores collapse those into one number; this evaluator keeps them apart so that a slow-but-correct route and a fast-but-wrong route are never confused, and each lands in its own repair queue.
The benchmark is deliberately evaluative rather than generative. It reads fixtures and pre-captured route packets, scores them, and emits result records. It does not call the live private kernel, validate embeddings, or claim to measure navigation quality on tasks it has never seen.
What It Demonstrates
Cold-task fixtures name expected stable ids and forbidden first routes.
Route packets are scored for recall, precision, forbidden-route hits, and scent terms.
Latency budgets are tracked separately from sufficiency, so a packet can be correct but still produce latency debt.
Benchmark summaries include p50/p95 wall time, route-type metrics, and debt candidates.
Shape
Diagram source
flowchart LR A["Cold navigation fixture expected ids, forbidden routes, latency budget, scent terms"] --> B["Route packet under test"] B --> S["Sufficiency axis"] B --> L["Latency axis wall time vs budget"] subgraph SufficiencyLadder["Sufficiency verdict (first failing check wins)"] S --> T["Timeout or error?"] T --> M["Missing expected id?"] M --> N["Weak scent term?"] N --> R["Forbidden first route?"] R --> P["Pass"] end S --> Rec["Per-case result record recall, precision, status, failure kind"] L --> Rec Rec --> Sum["Suite summary pass/fail counts, p50/p95 wall, route-type metrics"] Sum --> Debt["Debt candidates sufficiency debt + latency debt"]
The shape is intentionally evaluative, not generative. The benchmark reads public fixture tasks and captured route packets, scores stable-id coverage, forbidden first routes, scent terms, and latency, then emits a per-case and suite-level result record. It does not run the private source kernel, inspect browser or provider state, grade answer quality, mutate route registries, or authorize public sharing.
Technical Mechanism
The evaluator starts with a typed task record, not a free-form benchmark prompt. NavigationFitnessTask fixes the task id, family, prompt, route type, expected stable artifacts, forbidden first routes, latency budget, route role, and scent terms. task_from_mapping converts each fixture row into that record with conservative defaults, so missing fixture fields degrade to explicit public-fixture defaults rather than hidden private context.
evaluate_task is the core predicate. It extracts selected artifacts from the route packet through _packet_artifacts, including both flat selected_artifacts rows and structured selected_rows entries. Expected artifacts may use exact ids or prefix wildcards; _match_expected records found and missing ids, then recall and precision are computed over the bounded packet. The same predicate checks forbidden first routes against first_contact_command, searches packet text for required scent terms, and keeps latency status separate from sufficiency status. This separation is the central mechanism: a packet can route to the right stable ids and still carry latency debt, while a fast packet can still fail for missing ids, weak scent, forbidden first-route use, timeout, or route error.
evaluate_benchmark lifts the per-task predicate to a suite result record. It aggregates pass/fail counts, p50/p95 wall-clock observations, route-type metrics, and debt candidates. _debt_candidates deliberately emits only two classes: sufficiency debt, which points to the missing id, weak scent, error, timeout, or forbidden-route cause; and latency debt, which records wall time, budget, and latency status without marking the route semantically wrong.
evaluate_case and evaluate_fixture_dir are the proof-consumer bridge. A fixture file supplies an expected suite status, summary counts, and selected per-task status expectations. The harness reruns the evaluator, compares the observed result record to those expectations, and reports expectation_met for each case plus aggregate case_count, passed_case_count, and status. The CLI evaluate-fixtures --json exposes that result record without writing durable projection outputs.
Prior Art Grounding
The component borrows from information-retrieval evaluation and information-scent research: define expected targets, score returned routes against relevance criteria, penalize forbidden first moves, and keep latency separate from answer quality. Useful anchors include:
TREC as the benchmark tradition for retrieval runs, relevance judgments, precision, recall, and task-specific evaluation.
Pirolli and Card's information-foraging/information-scent work, represented by the 1999 Psychological Review article Information Foraging.
Microcosm applies those ideas to agent navigation packets rather than document search. It measures whether the route surface points at the expected stable ids, whether forbidden routes appear, whether useful scent terms are present, and whether latency budgets are respected. It is not a universal benchmark or a live-kernel proof.
Reader Evidence Routing
sufficiency_status: pass: the supplied route packet met this fixture's stable-id, forbidden-route, and scent requirements.
latency_status: fail: latency debt only. The benchmark keeps latency separate from sufficiency so a route can be correct but still too slow for the configured budget.
debt_candidate_count: a triage queue for route-surface improvement, not a routing-registry mutation or route deprecation command.
non-proof boundary: these result records do not prove live private kernel.py behavior, unseen-task navigation quality, embedding benchmark performance, launch-scope decision, accepted component admission, or Atlas launch-scope decision.
Named Proof Consumers
The narrow proof consumer is tests/test_engine_room_navigation_fitness_benchmark.py. It checks recall and precision over expected artifacts, forbidden first-route detection, latency debt as an axis separate from sufficiency, suite-level debt candidate counts, fixture matrix parity, and CLI JSON emission for the public fixture root.
The public fixture matrix carries one positive case and three boundary cases:
heldout_paraphrase_pass verifies two nonliteral cold tasks route to the expected stable ids, avoid banned first routes, satisfy scent terms, and stay under latency budgets.
adversarial_forbidden_route verifies that finding the right stable id still fails when the first command uses a forbidden bespoke route.
missing_stable_id_negative verifies that selecting a nearby route row does not satisfy the expected stable-id requirement.
latency_debt_negative verifies that a sufficient route packet can still produce latency debt without being reclassified as a semantic route failure.
Together those consumers prove the evaluator's accounting contract, not live navigation quality. They do not exercise private kernel.py, embeddings, browser state, provider state, generated projection repair, or launch-scope decision.
Passing these commands proves only that the public fixture behavior and JSON bundle projection remain reproducible; it does not admit an component, unblock the Atlas owner lane, or include launch operations.
Scope boundary
Scope limit
This is a curated route-packet benchmark evaluator over public fixtures. It is not a live private kernel.py run, not an embedding benchmark, not a universal navigation benchmark, and not launch-scope decision. Live-kernel claims require route packets captured from the real route runner.
Source and projection details
Source-Open Body Floor
The source-open floor for this module is the runnable Engine Room refactor plus its fixture and test surfaces:
That floor is enough for a reader to inspect the benchmark logic and replay the public fixtures. It is not enough to claim private-system parity, live route-runner coverage, accepted component admission, or launch-scope decision.
Cold Clone ProbeThe cold-clone probe validates the first public source-root bootstrap path: src import, secret-exclusion scan, first-wave pattern-binding fixture replay, public result record refs, and ignored local result record emission.
Cold Clone Probe is the public source-root bootstrap membrane. It binds bootstrap.sh, src/microcosm_core/cold_clone_probe.py, the first-wave pattern-binding fixture, the secret-exclusion scan, public relative result record refs, and focused tests so a fresh checkout has one bounded proof of first-run mechanics before install, CI, hosted launch, or full component inventory review.
Scope limit Public source-root bootstrap mechanics and metadata-only result record refs only; no launch-scope decision, hosted-product readiness, external model access, source-file changes, private-system equivalence, publishing-scope decision, or whole-system correctness.
cold_clone_probe is the source-root probe for a fresh public checkout. It answers one first-contact question: can this clone run the bounded bootstrap contract and write local ignored evidence before the reader installs the console command or opens the long component inventory?
Purpose
The probe exists to keep the public entry path concrete. A cold reader should be able to start at the repository root, run one script, and see three facts:
the package imports from src/ in the checkout;
the first-wave pattern-binding fixture can validate and mirror its public result record set;
the secret-exclusion scan stays in the result record boundary without exposing private bodies.
That is a bootstrap proof, not a launch proof. It is intentionally before make install, make smoke, make ci, or a standalone export review.
Prior Art Grounding
reproducible-builds.org, which frames reproducibility around recreating outputs from declared sources, instructions, and environment constraints.
The Twelve-Factor App, especially the dependency-declaration principle that avoids hidden reliance on ambient system packages.
GitHub Actions, as a common public workflow surface for clean-checkout build and smoke-test automation.
Microcosm borrows the clean-checkout, declared-dependency, and CI-smoke shape, but keeps this module to source-root bootstrap mechanics. It does not certify launch operations, hosted deployment, public sharing, external model access, secret export, Lean/Lake execution, or whole-system correctness.
Shape
Source refs
set PYTHONPATH=src, pick python
bootstrap.sh
MISSING_FIXTURE_INPUT
blocked_dependency_missing
COMMAND_UNAVAILABLE
blocked_command_unavailable
SECRET_EXCLUSION_SCAN_BLOCKED
blocked_secret_exclusion
MISSING_PATTERN_BINDING_RECEIPT
blocked_dependency_missing
Diagram source
flowchart TD A[Fresh public checkout] --> B["bootstrap.sh set PYTHONPATH=src, pick python"] B --> C["run_probe(root, suite, emit_ref)"] C --> D{Suite supported?} D -- no --> X1["blocked_invalid_input UNKNOWN_COLD_CLONE_SUITE"] D -- yes --> E{REQUIRED_INPUTS present?} E -- no --> X2["blocked_dependency_missing MISSING_FIXTURE_INPUT"] E -- yes --> F[Secret-exclusion scan] F -- scan unavailable --> X3["blocked_command_unavailable COMMAND_UNAVAILABLE"] F -- scan fails --> X4["blocked_secret_exclusion SECRET_EXCLUSION_SCAN_BLOCKED"] F -- scan passes --> G["Validate first-wave pattern-binding fixture"] G --> H["Mirror missing PATTERN_RECEIPTS into canonical slots"] H --> I{All five result records present?} I -- no --> X5["blocked_dependency_missing MISSING_PATTERN_BINDING_RECEIPT"] I -- yes --> P["status=pass emit ref + five result record refs, metadata-only scan summary"] P --> R[README map and component map]
The diagram is an audience aid only. Generated lattice Mermaid remains a builder projection over the bundle edge, not a hand-authored source claim.
Technical Mechanism
The mechanism has two layers: a shell membrane at bootstrap.sh and a Python result record predicate at src/microcosm_core/cold_clone_probe.py.
bootstrap.sh fixes the reader's starting position before any Python logic runs. It changes into the repository root, validates the requested suite, adds src to PYTHONPATH, chooses MICROCOSM_PYTHON, PYTHON, python3, or python, and then calls microcosm_core.cold_clone_probe with --suite and --emit. Its --dry-run branch prints the same command and the ignored result record target without writing evidence, so a reader can inspect the bootstrap boundary before running it.
run_probe() is the actual proof predicate. It creates a metadata-only base result record, rejects unknown suites before touching fixture or scanner state, checks all REQUIRED_INPUTS, runs validate_secret_exclusion_scan() before pattern-binding replay, then validates the first-wave pattern-binding fixture into .microcosm/cold_clone_probe/pattern_binding_contract. The mirroring step is intentionally narrow: _mirror_missing_pattern_receipts() only copies the declared PATTERN_RECEIPTS, treats unreadable destinations and sources as missing evidence through _path_exists() and _path_is_file(), and strict-reads the validation result record before adding the expected public result record refs.
The success state is correspondingly small. A passing result record records the emit ref first, followed by the five pattern-binding result record refs, and includes secret-exclusion status without copying private bodies. Non-pass states are typed as blocked_invalid_input, blocked_dependency_missing, blocked_secret_exclusion, or blocked_command_unavailable, which lets a reader distinguish malformed input, missing fixture evidence, private-boundary failure, and unavailable runtime dependencies.
This is the concrete implementation of mechanism.cold_clone_probe.validates_public_source_root_bootstrap and the bundle's concept.entry_and_reveal_route_readiness_bundle edge. The bundle's axiom and principle refs govern the boundary posture: JSON remains contract, Markdown/Mermaid/Atlas remain projections, source-open evidence must stay public-safe, and a local result record proves only the bounded first action from a fresh checkout.
Bundle Refresh Packet
current source authority: generated JSON should report paper_module_payload.source_authority: json_capsule after builder refresh.
refresh condition: rerun scripts/build_doctrine_projection.py --write-paper-module-corpus after bundle or mechanism changes, then verify Mermaid and Atlas remain generated projections with no launch or private-system authority.
scope limit: this Markdown and its generated projections describe reader evidence only; they do not source hosted deployment, public sharing, external model access, launch claims, source-file changes, private-system equivalence, or whole-system correctness.
Public Contract
Run ./bootstrap.sh from the public root. The probe validates source-form package importability and first-wave bootstrap mechanics while preserving the public/private boundary. Use ./bootstrap.sh --dry-run to inspect the exact command without writing the ignored result record.
The successful script output points back to README.md#public-repo-map and README.md#component-map. That is deliberate: the probe proves the first local source-root action, then hands the reader to the public map instead of asking them to trust a hidden setup step.
Reader Evidence Routing
Use the probe evidence by reader question, not by copying private local result records into public projections:
A safety/evals reader starts with the secret-exclusion and scope boundary fields. The useful question is whether the source-open claim excludes private bodies and account secret-equivalent live-access data.
A peer developer starts with bootstrap.sh --dry-run, then reads run_probe() and the focused tests. The useful question is whether the fixture, result record mirroring, and default ignored result record behavior are reproducible from source.
A hiring or review reader starts with the README map after the probe passes. The useful question is whether the first local action is bounded before any larger claim about launch, hosting, package distribution, or whole-system behavior.
Named Proof Consumers
tests/test_cold_clone_probe.py consumes the Python predicate directly. It verifies suite gating, default ignored result record selection, custom emit refs, fixture-missing behavior, secret-scan blocking before pattern replay, pattern-binding result record mirroring, unreadable path handling, duplicate-key rejection in mirrored validation result records, and metadata-only secret scan aliases.
tests/test_bootstrap_script.py consumes the root wrapper. It verifies no-side-effect help and version branches, argument and suite errors, dry-run command disclosure without writes, Python executable selection, custom emit writes, and the default .microcosm/cold_clone_probe.json path.
tests/test_public_entry_docs.py consumes the reader-facing documentation contract. It checks that README, AGENTS, SECURITY, CONTRIBUTING, QUICKSTART, and entry surfaces name the bootstrap path, ignored local result record boundary, public map handoff, and launch/host/non-public-state scope boundaries without reverting to stale tracked receipts/cold_clone_probe.json defaults.
Together these consumers prove the mechanism's accounting order and public reader language. They do not prove package distribution, CI completeness, hosted behavior, launch-scope decision, external model service, private-system equivalence, or sign-off of a cold_clone_probe component.
Validation Result record Path
Reader-verifiable commands, run from the microcosm-substrate/ public root:
The dry run prints the exact source-root command and the ignored result record target without writing evidence. The normal run writes .microcosm/cold_clone_probe.json; the focused pytest line verifies the probe, root script, and public-entry docs against the current checkout.
The focused tests cover default ignored result record behavior, custom local result record overrides, unknown-suite blocking, unreadable input handling, generated result record mirroring, public-entry doc invariants, and the rule that stale tracked receipts/cold_clone_probe.json paths do not become the default first-contact proof.
This result record path is reader-verifiable evidence only. It does not flip Mermaid/Atlas status, create bundle authority, install the package, certify launch operations, export secrets, use external model services, or aggregate doctrine-lattice coverage.
Scope boundary
Scope limit
This module proves only a bounded source-root bootstrap probe for a fresh public checkout: package importability from src/, secret-exclusion scan posture, first-wave pattern-binding fixture validation, and ignored local result record emission. This Markdown does not create or refresh those projections and does not promote them beyond generated-view status. It also does not create hosted deployment evidence, publishing-scope decision, provider authority, package-distribution proof, launch-scope decision, private-system equivalence, or whole-system correctness.
Scope boundary
This module documents clone/bootstrap mechanics only. It does not certify launch operations, hosted deployment, public sharing, recipient work, external model access, secret export, Lean/Lake execution, package distribution, deployment posture, private-system equivalence, or whole-system correctness.
First-Screen Composition RootThe first-screen composition root validates the public one-screen entry card, reader branches, omission result records, evidence accounting frame, text projection, README order, and scope limit without becoming launch or hosted-publishing-scope decision.
First-Screen Composition Root is the public entry-card contract for Microcosm. It binds the package card helper, CLI emitter, standard, README entry order, reader branch ids, doctrine-effect frame, omission result record, observatory landing refs, text projection, and focused tests so a cold reader sees what to inspect first without treating counts as maturity scores or route cards as launch, hosted-public sharing, provider, source-file changes, private-equivalence, or whole-system proof.
Scope limit Public first-screen card composition and focused validation only; no launch-scope decision, hosted-publishing-scope decision, external model access authority, source-file changes, private-data equivalence, score-based progress, reader-success certification, or whole-system correctness.
first_screen_composition_root is the contract for the one screen a cold reader should see before choosing a deeper Microcosm route.
Purpose
Microcosm already has the important deeper surfaces: route maps, workingness, scope limits, standards, result records, source-open body imports, and the localhost observatory. The first-screen problem is not lack of depth. It is that depth lands poorly when the first encounter is a long command inventory or a raw JSON payload.
The single question the composition root answers is: when a cold reader lands on this repository, what has to fit on one screen before they choose a deeper route, and in what order. It answers that with a fixed slot list rather than prose, so every projection of the first screen (terminal card, README, browser board, JSON, video) renders the same surfaces in the same order.
The composition root says what has to fit on one screen:
One shared terminal selector: microcosm hello <project>.
One shared behavior proof: microcosm tour --card <project>.
Six reader branch handles after that shared card: GitHub visitor, safety/evals engineer, hiring reviewer, peer developer, domain specialist, and Type A agent. The shared map and behavior proof always come first; a reader branch only changes which next inspection surface is shown, never the scope limit.
Evidence counts framed as accounting, not maturity or progress scores.
A runnable-to-structural join: the folder-local command is one visible exercise of a larger source-open system.
A doctrine-effect frame: concepts and mechanisms appear as public handles that prevent vague labels and feature prose before deeper standards are opened.
An omission result record: the card names the deeper route map, result records, standards, workingness, authority, and observatory drilldowns instead of copying them.
An scope limit that rejects launch, hosted public sharing, external model access, source-file changes, private-data equivalence, score-based progress, and whole-system correctness.
What is unusual here is that the card is checked against the standard that defines it. The emitter does not free-hand its output. It loads standards/std_microcosm_first_screen_composition_root.json, builds the card, and then scans the card back against the standard's required fields, validator id, reader-route parity, copyable per-reader commands, and denied-authority flags. The scan reports blocked rather than raising when a surface has drifted, so a renamed slot or a missing reader route is visible as a failing check instead of a silent regression. The check proves only that the card is internally consistent with its own contract. It does not certify that any reader will succeed, that the system is mature, or that anything is ready to launch.
flowchart TD A["First-screen standard standards/std_microcosm_first_screen_composition_root.json"] --> B["Compose card first_screen_composition.py: build slots, six reader routes, evidence frame"] B --> C["Scan card against standard _standard_backed_first_screen_scan + _validation_checks"] C -->|all checks true| D["status: pass"] C -->|drift, rename, missing route| E["status: blocked failing check named"] D --> F["Emit scripts/first_screen_composition_card.py --format json or text"] E --> F F --> G["Reader output JSON contract or terminal text card, one screen"] A -.binds.-> H["JSON source record + mechanism subject core/paper_module_capsules.json"]
This is the runtime shape behind the first screen: the standard is loaded, the card is composed from it, the card is scanned back against the standard, and the result is emitted as JSON or a terminal text card. A passing scan means the card is internally consistent with its contract; a blocked scan names the failing check. The diagram is not a public-sharing claims. It binds the mechanism subject and resolved code loci, while keeping accepted-component authority, launch-scope decision, hosted-publishing-scope decision, external model access, source-file changes, score-based progress, private-data equivalence, and whole-system correctness out of scope.
Reader Evidence Routing
A safety/evals reader starts with the focused first-screen text card for safety_evals_engineer, then checks the scope limit, evidence accounting frame, and public-entry doc tests. The useful question is whether the card keeps local behavior proof separate from launch, provider, and whole-system correctness claims.
A hiring or review reader starts with the JSON card and the Reader Branches table. The useful question is whether one rerunnable local command, one shared proof card, and one branch route are enough to inspect the system without mistaking counts for maturity scores.
A peer developer starts with scripts/first_screen_composition_card.py, then reads src/microcosm_core/first_screen_composition.py and the focused tests. The useful question is whether the command/card projection is reproducible from public inputs without reading private runtime state.
Generated entry-packet rows, browser boards, and site cards should point back to these public commands, tests, and omission result records. They do not become source authority for the paper module and must not imply publishing-scope decision or source-file changes.
Concrete One-Screen Artifact
The artifact is a terminal-sized card, not a second README. It should fit the following order without requiring a reader to scroll through the full command inventory:
Claim frame: "Microcosm turns a folder into local routes, work, events, evidence, and explanations." This names the composition root without claiming launch, hosted product status, or whole-system correctness.
Goal entry: microcosm comprehend --first-action "<your goal>" converts an arriving goal into one graph-backed first correct action (demonstrated in FIRST_ACTION.md) before the orientation ladder begins.
First step: microcosm hello <project> gives every reader the same shared entry command before audience branching.
Shared proof: microcosm tour --card <project> shows one local behavior proof that can be repeated from a clone.
Evidence legend: Count, evidence class, proof surface, and scope boundary prevent honest counters from becoming maturity scores.
Doctrine frame: Concepts and mechanisms as mistake-prevention handles let agents find std_microcosm_concept and std_microcosm_mechanism from entry instead of searching the standards tree.
Structural join: "This run exercises one public component inside a larger source-open system." This connects the local command to standards, result records, body imports, and observatory routes.
Reader rail: Safety/evals, hiring, and peer developer branch handles let each reader choose a next drilldown without changing authority.
Exit rule: Stop when the goal entry, first step, shared proof, evidence legend, and one branch next-step are understood. This keeps the card from expanding into the long-form route map.
Any first-screen renderer may change wording, but not the order of those slots. If a field needs more space than one screen, the renderer must replace the body with a result record, paper-module, standard, or observatory handle rather than expanding the card.
Evidence Accounting Frame
The first screen must explain honest counters before a reader sees them as scores. Counts such as source-open body materials, rows with source imports, verified source imports, external subprocess witnesses, and algorithmic projections are accounting fields. They answer "what kind of evidence is this and where can I inspect it," not "how mature is the whole system."
That distinction is reader-visible:
Small verified count: a narrow proof cell exists and carries higher authority. It does not mean the rest of the system is unimplemented.
Large source-open material count: public imported body material is inspectable by path and result record. It does not mean more bodies automatically produce stronger proof.
Algorithmic projection count: a generated surface is present and needs source-coupling context. It does not make generated rows source authority.
Rows with source imports: some components expose copied body material through validators. It does not mean every component has equal evidence depth.
Reader branches can choose different next evidence surfaces, but they inherit the same accounting frame. A safety/evals reader should ask for authority and failure-mode boundaries; a hiring reviewer should ask whether the counters are traceable rather than inflated; a peer developer should ask which command lets them reproduce the local evidence trail.
Discipline Comparison Frame
The first screen must show rigor by naming the collapse it prevents. A compact card is allowed to be small only because it keeps these separations visible:
A status badge says "works": local behavior proof stays separate from launch, proof, and correctness claims. The card names microcosm tour --card <project> plus the scope limit.
Evidence totals look like progress: evidence classes stay attached to scope boundaries and result record refs. A reader can move from count to class to proof surface without inferring maturity.
Governance reads like ceremony: each constraint is phrased as the mistake it blocks. The card says what would be overclaimed if the constraint were absent.
Breadth looks diffuse: the local run is framed as one exercised component inside the accepted runtime spine. The structural join points to spine, workingness, standards, result records, and observatory drilldowns.
This comparison frame is not marketing copy. It is the rule that keeps the first screen from sounding impressive while hiding the authority boundary that made the compression possible.
Observable Artifact Bridge
The first screen has two sibling projections: the terminal card and the compact browser board. They are the same artifact in different media, not two separate claims. The terminal card may be emitted by microcosm hello <project> or microcosm tour --card <project>; the browser board may be emitted by a first-screen or observatory compact endpoint. Both must show the same five slots before linking to deeper drilldowns:
Slot
Terminal card cue
Browser board cue
Open command
Exact command a reader can rerun.
Command label pinned above the board.
Local proof
Route/work/event/evidence chain summary.
The selected route and first causal edge.
Causal chain
Result record or validator refs.
Event/evidence refs before graph expansion.
Evidence legend
Evidence class plus scope boundary.
Legend beside counts, not hidden in hover text.
Scope limit
Forbidden reads named in text.
Boundary band visible before any motion.
The browser projection can make the first artifact more inspectable, but it cannot become a marketing page. A screenshot or video is first-screen material only when it preserves command, result record, evidence class, scope boundary, and scope limit in the frame.
Reader Branches
The shared first command comes before branching. Reader branches select the next inspection surface; they do not create audience-specific authority.
Safety/evals engineer: microcosm status --card <project> plus authority and workingness drilldowns; evidence focus is classes, ceilings, body-copy boundaries, scope boundaries, standards, and failure modes.
Hiring reviewer: legibility scorecard plus compact tour card; evidence focus is whether it is real, local, bounded, and honest about what is not proven.
Peer developer: compact tour card plus project observation drilldown; evidence focus is whether a clone can produce local .microcosm/ state and inspect the route/work/event/evidence chain.
Reader Selection Card
The machine-readable selector lives at atlas/entry_packet.json::reader_first_screen_routes.reader_selection_card. It is the public first-screen handoff between terminal prose and branch-specific drilldowns:
Those focused projections are allowed to hide the other two branches, but not the shared behavior proof, evidence-accounting frame, runnable-to-structural join, omission result record, or scope limit. The selector should therefore be read as a branch router, not a personalized success claim.
Public Card Emitter
scripts/first_screen_composition_card.py projects this contract into a public-root JSON card:
It can also emit the terminal-sized first screen directly:
python3 scripts/first_screen_composition_card.py --project-label <project> --format text
The text projection can focus one reader branch while preserving the same shared first command, evidence-count frame, omission result record, and scope limit:
python3 scripts/first_screen_composition_card.py --project-label <project> --format text --reader safety_evals_engineer
--reader all remains the default.
The emitter is intentionally narrow. It does not import private runtime state or source bodies. It loads this standard, emits the one shared command and three branch handles, frames evidence counts as accounting, names the runnable-to-structural join, and carries the standard's omission result record and scope limit.
Prior Art Grounding
The first-screen contract borrows from command-line usability practice rather than inventing a new onboarding genre. The Command Line Interface Guidelines emphasize concise default help, examples, discoverable next commands, clear exit behavior, and machine-readable output where appropriate. Those patterns show up here as the one shared command, terminal-sized card, rerunnable examples, and explicit reader branches.
The compression rule is also grounded in progressive disclosure: the first screen should reveal enough structure to orient a cold reader without dumping the whole system. Nielsen Norman Group's progressive disclosure pattern is the relevant UX precedent, while W3C PROV informs the insistence that evidence counts remain attached to provenance, result record refs, and authority boundaries instead of becoming freestanding success badges.
Validation Shape
The standard is intentionally a composition contract, not a runtime authority. When a runtime card consumes it, validation should check that the card has a single terminal selector, one shared behavior proof, the three reader route ids, the reader-selection card ref, evidence-accounting context, a runnable-to-structural join, discipline comparison frame, observable artifact bridge, concept and mechanism rows inside the doctrine-effect frame, omission result records, and the scope limit.
Validation Result record Path
Reader-verifiable emitter commands, run from the microcosm-substrate/ public root:
The emitter commands write no private state and print the public first-screen result record to stdout: the JSON card exposes the shared command, behavior proof, evidence accounting frame, omission result record, and scope limit; the focused text command proves a reader branch can be compressed without changing the claim frame. The focused tests verify the card schema, CLI first-screen output, README entry contract, local-state result record trail, overclaim tripwires, and reader route menu.
This result record path is reader-verifiable evidence only. It does not replace the cold-reader route map, certify launch-scope decision, change source files or project state, use external model services, or aggregate doctrine-lattice coverage.
Scope boundary
Scope limit
This module governs only the first-screen compression contract: one shared terminal selector, one shared behavior-proof card, reader branch handles, evidence accounting, omission result records, and an scope limit visible before deeper routes. It does not establish runtime correctness, replace the route map, certify public sharing or launch-scope decision, use external model services, change source files or project state, make counts into maturity scores, or aggregate doctrine-lattice coverage.
Scope limit
This module does not replace the cold-reader route map, standards-control lens, workingness map, public reveal walkthrough, or observatory. It only governs the compression boundary that lets those deeper surfaces land in the right order.
batch7_secondary_runtime_capsule imports a second Set-7 runtime slice into Microcosm. It exact-copies runtime view-model, lane-progress, graph-lens, graph-projection, cartography, stockgrid, and Polymarket source bodies into a public bundle, runs the bounded witness path, and exercises the Python market/numeric cores against synthetic public fixtures.
This module is the reader-facing instrument for the accepted batch7_secondary_runtime_capsule component. Its source authority is the JSON source record in core/paper_module_capsules.json; this Markdown explains what a cold reader may trust from the public secondary-runtime fixture and what remains out of scope.
The component exists to answer one question: do these copied frontend and market bodies still behave the way their original code claims to, when run in isolation over synthetic inputs? It copies eight slices into a bundle, then exercises each one against a small fixture and re-checks the exact behaviour the original author relied on. The interesting part is not that the code runs, but that each engine is paired with a planted regression. The component mutates a single token in the copied body, or feeds an adversarial input, and asserts that the behaviour breaks in the expected way. A check that only passes on good input proves little; a check that also fails on the right bad input is evidence the behaviour is real.
Several of these guards encode a concrete bug that was found in production. The Polymarket order-book reader documents a probe from 2026-05-12: the API can return bids floor-first and asks ceiling-first, so a naive bids[0] / asks[0] reader silently inverts best-bid and best-ask. The body derives best prices by numeric extrema instead, and the polymarket_sorted_book_trap case feeds a deliberately mis-sorted book to confirm the extrema rule still holds. The stockgrid momentum primitive refuses an impossible -100% daily change rather than returning a misleading number. The graph projection drops self-edges so a collapsed cluster does not draw an arrow to itself. The scope stays narrow on purpose: this is local body import and synthetic-fixture witness evidence, not live market access, wallet authority, browser export, or investment-related actions.
Shape
Source refs
Vitest witness
world/graph/cartography tests
Diagram source
flowchart TD bundle["Exported bundle copied bodies + source digest anchors"] witness["Vitest witness world/graph/cartography tests"] subgraph Engines["Eight fixture engines"] ui["Trace view-model and lane progress"] graph["Graph lens and graph projection"] carto["Cartography observe-only render"] market["Stockgrid + Polymarket CLOB and four-lens scoring"] end subgraph Negatives["Planted regressions"] invert["Mis-sorted book must still find extrema"] momentum["-100% change must be refused"] selfedge["Self-edge must be dropped"] resolved["Resolved market must gate NEWSBREAKER"] end result records["metadata-only result records status, digests, anchor checks"] ceiling["scope limit"] bundle --> witness witness --> ui bundle --> graph bundle --> carto bundle --> market ui --> Negatives graph --> Negatives carto --> Negatives market --> Negatives Negatives --> result records result records --> ceiling
Reader Evidence Routing
Start from the component source when checking behavior:
EXPECTED_ENGINES names the eight fixture engines for trace view-models, lane progress, graph lenses, graph projection, cartography, stockgrid, CLOB microstructure, and Polymarket scoring.
EXPECTED_NEGATIVE_CASES names the planted regressions for raw-authority omission, unknown lane state, hidden descendants, self edges, observe-only cartography, extreme stock momentum, sorted-book traps, and resolved-market gating.
AUTHORITY_CEILING keeps launch, public sharing, provider/model dispatch, browser or wallet access, source-file changes, investment-related actions, semantic-truth authority, and test-completeness proof false.
run, run_batch7_secondary_bundle, and result_card expose the reproducible command and metadata-only summary.
What the engines check
Each engine reads a copied body and asserts a specific, checkable behaviour. The four with the clearest stakes:
Polymarket CLOB microstructure.compute_best_prices derives the best bid as the maximum bid price and the best ask as the minimum ask price, never from the first row of each side. This guards a real failure documented in the source: the API can return bids floor-first and asks ceiling-first, which inverts a naive bids[0] / asks[0] reader. The polymarket_sorted_book_trap case feeds a mis-sorted book and confirms the chosen best bid (0.42) and ask (0.53) are not the first entries, then checks the spread and that depth imbalance stays in [-1, 1].
Stockgrid momentum._daily_log_momentum_bps converts a percentage change into a daily log-return in basis points, but returns nothing when the ratio is at or below -0.999999. A claimed -100% daily change has no finite log return, so the primitive refuses it rather than emitting a misleading value. The stockgrid_extreme_momentum case asserts that refusal.
Graph projection.projectGraphForRender groups nodes into per-lane, per-wave summary clusters and rewrites edges between clusters. It drops any edge whose source and target land in the same cluster, so a collapsed cluster never draws an arrow to itself. The graph_projection_self_edge case removes the sourceId === targetId guard from the copied body and confirms the self-edge would otherwise survive.
Polymarket four-lens scoring.calculate_lenses zeroes the NEWSBREAKER lens for any market that is resolved, low-volume, low-uncertainty, or an outlier in velocity. The fixture scores one open and one resolved synthetic market and asserts the resolved one scores zero on NEWSBREAKER while the open one does not.
The remaining engines cover the trace view-model trust taxonomy (seven labels including missing and fallback, with an explicit "raw provider JSONL is unavailable" path), lane-progress state normalisation (an unknown state falls back to idle, not an invented status), the graph lens (collapsing a parent keeps the parent visible but hides its descendants), and the cartography render (a fixed set of mutating actions stays blocked, so the surface observes without creating or editing). Each negative case is run by mutating one token in the copied body or supplying an adversarial input, then checking the engine reports blocked. The result records record status, digests, and anchor matches only; copied bodies and command output are never inlined.
Prior Art Grounding
The component borrows from MVVM/read-model UI architecture, graph visualization, and market-data board patterns: view models shape raw state for views, graph projections make relationships inspectable, and market rows must preserve provider identity and missingness. Useful anchors include:
Microsoft's MVVM guidance, where view models encapsulate presentation state while separating UI from underlying model logic.
D3 force layouts as a common graph visualization family for networks and hierarchies.
Microcosm borrows the view-model, graph-projection, and market-diagnostic shapes, but runs them only over synthetic runtime packets and synthetic market rows. It is not browser/session export, live market data, trading decisions, or proof that frontend projections are complete.
Validation Result record Path
Reader-verifiable fixture command, run from microcosm-substrate/:
Focused test result record, run from the repository root:
The fixture run writes receipts/first_wave/batch7_secondary_runtime_capsule/batch7_secondary_runtime_capsule_result.json, receipts/first_wave/batch7_secondary_runtime_capsule/batch7_secondary_runtime_capsule_validation_receipt.json, and receipts/first_wave/batch7_secondary_runtime_capsule/batch7_secondary_runtime_capsule_board.json; the sign-off file records fixture sign-off. The exported-bundle re-run uses the run-batch7-secondary-bundle action over exported_batch7_secondary_runtime_capsule_bundle.
This result record path is public fixture evidence only. It does not export browser or account sessions, fetch live market data, provide investment-related actions, complete UI/ranking coverage, include launch operations or public sharing, or aggregate doctrine-lattice coverage.
Scope boundary
Scope limit
This bundle can claim fixture-bound public source-body import evidence and secondary runtime/market witness result records. It cannot authorize browser/session export, wallet authority, live market data, investment-related actions, external model access, source-file changes, launch, public sharing, private-system equivalence, semantic truth, complete UI/ranking coverage, or whole-system correctness.
Microcosm Axiom SystemThe public Microcosm axiom system routes readers from axiom doctrine to the read-only support-cover evaluator, routing registry, standard, tests, and result records without claiming proof, launch, or source-file changes.
Microcosm Axiom System is the public doctrine/routing boundary for axiom support. Its bundle binds the authored paper-module projection to the axiom support-cover mechanism and validator locus, with source refs to AXIOMS.md, PRINCIPLES.md, ANTI_PRINCIPLES.md, core/axiom_organ_routing.json, standards/std_microcosm_axiom.json, and focused tests. The validator computes support cases, anti-axiom rejection mappings, candidate pressure, and strong-gate pressure as a read-only projection.
Scope limit Public doctrine/routing/standard/result record evidence and read-only evaluator result records only; axiom witness-route source authority remains core/axiom_organ_routing.json, and claim_ceiling/strongest_allowed_claim remain computed by validator.microcosm.axiom_support_cover, not hand-stamped by this bundle or generated projections. No axiom proof certification, no candidate-law promotion, no source-file changes, no provider/Lean/Lake execution, no launch/publishing-scope decision, no private-system equivalence, and no whole-system correctness.
The Microcosm axiom system routes a reader from axiom doctrine to a read-only evaluator that measures how much on-disk evidence actually backs each axiom, and reports where that evidence runs out.
Purpose
A doctrine file can assert that a system holds twelve axioms. The harder question is whether anything checkable stands behind each one. This module answers a single question: for each piloted axiom, how strong is the support that already exists on disk, and exactly where does it stop short of proof?
The unusual part is what the evaluator refuses to do. The routing source hand-stamps eleven of the twelve axioms as strong and labels each with the anti-axiom it is meant to reject, for example AX-1 against asserted_label_is_truth or AX-5 against silent_default_pass. The evaluator reads those labels and declines to ratify them. It recomputes support from the witness components, checker surfaces, result records, and negative cases each obligation actually cites, and for every axiom it returns a strongest_allowed_claim that is capped below strong. No node ever reports strong_certified: true. The honesty contract is turned on the evaluator itself: because the evaluator is also a Microcosm artifact, AX-12 (meta_artifact_exemption) forbids it from treating its own generated output as evidence for itself.
The reason support cannot reach strong is made explicit rather than hidden. The routing defines strong as computing the property and rejecting the named anti-axiom with a negative case. Today the evaluator can confirm that a witness component's result record records a complete negative-case suite, but it will not map that endpoint coverage onto a particular obligation's anti-axiom slice. That mapping stays mapping_verified: false unless a source-owned row declares an exact or subsuming rejection. The gap is reported as pressure to do more witness work, never closed by relabelling.
Teleology
Microcosm axioms compress the public system's recurring formal commitments into checkable clauses. The goal is not philosophical decoration; the goal is a routing layer where each axiom expands into principles, components, result records, negative cases, witness surfaces, and layer debt.
Governing Standard
This paper module is governed by executable_doctrine_grammar and the Microcosm axiom/principle surfaces:
validator.microcosm.axiom_support_cover is the read-only executable surface for the piloted axioms in core/axiom_organ_routing.json, currently AX-1 through AX-12. It compiles support_cases, support_frontiers, principle_support_index, anti_axiom_rejection_mappings, and strong_gate_summary from source routing, standard grammar, result records, witness surfaces, and evidence-class registries. Its output is a projection below source authority: it may expose pressure and bounded overlap, but it does not mutate axioms, certify strong, or include launch operations.
flowchart TD Routing["Routing rows core/axiom_organ_routing.json axiom, anti-axiom, obligations, hand-stamped witness_strength"] Bind["Per-obligation binding witness components, checker surfaces, result records, negative-case codes"] Resolve{"Does the binding resolve on disk?"} Capped["Capped support blocked or layer_debt"] Ceiling["Eight-component ceiling vector evidence_class, checker_scope, provenance, freshness, domain, negative_case, authority, projection"] Reject["Anti-axiom rejection mapping tier + mapping_relation; mapping_verified stays false without a source-owned row"] Meet["Bilattice meet support status AND rejection status"] Claim["Node scope limit strongest_allowed_claim strong_certified: false"] Pressure["Candidate-axiom pressure witness debt, rejection-mapping debt, sharpen the over-stamped row"] Routing --> Bind --> Resolve Resolve -- "no" --> Capped --> Meet Resolve -- "yes" --> Ceiling --> Meet Bind --> Reject --> Meet Meet --> Claim Claim --> Pressure
The shape makes the axiom system inspectable without converting pressure into proof. Doctrine refs, routing JSON, the axiom standard, validator output, and focused tests can show support/frontier structure; the bundle makes that route walkable through a mechanism and code locus, but it cannot certify a strong gate, promote candidate law, or include launch operations.
How support is computed
Each routing row names an axiom, the anti-axiom it should reject, and a list of obligations. Each obligation carries a binding: the witness components, checker surfaces, result records, and negative-case codes that are meant to back it. The evaluator first checks that every cited piece of material actually resolves. A witness component must exist in core/organ_registry.json, a witness surface must exist on disk or match a glob, and a negative-case code must be declared on the row. Any unresolved reference caps the obligation at blocked_binding_unresolved rather than letting it pass silently. An obligation the routing marks as layer_debt is recorded as partial witness debt that caps the axiom without weakening it.
For obligations whose bindings resolve, the evaluator computes a ceiling vector across eight named components: evidence_class, checker_scope, provenance_class, freshness_state, domain_scope, negative_case_status, authority_scope, and projection_scope. Each component is read off the bound material, not asserted. evidence_class, for instance, is the strongest evidence rank among the witness components and never exceeds it. freshness_state is pinned to its honest floor, unknown_live_freshness_no_refresh_contract, because a deterministic basis digest proves reproducibility but not that the inputs are current. Two components, authority_scope and projection_scope, carry an explicit non-laundering note: read-only validator output and generated projections cannot be read back as source evidence.
The axiom-level verdict is the meet of two separate judgements. One is positive support, folded from the obligations' resolution states. The other is the anti-axiom rejection mapping, tracked apart from support so that component-level negative-case coverage can never quietly stand in for a per-obligation rejection. Because rejection mapping is almost never verified, the meet caps every axiom below strong, and the result record records the precise blocking reason. Where an obligation is unresolved, in layer debt, or has an unverified rejection mapping, the evaluator emits candidate-axiom pressure: witness debt, rejection-mapping debt, or a sharpen signal against a row whose hand-stamped strong it cannot certify. The forbidden move is always the same, recorded on the pressure itself: do not lower the axiom bar to make coverage look green.
Support for principles is derived, not measured directly. The evaluator parses each principle's Obligation grounding: line in PRINCIPLES.md and inherits support from the grounding obligations, so a principle is never stronger than the obligations it rests on. As a separate guard it flags any binding that tries to cite a principle as a witness, since a principle leaning on an axiom cannot also serve as that axiom's evidence.
Anti-Axiom Rejection Mapping
std_microcosm_axiom.json::axiom_payload_contract.anti_axiom_rejection_contract separates positive support from rejection of a named anti-axiom. A first-wave component result record with complete negative-case coverage is admissible evidence material, not a per-obligation rejection by itself. The evaluator therefore maps result record-observed negative families to each obligation slice with a mapping_relation such as unmapped, illustrative_only, or partial_overlap, while keeping mapping_verified: false unless a source-owned mapping row declares exact or subsuming rejection.
The current AX-8 mapping is intentionally non-uniform: O1 remains unmapped because endpoint/component result record coverage does not establish general source->transform->sink propagation; O2 is only partial_overlap against sink-policy evidence; O3 is only illustrative_only until endpoint-label assertion rejection is declared against that obligation. This is the no-laundering floor: organ_receipt_coverage_present can never be promoted directly into exact_obligation_rejection.
Those AX-8 relations live as source-owned, non-certifying rows in core/axiom_organ_routing.json::rows[AX-8].anti_axiom_rejection_mappings[]. The evaluator consumes those rows before any legacy inferred fallback and still recomputes result record material from disk. The rows close hidden-code-schema drift; they do not close rejection, remove layer debt, or upgrade any obligation to strong.
Reader Evidence Routing
A doctrine reader starts with AXIOMS.md, PRINCIPLES.md, ANTI_PRINCIPLES.md, and core/axiom_organ_routing.json. The useful question is which source rows claim support, frontier pressure, and anti-axiom mapping status, not whether this page proves the axioms.
A validator reader runs microcosm_core.validators.axiom_support_cover and opens tests/test_axiom_organ_routing.py plus tests/test_axiom_support_cover.py. The useful question is whether support cases, support frontiers, negative-case evidence, and AX-8 rejection mapping readback are computed from public source.
A launch-boundary reader starts with the Scope limit and scope boundary text before reading generated docs or site cards. The useful question is whether candidate law, Lean proof, strong-gate certification, launch-scope decision, and aggregate lattice coverage stay outside this Markdown.
Generated projections may summarize axiom evidence only by source refs, mapping statuses, booleans, summaries, and result record paths.
Proof-Consumer Readback
This module's proof-consumer value is a narrow evidence-accounting readback: named validators and tests consume public source refs, fixture-visible routing rows, counts, verdict fields, and scope boundaries so a reader can see where axiom support is computed and where it is capped. These consumers do not expand the scope limit, certify strong, promote candidate law, prove axioms in Lean, include launch operations, or change source records.
validator.microcosm.axiom_support_cover consumes core/axiom_organ_routing.json, core/organ_evidence_classes.json, core/organ_registry.json, and standards/std_microcosm_axiom.json, then emits support_cases, support_frontiers, anti_axiom_rejection_mappings, strong_gate_summary, truth_calculus_summary, principle_support_index, and candidate_axiom_pressure. Its own authority_posture remains read_only_evaluator_projection_not_source_of_record.
tests/test_axiom_organ_routing.py consumes the routing registry plus AXIOMS.md, PRINCIPLES.md, ANTI_PRINCIPLES.md, core/organ_registry.json, and public source text to check that routed axioms, principles, anti-principles, witness refs, negative-case codes, and source-owned anti-axiom rejection mappings resolve without laundering component result record coverage into exact obligation rejection.
tests/test_axiom_support_cover.py consumes the evaluator output and checks the proof-consumer floor: AX-1's hand-stamped strong label is not echoed, AX-8 stays capped by layer debt, principles inherit bounded support without becoming witnesses, candidate pressure routes to sharpening or witness debt, and rejection-mapping debt routes to result record-level per-obligation evidence.
src/microcosm_core/doctrine_lattice.py::build_axiom_instance_from_routing_row consumes routing rows into public axiom JSON instances with a microcosm_axiom_substrate_reciprocity_v1 contract. That readback names how law constrains system and how system can refine support-frontier evidence, while explicitly keeping witness components and negative cases as support-calculation inputs rather than support claims.
tests/test_doctrine_lattice_runtime.py consumes generated axiom instances only to verify that reciprocity contract and routing-derived fields survive projection. It does not treat generated graph, health, Markdown, or Atlas output as source evidence.
tests/test_microcosm_paper_module_coverage_contract.py consumes this Markdown as a bundle-backed reader lane by requiring the source bundle, axiom-support validator locus, generated projection boundary, and scope limit to stay in sync with generated paper-module coverage.
The readback condition is intentionally modest: if those consumers still derive bounded support, negative-case and rejection-mapping pressure, source-ref resolution, and scope limits from public source, this page remains useful reader evidence. If a consumer starts treating generated projection output as source evidence, or treats coverage pressure as formal-result correctness, the correct repair is in the source owner or validator lane, not in stronger prose here.
Prior Art Grounding
The axiom system draws from two older patterns: formal assumptions should be inspectable, and machine-readable schemas should make support claims testable. Lean's proof environment gives the immediate formal-methods analogue through its axiom-audit practice: a theorem can be checked, then separately inspected for assumptions through commands such as #print axioms. Microcosm adapts that spirit to doctrine by making each axiom expand into witness surfaces, negative cases, routing rows, and support-frontier status instead of treating the axiom prose as self-certifying.
The JSON-controlled side of the module is grounded in JSON Schema, which frames schemas as a way to define validation rules, document shared structure, and improve interoperability. The provenance side is adjacent to W3C PROV: support rows, witness refs, and anti-axiom mappings are evidence links with bounded meaning, bounded evidence of whole-system completeness.
Validation Result record Path
Reader-verifiable evaluator command, run from the microcosm-substrate/ public root:
Focused test result record, run from the repository root:
The evaluator command writes a read-only support-cover result record that reports support cases, support frontiers, anti-axiom rejection mappings, principle support inheritance, and strong-gate pressure without mutating law. The focused tests verify routing-schema parity, witness refs, negative-case evidence, AX-8 rejection mapping readback, and the rule that result record coverage is not laundered into exact obligation rejection.
This result record path is reader-verifiable evidence only. It does not establish axioms in Lean, certify a strong gate, promote candidate law, include launch operations, or change source records.
Scope boundary
Scope limit
This module proves only that the axiom-support boundary is inspectable through public doctrine refs, routing rows, evidence-class vocabulary, validator loci, support-frontier result records, and anti-axiom mapping statuses. A diagram view and navigation-atlas card for this module are not yet generated because the module has not been admitted through the standard subject-resolution path; it does not create bundle authority, promote candidate law, prove axioms in Lean, certify the support-cover strong gate, close rejection obligations, authorize public sharing or launch, change source records, or stand alone as a complete coverage claim.
Scope limit
This module is a reader instrument for a staged paper-module boundary. It does not generate diagram views or navigation-atlas cards, create or amend core/paper_module_capsules.json, certify the support-cover strong gate, promote candidate law, prove axioms in Lean, authorize public sharing or launch, change source records, or stand alone as a complete coverage claim. Those effects require source-owner rows, builder regeneration, and their own validation result records.
Scope boundary
This module does not establish the axioms in Lean, does not claim whole-system completeness, does not claim all paper modules were semantically exhausted by the first write, and does not grant launch-scope decision. It is a routing and derivation surface whose claims are bounded by the witness strengths recorded in core/axiom_organ_routing.json.
Source and projection details
Source Authority Re-entry Guard
The closure sequence for this module is source-first:
the mechanism registry admits mechanism.microcosm_axiom_substrate.validates_public_axiom_support_boundary as a read-only validator mechanism;
the paper-module bundle names that mechanism as its resolving subject, names the resolved validator code_loci, and retains existing axiom/principle/concept refs plus the same anti-scope limit;
readiness evidence must prove required_subject_gap_ids no longer includes this module and that Mermaid moved to available_from_capsule_edges because the source row exists.
This Markdown remains a reader route over axiom support evidence, source-linked only for claims beyond the bundle edge.