Microcosm
Area · 18 components

Formal math & proof

Inspectable pieces of a proof pipeline: premise retrieval over a copied Lean Std index, tactic routing, verifier-trace repair, and claim-separation result records. Three components run the real Lean/Lake prover locally on bounded examples; the rest publish the pipeline's checking layers as contracts you can open.

Components

Proof Diagnostic Evidence Spine · Sorts proof-pipeline checks into accepted or rejected without inflating a pass. · 3/5

Does An evidence checkpoint that sits in front of formal-proof work. It reads the diagnostic records left by earlier proof-pipeline steps and writes a "diagnostic board" listing which checks were accepted, which were rejected, and why. The board shows exactly what evidence was kept, and refuses to let raw model output, a stale record, or a merely-passing check get inflated into a claim that the math is actually correct. It only arranges and judges existing records; it never runs a proof checker itself.

Scope limit It records proof/evidence diagnostics over existing result record references only. It does not run Lean, use external model services, expose proof bodies, turn a passing check into formal-proof or theorem authority, prove runtime or whole-system correctness, authorize later components, certify public launch, authorize public sharing or recipient work, or establish secret export.

Run
microcosm proof-diagnostic-evidence-spine run --input fixtures/first_wave/proof_diagnostic_evidence_spine/input --out receipts/first_wave/proof_diagnostic_evidence_spine --card

Paper module Proof Diagnostic Evidence Spine

proof_diagnostic_evidence_spine sits one step before formal proof authority. It holds diagnostic evidence from the formal-math evaluation and premise-retrieval pipeline as result record-backed cells, and refuses to let any of them be read as a proof.

Purpose

The component answers a single question: does a diagnostic check that claims to be backed by real Ring2 runtime evidence actually recompute against that evidence, or is it asserting more than its refs support? Without this membrane, a check row could name a failure-taxonomy report or a graph-update candidate set, declare itself passing, and be trusted on its own word. The spine refuses that.

What is unusual is that the validator does not trust the fixture's own pass label. It ignores the legacy expected_result field as a non-authoritative fixture label and rederives the verdict itself. For each check it resolves the named source_ref to a real file, re-hashes that file with sha256, and confirms the hash matches the expected digest. It then opens the named result record anchor and checks that the result record payload actually contains that source ref and digest. A check is accepted only when the source, the digest, and the result record all agree. The pass is a recomputation, not a claim copied from the fixture.

The second idea is that negative evidence is kept rather than hidden. A stale source fingerprint is recorded as source_fingerprint_status: stale and retained as diagnostic evidence; a provider advisory row is preserved as metadata while being rejected as authority; a forbidden proof-body field turns a row into a regression fixture rather than silently dropping it. The board shows what did not hold, which is the point of an evidence membrane.

Teleology

proof_diagnostic_evidence_spine is the body-safe evidence membrane before formal proof work. It records proof/evidence diagnostics while rejecting proof bodies, provider output bodies, source-authority upgrades, stale coupling, and runtime-correctness overclaims.

Public Contract

The validator consumes failure-taxonomy records, graph-update traces, verifier-trace repair artifacts, and formal evidence-cell anchor result record refs from the formal-math evaluation and premise-retrieval pipeline, then emits diagnostic result records over those refs. Provider-advisory rows are kept as metadata and are not authority. Passing diagnostic checks do not become formal proof authority or formal-result correctness.

How a check is accepted

A check row carries three lists: source_refs, receipt_anchor_refs, and source_digest_refs. The validator does not take the row's word for whether it passes. It recomputes the verdict from the files and result records those refs name.

For each source_ref it resolves a real file, reads it, and hashes the bytes with sha256. That hash must equal the expected digest the component holds for the ref. It then opens each result record anchor and checks that the result record payload actually contains the source ref and its digest, so a check is only "result record-backed" if the result record it cites genuinely references it. On top of that the component applies a semantic floor: a check whose id mentions a failure taxonomy must point at a source file that carries a failure-taxonomy report with representative failures and at a result record that carries a failure-mode ledger; a graph-update check needs graph-update candidates with ids and a matching result record anchor. The check is accepted only when every source resolves, every digest matches, every cited result record backs the ref, the semantic floor is satisfied, and no expected-negative error code is declared.
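The recompute described above can be sketched in a few lines of Python. This is a minimal illustration and not the component's code: the field names source_refs and receipt_anchor_refs come from the text, while the function names, the finding codes, and the plain substring test against the result record payload are assumptions made for the example. The semantic floor is left out.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file bytes, as the text describes for each source_ref."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def recompute_check(root: Path, check: dict, expected_digests: dict) -> list[str]:
    """Return rejection findings; an empty list means the check is accepted.

    The finding codes and the result-record layout are hypothetical.
    """
    findings = []
    for ref in check["source_refs"]:
        source = root / ref
        if not source.is_file():
            findings.append(f"SOURCE_MISSING:{ref}")
            continue
        digest = sha256_of(source)
        if digest != expected_digests.get(ref):
            findings.append(f"DIGEST_MISMATCH:{ref}")
            continue
        for anchor in check["receipt_anchor_refs"]:
            anchor_path = root / anchor
            payload = anchor_path.read_text() if anchor_path.is_file() else ""
            # The cited result record must mention both the ref and its digest.
            if ref not in payload or digest not in payload:
                findings.append(f"RECEIPT_DOES_NOT_BACK:{ref}")
    return findings
```

Note that the sketch never reads a pass label from the row. The verdict is whatever the hashes and result record payloads support.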

The concrete failure mode this guards against is a plausible-looking row that names real artifact paths but does not actually recompute: a digest that has drifted, a result record that does not mention the ref it claims, or a check labelled as failure-taxonomy evidence while pointing at an unrelated file. Each of those becomes a rejection finding rather than a silent pass. The recompute is also why a passing check stays bounded. It establishes that the named evidence is present and coupled, not that the underlying runtime is correct, which is why a row that adds claims_runtime_correctness is rejected as an overclaim.
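A rough sketch of how overclaims and forbidden fields could be turned into retained rejections follows. The keys claims_runtime_correctness and source_fingerprint_status appear in the text. The check_id key, the forbidden-field set, and the reason strings are hypothetical names chosen for the example.

```python
# Hypothetical set of body fields that must never ride along on a row.
FORBIDDEN_BODY_FIELDS = {"proof_body", "provider_output_body"}


def classify_row(row: dict) -> dict:
    """Sort one diagnostic row into accepted or rejected, keeping the reasons."""
    reasons = []
    if row.get("claims_runtime_correctness"):
        reasons.append("runtime_correctness_overclaim")
    if FORBIDDEN_BODY_FIELDS.intersection(row):
        reasons.append("forbidden_body_field")
    if row.get("source_fingerprint_status") == "stale":
        reasons.append("stale_source_coupling")
    # Negative evidence is kept on the board with its reasons, not dropped,
    # and no body text is copied into the output row.
    return {
        "check_id": row["check_id"],
        "verdict": "rejected" if reasons else "accepted",
        "reasons": reasons,
        "body_in_receipt": False,
    }
```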

Shape

[Diagram: a diagnostic check row (source_refs, receipt_anchor_refs, source_digest_refs) passes through source resolution, sha256 re-hash, the result record anchor check, and the semantic floor. If all agree it becomes an accepted check; on any mismatch it is rejected or retained as diagnostic evidence. Both outcomes are written to diagnostic_board.json.]

Source refs

evidence accounting only
diagnostic_board.json
Diagram source
flowchart TD
  Check["Diagnostic check row source_refs, receipt_anchor_refs, source_digest_refs"]
  Resolve["Resolve source ref to real public file"]
  Hash["Re-hash file (sha256) compare to expected digest"]
  ResultRecord["Open result record anchor does payload contain this ref and digest?"]
  Floor["Semantic floor failure-taxonomy / graph-update source and result record match"]
  Accept["Accepted check verdict = recomputed, body_in_receipt false"]
  Reject["Rejected / retained as diagnostic evidence"]
  Stale["Stale source fingerprint"]
  Provider["Provider advisory payload"]
  Proofbody["Forbidden proof-body field"]
  Check --> Resolve --> Hash --> ResultRecord --> Floor
  Floor -->|all agree| Accept
  Floor -->|any mismatch| Reject
  Stale -. retained as evidence .-> Reject
  Provider -. metadata kept, authority denied .-> Reject
  Proofbody -. scrubbed, kept as regression .-> Reject
  Accept --> Board["diagnostic_board.json evidence accounting only"]
  Reject --> Board
  Board -. denies .-> Ceiling["no Lean/Lake run, no formal-result correctness, no provider authority, no launch"]

Evidence/accounting refs:

  • Bundle authority: core/paper_module_capsules.json::paper_modules[14] sets source_authority: json_capsule, names subjects proof_diagnostic_evidence_spine and mechanism.proof_diagnostic_evidence_spine.validates_ring2_diagnostic_evidence_membrane, resolves code_loci[0].path to src/microcosm_core/organs/proof_diagnostic_evidence_spine.py, and keeps generated_projections.markdown.generated: false, generated_projections.mermaid.status: available_from_capsule_edges, and generated_projections.atlas_card.status: linked_from_capsule_edges.
  • Generated instance boundary: paper_modules/proof_diagnostic_evidence_spine.json::paper_module_payload.projection_contract records authority_flip_status: not_flipped, while paper_modules/proof_diagnostic_evidence_spine.json::relationships.edges carries source-justified links to the component, mechanism, concept, principles, axioms, dependencies, and code locus.
  • Component/source locus: organs/proof_diagnostic_evidence_spine.json::organ_payload.source_atlas_row names the first command, claim_ceiling_restated, mechanism_refs[0], wires_to, and the same code-locus symbols implemented in src/microcosm_core/organs/proof_diagnostic_evidence_spine.py (PROOF_AUTHORITY_CEILING, EXPECTED_NEGATIVE_CASES, validate_copied_macro_body_artifacts, validate_evidence_receipts, validate_provider_payload_policy, validate_authority_ceiling, run, and run_evidence_bundle).
  • Standard contract: standards/std_microcosm_proof_diagnostic_evidence_spine.json::authority_boundary_detail limits the component to copied Ring2 diagnostic runtime artifacts, summary metrics, graph-variant metadata, and anchor result record refs. Its body_import_verification.source_open_body_import_floor records 13 copied artifact bodies, 10 exact copies, 3 public-light edits, and body_text_exported_in_receipts: false; its body_import_verification.public_organ_source_body_floor records one exact copied public component source body.
  • Bundle floor: examples/proof_diagnostic_evidence_spine/exported_evidence_bundle/bundle_manifest.json has schema_version: proof_diagnostic_evidence_spine_exported_evidence_bundle_v1, bundle_id: ring2_proof_diagnostic_evidence_runtime_example, copied_macro_body_artifacts count 13, and a scope limit of Ring2 diagnostic result record refs only, not formal proof authority.
  • Source-body floor: examples/proof_diagnostic_evidence_spine/exported_evidence_bundle/source_body_floor/source_module_manifest.json::modules[0] records source ref src/microcosm_core/organs/proof_diagnostic_evidence_spine.py, source_to_target_relation: exact_copy, sha256_match: true, body_in_receipt: false, and omitted material including model-output data bodies, account or browser state, browser UI live-access state, recipient-send state, private proof bodies, and oracle-needed premise ids.
  • Result record behavior: receipts/first_wave/proof_diagnostic_evidence_spine/proof_evidence_validation_receipt.json records accepted_count: 2, rejected_count: 1, missing_negative_cases: [], body_in_receipt: false, source_fingerprint_status: stale, and observed negative cases for source-authority upgrade, missing result record fields, runtime-correctness overclaim, provider/proof body rejection, and stale coupling. The sibling provider_payload_policy_result.json::provider_payload_policy preserves advisory metadata while rejecting the forbidden proof-body payload, and diagnostic_board.json::authority_ceiling rejects model-output data authority, source-authority upgrade, runtime-correctness claims, and formal prover execution.
  • Focused regression surface: tests/test_proof_diagnostic_evidence_spine.py asserts the observed negative cases match EXPECTED_NEGATIVE_CASES and checks the exported evidence bundle path. These tests support reader wiring and evidence accounting only; they do not establish formal-result correctness, provider authority, runtime correctness, publishing-scope decision, or launch-scope decision.

Reader Evidence Routing

Route currentness questions through ## JSON Bundle Binding and the validation commands in ## Validation Result record Path. The tests and corpus check confirm reader wiring and projection health; they do not establish proof authority.

Route source/body-floor questions through ## Source-Open Body Floor and the fixture/example paths named under ## Structured Lattice Bindings. The diagnostic artifact copies from the formal-math evaluation pipeline, public component-source copy, manifests, and digest coupling are evidence-accounting inputs; they are not proof bodies, model-output data bodies, runtime-correctness claims, or source-authority upgrades.

Route claim-safety and public-copy questions through ## Scope limit, ## Evidence-As-Accounting Shape, and ## Scope boundary, then pair this module with batch12_release_claim_language_gate when public wording is being checked. If the question is "did the validator still enforce the membrane?", use the focused pytest and corpus check in ## Validation Result record Path before citing the reader page.

Evidence-As-Accounting Shape

This component is the proof-adjacent evidence membrane behind Microcosm's scope limits. It accepts diagnostic runtime artifacts, result record refs, source digests, and negative-case results as evidence cells, while refusing to treat any of them as theorem authority.

The accounting rule is two-sided. A copied artifact from the formal-math evaluation and premise-retrieval pipeline can strengthen only the diagnostic claim named by its result record, digest, and validator; it cannot upgrade itself into formal-result correctness, provider authority, launch-scope decision, or private-system equivalence. Stale source coupling is retained as diagnostic evidence instead of hidden, and provider-advisory rows remain metadata without payload bodies.

Use this module with batch12_release_claim_language_gate when evaluating public copy: the evidence spine says what result record-backed cells exist, and the language gate decides whether a public sentence stays within that ceiling.

Prior Art Grounding

The evidence spine is grounded in assurance-case practice: evidence should be connected to claims, assumptions, and limits before it is treated as support. NASA's Goal Structuring Notation example for spacecraft assurance is a useful public analogue because it frames assurance as model-structured evidence rather than document-level persuasion: NTRS 20160005295.

The result record membrane also borrows from W3C PROV and observability practice: diagnostic artifacts are evidence cells with provenance, not theorem authority. That is why the component accepts digest-coupled diagnostic refs and negative cases while rejecting proof bodies, model-output data bodies, and stale source-coupling overclaims.

Validation Result record Path

./repo-pytest tests/test_proof_diagnostic_evidence_spine.py -q --basetemp=/tmp/microcosm_proof_diagnostic_evidence_spine_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope limit

This module can claim reader wiring for the proof-diagnostic evidence membrane: the component and mechanism subject resolve, and the runtime source locus is named. It cannot claim Lean or Lake execution, formal proof authority, formal-result correctness, provider authority, runtime correctness of the imported systems, source-file changes, launch-scope decision, publishing-scope decision, hosted deployment, or whole-system correctness.

Diagnostic result records, copied runtime artifacts from the formal-math evaluation pipeline, copied public component source, source digests, and focused tests can support only bounded evidence-accounting claims: which public refs, manifests, negative cases, and body-hygiene checks were validated. A diagram view and atlas entry are generated for this module; they do not convert diagnostics into formal-result correctness or provider/publishing-scope decision.

Scope boundary

This module documents diagnostic result record anchors over real runtime artifacts from the formal-math evaluation and premise-retrieval pipeline, and keeps forbidden proof/provider body cases as regression-only guards. It does not run Lean, use external model services, expose proof bodies, prove runtime correctness, certify public launch operations, authorize public sharing or recipient work, establish secret export, or claim whole-system correctness.

Source and projection details
Source-Open Body Floor

The public bundle carries two bounded body floors. The runtime-artifact floor copies thirteen diagnostic artifacts from the formal-math evaluation and premise-retrieval pipeline under examples/proof_diagnostic_evidence_spine/exported_evidence_bundle/source_artifacts and records their source/target digest coupling in bundle_manifest.json. Three rows are source-faithful public-light edits that redact operator absolute paths and retain both source and target digests.

The component-source floor copies the public source body for src/microcosm_core/organs/proof_diagnostic_evidence_spine.py under source_body_floor/source_modules. Generated state/runs JSON artifacts are evidence bodies, not source-body authority. Neither body floor places body text in result records or workingness cards, and neither imports proof bodies, model-output data bodies, account or browser state, browser UI live access, recipient-send state, account secrets, private proof bodies, or oracle-needed premise ids.

Formal Math Readiness Gate · Reads declared math setups and lists which proof tactics may be attempted versus blocked. · 3/5

Does Before anyone tries to prove a math theorem with the Lean prover, this gate reads simple description files that declare what a math setup is supposed to have — which math library is claimed to be present, which proof tactics are reported as already probed, which lemmas may be looked up, and which limits apply to the text budgets handed to AI providers — and writes a plain checklist of what is allowed to be attempted versus blocked. It works only from those declared description files; it does not inspect the real toolchain or run anything. Its guards keep the claims honest and checkable: it refuses to let a library be marked available unless a probe result backs that up, blocks routing a proof tactic that was not probed, and refuses to let any real proof text sneak into the lemma-lookup tables or provider budgets.

Scope limit It only validates and projects declared readiness metadata; it does not run Lean/Lake, inspect the real toolchain, use external model services, prove any theorem correct, produce benchmark claims, or authorize Mathlib-dependent proof attempts.

Run
microcosm formal-math-readiness-gate run --input fixtures/first_wave/formal_math_readiness_gate/input --out receipts/first_wave/formal_math_readiness_gate

Paper module Formal Math Readiness Gate

Teleology

formal_math_readiness_gate is the public runtime cell that turns the formal math slice from a deferred slogan into an executable boundary. It validates synthetic readiness metadata for corpus availability, tactic probes, premise indexes, target-shape routing, and provider context recipes before any future Lean witness can claim authority.

The page should let a cold reader answer one question without rereading the component: what evidence has Microcosm actually validated, and where does that evidence stop?

Purpose

Formal-math tooling fails quietly when a library, tactic, or corpus is assumed present rather than checked. A pipeline that routes a proof to aesop when aesop is not actually available, or that treats a premise index as proof evidence because it happens to carry a proof body, has already lost the boundary between "ready to attempt" and "proven". This component exists to make that boundary explicit before any downstream proof work begins. It answers one question: which declared formal-math inputs are well-formed and honest enough that a later proof witness could safely consume them, and where exactly does that warrant stop?

The mechanism is a deterministic reducer over five public JSON inputs: corpus readiness, tactic-portfolio availability, a premise index, target-shape routing, and provider context recipes. It does not run Lean or Lake. Instead it reads what those inputs declare and refuses the specific ways they can lie. A corpus that claims Mathlib is available without a passing probe is rejected. A tactic marked available without a probe result record is rejected. A premise row carrying a proof_body or oracle_needed_premise_ids field is rejected. A route that admits a tactic the portfolio probe already marked unavailable is rejected. The output is a readiness board, not theorem evidence.
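The corpus and tactic guards can be illustrated with a short sketch. The finding code MATHLIB_AVAILABILITY_OVERCLAIM is the one named for validate_corpus_readiness(). Every JSON key used here (mathlib_available, mathlib_probe_status, available, probe_receipt_ref, id) and the TACTIC_WITHOUT_PROBE code are assumptions for illustration, not the fixture schema.

```python
def check_corpus(corpus: dict) -> list[str]:
    """Reject a Mathlib availability claim that no passing probe backs."""
    findings = []
    claimed = corpus.get("mathlib_available", False)
    probed = corpus.get("mathlib_probe_status") == "passed"
    if claimed and not probed:
        findings.append("MATHLIB_AVAILABILITY_OVERCLAIM")
    return findings


def split_tactics(portfolio: list[dict]) -> tuple[set[str], set[str], list[str]]:
    """Return (available, unavailable, findings) for a tactic portfolio."""
    available, unavailable, findings = set(), set(), []
    for tactic in portfolio:
        if not tactic.get("available"):
            unavailable.add(tactic["id"])
        elif tactic.get("probe_receipt_ref"):
            available.add(tactic["id"])
        else:
            # Marked available, but no probe result record backs the claim.
            unavailable.add(tactic["id"])
            findings.append(f"TACTIC_WITHOUT_PROBE:{tactic['id']}")
    return available, unavailable, findings
```

In this sketch an unprobed tactic is treated as unavailable as well as reported, so a later routing step cannot admit it by accident.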

The design choice worth noticing is that the gate proves its own discipline through negative cases. Alongside the positive inputs, the fixture carries five inputs that each commit a known overclaim, and the run passes only when every one of those overclaims is caught and no unexpected finding appears. The gate is therefore not merely asserting "we check Mathlib availability"; it is demonstrating, on each run, that a falsified Mathlib claim is actually refused. A second guard keeps the floor source-open without leaking: copied prover probe bodies are verified by digest through a manifest, while proof bodies, model-output data, and private state stay out of the result records entirely.

Shape

[Diagram: five public JSON inputs (corpus, tactics, premises, routes, provider recipes) pass a secret-exclusion scan and then the corpus, tactic, premise, routing, and provider-recipe validators. The tactic validator passes unavailable tactic ids to routing. All findings, together with the digest-checked source-module imports, are reconciled against EXPECTED_NEGATIVE_CASES before the readiness board and extension board are written under the scope limit.]

Source refs

reject Mathlib-availability overclaim
validate_corpus_readiness
each available tactic needs a probe result record
validate_tactic_portfolio
reject route admitting an unavailable tactic
validate_target_shape_routing
reject over-budget or proof-body recipe
validate_provider_context_recipes
copied probe bodies, digest-checked
validate_source_module_imports
Diagram source
flowchart TD
  Inputs["Five public JSON inputs corpus, tactics, premises, routes, provider recipes"]
  Scan["Secret-exclusion scan zero blocking hits required"]
  Corpus["validate_corpus_readiness reject Mathlib-availability overclaim"]
  Tactics["validate_tactic_portfolio each available tactic needs a probe result record"]
  Premises["validate_premise_index reject proof_body / oracle premise ids"]
  Routing["validate_target_shape_routing reject route admitting an unavailable tactic"]
  Provider["validate_provider_context_recipes reject over-budget or proof-body recipe"]
  SourceFloor["validate_source_module_imports copied probe bodies, digest-checked"]
  Reconcile["Reconcile findings vs EXPECTED_NEGATIVE_CASES every known overclaim must be caught"]
  Board["Readiness board + extension board available / blocked capabilities, counts"]
  Ceiling["Scope limit no Lean/Lake, proof, provider, launch, or private-system authority"]
  Inputs --> Scan
  Scan --> Corpus
  Scan --> Tactics
  Scan --> Premises
  Scan --> Provider
  Tactics -->|unavailable tactic ids| Routing
  Corpus --> Reconcile
  Tactics --> Reconcile
  Premises --> Reconcile
  Routing --> Reconcile
  Provider --> Reconcile
  SourceFloor --> Reconcile
  Reconcile --> Board
  Board --> Ceiling

The machine graph remains the generated paper_module.formal_math_readiness_gate.mermaid projection derived from the source record, not from this hand-authored Mermaid block.

Reader Evidence Routing

Read this module in evidence order:

  1. Start at core/paper_module_capsules.json::paper_modules[21:paper_module.formal_math_readiness_gate]. That row names the source authority, subjects, mechanism refs, code locus, Microcosm concept/principle/axiom refs, generated projection statuses, and the bundle scope limit.
  2. Check the generated structured source record paper_modules/formal_math_readiness_gate.json. Its relationships.edges cite the bundle source refs and show the generated Mermaid status, Atlas status, source_authority: json_capsule, and unresolved selective-relation count.
  3. Inspect the runtime locus src/microcosm_core/organs/formal_math_readiness_gate.py, especially run, run_readiness_bundle, validate_source_module_imports, write_receipts, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, and SOURCE_MODULE_MANIFEST_NAME.
  4. Use fixture evidence for the gate behavior: fixtures/first_wave/formal_math_readiness_gate/input, receipts/first_wave/formal_math_readiness_gate/readiness_gate_result.json, formal_math_readiness_board.json, formal_math_readiness_extension_board.json, formal_math_readiness_validation_receipt.json, and result records/sign-off/first_wave/formal_math_readiness_gate_fixture_acceptance.json.
  5. Use exported-bundle evidence for source-open body-floor claims: examples/formal_math_readiness_gate/exported_formal_math_readiness_bundle/source_module_manifest.json, bundle_manifest.json, source_artifacts/, source_body_floor/source_modules/, and receipts/runtime_shell/demo_project/organs/formal_math_readiness_gate/exported_formal_math_readiness_bundle_validation_result.json.
  6. Use tests/test_formal_math_readiness_gate.py for the behavioral result record boundary. The tests cover negative cases, exported bundle sign-off, source-module digest and target-ref mismatch rejection, bounded command-card output, source-body omission from result records, secret-exclusion/public-relative result record paths, and non-writing plan preview.

Do not route a proof claim through this page. It routes readiness evidence, result record integrity, and source-body-floor accounting only.

Technical Mechanism

The runtime is a deterministic readiness reducer over declared public inputs. run() evaluates the first-wave fixture directory with positive and negative JSON cases enabled; run_readiness_bundle() evaluates the exported public bundle without fixture-negative cases and requires the bundle source-module manifest. Both entrypoints call _build_result(), so the fixture and exported bundle result records share one scope limit, one secret scan, one source-module digest checker, and one readiness-board schema.

_build_result() first loads the five public input families: corpus_readiness.json, tactic_portfolio_availability.json, premise_index.json, target_shape_tactic_routing.json, and provider_context_recipes.json. It then scans those inputs plus any declared source artifacts through secret_exclusion_scan.scan_paths, using the public Microcosm forbidden-class policy. The scan is not advisory: the result can pass only when the scan has zero blocking hits, source-module imports pass, all expected fixture-negative cases are observed, and no unexpected positive-case findings remain.

The mechanism is split into six validators:

  • validate_corpus_readiness() records Lean and Mathlib readiness metadata and adds lean_std_synthetic_core:mathlib to blocked capabilities when Mathlib is unavailable. A Mathlib availability claim without a passing probe becomes MATHLIB_AVAILABILITY_OVERCLAIM.
  • validate_tactic_portfolio() separates available from unavailable tactics and requires every available tactic to carry a probe result record. Synthetic probe labels are accepted only when _tactic_probe_realness_evidence() binds them to copied source modules or fixture-manifest source-open evidence.
  • validate_premise_index() admits premise rows as metadata only. It counts premises, namespaces, retrieval terms, and split eligibility, but rejects proof_body, ground_truth_proof, provider_output_body, and oracle_needed_premise_ids.
  • validate_target_shape_routing() intersects each route case's allowed tactics with the unavailable tactics emitted by the portfolio validator. Any overlap becomes ROUTING_ALLOWS_UNAVAILABLE_TACTIC, so routing cannot silently re-enable a tactic that the probe plane blocked.
  • validate_provider_context_recipes() records byte budgets and deliverable shape while rejecting public recipes over 32,768 bytes or recipes that allow proof bodies or provider-body material.
  • validate_source_module_imports() verifies the exported bundle's source_module_manifest.json, target refs, source refs, line counts, target digests, source digests, exact-copy rows, and the two permitted private-path rewrites. It reports digest/ref failures without placing copied source bodies in result records.
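The premise, routing, and provider-recipe rules in the list above reduce to field checks and a set intersection. The sketch below is an illustration only. The forbidden premise fields, the 32,768-byte budget, and the ROUTING_ALLOWS_UNAVAILABLE_TACTIC code come from the text. The row keys (id, allowed_tactics, byte_budget, allows_proof_body, allows_provider_body) and the other finding codes are hypothetical.

```python
FORBIDDEN_PREMISE_FIELDS = (
    "proof_body",
    "ground_truth_proof",
    "provider_output_body",
    "oracle_needed_premise_ids",
)
PUBLIC_RECIPE_BYTE_BUDGET = 32_768  # public budget named in the text


def premise_findings(rows: list[dict]) -> list[str]:
    """Premise rows are metadata only; any forbidden field is a finding."""
    return [
        f"PREMISE_FORBIDDEN_FIELD:{row['id']}:{field}"
        for row in rows
        for field in FORBIDDEN_PREMISE_FIELDS
        if field in row
    ]


def routing_findings(route_cases: list[dict], unavailable: set[str]) -> list[str]:
    """Intersect each route's allowed tactics with the unavailable set."""
    findings = []
    for case in route_cases:
        overlap = sorted(set(case["allowed_tactics"]) & unavailable)
        if overlap:
            findings.append(
                f"ROUTING_ALLOWS_UNAVAILABLE_TACTIC:{case['id']}:{','.join(overlap)}"
            )
    return findings


def recipe_findings(recipes: list[dict]) -> list[str]:
    """Flag recipes that exceed the budget or admit body material."""
    findings = []
    for recipe in recipes:
        if recipe["byte_budget"] > PUBLIC_RECIPE_BYTE_BUDGET:
            findings.append(f"RECIPE_OVER_BUDGET:{recipe['id']}")
        if recipe.get("allows_proof_body") or recipe.get("allows_provider_body"):
            findings.append(f"RECIPE_ALLOWS_BODY:{recipe['id']}")
    return findings
```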

After the validators run, _merge_observed() and _merge_findings() compare observed fixture failures against EXPECTED_NEGATIVE_CASES. This is the local scope limit: the fixture run must prove that the known overclaims are caught, while the exported-bundle run must prove that the positive public bundle has no unexpected findings. _build_extension_board() then projects the accepted metadata into the extension board: selected pattern ids, namespace and split counts, tactic availability counts, Mathlib-dependent unavailable tactics, blocked route cases, provider budgets, source-body import counts, the scope limit, and the scope boundary.
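The reconciliation step can be pictured as a comparison of two sets plus a list of leftover findings. The following is a sketch under stated assumptions rather than the behavior of _merge_observed() or _merge_findings(): the function name, argument shapes, and output keys other than missing_negative_cases are invented for the example.

```python
def reconcile(
    observed: set[str],
    expected: set[str],
    positive_findings: list[str],
) -> dict:
    """Pass only when every expected overclaim was caught and nothing else fired."""
    missing = sorted(expected - observed)      # known overclaims that slipped through
    unexpected = sorted(observed - expected)   # negative cases nobody declared
    passed = not (missing or unexpected or positive_findings)
    return {
        "status": "pass" if passed else "fail",
        "missing_negative_cases": missing,
        "unexpected_negative_cases": unexpected,
        "unexpected_positive_findings": list(positive_findings),
    }
```

For an exported-bundle run, which carries no fixture-negative cases, the same function would be called with an empty expected set, so any finding at all fails the run.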

Result record writing preserves the same boundary. write_receipts() emits the gate result, readiness board, extension board, validation result record, and sign-off result record for fixture mode. run_readiness_bundle() emits the exported-bundle result record. The focused test suite asserts the mechanism rather than just file existence: it checks the five expected negative case ids, local Lean/Lake probe metadata with Mathlib unavailable, six available tactics with aesop blocked, eleven premises, five route cases, three provider recipes, thirteen verified source artifacts, source/target digest mismatch rejection, target-ref mismatch rejection, secret-exclusion/public-relative result record paths, and result record omission of copied body text.

Public Contract

The component does not run Lean or Lake. It consumes public JSON fixtures and exported bundles, records which capabilities are available or blocked, rejects Mathlib availability overclaims, rejects unprobed tactics, rejects premise rows that contain proof bodies, rejects routes that admit unavailable tactics, and rejects provider recipes that exceed the public budget or allow proof bodies.

The accepted result is a readiness board. That board can tell a later component what is safe to attempt, but it is not theorem evidence, benchmark evidence, or permission to execute a theorem prover.

Prior Art Grounding

This component is grounded in formal-math benchmark and environment-readiness work where the presence of a library, tactic, or corpus is not enough by itself. miniF2F motivates explicit benchmark split discipline for formal mathematics, LeanDojo motivates reproducible theorem-proving environments, and mathlib makes the availability of library imports a concrete precondition rather than a vague capability claim.

Microcosm borrows the readiness-gate pattern: corpus availability, Mathlib probes, tactic probes, premise indexes, target-shape routing, and context budgets must be checked before downstream proof or retrieval language is allowed. It excludes Lean execution and proof authority.

Runtime Surfaces

  • python -m microcosm_core.organs.formal_math_readiness_gate run --input fixtures/first_wave/formal_math_readiness_gate/input --out receipts/first_wave/formal_math_readiness_gate
  • python -m microcosm_core.organs.formal_math_readiness_gate run-readiness-bundle --input examples/formal_math_readiness_gate/exported_formal_math_readiness_bundle --out receipts/runtime_shell/demo_project/organs/formal_math_readiness_gate
  • python -m microcosm_core.organs.formal_math_readiness_gate plan --input fixtures/first_wave/formal_math_readiness_gate/input
  • microcosm formal-math-readiness-gate run --input fixtures/first_wave/formal_math_readiness_gate/input --out receipts/first_wave/formal_math_readiness_gate
  • microcosm formal-math-readiness-gate plan --input fixtures/first_wave/formal_math_readiness_gate/input

Relationship To Lean Witness

formal_math_lean_proof_witness remains deferred. This gate makes the deferral typed and testable: Mathlib is absent until a passing probe says otherwise, unavailable tactics cannot be routed, premise indexes cannot carry proof or oracle bodies, and provider recipes cannot smuggle proof-body deliverables.

Validation Result record Path

./repo-pytest tests/test_formal_math_readiness_gate.py -q --basetemp=/tmp/microcosm_formal_math_readiness_gate_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
jq '{edge_count:(.relationships.edges|length), mermaid_status:.paper_module_payload.generated_projections.mermaid.status, atlas_status:.paper_module_payload.generated_projections.atlas_card.status, source_authority:.relationships.source_authority, unresolved_selective_relation_count:(.relationships.unpopulated_selective_relations|length)}' paper_modules/formal_math_readiness_gate.json

Expected generated-row proof: edge_count: 15, mermaid_status: available_from_capsule_edges, atlas_status: blocked_until_organ_atlas_owner_lane_binds_edges, source_authority: json_capsule, and unresolved_selective_relation_count: 0.

Scope boundary

Scope limit

This module may claim that Microcosm has a public readiness gate for formal math system preparation. The valid claim is bounded to corpus availability, Mathlib and tactic probe metadata, premise-index coverage, target-shape tactic routing, provider context budget checks, extension-board pattern ids, public PROVER smoke-run source artifacts, an exact public component-source body floor, and fixture or exported-bundle result records.

The module must not claim Lean/Lake execution, theorem proving, formal proof authority, formal-result correctness, Mathlib-dependent proof success, benchmark performance, provider-call execution, private proof-body import, oracle-needed premise disclosure, source-file changes, publishing-scope decision, hosted deployment, recipient work, secret export, or whole-system correctness. Its strongest launch-facing statement is readiness-boundary enforcement over public metadata and copied source artifacts.

Limitations

The runtime validates finite public fixtures and exported-bundle manifests. It does not execute Lean or Lake, import Mathlib in the current environment, call a provider, or check theorem statements. When the result reports blocked capabilities such as lean_std_synthetic_core:mathlib, that is a readiness boundary for downstream components, not an invitation to route around the gate.

The copied source artifacts are source-open body-floor evidence only. Digest and target-ref checks show that selected PROVER readiness/probe bodies and the public component source copy match their manifests; they do not authorize source-file changes, private source-root export, proof-body disclosure, recipient work, hosted deployment, or public sharing. Result records intentionally carry counts, digests, paths, negative-case coverage, and authority flags instead of copied body text.

The negative cases are also finite. They cover the known overclaims encoded in EXPECTED_NEGATIVE_CASES: Mathlib availability without a passing probe, unprobed tactic availability, premise rows with proof bodies, target routes that admit unavailable tactics, and provider recipes that exceed public budgets or permit proof bodies. A new formal-math claim needs either a new source-backed negative case here or a different proof consumer; this page should not be used as a generic formal-proof claim surface.

Scope boundary

This module documents a public readiness gate only. It excludes Lean/Lake execution, formal proof authority, Mathlib-dependent proof attempts, external model access, benchmark claims, public launch, hosted deployment, public sharing, recipient work, secret export, and whole-system correctness. It also does not make private source-root material, browser UI state, browser or account material, account secrets, source notes, model-output data bodies, recipient-send state, or private proof bodies part of the public Microcosm body floor.

Source and projection details
Source-Open Body Floor

The exported readiness bundle carries thirteen PROVER smoke-run readiness/probe bodies under source_artifacts. They cover corpus readiness, tactic-affordance probe metadata, Mathlib and trace probes, and the copied portfolio-core Lean probes used to decide which tactics are blocked or available. Two JSON rows are private-path rewrites; those rows retain source and target digests plus the rewrite mode.

The bundle also carries an exact public component-source copy for src/microcosm_core/organs/formal_math_readiness_gate.py under source_body_floor/source_modules. Generated state/runs Lean artifacts are runnable readiness evidence, not source-body authority. Neither floor places body text in result records or workingness cards, and neither imports model-output data bodies, account or browser state, browser UI live access, recipient-send state, account secrets, private proof bodies, or oracle-needed premise ids.

The source-module manifest and bundle manifest are the right surfaces for body-floor inspection. The validation result records intentionally carry status, digests, counts, and public-relative refs rather than copied source bodies.

Wave 011 adds the explicit extension board for the source intake cell formal_math_readiness_extensions. The board is still metadata-only, but it is more useful than the older flat counts: it records the selected pattern ids (lean_std_toolchain_premise_index, tactic_portfolio_availability_probe, target_shape_tactic_routing_gate), the source projection intake ref, public target refs, validation refs, namespace and split coverage for the premise index, tactic availability status counts, Mathlib-dependent unavailable tactics, target-shape routing admissibility, and provider context budgets.

Governing Lattice Relation

The bundle binds this module to concept.formal_math_and_proof_witness_bundle because the component is not a theorem prover; it is the membrane that decides which public formal-math inputs are safe enough for a later proof witness to consume. The governing mechanisms split that membrane in two. The validates_public_formal_math_readiness_bundle mechanism names the positive bundle path: run, run_readiness_bundle, validate_source_module_imports, and write_receipts validate the declared corpus, tactic, premise, routing, provider-budget, source-module-manifest, and source-body-floor evidence before writing readiness boards. The validates_public_readiness_boundary mechanism names the negative path: validate_corpus_readiness, validate_tactic_portfolio, validate_premise_index, validate_target_shape_routing, and validate_provider_context_recipes reject the cases that would turn readiness metadata into proof authority.

The principle and axiom refs are therefore operational, not decorative. P-1, P-2, and P-3 are expressed by keeping the JSON bundle, generated structured source record, runtime code locus, and result records as separate authority classes. P-6 and P-8 are expressed by the body-floor and secret-exclusion contracts: copied PROVER probe bodies and the public component source copy can be inspected through digests and manifests, while private proof bodies, model-output data bodies, and browser or account state stay outside the public floor. AX-1, AX-2, AX-5, and AX-7 are the local reason the downstream paper_module.formal_math_lean_proof_witness remains a dependency rather than an already-proven conclusion.

The generated lattice edge count is small on purpose: it proves that this page is bundle-backed, source-bound, and connected to one deferred proof-witness module.

Corpus Readiness Mathlib Absence Gate · Runs the real Lean toolchain to confirm the math library is absent, then gates proof tasks. · 4/5 · Runs real tools

Does It reads a recorded readiness report from a Lean math toolchain run and makes one fact inspectable: when the report was captured, the Mathlib library was not importable (its import probe failed). From that, it lists which math corpora are absent or usable only for translation smoke tests, and which downstream tasks must be blocked before any Mathlib-dependent proof work is attempted. It also re-checks that the recorded source files match their recorded SHA-256 digests and that no proof bodies, provider outputs, or non-public paths leaked into the public output. The result shows exactly where the proof pipeline draws a hard "not ready, do not proceed" line, with provenance, instead of quietly assuming the environment is fine.

Scope limit It only projects and gate-checks recorded corpus/toolchain readiness accounting, re-verifies recorded source digests and leakage guards, and runs a bounded Lean/Lake import probe when a toolchain is present. It does not run a full Lake build, prove formal-result correctness, claim Mathlib is available beyond the probe result, benchmark corpora, score model performance, use external model services, or include launch operations or public sharing.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.corpus_readiness_mathlib_absence_gate run --input fixtures/first_wave/corpus_readiness_mathlib_absence_gate/input --out receipts/first_wave/corpus_readiness_mathlib_absence_gate

Evidence · External tool run · evidence 4/5 · Real runtime result

formal-methods · theorem-proving · lean

Source Design note · Source atlas

Paper module Corpus Readiness Mathlib Absence Gate

Abstract

corpus_readiness_mathlib_absence_gate is the public formal-math corpus readiness boundary for Microcosm. It carries copied corpus/toolchain rows from the 2026-05-11 proof-state curriculum smoke run and forces Mathlib absence, absent-corpus blocking, consumer gate decisions, and source-module digest coupling to be visible before any downstream retrieval, tactic-routing, or proof-witness language is allowed.

Purpose

Formal-math agents fail in a specific way: they treat "there is a corpus" as if it meant "this corpus is usable for the proof route I am about to take". A roster lists miniF2F, PutnamBench, ProofNet, LeanDojo and Mathlib, the agent assumes the libraries are present, and the failure only surfaces later as a broken import or a tactic that needs a premise the host cannot resolve. This component answers one question before that happens: for each corpus, is it actually present on this host, and is the Mathlib import lane actually available, or not?

The unusual part is that the gate does not take the answer on trust. It runs a bounded Lean/Lake import probe in a temporary directory: one small file that imports Std and is expected to compile, and one that imports Mathlib and is expected to be rejected with the toolchain's own unknown module prefix 'Mathlib' error. A corpus is only marked usable for Mathlib-dependent work when the runtime evidence agrees the corpus exists, carries a Lake file, and the Mathlib lane probe passes. In the current system the Mathlib probe stays false, so every Mathlib-dependent consumer is blocked, and the one consumer that passes is the Lean3 translation smoke, which needs no Mathlib project at all.

This closes the most common way a readiness claim drifts. Stale alias fields such as mathlib_available, or a PASS lean status, cannot turn the gate green on their own; they must agree with the live probe or the row is flagged. The probe is deliberately narrow. It checks that imports resolve and that Mathlib is genuinely absent. It does not run a lake build, prove any theorem, or claim Mathlib is installed. The output is a readiness board and a set of blocked consumer verdicts: bounded evidence, not proof evidence.
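The two-file probe described above can be sketched as follows. This is a hedged illustration rather than the component's runtime_lean_import_probe: the function names, file names, result keys, and timeout are assumptions, and whether the real probe runs inside a Lake project is not stated in the text. The only external command used is `lake env lean <file>`, which elaborates a file in Lake's environment.

```python
import subprocess
import tempfile
from pathlib import Path

# Error text quoted in the module description above.
MATHLIB_ABSENT_MARKER = "unknown module prefix 'Mathlib'"


def run_lean(path, cwd):
    """Elaborate one Lean file via `lake env lean`; return (exit_code, output)."""
    try:
        proc = subprocess.run(
            ["lake", "env", "lean", str(path)],
            cwd=cwd, capture_output=True, text=True, timeout=60,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        # No toolchain or a hung run: report failure without guessing a cause.
        return 127, f"probe_not_run: {exc}"
    return proc.returncode, proc.stdout + proc.stderr


def import_probe(runner=run_lean):
    """Write two one-line Lean files and record whether each import resolves."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        std_file = tmp_path / "StdProbe.lean"          # hypothetical file name
        mathlib_file = tmp_path / "MathlibProbe.lean"  # hypothetical file name
        std_file.write_text("import Std\n")
        mathlib_file.write_text("import Mathlib\n")
        std_code, _ = runner(std_file, tmp_path)
        ml_code, ml_out = runner(mathlib_file, tmp_path)
    return {
        "std_import_ok": std_code == 0,
        "mathlib_import_available": ml_code == 0,
        # Absence is only confirmed when the toolchain's own error appears.
        "mathlib_absence_confirmed": ml_code != 0 and MATHLIB_ABSENT_MARKER in ml_out,
    }
```

Passing the runner as a parameter keeps the sketch testable on hosts without Lean installed. Keeping "import failed" separate from "absence confirmed" avoids treating a missing toolchain as evidence that Mathlib is absent.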

Shape

[Diagram: readiness flow from fixture or exported-bundle input through source-artifact digest checks, the Lean import probe, the Mathlib lane decision, corpus readiness, and consumer gate verdicts to metadata-only result records under the scope limit. The Mermaid text is listed under Diagram source.]

Source refs

  • lake env lean: Std compiles, Mathlib import rejected (runtime_lean_import_probe)
  • check SHA-256 digests, parse probe JSON (validate_runtime_source_artifacts)
  • 7 corpus rows, alias fields must agree with probe (validate_corpus_readiness)
  • derive verdicts from readiness facts (validate_consumer_gate_cases)
  • 4 copied source artifacts, digest match (validate_source_module_imports)
Diagram source
flowchart TD
  fixture["Fixture or exported bundle input corpus readiness rows + consumer gate cases"]
  probe["runtime_lean_import_probe lake env lean: Std compiles, Mathlib import rejected"]
  artifacts["validate_runtime_source_artifacts check SHA-256 digests, parse probe JSON"]
  mathlib{"Mathlib lane available? corpus exists + Lake file + probe passes"}
  corpus["validate_corpus_readiness 7 corpus rows, alias fields must agree with probe"]
  gates["validate_consumer_gate_cases derive verdicts from readiness facts"]
  imports["validate_source_module_imports 4 copied source artifacts, digest match"]
  allowed["Allowed: Lean3 translation smoke (needs no Mathlib project)"]
  blocked["Blocked: Mathlib-dependent and absent-corpus consumers"]
  receipts["metadata-only result records result, board, validation, sign-off, bundle"]
  ceiling["Scope limit no Mathlib availability, proof, provider, launch"]
  fixture --> artifacts
  artifacts --> probe
  probe --> mathlib
  mathlib -->|no, probe false| corpus
  corpus --> gates
  gates --> allowed
  gates --> blocked
  fixture --> imports
  corpus --> receipts
  gates --> receipts
  imports --> receipts
  receipts --> ceiling

This reader diagram is intentionally smaller than the generated doctrine-lattice graph.

Mechanism

The mechanism is a readiness reducer, not a theorem-proving backend. The runtime entrypoints run and run_projection_bundle both call _build_result, which loads public fixture or exported-bundle inputs, scans those inputs against the non-public-state exclusion policy, verifies copied source artifacts, and then combines corpus readiness, consumer gate, source-module import, negative-case, and scope limit fields into one metadata-only result.

validate_runtime_source_artifacts anchors the reducer to four source refs: the corpus readiness rows, tactic-affordance probe, Mathlib import probe Lean file, and tactic portfolio availability JSON. It checks expected SHA-256 digests, parses the JSON source artifacts, and runs a bounded Lean/Lake import probe that can show Std imports and Mathlib remains absent without running a Lake build or exporting Lean bodies.
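The digest half of that check can be sketched with the standard library alone. The manifest row keys (`path`, `sha256`) and the function names are hypothetical; the real manifest schema is not shown on this page.

```python
import hashlib
from pathlib import Path


def sha256_hex(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_mismatches(root: Path, manifest: list) -> list:
    """Compare each manifest row against the file on disk.

    Rows are assumed to look like {"path": <relative path>, "sha256": <hex>}.
    Returns (path, reason) pairs; an empty list means every digest matched.
    """
    mismatches = []
    for row in manifest:
        target = root / row["path"]
        if not target.is_file():
            mismatches.append((row["path"], "missing"))
        elif sha256_hex(target) != row["sha256"]:
            mismatches.append((row["path"], "digest_mismatch"))
    return mismatches
```

The function returns paths and reasons only, never file contents, which matches the page's rule that result records carry digests and counts instead of copied bodies.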

validate_corpus_readiness normalizes seven corpus rows against those runtime source artifacts. A corpus is usable for Mathlib-dependent work only when the runtime evidence says the corpus exists, has a Lake file, and mathlib_lake_project_import_available is true. In the current fixture and bundle evidence that field remains false, so Mathlib-dependent capabilities are blocked, absent corpora are recorded, and stale alias fields such as mathlib_available cannot turn the gate green.
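A minimal sketch of that normalization rule, assuming hypothetical row keys `exists` and `has_lake_file` and a hypothetical flag format. Only mathlib_available and the probe fact come from the text above.

```python
def normalize_corpus_row(row: dict, probe_passed: bool) -> dict:
    """Derive readiness facts for one corpus row from runtime evidence.

    `probe_passed` stands in for mathlib_lake_project_import_available.
    """
    exists = bool(row.get("exists"))
    has_lake_file = bool(row.get("has_lake_file"))
    flags = []
    # An alias that claims Mathlib while the probe says otherwise is flagged,
    # and it never feeds into the usability decision below.
    if row.get("mathlib_available") and not probe_passed:
        flags.append("stale_alias:mathlib_available")
    return {
        "corpus_id": row.get("corpus_id"),
        "absent": not exists,
        "mathlib_dependent_usable": exists and has_lake_file and probe_passed,
        "flags": flags,
    }
```

The design point is that the alias field is read only to detect disagreement: usability is a conjunction of three evidence facts, so no single self-reported field can turn the gate green.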

validate_consumer_gate_cases then derives consumer verdicts from the normalized readiness facts instead of trusting expected-decision labels. The translation smoke consumer can pass because it does not require a Mathlib Lake project and names an available Lean3 reference corpus; Mathlib-dependent or absent-corpus consumers stay blocked. validate_source_module_imports adds the exported bundle floor by requiring the manifest class copied_non_secret_macro_body, material classes, target/source digest agreement, and no body material in result records.
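Deriving a verdict from readiness facts, instead of trusting the case's own label, might look like the sketch below. The case keys (`case_id`, `corpus_id`, `requires_mathlib_project`, `expected_decision`), the corpus-fact keys, and the decision strings are assumptions made for the example.

```python
def derive_verdict(case: dict, corpora: dict) -> dict:
    """Compute a consumer verdict from normalized corpus facts.

    `corpora` maps corpus id -> {"absent": bool, "mathlib_dependent_usable": bool}.
    """
    corpus = corpora.get(case["corpus_id"])
    if corpus is None or corpus["absent"]:
        decision = "blocked_absent_corpus"
    elif case.get("requires_mathlib_project") and not corpus["mathlib_dependent_usable"]:
        decision = "blocked_mathlib_unavailable"
    else:
        decision = "allowed"
    # The declared label is compared against the derived decision, never used to set it.
    declared = case.get("expected_decision")
    return {
        "case_id": case["case_id"],
        "decision": decision,
        "label_disagrees": declared is not None and declared != decision,
    }
```

With the probe false, a translation-smoke case that needs no Mathlib project and names a present corpus is the only shape that reaches "allowed", which mirrors the one passing consumer described above.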

The proof consumers are the two component commands, the focused regression test tests/test_corpus_readiness_mathlib_absence_gate.py, the paper-module corpus check, and the command-card surfaces emitted by result_card. Together they exercise the success path, contradictory Mathlib claims, consumer-gate skips, source digest tampering, private-path rewrites, runtime-probe blocks, and result record body exclusion. The resulting evidence relates the bundle's two mechanisms to concept.formal_math_and_proof_witness_bundle, P-8, and AX-7 by making readiness visibility a precondition for downstream formal-math claims while keeping the scope limit below theorem, provider, benchmark, or launch-scope decision.

Public Surfaces

  • Component runner: python -m microcosm_core.organs.corpus_readiness_mathlib_absence_gate run --input fixtures/first_wave/corpus_readiness_mathlib_absence_gate/input --out receipts/first_wave/corpus_readiness_mathlib_absence_gate
  • Exported bundle runner: python -m microcosm_core.organs.corpus_readiness_mathlib_absence_gate run-projection-bundle --input examples/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle --out receipts/runtime_shell/demo_project/organs/corpus_readiness_mathlib_absence_gate
  • Standard: standards/std_microcosm_corpus_readiness_mathlib_absence_gate.json
  • Source-module manifest: examples/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle/source_module_manifest.json
  • Runtime result record: receipts/runtime_shell/demo_project/organs/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle_validation_result.json

Reader Evidence Routing

Read this module in five passes:

  1. Start with the source record at core/paper_module_capsules.json::paper_modules[8:paper_module.corpus_readiness_mathlib_absence_gate]. It is the source authority that names source_authority: json_capsule, the component subject, two mechanism subjects, the resolved runtime code locus, the concept concept.formal_math_and_proof_witness_bundle, the dependency paper_module.tactic_portfolio_availability, P-8, and AX-7.
  2. The reader proof is the current row shape: eight generated relationship edges, Mermaid available_from_capsule_edges, Atlas blocked_until_organ_atlas_owner_lane_binds_edges, and no unpopulated paper-module selective dependency residual for the tactic-portfolio edge. The structured source record is wiring evidence, not theorem-correctness, runtime-correctness, launch, provider, or production authority.
  3. Inspect the runtime locus src/microcosm_core/organs/corpus_readiness_mathlib_absence_gate.py. The load-bearing symbols are run, run_projection_bundle, validate_corpus_readiness, validate_consumer_gate_cases, validate_source_module_imports, _build_result, write_receipts, result_card, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, SOURCE_MODULE_MANIFEST_NAME, BUNDLE_RESULT_NAME, and CARD_SCHEMA_VERSION.
  4. For fixture evidence, use fixtures/first_wave/corpus_readiness_mathlib_absence_gate/input and the result records under receipts/first_wave/corpus_readiness_mathlib_absence_gate/ plus result records/sign-off/first_wave/corpus_readiness_mathlib_absence_gate_fixture_acceptance.json. The first-wave result record lists seven corpus rows, seven consumer cases, one allowed Lean3 translation-smoke case, six blocked absent or Mathlib-dependent cases, mathlib_lake_project_import_available: false, body_in_receipt: false, and the five negative cases mathlib_available_without_probe, consumer_skips_readiness_gate, private_corpus_source_ref, proof_body_leakage, and release_overclaim.
  5. For exported-bundle evidence, use examples/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle/source_module_manifest.json and receipts/runtime_shell/demo_project/organs/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle_validation_result.json. The manifest verifies four copied source artifacts: corpus readiness JSON, tactic-affordance probe JSON, the Mathlib import probe Lean file, and tactic portfolio availability JSON. The exported result record records source_module_import_count: 4, copied_source_artifact_count: 4, source_modules_pass: true, body_in_receipt: false, and three blocked absent or Mathlib-dependent bundle consumer cases.

If a reader needs validation result records rather than prose, run the commands in the Validation Result record Path section, including the focused regression test and paper-module corpus check. Treat every result record as corpus-readiness boundary evidence only; it does not create Lean/Lake execution authority, Mathlib availability, theorem-proof authority, provider authority, private-system equivalence, or launch-scope decision.

Prior Art Grounding

This component is grounded in Lean corpus and neural theorem-proving work where library availability, premise access, and benchmark splits are part of the claim. The Lean mathematical library establishes Mathlib as a large community-maintained formal mathematics corpus, miniF2F gives a cross-system benchmark for formal Olympiad statements, and LeanDojo shows why reproducible corpus extraction and accessible-premise metadata matter for theorem-proving agents.

Microcosm borrows the readiness gate: corpus rows, Mathlib availability probes, blocked consumer cases, source-module digests, and negative leakage guards must be visible before retrieval, tactic-routing, or proof-witness language is allowed. It does not claim Mathlib is present or that any theorem was proved.

Research Bet

Formal-math agents fail when they treat "there is a corpus" as equivalent to "this corpus is usable for this proof route." This component makes that boundary runnable. It records seven corpus rows, blocks six absent or Mathlib-dependent consumer cases, allows only the Lean3 translation-smoke case, and keeps the Mathlib probe false until an actual passing probe is present.

The exported bundle carries four copied body artifacts: corpus readiness JSON, tactic-affordance probe JSON, the Mathlib import probe Lean file, and tactic portfolio availability JSON. Two rows are exact copies and two use a verified private-path rewrite. The result record records the manifest status, counts, material classes, digests, and metadata-only policy; the copied bodies stay under source_artifacts/, not inside result records.

Source-Backed Doctrine Binding

  • Component: src/microcosm_core/organs/corpus_readiness_mathlib_absence_gate.py
  • Bundle: core/paper_module_capsules.json#paper_module.corpus_readiness_mathlib_absence_gate
  • Mechanism: core/mechanism_sources.json#mechanism.corpus_readiness_mathlib_absence_gate.validates_public_corpus_readiness_boundary
  • Standard: standards/std_microcosm_corpus_readiness_mathlib_absence_gate.json
  • Evidence class: core/organ_evidence_classes.json::corpus_readiness_mathlib_absence_gate records algorithmic_projection at rank 3.
  • Source-module manifest: examples/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle/source_module_manifest.json
  • Sign-off result records: receipts/first_wave/corpus_readiness_mathlib_absence_gate/* and result records/sign-off/first_wave/corpus_readiness_mathlib_absence_gate_fixture_acceptance.json

Cold-Agent Use

Open the source-module manifest first, then the runtime bundle result record, then the first-wave result record. The useful claim is not that Microcosm has Mathlib or can prove downstream theorems. The useful claim is that Microcosm can force a formal-math route to expose corpus availability, Mathlib absence, consumer gating, source-module digest evidence, copied-body boundaries, negative-case result records, and an explicit scope boundary before any proof route is treated as usable.

Re-entry condition: the current atlas row already points at this paper module. After the sibling organ_atlas.json lane releases, bind this bundle's mechanism ref and code locus into the atlas row and rerun python -m microcosm_core.doctrine_lattice --check.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

PYTHONPATH=src python3 -m microcosm_core.organs.corpus_readiness_mathlib_absence_gate run \
  --input fixtures/first_wave/corpus_readiness_mathlib_absence_gate/input \
  --out /tmp/microcosm-corpus-readiness-mathlib-absence-vrp
PYTHONPATH=src python3 -m microcosm_core.organs.corpus_readiness_mathlib_absence_gate run-projection-bundle \
  --input examples/corpus_readiness_mathlib_absence_gate/exported_corpus_readiness_bundle \
  --out /tmp/microcosm-corpus-readiness-mathlib-absence-bundle-vrp
PYTHONPATH=src python3 scripts/build_doctrine_projection.py --check-paper-module-corpus
PYTHONPATH=src .venv/bin/python -m pytest -p no:cacheprovider --basetemp=/tmp/microcosm-corpus-readiness-mathlib-absence-tests -q tests/test_corpus_readiness_mathlib_absence_gate.py
jq '{edge_count:(.relationships.edges|length), mermaid_status:.paper_module_payload.generated_projections.mermaid.status, atlas_status:.paper_module_payload.generated_projections.atlas_card.status, source_authority:.paper_module_payload.source_authority, unresolved_selective_relation_count:(.relationships.unpopulated_selective_relations|length)}' paper_modules/corpus_readiness_mathlib_absence_gate.json

The fixture command writes the public corpus-readiness board, the result record, and the validation result record. The bundle command validates the exported source-module manifest and the metadata-only runtime result record. The corpus check and the jq query over the structured source record confirm that the bundle-derived projection is current without hand-editing generated JSON. The focused test keeps the Mathlib absence boundary, consumer gate cases, source-module digest checks, private-path rewrite policy, and scope-boundary behavior from regressing.

Passing these commands does not establish that Mathlib is installed, rerun Lean/Lake, validate downstream formal-result correctness, benchmark a corpus, authorize external model access, or approve launch; it only shows that the bounded fixture and exported-bundle result records preserve the declared readiness boundary.

Scope boundary

Scope limit

This component is an algorithmic projection over copied source material, not a Lean/Lake rerun and not Mathlib proof authority. Its strongest public claim is that a fixture and exported bundle agree about corpus readiness, Mathlib absence, blocked consumers, copied source-module digests, metadata-only result records, and negative leakage guards. It does not establish formal-result correctness, claim Mathlib is available, benchmark a corpus, expose proof/provider/private bodies, call a provider, change source files, or include launch operations.

Scope limit

The JSON bundle proves a public corpus-readiness boundary only: copied corpus/toolchain rows, absent-Mathlib blocking, consumer gate decisions, source-module digest coupling, metadata-only result records, and negative leakage guards. Mermaid availability reflects bundle edges, while the Atlas row still waits on the component-atlas owner lane. This module does not establish Mathlib is installed, rerun Lean or Lake, validate formal-result correctness, benchmark corpus quality, authorize retrieval or tactic routing, use external model services, expose private proof bodies, change source records, or approve launch.

Result record Shape

The first-wave result record contains corpus_count: 7, consumer_case_count: 7, allowed_case_ids, blocked_case_ids, absent_corpus_ids, mathlib_lake_project_import_available: false, body_in_receipt: false, the scope limit, and five observed negative cases:

  • mathlib_available_without_probe
  • consumer_skips_readiness_gate
  • private_corpus_source_ref
  • proof_body_leakage
  • release_overclaim

The exported runtime result record records source_module_import_count: 4, copied_source_artifact_count: 4, source_modules_pass: true, and the same metadata-only result record boundary.
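A reader-side consistency check over a first-wave record of this shape could be sketched as below. The field names are taken from the list above except `negative_case_ids`, which is a hypothetical key, and the rule that allowed and blocked ids partition the consumer cases is an assumption inferred from the 1 + 6 = 7 counts reported on this page.

```python
# The five negative case ids listed above.
NEGATIVE_CASE_IDS = {
    "mathlib_available_without_probe",
    "consumer_skips_readiness_gate",
    "private_corpus_source_ref",
    "proof_body_leakage",
    "release_overclaim",
}


def record_shape_problems(record: dict) -> list:
    """Return human-readable problems; an empty list means the shape is consistent."""
    problems = []
    if record.get("body_in_receipt") is not False:
        problems.append("body_in_receipt must be false")
    if record.get("mathlib_lake_project_import_available") is not False:
        problems.append("Mathlib lane is reported as available")
    allowed = set(record.get("allowed_case_ids", []))
    blocked = set(record.get("blocked_case_ids", []))
    if allowed & blocked:
        problems.append("a case is both allowed and blocked")
    # Assumption: every consumer case is either allowed or blocked.
    if len(allowed) + len(blocked) != record.get("consumer_case_count"):
        problems.append("allowed + blocked does not equal consumer_case_count")
    missing = NEGATIVE_CASE_IDS - set(record.get("negative_case_ids", []))
    if missing:
        problems.append("missing negative cases: " + ", ".join(sorted(missing)))
    return problems
```

A clean result from this check shows only that the record is internally consistent; it carries the same scope limit as the record itself.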

Scope boundary

This is a source-backed corpus readiness boundary with copied source corpus/toolchain material, not Lean/Lake execution, Mathlib availability, theorem-proof authority, corpus benchmark authority, provider authority, or launch-scope decision.

Mathematical Strategy Atlas Hypothesis Scorer · Picks a first-guess proof strategy from a problem's tags and flags any it cannot map. · 3/5

Does Before any proof is attempted, this component looks at a math problem's feature tags and writes down its first-guess strategy (for example, "this looks like an if-and-only-if, so split it both ways"), and flags anything it cannot map as an explicit "no strategy matched" instead of a silent failure. The chosen opening move, why it was chosen, and the cases it could not map are all recorded in machine-readable result records.

Scope limit It only projects pre-oracle strategy-hypothesis and retrieval mechanics; it does not run Lean/Lake, prove theorems, establish domain or formal-result correctness, reveal oracle labels, expose proof bodies, use external model services, tune on test answers, or include launch operations.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.mathematical_strategy_atlas_hypothesis_scorer run --input fixtures/first_wave/mathematical_strategy_atlas_hypothesis_scorer/input --out receipts/first_wave/mathematical_strategy_atlas_hypothesis_scorer

Evidence · Computed projection · evidence 3/5 · Source-faithful refactor

formal-methods · theorem-proving · lean

Source Design note · Source atlas

Paper module Mathematical Strategy Atlas

mathematical_strategy_atlas_hypothesis_scorer is the public pre-oracle strategy layer for Microcosm formal-math work. It turns problem feature tags into an explicit strategy hypothesis before premise retrieval or proof execution, then records the result as redacted result records.

The point is not to prove anything. The point is to make the first mathematical move inspectable: an iff_goal shape selects iff_split, a recursive list shape selects recursive_data_induction, arithmetic normalization selects the arithmetic lens, and unmapped shapes become a typed STRATEGY_SELECTION_MISS instead of a hidden failure mode.

The current body-floor import carries eight copied source bodies: the prover graph benchmark harness, the provider result record reducer, their strategy-boundary regression tests, the compute-provider strategy classification standard, and three public runtime artifacts from PROVER_PROVIDER_CONTEXT_SWEEP_20260510_v0 (strategy_cards.json, strategy_hypothesis_set.json, and prover_skill_atlas.json). They live in source_artifacts/ under both the first-wave fixture input and the exported bundle; result records carry refs, counts, hashes, anchors, and verdicts instead of body text.

Purpose

A proof search has to start somewhere. Before any premise is retrieved or any tactic is run, an agent has already committed to a first move: a goal shape, a lens, a family of tactics it expects to use. That choice is usually implicit, buried inside a model call or a prompt. This component exists to pull it into the open. The single question it answers is: for a given problem shape, which strategy did the system pick first, and on what visible evidence?

The interesting part is what the answer is allowed to depend on. The scorer never sees the oracle's expected strategy, the ground-truth proof, or any provider output. It works only from public problem features and a strategy atlas of trigger features, negative triggers, and retrieval-expansion terms. The selected strategy is therefore a hypothesis, recomputed from inputs a cold reader can also read, not a result borrowed from the answer key.

That constraint is what the page guards. The common failure mode for a "strategy classifier" is to bake the answer in: declare the chosen strategy as a plain label, or score it on shallow feature overlap that happens to line up with the known-good label. The component rejects both. A declared selection must match the score the scorer recomputes from evidence, and a strategy chosen on overlap alone is a typed negative case rather than a pass.

Shape

When changing runtime behavior or the claim envelope, use the local component standard, standards/std_microcosm_mathematical_strategy_atlas_hypothesis_scorer.json; the general paper-module contract remains standards/std_microcosm_paper_module.json.

The diagram below traces the scorer's runtime flow inside that projection: how public inputs become a per-candidate score, how a selection or a typed miss is chosen, and how the result is recomputed and written as metadata-only result records under the scope limit.

[Diagram: scorer runtime flow. Public inputs (strategy atlas terms, problem feature tags with the oracle hidden, candidate strategy ids) feed per-candidate scoring and ranking; a positive score yields selected_strategy_id with score components, otherwise STRATEGY_SELECTION_MISS; both paths are recomputed against the declared selection, score, and ranking, then written as metadata-only result records under the scope limit.]

Source refs

trigger / negative / retrieval terms
strategy_atlas.json
feature tags, oracle hidden
problem_features.json
candidate strategy ids
hypothesis_cases.json
Diagram source
flowchart TD
  subgraph Inputs["Public inputs"]
    atlas["strategy_atlas.json trigger / negative / retrieval terms"]
    features["problem_features.json feature tags, oracle hidden"]
    cases["hypothesis_cases.json candidate strategy ids"]
  end
  subgraph Scoring["Per-candidate scoring"]
    score["score = trigger_hits x4 - negative_hits x3 + retrieval_bonus (cap 2)"]
    rank["rank positive scores tie-break by order, then id"]
  end
  select{"any positive score?"}
  selected["selected_strategy_id + score components"]
  miss["STRATEGY_SELECTION_MISS (unknown)"]
  recheck["recompute vs declared selection / score / ranking"]
  result_records["metadata-only result records refs, counts, hits, verdicts"]
  ceiling["Scope limit no Lean/Lake, oracle labels, external model access, or launch"]
  atlas --> score
  features --> score
  cases --> score
  score --> rank
  rank --> select
  select -- yes --> selected
  select -- no --> miss
  selected --> recheck
  miss --> recheck
  recheck --> result_records
  result_records --> ceiling

The generated instance currently exposes 19 concrete relationships.edges: two subject edges for the component and mechanism, one governing concept edge, six principle edges, six axiom edges, three sibling paper-module dependency edges, and one resolved code-locus edge into src/microcosm_core/organs/mathematical_strategy_atlas_hypothesis_scorer.py. relationships.unpopulated_selective_relations is empty, so the module-level unresolved selective-relation count available from this instance is 0.

Runtime evidence enters through the fixture input fixtures/first_wave/mathematical_strategy_atlas_hypothesis_scorer/input, the exported bundle examples/mathematical_strategy_atlas_hypothesis_scorer/exported_mathematical_strategy_atlas_bundle, and their copied source_artifacts/ and source_module_manifest.json bundles. The focused test file is tests/test_mathematical_strategy_atlas_hypothesis_scorer.py; result records include receipts/first_wave/mathematical_strategy_atlas_hypothesis_scorer/mathematical_strategy_atlas_result.json, mathematical_strategy_atlas_board.json, mathematical_strategy_atlas_validation_receipt.json, receipts/acceptance/first_wave/mathematical_strategy_atlas_hypothesis_scorer_fixture_acceptance.json, and runtime-shell exported-bundle validation result records.

The honest ceiling is narrow by design: this module can say that public pre-oracle strategy hypotheses, retrieval-lens metadata, copied public source tool/standard/runtime bodies, source-artifact digests, and negative cases are inspectable. It cannot say that Lean or Lake ran, that a theorem was proved, that oracle labels or model-output data are visible, that benchmark performance is certified, that public sharing is approved, that launch is approved, or that the private root has been made public-safe.

How it works

The scorer reads three public inputs: a strategy atlas, a set of problem features, and a set of hypothesis cases. For each candidate strategy in a case it computes a single integer score from three terms. Each problem feature that matches a strategy's trigger_features adds four points. Each feature that matches the strategy's negative_triggers subtracts three. Retrieval-query terms that appear in the strategy's expansion terms add one point each, capped at two. Plain feature overlap is recorded as a diagnostic count but is deliberately kept out of the score.
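The scoring rule above can be sketched in a few lines. This is a minimal illustration, not the component's code: the `Strategy` dataclass, the `score_strategy` function, and the example feature and term names are assumptions made for the sketch; only the weights (4, 3, cap 2) and the field names `trigger_features` and `negative_triggers` come from the description. The diagnostic overlap count is left out because the page does not define it.

```python
from dataclasses import dataclass

TRIGGER_WEIGHT = 4    # points per matched trigger feature
NEGATIVE_WEIGHT = 3   # points removed per matched negative trigger
RETRIEVAL_CAP = 2     # the retrieval bonus never exceeds this


@dataclass(frozen=True)
class Strategy:
    # Hypothetical container; the real atlas schema may differ.
    strategy_id: str
    trigger_features: frozenset
    negative_triggers: frozenset
    expansion_terms: frozenset


def score_strategy(strategy, features, query_terms):
    """Return the score plus the hits behind it, so it can be re-derived by hand."""
    trigger_hits = sorted(set(features) & strategy.trigger_features)
    negative_hits = sorted(set(features) & strategy.negative_triggers)
    retrieval_hits = sorted(set(query_terms) & strategy.expansion_terms)
    retrieval_bonus = min(len(retrieval_hits), RETRIEVAL_CAP)
    score = (TRIGGER_WEIGHT * len(trigger_hits)
             - NEGATIVE_WEIGHT * len(negative_hits)
             + retrieval_bonus)
    return {
        "strategy_id": strategy.strategy_id,
        "score": score,
        "trigger_hits": trigger_hits,
        "negative_hits": negative_hits,
        "retrieval_hits": retrieval_hits,
        "retrieval_bonus": retrieval_bonus,
    }


# Illustrative atlas entry; the trigger and term values are made up.
iff_split = Strategy(
    strategy_id="iff_split",
    trigger_features=frozenset({"iff_goal"}),
    negative_triggers=frozenset({"recursive_list"}),
    expansion_terms=frozenset({"iff", "mp", "mpr"}),
)

result = score_strategy(iff_split, {"iff_goal"}, {"iff", "mp", "mpr"})
print(result["score"])  # one trigger (4), no negatives, three terms capped at 2
```

With one trigger hit and three matching retrieval terms the score is 4 + 2 = 6; adding a matching negative trigger would drop it to 3, which shows how a veto outweighs keyword volume.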

Selection is then a deterministic sort. Only strategies with a positive score are eligible. Among those, the scorer ranks by score (highest first), breaking ties by the strategy's declared order and then its id, and takes the top row. If no candidate scores positive, the case resolves to the typed STRATEGY_SELECTION_MISS rather than guessing. The output for each case carries the selected id, the score, the component breakdown, the ranked candidate scores, and the trigger, negative, and retrieval hits that produced them, so the choice can be re-derived by hand.
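A sketch of the selection step, assuming each candidate row already carries a computed `score`, a declared `order`, and a `strategy_id`. The function name `select_strategy` and the row layout are hypothetical; whether the real ranked list also keeps non-positive rows is not stated on this page, so this sketch ranks eligible rows only.

```python
STRATEGY_SELECTION_MISS = "STRATEGY_SELECTION_MISS"


def select_strategy(candidates):
    """Pick the top positive-scoring candidate, or return the typed miss.

    candidates: list of dicts with 'strategy_id', 'score', and 'order'.
    """
    eligible = [c for c in candidates if c["score"] > 0]
    # Highest score first; ties fall back to declared order, then id.
    ranked = sorted(
        eligible,
        key=lambda c: (-c["score"], c["order"], c["strategy_id"]),
    )
    if not ranked:
        return STRATEGY_SELECTION_MISS, ranked
    return ranked[0]["strategy_id"], ranked


candidates = [
    {"strategy_id": "recursive_data_induction", "score": 6, "order": 1},
    {"strategy_id": "iff_split", "score": 6, "order": 0},
    {"strategy_id": "arithmetic_lens", "score": -3, "order": 2},
]
selected, ranked = select_strategy(candidates)
print(selected)
```

Because the sort key is total (score, order, id), the same inputs always give the same winner, which is what lets a reader re-derive the choice by hand.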

The weights matter because they encode the design intent. Trigger matches are worth more than retrieval matches, so a strategy is chosen mainly for the shape it claims to handle, not for how many search terms happen to coincide. Negative triggers can veto a strategy that looks superficially apt. The retrieval cap stops a strategy from winning on keyword volume alone. A fixture that tries to score on overlap without these terms is caught by the superficial_overlap_only_scoring negative case.

The same recomputation is what enforces honesty. When a case declares its own selected_strategy_id, score, classifier, retrieval_bonus, or candidate_scores, the component recomputes each from the evidence and reports a stale-declaration finding on any mismatch. Declaring the selected strategy as a bare label, with nothing for the scorer to check against, is itself rejected: a label with no derivable evidence is not strategy evidence. Alongside this, the copied source artifacts are checked for leakage policy, so the strategy cards, hypothesis set, and skill atlas stay pre-oracle, free of proof bodies, and free of oracle strategy ids.
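The recompute-versus-declared check might look like the sketch below. The checked field names are the ones listed above; the finding codes, the `has_evidence` flag, and the function name are assumptions for illustration, not the component's actual vocabulary.

```python
CHECKED_FIELDS = (
    "selected_strategy_id",
    "score",
    "classifier",
    "retrieval_bonus",
    "candidate_scores",
)


def stale_declaration_findings(declared, recomputed, has_evidence=True):
    """Compare what a case declares with what the scorer recomputes."""
    # A bare label with nothing to recompute from is rejected outright.
    if "selected_strategy_id" in declared and not has_evidence:
        return [{"code": "BARE_LABEL_WITHOUT_EVIDENCE",
                 "field": "selected_strategy_id"}]
    findings = []
    for field in CHECKED_FIELDS:
        if field in declared and declared[field] != recomputed.get(field):
            findings.append({
                "code": "STALE_DECLARATION",
                "field": field,
                "declared": declared[field],
                "recomputed": recomputed.get(field),
            })
    return findings


declared = {"selected_strategy_id": "iff_split", "score": 9}
recomputed = {"selected_strategy_id": "iff_split", "score": 6}
findings = stale_declaration_findings(declared, recomputed)
print(findings)
```

Only fields the case actually declares are compared, so an undeclared field never produces a finding; a declared field that disagrees always does.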

Reader Evidence Routing

Read this module as a pre-oracle strategy-hypothesis audit, not as a proof result. The primary reader path is:

  • Start with strategy_atlas.json, problem_features.json, and hypothesis_cases.json to see how public feature tags select a strategy id before retrieval or proof execution.
  • Check source_module_manifest.json and the copied source_artifacts/ bodies to verify that the imported source bodies are public tool/runtime bodies with exact digests, required anchors, and body-floor result records.
  • Inspect the fixture and exported-bundle result records to confirm that strategy ids, retrieval-term effects, oracle-label exclusion, source-card consistency, and negative cases are checked without exposing proof bodies or model-output data.
  • Use the structured source record only for structural lattice proof: it confirms bundle-backed subjects, code loci, doctrine refs, and dependency edges; it does not establish the scorer's correctness or any theorem.

Public Inputs

  • strategy_atlas.json defines the known strategy enum, match features, and retrieval-term additions.
  • problem_features.json carries synthetic public problem features with oracle labels hidden.
  • hypothesis_cases.json validates deterministic pre-oracle strategy scoring.
  • source_module_manifest.json binds copied source body files to exact source refs, SHA-256 digests, byte counts, line counts, material classes, and required anchors.
  • Negative cases reject unknown strategy ids, proof bodies, oracle labels, post-oracle strategy selection, and launch/proof/provider overclaims.
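The manifest binding described above (digest, byte count, line count, required anchors) can be checked with the standard library alone. This is a sketch under assumed key names (`sha256`, `byte_count`, `line_count`, `required_anchors`); the real source_module_manifest.json layout is not shown on this page.

```python
import hashlib


def verify_artifact(body: bytes, row: dict) -> list:
    """Return a list of mismatch labels; an empty list means the row holds."""
    findings = []
    if hashlib.sha256(body).hexdigest() != row["sha256"]:
        findings.append("digest_mismatch")
    if len(body) != row["byte_count"]:
        findings.append("byte_count_mismatch")
    text = body.decode("utf-8")
    if len(text.splitlines()) != row["line_count"]:
        findings.append("line_count_mismatch")
    for anchor in row["required_anchors"]:
        if anchor not in text:
            findings.append(f"missing_anchor:{anchor}")
    return findings


# Stand-in artifact body and manifest row, invented for the example.
body = b"alpha\nbeta\n"
row = {
    "sha256": hashlib.sha256(body).hexdigest(),
    "byte_count": len(body),
    "line_count": 2,
    "required_anchors": ["beta"],
}
print(verify_artifact(body, row))              # unchanged body
print(verify_artifact(b"alpha\ngamma\n", row))  # tampered body
```

Checking the count fields alongside the digest is redundant for integrity, but it gives a reader a cheaper first signal about what changed.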

Result records

The component emits:

  • mathematical_strategy_atlas_result.json
  • mathematical_strategy_atlas_board.json
  • mathematical_strategy_atlas_validation_receipt.json
  • mathematical_strategy_atlas_hypothesis_scorer_fixture_acceptance.json

Runtime-shell exported bundle validation writes exported_mathematical_strategy_atlas_bundle_validation_result.json.

Prior Art Grounding

The strategy atlas is grounded in the formal-methods practice of separating problem-shape classification from proof execution. Lean's tactic model, as introduced in Theorem Proving in Lean 4, gives the immediate precedent: proof work is often arranged around tactics chosen for a goal shape, while the kernel checks the final proof state. The mathlib overview also motivates explicit retrieval terms and domain tags because a large formal library is navigated by topic, structure, and reusable theorem families.

The atlas is also adjacent to hammer-style premise and method selection, such as Isabelle Sledgehammer, where a front-end tool searches for useful facts or proof methods before replay. This module keeps the pattern pre-oracle and metadata-only: it records why a first strategy hypothesis was selected, not whether the proof can be completed.

Validation Result record Path

Run from microcosm-substrate:

PYTHONPATH=src ../repo-python -m microcosm_core.organs.mathematical_strategy_atlas_hypothesis_scorer run \
  --input fixtures/first_wave/mathematical_strategy_atlas_hypothesis_scorer/input \
  --out /tmp/microcosm-mathematical-strategy-atlas-hypothesis-scorer/fixture \
  --card
PYTHONPATH=src ../repo-python -m microcosm_core.organs.mathematical_strategy_atlas_hypothesis_scorer run-strategy-bundle \
  --input examples/mathematical_strategy_atlas_hypothesis_scorer/exported_mathematical_strategy_atlas_bundle \
  --out /tmp/microcosm-mathematical-strategy-atlas-hypothesis-scorer/bundle \
  --card
PYTHONPATH=src ../repo-python -m pytest -p no:cacheprovider tests/test_mathematical_strategy_atlas_hypothesis_scorer.py -q
PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

A green result record proves only pre-oracle strategy-hypothesis metadata, copied public source tool bodies, source artifact digests, and negative-case enforcement; it does not run Lean or Lake, prove formal-result correctness, reveal oracle labels, export proof bodies, use external model services, certify benchmark performance, authorize public sharing, or include launch operations.

Scope boundary

Scope limit

The atlas is metadata and strategy-hypothesis machinery only. It does not run Lean or Lake, claim formal-result correctness, reveal oracle strategy labels, expose proof bodies, use external model services, tune on test answers, include launch operations, or make Mathlib-dependent proof claims. The copied runtime artifacts are public strategy traces, not oracle labels, model-output data, or proof bodies.

Scope limit

This module supports only the reader-verifiable claim that public strategy-hypothesis metadata, copied source tool bodies, source artifact digests, and negative cases can be checked before oracle labels or proof execution. It does not run Lean or Lake, prove formal-result correctness, reveal oracle labels, expose proof bodies, use external model services, certify benchmark performance, authorize public sharing, include launch operations, or make Mathlib-dependent proof claims.

Tactic Portfolio Availability Probe · Maps which Lean proof tactics a recorded run marked usable before any code relies on one. · 3/5

Does It turns one captured Lean run's results into an inspectable list of which proof shortcuts ("tactics" like `rfl`, `simp`, `omega`, `aesop`) were recorded as compiling, showing at a glance which are usable and which are off before anything treats a tactic as available. In this fixture seven tactics are marked usable and `aesop` is marked failed (its recorded run hit a missing-Mathlib error). The tool reads pre-recorded status rows and checks them for honesty; it does not run Lean itself.

Scope limit It only projects and validates which tactics were recorded as compiling in one captured environment; it does not run Lean/Lake at all, prove any goal, certify domain-level conclusions, use external model services, claim benchmark performance, or include launch operations.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.tactic_portfolio_availability_probe run --input fixtures/first_wave/tactic_portfolio_availability_probe/input --out receipts/first_wave/tactic_portfolio_availability_probe --acceptance-out receipts/acceptance/first_wave/tactic_portfolio_availability_probe_fixture_acceptance.json

Evidence Computed projection · evidence 3/5 · Source-faithful refactor

formal-methods · theorem-proving · lean

Source Design note · Source atlas

Paper module Tactic Portfolio Availability

tactic_portfolio_availability_probe is the public component that turns tactic callability into an explicit artifact before routing or proof search treats a tactic as usable.

The fixture is copied from a real source system: the 2026-05-11 PROVER_PROOF_STATE_SEARCH_CURRICULUM smoke run's Lean/Std tactic affordance probe. It records compile-status rows for rfl, decide, omega, simp, simp_all, grind, native_decide, and aesop, with source digests for the run-level affordance probe, the portfolio_core_v0 tactic availability artifact, and the paired corpus-readiness boundary. The Mathlib-dependent aesop row is marked environment_fail because the paired environment probe reports mathlib_lake_project_import_available=false.

The component validates:

  • every tactic has an environment-scoped compile_status;
  • Mathlib-dependent tactics are not marked available without a passing Mathlib import probe;
  • downstream consumers reference only tactics present in the probe portfolio;
  • proof bodies, raw model-output data, benchmark claims, launch-scope decision, and non-public paths stay out of the public artifact.
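The first three checks in the list above can be sketched as a single pass over the portfolio rows. The row keys `tactic_id`, `compile_status`, `requires_mathlib`, and `availability_status` appear elsewhere on this page; the function name and the finding labels are hypothetical.

```python
def validate_portfolio(rows, mathlib_import_available, downstream_refs):
    """Return (label, tactic_id) findings; an empty list means the checks pass."""
    findings = []
    known = {row["tactic_id"] for row in rows}
    for row in rows:
        # Every tactic needs an environment-scoped compile status.
        if not row.get("compile_status"):
            findings.append(("missing_compile_status", row["tactic_id"]))
        # A Mathlib-dependent tactic cannot be available without the import.
        if (row.get("requires_mathlib")
                and row.get("availability_status") == "available"
                and not mathlib_import_available):
            findings.append(("mathlib_tactic_without_import", row["tactic_id"]))
    # Downstream consumers may only name tactics that were probed.
    for ref in downstream_refs:
        if ref not in known:
            findings.append(("unprobed_tactic_referenced", ref))
    return findings


rows = [
    {"tactic_id": "omega", "compile_status": "PASS",
     "requires_mathlib": False, "availability_status": "available"},
    {"tactic_id": "aesop", "compile_status": "FAIL",
     "requires_mathlib": True, "availability_status": "environment_fail"},
]
print(validate_portfolio(rows, False, ["omega", "linarith"]))
```

Here `linarith` is only an example of a tactic name that is absent from the probed rows.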

The generated board is a callability map and bounded evidence only. It can make target-shape routing cheaper and more honest, but it cannot prove a goal, widen Lean/Lake authority, use external model services, claim benchmark performance, or include launch operations.

The result record contract reports body_material_status=copied_non_secret_macro_body_with_provenance, tactic_availability_status=real_lean_std_tactic_affordance_probe_rows, source digests, target refs, and secret_exclusion_scan. It does not use body-redaction or non-public-state-scan grammar as product evidence.

Purpose

A tactic name is not a usable tactic. aesop is callable only if the surrounding Lean and Std environment actually carries the imports it needs; omega is callable in one project layout and not in another. Routing or proof search that trusts a bare tactic name will reach for tactics that the current environment cannot run, and then misread the resulting failure as a property of the goal rather than a property of the environment. This component answers one question: in the observed Lean/Std environment, which tactics were actually callable, and on what evidence?

The interesting part is how it treats failure. A copied probe row that reports a Lean FAIL is not flattened into a single "unavailable" verdict. When a tactic declares requires_mathlib and the paired environment probe reports that the Mathlib import is absent, the failure is classified as environment_fail with the reason MATHLIB_IMPORT_MISSING. The same Lean FAIL for a tactic that does not depend on Mathlib is classified as compile_fail. The distinction keeps a missing import from masquerading as a broken tactic, and it preserves Mathlib absence as a recorded fact about the environment rather than discarding it. A downstream router can then re-attempt the same tactic in a different environment instead of striking it off permanently.
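The three-way classification reads naturally as a small function. This sketch assumes each copied row carries `compile_status` and `requires_mathlib`, and treats any status other than PASS as a failure; the function name and the returned dict layout are illustrative.

```python
def classify_probe_row(row, mathlib_import_available):
    """Map a copied compile status to an availability verdict and reason."""
    if row["compile_status"] == "PASS":
        return {"availability_status": "available", "reason": None}
    # A failure caused by a missing Mathlib import is a fact about the
    # environment, not about the tactic.
    if row.get("requires_mathlib") and not mathlib_import_available:
        return {"availability_status": "environment_fail",
                "reason": "MATHLIB_IMPORT_MISSING"}
    return {"availability_status": "compile_fail", "reason": None}


aesop = {"tactic_id": "aesop", "compile_status": "FAIL", "requires_mathlib": True}
omega = {"tactic_id": "omega", "compile_status": "FAIL", "requires_mathlib": False}

print(classify_probe_row(aesop, mathlib_import_available=False))
print(classify_probe_row(omega, mathlib_import_available=False))
```

The same FAIL status yields two different verdicts, which is the point: a router can retry `aesop` where Mathlib is importable, while a `compile_fail` row says nothing about the environment.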

The second deliberate choice is that none of this is a measurement of quality. The component copies probe durations and bands them as fast, moderate, or slow so a router can prefer a cheaper available tactic, but the latency profile is stamped as environment-scoped, not benchmark authority. Callability and speed in one observed environment are useful for cheaper routing; they are explicitly not evidence that a tactic is correct, that a goal was proved, or that Lean was rerun by this component.

Shape

[Diagram: availability flow. Copied Lean/Std affordance probe rows (compile_status, requires_mathlib, duration_ms) and the environment probe feed the tactic portfolio availability probe. PASS becomes available with a fast / moderate / slow duration band; FAIL with requires_mathlib and Mathlib absent becomes environment_fail with reason MATHLIB_IMPORT_MISSING; any other FAIL becomes compile_fail. All three land on the availability board for target-shape routing. A downstream tactic reference is rejected if the tactic is not in the probed portfolio. The probe also emits metadata-only fixture and bundle result records, which feed the generated paper-module row and validation result records.]

Source refs

Tactic portfolio availability probe
tactic_portfolio_availability_probe
Environment probe
mathlib_lake_project_import_available
Diagram source
flowchart TD
  A["Copied Lean/Std affordance probe rows (compile_status, requires_mathlib, duration_ms)"] --> B["tactic_portfolio_availability_probe"]
  C["Environment probe mathlib_lake_project_import_available"] --> B
  B --> D{"Copied compile_status"}
  D -->|PASS| E["available band duration fast / moderate / slow"]
  D -->|FAIL + requires_mathlib + Mathlib absent| F["environment_fail reason MATHLIB_IMPORT_MISSING"]
  D -->|FAIL otherwise| G["compile_fail"]
  E --> H["Availability board for target-shape routing"]
  F --> H
  G --> H
  I["Downstream tactic reference"] --> J{"Tactic in probed portfolio?"}
  J -->|no| K["Rejected: unprobed tactic referenced"]
  J -->|yes| H
  B --> L["metadata-only fixture and bundle result records no proof, Lean, or provider bodies"]
  L --> M["Generated paper-module row and validation result records"]

The flow is deliberately smaller than the generated doctrine-lattice graph.

Reader Evidence Routing

Read this page in three passes:

  1. Start with the bundle source row at core/paper_module_capsules.json::paper_modules[40:paper_module.tactic_portfolio_availability]. It names the public component subject, mechanism subject, resolved code locus, Microcosm concept, governing principles, axioms, and sibling paper-module dependencies that generate the relationship edges.
  2. Inspect the runtime system at src/microcosm_core/organs/tactic_portfolio_availability_probe.py. The load-bearing symbols are run, run_availability_bundle, _build_result, _write_receipts, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING; those are the code-loci symbols that make the paper module about an executable component instead of a prose topic.
  3. Reproduce the evidence floor with the fixture input fixtures/first_wave/tactic_portfolio_availability_probe/input, the exported bundle examples/tactic_portfolio_availability_probe/exported_tactic_portfolio_availability_bundle, the focused test tests/test_tactic_portfolio_availability_probe.py, and the paper-module corpus check. Treat the result records as environment-scoped tactic-callability evidence only; validation result records do not widen the proof boundary, scope limit, launch posture, provider posture, or benchmark posture.

Prior Art Grounding

The module is patterned after feature-detection probes and proof-assistant tactic inventories. GNU Autoconf's configure workflow established the habit of testing local capability before relying on it; Lean's tactic documentation shows that tactic use is environment- and goal-sensitive, so a tactic name is not enough to justify downstream routing. This component applies that older probe discipline to Microcosm: it records which tactics were callable in the observed Lean/Std environment and preserves Mathlib-dependent absence as evidence, without treating callability as proof quality.

Prior-art anchors:

  • GNU Autoconf feature/configuration probing: https://ftp.gnu.org/old-gnu/Manuals/autoconf-2.57/html_chapter/autoconf.html
  • Lean 4 tactic documentation: https://lean-lang.org/theorem_proving_in_lean4/Tactics/

Primary commands:

Validation Result record Path

From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:

The expected projection row is paper_module.tactic_portfolio_availability with 18 generated relationship edges, no unpopulated selective relations, Mermaid status available_from_capsule_edges, and Atlas status linked_from_capsule_edges. These checks validate environment-scoped tactic availability rows and bundle result records only; they do not turn callability into proof quality, benchmark performance, Mathlib proof authority, or launch-scope decision.

Scope boundary

Scope limit

The JSON bundle and generated row prove only environment-scoped tactic callability evidence: copied Lean/Std tactic affordance rows, compile-status rows, Mathlib absence evidence, downstream tactic-reference checks, source digests, secret-exclusion checks, negative cases, and validation result records. They do not prove formal-result correctness, expand Lean or Lake authority, use external model services, claim benchmark performance, export non-public paths, include launch operations or public sharing, or treat tactic callability as proof quality.

Scope limit

This component is environment-scoped tactic callability evidence only. It does not establish formal-result correctness, expand Lean/Lake authority, use external model services, claim benchmark performance, export non-public paths, include launch operations, or treat tactic callability as proof quality.

Target Shape Tactic Routing Gate · Records an allow-or-reject decision and reason for each proof tactic before any proof runs. · 3/5

Does Before a proof is attempted, this component checks a list of candidate proof tactics for a given goal and writes down an allow-or-reject decision, with a plain reason, for each one. Rejections fall into three kinds: the tactic isn't actually available in the declared environment, it was never listed in the environment's tested set of tactics, or it simply doesn't fit the kind of goal being proved. The resulting record shows, tactic by tactic, exactly what was admitted or blocked and why, instead of an opaque "we tried these" claim. It only inspects and records the routing decision over references that already exist; it never runs a prover or proves anything itself.

Scope limit It only inspects and records the projection mechanics of pre-execution tactic-routing references, emitting per-tactic allow/reject decisions with reasons. It does not run Lean/Lake, does not establish or judge the correctness of any goal, emits no proof bodies, makes no external model access, performs no post-execution route selection, reports no benchmark claims or maturity, and excludes launch.

Run
PYTHONPATH=src python3 -m microcosm_core.cli target-shape-tactic-routing-gate run-routing-bundle --input examples/target_shape_tactic_routing_gate/exported_target_shape_tactic_routing_bundle --out receipts/runtime_shell/demo_project/organs/target_shape_tactic_routing_gate

Evidence Computed projection · evidence 3/5 · Source-faithful refactor

formal-methods · theorem-proving · lean

Source Design note · Source atlas

Paper module Target Shape Tactic Routing

target_shape_tactic_routing_gate is the public Microcosm component for the pre-execution tactic admissibility layer.

It turns real problem-domain, failure-class, and graph-update candidate refs from the formal-math evaluation pipeline into route decisions: which tactics are admitted, which are rejected as unavailable, which are rejected as unprobed, and which are rejected because they do not match the declared goal shape.

Purpose

A proof attempt is expensive, and most of that cost is spent on tactics that were never going to work: tactics the environment cannot run, tactics absent from the probe portfolio, or tactics that do not match the shape of the goal. This component answers one question before any Lean call is made: given the target shape and the current availability probe, which tactics may a route even attempt?

The decision is deliberately made early. Routing happens before execution, so a case that carries Lean result records, execution results, or a post-execution stage is rejected outright rather than trusted. The point is to decide admissibility from evidence that already exists, not from the outcome of the attempt the gate is meant to filter.

What is unusual is that the gate recomputes the choice rather than accepting the declared one. Each target shape carries a small preferred-tactic order (for example omega for integer linear arithmetic, decide for closed natural-number decisions). The gate walks that order, skips any preferred tactic that is unprobed or unavailable, records why it skipped, and falls back to the next allowed candidate or to a default safe order for shapes it does not recognise. A route whose declared selection disagrees with this computed preference is flagged rather than honoured. The route is a claim about what should run; the gate treats it as something to check, not something to believe.
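The walk over the preferred order can be sketched as follows. Only the two pairings named above (omega for integer linear arithmetic, decide for closed natural-number decisions) come from the text; the shape keys, the rest of each order, the default safe order, and the finding code are assumptions for illustration.

```python
# Hypothetical preference map and default; the real ones live in the gate.
SHAPE_PREFERENCES = {
    "integer_linear_arithmetic": ["omega", "decide"],
    "closed_nat_decision": ["decide", "rfl"],
}
DEFAULT_SAFE_ORDER = ["rfl", "decide", "simp_all"]


def shape_preferred_selection(target_shape, probed, available, allowed):
    """Walk the preferred order, recording why each skipped tactic was skipped."""
    known_shape = target_shape in SHAPE_PREFERENCES
    order = SHAPE_PREFERENCES.get(target_shape, DEFAULT_SAFE_ORDER)
    skipped = []
    for tactic in order:
        if tactic not in probed:
            skipped.append((tactic, "unprobed"))
        elif tactic not in available:
            skipped.append((tactic, "unavailable"))
        elif tactic not in allowed:
            skipped.append((tactic, "not_allowed_for_case"))
        else:
            return {"selected": tactic, "skipped": skipped,
                    "used_default_order": not known_shape}
    return {"selected": None, "skipped": skipped,
            "used_default_order": not known_shape}


def declared_selection_finding(declared, computed):
    """Flag a route whose declared tactic disagrees with the computed one."""
    if declared != computed["selected"]:
        return {"code": "DECLARED_SELECTION_MISMATCH",
                "declared": declared, "computed": computed["selected"]}
    return None


choice = shape_preferred_selection(
    "integer_linear_arithmetic",
    probed={"omega", "decide", "aesop"},
    available={"decide"},
    allowed={"omega", "decide"},
)
print(choice)
print(declared_selection_finding("omega", choice))
```

In this example `omega` is preferred but unavailable, so the gate records the skip, falls back to `decide`, and flags a route that still declares `omega`.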

Shape

[Diagram: routing gate structure. The JSON bundle generates the structured source record and binds the runtime component. The component reads the tactic probe portfolio (available/unavailable tactic ids) and the target-shape route cases (pre_execution selected tactics); together with the copied Ring2 source artifacts (4 body imports, body_in_receipt=false) these feed the decision "Route admissible before proof execution?". Decisions and focused tests (negative cases and digest checks) produce the result records (result, board, validation, sign-off), which sit under the scope limit: no Lean/Lake, proof, provider, post-execution, or launch.]

Source refs

JSON bundle
paper_module.target_shape_tactic_routing
Generated structured source record
paper_modules/target_shape_tactic_routing.json
Runtime component
target_shape_tactic_routing_gate.py
Diagram source
flowchart TD
  Bundle["JSON bundle paper_module.target_shape_tactic_routing"]
  structured_source_record["Generated structured source record paper_modules/target_shape_tactic_routing.json"]
  Component["Runtime component target_shape_tactic_routing_gate.py"]
  Portfolio["Tactic probe portfolio available/unavailable tactic ids"]
  Routes["Target-shape route cases pre_execution selected tactics"]
  SourceFloor["Copied Ring2 source artifacts 4 body imports, body_in_receipt=false"]
  Decisions{"Route admissible before proof execution?"}
  Result_records["Result records result, board, validation, sign-off"]
  Tests["Focused tests negative cases and digest checks"]
  Ceiling["Scope limit no Lean/Lake, proof, provider, post-execution, launch"]
  Bundle --> structured_source_record
  Bundle --> Component
  Component --> Portfolio
  Component --> Routes
  Portfolio --> Decisions
  Routes --> Decisions
  SourceFloor --> Decisions
  Decisions --> Result_records
  Tests --> Result_records
  Result_records --> Ceiling

Technical Mechanism

The named mechanism mechanism.target_shape_tactic_routing_gate.validates_public_tactic_routing_boundary is a fail-closed scorer over two public input planes: the tactic probe portfolio and the target-shape route cases. _build_result loads the fixture or exported bundle payloads, scans the inputs and copied source artifacts for forbidden body material, derives known/available/unavailable tactic sets, scores every route case, checks copied Ring2 source-artifact digests, and emits metadata-only result, board, validation, and sign-off result records.

For each route case, _decision_for_tactic rejects a candidate before selection if the tactic id is absent from the public probe portfolio, marked unavailable, or outside the case's declared allowed_tactic_ids. Only a tactic that is probed, available, and target-shape-admissible can receive TARGET_SHAPE_ADMISSIBLE. _shape_preferred_selection then applies the local target-shape preference map, records the unknown-shape default fallback when no specific map exists, and records the preferred-unavailable fallback when the first preferred tactic is known but not usable. _route_integrity_findings turns any unavailable admission, unprobed admission, post-execution route, or declared-selection mismatch into typed findings.
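A sketch of the per-tactic decision and the route integrity check, written fail-closed. `TARGET_SHAPE_ADMISSIBLE`, `route_stage`, `selected_tactic_id`, and `allowed_tactic_ids` are named on this page; the rejection codes and the function bodies are assumptions, not the component's `_decision_for_tactic` or `_route_integrity_findings`.

```python
def decision_for_tactic(tactic_id, probed, available, allowed):
    """Reject first; admit only a probed, available, shape-allowed tactic."""
    if tactic_id not in probed:
        return "REJECTED_UNPROBED"            # hypothetical code
    if tactic_id not in available:
        return "REJECTED_UNAVAILABLE"         # hypothetical code
    if tactic_id not in allowed:
        return "REJECTED_SHAPE_MISMATCH"      # hypothetical code
    return "TARGET_SHAPE_ADMISSIBLE"


def route_findings(case, probed, available):
    """Typed findings for one route case; an empty list means it is admitted."""
    findings = []
    if case["route_stage"] != "pre_execution":
        findings.append("POST_EXECUTION_ROUTE")
    decision = decision_for_tactic(
        case["selected_tactic_id"], probed, available,
        set(case["allowed_tactic_ids"]),
    )
    if decision != "TARGET_SHAPE_ADMISSIBLE":
        findings.append(decision)
    return findings


# Portfolio as described for the exported bundle: four available, aesop not.
available = {"decide", "omega", "simp_all", "rfl"}
probed = available | {"aesop"}

good = {"route_stage": "pre_execution", "selected_tactic_id": "omega",
        "allowed_tactic_ids": ["omega", "decide"]}
late = {"route_stage": "post_execution", "selected_tactic_id": "aesop",
        "allowed_tactic_ids": ["aesop"]}
print(route_findings(good, probed, available))
print(route_findings(late, probed, available))
```

The order of the checks matters for the recorded reason: an unprobed tactic is reported as unprobed even if it would also fail the shape check.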

The proof consumer is tests/test_target_shape_tactic_routing_gate.py: it asserts seven pre_execution route cases, shape-preferred selection for the real Ring2 cases, unknown-shape and unavailable-Mathlib fallback behavior, rejection of mutated shape and availability inputs, exported-bundle sign-off, four copied source artifacts with digest verification, compact card omission of the full routing board, and result record text without non-public paths or body fields. Those tests consume the same fixture and exported-bundle surfaces named by the mechanism row, so this page's evidence is the runnable route-reference and result record contract rather than a prose-only claim.

The governing lattice stays explicit: the bundle binds the module to concept.formal_math_and_proof_witness_bundle, principles P-1, P-2, P-3, P-6, P-8, and P-9, axioms AX-1, AX-2, AX-5, AX-7, and AX-8, and dependency modules for tactic portfolio availability, formal-math readiness, proof-diagnostic evidence, verifier-trace repair, and formal evidence-cell anchor resolution. The standard narrows that lattice to one allowed claim: public pre-execution route cases may admit only tactics that were both probed and available before proof execution. The same standard forbids widening this mechanism into formal-result correctness, Lean/Lake execution, external model access, proof or provider body export, post-execution route authority, publishing-scope decision, or launch-scope decision.

Evidence/accounting:

  • Bundle authority: core/paper_module_capsules.json::paper_modules[41:paper_module.target_shape_tactic_routing] names source_authority: json_capsule, subjects component:target_shape_tactic_routing_gate and mechanism.target_shape_tactic_routing_gate.validates_public_tactic_routing_boundary, the resolved code locus src/microcosm_core/organs/target_shape_tactic_routing_gate.py, and generated projection statuses mermaid.status: available_from_capsule_edges plus atlas_card.status: linked_from_capsule_edges.
  • Generated structured source record: paper_modules/target_shape_tactic_routing.json carries relationships.edges for the bundle subjects, concept/principle/axiom refs, dependency paper modules, and code locus; relationships.unpopulated_selective_relations: []; and scope boundaries that the JSON row does not establish runtime correctness, launch-scope decision, or whole-system completeness.
  • Runtime contract: standards/std_microcosm_target_shape_tactic_routing_gate.json limits the allowed claim to pre-execution tactic admission from probed, available tactics; its required_fields bind tactic_portfolio_availability.tactics[].tactic_id, availability_status, target_shape_routes.route_cases[].target_shape, allowed_tactic_ids, selected_tactic_id, and route_stage.
  • Source-body accounting: examples/target_shape_tactic_routing_gate/exported_target_shape_tactic_routing_bundle/source_module_manifest.json records source_import_class: copied_non_secret_macro_body, module_count: 4, body_in_receipt: false, three verified_public_safe_private_path_rewrite rows, and one exact_copy row.
  • Fixture/bundle behavior: examples/target_shape_tactic_routing_gate/exported_target_shape_tactic_routing_bundle/target_shape_routes.json has seven pre_execution route cases, while tactic_portfolio_availability.json marks decide, omega, simp_all, and rfl available and aesop unavailable.
  • Result record floor: receipts/first_wave/target_shape_tactic_routing_gate/target_shape_tactic_routing_result.json, target_shape_tactic_routing_board.json, target_shape_tactic_routing_validation_receipt.json, and receipts/acceptance/first_wave/target_shape_tactic_routing_gate_fixture_acceptance.json report status: pass, route_case_count: 7, copied_source_artifact_count: 4, source_artifacts_pass: true, missing_negative_cases: [], secret_exclusion_scan.blocking_hit_count: 0, and authority flags with Lean/Lake, proof, provider, post-execution routing, and launch-scope decision set false.
  • Test boundary: tests/test_target_shape_tactic_routing_gate.py checks observed negative cases, shape-preferred selection, unknown-shape and Mathlib-unavailable fallback, exported-bundle sign-off, source-module digest verification, compact card omission of full boards, and result record output without non-public paths or body fields.

Reader Evidence Routing

Read this module as a pre-execution admissibility gate, not as a proof attempt. The primary reader path is:

  • Start with the problem-domain, failure-class, graph-update candidate, and tactic-probe refs in the fixture input. They are the public route evidence the gate is allowed to inspect before any Lean/Lake work in the formal-math evaluation and premise-retrieval pipeline.
  • Compare each target-shape route case against the selected tactic ids and rejection reasons: admitted tactics must match both the declared goal shape and the public availability probe.
  • Inspect negative cases before the happy path. The important behavior is that unavailable tactics, unprobed tactics, proof/provider body leakage, post-execution routing, and launch overclaims all fail closed.
  • Use the structured source record only for structural lattice proof: it confirms subjects, code loci, doctrine refs, and dependency edges; it does not establish the tactic route can solve the target.

Runtime Surfaces

PYTHONPATH=src python3 -m microcosm_core.organs.target_shape_tactic_routing_gate run --input fixtures/first_wave/target_shape_tactic_routing_gate/input --out receipts/first_wave/target_shape_tactic_routing_gate
PYTHONPATH=src python3 -m microcosm_core.cli target-shape-tactic-routing-gate run-routing-bundle --input examples/target_shape_tactic_routing_gate/exported_target_shape_tactic_routing_bundle --out receipts/runtime_shell/demo_project/organs/target_shape_tactic_routing_gate

Negative Cases

  • unavailable_tactic_admitted rejects an aesop route while Mathlib is absent.
  • unprobed_tactic_allowed rejects a tactic absent from the public probe portfolio.
  • proof_body_leakage rejects proof/provider/Lean body fields.
  • post_execution_route rejects route selection after execution evidence.
  • release_overclaim rejects proof, provider, Lean/Lake, public sharing, and launch-scope decision overclaims.

Prior Art Grounding

The routing layer follows established proof-search and policy-gating patterns: match a goal shape to methods that are known to be available before spending runtime on them. Lean's tactic documentation supplies the local proof-assistant context for goal-directed tactic choice, while Isabelle/Sledgehammer represents a mature prior-art pattern for selecting external provers and relevant facts from a goal. Microcosm narrows that idea to a pre-execution admissibility filter: target shape, allowed references, and current tactic availability must line up before a tactic route can be exported.

Prior-art anchors:

  • Lean 4 tactic documentation: https://lean-lang.org/theorem_proving_in_lean4/Tactics/
  • Isabelle Sledgehammer user guide: https://isabelle.in.tum.de/doc/sledgehammer.pdf

Why It Matters

After corpus readiness and strategy scoring, Microcosm needs a visible gate that prevents wasted or misleading proof attempts. This component shows that gate over the formal-math evaluation and premise-retrieval pipeline already feeding verifier repair, evidence anchoring, and proof diagnostics: a tactic is not tried just because it exists; it is admitted only when the target shape and the public availability probe both allow it.

Validation Result record Path

From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:

These checks validate route-reference fixture and bundle result records only; they do not widen the no-Lean/no-proof scope limit.

Scope boundary

Scope limit

This component does not run Lean or Lake and does not establish a target. It validates only the route references that must exist before a proof attempt in the formal-math evaluation and premise-retrieval pipeline: tactic probe availability, target-shape route cases, selected tactic ids, failure-class refs, graph-update candidate refs, and negative-case result records.

Forbidden outputs include proof bodies, provider bodies, post-execution route selection, Lean result record claims, external model access, launch claims, and Mathlib-dependent proof authority.

Scope limit

This module covers only public pre-execution tactic routing evidence: the route references used before a formal proof attempt, tactic probe availability, target-shape cases, selected tactic ids, failure-class refs, graph-update candidate refs, negative-case result records, source-module digest evidence, and validation result records. It does not run Lean or Lake, prove formal-result correctness, export proof bodies or provider bodies, authorize post-execution route selection, use external model services, claim Mathlib-dependent proof authority, authorize public sharing, include launch operations, or prove whole-system correctness.

Lean Std Premise Index · Lists a fixed catalog of public Lean building blocks and confirms none hides proof text or test answers. · 3/5

Does Presents a small, fixed catalog of Lean standard-library "premises" (named building blocks like facts about numbers, booleans, lists, and basic logic) along with the labels and source references that say where each one comes from. It shows what proof ingredients are on the table and that they were copied from public Lean sources, with no hidden proof text, no Mathlib, and nothing that secretly gives away test answers. It only checks and displays this catalog; it does not run Lean or prove anything.

Scope limit It only validates the projection of premise metadata and copied source bodies; it does not run Lean or Lake, prove any theorem correct, expose proof bodies or oracle-needed ids, use external model services, produce benchmark claims, or include launch operations.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.lean_std_premise_index run --input fixtures/first_wave/lean_std_premise_index/input --out receipts/first_wave/lean_std_premise_index --acceptance-out receipts/acceptance/first_wave/lean_std_premise_index_fixture_acceptance.json

Evidence: Computed projection · evidence 3/5 · Source-faithful refactor

Tags: formal-methods · theorem-proving · lean

Source Design note · Source atlas

Paper module Lean/Std Premise Index

lean_std_premise_index is the closed public premise-index lane for the formal-math slice. It validates premise metadata and selected Ring2 premise-retrieval source result record bodies that a cold reader can inspect without importing Mathlib, exposing proof bodies, or relying on private source run state.

Purpose

A premise index is the catalogue a theorem-proving system reads before it tries to prove anything: a list of the named lemmas and definitions it is allowed to cite, with enough metadata to retrieve the relevant ones. This component answers a narrower question. Given that such an index already exists inside a private Ring2 benchmark run, can a cold reader inspect its public shape and be sure that what they are reading is a faithful copy of the real thing, and not a separate hand-written stand-in?

The answer rests on one design choice that is worth noticing. The validator does not just describe eleven premise rows; it opens the declared source artifact from the Ring2 premise-retrieval run, recomputes its SHA-256, and checks every public row against the matching source row by premise_id. The only permitted difference is a path rewrite: a raw Lean toolchain path becomes a public lean-toolchain://.../Init/... reference, so the reader sees where a lemma lives in the standard library without seeing a private filesystem. If the public catalogue ever drifts from the source it claims to copy, the digest or the row-signature comparison fails and the result record is blocked.
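The digest check and row comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the "premises" key, the "source_ref" field, the finding codes, and the public_root argument are hypothetical, and the elided middle of the real lean-toolchain:// reference is not reproduced here.

```python
import hashlib
import json


def public_ref(raw_path, public_root):
    """Rewrite a raw toolchain path to a public ref, keeping only the Init/... tail."""
    marker = "/Init/"
    position = raw_path.find(marker)
    if position < 0:
        raise ValueError("source ref is not under Init/")
    return public_root + "Init/" + raw_path[position + len(marker):]


def verify_index(source_bytes, declared_sha256, public_rows, public_root):
    """Return findings; an empty list means the public rows match the source."""
    # Step 1: recompute the digest of the declared source artifact.
    if hashlib.sha256(source_bytes).hexdigest() != declared_sha256:
        return ["source_digest_mismatch"]
    source_rows = {row["premise_id"]: row
                   for row in json.loads(source_bytes)["premises"]}
    findings = []
    if len(public_rows) != len(source_rows):
        findings.append("row_count_mismatch")
    # Step 2: compare every public row with its source row by premise_id.
    for row in public_rows:
        source = source_rows.get(row["premise_id"])
        if source is None:
            findings.append("unknown_premise:" + row["premise_id"])
            continue
        # The path rewrite is the only permitted difference.
        expected = dict(source, source_ref=public_ref(source["source_ref"], public_root))
        if expected != row:
            findings.append("row_signature_mismatch:" + row["premise_id"])
    return findings
```

Comparing whole rows after applying the one allowed rewrite means any other drift, including a renamed declaration, surfaces as a mismatch rather than passing silently.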

The interesting tension is the line between a useful index and a leaked answer key. A premise index for a benchmark is one edit away from telling a solver exactly which lemmas it needs. So the same pass that admits names, namespaces, retrieval terms, and train/dev/test eligibility rejects the things that would turn the catalogue into proof authority: Mathlib references, proof bodies, the oracle-needed premise ids that name the answer, and any flag that authorises tuning on the test split. The catalogue stays inspectable precisely because those are kept out.

Shape

This module is a cold-reader map from a JSON bundle and copied public Lean/Std premise artifacts into metadata-only validation result records. The readable path is bundle -> generated instance/status -> runtime validator -> fixtures and exported source bundle -> tests and result records -> scope limit; none of those projections expands the closed-index boundary.

[Diagram: source record -> generated instance -> generated status; closed premise-index contract, fixtures, and exported source bundle (6 copied body modules) -> runtime validator (run / run_index_bundle / scope_limit) -> tests -> result records; generated status and result records -> scope limit (no Lean/Lake, Mathlib, proof bodies, providers, benchmark authority, source-file changes, public sharing, or launch-scope decision).]

Source refs

source basis: source record
core/paper_module_capsules.json · paper_module.lean_std_premise_index
generated instance from source record
paper_modules/lean_std_premise_index.json
run / run_index_bundle / scope_limit
src/microcosm_core/organs/lean_std_premise_index.py
closed Lean/Std premise-index contract
standards/std_microcosm_lean_std_premise_index.json
projection_protocol, premise_index, index_policy, negative cases
fixtures/first_wave/lean_std_premise_index/input
source_module_manifest: 6 copied body modules
examples/lean_std_premise_index/exported_lean_std_premise_index_bundle
fixture, manifest, bundle, and runtime-shape checks
tests/test_lean_std_premise_index.py
Result records
receipts/first_wave/lean_std_premise_index
receipts/runtime_shell/demo_project/organs/lean_std_premise_index
Diagram source
flowchart TD
  bundle["core/paper_module_capsules.json paper_module.lean_std_premise_index source basis: source record"]
  instance["paper_modules/lean_std_premise_index.json generated instance from source record Markdown stays reader projection"]
  generated["Generated status Mermaid: available_from_capsule_edges Atlas: blocked_until_organ_atlas_owner_lane_binds_edges"]
  runtime["src/microcosm_core/organs/lean_std_premise_index.py run / run_index_bundle / scope_limit"]
  standard["standards/std_microcosm_lean_std_premise_index.json closed Lean/Std premise-index contract"]
  fixtures["fixtures/first_wave/lean_std_premise_index/input projection_protocol, premise_index, index_policy, negative cases"]
  exported["examples/lean_std_premise_index/exported_lean_std_premise_index_bundle source_module_manifest: 6 copied body modules"]
  tests["tests/test_lean_std_premise_index.py fixture, manifest, bundle, and runtime-shape checks"]
  receipts["receipts/first_wave/lean_std_premise_index receipts/runtime_shell/demo_project/organs/lean_std_premise_index"]
  ceiling["Scope limit no Lean/Lake, Mathlib, proof bodies, providers, benchmark authority, source-file changes, public sharing, or launch-scope decision"]
  bundle --> instance
  instance --> generated
  standard --> runtime
  fixtures --> runtime
  exported --> runtime
  runtime --> tests
  tests --> receipts
  generated --> ceiling
  receipts --> ceiling

Technical Mechanism

The mechanism is a two-entry validator over copied public artifacts, not a proof engine. run reads the first-wave fixture inputs, opens the declared premise-index source artifact, verifies the declared source_sha256, normalizes Lean toolchain paths into lean-toolchain://.../Init/... public refs, compares every public row against the source row signature, and then checks the protocol, policy, copied-material contract, namespace coverage, split coverage, negative cases, secret exclusion scan, and scope limit before writing metadata-only result, board, validation, and sign-off result records. run_index_bundle applies the same public boundary to the exported bundle and requires the source-module manifest to verify six copied body-material files by source ref, target ref, digest, line count, byte count, and source-to-target equivalence, while keeping body text out of result records.

The proof consumer is therefore concrete and local: tests/test_lean_std_premise_index.py asserts that the validator observes all five negative cases, imports the real Ring2 premise-index source artifact, rejects digest, row-count, row-signature, source-ref, source-module digest, and rehash-body-swap mutations, and validates the runtime-shell bundle shape. The positive fixture carries 11 premise rows across Nat, Bool, List, and Iff; the source-open body floor carries one normalized Lean/Std premise index plus five Ring2 source result record or pattern bodies. This is evidence of a bounded public premise catalog and copied-source manifest, not evidence of Lean formal-result correctness.

The governing lattice is source-backed through the bundle-generated instance: paper_module.lean_std_premise_index explains the lean_std_premise_index component and the two mechanism.lean_std_premise_index.* mechanisms, is governed by concept.formal_math_and_proof_witness_bundle, cites P-1, P-2, P-3, P-6, and P-8, abides by AX-1, AX-2, AX-5, and AX-7, and depends only on paper_module.formal_math_premise_retrieval.

Inputs

  • projection_protocol.json records source pattern ids, source refs, public replacement refs, projection result records, omitted material, and copy policy.
  • premise_index.json carries public metadata rows: premise id, declaration name, namespace, Init/ source ref, retrieval terms, and split eligibility.
  • index_policy.json keeps the closed-index scope limit explicit.
  • source_module_manifest.json records six source-open body imports: the normalized Lean/Std premise index plus five exact bodies from the formal-math premise-retrieval pipeline (source result records and graph-pattern bodies) under source_modules/.

Prior Art Grounding

This component is grounded in formal-library indexing and premise-selection work. The Lean mathematical library anchors the library-as-corpus side, while LeanDojo and HOList anchor the need for premise metadata, retrieval splits, and theorem-proving environments that can be inspected by learning systems.

Microcosm borrows the closed-index discipline: premise ids, declaration names, namespaces, source refs, retrieval terms, split eligibility, and source-module digests are public metadata, while proof bodies and oracle-needed ids remain outside the public boundary. It does not import Mathlib or prove theorems.

Negative Cases

The fixture rejects:

  • Mathlib premise refs;
  • proof-body leakage;
  • oracle-needed premise ids;
  • test-split tuning authority;
  • namespace rows without Init/ source refs.

These are stable negative cases because the index is intended to be useful without becoming proof authority.
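A per-row scan for the five rejections listed above might look like the sketch below. Every field name and finding code here is hypothetical, chosen only to make the five checks concrete; the fixture's real schema is not shown on this page.

```python
# Hypothetical field names mapped to hypothetical finding codes.
FORBIDDEN_KEYS = {
    "proof_body": "proof_body_leakage",
    "oracle_needed_premise_ids": "oracle_needed_premise_ids",
}


def row_findings(row):
    """Return the rejection codes that one premise-index row triggers."""
    findings = [code for key, code in FORBIDDEN_KEYS.items() if key in row]
    source_ref = row.get("source_ref", "")
    if "Mathlib" in source_ref:
        findings.append("mathlib_premise_ref")
    elif "/Init/" not in source_ref:
        # A namespace row must point into the Init/ part of the toolchain.
        findings.append("missing_init_source_ref")
    if row.get("test_split_tuning_allowed"):  # assumed policy flag
        findings.append("test_split_tuning_authority")
    return findings


clean_row = {"premise_id": "p1", "source_ref": "lean-toolchain://example/Init/Core.lean"}
```

A clean row yields an empty list, so a fixture can assert both that positive rows pass and that each mutated row produces exactly the rejection it was built to trigger.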

Result records

The validator emits:

  • lean_std_premise_index_result.json;
  • lean_std_premise_index_board.json;
  • lean_std_premise_index_validation_receipt.json;
  • a sign-off result record under receipts/acceptance/first_wave/.

Runtime-shell execution emits exported_lean_std_premise_index_bundle_validation_result.json after checking the source-module manifest, target file digests, line counts, byte counts, and secret-exclusion boundary.
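The manifest check can be pictured as recomputing three facts about each copied file and comparing them with the manifest row. The sketch below assumes hypothetical row keys (target_ref, sha256, byte_count, line_count) and takes the file bytes as an argument so that it stays self-contained.

```python
import hashlib


def verify_manifest_row(row, data):
    """Compare one copied file's bytes with its manifest row; never return body text."""
    observed = {
        "sha256": hashlib.sha256(data).hexdigest(),
        "byte_count": len(data),
        "line_count": len(data.splitlines()),
    }
    mismatches = sorted(key for key, value in observed.items() if row.get(key) != value)
    return {
        "target_ref": row["target_ref"],
        "status": "pass" if not mismatches else "blocked",
        "mismatches": mismatches,
    }


body = b"line one\nline two\n"
row = {
    "target_ref": "source_modules/example.json",  # hypothetical file name
    "sha256": hashlib.sha256(body).hexdigest(),
    "byte_count": len(body),
    "line_count": 2,
}
```

The returned record names which counters disagreed but carries no file content, which is one way to keep body text out of result records.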

Reader Evidence Routing

  • Start with the JSON Bundle Binding to identify the source row, generated instance, and scope limit.
  • Use Structured Lattice Bindings only as navigation evidence; the resolved dependency edge points to the premise-retrieval module and does not expand the closed-index proof boundary.
  • Use Inputs and Result records when checking whether public metadata, copied body manifests, and runtime-shell validation stayed body-safe.
  • Use Negative Cases and Scope limit together when deciding whether a proposed public claim exceeds the closed-index boundary.

Validation Result record Path

./repo-pytest tests/test_lean_std_premise_index.py -q --basetemp=/tmp/microcosm_lean_std_premise_index_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

This lane validates only the projection of premise metadata and copied source bodies. It does not:

  • run Lean or Lake;
  • import Mathlib;
  • expose proof bodies;
  • expose oracle-needed premise ids;
  • tune on test split truth;
  • use external model services;
  • certify theorem validity;
  • authorize public launch;
  • claim secret export.

Scope limit

This module supports only the reader-verifiable claim that public Lean/Std premise metadata, source refs, retrieval terms, split eligibility, and copied source-module digests can be indexed without exposing proof bodies or oracle-needed ids. It does not run Lean or Lake, import Mathlib, prove formal-result correctness, tune on test split truth, use external model services, include launch operations, or certify secret-export safety.

Formal Math Premise Retrieval · Shows which lemmas a plain search surfaces per query, and never leaks proof text or answer keys. · 3/5

Does Given a small copied set of Lean/Std math-lemma descriptions plus some search queries, this component shows which lemmas a plain term-matching search would surface for each query, how it keeps each assembled context within a fixed size budget, and that it never exposes proof text or "answer-key" hints (the premise ids a solver would only get to see after the fact). On the bundled first-wave fixture, the result record shows the retrieval mechanism working in miniature alongside deliberate bad inputs (a leaked proof body, leaked answer-key ids, a budget overflow, an attempt to peek at test answers, and an unknown strategy) that the component catches; the leak and budget guards actually fire.

Scope limit It only checks that public retrieval metadata is internally coherent, term-scored over a copied index, budget-bounded, and leakage-clean; it does not run Lean/Lake, use external model services, prove any theorem or its own correctness, claim benchmark performance, or include launch operations.

Run
PYTHONPATH=src python -m microcosm_core.organs.formal_math_premise_retrieval run --input fixtures/first_wave/formal_math_premise_retrieval/input --out receipts/first_wave/formal_math_premise_retrieval

Paper module Formal Math Premise Retrieval

formal_math_premise_retrieval is the source-available first real formal-math import slice after the source projection protocol. It turns the source prover lab's premise-index, term-scoring, context-budget, and strategy-selection patterns into a runnable Microcosm component.

It is still deliberately below proof authority. It validates:

  • Lean/Std premise metadata;
  • query term scoring across public premise ids, namespaces, declaration names, statement excerpts, and retrieval terms;
  • split eligibility;
  • context recipe budgets;
  • public strategy ids;
  • redacted result records;
  • negative cases.

It does not run Lean or Lake, use external model services, expose proof bodies, expose oracle-needed premise ids, tune on test split truth, claim formal-result correctness, or include launch operations.

Purpose

Before a model can attempt a formal proof, it has to find the right lemmas. A Lean library holds thousands of theorems and definitions, and the useful ones for a given goal are a handful. Premise selection is the step that narrows that library down to candidates worth putting in front of a prover. This component is the smallest honest version of that step: it takes a query, scores every public premise against it, and returns a ranked shortlist.

The single question it answers is narrow and checkable: given a copied catalogue of public Lean/Std premise metadata, does a transparent term-scoring retrieval return the premises a query should find, without ever touching a proof? Both halves matter. The retrieval has to actually work, so each fixture query carries the premise ids it is expected to surface and the run fails if the shortlist misses them. And the boundary has to hold, so the same run refuses any input that smuggles in a proof body, an oracle answer, or test-split truth.

What is unusual is the restraint. The retrieval index is not a learned embedding model and the scoring is not a benchmark claim. It is plain term overlap over fields that a reader can inspect: premise ids, namespaces, declaration names, statement excerpts, and retrieval terms. The interesting claim is therefore not "this retrieves well" but "this retrieves over real, copied Lean metadata and can be audited end to end, and the design forbids the shortcuts that would make a premise-selection result look better than it is".

Shape

[Diagram: JSON source record -> generated paper-module instance (15 relationship edges) -> runtime component -> public inputs (premise index, retrieval queries, context recipes, negative-case inputs); premise index and queries -> split gate -> term-overlap scoring -> ranked top_k shortlist -> recall check; negative-case inputs and context recipes -> required rejections; recall check and required rejections -> metadata-only result records -> scope limit.]

Source refs

JSON source record
paper_module.formal_math_premise_retrieval
Runtime component
formal_math_premise_retrieval.py
Diagram source
flowchart TD
  bundle["JSON source record paper_module.formal_math_premise_retrieval"] --> instance["Generated paper-module instance 15 relationship edges"]
  instance --> component["Runtime component formal_math_premise_retrieval.py"]
  subgraph Inputs["Public inputs"]
    index["Premise index copied Lean/Std metadata"]
    queries["Retrieval queries terms, split, strategy, top_k"]
    recipes["Context recipes byte budgets"]
    negatives["Negative-case inputs proof body, oracle ids, test-split tuning, budget, strategy"]
  end
  component --> index
  component --> queries
  component --> recipes
  component --> negatives
  index --> split["Split gate skip premises not in allowed_for_split"]
  queries --> split
  split --> score["Term-overlap scoring shared tokens + strategy bonus"]
  score --> shortlist["Ranked top_k shortlist"]
  shortlist --> recall["Recall check vs expected premise ids"]
  negatives --> reject["Required rejections five leakage/overclaim guards"]
  recipes --> reject
  recall --> receipts["metadata-only result records board, validation, sign-off"]
  reject --> receipts
  receipts --> ceiling["Scope limit metadata coherence, no Lean/Lake, no proof"]

Evidence/accounting:

  • Bundle authority: core/paper_module_capsules.json::paper_modules[25:paper_module.formal_math_premise_retrieval] has source_authority: json_capsule, three subjects, one resolved code_loci[0].path, depends_on naming paper_module.formal_math_lean_proof_witness, and generated projection statuses for Markdown, Mermaid, and Atlas.
  • Generated instance: paper_modules/formal_math_premise_retrieval.json::paper_module_payload repeats the bundle authority_ceiling, reports Mermaid status available_from_capsule_edges, and derives 15 relationships.edges with relationships.unpopulated_selective_relations: [].
  • Component atlas: core/organ_atlas.json::organs[9:formal_math_premise_retrieval] classifies the component in family: formal_math_and_proof, cites the runtime locus, and restates that retrieval metadata coherence is not Lean/Lake, provider, theorem-correctness, benchmark, or launch-scope decision.
  • Mechanism rows: core/mechanism_sources.json::mechanisms[27:mechanism.formal_math_premise_retrieval.validates_public_premise_retrieval_slice] and core/mechanism_sources.json::mechanisms[37:mechanism.formal_math_premise_retrieval.validates_public_premise_retrieval_projection] point at src/microcosm_core/organs/formal_math_premise_retrieval.py and name first-wave, sign-off, and runtime-shell result record refs.
  • Runtime and tests: src/microcosm_core/organs/formal_math_premise_retrieval.py exposes run, run_retrieval_bundle, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING; tests/test_formal_math_premise_retrieval.py checks 11 premises, 4 queries, 44 considered candidates, five negative cases, metadata-only result records, and compact runtime-shell cards.
  • Result records: receipts/first_wave/formal_math_premise_retrieval/formal_math_premise_retrieval_result.json records status: pass, 11 premises, 4 queries, 44 considered candidates, five observed negative cases, missing_negative_cases: [], and a secret-exclusion scan with blocking_hit_count: 0; the exported runtime result record at receipts/runtime_shell/demo_project/organs/formal_math_premise_retrieval/exported_premise_retrieval_bundle_validation_result.json records status: pass, the same premise/query/candidate counts, no negative cases, and secret_exclusion_scan.scanned_path_count: 11.
  • Standard ceiling: standards/std_microcosm_formal_math_premise_retrieval.json::authority_ceiling has status: pass while keeping formal_proof_authority, lean_lake_authority, provider_authority, and release_authority false.

Runtime Surfaces

  • Component runner: python -m microcosm_core.organs.formal_math_premise_retrieval run --input fixtures/first_wave/formal_math_premise_retrieval/input --out receipts/first_wave/formal_math_premise_retrieval
  • Exported bundle runner: python -m microcosm_core.organs.formal_math_premise_retrieval run-retrieval-bundle --input examples/formal_math_premise_retrieval/exported_premise_retrieval_bundle --out receipts/runtime_shell/demo_project/organs/formal_math_premise_retrieval
  • CLI route: microcosm formal-math-premise-retrieval run-retrieval-bundle
  • Standard: standards/std_microcosm_formal_math_premise_retrieval.json
  • Fixture manifest: core/fixture_manifests/formal_math_premise_retrieval.fixture_manifest.json

Public Claim

Microcosm can show a real formal-math retrieval mechanism in miniature:

  • a source-available Lean/Std premise index;
  • public field-haystack term-scored queries;
  • split-aware eligibility;
  • context recipe ceilings;
  • strategy gates;
  • redacted validation result records.

How retrieval scoring works

Each premise row contributes five inspectable fields to the haystack: its premise id, namespace, declaration name, statement excerpt, and a list of retrieval terms. A query carries its own terms, a data split, an optional strategy id, a context recipe, and the public premise ids it is expected to return.

Scoring is term overlap, computed per query. Both the query and each premise are tokenised into lowercase word counts. A premise is only considered if the query's split appears in that premise's allowed_for_split list, which is how test-split leakage is kept out at the structural level rather than by trust. For each eligible premise the score is the summed minimum count of every shared token across the five fields, so a term that appears in both the query and the premise contributes as many points as the smaller of the two counts. A premise that also carries the query's strategy id as a tag gets a single extra point. The ranked list is sorted by score descending, ties broken by premise id, and the top of that list up to the query's top_k is taken as the retrieval.
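The scoring rule above can be sketched in a few lines. This is an illustration rather than the component's code: the tokenising regex, the strategy_tags and strategy_id keys, and the sample premises are assumptions; allowed_for_split and top_k are the names used in the text.

```python
import re
from collections import Counter

FIELDS = ("premise_id", "namespace", "declaration_name",
          "statement_excerpt", "retrieval_terms")


def tokens(text):
    """Lowercase word counts; the exact tokenisation rule is an assumption."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def premise_tokens(premise):
    """Pool the five inspectable fields into one token count."""
    parts = []
    for field in FIELDS:
        value = premise.get(field, "")
        parts.append(" ".join(value) if isinstance(value, list) else value)
    return tokens(" ".join(parts))


def retrieve(query, premises):
    """Return (score, premise_id) pairs: score descending, ties by id, cut at top_k."""
    query_counts = tokens(" ".join(query["terms"]))
    ranked = []
    for premise in premises:
        if query["split"] not in premise["allowed_for_split"]:
            continue  # split gate: ineligible premises are never scored
        overlap = query_counts & premise_tokens(premise)  # Counter & keeps min counts
        score = sum(overlap.values())
        if query.get("strategy_id") in premise.get("strategy_tags", []):
            score += 1  # single bonus point for a matching strategy tag
        ranked.append((score, premise["premise_id"]))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked[: query["top_k"]]


premises = [
    {"premise_id": "nat_add_comm", "namespace": "Nat", "declaration_name": "Nat.add_comm",
     "statement_excerpt": "a + b = b + a", "retrieval_terms": ["add", "comm"],
     "allowed_for_split": ["train", "dev"], "strategy_tags": ["term_overlap"]},
    {"premise_id": "list_length_nil", "namespace": "List", "declaration_name": "List.length_nil",
     "statement_excerpt": "[].length = 0", "retrieval_terms": ["length", "nil"],
     "allowed_for_split": ["train", "dev", "test"], "strategy_tags": []},
    {"premise_id": "nat_add_zero", "namespace": "Nat", "declaration_name": "Nat.add_zero",
     "statement_excerpt": "n + 0 = n", "retrieval_terms": ["add", "zero"],
     "allowed_for_split": ["test"], "strategy_tags": []},
]
query = {"terms": ["nat", "add", "comm"], "split": "dev",
         "strategy_id": "term_overlap", "top_k": 5}
print(retrieve(query, premises))  # [(4, 'nat_add_comm'), (0, 'list_length_nil')]
```

The test-only premise never appears for the dev query because the split gate runs before scoring, which is the structural guard the paragraph describes.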

The retrieval is then graded against itself. Each query declares the public premise ids it should surface, and the component computes recall as the fraction of those expected ids that actually landed in the shortlist. A query that declares expectations but misses any of them blocks the run. In the first-wave fixture this is eleven premises and four queries, scoring forty-four considered candidates in total, and every query is expected to reach full recall.
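The recall gate reduces to a small calculation, sketched here with hypothetical function names. A query with no declared expectations is treated as ungraded, which is an assumption about how the component handles that case.

```python
def recall(expected_ids, shortlist_ids):
    """Fraction of expected premise ids that landed in the shortlist."""
    if not expected_ids:
        return None  # nothing declared, nothing to grade (assumed behaviour)
    hits = sum(1 for premise_id in expected_ids if premise_id in shortlist_ids)
    return hits / len(expected_ids)


def query_status(expected_ids, shortlist_ids):
    """Block the query when it declares expectations and misses any of them."""
    value = recall(expected_ids, shortlist_ids)
    return "blocked" if value is not None and value < 1.0 else "pass"


# Eleven premises scored against four queries gives the candidate count in the text.
considered_candidates = 11 * 4
```

Requiring full recall rather than a threshold keeps the fixture binary: a single missed expected id is enough to block.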

The failure mode this guards against is a premise-selection result that looks good because it cheated. The five negative-case inputs each encode one such shortcut: a premise index that ships a proof body, a query that lists the oracle premise ids it is "meant" to find, a query that tunes on test-split truth, a context recipe that blows past the byte budget, and a query naming a strategy id outside the allowed set. The run is required to observe all five rejections; if any expected rejection is missing, the whole fixture is blocked rather than passed. Recall over copied real metadata is the positive signal; the refusals are what keep that signal honest.
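The "all five rejections must be observed" rule can be sketched as a set difference. The case ids are the ones listed under Negative Cases on this page; the function names and the byte-budget helper are hypothetical.

```python
EXPECTED_NEGATIVE_CASES = (
    "premise_index_proof_body_forbidden",
    "query_oracle_ids_forbidden",
    "test_split_tuning_attempt",
    "context_recipe_budget_overflow",
    "unknown_strategy_id",
)


def fixture_status(observed_case_ids):
    """Pass only when every expected rejection was actually observed."""
    missing = sorted(set(EXPECTED_NEGATIVE_CASES) - set(observed_case_ids))
    return {
        "status": "pass" if not missing else "blocked",
        "missing_negative_cases": missing,
    }


def context_within_budget(excerpts, byte_budget):
    """True when the assembled context fits the recipe's byte budget."""
    return sum(len(text.encode("utf-8")) for text in excerpts) <= byte_budget
```

Treating a missing rejection as a blocking failure means a guard that silently stops firing cannot go unnoticed, even when every positive query still passes.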

Prior Art Grounding

This component is grounded in premise-selection and retrieval-augmented theorem proving work. LeanDojo is the closest modern anchor because it couples Lean interaction with retrieval-augmented premise selection. Earlier theorem-proving environments such as HOList and GamePad also motivate extracting proof-state or premise metadata for learning-assisted theorem proving.

Microcosm borrows the retrieval accounting pattern: premise ids, namespaces, statement excerpts, retrieval terms, split eligibility, context budgets, and strategy gates must be inspectable before premise-retrieval claims are admitted. It does not run Lean/Lake or expose proof bodies.

Negative Cases

  • premise_index_proof_body_forbidden
  • query_oracle_ids_forbidden
  • test_split_tuning_attempt
  • context_recipe_budget_overflow
  • unknown_strategy_id
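The "all five rejections must be observed" rule can be illustrated with a small set comparison. The function name, the constant name, and the return shape are assumptions for this sketch; the five case ids are the ones listed above.

```python
# The five negative-case ids listed on this page.
EXPECTED_NEGATIVE_CASES = {
    "premise_index_proof_body_forbidden",
    "query_oracle_ids_forbidden",
    "test_split_tuning_attempt",
    "context_recipe_budget_overflow",
    "unknown_strategy_id",
}


def negative_case_verdict(observed_rejections):
    """Block the fixture unless every expected rejection was observed."""
    missing = sorted(EXPECTED_NEGATIVE_CASES - set(observed_rejections))
    return {"status": "blocked" if missing else "pass", "missing": missing}
```

A missing rejection is treated as a failure of the fixture itself, so a run cannot pass simply because a shortcut was never attempted.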

Reader Evidence Routing

  • Start with the JSON Bundle Binding to identify the source record, generated instance, proof boundary, and scope limit.
  • Use Structured Lattice Bindings for navigation; the generated JSON row is the authority for relationship counts and dependency state.
  • Use Runtime Surfaces and Result record Expectations when checking metadata coherence, redaction, leakage checks, and source-available bundle behavior.
  • Use Negative Cases and both Scope limit sections together before admitting any formal-math public claim.

Validation Result record Path

./repo-pytest tests/test_formal_math_premise_retrieval.py -q --basetemp=/tmp/microcosm_formal_math_premise_retrieval_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

The component proves only that public retrieval metadata is internally coherent and leakage-checked. The deferred formal_math_lean_proof_witness boundary remains unchanged.

Scope limit

This module supports only the reader-verifiable claim that public premise metadata, retrieval terms, split eligibility, strategy gates, and redacted result records are coherent and leakage-checked. It does not run Lean or Lake, prove formal-result correctness, expose proof bodies, authorize oracle-needed premise ids, tune on test split truth, use external model services, approve public sharing, or expand the deferred Lean proof-witness boundary.

Formal Math Verifier Trace Repair Loop · Replays how a proof lab turns verifier failures into fixes, with no promotion without a fresh re-run. · 3/5

Does It replays how a proof-lab turns a verifier's failure feedback into a teaching signal, working from copied (non-secret) run data so the failure categories, the repair action tied to each failure, and the rule that nothing gets promoted without a fresh re-run result record are all inspectable. Actual proofs, answer keys, and model outputs are deliberately kept out, so the whole correction loop is visible without exposing any of them.

Scope limit It demonstrates control-loop projection mechanics over copied Ring2 run rows only; it does not run Lean/Lake, use external model services, expose proof bodies or oracle premise ids, treat human or provider advice as correctness, prove any theorem, or include launch operations.

Run
microcosm formal-math-verifier-trace-repair-loop run-loop-bundle --input examples/formal_math_verifier_trace_repair_loop/exported_verifier_trace_repair_bundle --out receipts/runtime_shell/demo_project/organs/formal_math_verifier_trace_repair_loop

Evidence: Computed projection · evidence 3/5 · Source-faithful refactor

formal-methods · theorem-proving · lean

Source Design note · Source atlas

Paper module Formal Math Verifier Trace Repair Loop

formal_math_verifier_trace_repair_loop is the source-available replay of a source proof-lab pattern over copied Ring2 run system: verifier feedback becomes a teaching signal only after a trace grade, a repair action, a failure-mode ledger append, a curriculum delta, and a cold rerun result record.

It is deliberately not a Lean/Lake proof component. It sits between the existing readiness, premise retrieval, tactic routing, proof diagnostic, and Lean witness surfaces so a cold reader can inspect real failure taxonomy, graph-update candidates, and oracle-repair contrast rows without seeing proof bodies, oracle premise ids, model-output data bodies, or private run logs.

Purpose

A failed proof attempt is cheap to throw away and expensive to learn from. The question this component answers is narrow: can a verifier's failure be turned into a reusable repair signal, on the public side, without that signal quietly inheriting the authority of a real theorem prover? It exists because the interesting work in a proof-repair loop is the bookkeeping, not the proving, and that bookkeeping is where overclaim usually creeps in.

The design choice worth noticing is that the loop refuses to collapse its stages into a single verdict. A verifier failure only counts as a teaching signal once it carries a trace grade backed by trace events, a repair action named against the verifier failure class it responds to, a failure-mode ledger append, a curriculum delta, and a cold-rerun result record. Each of those is a separate field, and promotion is blocked until the cold-rerun result record is present. The same separation keeps the dangerous material out: proof bodies, oracle-needed premise ids, and model-output data bodies are forbidden keys, so a row may name a failure class without ever exposing the proof or the oracle answer that produced it.
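One way to picture the staged separation is a row check that keeps each stage as its own field. Everything named here is hypothetical (the key names, the forbidden-key spellings, the finding strings); the sketch only mirrors the rules stated above: grades need trace events, repairs need a failure class, forbidden keys are refused, and promotion waits for the cold-rerun result record.

```python
# Hypothetical key names for the five stages and the forbidden material.
REQUIRED_STAGES = ("trace_grade", "repair_action", "failure_mode_ledger_append",
                   "curriculum_delta", "cold_rerun_result_record")
FORBIDDEN_KEYS = {"proof_body", "oracle_premise_ids", "model_output_body"}


def find_forbidden_keys(value, path="row"):
    """Walk nested dicts and lists and report the path of every forbidden key."""
    hits = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if key in FORBIDDEN_KEYS:
                hits.append(child_path)
            hits.extend(find_forbidden_keys(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            hits.extend(find_forbidden_keys(child, f"{path}[{index}]"))
    return hits


def evaluate_row(row):
    """Return findings, missing stages, and whether the row may be promoted."""
    findings = [f"forbidden_key:{p}" for p in find_forbidden_keys(row)]
    if row.get("trace_grade") is not None and not row.get("trace_events"):
        findings.append("grade_without_trace_events")
    action = row.get("repair_action")
    if action is not None and not action.get("failure_class"):
        findings.append("repair_without_failure_class")
    missing = [stage for stage in REQUIRED_STAGES if row.get(stage) is None]
    return {"findings": findings, "missing_stages": missing,
            "promotable": not findings and not missing}
```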

The failure mode it guards against is stale copied rows pretending to be live proof-lab evidence. The repair rows here are imported from a real Ring2 benchmark run, so the temptation is to treat the copy as if the run were happening now. The realness gate is the answer: it only reaches its top rung when every verifier attempt and curriculum row replays cleanly against the imported source bodies, and the focused tests deliberately perturb an oracle row, a manifest digest, an attempt label, and a curriculum count so that any drift downgrades the verdict rather than passing quietly. A single deterministic toy-theorem rerun is the one thing actually executed here, and it is plain arithmetic over public inputs, not a Lean proof.

Shape

[Diagram: fixture input or exported bundle, projection protocol, source-module manifest, secret-exclusion scan, verifier-attempt replay, repair-curriculum replay, promotion policy, deterministic toy rerun, realness gate, metadata-only result records, scope limit.]
Diagram source
flowchart TD
  Input["Fixture input or exported bundle copied Ring2 rows + source-module manifest"]
  Protocol["Projection protocol copied-material provenance"]
  Manifest["Source-module manifest digest, line and byte match, body_in_receipt false"]
  Secret["Secret-exclusion scan proof bodies, oracle ids, model-output data forbidden"]
  Attempts["Verifier-attempt replay grade needs trace events, repair needs failure class"]
  Curriculum["Repair-curriculum replay failure-mode ledger, curriculum deltas"]
  Promotion["Promotion policy requires cold-rerun result record"]
  Toy["Deterministic toy rerun fail then repair over public inputs"]
  Realness["Realness gate clean source replay -> top rung; any drift downgrades"]
  Records["metadata-only result records result, board, validation, sign-off"]
  Ceiling["Scope limit repair-loop accounting, bounded evidence"]
  Input --> Protocol
  Protocol --> Manifest
  Manifest --> Secret
  Secret --> Attempts
  Attempts --> Curriculum
  Curriculum --> Promotion
  Promotion --> Toy
  Attempts --> Realness
  Curriculum --> Realness
  Toy --> Realness
  Realness --> Records
  Records --> Ceiling

Technical Mechanism

The named mechanism mechanism.formal_math_verifier_trace_repair_loop.validates_public_verifier_trace_repair_bundle is a staged public verifier-repair validator, not a proof executor. _build_result composes five checks and one rerun over the fixture or exported bundle: projection-protocol density, copied source-module manifest integrity, verifier attempt replay, repair-curriculum replay, and promotion policy, plus one deterministic toy-theorem repair rerun. The result is pass only when the projection protocol has copied-material provenance, the secret scan has no blocking hits, source modules pass when required, verifier attempts and curriculum rows replay against their imported Ring2 source bodies, promotion requires a cold rerun reference, and the toy rerun succeeds.
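The pass condition reads as a conjunction of named conditions. A minimal sketch, assuming the upstream checks have already been reduced to booleans and counts, is shown below. The input keys, the function name, and the "blocked" label for a non-pass result are hypothetical, and this is not the body of _build_result.

```python
def compose_verdict(checks):
    """Combine already-computed check outcomes into a single pass/blocked verdict."""
    conditions = {
        "projection_protocol_has_provenance": checks["projection_protocol_has_provenance"],
        "secret_scan_clean": checks["secret_scan_blocking_hits"] == 0,
        # Source modules only need to pass when the run requires them.
        "source_modules_ok": (not checks["source_modules_required"])
                             or checks["source_modules_pass"],
        "attempts_replay": checks["attempts_replay"],
        "curriculum_replay": checks["curriculum_replay"],
        "promotion_requires_cold_rerun": checks["promotion_requires_cold_rerun"],
        "toy_rerun_pass": checks["toy_rerun_pass"],
    }
    failed = [name for name, ok in conditions.items() if not ok]
    return {"status": "blocked" if failed else "pass", "failed": failed}
```

Returning the list of failed condition names keeps the verdict inspectable rather than collapsing it into a single boolean.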

The exported-bundle path is intentionally stricter than the fixture path. validate_source_module_manifest requires a source import class, body_in_receipt: false, one row for each declared Ring2 source ref, matching target digests, line counts, and byte counts, and a metadata-only source_open_body_imports summary. _validate_attempt_source_replay then dereferences the premise-run row, oracle-repair contrast row, and graph-update candidate for each verifier attempt. Mismatches become typed findings such as VERIFIER_TRACE_SOURCE_REPLAY_MISMATCH, VERIFIER_TRACE_ORACLE_REPLAY_MISMATCH, VERIFIER_TRACE_COLD_RERUN_SOURCE_MISMATCH, or VERIFIER_TRACE_CANDIDATE_REPLAY_MISMATCH; curriculum-source mismatches are checked separately by validate_repair_curriculum.
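The digest, line-count, and byte-count comparison can be sketched as follows. This is an illustration under assumptions, not validate_source_module_manifest itself: the manifest layout (`modules`, `source_ref`), the use of SHA-256, and the lowercase finding strings are all invented for the example.

```python
import hashlib


def describe_body(body: bytes):
    """Compute the three facts a manifest row is compared against."""
    return {
        "sha256": hashlib.sha256(body).hexdigest(),  # digest algorithm is an assumption
        "line_count": len(body.splitlines()),
        "byte_count": len(body),
    }


def validate_manifest(manifest, declared_refs, bodies):
    """Return a list of findings; an empty list means the manifest matches."""
    findings = []
    if manifest.get("body_in_receipt") is not False:
        findings.append("body_in_receipt_not_false")
    rows = {row["source_ref"]: row for row in manifest.get("modules", [])}
    for ref in declared_refs:
        row = rows.get(ref)
        if row is None:
            findings.append(f"missing_row:{ref}")
            continue
        actual = describe_body(bodies[ref])
        for key in ("sha256", "line_count", "byte_count"):
            if row.get(key) != actual[key]:
                findings.append(f"mismatch:{ref}:{key}")
    return findings
```

Only the three computed facts enter the manifest row, so the copied body can be verified without ever appearing in a result record.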

The realness gate is also mechanical. _runtime_realness_evidence reaches the R4 state only for an exported bundle with verified source modules, at least 30 source replay checks, zero source replay mismatches, at least three attempts, at least nine trace events, at least three failure modes, and a passing toy rerun. The focused tests deliberately perturb the oracle source row, a manifest digest, a verifier-attempt source label, and a curriculum source count; each mutation blocks the verdict or downgrades the realness evidence instead of letting stale copied rows masquerade as proof-lab evidence.
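The R4 thresholds listed above translate directly into a predicate. The sketch below uses hypothetical key names for the evidence summary; the numeric thresholds are the ones stated in the text, and it is not the body of _runtime_realness_evidence.

```python
def reaches_r4(evidence):
    """True only when every stated R4 condition holds."""
    return bool(
        evidence["is_exported_bundle"]
        and evidence["source_modules_verified"]
        and evidence["source_replay_checks"] >= 30
        and evidence["source_replay_mismatches"] == 0
        and evidence["attempts"] >= 3
        and evidence["trace_events"] >= 9
        and evidence["failure_modes"] >= 3
        and evidence["toy_rerun_pass"]
    )
```

Any single perturbation that introduces a replay mismatch flips the predicate, which is the downgrade behaviour the focused tests exercise.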

The proof consumer is tests/test_formal_math_verifier_trace_repair_loop.py: it asserts five attempts, 15 trace events, five repair actions, three cold-rerun promotions, three toy-theorem failures repaired into four passing rerun inputs, seven exported source modules, 37 source replay checks, compact-card omission and fresh-result record reuse, public-relative result record paths, no private/body fields in result records, and exact source module copies. Those checks consume the same fixture, bundle, source-module manifest, and mechanism row cited by this page, so the evidence is executable replay accounting rather than a prose-only description.

The governing lattice is deliberately narrow: the bundle binds the module to concept.formal_math_and_proof_witness_bundle, principles P-1, P-2, P-3, P-6, and P-8, axioms AX-1, AX-2, AX-5, and AX-7, and dependency modules for the Lean standard premise index, tactic portfolio availability, target-shape tactic routing, and formal-math premise retrieval. The standard allows only copied Ring2 verifier-trace repair result record schemas and metadata-only public fields. It does not widen a passing replay into Lean/Lake authority, formal-result correctness, proof-body evidence, oracle premise authority, provider authority, human-approval proof authority, publishing-scope decision, launch-scope decision, or whole-system correctness.

Evidence/accounting:

  • Bundle authority: core/paper_module_capsules.json::paper_modules[23:paper_module.formal_math_verifier_trace_repair_loop] sets source_authority: json_capsule, binds the component, binds mechanism.formal_math_verifier_trace_repair_loop.validates_public_verifier_trace_repair_bundle, and resolves src/microcosm_core/organs/formal_math_verifier_trace_repair_loop.py.
  • Generated instance: paper_modules/formal_math_verifier_trace_repair_loop.json reports paper_module_payload.source_authority: json_capsule, Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges, 17 relationship edges, and resolved paper_module.depends_on.paper_module edges to the Lean standard premise index, tactic portfolio, target-shape routing, and formal-math premise retrieval modules named by the active standard.
  • Runtime, fixture, and bundle: src/microcosm_core/organs/formal_math_verifier_trace_repair_loop.py exposes run, run_loop_bundle, validate_source_module_manifest, _write_receipts, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, and SOURCE_MODULE_MANIFEST_REF. The fixture input and exported bundle replay copied Ring2 verifier-trace repair metadata, source-module digests, failure classes, repair actions, promotion gates, and one deterministic public toy-theorem rerun.
  • Result record and test floor: receipts/first_wave/formal_math_verifier_trace_repair_loop/formal_math_verifier_trace_repair_loop_result.json, verifier_trace_repair_board.json, formal_math_verifier_trace_repair_loop_validation_receipt.json, and receipts/acceptance/first_wave/formal_math_verifier_trace_repair_loop_fixture_acceptance.json are metadata-only evidence. tests/test_formal_math_verifier_trace_repair_loop.py checks source-module manifest validation, negative cases, toy rerun evidence, and scope limits.
  • Claim boundary: standards/std_microcosm_formal_math_verifier_trace_repair_loop.json and the generated structured source record limit this module to copied Ring2 verifier-trace repair metadata, source-module digests, public fixture result records, and deterministic toy rerun evidence. They do not authorize Lean/Lake authority, formal-result correctness, proof bodies, oracle premise ids, external model access, human approval as proof authority, launch-scope decision, publishing-scope decision, or whole-system correctness.

Reader Evidence Routing

The evidence rows above prove reader wiring, not formal-result correctness.

Route runtime and replay questions through the Runtime and Result records sections and the fixture/bundle paths in the validation command. The fixture runner, exported bundle runner, CLI route, standard, and fixture manifest show how verifier-trace repair accounting is replayed over copied public rows without importing proof bodies, oracle-needed premise ids, model-output data bodies, or private logs.

Route claim-safety questions through the What It Proves, What It Refuses, Result record Expectations, and Scope limit sections. If the question is whether the repair loop is still body-safe and result record-backed, run the focused pytest and paper-module corpus check before citing this page.

Prior Art Grounding

This component is grounded in interactive theorem-proving feedback loops and learning environments where failed proof attempts become structured training or repair signals. GamePad and HOList both expose theorem-proving interaction data for machine-learning experiments, while LeanDojo reinforces the need to keep proof assistant feedback, retrieval, and proof-state interaction reproducible.

Microcosm borrows the repair-loop accounting pattern: verifier events, grades, failure classes, repair actions, curriculum deltas, and cold rerun result records are separate fields. It does not treat human or provider advice as formal-result correctness.

Runtime

  • Component runner: python -m microcosm_core.organs.formal_math_verifier_trace_repair_loop run --input fixtures/first_wave/formal_math_verifier_trace_repair_loop/input --out receipts/first_wave/formal_math_verifier_trace_repair_loop
  • Exported bundle runner: python -m microcosm_core.organs.formal_math_verifier_trace_repair_loop run-loop-bundle --input examples/formal_math_verifier_trace_repair_loop/exported_verifier_trace_repair_bundle --out receipts/runtime_shell/demo_project/organs/formal_math_verifier_trace_repair_loop
  • CLI: microcosm formal-math-verifier-trace-repair-loop run-loop-bundle --input examples/formal_math_verifier_trace_repair_loop/exported_verifier_trace_repair_bundle --out receipts/runtime_shell/demo_project/organs/formal_math_verifier_trace_repair_loop
  • Standard: standards/std_microcosm_formal_math_verifier_trace_repair_loop.json
  • Fixture manifest: core/fixture_manifests/formal_math_verifier_trace_repair_loop.fixture_manifest.json

What It Proves

  • A public verifier replay can require trace events before trace grades.
  • Copied Ring2 failure rows can feed a repair curriculum without becoming proof authority.
  • A repair action must name the verifier failure class it responds to.
  • A failure-mode ledger update can be represented without proof bodies.
  • Promotion requires a cold rerun result record reference.
  • Human or provider advice stays advisory until checker evidence exists.

What It Refuses

  • Proof bodies in public verifier traces.
  • Oracle-needed premise ids in public inputs.
  • Model-output data bodies in fixtures or result records.
  • Human approval as checker authority or theorem-quality evidence.
  • Launch, public sharing, secret export, or general theorem-proving claims.

Result records

  • receipts/first_wave/formal_math_verifier_trace_repair_loop/formal_math_verifier_trace_repair_loop_result.json
  • receipts/first_wave/formal_math_verifier_trace_repair_loop/verifier_trace_repair_board.json
  • receipts/first_wave/formal_math_verifier_trace_repair_loop/formal_math_verifier_trace_repair_loop_validation_receipt.json
  • receipts/acceptance/first_wave/formal_math_verifier_trace_repair_loop_fixture_acceptance.json

Validation Result record Path

./repo-pytest tests/test_formal_math_verifier_trace_repair_loop.py -q --basetemp=/tmp/microcosm_formal_math_verifier_trace_repair_loop_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
jq '{edge_count:(.relationships.edges|length), mermaid_status:.paper_module_payload.generated_projections.mermaid.status, atlas_status:.paper_module_payload.generated_projections.atlas_card.status, source_authority:.relationships.source_authority, unresolved_selective_relation_count:(.relationships.unpopulated_selective_relations|length)}' paper_modules/formal_math_verifier_trace_repair_loop.json

Expected generated-row proof: edge_count: 17, mermaid_status: available_from_capsule_edges, atlas_status: linked_from_capsule_edges, source_authority: json_capsule, and unresolved_selective_relation_count: 0.

Scope boundary

Scope limit

The authority boundary is copied Ring2 verifier trace repair public fields only. The component demonstrates control-loop mechanics over real run rows, not formal-result correctness.

Scope limit

This module supports only the reader-verifiable claim that copied Ring2 verifier rows can drive a public verifier-trace repair loop with trace-event requirements, failure-class routing, promotion gates, and metadata-only result records. It does not establish formal-result correctness, expose proof bodies, authorize human or provider advice as proof authority, publish private run logs, approve launch, or certify whole-system correctness.

Formal Evidence Cell Anchor Resolver · Resolves each proof-flavored math claim to named evidence and flags ones that overreach or lack backing. · 3/5

Does When the project's writeups make proof-flavored claims about its formal-math work, this component checks each claim against a named piece of recorded evidence and the public reference files in the repo, confirms the claim is no stronger than that evidence allows, and flags claims that have no backing or that overreach. The record shows which claims are anchored to evidence and which are just words, while proof contents and any private file references are kept out of the output.

Scope limit It validates claim-to-evidence anchoring mechanics only: claim-to-cell resolution, source-anchor presence, permitted claim strength, copied-source-module digest checks, and leakage refusals. It does not run Lean/Lake, certify theorem or mathematical correctness, expose proof bodies or non-public source refs, use external model services, or include launch operations/public sharing.

Run
microcosm formal-evidence-cell-anchor-resolver run-anchor-bundle --input examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle --out receipts/runtime_shell/demo_project/organs/formal_evidence_cell_anchor_resolver

Evidence: Computed projection · evidence 3/5 · Source-faithful refactor

formal-methods · theorem-proving · lean

Source Design note · Source atlas

Paper module Formal Evidence Cell Anchor Resolver

formal_evidence_cell_anchor_resolver makes Microcosm's formal-math evidence claims inspectable without turning result record summaries into proof authority. It resolves paper-module claims to evidence-cell ids, checks source-anchor refs, records machine-anchor classes, and enforces a claim-strength boundary before any proof-language claim can pass. Its formal-math trace cell anchors the real Ring2 verifier-trace repair result records.

It is not a theorem prover. It does not execute Lean or Lake, expose proof bodies, expose non-public source refs, use external model services, or claim formal-result correctness. It emits real runtime result records over the imported evidence-cell system, carries digest-bearing Ring2 failure-taxonomy and graph-update source refs, and uses secret-exclusion scanning only for account secret-equivalent or non-result record body payloads.

Purpose

Proof-adjacent prose is the easiest place for a claim to drift. A paper module can write "this proves the theorem" or "this is certified" and a cold reader has no cheap way to tell whether the words are backed by a checked artifact or by nothing at all. This component answers one question: when a claim uses proof language, can the words be resolved to a specific piece of public evidence, and does that evidence stay below theorem-correctness authority?

The mechanism is an evidence cell. A cell is a stable id that stands in for a bundle of result record-backed evidence: its source-anchor refs, a machine_anchor_class that names what kind of machine artifact backs it, and the list of claim strengths the cell is allowed to support. The policy proof_language_requires_machine_anchor is the rule that makes the resolver useful. A claim that uses proof language must name a cell, the cell must resolve in the registry, and its source anchors must point at files that actually exist on the public path. A claim that uses proof language but names no cell, or names a cell that is not in the registry, lowers the run to a blocked status rather than passing as green prose.

What is worth noticing is what the cell id buys. It is a compressed handle: one short reference that a reader can follow back to the real result records behind a claim, instead of inlining proof bodies or trusting narrative. Two boundaries sit on top of that handle. Claim strength is capped by the cell, so a claim cannot assert more than its anchored evidence allows. And human approval is refused as a substitute for a machine anchor, which keeps a sign-off from being treated as proof.
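A compact sketch of the resolution rule follows. The claim and cell keys (`uses_proof_language`, `evidence_cell_id`, `claim_strength`, `source_anchor_refs`, `allowed_claim_strengths`), the finding strings, and the `anchor_exists` callback are assumptions for illustration; only the policy (proof language needs a resolvable cell, anchors must exist, strength is capped by the cell) comes from the text.

```python
def resolve_claim(claim, registry, anchor_exists):
    """Return findings for one claim; an empty list means it resolves cleanly.

    anchor_exists is a callable that reports whether a source-anchor ref
    points at a file on the public path (injected so the check stays testable).
    """
    cell_id = claim.get("evidence_cell_id")
    if not cell_id:
        # Proof language with no cell is refused; plain prose needs no anchor.
        return ["proof_language_without_cell"] if claim.get("uses_proof_language") else []
    cell = registry.get(cell_id)
    if cell is None:
        return ["unknown_evidence_cell"]
    findings = []
    anchors = cell.get("source_anchor_refs", [])
    if not anchors:
        findings.append("cell_without_source_anchors")
    findings.extend(f"missing_anchor:{ref}" for ref in anchors if not anchor_exists(ref))
    # The cell caps what the claim may assert.
    if claim.get("claim_strength") not in cell.get("allowed_claim_strengths", []):
        findings.append("claim_strength_exceeds_cell")
    return findings


def run_status(claims, registry, anchor_exists):
    """Any finding on any claim lowers the whole run to blocked."""
    all_findings = [f for c in claims for f in resolve_claim(c, registry, anchor_exists)]
    return {"status": "blocked" if all_findings else "pass", "findings": all_findings}
```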

Shape

[Diagram: source record, structured source record (source basis: source record), diagram view, map view, this page, runtime locus, first-wave fixture input, exported evidence-cell anchor bundle, source-open body manifest, validation result records, runtime-shell result record, proof boundary + scope limit (anchor metadata only, not formal-result correctness).]

Source refs

source record
core/paper_module_capsules.json::paper_modules[24]
structured source record source basis: source record
paper_modules/formal_evidence_cell_anchor_resolver.json
runtime locus
src/microcosm_core/organs/formal_evidence_cell_anchor_resolver.py
first-wave fixture input
fixtures/first_wave/formal_evidence_cell_anchor_resolver/input
exported evidence-cell anchor bundle
examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle
source-open body manifest
source_module_manifest.json
validation result records
receipts/first_wave/... + receipts/acceptance/...
runtime-shell result record
receipts/runtime_shell/demo_project/organs/formal_evidence_cell_anchor_resolver/...
Diagram source
flowchart TD
  SourceRecord["source record core/paper_module_capsules.json::paper_modules[24]"]
  StructuredRecord["structured source record paper_modules/formal_evidence_cell_anchor_resolver.json source basis: source record"]
  Mermaid["diagram view available_from_capsule_edges"]
  Atlas["map view blocked_until_organ_atlas_owner_lane_binds_edges"]
  Reader["this page"]
  Runtime["runtime locus src/microcosm_core/organs/formal_evidence_cell_anchor_resolver.py"]
  Fixture["first-wave fixture input fixtures/first_wave/formal_evidence_cell_anchor_resolver/input"]
  ExportedBundle["exported evidence-cell anchor bundle examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle"]
  Manifest["source-open body manifest source_module_manifest.json"]
  Records["validation result records receipts/first_wave/... + receipts/acceptance/..."]
  BundleReceipt["runtime-shell result record receipts/runtime_shell/demo_project/organs/formal_evidence_cell_anchor_resolver/..."]
  Ceiling["proof boundary + scope limit anchor metadata only, not formal-result correctness"]
  SourceRecord --> StructuredRecord
  StructuredRecord --> Mermaid
  StructuredRecord --> Atlas
  StructuredRecord --> Reader
  Reader --> Runtime
  Runtime --> Fixture
  Runtime --> ExportedBundle
  ExportedBundle --> Manifest
  Fixture --> Records
  ExportedBundle --> BundleReceipt
  Records --> Ceiling
  BundleReceipt --> Ceiling

Read the diagram top to bottom: the bundle and generated structured source record name the relationships; the runtime validates fixture and bundle inputs; the result records show what passed; the scope limit prevents any of those surfaces from becoming proof, launch, provider, private-system, or theorem-correctness authority.

Reader Evidence Routing

A cold reader should inspect this module through these system surfaces, in order:

  1. Authority seed: core/paper_module_capsules.json::paper_modules[24:paper_module.formal_evidence_cell_anchor_resolver]. This is the source record that binds the Markdown projection, generated JSON, runtime locus, fixture, exported bundle, mechanism rows, and scope boundaries.
  2. Generated structured source record: paper_modules/formal_evidence_cell_anchor_resolver.json. Check relationships.source_authority, the 15 relationship edges, the generated_projections statuses, unpopulated_selective_relations, and the bundle-carried scope limit before trusting any prose summary.
  3. Runtime locus: src/microcosm_core/organs/formal_evidence_cell_anchor_resolver.py. The relevant runtime symbols are run, run_anchor_bundle, validate_source_module_manifest, _build_result, _source_module_summary_card, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, SOURCE_MODULE_MANIFEST_REF, BUNDLE_RESULT_NAME, and CARD_SCHEMA_VERSION.
  4. Fixture and exported bundle: fixtures/first_wave/formal_evidence_cell_anchor_resolver/input, examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle, and examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle/source_module_manifest.json. The first-wave fixture exercises negative cases and Ring2 result record anchors; the exported bundle validates six source-open body modules by digest while keeping source bodies out of result records.
  5. Result records: receipts/first_wave/formal_evidence_cell_anchor_resolver/formal_evidence_cell_anchor_resolver_result.json, receipts/first_wave/formal_evidence_cell_anchor_resolver/evidence_cell_anchor_board.json, receipts/first_wave/formal_evidence_cell_anchor_resolver/formal_evidence_cell_anchor_resolver_validation_receipt.json, receipts/acceptance/first_wave/formal_evidence_cell_anchor_resolver_fixture_acceptance.json, and receipts/runtime_shell/demo_project/organs/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle_validation_result.json. These result records report pass/fail state, metadata-only public refs, negative-case observations, and explicit release_authorized=false, provider_calls_authorized=false, lean_lake_execution_authorized=false, formal_proof_authority=false, and theorem_correctness_authority=false ceilings.
  6. Focused checks: tests/test_formal_evidence_cell_anchor_resolver.py, scripts/build_doctrine_projection.py --check-paper-module-corpus, and the JSON-row proof query in the validation section below. Those checks validate the reader route and generated-row parity; they do not authorize public sharing or formal proof claims.
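A reader checking the result records by hand could confirm the explicit ceilings with a few lines of Python. The five flag names are the ones quoted in step 5; the assumption that they sit at the top level of the loaded JSON object, and the function name, are hypothetical.

```python
# Flag names as quoted in the result-record description above.
CEILING_FLAGS = (
    "release_authorized",
    "provider_calls_authorized",
    "lean_lake_execution_authorized",
    "formal_proof_authority",
    "theorem_correctness_authority",
)


def ceiling_violations(record):
    """List every ceiling flag that is not explicitly False.

    A missing flag counts as a violation: the ceiling must be stated, not implied.
    """
    return [flag for flag in CEILING_FLAGS if record.get(flag) is not False]
```

To use it, load one of the result JSON files with `json.load` and pass the resulting dict; an empty list means every ceiling is explicitly present and false.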

Prior Art Grounding

This component is grounded in provenance and proof-certificate work where claims must point at checkable evidence rather than untyped narrative. The W3C PROV model is a general anchor for linking entities, activities, and agents in an evidence graph, while Proof-Carrying Code and small-kernel proof assistants motivate separating a certificate or anchor from the trusted checker that bounds its meaning.

Microcosm borrows the anchor-resolution pattern: proof-language claims must name evidence-cell ids, source anchors, machine-anchor classes, and claim strength limits. It does not turn metadata cells into theorem-correctness authority.

Runtime

  • Component runner: python -m microcosm_core.organs.formal_evidence_cell_anchor_resolver run --input fixtures/first_wave/formal_evidence_cell_anchor_resolver/input --out receipts/first_wave/formal_evidence_cell_anchor_resolver
  • Exported bundle runner: python -m microcosm_core.organs.formal_evidence_cell_anchor_resolver run-anchor-bundle --input examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle --out receipts/runtime_shell/demo_project/organs/formal_evidence_cell_anchor_resolver
  • CLI: microcosm formal-evidence-cell-anchor-resolver run-anchor-bundle --input examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle --out receipts/runtime_shell/demo_project/organs/formal_evidence_cell_anchor_resolver
  • Standard: standards/std_microcosm_formal_evidence_cell_anchor_resolver.json
  • Fixture manifest: core/fixture_manifests/formal_evidence_cell_anchor_resolver.fixture_manifest.json

What It Establishes As Evidence Routing

  • Proof-language claims must resolve to a public evidence cell before this reader treats them as routed evidence.
  • Evidence cells must carry source-anchor refs.
  • Machine-anchor metadata is visible as metadata, not formal-result correctness.
  • Claim strength is bounded by the resolved cell.
  • Secret, account secret-equivalent, or non-result record body payloads must have explicit exclusion result records.
  • The verifier-trace cell is anchored to the first-wave formal_math_verifier_trace_repair_loop result, board, validation result record, and Ring2 failure-taxonomy source digest.

What It Refuses

  • Unknown evidence-cell ids used as proof authority.
  • Proof-language claims without evidence-cell ids.
  • Proof bodies in public claim rows.
  • Non-public source refs in public claim or cell rows.
  • Human approval as proof authority.
  • Theorem-correctness claims from metadata cells.
  • Launch, public sharing, secret export, or provider authority.
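
The establish/refuse rules above can be sketched as a small resolver. This is a minimal illustration, not the component's code: EvidenceCell, resolve_claim, and every field name (evidence_cell_ids, claims_theorem_correctness, authority, proof_body) are hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class EvidenceCell:
    # Hypothetical shape of a public evidence cell.
    cell_id: str
    source_anchor_refs: list
    kind: str = "metadata"  # assumption: metadata cells cannot carry theorem claims


def resolve_claim(claim: dict, cells: dict) -> dict:
    """Route one proof-language claim row, or refuse it with typed reasons."""
    refusals = []
    cell_ids = claim.get("evidence_cell_ids", [])
    if not cell_ids:
        refusals.append("claim_without_evidence_cell_ids")
    for cid in cell_ids:
        cell = cells.get(cid)
        if cell is None:
            refusals.append(f"unknown_evidence_cell:{cid}")
        elif not cell.source_anchor_refs:
            refusals.append(f"missing_source_anchor:{cid}")
        elif claim.get("claims_theorem_correctness") and cell.kind == "metadata":
            refusals.append(f"theorem_correctness_from_metadata_cell:{cid}")
    if "proof_body" in claim:
        refusals.append("proof_body_in_public_row")
    if claim.get("authority") == "human_approval":
        refusals.append("human_approval_as_proof_authority")
    status = "refused" if refusals else "routed"
    return {"claim_id": claim.get("claim_id"), "status": status, "refusals": refusals}
```

A claim is routed only when every named cell exists and carries source anchors; any single refusal reason lowers the status instead of being averaged away.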

Result records

  • receipts/first_wave/formal_evidence_cell_anchor_resolver/formal_evidence_cell_anchor_resolver_result.json
  • receipts/first_wave/formal_evidence_cell_anchor_resolver/evidence_cell_anchor_board.json
  • receipts/first_wave/formal_evidence_cell_anchor_resolver/formal_evidence_cell_anchor_resolver_validation_receipt.json
  • result records/sign-off/first_wave/formal_evidence_cell_anchor_resolver_fixture_acceptance.json

Validation Result record Path

./repo-pytest tests/test_formal_evidence_cell_anchor_resolver.py -q --basetemp=/tmp/microcosm_formal_evidence_cell_anchor_resolver_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
jq '{edge_count:(.relationships.edges|length), mermaid_status:.paper_module_payload.generated_projections.mermaid.status, atlas_status:.paper_module_payload.generated_projections.atlas_card.status, source_authority:.relationships.source_authority, unresolved_selective_relation_count:(.relationships.unpopulated_selective_relations|length)}' paper_modules/formal_evidence_cell_anchor_resolver.json

Expected generated-row proof: edge_count: 15, mermaid_status: available_from_capsule_edges, atlas_status: blocked_until_organ_atlas_owner_lane_binds_edges, source_authority: json_capsule, and unresolved_selective_relation_count: 0.

Scope boundary

Limitations

This module is a proof-adjacent evidence router, not a proof system. The fixture proves a bounded resolver contract over three paper claims, three evidence cells, seven declared negative-case classes, eight source anchors, three machine anchors, and zero copied source modules in fixture mode. The exported bundle proves the same public runtime shape over three claims, three evidence cells, five source anchors, six copied source-open body modules, and metadata-only result records. These counts are the claim boundary, not a scale claim about the formal-math corpus.

The source-module proof is digest and authority-ref parity for the six exported body modules named by the bundle manifest. It does not establish that every upstream formal-math source file has been imported, that future source drift is absent, or that copied body availability confers a public launch-scope decision. A digest match also does not permit exporting proof bodies, non-public source refs, model-output data, oracle material, account secrets, browser UI/operator UI state, or source notes.

The checker rejects unknown cells, missing source anchors, proof language without cells, non-public refs, proof bodies, theorem-correctness overclaims, and human approval as proof authority. That refusal coverage does not certify Lean or Lake execution, formal-result correctness, proof completeness, benchmark performance, deployment posture, or whole-system correctness.

Scope limit

The authority boundary is evidence-cell anchor resolution backed by real runtime result records. The component makes claim boundaries legible; it does not certify mathematical truth.

Scope limit

This module supports only the reader-verifiable claim that public evidence-cell anchor metadata can bind proof-language claims to result record-backed cells and exclude private bodies, proof bodies, model-output data, oracle material, and secret-equivalent refs. Its generated Mermaid/Atlas statuses and relationship counts are JSON-bundle projections; they do not certify formal-result correctness, proof completeness, launch-scope decision, publishing-scope decision, provider authority, or whole-system correctness.

Source and projection details
Governing Lattice Relation

The lattice edge is not just that this page "mentions" formal math evidence. The generated structured source record binds the page to one component, two mechanism rows, concept.formal_math_and_proof_witness_bundle, P-1, P-2, P-3, P-6, P-8, AX-1, AX-2, AX-5, AX-7, the sibling paper_module.formal_math_verifier_trace_repair_loop, and the resolved runtime source locus. That is the governing shape: proof-adjacent claims enter as paper-claim rows, evidence-cell ids, source anchors, machine-anchor classes, and copied source-module manifests; _build_result recomputes the pass or blocked status from those lower-level artifacts; _source_module_summary_card and run_anchor_bundle export compact, metadata-only evidence.

P-1 and AX-1 require a recomputed checker result rather than a label. P-2 and AX-2 keep the scope limit at the strength of the resolver and its certificates. P-3 makes the small resolver/manifest checker the authority surface instead of broad proof-language prose. P-6, P-8, AX-5, and AX-7 explain the blocked path: missing anchors, proof bodies, non-public source refs, source-module digest drift, theorem-correctness language, or human approval as proof authority must lower the status or return a refusal with evidence rather than preserving a green reader claim.

The focused proof consumer is tests/test_formal_evidence_cell_anchor_resolver.py. It asserts the fixture path observes all seven expected negative cases, resolves three claims to three evidence cells, records eight source anchors and three machine anchors, anchors the verifier-trace row to Ring2 result records, keeps formal-proof and theorem-correctness authority false, validates the exported bundle with six copied source modules, rejects theorem-correctness overclaims, rejects digest and rehashed-body swaps, and keeps command-card result records compact and metadata-only. Those checks are the local mechanism witness for the lattice relation.

Source-Open Body Floor

The exported bundle carries a source-open body floor at examples/formal_evidence_cell_anchor_resolver/exported_evidence_cell_anchor_bundle/source_module_manifest.json. It imports the paper-module formal-evidence auditor, formal evidence-cell registry builder, focused runtime tests, public formal-evidence registry state, Erdos257 issue217 evidence-cell manifest, and the std_paper_module formal-evidence-cell contract body. Result records and workingness cards expose digests and validation status, not body text, proof bodies, model-output data, non-public refs, oracle material, or theorem-correctness authority.

Undeclared Library Prior Symbol Classifier · Detects when a checked Lean proof cites a library result outside its approved set. · 3/5

Does It checks whether a Lean proof cites a library result (a lemma or definition) that was never on its approved list. Even after a prover accepts a proof, that proof can still quietly use a library symbol it wasn't allowed to, and this component surfaces those out-of-bounds uses as an inspectable record that names each symbol and where the rule came from. It matters because "the proof checked" does not mean "the proof stayed within the allowed set of building blocks," and this makes that gap visible without ever reading the proof's own steps.

Scope limit It only projects the symbol-boundary classification mechanic over copied Lean/Std premise rows and pre-extracted symbol observations; it does not read proof source, run Lean or Lake, prove formal-result correctness, treat the whole standard library as an implicit allowlist, claim Mathlib availability, use external model services, or include launch operations.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.undeclared_library_prior_symbol_classifier run --input fixtures/first_wave/undeclared_library_prior_symbol_classifier/input --out receipts/first_wave/undeclared_library_prior_symbol_classifier

Evidence · Computed projection · evidence 3/5 · Source-faithful refactor

formal-methods · theorem-proving · lean

Source Design note · Source atlas

Paper module Undeclared Library Prior Classifier

This module is the Microcosm projection of the formal-prover rule that a Lean-accepted proof can still violate the evaluation contract when it uses a real library symbol that was not in the allowed premise set. It is a provenance-bearing symbol-boundary component, not a proof checker.

The fixture carries copied Lean/Std premise rows from the real Ring2 premise-index system and real Ring2 problem ids / candidate artifact digests for the symbol-boundary examples. It records extracted qualified symbol refs and classifies a known symbol outside allowed_premise_ids as UNDECLARED_LIBRARY_PRIOR. If cited_unallowed_premise_ids is present, that explicit budget violation takes precedence and routes as PREMISE_BUDGET_VIOLATION.

The source chain is digest-bearing: the real Ring2 premise index sha256:c78b176388a5e81bd8a785950e7db0c9a65fd38e556515134146163b48604df1, Ring2 run summary sha256:93304410f32d40f5cad1c161c1d01a5d6f353ee10b7cf3fecbaaf7b068b43008, copied Lean/Std premise fixture sha256:0be36ba5b75b40d2ede2d90cefa5181829420df7abbae216d18282b92a30f869, and the adjacent corpus-readiness / tactic-availability result records anchor the Mathlib-absent toolchain boundary.

The exported bundle carries a source-open body floor at examples/undeclared_library_prior_symbol_classifier/exported_symbol_classifier_bundle/source_module_manifest.json. It imports the reducer and set-calibration builder source bodies exactly, plus run bodies for the Ring2 premise index, Ring2 run summary, recipe policy metrics, and result record reduction matrix. The two run-state bodies are path-normalized to <repo-root> and <lean-toolchain-root> while preserving source and target digests, line counts, byte counts, and required anchors.

Purpose

A theorem prover can return a proof that Lean accepts, yet that proof can still break the rules of the evaluation it was run under. The usual reason is simple: the proof reached for a library lemma that the recipe never put on the table. The symbol is real and the proof is sound, but the run quietly used a fact it was not allowed to assume. This component answers one question. Given a set of premises a candidate was allowed to use and the symbols it actually reached for, did it cite a known library symbol that was outside that allowed set?

The unusual choice is what the classifier refuses to do. It does not run Lean, it does not read the proof body, and it does not treat the standard library as an implicit allowlist where anything that exists is fair game. It works only from a copied premise index and a list of symbol observations that were extracted beforehand, and it compares the two. That keeps the check cheap and keeps proof material out of the public result record, but it also means the allowed set is closed by construction: a symbol is admissible only because a premise row names it, never because it happens to live in Lean's standard library.

The check also separates two failure modes that are easy to confuse. An explicit budget breach, where the candidate names a premise id the recipe did not allow, is not the same as a residual breach, where the candidate used an allowed-looking symbol that turns out to be undeclared. The first is settled directly from the cited ids and takes precedence; the second is what the symbol comparison is for. Treating both as one class would either over-escalate honest retries or let genuine out-of-recipe library use slip through as a budget note. Keeping them apart is the point.

Shape

[Diagram: flowchart from the JSON source record through the structured source record and runtime component to the copied Lean/Std premise index and pre-extracted symbol observations, then to the three routes (PREMISE_BUDGET_VIOLATION: retry, UNDECLARED_LIBRARY_PRIOR: bridge_escalate, NONE: accept_as_advisory), the result record stream, and the scope limit.]

Source refs

  • JSON source record: paper_module.undeclared_library_prior_classifier
  • Runtime component: undeclared_library_prior_symbol_classifier.py
  • Pre-extracted symbol observations: Nat/List/Bool/Iff/Eq refs
  • Budget: cited_unallowed_premise_ids present
Diagram source
flowchart TD
  bundle["JSON source record paper_module.undeclared_library_prior_classifier"]
  structured_source_record["structured source record 19 edges, no selective residuals"]
  runtime["Runtime component undeclared_library_prior_symbol_classifier.py"]
  premise["Copied Lean/Std premise index 11 sanctioned symbols"]
  observations["Pre-extracted symbol observations Nat/List/Bool/Iff/Eq refs"]
  budget["cited_unallowed_premise_ids present"]
  residual["Known qualified symbol outside allowed_premise_ids"]
  clean["Allowed symbol or no known undeclared symbol"]
  retry["PREMISE_BUDGET_VIOLATION route: retry"]
  escalate["UNDECLARED_LIBRARY_PRIOR route: bridge_escalate"]
  advisory["NONE route: accept_as_advisory"]
  result_records["Result record stream fixture, board, validation, sign-off"]
  ceiling["Scope limit no Lean/Lake, proof, provider, launch, private-system claim"]
  bundle --> structured_source_record
  structured_source_record --> runtime
  runtime --> premise
  runtime --> observations
  observations --> budget
  observations --> residual
  observations --> clean
  budget --> retry
  residual --> escalate
  clean --> advisory
  retry --> result_records
  escalate --> result_records
  advisory --> result_records
  result_records --> ceiling

Technical Mechanism

The component separates three questions that are easy to conflate in proof evaluation: whether a candidate explicitly cites a premise outside the recipe, whether it uses a known Lean/Std symbol that was not in the allowed premise set, and whether the theorem is actually correct. Only the first two are in scope. validate_premise_index builds the closed allowlist from copied Lean/Std premise rows, validate_symbol_observations reads pre-extracted qualified symbol observations, and _classify_row applies the precedence rule: cited_unallowed_premise_ids yields PREMISE_BUDGET_VIOLATION with retry; otherwise a known qualified symbol outside allowed_premise_ids yields UNDECLARED_LIBRARY_PRIOR with bridge_escalate; clean or unknown observations remain advisory. The classifier records observed symbols and computed/asserted classes, but it never evaluates proof bodies or runs Lean.
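
The precedence rule can be sketched in a few lines. This is an illustration under stated assumptions, not the body of _classify_row: the function name classify_row, the observation field qualified_symbol_refs, and the simplification that premise ids are the qualified symbol names themselves are all hypothetical. The symbol names in the usage are illustrative strings only.

```python
def classify_row(observation: dict, known_symbols: set, allowed_premise_ids: set) -> dict:
    """Apply the budget-first precedence rule to one pre-extracted observation."""
    # An explicit budget breach short-circuits the residual symbol comparison.
    if observation.get("cited_unallowed_premise_ids"):
        return {"failure_class": "PREMISE_BUDGET_VIOLATION", "route": "retry", "symbols": []}
    # Residual breach: a symbol known to the copied index but outside the allowed set.
    undeclared = sorted(
        symbol
        for symbol in observation.get("qualified_symbol_refs", [])
        if symbol in known_symbols and symbol not in allowed_premise_ids
    )
    if undeclared:
        return {
            "failure_class": "UNDECLARED_LIBRARY_PRIOR",
            "route": "bridge_escalate",
            "symbols": undeclared,
        }
    # Clean or unknown observations stay advisory; nothing here judges theorem truth.
    return {"failure_class": "NONE", "route": "accept_as_advisory", "symbols": []}
```

Unknown tokens fall through to the advisory branch on purpose: the sketch only makes the positive claim when both the observation and the closed index name the symbol.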

The exported-bundle mechanism is a second boundary rather than a richer proof. validate_source_module_manifest requires source_module_manifest.json, rejects manifest or row-level body_in_receipt: true, verifies six declared body imports against source/target digests, line counts, byte counts, required anchors, material classes, and relation type, and keeps path-normalized Ring2 run-state bodies separate from exact copied reducer bodies. secret_exclusion_scan then checks the declared public fixture and bundle inputs for proof-body, provider-payload, private-ref, and host-path sentinel classes. _write_receipts writes result, board, validation, and sign-off result records; result_card deliberately emits a small pass/fail card that omits source modules, source digests, proof bodies, non-public source refs, secret-scan detail, and scope limit bodies. This is why the module can be source-open about the symbol-boundary system without becoming a proof-body export.
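
A manifest-row check of this kind might look like the following sketch. It is not validate_source_module_manifest: the helper names body_stats and check_manifest_row and the row keys target_digest, line_count, byte_count, and required_anchors are assumptions made for the example, and only the target side of the source/target digest pair is shown.

```python
import hashlib


def body_stats(body: bytes) -> dict:
    """Compute the digest, line count, and byte count of one copied body."""
    return {
        "target_digest": "sha256:" + hashlib.sha256(body).hexdigest(),
        "line_count": len(body.splitlines()),
        "byte_count": len(body),
    }


def check_manifest_row(row: dict, target_body: bytes) -> list:
    """Return the list of boundary problems for one declared body import."""
    problems = []
    # Body text must never be embedded in a result record.
    if row.get("body_in_receipt") is True:
        problems.append("body_in_receipt_true")
    stats = body_stats(target_body)
    for key in ("target_digest", "line_count", "byte_count"):
        if row.get(key) != stats[key]:
            problems.append(f"{key}_mismatch")
    text = target_body.decode("utf-8", errors="replace")
    for anchor in row.get("required_anchors", []):
        if anchor not in text:
            problems.append(f"missing_anchor:{anchor}")
    return problems
```

An empty list means the row passed; any entry should block the bundle rather than be downgraded to a warning, which matches the fail-closed posture described above.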

The governing lattice follows the same separation. The bundle binds the component to mechanism.undeclared_library_prior_symbol_classifier.validates_public_symbol_boundary, concept.formal_math_and_proof_witness_bundle, principles P-1, P-2, P-3, P-6, P-8, and P-9, and axioms AX-1, AX-2, AX-5, AX-7, AX-8, and AX-10. The technical claim is therefore limited to public symbol-budget classification over copied, digest-bearing premise evidence. It does not establish theorem truth, Mathlib availability, Lean/Lake execution, launch-scope decision, provider correctness, or complete library allowlisting.

Reader Evidence Routing

Start with the source record, not this prose: core/paper_module_capsules.json::paper_modules[56:paper_module.undeclared_library_prior_classifier] is the source authority that names the component subject undeclared_library_prior_symbol_classifier, the mechanism mechanism.undeclared_library_prior_symbol_classifier.validates_public_symbol_boundary, the code locus src/microcosm_core/organs/undeclared_library_prior_symbol_classifier.py, the concept concept.formal_math_and_proof_witness_bundle, the governing principles P-1, P-2, P-3, P-6, P-8, and P-9, the axioms AX-1, AX-2, AX-5, AX-7, AX-8, and AX-10, and the sibling modules paper_module.corpus_readiness_mathlib_absence_gate, paper_module.tactic_portfolio_availability, and paper_module.lean_std_premise_index.

Then read the generated structured source record paper_modules/undeclared_library_prior_classifier.json. It is the parity projection from the bundle, carrying source_authority: json_capsule, Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges, 19 generated relationship edges, and no unpopulated selective relations. The structured source record is evidence that the reader page is wired into the doctrine lattice; it is not theorem-correctness, launch, or runtime-correctness authority.

For runtime behavior, inspect src/microcosm_core/organs/undeclared_library_prior_symbol_classifier.py. The named locus validates projection protocol, premise index, classifier policy, source-module manifest, symbol observations, secret-exclusion scan, result construction, result record writing, and result-card compaction. The load-bearing classifier rule is _classify_row: explicit cited_unallowed_premise_ids short-circuit as PREMISE_BUDGET_VIOLATION with retry; otherwise a known qualified Lean/Std symbol outside allowed_premise_ids classifies as UNDECLARED_LIBRARY_PRIOR with bridge_escalate. Negative cases reject proof bodies, non-public source refs, theorem-correctness overclaims, allowed-symbol false positives, unqualified-token overclaims, and missing escalation.

For public fixture evidence, use fixtures/first_wave/undeclared_library_prior_symbol_classifier/input/. The fixture carries the premise index, classifier policy, projection protocol, symbol observations, and the seven negative-case files named by EXPECTED_NEGATIVE_CASES. For exported source-open body-floor evidence, use examples/undeclared_library_prior_symbol_classifier/exported_symbol_classifier_bundle/source_module_manifest.json. That manifest verifies six source body imports: reducer source, set-calibration builder source, path-normalized Ring2 premise-index state, path-normalized Ring2 run summary, recipe policy metrics, and result record reduction matrix. The manifest keeps body_in_receipt false and checks source/target digests plus required anchors; it does not export proof bodies, model-output data bodies, account or browser state, source notes, or private source-root bodies.

For result records, read receipts/first_wave/undeclared_library_prior_symbol_classifier/undeclared_library_prior_symbol_classifier_result.json, receipts/first_wave/undeclared_library_prior_symbol_classifier/undeclared_library_prior_symbol_classifier_board.json, receipts/first_wave/undeclared_library_prior_symbol_classifier/undeclared_library_prior_symbol_classifier_validation_receipt.json, and result records/sign-off/first_wave/undeclared_library_prior_symbol_classifier_fixture_acceptance.json. The fixture result record reports 11 premises, 3 classifications, 1 undeclared-library prior, 1 premise-budget-precedence case, 1 bridge escalation, 1 retry, zero blocking secret-exclusion hits, and the scope boundary that this is not Lean/Lake, formal-result correctness, provider, private-ref, whole-library-allowlist, or launch-scope decision.

Focused regression coverage lives in tests/test_undeclared_library_prior_symbol_classifier.py. It runs both the fixture command and run-symbol-bundle, checks public-relative result records, verifies digest/manifest boundary failures, and confirms the compact card reuses a fresh result record without exporting source modules, body ids, secret-scan details, source digests, proof bodies, or non-public source refs. The paper-module coverage contract also names this module in tests/test_microcosm_paper_module_coverage_contract.py; that is route coverage evidence, not runtime proof evidence.

Named Proof Consumers

The fixture consumer is microcosm_core.organs.undeclared_library_prior_symbol_classifier run over fixtures/first_wave/undeclared_library_prior_symbol_classifier/input. It proves the public example still classifies 11 copied premise rows and 3 symbol observations into one undeclared-library-prior escalation, one premise-budget retry, and one advisory clean case, while the expected negative cases cover proof-body export, non-public refs, theorem-correctness overclaim, allowed-symbol false positives, unqualified-token overclaims, and missing escalation.

The exported-bundle consumer is microcosm_core.organs.undeclared_library_prior_symbol_classifier run-symbol-bundle over examples/undeclared_library_prior_symbol_classifier/exported_symbol_classifier_bundle. It proves the six source-open body imports remain digest/size/anchor checked and public-safe, including the exact copied reducer and calibration-builder bodies plus path-normalized Ring2 state, recipe metrics, and reduction-matrix bodies. It is the consumer that catches source-module digest drift and manifest-boundary violations; it does not certify formal-result correctness.

The focused regression consumer is tests/test_undeclared_library_prior_symbol_classifier.py. It ties the fixture and bundle commands to public-relative result records, source-module digest mismatch blocking, manifest and row-level body_in_receipt rejection, compact-card redaction, and fresh-card reuse. The corpus consumer is scripts/build_doctrine_projection.py --check-paper-module-corpus, which proves the Markdown remains part of the 98-module Microcosm paper-module corpus. That corpus check is routing and projection parity evidence only; it is not a runtime proof substitute.

Public Mechanics

  • Qualified symbol refs are restricted to Nat, List, Bool, Iff, and Eq namespaces in this public fixture.
  • The closed premise index is an allowlist boundary, not permission to use the whole standard library.
  • UNDECLARED_LIBRARY_PRIOR routes to bridge_escalate because the proof may be informative while still out of recipe.
  • PREMISE_BUDGET_VIOLATION routes to retry and short-circuits the residual symbol classifier.
  • Result records expose ids, candidate artifact digests, symbols, counts, failure classes, source refs, source digests, and scope limits.
  • secret_exclusion_scan records zero blocking hits for the declared sentinel classes in the public result record stream; it is not a complete secret audit, launch clearance, or proof that no private material exists anywhere.
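
The namespace restriction in the first bullet can be pictured as a simple filter. This is a hedged sketch: PUBLIC_NAMESPACES and is_qualified_public_ref are hypothetical names, and the real fixture may qualify symbols differently.

```python
# Namespaces the public fixture restricts qualified symbol refs to.
PUBLIC_NAMESPACES = ("Nat", "List", "Bool", "Iff", "Eq")


def is_qualified_public_ref(token: str) -> bool:
    """True only for tokens of the form <Namespace>.<name> in a public namespace."""
    head, sep, rest = token.partition(".")
    return bool(sep) and bool(rest) and head in PUBLIC_NAMESPACES
```

An unqualified token such as a bare lemma name fails the filter, so it never reaches the positive undeclared-library-prior claim.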

Prior Art Grounding

This classifier is grounded in formal-methods work on premise control and library-aware proof search. Isabelle/Sledgehammer makes relevant-fact selection an explicit part of automated proof search, and Lean/Mathlib practice makes clear that accepted proofs can depend on a large library context. Microcosm uses that insight as a boundary check: an accepted proof artifact is not enough if it quietly used symbols outside the declared premise set. The component classifies the symbol-budget violation without judging theorem truth or exporting proof bodies.

Prior-art anchors:

  • Isabelle Sledgehammer and relevant-fact selection: https://isabelle.in.tum.de/doc/sledgehammer.pdf
  • Lean community Mathlib overview: https://leanprover-community.github.io/mathlib-overview.html
  • Lean 4 tactic and proof environment context: https://lean-lang.org/theorem_proving_in_lean4/Tactics/

Regression Cases

The forbidden proof-body, private-ref, allowed-symbol false-positive, unqualified-token, and theorem-correctness cases are regression-only leakage guards. They are not product evidence and cannot stand in for the copied Lean/Std symbol-boundary system.

Validation Result record Path

Run from microcosm-substrate:

The expected bundle projection is Mermaid available_from_capsule_edges, Atlas linked_from_capsule_edges, and 19 generated relationship edges with no unpopulated selective relations. A green result record proves only the allowed-premise and symbol-budget classification boundary; it does not establish formal-result correctness, run Lean or Lake, expose proof bodies, authorize external model access, claim Mathlib availability, or broaden all Std and Mathlib declarations into allowed priors.

Scope boundary

Scope limit

The JSON bundle and generated row prove only allowed-premise and symbol-budget classification evidence: copied Lean/Std premise rows, real Ring2 ids and digests, extracted qualified symbol refs, declared budget-violation cases, source-open body-floor digest evidence, leakage regression cases, negative cases, and validation result records. They do not prove formal-result correctness, run Lean or Lake, expose proof bodies, use external model services, import non-public source refs, claim Mathlib availability, treat all Std or Mathlib declarations as allowed priors, include launch operations, authorize public sharing, or prove whole-system correctness. They also do not expose model-output data bodies, account or browser state, source notes, or private source-root bodies.

Limitations

The classifier depends on copied premise rows and pre-extracted qualified symbol observations. It does not parse arbitrary Lean syntax, expand imports, normalize proof terms, or run Lean/Lake to discover symbols. Unknown or unqualified tokens are deliberately kept outside the positive undeclared-library-prior claim unless the public observation and closed premise index make the boundary explicit.

The public source-open body floor is a provenance check, not semantic equivalence for the full private source system. Exact copied bodies and path-normalized run-state bodies are checked for source/target digests, line counts, byte counts, and required anchors; that does not certify every upstream private root, model-output data, account state, or operator context that may have informed the original source run.

The leakage and launch boundaries are also scoped. secret_exclusion_scan checks declared sentinel classes in the public fixture and bundle inputs, while the focused pytest checks regression cases for proof-body export, non-public refs, overclaims, and compact-card redaction. Those checks do not replace a whole-repo secret audit, a public sharing review, theorem-correctness evidence, or a Mathlib availability proof. The paper-module corpus and generated-row checks prove routing parity only.

Scope limit

This module is allowed-premise and symbol-budget classification evidence only. It does not establish formal-result correctness, run Lean or Lake, expose proof bodies, use external model services, import non-public source refs, treat all Std or Mathlib declarations as allowed priors, claim Mathlib availability, or include launch operations.


Source and projection details
Governing Lattice Relation

The governing relation is the path from bundle authority to a bounded proof consumer. The source row binds this module to the undeclared_library_prior_symbol_classifier component, the mechanism mechanism.undeclared_library_prior_symbol_classifier.validates_public_symbol_boundary, the runtime locus src/microcosm_core/organs/undeclared_library_prior_symbol_classifier.py, the concept concept.formal_math_and_proof_witness_bundle, six principles, six axioms, and the sibling paper modules for corpus readiness, tactic availability, and Lean/Std premise indexing.

The principle layer explains why the classifier is a boundary component rather than a theorem authority. P-1 requires the symbol class to be recomputed from premise rows and observations instead of echoed from prose. P-2 lowers the claim to what the checker actually tests: allowed-premise and symbol-budget classification. P-3 concentrates trust in the small component and source-module manifest validators. P-6 fails closed on missing or stale evidence. P-8 turns inadmissible computations into typed outcomes such as PREMISE_BUDGET_VIOLATION and UNDECLARED_LIBRARY_PRIOR. P-9 carries source refs, target refs, digests, and body-material status through the fixture, bundle, and result record layers.

The axiom layer supplies the same ceiling in machine-checkable form. AX-1 requires derivation before assertion, so the page points to fixture and bundle result records instead of declaring theorem truth. AX-2 keeps verification inside kernelized validators. AX-5 prevents an authority upgrade without stronger evidence. AX-7 allows typed partiality and refusal when the proof body, non-public refs, or theorem-correctness claim is inadmissible. AX-8 preserves provenance while keeping proof/provider/private bodies out of public result records.

The generated JSON row currently contributes 19 relationship edges with no unpopulated selective relations. Those edges are evidence of route parity, not new authority: the source authority remains the JSON bundle and the proof authority remains the focused fixture, bundle, and regression consumers.

This page treats those generated navigation surfaces as bundle-derived projections while explaining the resolved symbol-boundary component, code-locus, law, and sibling-paper links.

Ring2 Premise Retrieval Precision Recall Harness · Scores how much proof support a premise search found, problem by problem. · 3/5

Does When a math-proving system searches for the supporting facts ("premises") a proof will need, this component replays saved records of that search and reports, problem by problem, how much of the needed support the search actually turned up. Per problem it labels one of four outcomes: the search found everything needed and the proof went through; it found everything needed but the proof still failed; it found only some of the needed support; or it found none of it. Separating "the proof failed even though every needed premise was found" from "the proof failed because the search missed a needed premise" shows which part to fix. It also runs as a regression guard that refuses inputs which try to slip the answer into the search itself (the known-correct premises planted in the ranked results), leak proof text, tune on the test answers, or claim more than retrieval-quality numbers.

Scope limit These are after-the-fact retrieval-attribution labels and precision/recall counts over copied run records only. The component does not run Lean or Lake, call any provider, expose proof bodies, tune on test answers, claim benchmark performance, prove formal-result correctness, or include launch operations, and its labels are explicitly forbidden from flowing into provider context. The aggregate numbers describe only the copied fixture or bundle that was replayed; they are not benchmark claims.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.ring2_premise_retrieval_precision_recall_harness run --input fixtures/first_wave/ring2_premise_retrieval_precision_recall_harness/input --out receipts/first_wave/ring2_premise_retrieval_precision_recall_harness

Evidence · Computed projection · evidence 3/5 · Source-faithful refactor

formal-methods · theorem-proving · lean

Source Design note · Source atlas

Paper module Ring-2 Premise Precision Recall

ring2_premise_retrieval_precision_recall_harness is the public Microcosm component for evaluating copied Ring-2 premise retrieval rankings against after-the-fact labels.

The component computes precision and recall per problem, then classifies the result as retrieval_hit, partial_retrieval_miss, retrieval_miss, or proof_failure_despite_hit. That distinction matters because a proof that fails with every needed premise retrieved is a different failure from one in which retrieval never surfaced a needed premise.

Purpose

When a proof search fails, it is easy to blame the prover and miss the simpler cause: the right supporting facts were never put in front of it. This component exists to keep those two cases apart. It answers one question: did the retrieval step actually surface the premises a problem needed, or did the failure happen somewhere downstream after the premises were already in hand?

It answers that by recomputing precision and recall from copied records rather than trusting a reported figure. For each problem it intersects the retrieved premise ids with the labelled needed-premise ids, then reads the proof outcome alongside that overlap. Full recall with a passing proof is a retrieval_hit; full recall with a non-passing proof is proof_failure_despite_hit, the case where retrieval did its job and the fault lies elsewhere. Partial overlap and zero overlap are graded as partial_retrieval_miss and retrieval_miss.

The unusual part is the direction the labels are allowed to flow. The needed premise ids are after-the-fact measurement labels, and the component treats them as strictly one-way: they may be used to score a finished run, but they may not be fed back into the retrieval ranking, used to tune on a test split, or carried into a provider-context recipe. Planting an oracle label inside a ranking, or tuning on test answers, is a typed refusal, not a higher score. The point is a metric that cannot quietly become the very advantage it is meant to measure, and that never inflates a retrieval result into a claim about formal-result correctness.

Shape

[Diagram: the source record feeds the structured source record, which projects to this page, the diagram view, and the map view; the fixture input and exported bundle feed the runtime component, which emits precision/recall labels (retrieval vs proof-failure attribution), validation result records (first_wave + runtime_shell), and negative cases (leakage, tuning, overclaim, missing decoy); the validation result records end at the proof boundary (metrics and copied artifacts only).]

Source refs

source record
core/paper_module_capsules.json[42]
structured source record
paper_modules/ring2_premise_precision_recall.json
fixture input
fixtures/first_wave/.../input
runtime component
ring2_premise_retrieval_precision_recall_harness.py
exported bundle
examples/.../exported_ring2_precision_recall_bundle
Diagram source
flowchart TD
  Bundle["source record core/paper_module_capsules.json[42]"] --> JSON["structured source record paper_modules/ring2_premise_precision_recall.json"]
  JSON --> Markdown["this page reader projection"]
  JSON --> Mermaid["diagram view available_from_capsule_edges"]
  JSON --> Atlas["map view organ_atlas.ring2_premise_retrieval_precision_recall_harness"]
  Fixture["fixture input fixtures/first_wave/.../input"] --> Runtime["runtime component ring2_premise_retrieval_precision_recall_harness.py"]
  Exported["exported bundle examples/.../exported_ring2_precision_recall_bundle"] --> Runtime
  Runtime --> Metrics["precision/recall labels retrieval vs proof-failure attribution"]
  Runtime --> ResultRecords["validation result records first_wave + runtime_shell"]
  Runtime --> Negatives["negative cases leakage, tuning, overclaim, missing decoy"]
  ResultRecords --> Boundary["proof boundary metrics and copied artifacts only"]

Technical Mechanism

The runtime splits the proof consumer into three evidence classes before it reports any metric. _load_payloads reads the declared fixture or exported bundle inputs; _validate_run_material checks that copied Ring-2 run material carries source refs, target refs, validation refs, digests, and the expected copied_non_secret_macro_body_with_provenance status; and _validate_source_artifacts verifies the four copied source artifacts against either the source digest or the private-path rewrite digest. The result record therefore proves the presence and provenance of the copied public artifacts before the precision/recall scores can be interpreted.
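The digest step described above can be sketched as follows. This is a minimal illustration, not the component's code: the use of SHA-256, the helper names `sha256_hex` and `digest_status`, and the status strings are all assumptions introduced here. It shows only the idea that a copied artifact is accepted when its body matches either the declared source digest or the declared private-path rewrite digest.

```python
import hashlib
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Hex digest of a copied artifact body (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def digest_status(body: bytes, source_digest: str,
                  rewrite_digest: Optional[str] = None) -> str:
    """Report which declared digest the copied body matches, if any.

    The status strings are hypothetical labels for this sketch.
    """
    actual = sha256_hex(body)
    if actual == source_digest:
        return "matches_source_digest"
    if rewrite_digest is not None and actual == rewrite_digest:
        return "matches_private_path_rewrite_digest"
    return "digest_mismatch"


# Hypothetical artifact: the public copy has a private path rewritten.
original = b"report generated at /private/root/run.json\n"
rewritten = original.replace(b"/private/root", b"<redacted>")
declared_source = sha256_hex(original)
declared_rewrite = sha256_hex(rewritten)

status_exact = digest_status(original, declared_source, declared_rewrite)
status_rewritten = digest_status(rewritten, declared_source, declared_rewrite)
status_tampered = digest_status(rewritten + b"extra", declared_source, declared_rewrite)
```

Only the status string, never the body, would need to appear in a result record under this scheme.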

The scoring core is _evaluate. It indexes after-the-fact labels by problem_id, applies the policy default_top_k or per-ranking top_k, truncates retrieved premise ids to that cutoff, intersects retrieved ids with labelled needed-premise ids, and computes precision_at_k = hits/top_k and recall_at_k = hits/needed. Aggregate precision and recall use total hit, candidate, and needed-premise counts, then compare the computed aggregate metrics with the policy's expected values. This is why the paper module can distinguish a retrieval miss from a proof failure after full premise recall without asserting anything about the downstream proof.
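A small sketch of that scoring arithmetic, under stated assumptions: the field names (`retrieved_premise_ids`, `needed_premise_ids`, `top_k`), the function name `evaluate`, and the choice to count `top_k` as the per-problem candidate count are illustrative guesses, not the harness's actual schema. The formulas `precision_at_k = hits/top_k` and `recall_at_k = hits/needed` are taken from the description above.

```python
def evaluate(rankings, labels, policy):
    """Recompute per-problem and aggregate precision/recall from copied records."""
    labels_by_problem = {label["problem_id"]: label for label in labels}
    rows = []
    for ranking in rankings:
        # Per-ranking top_k overrides the policy default.
        top_k = ranking.get("top_k", policy["default_top_k"])
        needed = set(labels_by_problem[ranking["problem_id"]]["needed_premise_ids"])
        cutoff = ranking["retrieved_premise_ids"][:top_k]
        hits = len(needed.intersection(cutoff))
        rows.append({
            "problem_id": ranking["problem_id"],
            "hits": hits,
            "top_k": top_k,
            "needed": len(needed),
            "precision_at_k": hits / top_k if top_k else 0.0,
            "recall_at_k": hits / len(needed) if needed else 0.0,
        })
    # Aggregates are computed from summed counts, not from averaged ratios.
    total_hits = sum(row["hits"] for row in rows)
    total_candidates = sum(row["top_k"] for row in rows)
    total_needed = sum(row["needed"] for row in rows)
    aggregate = {
        "precision": total_hits / total_candidates if total_candidates else 0.0,
        "recall": total_hits / total_needed if total_needed else 0.0,
    }
    return rows, aggregate


# Hypothetical two-problem fixture.
policy = {"default_top_k": 3}
rankings = [
    {"problem_id": "p1", "retrieved_premise_ids": ["a", "b", "c", "d"]},
    {"problem_id": "p2", "retrieved_premise_ids": ["x", "y", "z"], "top_k": 2},
]
labels = [
    {"problem_id": "p1", "needed_premise_ids": ["a", "c"]},
    {"problem_id": "p2", "needed_premise_ids": ["q", "x"]},
]
rows, aggregate = evaluate(rankings, labels, policy)
```

With this fixture, p1 scores 2 hits out of 3 candidates and 2 needed premises, p2 scores 1 hit out of 2 candidates and 2 needed premises, so the aggregate is 3/5 precision and 3/4 recall. A policy check would then compare those figures with the expected values it declares.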

The failure taxonomy is mechanical rather than rhetorical. Full recall plus a passing proof is retrieval_hit; full recall plus a non-passing proof is proof_failure_despite_hit; partial overlap is partial_retrieval_miss; and zero overlap is retrieval_miss. The policy floor also requires expected failure modes and an adversarial decoy whose needed premise is absent or missed. Those gates make the metric harness test the shape of the evaluation set, not just the happy path.
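The four-way taxonomy and the policy floor can be written down in a few lines. The sketch below assumes hypothetical names (`classify`, `policy_floor_ok`, the `is_adversarial_decoy` field) and treats a problem with no labelled needed premises as a `retrieval_miss`, which is a choice made here, not something the source states.

```python
MISS_LABELS = {"partial_retrieval_miss", "retrieval_miss"}


def classify(hits: int, needed: int, proof_passed: bool) -> str:
    """Map overlap counts plus the proof outcome to one of the four labels."""
    if needed and hits == needed:
        # Full recall: the proof outcome decides between the two hit labels.
        return "retrieval_hit" if proof_passed else "proof_failure_despite_hit"
    if hits > 0:
        return "partial_retrieval_miss"
    return "retrieval_miss"


def policy_floor_ok(rows, expected_modes) -> bool:
    """Require every expected failure mode plus at least one missed decoy."""
    observed = {row["label"] for row in rows}
    has_decoy_miss = any(
        row.get("is_adversarial_decoy") and row["label"] in MISS_LABELS
        for row in rows
    )
    return set(expected_modes) <= observed and has_decoy_miss


# Hypothetical evaluation set covering all four labels, with one decoy.
rows = [
    {"label": classify(2, 2, True)},
    {"label": classify(2, 2, False)},
    {"label": classify(1, 2, False)},
    {"label": classify(0, 1, False), "is_adversarial_decoy": True},
]
floor_ok = policy_floor_ok(
    rows, {"proof_failure_despite_hit", "partial_retrieval_miss", "retrieval_miss"}
)
```

Dropping the decoy row would make `policy_floor_ok` return False, which is the sense in which the gate tests the shape of the evaluation set and not only the happy path.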

The negative cases enforce the scope limit. EXPECTED_NEGATIVE_CASES requires oracle labels planted in rankings, proof-body leakage, test-split tuning, metric-overclaim, and missing-decoy inputs to produce typed refusal codes. The result record-writing path then exposes import ids, target refs, digest status, aggregate counts, failure-mode counts, and secret-scan status while keeping proof bodies, model-output data, and non-public paths outside the public result record. That implements the bundle's P-1/P-2/P-6/P-8/P-9 and AX-1/AX-2/AX-5/AX-7 posture: metrics are recomputed from copied artifacts, blocked states stay blocked, and no metric label becomes a Lean, provider, benchmark, or launch-scope decision.

Reader Evidence Routing

  • Bundle authority: core/paper_module_capsules.json::paper_modules[42:paper_module.ring2_premise_precision_recall] names the component subject, mechanism subject, concept ref, principle refs, axiom refs, dependencies, runtime code locus, and projection statuses. Edit the source record, not this page, if those relationships change.
  • Generated structured source record: paper_modules/ring2_premise_precision_recall.json is the structured source record to inspect for source_authority: json_capsule, the 18 generated relationship edges, zero unresolved selective relations, Mermaid available_from_capsule_edges, and Atlas linked_from_capsule_edges.
  • Runtime locus: src/microcosm_core/organs/ring2_premise_retrieval_precision_recall_harness.py owns run, run_precision_recall_bundle, _build_result, _write_receipts, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING. It computes aggregate precision/recall, enforces copied source-artifact digests, writes result records, and carries the provider/proof/launch refusal flags.
  • Fixture and exported bundle: fixtures/first_wave/ring2_premise_retrieval_precision_recall_harness/input/ includes the public input records plus five negative cases; examples/ring2_premise_retrieval_precision_recall_harness/exported_ring2_precision_recall_bundle/ is the runtime-shell bundle. Both routes expose source artifacts under source_artifacts/ while result records carry import ids, target refs, and digest status rather than private proof bodies.
  • Result record and test surfaces: receipts/first_wave/ring2_premise_retrieval_precision_recall_harness/ring2_precision_recall_result.json, receipts/first_wave/ring2_premise_retrieval_precision_recall_harness/ring2_precision_recall_validation_receipt.json, result records/sign-off/first_wave/ring2_premise_retrieval_precision_recall_harness_fixture_acceptance.json, receipts/runtime_shell/demo_project/organs/ring2_premise_retrieval_precision_recall_harness/exported_ring2_precision_recall_bundle_validation_result.json, and tests/test_ring2_premise_retrieval_precision_recall_harness.py are the reader-verifiable validation result records for the local public boundary.

Runtime Surfaces

PYTHONPATH=src python3 -m microcosm_core.organs.ring2_premise_retrieval_precision_recall_harness run --input fixtures/first_wave/ring2_premise_retrieval_precision_recall_harness/input --out receipts/first_wave/ring2_premise_retrieval_precision_recall_harness
PYTHONPATH=src python3 -m microcosm_core.cli ring2-premise-retrieval-precision-recall-harness run-precision-recall-bundle --input examples/ring2_premise_retrieval_precision_recall_harness/exported_ring2_precision_recall_bundle --out receipts/runtime_shell/demo_project/organs/ring2_premise_retrieval_precision_recall_harness

Body-Floor Import

The fixture and exported bundle both carry exact copied source artifacts under source_artifacts/ for the Ring2 aggregate report, graph-variant run summary, graph comparison, and problem-source manifest. The validator treats those four digest-matched files as source_open_body_imports with body_in_receipt=false: workingness can count the real source result record bodies, while result records expose only import ids, target refs, and digest status.

Negative Cases

  • oracle_labels_in_ranking rejects oracle-needed premise ids inside rankings.
  • proof_body_leakage rejects proof, provider, or private body fields.
  • test_split_tuning_attempt rejects retrieval tuned on test labels.
  • metric_overclaim rejects proof, benchmark, provider, launch, or publishing-scope decision claims.
  • missing_adversarial_decoy rejects a metric harness without a decoy miss case.
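The five cases above amount to input checks that return a typed code instead of a score. The sketch below is illustrative only: the payload layout, every field name (`rankings`, `labels`, `tuning_split`, `claims`, `is_adversarial_decoy`), and the decision to reuse the case names as refusal codes are assumptions, since the source does not publish the actual schema or code strings.

```python
from typing import List

ORACLE_KEYS = {"needed_premise_ids", "oracle_needed_premise_ids"}
FORBIDDEN_BODY_FIELDS = {"proof_body", "provider_output", "private_body"}
FORBIDDEN_CLAIMS = {"proof", "benchmark", "provider", "launch", "publishing"}


def _walk_keys(value):
    """Yield every dictionary key found anywhere in a nested payload."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield key
            yield from _walk_keys(child)
    elif isinstance(value, list):
        for child in value:
            yield from _walk_keys(child)


def refusal_codes(payload: dict) -> List[str]:
    """Return the typed refusals a payload triggers (empty list means none)."""
    codes = []
    if any(ORACLE_KEYS & set(row) for row in payload.get("rankings", [])):
        codes.append("oracle_labels_in_ranking")
    if FORBIDDEN_BODY_FIELDS & set(_walk_keys(payload)):
        codes.append("proof_body_leakage")
    if payload.get("tuning_split") == "test":
        codes.append("test_split_tuning_attempt")
    if FORBIDDEN_CLAIMS & set(payload.get("claims", [])):
        codes.append("metric_overclaim")
    if not any(row.get("is_adversarial_decoy") for row in payload.get("labels", [])):
        codes.append("missing_adversarial_decoy")
    return codes


clean = {
    "rankings": [{"problem_id": "p1", "retrieved_premise_ids": ["a"]}],
    "labels": [{"problem_id": "p1", "needed_premise_ids": ["a"],
                "is_adversarial_decoy": True}],
    "tuning_split": "train",
    "claims": ["retrieval_quality"],
}
# Same payload, but with the answer planted inside the ranking row.
planted = dict(clean, rankings=[{"problem_id": "p1",
                                 "retrieved_premise_ids": ["a"],
                                 "needed_premise_ids": ["a"]}])

clean_codes = refusal_codes(clean)
planted_codes = refusal_codes(planted)
```

Returning a list keeps each refusal independent, so a payload that both leaks a proof body and overclaims is reported under both codes.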

Prior Art Grounding

This component is grounded in information-retrieval evaluation. NIST's TREC evaluation measures provide the older precision/recall frame for judging retrieval systems, and scikit-learn's precision/recall metric API shows the common machine-learning interface for reporting those labels.

The theorem-proving side is adjacent to premise-selection and hammer workflows, such as Isabelle Sledgehammer, where finding the right facts is a distinct step from replaying a proof. Microcosm keeps that distinction explicit: precision/recall can say whether needed support was ranked, but it cannot become Lean correctness, benchmark performance, or provider-output authority.

Why It Matters

Premise retrieval should be measurable without becoming theorem authority. This component gives Microcosm a compact public harness for asking whether a retrieval path missed the needed support, hit the support but failed later, or hid a dangerous truth-side shortcut inside the public runtime.

Validation Result Record Path

From microcosm-substrate/, reproduce this page's proof boundary with temporary result records:

The expected projection row is paper_module.ring2_premise_precision_recall with 18 generated relationship edges, zero unresolved selective relations, Mermaid status available_from_capsule_edges, and Atlas status linked_from_capsule_edges. These checks validate copied retrieval records, metric labels, and bundle result records only; they do not become Lean/Lake, benchmark, provider, or theorem authority.

Scope boundary

Scope limit

This component does not run Lean or Lake, use external model services, emit proof bodies, tune retrieval on test answers, claim benchmark performance, prove formal-result correctness, or include launch operations. Its labels are metric labels only; they are not allowed to flow into provider context recipes.

Scope limit

This module supports only the reader-verifiable claim that copied public premise-retrieval records can be scored for precision/recall labels, adversarial decoys, body-floor imports, and metric overclaim refusals. It does not establish Lean correctness, benchmark performance, provider output quality, theorem truth, launch-scope decision, publishing-scope decision, or whole-system correctness.

Limitations

The harness is a local evidence-accounting check over copied artifacts. It does not execute Lean, Lake, Sledgehammer, or any external prover; it does not inspect proof bodies; and it does not decide whether a theorem is true. A retrieval_hit label means the needed-premise ids appeared in the ranking under this fixture policy, not that the downstream proof search is sound or complete.

The reported precision and recall are bounded by the declared Ring-2 fixture and exported bundle. Different corpora, retrieval cutoffs, premise labels, decoy construction, or source-artifact digests require rerunning the component and cannot be inferred from this page. The negative cases prove specific forbidden flows are rejected here; they do not exhaust all possible leakage, tuning, non-public-state, provider-output, or benchmark-gaming failures.

Source and projection details
Governing Lattice Relation

Ring-2 precision/recall sits between premise retrieval and proof diagnosis. The bundle explains the runtime component and the mechanism.ring2_premise_retrieval_precision_recall_harness.validates_public_premise_retrieval_attribution mechanism, which is grounded in the same component source and in concept.formal_math_and_proof_witness_bundle. That relation is deliberately proof-adjacent rather than proof-authoritative: it can show whether copied retrieval rankings hit the labelled needed premises, but it cannot promote a hit into a Lean proof, a benchmark claim, or a provider-context label.

The governing principles make the scoring path stricter than a label echo. P-1 requires recomputing precision and recall from copied rankings and labels; P-2 keeps the scope limit at metric-checker strength; P-3 concentrates authority in the small harness and focused tests; P-6 keeps missing source artifacts, negative cases, or digests blocked; P-8 turns leakage, tuning, and overclaim cases into typed refusals; and P-9 preserves provenance as records cross from source run artifacts into public fixture and bundle result records. The axiom layer matches that mechanism: AX-1 and AX-2 require derived checker evidence, AX-5 and AX-7 force blocked or refused states instead of inflated metrics, AX-6 keeps the labelled premise domain explicit, and AX-8 prevents metric labels from flowing into forbidden sinks.

Formal Math Lean Proof Witness · Compiles a tiny Lean example with the real prover and records whether it built, leaking no proof text. · 4/5 · Runs real tools

Does This takes a small, purpose-built Lean math file (a handful of toy theorems written just for this demo) and a tiny project setup, copies them into a throwaway scratch folder, and actually tries to compile them with the installed Lean theorem-prover and its Lake build tool. It then writes down exactly what happened: whether the Lean and Lake tools were found, whether the build passed, fingerprints (hashes) of the source files, the names of the theorems it defined, and how many lines each file had. It also deliberately feeds in a broken proof and a couple of off-limits files to confirm they get rejected. The point is to show real proof-checking machinery run on a small example, while keeping the written records honest and redacted: no proof text or internal logs leak out, and it states plainly that this is a narrow toy check on one fixture, not a general-purpose proof system.

Scope limit It authorizes only a witness that a tiny declared public toy proof compiled under the locally installed Lean/Lake toolchain in a temporary workspace, plus confirmation that its leakage guardrails fired. It excludes Mathlib/Aesop/Batteries-dependent or general proof or theorem-program authority, external model access, private proof import, benchmark or performance claims, whole-system correctness, or any launch, hosted deployment, or public sharing.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.formal_math_lean_proof_witness run --input fixtures/first_wave/formal_math_lean_proof_witness/input --out receipts/first_wave/formal_math_lean_proof_witness

Evidence External tool run · evidence 4/5 · Real runtime result

formal-methods · theorem-proving · lean

Source Design note · Source atlas

Paper module Formal Math Lean Proof Witness

Purpose

This component exists to make one claim checkable instead of asserted: that Microcosm can actually run the Lean toolchain, not merely talk about it. The single question it answers is whether the installed Lean toolchain will compile a declared, tiny synthetic Lean project end to end, and whether that run can be recorded without leaking the proof.

The unusual part is the discipline around the run, not the run itself. The component copies a bounded public Lake project into a temporary workspace and invokes lake build, but the result record keeps only the return code, the standard-output and standard-error line counts, the source hashes, and the declaration names pulled out by a regular expression. The proof text and the raw command output never reach the result record. A reader gets evidence that the build happened and what it contained, without the page becoming a copy of the proof.
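The redaction discipline described above can be sketched as a function that accepts the raw build output and source text but emits only metadata. Everything named here is an assumption for illustration: the regular expression, the helper names `run_lake_build` and `metadata_only_record`, the record keys, SHA-256 as the hash, and the sample Lean text. The only details taken from the source are the `lake build MicrocosmProofWitness` invocation and the list of fields that are kept.

```python
import hashlib
import json
import re
import shutil
import subprocess

# Hypothetical pattern; the component's real expression is not published here.
DECLARATION_RE = re.compile(
    r"^\s*(?:theorem|lemma|def)\s+([A-Za-z_][\w.']*)", re.MULTILINE
)


def run_lake_build(workspace, target="MicrocosmProofWitness"):
    """Run `lake build` in a workspace copy; None when Lake is not installed."""
    if shutil.which("lake") is None:
        return None
    return subprocess.run(["lake", "build", target], cwd=workspace,
                          capture_output=True, text=True, check=False)


def metadata_only_record(returncode, stdout, stderr, sources):
    """Keep exit status, line counts, hashes, and names; drop all bodies."""
    return {
        "returncode": returncode,
        "stdout_line_count": len(stdout.splitlines()),
        "stderr_line_count": len(stderr.splitlines()),
        "sources": [
            {
                "path": path,
                "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
                "line_count": len(text.splitlines()),
                "declarations": DECLARATION_RE.findall(text),
            }
            for path, text in sorted(sources.items())
        ],
    }


# Hypothetical toy source standing in for the fixture's Lean file.
SAMPLE_SOURCE = (
    "theorem add_zero_demo (n : Nat) : n + 0 = n := rfl\n"
    "\n"
    "theorem two_eq : 1 + 1 = 2 := rfl\n"
)
record = metadata_only_record(
    0, "Build completed successfully.\n", "",
    {"MicrocosmProofWitness.lean": SAMPLE_SOURCE},
)
serialized = json.dumps(record)
```

Because the record is built from counts, hashes, and regex captures, the serialized JSON contains the declaration names but none of the proof terms or command output.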

Two failure modes drive the design. The first is a proof-assistant integration that reports success without ever running the checker; the witness guards against that by executing a real subprocess and recording its exit status, and by deliberately compiling an invalid Lean file in a negative case to confirm the toolchain rejects it. The second is a circular pass, where the manifest quietly carries the answer. The component refuses manifests that embed a proof_body, a ground-truth proof, provider output, or oracle premise ids, so a green result cannot be smuggled in through the inputs.

The scope is small on purpose. Imports of Mathlib, Aesop, and Batteries are rejected before anything runs, so this is a witness for a toy theorem under a local toolchain, not a claim about library-dependent proof work. That boundary is the point: it shows the result record discipline a larger formal-math component would need, without borrowing authority it has not earned.
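Both pre-run refusals, the manifest check and the import check, are simple scans. The following sketch assumes hypothetical key names beyond `proof_body` (the source names the concepts but not the exact fields) and a hypothetical import pattern; it is meant to show the shape of the gate, not the component's implementation.

```python
import re

# Only "proof_body" is named in the source; the other keys are assumed.
FORBIDDEN_MANIFEST_KEYS = {
    "proof_body", "ground_truth_proof", "provider_output", "oracle_premise_ids",
}
BLOCKED_IMPORT_RE = re.compile(
    r"^\s*import\s+(Mathlib|Aesop|Batteries)\b", re.MULTILINE
)


def manifest_violations(value, path=""):
    """List dotted paths of forbidden keys anywhere in a nested manifest."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            here = f"{path}.{key}" if path else key
            if key in FORBIDDEN_MANIFEST_KEYS:
                found.append(here)
            found.extend(manifest_violations(child, here))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(manifest_violations(child, f"{path}[{index}]"))
    return found


def blocked_imports(lean_source: str):
    """Return the blocked library roots imported by a Lean source file."""
    return sorted(set(BLOCKED_IMPORT_RE.findall(lean_source)))


manifest = {
    "witness_id": "toy",
    "modules": [{"path": "MicrocosmProofWitness.lean", "proof_body": "..."}],
}
violations = manifest_violations(manifest)
imports = blocked_imports("import Mathlib.Tactic\nimport Std\n")
```

Running both scans before any subprocess starts means a rejected input never reaches `lake build`, which matches the "rejected before anything runs" ordering described above.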

Teleology

formal_math_lean_proof_witness is the bounded public crossing from formal-math readiness into an actual local Lean/Lake run. It exists so a cold reader can see Microcosm compile a tiny synthetic proof witness with the installed toolchain while the result records stay redacted, public-relative, and honest about the narrow authority boundary.

Shape

[Diagram: the first-wave fixture (run(), include_negative=true) passes through witness-manifest validation, a temporary-workspace lake build of MicrocosmProofWitness, and negative cases run on real Lean; the exported public bundle (run_witness_bundle(), include_negative=false) passes through source_module_manifest.json validation and a standalone exported-witness contract or fresh bundle result record reuse with no live build; both paths end in metadata-only JSON result records under the scope limit of a toy public witness only.]

Source refs

First-wave fixture
fixtures/first_wave/.../input
Exported public bundle
examples/.../exported_lean_proof_witness_bundle
Diagram source
flowchart TD
  A["First-wave fixture fixtures/first_wave/.../input"] --> B["run() include_negative=true"]
  C["Exported public bundle examples/.../exported_lean_proof_witness_bundle"] --> D["run_witness_bundle() include_negative=false"]
  B --> E["Validate witness manifest: reject embedded proof bodies, oracle ids, non-public source refs"]
  D --> F["Validate source_module_manifest.json: copied public source digests, exact-copy vs replacement"]
  E --> G["Copy Lake project to temp workspace lake build MicrocosmProofWitness"]
  G --> H["Negative cases run real Lean: invalid proof rejected, Mathlib/Aesop/Batteries import blocked"]
  F --> I["Standalone exported-witness contract or fresh bundle result record reuse (no live build)"]
  G --> J["metadata-only JSON result records: return code, line counts, hashes, declaration names"]
  H --> J
  I --> J
  J --> K["Scope limit: toy public witness only"]

Reader Evidence Routing

Route bundle/currentness questions through the JSON Bundle Binding section, the source record, and the structured source record. The expected generated-row evidence is source_authority: json_capsule, edge_count: 8, Mermaid available_from_capsule_edges, Atlas blocked_until_organ_atlas_owner_lane_binds_edges, and zero unresolved selective relations. That evidence proves reader wiring and source authority placement, not formal-result correctness.

Route runtime questions through the runtime locus and the two public input surfaces. The first-wave fixture runs run() against the public Lake project and checks the four expected negative cases. The exported bundle runs run_witness_bundle() against copied public source modules, validates source_module_manifest.json, and records digest/source-module status without placing proof bodies in JSON result records.

Route result record and test questions through the required result record paths, the focused pytest, and the corpus check. The focused test asserts local Lake build success for the tiny witness when Lean/Lake are available, eight compiled declarations, four negative-case observations for the fixture, public-relative redacted result records, five exported source-module rows, source digest checks, metadata-only result record policy, and tamper-blocking behavior. Those validation result records do not authorize Mathlib-dependent proofs, external model access, private proof import, benchmark claims, launch-scope decision, deployment posture, public sharing, hosted deployment, source-file changes, or private-system equivalence.

Public Contract

The component copies examples/formal_math_lean_proof_witness/exported_lean_proof_witness_bundle or the first-wave fixture Lake project into a temporary workspace and runs lake build. The public result record records tool availability, Lake build status, source hashes, declaration names, line counts, negative-case coverage, and the scope limit. It does not export proof bodies in JSON result records.

The accepted witness scope is deliberately small:

  • public synthetic Lean source is allowed;
  • JSON manifests and result records may not embed proof bodies;
  • Mathlib, Aesop, and Batteries imports are rejected until a wider scope limit exists;
  • non-public source refs, model-output data, oracle proofs, and private source run bodies remain outside the public root.

Prior Art Grounding

This component is grounded in the Lean proof-assistant lineage and the broader small-kernel theorem-proving tradition. The Lean theorem prover system description anchors the local Lean/Lake witness route, and the Lean mathematical library shows why proof authority depends on explicit imports, declarations, and checked environments.

Microcosm borrows the proof-witness discipline: a local toolchain run, source hashes, declarations, negative cases, and metadata-only result records must be visible before Lean witness language is allowed. It does not claim Mathlib-dependent proof authority or benchmark performance.

Validation Result Record Path

./repo-pytest tests/test_formal_math_lean_proof_witness.py -q --basetemp=/tmp/microcosm_formal_math_lean_proof_witness_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
jq '{edge_count:(.relationships.edges|length), mermaid_status:.paper_module_payload.generated_projections.mermaid.status, atlas_status:.paper_module_payload.generated_projections.atlas_card.status, source_authority:.relationships.source_authority, unresolved_selective_relation_count:(.relationships.unpopulated_selective_relations|length)}' paper_modules/formal_math_lean_proof_witness.json

Expected generated-row proof: edge_count: 8, mermaid_status: available_from_capsule_edges, atlas_status: blocked_until_organ_atlas_owner_lane_binds_edges, source_authority: json_capsule, and unresolved_selective_relation_count: 0.

Scope boundary

Limitations

This module is a bounded public witness, not a formal-proof authority. Its positive evidence is one declared toy Lean/Lake fixture, one exported public witness bundle, five copied source-module body rows, local toolchain metadata, eight compiled declarations when Lean/Lake are available, and four expected negative-case observations. That evidence is enough to show the mechanism's result record discipline; it is not enough to prove arbitrary Lean goals, Mathlib coverage, formal-result correctness, benchmark performance, or private proof import equivalence.

The copied-body floor is public but narrow. Result records may cite source refs, hashes, material classes, declaration names, counts, manifest verdicts, tool-return summaries, and scope limit fields. They may not embed proof bodies, model-output data, oracle answers, non-public source refs, raw command output bodies, account secrets, account or browser state, or private source-root material. The source-open claim is therefore limited to the declared public fixture and exported bundle body classes.

The focused regression validates the stated fixture and exported-bundle shape. It checks streaming source scans, tool-version caching, temporary Lake project reuse, Lake build behavior, public-relative redacted result records, source-module digest parity, standalone exported-bundle handling, tamper rejection, negative case coverage, and the generated-row proof. It excludes future fixture families, Atlas/site public sharing, source-file changes, launch, or a larger formal-math proof claim without the owning builder and launch lanes.

Scope limit

This module authorizes only a tiny public fixture witness compiled by local Lean/Lake in a temporary workspace. It excludes Mathlib-dependent proofs, external model access, private proof import, benchmark performance claims, launch operations, hosted deployment, public sharing, recipient work, secret export, or whole-system correctness.

Scope limit

This module supports only the reader-verifiable claim that a tiny public Lean fixture witness can run in a temporary local workspace, emit metadata-only result records, and expose source hashes, declarations, and negative cases. It does not establish Mathlib-dependent theorems, benchmark performance, provider outputs, private proof imports, launch-scope decision, hosted deployment, publishing-scope decision, secret export safety, or whole-system correctness.

Source and projection details
Governing Lattice Relation

The bundle binds this module to concept.formal_math_and_proof_witness_bundle: public proof-adjacent language must pass through explicit witness artifacts before it becomes reader evidence. Here the witness artifacts are the temporary Lake project copy, local Lean/Lake tool probes, lake build MicrocosmProofWitness, source hashes, declaration metadata, source-module manifest checks, negative-case observations, and metadata-only result records. The Markdown page explains that lattice; it does not upgrade the generated JSON row, the local toolchain, or the copied source body floor into theorem authority.

P-3 is the governing principle edge for claim discipline. The mechanism rows do not ask a reader to trust a proof story from prose; they route the claim through run, run_witness_bundle, validate_source_module_imports, _build_result, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, and SOURCE_MODULE_MANIFEST_NAME. Those symbols are the mechanism's concrete boundary: they decide which public source refs may be copied, which imports are blocked, which negative cases count, and which result record fields may be exposed.

AX-2 supplies the hard law boundary. Public proof claims stay inside declared fixture evidence, public-relative refs, source digests, declaration counts, tool-return metadata, and negative-case verdicts. Proof bodies, model-output data, non-public source refs, stdout/stderr bodies, private source-root material, launch decisions, and whole-system correctness remain outside the module's authority even when the focused test and corpus check are green.

The dependency on paper_module.corpus_readiness_mathlib_absence_gate prevents the most tempting overread. This witness intentionally rejects Mathlib, Aesop, and Batteries imports until a different scope limit exists. A reader can therefore interpret the module as a toy Lean/Lake execution cell upstream of larger formal-math components, not as evidence that Microcosm can certify Mathlib-dependent theorem work.

Verifier Lab Kernel · Folds nine proof checks into one report labeling each line by which source actually backs it. · 5/5

Does This assembly point for the Lean/proof toolkit runs nine smaller formal-math pipeline checkers together and folds their results into one leak-proof report that labels every line by where it came from: a Lean verifier, an answer-key (oracle) comparator, an AI suggestion, a retrieval miss, or a row thrown out for breaking the rules. The report carries only references, hashes, counts, and verdicts, never the actual proof text, AI output, or answer-key bodies. One result record shows which claims a Lean verifier actually backed versus which are just hints or were rejected, instead of leaving a pile of separate outputs to be taken on faith.

Scope limit It validates the declared public contract shape of the proof packet and component result records only; it does not establish anything correct, count oracle/provider output as forward proof success, import private or Mathlib-dependent proof bodies, use external model services, change source files, or claim benchmark solve rates, launch, or maturity.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.verifier_lab_kernel run --input fixtures/first_wave/verifier_lab_kernel/input --out receipts/first_wave/verifier_lab_kernel

Paper module Verifier Lab Kernel

verifier_lab_kernel is the public composition root for the formal-math verifier lab. It is not a theorem prover, a benchmark runner, a private Lean import, or a frontend surface. It composes already-public Microcosm components into one leak-proof result record so a reader can see which claim came from a verifier, which claim came from an oracle comparator, which claim came from a provider hypothesis, and which rows were rejected by contract.

The component consumes:

  • a public ForwardProblem packet with target shape, statement summary, public input hash, and allowed premise ids;
  • an OracleSidecar packet that may compare against hidden or hindsight knowledge but never increments forward success;
  • verifier attempts and verifier result classes;
  • provider/NIM hypotheses as advisory residual diagnoses only;
  • CP2 typed action candidates, bounded evidence bodies or raw tactic scripts;
  • bounded Evolve candidates over policy artifacts only.

The runnable fixture also calls the existing public components:

  • tactic_portfolio_availability_probe;
  • target_shape_tactic_routing_gate;
  • formal_math_verifier_trace_repair_loop;
  • formal_math_lean_proof_witness.

Purpose

In a formal-math agent loop, several different things can look like progress. A Lean checker can accept a term. An oracle holding a hindsight answer can say a candidate matches. A provider model can offer a plausible next tactic. A retrieval step can return a premise. Treated loosely, all of these blur into a single sense of "it worked", and oracle or provider success quietly inflates the count of theorems actually proved. This component exists to stop that blur. The one question it answers is: for each row of evidence, which authority class does it belong to, and what may that class claim?

The composition root runs or consumes nine named components (corpus readiness, Lean Std premise indexing, premise retrieval, tactic availability, target-shape routing, Ring2 precision and recall, verifier trace repair, proof diagnostics, and the Lean proof witness) and sorts every result into seven separate buckets: verifier-checked, provider-suggested, oracle-compared, retrieval-miss, CP2-translated, Evolve-candidate, and contract-rejected. Each bucket keeps its own authority. A passing component cannot lend its standing to a different bucket.

The unusual part is how the boundary is enforced rather than merely described. The kernel keeps two counters, oracle_forward_success_increment_count and provider_results_counted, and they must read zero. An oracle that marks itself as forward success, or a provider hypothesis that claims proof authority, is recorded as a contract violation, not as a result. The same discipline applies to data: forward problems and CP2 actions are scanned for fields that would smuggle in a proof body, an ideal answer, or an oracle's needed premise ids, and CP2 and Evolve outputs are confined to a fixed vocabulary of action classes and policy artifacts. What the reader receives is a single aggregate result record that carries references, digests, counts, and verdicts, with the proof, provider, oracle, and stdout bodies left out.
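The counter discipline above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the kernel's code: the row fields (`id`, `kind`, `claims_forward_success`, `claims_proof_authority`) and the function name `separate_claims` are hypothetical; only the bucket names and the two counter names come from this page.

```python
from collections import defaultdict

# Bucket names as listed on this page.
BUCKETS = {
    "lean_verified", "provider_suggested", "oracle_compared",
    "retrieval_miss", "cp2_translated", "evolve_candidate",
    "contract_rejected",
}


def separate_claims(rows):
    """Sort rows into buckets; overclaiming rows become contract violations."""
    buckets = defaultdict(list)
    # These counters are never incremented: an overclaim is rejected,
    # not counted, so both stay at zero by construction.
    counters = {
        "oracle_forward_success_increment_count": 0,
        "provider_results_counted": 0,
    }
    for row in rows:
        kind = row.get("kind")
        overclaims = (
            (kind == "oracle_compared" and row.get("claims_forward_success"))
            or (kind == "provider_suggested" and row.get("claims_proof_authority"))
            or kind not in BUCKETS
        )
        bucket = "contract_rejected" if overclaims else kind
        buckets[bucket].append(row["id"])  # references only, never bodies
    # Only verifier-accepted rows contribute to forward success.
    forward_success = len(buckets["lean_verified"])
    return dict(buckets), counters, forward_success
```

In this sketch an oracle row that marks itself as forward success lands in `contract_rejected`, and the forward-success figure is derived from `lean_verified` alone.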

Shape

Read the verifier lab kernel as a public result record composition route, not as a proof oracle. The local path spine is the bundle and structured source record (core/paper_module_capsules.json::paper_modules[0:paper_module.verifier_lab_kernel], paper_modules/verifier_lab_kernel.json), the runtime composition root (src/microcosm_core/organs/verifier_lab_kernel.py), the public packet (fixtures/first_wave/verifier_lab_kernel/input/verifier_lab_packet.json), and the emitted public result records under receipts/first_wave/verifier_lab_kernel/.

[Diagram: the bundle and structured source record feed the public verifier packet and then the composition root. The composition root feeds the public component result records (tactic portfolio, target shape, trace repair, Lean witness), which feed the separated claim buckets (lean_verified | oracle_compared | provider_suggested | retrieval_miss | cp2_translated | evolve_candidate | contract_rejected) and then the public board/result/validation result records. The scope limit (no proof-body import; no oracle/provider forward success; no launch claim) bounds both the composition root and the buckets.]

Source refs

Bundle and structured source record
core/paper_module_capsules.json
paper_modules/verifier_lab_kernel.json
Public verifier packet
fixtures/first_wave/verifier_lab_kernel/input/verifier_lab_packet.json
Composition root
src/microcosm_core/organs/verifier_lab_kernel.py
Public board/result/validation result records
receipts/first_wave/verifier_lab_kernel/*.json
Diagram source
flowchart TD
  bundle["Bundle and structured source record core/paper_module_capsules.json paper_modules/verifier_lab_kernel.json"]
  packet["Public verifier packet fixtures/first_wave/verifier_lab_kernel/input/verifier_lab_packet.json"]
  kernel["Composition root src/microcosm_core/organs/verifier_lab_kernel.py"]
  components["Public component result records tactic portfolio / target shape / trace repair / Lean witness"]
  buckets["Separated claim buckets lean_verified | oracle_compared | provider_suggested | retrieval_miss | cp2_translated | evolve_candidate | contract_rejected"]
  result_records["Public board/result/validation result records receipts/first_wave/verifier_lab_kernel/*.json"]
  ceiling["Scope limit no proof-body import; no oracle/provider forward success; no launch claim"]
  bundle --> packet --> kernel
  kernel --> components --> buckets --> result_records
  kernel --> ceiling
  buckets --> ceiling

Prior Art Grounding

This component is grounded in small-kernel theorem-proving and proof-certificate composition patterns. The LCF approach and HOL Light anchor the idea that a verifier lab should distinguish trusted checked results from heuristics and automation. Lean-oriented work such as LeanDojo adds the modern agent context: retrieval, provider hypotheses, and proof-state interaction need explicit boundaries before they can influence proof claims.

Microcosm borrows the composition discipline: verifier success, oracle comparison, provider hypothesis, CP2 translation, and Evolve candidate rows are separate buckets with separate authority. It does not count oracle or provider success as forward proof success.

The sign-off result record must separate these buckets:

  • lean_verified;
  • provider_suggested;
  • oracle_compared;
  • contract_rejected;
  • retrieval_miss;
  • cp2_translated;
  • evolve_candidate.

The kernel rejects five contract failures:

  • forward problems that carry candidate, ideal, repair, oracle, source proof, proof body, or base-index fields;
  • oracle comparator success counted as forward success;
  • provider hypotheses claiming proof authority;
  • CP2 candidates carrying proof bodies, raw tactic scripts, provider bodies, or oracle templates;
  • Evolve candidates mutating anything outside the bounded policy-artifact set.

Reader Evidence Routing

Cold-reader audit starts with the generated structured source record for this module, not with a broad theorem-proving claim. The structured source record must confirm that verifier and mechanism subjects resolve and that a diagram view and atlas card are available for this module.

Evidence should be read in this order:

  • Module definition: core/paper_module_capsules.json::paper_module.verifier_lab_kernel and paper_modules/verifier_lab_kernel.json.
  • Runtime proof: src/microcosm_core/organs/verifier_lab_kernel.py, the fixture input packet, and the public component calls listed above.
  • Bucket-separation proof: result record rows for lean_verified, provider_suggested, oracle_compared, contract_rejected, retrieval_miss, cp2_translated, and evolve_candidate.
  • Negative boundary proof: rejection of private proof bodies, oracle-to-forward success, provider proof authority, CP2 proof bodies, arbitrary Evolve mutation, source-file changes, benchmark solve-rate claims, launch claims, hosted-deployment claims, and secret export.

Validation Result record Path

./repo-pytest tests/test_verifier_lab_kernel.py -q --basetemp=/tmp/microcosm_verifier_lab_kernel_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

This paper module describes public fixture and exported bundle result records only. It excludes private proof-body import, Mathlib-dependent proof authority, oracle-to-forward success, provider proof authority, CP2 proof bodies, arbitrary Evolve mutation, source-file changes, benchmark solve-rate claims, launch, public sharing, hosted deployment, or secret export.

Limitations

The verifier lab kernel is a composition and result record-boundary mechanism. It does not establish formal-result correctness beyond the public component result records it consumes or emits, and it does not create Mathlib import authority when the corpus-readiness gate reports only bounded fixture evidence. A Lean/Lake return code or compiled declaration count is evidence for the corresponding public fixture or exported bundle, not a license to generalize to arbitrary formal math benchmarks.

Oracle sidecar records remain hindsight or comparator evidence. They can diagnose a forward problem but cannot increment forward_success; the runtime authority counters must keep oracle_forward_success_increment_count at zero. Provider or NIM hypotheses remain residual diagnoses until a verifier result record or other system effect exists, so provider_results_counted must also remain zero.

CP2 rows are limited to typed action candidates from the bounded action-class vocabulary, with disconfirmation tests before rerun promotion. They are not proof bodies, raw tactic scripts, provider output bodies, or oracle templates. Evolve rows are limited to the named policy-artifact set and must cite baseline or rerun result records; they do not authorize arbitrary source-file changes. Public result records must keep proof, provider, oracle, stdout/stderr, and private-source bodies out of exported evidence.

Coverage is finite: the present proof consumer exercises the first-wave fixture and exported-bundle contracts, the five named negative cases, and the component-stack result record shape. New claim classes, new fixture packets, or new launch/public sharing language need a fresh proof consumer and negative cases before this module can carry them.

Scope limit

This paper module can claim reader wiring for the verifier lab kernel composition root: verifier and mechanism subjects resolve, the runtime source locus is named, and a diagram view and atlas card are generated for this module. It cannot claim private proof-body import, Mathlib-dependent proof authority, oracle-to-forward success, provider proof authority, CP2 proof bodies, arbitrary Evolve mutation, source-file changes, benchmark solve-rate claims, publishing-scope decision, hosted deployment, launch-scope decision, secret export, or whole-system correctness.

Fixture result records, exported-bundle result records, focused tests, and public component composition can support only bucket separation across verifier, oracle, provider, CP2, and Evolve rows. The diagram view and atlas card are navigation aids; they do not convert oracle or provider success into forward proof success, and they do not authorize benchmark or launch claims.

Source and projection details
Governing Lattice Relation

The governing lattice should be read as a claim-separation contract. The concept edge to concept.formal_math_and_proof_witness_bundle says the reader is looking at a proof-witness bundle, not a single proof oracle. The mechanism edge to mechanism.verifier_lab_kernel.composes_public_formal_math_receipts narrows that concept to one public operation: compose formal-math component result records into a leak-proof aggregate while keeping verifier, oracle, provider, retrieval, CP2, Evolve, and contract-rejected buckets distinct.

The code-locus edge is the runtime authority boundary. run and run_kernel_bundle select fixture or exported-bundle mode, _build_result loads the public packet and negative cases, validates the proof-lab route, runs or consumes the component stack, scans for forbidden classes, builds claim_separation, and records authority counters. _write_receipts then emits the board, result, validation, and sign-off result records with body_in_receipt: false, the result record-transparency contract, and the same scope boundary.

The nine depends_on paper-module edges are not a loose bibliography. They are the proof-lab dependency spine: corpus readiness, Lean Std premise indexing, premise retrieval, tactic availability, target-shape routing, Ring2 precision and recall, verifier trace repair, proof diagnostic evidence, and the Lean proof witness each remain separately bounded before the kernel aggregates their result records. This prevents a successful component from lending authority to a different bucket. The principle refs P-1, P-2, P-3, P-6, P-8, and P-15, plus axiom refs AX-1, AX-2, AX-5, and AX-7, are therefore read as ceiling law: public result record evidence may be composed, but hidden bodies, provider/oracle success, source-file changes, launch-scope decision, and whole-system correctness cannot cross the lattice boundary.

Focused test evidence checks the same relation. The verifier-lab test asserts that all expected negative cases are observed, all component statuses pass, claim_separation contains exactly the seven public buckets, oracle/provider authority counters stay at zero, body_in_receipt is false, public result record paths do not leak local roots, and legacy redaction fields do not survive result record normalization. Those checks make the lattice relation concrete for this module: the public aggregate result record is evidence of separation and containment, not of unbounded proof authority.
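A reader who wants to repeat a few of these checks by hand could use something like the following. It is a sketch under assumptions: the keys `claim_separation` and `body_in_receipt` and the two counter names appear on this page, but the `authority_counters` container key and the helper name `check_result_record` are hypothetical, and the real test covers more than this.

```python
EXPECTED_BUCKETS = {
    "lean_verified", "provider_suggested", "oracle_compared",
    "contract_rejected", "retrieval_miss", "cp2_translated",
    "evolve_candidate",
}

ZERO_COUNTERS = (
    "oracle_forward_success_increment_count",
    "provider_results_counted",
)


def check_result_record(record):
    """Return a list of problems; an empty list means these checks pass."""
    problems = []
    # Exactly the seven public buckets, no more and no fewer.
    if set(record.get("claim_separation", {})) != EXPECTED_BUCKETS:
        problems.append("claim_separation must contain exactly the seven buckets")
    # "authority_counters" is an assumed key name for this sketch.
    counters = record.get("authority_counters", {})
    for name in ZERO_COUNTERS:
        if counters.get(name) != 0:
            problems.append(f"{name} must be zero")
    # Bodies must be declared absent, not merely missing.
    if record.get("body_in_receipt") is not False:
        problems.append("body_in_receipt must be false")
    return problems
```

Returning a list of problems rather than raising on the first one keeps every failed condition visible in a single pass.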

Evidence binding:

  • JSON bundle authority: core/paper_module_capsules.json#paper_module.verifier_lab_kernel.
  • Mechanism source: core/mechanism_sources.json#mechanism.verifier_lab_kernel.composes_public_formal_math_receipts.
  • Component atlas edge: core/organ_atlas.json#verifier_lab_kernel.
  • Runtime source: src/microcosm_core/organs/verifier_lab_kernel.py.
  • First command: PYTHONPATH=src python3 -m microcosm_core.organs.verifier_lab_kernel run --input fixtures/first_wave/verifier_lab_kernel/input --out receipts/first_wave/verifier_lab_kernel.
Verifier Lab Execution Spine
Runs Lean on small bounded proof attempts in a temp copy and records what passed or failed.
4/5 · Runs real tools

Does It copies a small public Lean math project into a throwaway temporary workspace and actually runs the Lean/Lake checker on a handful of small, bounded proof-step attempts the tool builds itself. It then writes down what the checker said: which attempts were accepted, which failed, and the failure category for each, plus safety counts (for example, how many attempts tried to sneak in forbidden content and were rejected). The pass/fail facts and the safety counts are readable directly, while the tool never shows the underlying proof text, never calls any outside service, and never modifies the original project or any existing source files.

Scope limit It is a tool-witness result record for bounded public Lean transition rows only: it does not establish general proof authority, count oracle/provider output as proof, export proof bodies or tactic scripts, use external model services, change source files, claim benchmark solve-rates, or include launch operations/public sharing.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.verifier_lab_execution_spine run --input fixtures/first_wave/verifier_lab_execution_spine/input --out .microcosm/verifier_lab_execution_spine

Evidence External tool run · evidence 4/5 · Real runtime result

formal-methods · theorem-proving · lean

Source Design note · Source atlas

Paper module Verifier Lab Execution Spine

verifier_lab_execution_spine is the public execution witness for the verifier lab lane. It is narrower than verifier_lab_kernel: it actually runs bounded Lean transition candidates in a throwaway Lake project, records the return code of each run, and keeps every line of generated proof text and tool output out of the result record. A reader can then separate real execution evidence from overstated proof claims.

The component consumes a public execution packet with:

  • transition candidates, each naming a problem id, a target shape, and one action class from a fixed vocabulary (rfl, decide, cases, induction, exact_premise, and similar);
  • a small Lake project whose MicrocosmProofWitness library the component builds once and reuses;
  • CP2 translation requests that ask for the next typed action after a residual, and Evolve mutations that adjust bounded policy artifacts;
  • negative fixtures that smuggle a proof body, an oracle sidecar record, a provider hypothesis, or an unbounded source-file change into a row.

The component writes one .lean file per transition, runs lake env lean on it, and treats a zero exit code as accepted. It records the return code, the action class, and the failure class, but never the proof text, the stdout body, or the stderr body. The exported-bundle lane re-validates the same shape from a copied source-module manifest without re-running Lean, so a third party can inspect the bundle without a Lean toolchain installed.
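The run-and-record step can be sketched as below. This is an illustration under assumptions, not the component's implementation: the function name `run_transition`, the file name `Candidate.lean`, and the returned field names are hypothetical, and the sketch writes the candidate into its own temporary directory. The `lake env lean <file>` invocation and the rule that exit code 0 means accepted come from the text above.

```python
import subprocess
import tempfile
from pathlib import Path


def run_transition(project_dir, lean_source, command=("lake", "env", "lean")):
    """Write one .lean file, run the checker on it, and keep only metadata."""
    with tempfile.TemporaryDirectory() as tmp:
        candidate = Path(tmp) / "Candidate.lean"  # hypothetical file name
        candidate.write_text(lean_source, encoding="utf-8")
        completed = subprocess.run(
            [*command, str(candidate)],
            cwd=project_dir,      # run inside the (throwaway) Lake project
            capture_output=True,  # capture so bodies never reach the terminal
            check=False,          # a non-zero exit is data, not an exception
        )
    # Only the return code and the verdict leave this function; the proof
    # text, stdout, and stderr are deliberately dropped.
    return {
        "return_code": completed.returncode,
        "accepted": completed.returncode == 0,
    }
```

The `command` parameter exists only so the sketch can be exercised without a Lean toolchain; with the default it requires `lake` on the PATH and a built project at `project_dir`.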

Purpose

Automated proof systems can blur how a result was obtained. A model can be handed the answer by an oracle, or prompted with the proof by a provider, and still report the result as if it had found the proof unaided. This component exists to keep that blurring out of the result record. It answers one question: did a bounded Lean candidate actually pass the verifier, with no help that the result record is hiding?

The discipline that makes this work is the separation of authority classes. Every row lands in exactly one bucket: lean_verified for candidates the verifier accepted, oracle_compared and provider_suggested for rows that existed only as references, cp2_translated for the typed next-action layer, retrieval_miss and proof_synthesis_fail for residuals, and contract_rejected for anything that broke the leak rules. The unusual choice is what does not happen: an oracle match never increments forward success, and provider text is never counted as a proof. The counters oracle_forward_success_increment_count and provider_results_counted are held at zero by construction.

The second idea is that real execution and clean result records are not in tension. A candidate carrying oracle_visible: true, or a forbidden field such as proof_body or raw_tactic_script, is rejected before Lean is ever invoked, so the run cannot be contaminated. The transition then runs for real, and the result record carries the return code and the failure class while the proof text and the stdout and stderr bodies stay out. The result record is public evidence precisely because the only things omitted are the things that would leak.
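The pre-execution gate can be pictured as a pure function that runs before any subprocess is started. The sketch below is hedged: the function name `gate_candidate` and the reason strings are hypothetical, the forbidden-field set lists only the two fields named above, and the action vocabulary lists only the classes this page names (the real vocabulary is described as including "similar" entries).

```python
# Only the fields and action classes named on this page; the real sets may be larger.
FORBIDDEN_FIELDS = {"proof_body", "raw_tactic_script"}
ACTION_CLASSES = {"rfl", "decide", "cases", "induction", "exact_premise"}


def gate_candidate(candidate):
    """Return (ok, reason). Intended to run before Lean is ever invoked."""
    leaked = sorted(FORBIDDEN_FIELDS.intersection(candidate))
    if leaked:
        return False, f"forbidden_field:{leaked[0]}"
    if candidate.get("oracle_visible") is True:
        return False, "oracle_visible"
    if candidate.get("action_class") not in ACTION_CLASSES:
        return False, "action_class_out_of_vocabulary"
    return True, "clean"
```

Because the gate only inspects the candidate's keys and flags, a rejected row never produces a Lean run that could later be mistaken for evidence.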

Shape

[Diagram: the execution packet (transition candidates, CP2 requests, Evolve mutations, oracle/provider refs) enters the leak contract gate, which checks for forbidden fields, oracle/provider visibility, and out-of-vocabulary action classes. A leak yields contract_rejected before Lean runs. A clean candidate proceeds to the Lake project build (lake build MicrocosmProofWitness, once, cached) and then to the candidate run (write .lean, lake env lean): exit 0 yields lean_verified, non-zero yields retrieval_miss / proof_synthesis_fail. The packet also feeds cp2_translated (typed next action, no proof body), evolve_candidate / evolve_accepted (bounded policy artifacts only), and oracle_compared / provider_suggested (references, never counted as success). All of these feed the authority counters (oracle_forward_success = 0, provider_results = 0, proof_body_export = 0) and then the metadata-only result records (result, board, validation, sign-off; return codes kept, bodies omitted), under the scope limit: bounded public transition result record only.]
Diagram source
flowchart TD
  Packet["Execution packet transition candidates, CP2 requests, Evolve mutations, oracle/provider refs"]
  Gate["Leak contract gate forbidden fields? oracle/provider visible? action class out of vocabulary?"]
  Rejected["contract_rejected rejected before Lean runs"]
  Build["Build Lake project lake build MicrocosmProofWitness (once, cached)"]
  Run["Run candidate write .lean, lake env lean, return code = accepted?"]
  Verified["lean_verified return code 0"]
  Residual["retrieval_miss / proof_synthesis_fail non-zero return code"]
  CP2["cp2_translated typed next action, no proof body"]
  Evolve["evolve_candidate / evolve_accepted bounded policy artifacts only"]
  Refs["oracle_compared / provider_suggested references, never counted as success"]
  Counters["Authority counters oracle_forward_success = 0, provider_results = 0, proof_body_export = 0"]
  ResultRecords["metadata-only result records result, board, validation, sign-off; return codes kept, bodies omitted"]
  Ceiling["Scope limit bounded public transition result record only"]
  Packet --> Gate
  Gate -->|leak found| Rejected
  Gate -->|clean| Build
  Build --> Run
  Run -->|exit 0| Verified
  Run -->|non-zero| Residual
  Packet --> CP2
  Packet --> Evolve
  Packet --> Refs
  Verified --> Counters
  Residual --> Counters
  CP2 --> Counters
  Evolve --> Counters
  Refs --> Counters
  Rejected --> ResultRecords
  Counters --> ResultRecords
  ResultRecords --> Ceiling

Evidence/accounting used for this shape:

  • core/paper_module_capsules.json::paper_modules[44:paper_module.verifier_lab_execution_spine] is the source bundle with source_authority: json_capsule, subjects for component: verifier_lab_execution_spine and mechanism.verifier_lab_execution_spine.validates_public_verifier_transition_witness, resolved code_loci.path: src/microcosm_core/organs/verifier_lab_execution_spine.py, and generated projection statuses available_from_capsule_edges / linked_from_capsule_edges.
  • paper_modules/verifier_lab_execution_spine.json::paper_module_payload.source_row carries the generated copy of that source record; relationships.edges has 19 entries and relationships.unpopulated_selective_relations is empty. This is readback evidence only, not an editable source.
  • core/organ_atlas.json::organs[18] classifies the component as evidence_class: external_subprocess_witness, names the first command, resolves the mechanism edge, and restates that the scope limit is bounded public Lean transition rows only.
  • src/microcosm_core/organs/verifier_lab_execution_spine.py defines the runtime spine: EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, RECEIPT_TRANSPARENCY_CONTRACT, ANTI_CLAIM, validate_source_module_imports, _build_lake_project, _build_result, write_receipts, run, and run_execution_bundle.
  • core/fixture_manifests/verifier_lab_execution_spine.fixture_manifest.json names the fixture inputs, four expected negative cases, stable error codes, generated result record paths, result record field floor, and body_copied_material_count: 5 for the exported body-floor lane.
  • examples/verifier_lab_execution_spine/exported_verifier_lab_execution_spine_bundle/source_module_manifest.json records module_count: 5, body_in_receipt: false, exact-copy digest matches, validation refs, and blocked private/external model service payload bodies.
  • result records/sign-off/first_wave/verifier_lab_execution_spine_fixture_acceptance.json records status: pass, accepted_scope: bounded_public_lean_transition_execution_only, accepted_transition_count: 4, residual_transition_count: 2, zero provider/oracle/proof-body/source-file changes counters, the four observed negative cases, and release_authorized: false.
  • tests/test_verifier_lab_execution_spine.py checks fixture execution, exported-bundle structure, source-module digest blocking, metadata-only result record transparency, and exact public body-floor manifest behavior.

Reader Evidence Routing

A cold-reader audit starts with the module definition and structured source record proof, then moves to the fixture and exported bundle.

Evidence should be read in this order:

  • Module definition: core/paper_module_capsules.json::paper_module.verifier_lab_execution_spine and paper_modules/verifier_lab_execution_spine.json.
  • Execution proof: declared command intent, fixture input ref, tool version facts, stdout/stderr classification, validator result record refs, and sign-off result record refs.
  • Bundle proof: exported execution-bundle run and the same command/tool/result record membrane in disposable outputs.
  • Negative boundary proof: missing command intent, missing tool facts, missing result record refs, stale execution facts, proof-authority overclaiming, proof-body export, model-output data export, benchmark solve-rate certification, hosted deployment, and launch-scope decision.

Prior Art Grounding

This component is grounded in reproducible execution and proof-assistant witness patterns. Lean/Lake execution inherits from the small-kernel proof-assistant tradition represented by the Lean theorem prover and by LCF/HOL systems such as HOL Light. Artifact evaluation practice also motivates recording command identity, tool facts, stdout/stderr classification, and result record refs separately from the claim they support.

Microcosm borrows the execution-spine discipline: a command can witness that a bounded tool run happened, but tool output must not become theorem-certification or benchmark authority. It does not expose proof bodies or certify solve rates.

Validation Result record Path

Run from microcosm-substrate:

A green result record proves only bounded execution-spine evidence: command intent, tool facts, stdout/stderr classification, result record refs, and explicit missing-fact failures. It does not establish general proof certification, proof-body safety beyond the fixture membrane, benchmark solve rate, hosted deployment, or launch.

Scope boundary

Scope limit

This paper module can claim the following for the verifier lab execution spine: the component subject resolves, the runtime source locus is named, a diagram view is generated for this module, and an atlas card is generated for this module. It cannot claim general proof certification, Mathlib-dependent proof authority, proof-body safety beyond the fixture membrane, benchmark solve-rate certification, provider authority, source-file changes, hosted deployment, launch-scope decision, publishing-scope decision, or whole-system correctness.

Fixture result records, exported execution-bundle result records, focused tests, command intent, tool-version facts, stdout/stderr classification, result record refs, and missing-fact failures can support only bounded execution-spine evidence. The diagram and atlas views are navigation aids derived from the module definition; they do not promote a tool run into proof certification, benchmark authority, or launch-scope decision.

Scope limit

This paper module describes public execution-spine result records only. It does not establish general proof certification, authorize Mathlib-dependent proof authority, expose private proof bodies, certify benchmark solve rates, use external model services, change source files, include launch operations, or authorize hosted deployment.

Certificate Kernel Execution Lab
Runs the Lean verifier over a small public proof project and reports which rows it accepted.
4/5 · Runs real tools

Does It builds a small public Lean/Lake project, runs the Lean verifier over declared "transition" rows that reference a set of generated "certificate" declarations, and writes a structured result record showing which rows the verifier accepted and which it left unresolved or rejected, plus the exact build command, return code, and file hashes. The result record is inspectable evidence that a real Lean verifier ran on public material. Proof text, provider/oracle output, and private source are deliberately excluded from it, and each exclusion is recorded rather than silently dropped or passed off as evidence.

Scope limit It is a local tool-witness that the declared public fixture rows compiled and were adjudicated by the local Lean verifier; it does not establish general proof authority, count oracle/provider output as proof, expose proof text, change source files, claim a benchmark solve-rate, or include launch operations.

Run
microcosm certificate-kernel-execution-lab run --input fixtures/first_wave/certificate_kernel_execution_lab/input --out receipts/first_wave/certificate_kernel_execution_lab

Evidence External tool run · evidence 4/5 · Real runtime result

formal-methods · theorem-proving · lean

Source Design note · Source atlas

Paper module Certificate Kernel Execution Lab

Abstract

certificate_kernel_execution_lab is a source-available public runtime refactor of the source certificate-kernel pattern. It runs a small Lean/Lake certificate kernel, generated certificate rows, analyzer metadata, CP2 typed-action reruns, and bounded Evolve policy reruns without importing private proof bodies. The exported bundle also carries copied source body modules from the real Erdos #257 certificate-kernel system: Lean kernel files, generated certificates, the strike runner, toolchain files, and Lean profile result records. The v2 fixture carries both a simple NatSumCertificate row family and a miniature BoundedOrderCertificate family so the public lab is no longer only a single-shape arithmetic result record.

Purpose

This component exists to stop a proof-adjacent claim from resting on prose. The single question it answers is narrow: did a small Lean kernel actually compile and accept the declared certificate rows, here and now, with the command, the return code, and the source hashes on record? Everything else in the page is accounting that keeps the answer honest.

The reduction it relies on is the interesting part. A large class of proof-adjacent facts can be expressed as a finite certificate plus a decidable Boolean checker shaped like validate : Cert -> Bool. The agent is never asked to write a human proof. It is asked to supply the right certificate rows, and Lean decides. The fixture carries two checker families, NatSumCertificate over arithmetic and BoundedOrderCertificate over a bounded modular order, so the sign-off is not a single hard-coded shape. A row counts as accepted only when the runner shells out to lake env lean over a temporary copy of the public project and receives exit code 0.
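The shape validate : Cert -> Bool can be illustrated with a toy arithmetic certificate in Lean. This is a sketch only: the structure `NatSumCert`, its fields, and the checker body are hypothetical stand-ins, since the fixture's actual NatSumCertificate definition is not shown on this page.

```lean
-- Hypothetical certificate: claims that a + b = total.
structure NatSumCert where
  a : Nat
  b : Nat
  total : Nat

-- Decidable Boolean checker in the shape `validate : Cert -> Bool`.
def NatSumCert.validate (c : NatSumCert) : Bool :=
  c.a + c.b == c.total

-- A correct row: Lean evaluates the checker and accepts.
example : NatSumCert.validate ⟨2, 3, 5⟩ = true := by decide

-- A deliberately wrong row: the checker evaluates to false.
example : NatSumCert.validate ⟨2, 3, 6⟩ = false := by decide
```

The point of the shape is that no human-written proof appears anywhere: the agent supplies rows, and acceptance is whatever the checker evaluates to when Lean runs it.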

What is unusual is the weight placed on rejection. Deliberately wrong rows (a missing certificate, a bad arithmetic certificate, a bad bounded-order certificate) must fail through the same real Lean route, in the residual class the fixture predicted. A bundle that can show only green sign-off is treated as a replay artifact, not as certificate-kernel evidence. The runner also keeps the proof channel separate from the language model channel: a transition that can see oracle sidecar content or provider hypothesis text is rejected before execution, so a model's confidence can never be quietly counted as a proof. The result record carries command identity, counts, and verdicts, and never the proof bodies themselves.
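The rejection rule can be sketched as a comparison between predicted and observed outcomes. Everything named here is an assumption made for illustration: the row fields `expected` and `observed`, the verdict strings, and the function name `judge_bundle` do not come from the component.

```python
def judge_bundle(rows):
    """Classify a bundle from rows carrying predicted and observed outcomes.

    Each row is a dict with "expected" and "observed" set to either
    "accepted" or a residual class name (hypothetical field names).
    """
    negatives = [row for row in rows if row["expected"] != "accepted"]
    if not negatives:
        # Green-only bundles show that a replay worked, nothing more.
        return "replay_artifact"
    mismatched = [row for row in rows if row["observed"] != row["expected"]]
    if mismatched:
        # A wrong row that passed, or failed in an unpredicted class.
        return "prediction_mismatch"
    return "certificate_kernel_evidence"
```

A negative row that fails in a different residual class than predicted is treated the same as one that wrongly passes, which mirrors the requirement that failures land where the fixture said they would.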

Shape

[Diagram: the JSON bundle authority feeds the Markdown reader projection and the mechanism source row. The mechanism source row feeds the certificate-kernel runtime, which feeds the first-wave Lean fixture and the exported certificate bundle. Both feed the certificate manifest and then the Lean/Lake subprocess, which produces Lean analyzer metadata and transition trace rows. Transition trace rows feed CP2 typed-action reruns and then bounded Evolve reruns. The source-module body floor feeds the analyzer metadata, which feeds the public readout. Evolve reruns and the public readout feed the metadata-only result records, bounded by the scope limit.]
Diagram source
flowchart TD
  authority["JSON bundle authority"]
  markdown["Markdown reader projection"]
  mechanism["mechanism source row"]
  component["certificate-kernel runtime"]
  fixture["first-wave Lean fixture"]
  bundle["exported certificate bundle"]
  manifest["certificate manifest"]
  lake["Lean/Lake subprocess"]
  analyzer["Lean analyzer metadata"]
  transitions["transition trace rows"]
  cp2["CP2 typed-action reruns"]
  evolve["bounded Evolve reruns"]
  source_modules["source-module body floor"]
  readout["public readout"]
  result_records["metadata-only result records"]
  ceiling["scope limit"]
  authority --> markdown
  authority --> mechanism
  mechanism --> component
  component --> fixture
  component --> bundle
  fixture --> manifest
  bundle --> manifest
  manifest --> lake
  lake --> analyzer
  lake --> transitions
  transitions --> cp2
  cp2 --> evolve
  source_modules --> analyzer
  analyzer --> readout
  evolve --> result_records
  readout --> result_records
  result_records --> ceiling

The module shape is a bounded public certificate-kernel execution witness, not general theorem authority. This page points at the mechanism and runtime component; the runtime validates Lean/Lake command identity, source hashes, generated certificate rows, analyzer metadata, transition traces, CP2 typed-action reruns, bounded Evolve reruns, source-module manifest digests, negative cases, public readout, metadata-only result records, and a scope limit.

Mechanism

The mechanism is a finite-certificate execution reducer. The public entrypoints run and run_certificate_bundle both call _build_result, which loads the certificate lab packet, certificate manifest, Lean project, optional negative fixtures, and optional exported-bundle source manifest before any claim is recorded. The fixture path may run Lean/Lake in a temporary public workspace; the exported-bundle path validates the standalone runtime contract and copied body floor without rerunning private source machinery.

The reducer first establishes source and result record boundaries. _input_paths enumerates the public Lean files and JSON inputs, then scan_paths checks them against core/private_state_forbidden_classes.json. _source_module_manifest_result verifies the exported bundle's nine copied source bodies by material class, target presence, required anchors, and SHA-256 equality; _source_open_body_import_summary turns that manifest into the body floor that result records can cite without carrying proof bodies.
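
A minimal sketch of the per-row manifest check described above, assuming a manifest row is a dictionary with `target`, `sha256`, and `required_anchors` keys. Those key names and the function name are hypothetical; the real `_source_module_manifest_result` may be structured differently.

```python
import hashlib
from pathlib import Path


def verify_manifest_row(root: Path, row: dict) -> list[str]:
    """Return a list of problems for one manifest row; empty means the row passes."""
    target = root / row["target"]
    if not target.is_file():
        return ["target_missing"]
    data = target.read_bytes()
    problems = []
    # SHA-256 equality between the copied body and the manifest digest.
    if hashlib.sha256(data).hexdigest() != row["sha256"]:
        problems.append("digest_mismatch")
    # Required anchors must appear in the copied body.
    text = data.decode("utf-8", errors="replace")
    problems += [f"anchor_missing:{a}" for a in row["required_anchors"] if a not in text]
    return problems
```

The function returns only problem codes, so a caller can record the outcome without carrying any of the body text forward.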

Execution evidence is split into three layers. _build_lake_project runs lake build MicrocosmCertificateLab for the fixture path, while _analyze_lean_project records public Lean imports, declarations, line counts, and hashes with body_in_receipt: false. _execute_transitions then runs certificate transition rows through Lean: accepted rows must return zero, missing or bad certificate rows must fail in the expected residual class, and CP2/Evolve rows must rerun within allowed action and artifact classes instead of mutating arbitrary source.
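
The exit-code rule for transition rows can be sketched as below. This is an assumption-laden illustration: `run_transition` and its record keys are hypothetical, and the sketch uses the Python interpreter as a stand-in command so it runs without Lean installed.

```python
import subprocess
import sys


def run_transition(command: list[str], expect_accept: bool, timeout: float = 60.0) -> dict:
    """Run one verifier command and keep metadata only, never stdout/stderr bodies."""
    proc = subprocess.run(command, capture_output=True, timeout=timeout)
    accepted = proc.returncode == 0
    return {
        "command": command,
        "returncode": proc.returncode,
        "verdict": "accepted" if accepted else "rejected",
        "matches_expectation": accepted == expect_accept,
        "body_in_receipt": False,
    }


# Stand-in commands: one exits 0 (valid row), one exits 1 (deliberately bad row).
ok = run_transition([sys.executable, "-c", "raise SystemExit(0)"], expect_accept=True)
bad = run_transition([sys.executable, "-c", "raise SystemExit(1)"], expect_accept=False)
```

In a real run the command list would name the Lean route, for example `lake env lean` followed by a path to the generated row file; that path is not specified here.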

The negative cases are part of the proof consumer, not examples around it. EXPECTED_NEGATIVE_CASES requires rejection of provider/oracle-visible transition rows, CP2 proof-body leakage, Evolve source-file changes, and non-public source refs in the manifest. The focused regression test tests/test_certificate_kernel_execution_lab.py exercises those refusals, digest mismatch handling, cached command-card economy, public readout generation, and the counters that keep oracle/provider/proof-body/source-file changes at zero.
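
A hedged sketch of the pre-execution refusal for provider/oracle-visible rows. The `visible_channels` key, the channel labels, and `admit_for_execution` are hypothetical names chosen for illustration; the real `EXPECTED_NEGATIVE_CASES` entries are not reproduced here.

```python
# Hypothetical labels for channels a proof transition must never see.
FORBIDDEN_VISIBILITY = {"oracle", "provider"}


def admit_for_execution(row: dict) -> tuple[bool, str]:
    """Reject a transition row before execution if it can see a forbidden channel."""
    visible = set(row.get("visible_channels", []))
    leaked = sorted(visible & FORBIDDEN_VISIBILITY)
    if leaked:
        return False, "rejected_before_execution:" + ",".join(leaked)
    return True, "admitted"
```

Because the check runs before any verifier command is issued, a leaked row never produces an exit code that could be mistaken for a verdict.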

AUTHORITY_CEILING and RECEIPT_TRANSPARENCY_CONTRACT bind the mechanism back to the lattice relation. The module can claim bounded public fixture and bundle evidence over Lean/Lake command identity, certificate rows, analyzer metadata, transition outcomes, CP2/Evolve reruns, source manifest digests, and metadata-only result records. It cannot claim general theorem authority, provider proof authority, benchmark solve rate, private-body equivalence, source-file changes, launch, or whole-system correctness.

Public Surfaces

  • Component runner: python -m microcosm_core.organs.certificate_kernel_execution_lab run --input fixtures/first_wave/certificate_kernel_execution_lab/input --out receipts/first_wave/certificate_kernel_execution_lab
  • Exported bundle runner: python -m microcosm_core.organs.certificate_kernel_execution_lab run-certificate-bundle --input examples/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle --out receipts/runtime_shell/demo_project/organs/certificate_kernel_execution_lab
  • CLI: microcosm certificate-kernel-execution-lab run --input fixtures/first_wave/certificate_kernel_execution_lab/input --out receipts/first_wave/certificate_kernel_execution_lab
  • Standard: standards/std_microcosm_certificate_kernel_execution_lab.json
  • Fixture manifest: core/fixture_manifests/certificate_kernel_execution_lab.fixture_manifest.json
  • Source-module manifest: examples/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle/source_module_manifest.json

Prior Art Grounding

This component is grounded in proof-carrying and proof-assistant traditions. Necula's Proof-Carrying Code anchors the idea that an untrusted producer can supply a certificate checked by a small trusted verifier. The Lean theorem prover continues the small-kernel proof-assistant lineage, and LeanDojo shows why reproducible Lean environments, premise access, and programmatic proof-state interaction matter for theorem-proving agents.

Microcosm borrows the certificate-kernel discipline: certificate rows, Lean/Lake command identity, return codes, source hashes, transition traces, negative rows, and metadata-only result records must be visible before proof-adjacent language is allowed. It does not claim general theorem proof authority.

Research Bet

This component is the certificate-kernel bet in runnable form: a large class of proof-adjacent facts can be reduced to a finite certificate plus a decidable Boolean checker. The public lab keeps the agent task narrow. It does not ask the agent to synthesize a human proof; it asks for the right certificate rows, then lets Lean/Lake decide whether the checker accepts them.

The toy path uses a Lean certificate kernel shaped like validate : Cert -> Bool and accepts only when Lean can compile and run the declared check. The source-body import path carries the real Erdos #257 source floor: Lean kernel files, generated certificate shards, toolchain files, and profile result records from the Mathlib formalization family. The result record may say "accepted" only when the public runner shells out to Lean/Lake and receives exit code 0 for the declared bundle.

The negative floor is part of the proof, not decoration. Deliberately wrong certificate rows must be rejected by the real Lean route, including arithmetic and bounded-order failures. A bundle that cannot show genuine rejection cases is only a replay artifact, not certificate-kernel evidence.

Source-Backed Doctrine Binding

  • Component: src/microcosm_core/organs/certificate_kernel_execution_lab.py
  • Bundle: core/paper_module_capsules.json#paper_module.certificate_kernel_execution_lab
  • Mechanism: core/mechanism_sources.json#mechanism.certificate_kernel_execution_lab.validates_public_certificate_kernel_execution
  • Standard: standards/std_microcosm_certificate_kernel_execution_lab.json
  • Evidence class: core/organ_evidence_classes.json::certificate_kernel_execution_lab records external_subprocess_witness at rank 4.
  • Source-module manifest: examples/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle/source_module_manifest.json declares nine copied Lean/tool/profile body modules.
  • Runtime result record: receipts/runtime_shell/demo_project/organs/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle_validation_result.json
  • Sign-off result records: receipts/first_wave/certificate_kernel_execution_lab/* and receipts/acceptance/first_wave/certificate_kernel_execution_lab_fixture_acceptance.json

Reader Evidence Routing

  • Bundle route: core/paper_module_capsules.json::paper_modules[7:paper_module.certificate_kernel_execution_lab] is the JSON authority row. A diagram view is generated for this module; the Atlas card for this module is staged and will appear once the component-atlas lane completes its binding pass.
  • Mechanism route: core/mechanism_sources.json::mechanism.certificate_kernel_execution_lab.validates_public_certificate_kernel_execution binds the validator command, exported-bundle validator command, focused regression, guardrails, input refs, result record refs, and runtime code locus.
  • Runtime route: src/microcosm_core/organs/certificate_kernel_execution_lab.py owns run, run_certificate_bundle, _source_module_manifest_result, _source_open_body_import_summary, _build_result, _receipt_freshness, build_public_readout, EXPECTED_NEGATIVE_CASES, AUTHORITY_CEILING, SOURCE_MODULE_MANIFEST_NAME, BUNDLE_RESULT_NAME, and CARD_SCHEMA_VERSION.
  • Exported-bundle route: examples/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle is the public runtime bundle. Open source_module_manifest.json before using copied-body counts, then inspect the runtime validation result record and public readout.
  • Focused-test route: tests/test_certificate_kernel_execution_lab.py verifies Lean/Lake execution, analyzer output, transition batching, CP2/Evolve counters, public structured bundle shape, digest mismatch rejection, exact copied source modules, cached command-card economy, transparent metadata-only result records, and the cold-reader public readout.

Cold-Agent Use

Open the source-module manifest first, then the runtime result record, then the component source. The useful claim is not that Microcosm proved the Erdos #257 theorem, solved a benchmark, imported private proof bodies, or gained provider/oracle authority. The useful claim is that Microcosm can force a proof-adjacent story to expose Lean/Lake command identity, return codes, source hashes, declaration counts, certificate rows, transition traces, typed CP2 actions, bounded Evolve reruns, source-module body refs, negative-case result records, and authority counters before certificate-kernel language is allowed.

Re-entry condition: after the sibling organ_atlas.json lane releases, bind this paper-module bundle, mechanism ref, and code locus into the atlas row and rerun python -m microcosm_core.doctrine_lattice --check.

Validation Result record Path

Run the first-wave fixture into disposable result records from the Microcosm root:

Run the exported bundle through the same component:

cd microcosm-substrate
PYTHONPATH=src ../repo-python -m microcosm_core.organs.certificate_kernel_execution_lab run-certificate-bundle --input examples/certificate_kernel_execution_lab/exported_certificate_kernel_execution_lab_bundle --out /tmp/microcosm_certificate_kernel_execution_lab_bundle

cd microcosm-substrate
../repo-pytest tests/test_certificate_kernel_execution_lab.py -q
cd ..
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Authority Boundary

The lab proves only that the declared public Lean fixture compiled and that the declared transition rows were accepted, rejected, or left residual under the local verifier. The copied source body modules are public source-open body material, but result records cite them only by manifest row, hash, class, count, and required anchor. It does not expose proof text through result records, count oracle/provider output as proof authority, change source files, claim benchmark solve-rate, or include launch operations.

Result record Shape

Result records are public evidence. The lab exposes structured theorem/declaration names, Lean/Lake command identity, return codes, hashes, declaration counts, accepted/residual counts, negative-case ids, CP2 action classes, Evolve policy artifact ids, source-module manifest status, copied body-material counts, authority counters, scope limit, and scope boundary. It omits only proof, provider, oracle-answer, private-source, and stdout/stderr payload bodies, and records that omission through secret_exclusion_scan and body_in_receipt: false rather than treating absence as product evidence.

  • Lean/Lake build result record for MicrocosmCertificateLab.
  • Analyzer metadata for public Lean files: imports, declarations, hashes, and line counts with proof bodies omitted from JSON result records.
  • Transition rows for valid certificates, missing certificate rows, bad generated certificate rows, and bounded order-certificate rows.
  • CP2 typed-action translations over missing-certificate residuals, with Lean reruns proving downstream effect.
  • Bounded Evolve mutations over certificate row selection policy, accepted only after reruns and no leakage regression.
  • Source-open body import rows for the real source certificate-kernel body floor: exact copied targets under source_modules/ai_workflow, source/target hashes, material classes, and provenance anchors, with result record body text forbidden.
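
A metadata-only analyzer record might look like the following sketch. Only `body_in_receipt` is taken from the text above; the other field names and the `analyzer_record` function are hypothetical.

```python
import hashlib


def analyzer_record(path: str, source_text: str) -> dict:
    """Summarize a public source file as metadata; the body itself is never stored."""
    lines = source_text.splitlines()
    return {
        "path": path,
        "sha256": hashlib.sha256(source_text.encode("utf-8")).hexdigest(),
        "line_count": len(lines),
        # Module names from `import X` lines only; no declaration bodies are kept.
        "imports": [ln.split()[1] for ln in lines if ln.startswith("import ") and len(ln.split()) > 1],
        "body_in_receipt": False,
    }
```

The hash lets a reader confirm which file was analyzed without the record ever quoting it.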

Scope boundary

This is a source-available certificate-kernel laboratory with copied source body material, not a private source dump and not general proof authority beyond the declared fixture rows and source-module body refs.

Scope limit

This paper module can claim a certificate-kernel laboratory backed by a structured doctrine row, with a diagram view generated from that row. The Atlas card for this module is staged pending the component-atlas lane's binding pass; that is honest coordination state, not a content gap.

It cannot claim formal-result correctness, benchmark solve rate, private proof body export, provider or oracle authority, source-file changes, publishing-scope decision, launch-scope decision, or whole-system proof authority. The Atlas card must be completed by the owning component-atlas/bundle route and builder regeneration, not by hand-editing Markdown.

Limitations

This module is a bounded public execution witness, not a theorem-proving authority. Its evidence depends on the shipped public Lean/Lake fixture, generated certificate rows, analyzer metadata, CP2 typed-action reruns, bounded Evolve reruns, and copied source-module manifest. A green run proves that this certificate-kernel bundle follows those constraints; it does not establish the Erdos #257 theorem, Mathlib coverage, benchmark solve rate, or correctness of private source proof bodies.

The source-open body floor is intentionally narrow. The exported bundle carries nine copied Lean/tool/profile bodies under source_modules/, and the result records may cite only refs, hashes, material classes, counts, required anchors, and verdicts. Proof bodies, raw tactic scripts, model-output data, oracle answers, private source paths, stdout/stderr bodies, account secrets, and private source-root material remain outside the public result record surface.

The focused regression covers the declared fixture and exported bundle shape. It checks Lean/Lake execution boundaries, analyzer output, transition batching, CP2/Evolve counters, digest mismatch rejection, exact copied source modules, cached command-card economy, transparent metadata-only result records, and public readout shape. It excludes future certificate families, generated Atlas/site public sharing, source-file changes, or public launch without the owning builder and launch lanes.

Source and projection details
Governing Lattice Relation

The bundle places this module under concept.formal_math_and_proof_witness_bundle: proof-adjacent public claims must be reduced to explicit witness artifacts before a reader is allowed to treat them as evidence. In this module, the witness artifacts are the public Lean/Lake subprocess result, generated certificate rows, analyzer metadata, transition traces, CP2/Evolve rerun evidence, copied source-module manifest, and metadata-only result records. Markdown explains that lattice; it does not replace the JSON bundle or the validator result records.

P-3 is the governing principle edge for the module's claim discipline. The runtime does not ask whether a proof story is persuasive; it requires a finite certificate family, a named verifier route, visible command identity, explicit return codes, public-relative refs, and result record transparency. That is why the mechanism row binds run, run_certificate_bundle, _source_module_manifest_result, _source_open_body_import_summary, _build_result, _receipt_freshness, and build_public_readout as the code locus instead of treating the paper module as independent proof evidence.

AX-2 is the hard boundary: public proof language must remain inside the declared certificate-kernel execution evidence. The standard's scope limit keeps formal_proof_authority limited to bounded public fixture rows and keeps external model access, oracle success, source-file changes, private-system equivalence, launch-scope decision, runtime correctness, and whole-system correctness false.

The dependency on paper_module.verifier_lab_execution_spine tells a reader how to interpret the lab. The certificate kernel is one proof-adjacent execution cell inside the verifier-lab spine: it can show accepted/residual transition rows and rerun effects, but it cannot promote those rows into launch, public sharing, benchmark, or theorem-authority claims without the sibling verifier and launch lanes.

Proof / Control / Runtime Import Bundle · Checks fourteen proof, control, and runtime parts as one unit that rejects every overclaim. · 5/5

Does This bundle imports the Set-4 proof/control/runtime source modules and checks them as one inspectable unit. It surfaces the 14 mechanisms, the copied module manifest, the digest/anchor evidence, and the negative cases that reject proof, benchmark, launch, runtime, and non-public-state overclaims without exposing source bodies in result records.

Scope limit It validates only a public source-open bundle and bounded negative fixtures; it is not an Erdos #257 solution, not benchmark evidence, not public sharing or launch-scope decision, not live Codex/browser/runtime authority, and not private-system equivalence.

Run
microcosm batch4-proof-authority-runtime run --input fixtures/first_wave/batch4_proof_authority_runtime/input --out receipts/first_wave/batch4_proof_authority_runtime --acceptance-out receipts/acceptance/first_wave/batch4_proof_authority_runtime_fixture_acceptance.json

Evidence: Verified source import · evidence 5/5 · Copied source body

Tags: formal-methods · theorem-proving · lean

Source Design note · Source atlas

Paper module Set 4 Proof, Authority, and Runtime Bundle

batch4_proof_authority_runtime is the public source-open evidence membrane for fourteen source mechanisms that are easy to overclaim: proof search, machine-checked mathematics, reasoning-authority fences, completion planning, Codex runtime diagnostics, bitemporal coordination, taskpolicy wrapping, and context-yield attribution.

Purpose

These fourteen mechanisms sit close to claims a reader will want to make on their behalf. A proof-search benchmark looks like solving open problems. A copied CertificateKernel.lean for Erdos #257 looks like a solution. A reasoning-grant fence looks like a live sandbox. The single question this bundle answers is narrow and deliberately so: can each of these mechanisms be shown to a cold reader as copied, anchored, public source, without any of them quietly inheriting an authority it does not have?

The unusual part is how the bundle resists the easy inflation. It does not run the mechanisms; it imports their source bodies, checks each one against named required anchors, and then recomputes a stable negative case per mechanism from that source rather than trusting a fixture to declare its own verdict. For the Erdos #257 row it runs a static token scan over the copied Lean source and rejects sorry, admit, and axiom, so an absent proof obligation cannot be smuggled in. An optional local Lean/Lake compile probe is wired in too, but a pass means only that the copied kernel elaborated without error, and the code records that as a non-authoritative availability signal, never as formal-result correctness.

The result is a membrane, not a flagship. The interesting claim is the one it refuses: source import is made auditable, every result record stays metadata-only, and each tempting stronger statement is forced into a visible scope boundary with the authority delta held at none.

Abstract

batch4_proof_authority_runtime is a technical paper module for the Set 4 proof/authority/runtime bundle. Its positive claim is deliberately narrow: Microcosm imports exact copied source modules into a public bundle, checks source digests and required anchors, runs bounded fixture and bundle validators, records semantic negative cases, and emits metadata-only result records with explicit scope limits.

This module does not claim formal-result correctness. It is not a Lean/Lake execution component, not an Erdos #257 solution, not an official benchmark result, not live sandbox enforcement, not live Codex orchestration, not external model access, not source-file changes, not publishing-scope decision, not launch-scope decision, not private-system equivalence, and not whole-system correctness. Where the paper mentions Lean/Lake, it distinguishes Set 4's static copied-source checks from sibling witness components that actually run local Lean/Lake processes.

Telos

The Set 4 bundle exists to make proof-adjacent runtime claims inspectable without leaking private roots or inflating source import into proof authority. It gathers fourteen mechanism families that otherwise invite overclaiming: strategy-control proof search, prover-skill foundry work, VeriSoftBench harness diagnostics, Erdos #257 certificate-kernel source anchors, Lean packet replay, dry-run authority grants, completion planning, Codex runtime diagnostics, bitemporal coordination, macOS taskpolicy wrapping, and context-yield attribution.

The paper's job is not to make those systems authoritative by prose. Its job is to explain the public result record membrane: what was copied, what was checked, what negative cases were observed, what was omitted from result records, and which scope limit remains in force.

Mechanism Overview

The public fixture manifest names fourteen mechanism rows and one stable negative case per mechanism:

  • lean_strategy_control_benchmark
  • prover_skill_foundry
  • verisoftbench_harness_differential
  • verisoftbench_calibration_executor
  • erdos257_certificate_kernel
  • lean_full_fidelity_packet_verifier
  • reasoning_execution_authority_grant
  • forward_integration_policy_fence
  • closeout_executor_state_machine
  • codex_cdp_driver
  • codex_idle_heartbeat_fsm
  • metabolism_bitemporal_claim_log
  • macos_taskpolicy_actuator
  • context_yield_attribution

The exported bundle contains nineteen exact copied source modules. Validation checks their manifest rows, SHA-256 digests, line counts, required anchors, and per-mechanism public exercise clauses. Result records carry source refs, digests, anchors, counts, verdicts, negative-case ids, and scope limits; they do not inline copied body text or private runtime state.

Runtime Mechanism

The runtime has two public entry shapes:

  1. run consumes fixtures/first_wave/batch4_proof_authority_runtime/input, evaluates the Set 4 fixture manifest, writes the public result board, and emits sign-off JSON.
  2. validate-bundle consumes examples/batch4_proof_authority_runtime/exported_batch4_proof_authority_runtime_bundle, validates the copied-source manifest, and emits a bundle validation result record.

Both paths enforce the same ceiling. They validate public fixture evidence and copied-source integrity; they do not run providers, dispatch live Codex state, execute a live sandbox, change source files, submit benchmark results, approve public sharing, approve launch, or establish formal-result correctness.

For the Erdos #257 certificate-kernel row, Set 4 performs a static placeholder-token scan over copied Lean source and ties that scan to target-runner anchor evidence. That scan may reject sorry, admit, and axiom mutations in the copied source floor. It is not a Lean proof check and not a certificate that the open problem has been solved.
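
A static placeholder-token scan of this kind can be sketched in a few lines. The sketch is deliberately naive and is not the bundle's implementation: `scan_placeholders` is a hypothetical name, and the scan treats comments like code, so a token mentioned in a comment is flagged too.

```python
import re

PLACEHOLDER_TOKENS = ("sorry", "admit", "axiom")
# Whole-word match, so identifiers such as `axiomatic` are not flagged.
_TOKEN_RE = re.compile(r"\b(" + "|".join(PLACEHOLDER_TOKENS) + r")\b")


def scan_placeholders(lean_source: str) -> list[tuple[int, str]]:
    """Return (line_number, token) hits found anywhere in the source text."""
    hits = []
    for lineno, line in enumerate(lean_source.splitlines(), start=1):
        hits += [(lineno, m.group(1)) for m in _TOKEN_RE.finditer(line)]
    return hits
```

An empty result says only that none of the three tokens appear in the text. It says nothing about whether the file elaborates or whether any theorem in it is true.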

Diagram

Diagram source
flowchart TD
  fixture["Public fixture manifest 14 mechanism rows + 14 negative cases"]
  bundle["Exported public bundle 19 copied source modules"]
  runtime["Set 4 runtime run / validate-bundle"]
  anchors["Per-mechanism source check module present + required anchors in body"]
  scan["Erdos #257 static scan reject sorry / admit / axiom"]
  probe["Optional Lean/Lake probe copied kernel elaborates? availability only"]
  negatives["Negative cases recomputed verdict derived from source, not declared"]
  result_records["metadata-only result records refs, digests, anchors, counts, verdicts"]
  ceiling["Scope limit authority delta = none"]
  leanWitness["Sibling Lean/Lake components actually run local proofs"]
  fixture --> runtime
  bundle --> runtime
  runtime --> anchors
  runtime --> scan
  scan --> probe
  runtime --> negatives
  anchors --> result_records
  scan --> result_records
  probe --> result_records
  negatives --> result_records
  result_records --> ceiling
  leanWitness -. "separate execution evidence" .-> ceiling

The dashed edge is intentional. Lean/Lake subprocess evidence informs the technical boundary, but Set 4 itself does not inherit proof authority from sibling components.

Semantic Negatives And Threat Model

The negative cases are not decoration. They are the public failure floor that prevents a source-import bundle from becoming an unbounded proof or runtime claim. The fixture includes negatives for weak proof skeletons, low-repair foundry promotion, benchmark truth leakage, prefix-answer leakage, Erdos solution overclaim, packet hash corruption, forbidden authority grants, dirty forward integration targets, stale completion heads, absent CDP ports, stale idle snapshots, expired bitemporal claims, missing taskpolicy binaries, and accepted read guards.
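
The following sketch shows one way a negative case can be recomputed rather than declared, using packet hash corruption as the example. All names and keys are hypothetical; the key property is that the fixture's own `declared_verdict` field never feeds the outcome.

```python
import hashlib


def recompute_packet_verdict(packet: dict) -> str:
    """Derive the verdict from the packet contents alone."""
    actual = hashlib.sha256(packet["payload"].encode("utf-8")).hexdigest()
    return "accepted" if actual == packet["sha256"] else "rejected"


def evaluate_negative_case(case: dict) -> dict:
    """A negative case holds only if recomputation rejects the packet."""
    recomputed = recompute_packet_verdict(case["packet"])
    return {
        "case_id": case["case_id"],
        "recomputed_verdict": recomputed,
        "negative_case_holds": recomputed == "rejected",
        # Kept for comparison only; it never influences the fields above.
        "declared_verdict_ignored": case.get("declared_verdict"),
    }
```

A fixture that labels a corrupted packet as accepted therefore still fails on recomputation, which is the property the negative floor is meant to guarantee.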

The threat model is overclaiming. A green result record must not be interpreted as:

  • a formal proof of a theorem;
  • a solution to Erdos #257;
  • an official benchmark result or leaderboard submission;
  • a live provider, browser, sandbox, Codex, or metabolism run;
  • authorization to change source files, publish, launch, or export private state;
  • evidence that public copied modules are equivalent to a private root.

Result Interpretation

A passing fixture command evidences that the public manifest, mechanism rows, negative cases, result record body scan, and scope limit are internally consistent for the Set 4 fixture. A passing bundle command evidences that the exported copied-source manifest matches expected digests and anchors while keeping result records metadata-only. A passing focused pytest evidences regression coverage for fixture execution, bundle validation, source digest mismatch, mutated Lean body rejection, exact-copy imports, private-body omission, and semantic negative-case evaluation.

These are engineering result records. They are not formal proof certificates. They support public reader confidence in the bundle's source-open evidence membrane; they do not certify theorem truth, benchmark claims, launch-scope decision, or whole-system correctness.

Relationship To Formal-Proof Concepts

Set 4 relates to formal-proof practice through boundary discipline, not through theorem authority. The local concept edge is concept.formal_math_and_proof_witness_bundle: proof-adjacent claims must pass through explicit witness artifacts, source refs, digests, declaration or anchor metadata, negative cases, and metadata-only result records before they become reader evidence.

The sibling formal_math_lean_proof_witness component supplies the small public Lean/Lake witness pattern. The sibling certificate_kernel_execution_lab component supplies the bounded certificate-kernel execution pattern. Set 4 imports and validates copied source-body evidence around those themes, but it keeps the authority delta at none.

This distinction is the main technical result of the paper: a source-open public bundle can be useful without pretending to be a formal proof. It can make evidence auditable, show exactly where a proof-adjacent route stops, and force every tempting stronger claim into a visible scope boundary.

Data And Artifact Availability

The public artifact boundary is the standalone microcosm-substrate root. A cold reader should use the paper module, generated structured source record, standard, fixture manifest, exported bundle manifest, focused test, and metadata-only result records inside that root. Public links and public sharing surfaces must resolve to the public Microcosm system, not private source roots, model-output data stores, browser state, prompt-shelf bodies, or operator-voice material.

Prior Art Grounding

The runtime keeps the authority to act separate from the evidence that an action is permitted. This is the idea behind proof-carrying code (Necula, 1997) and capability-based security, where a request arrives with evidence of its own legitimacy rather than relying on ambient trust. Microcosm borrows the proof-before-authority ordering over fixtures; the result is fixture-bound evidence, not a verified authorization system or launch-scope decision.

Reproducibility Route

Run these commands from microcosm-substrate/ when validating this module without changing durable generated projections:

The projection checks for the broader paper-module corpus remain:

PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus
PYTHONPATH=src ../repo-python scripts/build_doctrine_projection.py --check

The direct runtime commands and focused pytest are the minimum useful validation.

Validation Result record Path

Reader-verifiable commands, run from the microcosm-substrate/ public root:

PYTHONPATH=src python3 -m pytest tests/test_batch4_proof_authority_runtime.py -q
PYTHONPATH=src python3 scripts/build_doctrine_projection.py --check-paper-module-corpus

These are reader-verifiable evidence only and do not include launch operations, external model access, source-file changes, or whole-system correctness.

Scope boundary

Source Authority And Projection Boundary

The source authority for this paper-module identity is the JSON source record:

  • core/paper_module_capsules.json::paper_modules[77:paper_module.batch4_proof_authority_runtime]
  • generated structured source record: paper_modules/batch4_proof_authority_runtime.json
  • local standard: standards/std_microcosm_batch4_proof_authority_runtime.json
  • runtime locus: src/microcosm_core/organs/batch4_proof_authority_runtime.py
  • focused validator: tests/test_batch4_proof_authority_runtime.py

This Markdown page may explain the source record, the generated relationship set, and the validation route, but it does not mint new subject edges, proof authority, Mermaid authority, Atlas authority, or launch status. Future relationship changes belong in the source record plus builder regeneration, not in hand-authored Markdown.

Lean/Lake Witness Boundary

Set 4 should be read as the import/result record bundle, not as the Lean/Lake executor. Actual local Lean/Lake subprocess evidence lives in sibling public components:

  • formal_math_lean_proof_witness runs a tiny public Lean/Lake fixture and exported witness bundle, records local tool availability, build status, declaration metadata, four negative-case observations, and metadata-only result records. Its scope limit is toy public witness evidence only; it rejects Mathlib, Aesop, and Batteries authority unless a wider authority plane is introduced.
  • certificate_kernel_execution_lab runs a bounded public certificate-kernel lab through Lean/Lake machinery, records command identity, transition rows, accepted/residual counts, copied-source manifest status, negative cases, and metadata-only result records. Its scope limit is bounded certificate-kernel evidence, not general theorem authority.

Therefore the correct reading is layered:

  • Set 4 validates source-open source-body import, static placeholder scanning, authority-boundary fields, and semantic negatives.
  • The Lean/Lake witness components validate that specific public fixtures can route through local Lean/Lake subprocesses under their own ceilings.
  • None of these pages, individually or together, claim arbitrary formal-result correctness, Mathlib-dependent proof authority, benchmark claims, Erdos #257 solution status, publishing-scope decision, launch-scope decision, or private-system equivalence.

Public/Private Boundary

Allowed public material:

  • mechanism ids, source-module ids, negative-case ids, and stable error codes;
  • exact copied source modules in the exported public bundle;
  • source refs, SHA-256 digests, line counts, required anchors, and bounded outcomes;
  • scope limits, scope boundaries, and metadata-only validation verdicts.

Forbidden public material:

  • keys, account secrets, account or browser state, model-output data bodies, browser UI live-access material, live Codex state exports, live metabolism DB exports, private runtime state, source notes, prompt-shelf bodies, theorem work-product bodies, raw command-output bodies, public sharing operation state, and official benchmark submission state.

The exported bundle may contain approved copied source modules. The result records are stricter: they identify copied modules by refs, digests, anchors, classes, counts, and verdicts, not by inlining source bodies.
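As an illustration of the metadata-only discipline described above, the following is a minimal sketch of how a copied module could be summarised by ref, digest, line count, and anchor check without inlining its body. The function names and every record key except `body_in_receipt` are hypothetical; the actual result-record schema is defined by the component's standard, not by this sketch.

```python
import hashlib


def module_record(ref: str, body: str, required_anchors: list) -> dict:
    """Summarise a copied source module without inlining its body.

    Key names other than body_in_receipt are illustrative assumptions.
    """
    data = body.encode("utf-8")
    return {
        "ref": ref,
        "sha256": hashlib.sha256(data).hexdigest(),
        "line_count": len(body.splitlines()),
        # True only if every required anchor string occurs in the body.
        "anchors_present": all(anchor in body for anchor in required_anchors),
        # The record never carries the body text itself.
        "body_in_receipt": False,
    }


def verify_digest(record: dict, body: str) -> bool:
    """Recompute the SHA-256 digest and compare it to the recorded one."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest() == record["sha256"]
```

A digest match shows only that the copied bytes are unchanged; as the limitations state, it does not establish semantic equivalence to any private root.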

Limitations

The current module has these hard limits:

  • Set 4 does not execute Lean/Lake; it performs static checks over copied source and validates public manifest evidence.
  • Static placeholder-token scanning is bounded evidence checking.
  • Digest and anchor equality do not prove semantic equivalence to a private root.
  • Negative-case coverage is finite and fixture-bound.
  • Metadata-only result records improve public safety, but they are not a substitute for formal proof review.
  • Generated Mermaid, Atlas, and JSON structured source record are projections; they do not create source authority.
  • Accepted-component status means accepted current public result record inventory for this verified source-body import, not launch, public sharing, benchmark, or theorem authority.

Scope limit

This module may claim fixture-bound public source-body import, exact copied source-module digest checks, required-anchor checks, static placeholder-token scan evidence, dry-run authority-boundary evidence, semantic negative-case evidence, and metadata-only result record discipline.

It may not claim theorem success, Lean formal-result correctness, Erdos #257 solution status, official benchmark claims, live sandbox enforcement, live Codex orchestration, external model access, source-file changes, publishing-scope decision, launch-scope decision, private-system equivalence, or whole-system correctness.

Proof Derived Governed Mutation Authorization
Checks a synthetic change-authorization record for its proof-and-approval chain, bound to a real commit.
5/5

Does Replays a make-believe example of "should this change be allowed to run?" and shows, step by step, why each proposed action was permitted. All three actions (a look-only inspection, a small config write, and an undo of that write) had to carry proof evidence and two visible policy approvals before anything was admitted; on top of that, the two actions that actually change something (the config write and the undo) also had to show a logged record of the change and a matching undo result record. Just holding a password or account secret is never treated as permission, and nothing here touches a real account or makes any real change.

Scope limit It validates only a declared, synthetic governed-mutation contract and excludes live cloud/account action, standing account secrets, source or irreversible mutation, policy-after-execution, hidden votes, external model access, benchmark-score claims, or launch.

Run
PYTHONPATH=src python3 -m microcosm_core.organs.proof_derived_governed_mutation_authorization run --input fixtures/first_wave/proof_derived_governed_mutation_authorization/input --out receipts/first_wave/proof_derived_governed_mutation_authorization --acceptance-out receipts/acceptance/first_wave/proof_derived_governed_mutation_authorization_fixture_acceptance.json

Evidence: Contract validator · evidence 5/5 · Import validation

formal-methods · theorem-proving · lean

Source Design note · Source atlas

Paper module Proof-Derived Governed Mutation Authorization

proof_derived_governed_mutation_authorization is the public mutation-authority replay component for showing that a mutation proposal cannot grant itself authority. It validates a synthetic governed-mutation bundle where read-only inspection, scoped config write, and rollback proposals are admitted only when proof cells, visible pre-execution policy verdicts, side-effect logs, rollback result records, cold replay, negative cases, non-public-state scan, and scope limits line up.

This module is source-backed public doctrine, not the source of authority. The source rows are the JSON bundle, mechanism registry row, component atlas binding, standard contract, fixture, exported bundle, component source module, and result records named below. Markdown remains an authored projection over those rows.

Purpose

The component answers one question: can a mutation proposal acquire the authority to change something just by asserting that it should? In an agent system the danger is an action that grants itself permission, for example by claiming a standing account secret, by recording a governance-vote nobody can see, or by reporting success after the fact. This fixture is the boundary that refuses each of those moves.

Authorisation here is derived, not asserted. A proposal is admitted only when an independent chain resolves: redacted proof cells that name validator result records, at least two visible policy verdicts evaluated before any execution identity is minted, a logged side-effect diff for write and rollback proposals, a paired rollback result record, and a cold-replay result record. The validator recomputes an evidence-chain hash from those resolved rows and rejects the proposal if the declared hash does not match. Impressive language, an admin-looking identity, or a final answer that says it worked all fail on their own.

The less obvious part is the anti-bake gate. Passing the synthetic chain is not enough: every authorised proposal must also bind to a real repository record, a concrete git commit that the validator resolves with a git subprocess and checks touched this component's own source or its focused test. The validator then re-derives the proof, policy, and rollback refs from the evidence indices and compares them to what the record declares. A fixture cannot pre-bake its answer, because the answer is reconstructed from real commit scope and the resolved rows rather than read from the file. The fixture admits exactly three synthetic proposals (read-only inspection, scoped config write, rollback) and rejects eight named overclaims; none of this grants any live mutation authority.
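The commit-scope half of the anti-bake gate can be sketched roughly as follows. This is an assumption-laden illustration, not the validator's code: the helper names are hypothetical, the seven-character lower bound on the commit ref is an assumption, and the sketch assumes git reports paths relative to the same root as the two paths listed. It uses `git diff-tree`, which lists the files a commit changed.

```python
import re
import subprocess

# Paths named in this module; the validator's real allow-list may differ.
COMPONENT_PATHS = {
    "src/microcosm_core/organs/proof_derived_governed_mutation_authorization.py",
    "tests/test_proof_derived_governed_mutation_authorization.py",
}


def is_commit_ref(ref: str) -> bool:
    """Accept lowercase hex of 7 to 40 characters (the lower bound is assumed)."""
    return re.fullmatch(r"[0-9a-f]{7,40}", ref) is not None


def changed_files(commit: str, repo_dir: str = ".") -> set:
    """Ask git which files a commit changed; requires a git checkout."""
    out = subprocess.run(
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", "--root", commit],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


def touches_component(files: set) -> bool:
    """True if any changed file is the component source or its focused test."""
    return bool(files & COMPONENT_PATHS)


def commit_is_bound(commit: str, repo_dir: str = ".") -> bool:
    """Reject malformed refs, unknown commits, and commits outside scope."""
    if not is_commit_ref(commit):
        return False
    try:
        return touches_component(changed_files(commit, repo_dir))
    except (subprocess.CalledProcessError, FileNotFoundError):
        return False
```

Keeping the set comparison separate from the subprocess call lets the scope rule be tested without a repository; only `changed_files` depends on git being installed.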

Shape

  • Subject: proof_derived_governed_mutation_authorization, with mechanism mechanism.proof_derived_governed_mutation_authorization.validates_synthetic_governed_mutation_authorization.
  • Runtime locus: src/microcosm_core/organs/proof_derived_governed_mutation_authorization.py, especially run, run_authorization_bundle, validate_mutation_proposals, validate_proof_evidence_cells, validate_policy_verdicts, validate_side_effect_ledger, validate_rollback_receipts, validate_cold_replay, _source_module_manifest_result, _source_open_body_import_summary, EXPECTED_NEGATIVE_CASES, and AUTHORITY_CEILING.
  • The positive fixture admits exactly three synthetic proposals: read-only inspection, scoped config write, and rollback.
  • Every admitted proposal must bind intent bundle refs, proof-cell validator result records, visible pre-execution policy verdicts, ephemeral execution identity refs, an evidence-chain hash, cold replay refs, and a scope limit.
  • Write and rollback proposals also need logged side-effect diff refs and a paired rollback result record before authorization.
  • The exported bundle imports six copied source bodies through source_module_manifest.json and validates them by exact-copy digest evidence without exporting source body text in result records.

[Diagram: proposal evidence chain, evidence-chain hash check, anti-bake gate, negative cases, and metadata-only result records. The Mermaid source appears under "Diagram source".]

Source refs

  • mutation_proposals.json · 3 synthetic proposals: read-only, scoped write, rollback
  • proof_evidence_cells.json · validator-backed proof refs
  • policy_verdicts.json · 2+ visible verdicts before execution identity
  • side_effect_ledger.json · logged diff for write / rollback
  • rollback_receipts.json · paired rollback result record
  • cold_replay.json · cold rerun per proposal
  • governed_mutation_records.json · real repo record + git commit ref
  • source_module_manifest.json · 6 copied source bodies verified by digest

Diagram source
flowchart TD
  Proposals["mutation_proposals.json 3 synthetic proposals: read-only, scoped write, rollback"]
  subgraph Evidence["Resolved evidence chain"]
    ProofCells["proof_evidence_cells.json validator-backed proof refs"]
    Policies["policy_verdicts.json 2+ visible verdicts before execution identity"]
    Effects["side_effect_ledger.json logged diff for write / rollback"]
    Rollbacks["rollback_receipts.json paired rollback result record"]
    Replay["cold_replay.json cold rerun per proposal"]
  end
  Hash{"Recompute evidence-chain hash declared == derived?"}
  Records["governed_mutation_records.json real repo record + git commit ref"]
  AntiBake{"Anti-bake gate git commit touched this source/test? re-derived refs match declared?"}
  SourceManifest["source_module_manifest.json 6 copied source bodies verified by digest"]
  Negatives["8 negative cases standing account secret, hidden vote, policy-after-execution, ..."]
  ResultRecords["metadata-only result records result, board, validation, sign-off"]
  Ceiling["scope limit no account secrets, live mutation, provider, source-file changes, hosting, public sharing, or launch"]
  Proposals --> Evidence
  Evidence --> Hash
  Hash -->|match| AntiBake
  Hash -->|mismatch| Negatives
  Records --> AntiBake
  AntiBake -->|real record bound| ResultRecords
  AntiBake -->|unbound or baked| Negatives
  SourceManifest --> ResultRecords
  Negatives --> ResultRecords
  ResultRecords --> Ceiling

How it works

Take the scoped config write proposal. To be admitted it must carry the fourteen required fields, including proof_cell_refs, policy_verdict_refs, policy_evaluated_before_execution, side_effect_class, evidence_chain_hash, and cold_replay_ref. The validator then checks each one against the other input files rather than trusting the proposal's own summary.

For the proof refs it confirms each cell names the same proposal, carries evidence refs and validator-result record refs, is body-redacted, and does not export a proof body. For the policy refs it counts how many verdicts are visible to the result record, are not hidden votes, read allow or warn, and resolve back to a proof cell for that proposal. Fewer than two visible resolving verdicts blocks the proposal under GOV_MUT_CONSENSUS_WITHOUT_EVIDENCE. Because a scoped write has a reversible side effect, it also needs a logged diff ref in the side-effect ledger and a passing rollback result record for the same proposal. A write or rollback proposal with no rollback ref is rejected as an irreversible mutation.
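The verdict-counting rule in the paragraph above can be sketched as follows. The error code `GOV_MUT_CONSENSUS_WITHOUT_EVIDENCE` and the `allow`/`warn` values come from this page; the function names and the row keys (`proposal_id`, `cell_id`, `visible`, `hidden_vote`, `verdict`, `proof_cell_ref`) are hypothetical stand-ins for whatever the fixture files actually use.

```python
from typing import Optional

ADMITTING_VERDICTS = {"allow", "warn"}


def count_resolving_verdicts(proposal_id: str, verdicts: list, proof_cells: list) -> int:
    """Count verdicts that are visible, not hidden votes, admitting,
    and that resolve to a proof cell for the same proposal."""
    cell_ids = {c["cell_id"] for c in proof_cells if c["proposal_id"] == proposal_id}
    count = 0
    for v in verdicts:
        if (
            v["proposal_id"] == proposal_id
            and v["visible"]
            and not v["hidden_vote"]
            and v["verdict"] in ADMITTING_VERDICTS
            and v["proof_cell_ref"] in cell_ids
        ):
            count += 1
    return count


def consensus_error(proposal_id: str, verdicts: list, proof_cells: list) -> Optional[str]:
    """Return the blocking code when fewer than two verdicts resolve."""
    if count_resolving_verdicts(proposal_id, verdicts, proof_cells) < 2:
        return "GOV_MUT_CONSENSUS_WITHOUT_EVIDENCE"
    return None
```

Under this reading a hidden vote or a verdict pointing at another proposal's proof cell simply does not count, so a proposal cannot reach the threshold with approvals nobody can inspect.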

The validator then recomputes the evidence-chain hash. It hashes the resolved proof digests, policy digests, side-effect ref, rollback ref, and cold-replay ref together and compares the result to the proposal's declared evidence_chain_hash. A mismatch fails the proposal, so the hash cannot be a hand-written constant. Only after the synthetic chain resolves does the real-record gate run. The governed-mutation record must declare a repo record class, a forty-character-or-shorter hex commit ref, and source refs covering git, mission-transaction, work-landing, and ledger material. The validator shells out to git to confirm the commit exists and that its changed files include this component's source module or its focused test, and it re-derives the proof, policy, and rollback refs from the indices so the record's claims must match independently computed values. An authorised proposal whose proposal id is not in the accepted real-record set is downgraded to blocked. The result is that a green result record requires three synthetic proposals, three real records bound to real commits, and a matching anti-bake status, none of which a static fixture can fake.
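The hash comparison described above can be illustrated with a short sketch. The text states which inputs are hashed together but not how they are serialised, so the canonical JSON encoding, the sorting of digests, the use of SHA-256, and the function names here are all assumptions made for illustration.

```python
import hashlib
import json


def evidence_chain_hash(proof_digests, policy_digests, side_effect_ref,
                        rollback_ref, cold_replay_ref) -> str:
    """Hash the resolved evidence rows in a canonical form (assumed encoding)."""
    payload = {
        "proof": sorted(proof_digests),
        "policy": sorted(policy_digests),
        "side_effect": side_effect_ref,   # may be None for a read-only proposal
        "rollback": rollback_ref,         # may be None for a read-only proposal
        "cold_replay": cold_replay_ref,
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def hash_matches(declared: str, **resolved) -> bool:
    """Compare the proposal's declared hash to the independently derived one."""
    return declared == evidence_chain_hash(**resolved)
```

Because the derived value depends on every resolved ref, changing any one of them (for example swapping the rollback ref) changes the hash, which is why a hand-written constant in the proposal cannot survive the comparison.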

Public Contract

  • The source pattern is proof_derived_governed_mutation_authorization_compound.
  • The fixture lives at fixtures/first_wave/proof_derived_governed_mutation_authorization/input/.
  • The runtime example lives at examples/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle/.
  • The validator is microcosm_core.organs.proof_derived_governed_mutation_authorization.
  • The governing standard is standards/std_microcosm_proof_derived_governed_mutation_authorization.json.
  • The component model row is core/organ_atlas.json#proof_derived_governed_mutation_authorization.
  • The sign-off row is core/organ_registry.json#proof_derived_governed_mutation_authorization.

The fixture has three positive proposals: read-only inspection, scoped config write, and rollback. Every admitted proposal must cite an intent bundle, scope limit, proof cell, visible policy verdicts, ephemeral execution identity, evidence-chain hash, and cold replay ref. Write and rollback proposals also require logged side-effect diff refs and a verified rollback result record paired before the mutation is admitted.
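The extra requirement on write and rollback proposals can be sketched as a small gap check. Only `side_effect_class` is a field name taken from this page; the class labels, the `side_effect_diff_ref` and `rollback_receipt_ref` keys, and the function name are hypothetical. The gap labels reuse two of the negative-case names listed for this component.

```python
# Hypothetical labels for the two proposal classes that change something.
MUTATING_CLASSES = {"scoped_config_write", "rollback"}


def side_effect_gaps(proposal: dict, logged_diff_refs: set, passing_rollback_refs: set) -> list:
    """List what a mutating proposal is missing before it can be admitted.

    Read-only proposals need neither a logged diff nor a rollback record.
    """
    if proposal["side_effect_class"] not in MUTATING_CLASSES:
        return []
    gaps = []
    if proposal.get("side_effect_diff_ref") not in logged_diff_refs:
        gaps.append("unlogged side effect")
    if proposal.get("rollback_receipt_ref") not in passing_rollback_refs:
        gaps.append("irreversible mutation")
    return gaps
```

An empty list means this particular requirement is met; it says nothing about the proof, policy, hash, or real-record checks, which remain separate gates.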

Source-Backed Mechanism

The mechanism row mechanism.proof_derived_governed_mutation_authorization.validates_synthetic_governed_mutation_authorization points at these runnable source loci:

  • run and run_authorization_bundle for fixture and exported-bundle entry.
  • validate_mutation_proposals, validate_proof_evidence_cells, validate_policy_verdicts, validate_side_effect_ledger, validate_rollback_receipts, and validate_cold_replay for the authorization predicate.
  • _source_module_manifest_result and _source_open_body_import_summary for digest-verified copied source-body evidence without body text in result records.
  • EXPECTED_NEGATIVE_CASES and AUTHORITY_CEILING for falsification and scope boundary enforcement.

The exported governed-mutation bundle imports six source bodies through examples/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle/source_module_manifest.json. Those bodies are copied into source_modules/ with digest provenance:

  • state/microcosm_portfolio/extracted_patterns_ledger.jsonl
  • state/microcosm_portfolio/reconstruction/high_novelty_substrate_gap_scout_v1.json
  • tools/meta/control/mission_transaction_preflight.py
  • tools/meta/control/scoped_commit.py
  • tools/meta/factory/work_ledger.py
  • system/lib/work_landing_status.py

Result records may report module ids, refs, counts, classes, hashes, and verdicts. They may not duplicate source body text, proof bodies, governance-vote bodies, model-output data, account secrets, account refs, or live access material.

Reader Evidence Routing

  • Open standards/std_microcosm_proof_derived_governed_mutation_authorization.json for required witnesses, negative-floor classes, denied authority, result record expectations, validator contract, and source refs.
  • Open core/fixture_manifests/proof_derived_governed_mutation_authorization.fixture_manifest.json for positive fixture inputs, eight negative fixtures, body-import summary, durable result record refs, and source-open omission rules.
  • Open examples/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle/source_module_manifest.json before inspecting copied source modules; result records carry refs, hashes, counts, and verdicts, not copied source body text.
  • Open tests/test_proof_derived_governed_mutation_authorization.py for the focused assertions on proposal counts, negative cases, source-module digest mismatch, public-relative redaction, and card result record reuse.
  • Run the fixture or exported-bundle route from microcosm-substrate/. The CLI supports --card, but it does not expose a --json flag.
  • Use scripts/build_doctrine_projection.py --check-paper-module-corpus to verify this Markdown projection still satisfies the shared paper-module coverage contract.

First Commands

From microcosm-substrate/, a cold agent can refresh the fixture result records with:

The exported bundle validator proves the copied source-body floor without writing durable result records:

PYTHONPATH=src python3 -m microcosm_core.organs.proof_derived_governed_mutation_authorization run-authorization-bundle --input examples/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle --out /tmp/microcosm-proof-derived-governed-mutation --card

Evidence Result records

  • receipts/first_wave/proof_derived_governed_mutation_authorization/proof_derived_governed_mutation_authorization_result.json
  • receipts/first_wave/proof_derived_governed_mutation_authorization/proof_derived_governed_mutation_authorization_board.json
  • receipts/first_wave/proof_derived_governed_mutation_authorization/proof_derived_governed_mutation_authorization_validation_receipt.json
  • receipts/first_wave/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle_validation_result.json
  • receipts/runtime_shell/demo_project/organs/proof_derived_governed_mutation_authorization/exported_governed_mutation_authorization_bundle_validation_result.json
  • receipts/acceptance/first_wave/proof_derived_governed_mutation_authorization_fixture_acceptance.json

The current result records show three proposals, three authorized synthetic mutations, three proof cells, six visible policy verdicts, two logged side effects, two rollback passes, three cold replay passes, no missing negative cases, private_state_scan.status=pass, and body_in_receipt=false for copied source modules.

Negative Cases

The fixture rejects the eight named negative cases in core/fixture_manifests/proof_derived_governed_mutation_authorization.fixture_manifest.json: standing account secret authority, policy-after-execution, hidden governance-vote, live cloud account secret, irreversible mutation, unlogged side effect, consensus without evidence, and final-answer-only success.

These negative fixtures are the security argument. A proposal with impressive language, an admin-looking identity, hidden votes, post-hoc approvals, or a final answer that says it succeeded still fails unless the public evidence tables resolve to the authorization predicate.

Prior Art Grounding

The governed-mutation shape is grounded in admission-control and policy-as-code practice: a proposed state change is evaluated before it mutates the system, and the decision is separate from the actor's own assertion. The closest public anchors are Open Policy Agent, which separates policy decision-making from enforcement over structured input, and Kubernetes admission controllers, which validate or mutate API requests before persistence.

The rollback and side-effect portions are also adjacent to controlled rollout practice, including feature-flag and canary-launch patterns described by Martin Fowler. Microcosm keeps the pattern synthetic and replay-only: the component validates visible policy verdicts, side-effect logs, rollback result records, and cold replay without granting live mutation authority.

Public Scope

This component is a synthetic, public, source-open replay. It validates fixture and exported-bundle result records plus copied source bodies with digest provenance. The replay stays inside local files and does not use standing account secrets, access live cloud or account systems, use external model services, change source files, expose private proofs, expose policy-vote bodies, or claim benchmark safety.

Validation Result record Path

./repo-pytest tests/test_proof_derived_governed_mutation_authorization.py -q --basetemp=/tmp/microcosm_proof_derived_governed_mutation_authorization_pytest
./repo-python scripts/build_doctrine_projection.py --check-paper-module-corpus

Scope boundary

Scope limit

This paper module can claim backed reader wiring for the synthetic governed-mutation replay: component and mechanism subjects resolve, the runtime source locus is named, and diagram and atlas views are generated for this module. It cannot claim live mutation authority, standing account secrets, cloud or account access, irreversible approval, permission to change source files, provider authority, proof-body export, benchmark safety, launch-scope decision, hosted deployment, publishing-scope decision, or whole-system correctness.

Fixture result records, exported-bundle result records, focused tests, and source-copy digests can support only the bounded replay claim: synthetic proposal admission, proof-cell refs, visible policy verdicts, side-effect logs, rollback result records, cold replay refs, negative cases, and body-hygiene behavior. The diagram and atlas views are navigation aids derived from the module bindings; they do not expand the proof boundary.

Source refs

Built from public source refs, with each input path recorded for provenance.

Each component has a stable public source path with commands, source links, and its supported scope.