This manual explains how to use Observer's Python-facing provider library.
It is written for projects that want Python-native test authoring while still exposing deterministic provider contracts to Observer.
The key framing is that Python is often the proxy language for verification, not the thing being verified.
That means a Python Observer test usually should express one real behavior of some other subject, rather than merely wrapping a large external process and treating script success as the only proof.
If you want the fastest path to the Python provider model, start in starter/ and run:
    make list
    make inventory
    cat tests.inv
    make run
    make verify
That shows the whole path in order:
- raw provider host discovery
- derived canonical inventory
- the exact execution contract Observer will run against
- a real suite execution with the human console
- hash and JSONL verification against checked-in expected artifacts
If your application already owns its CLI, use starter-embedded/ instead. That path keeps normal application behavior outside an explicit observe routing point.
This library is the Python-facing micro-SDK for Observer's provider contract.
It lets Python authors write tests with a human-first surface:
- describe(...)
- test(...)
- it(...)
- expect(...)
- ctx.observe().metric/vector/tag
Underneath that surface, Observer still enforces its own rules:
- deterministic collection
- explicit canonical identity
- duplicate validation
- bounded observation
- standard provider host transport for list, run, and observe
Prefer this shape:
- use Python to author granular checks against the real subject under test
- give each target a meaningful identity tied to that real subject
- let Observer see those checks as separate verification units
Examples of good subjects for Python-authored Observer tests:
- a compiler invocation
- a generated manifest
- a CLI command contract
- an HTTP endpoint behavior
- a package install surface
Avoid this shape:
- one Python script orchestrates everything
- Observer only sees that script as one target
- exit 0 becomes the whole proof
That anti-pattern hides the real verification surface from inventory, reports, compare artifacts, and product certification.
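As a rough illustration of the preferred shape, the sketch below checks one CLI command contract as three separate verification units instead of one pass/fail script. It uses only the standard library rather than the Observer API, and the subject command (a `python -c` one-liner), the check functions, and the identity strings are all hypothetical stand-ins.

```python
import subprocess
import sys

# Hypothetical subject under test: a tiny CLI, stood in for by `python -c`.
SUBJECT = [sys.executable, "-c", "print('ledger 1.0')"]


def run_subject():
    """Run the subject once and capture its observable behavior."""
    return subprocess.run(SUBJECT, capture_output=True, text=True, timeout=30)


# Each function checks exactly one behavior, so each could map to one target.
def check_exit_code():
    return run_subject().returncode == 0


def check_stdout_banner():
    return run_subject().stdout.strip() == "ledger 1.0"


def check_stdout_ends_with_newline():
    return run_subject().stdout.endswith("\n")


# Hypothetical identities tied to the real subject, not to the script.
CHECKS = {
    "ledger-cli :: exits zero": check_exit_code,
    "ledger-cli :: prints version banner": check_stdout_banner,
    "ledger-cli :: ends output with newline": check_stdout_ends_with_newline,
}

results = {name: check() for name, check in CHECKS.items()}
```

With this shape, a regression in the banner would show up as one failing unit with its own identity, while the exit-code check would still report independently.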
    from observer import collect_tests, describe, expect, test

    def build_tests():
        with describe("math"):
            @test("adds two numbers")
            def add(ctx):
                ctx.stdout("ok\n")
                expect(2 + 3).to_be(5)

    tests = collect_tests(build_tests)
If id is omitted, Observer derives the canonical identity from suite path plus test title.
Examples:
- suite path pkg, title smoke test -> pkg :: smoke test
- duplicate derived title in the same suite -> pkg :: smoke test #2
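The derivation rule can be pictured with a small stand-alone sketch. This is not the library's implementation: `derive_identities` is a hypothetical helper, it assumes tests are numbered in definition order, and it assumes a third duplicate would continue the pattern as `#3`, which the examples above do not confirm.

```python
from collections import Counter


def derive_identities(suite_path, titles):
    """Derive one identity per title, numbering repeats from #2 onward."""
    seen = Counter()
    identities = []
    for title in titles:
        seen[title] += 1
        base = f"{suite_path} :: {title}"
        # The first occurrence keeps the plain form; later ones get a suffix.
        if seen[title] == 1:
            identities.append(base)
        else:
            identities.append(f"{base} #{seen[title]}")
    return identities


names = derive_identities("pkg", ["smoke test", "smoke test", "other"])
```

Because the suffix depends on definition order, reordering or renaming tests changes derived identities, which is the motivation for explicit ids.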
If a test needs a refactor-stable identity, give it an explicit id=:
    @test("smoke test", id="pkg::smoke")
    def smoke(ctx):
        expect(True).to_be_truthy()
That explicit id becomes both the canonical name and the target in this first cut.
Each test receives one context object.
The common author operations are:
- ctx.stdout(...)
- ctx.stderr(...)
- ctx.fail(...)
- ctx.observe()
Telemetry remains observational and bounded:
    observer = ctx.observe()
    assert observer.metric("wall_time_ns", 104233.0)
    assert observer.vector("request_latency_ns", [1000.0, 1100.0, 980.0])
    assert observer.tag("resource_path", "fixtures/config.json")
The first-cut matcher surface includes:
- expect(value).to_be(expected)
- expect(value).to_equal(expected)
- expect(value).to_contain(expected)
- expect(value).to_match(pattern)
- expect(value).to_be_truthy()
- expect(value).to_be_falsy()
- expect(value).not_.to_be(...)
Failures become nonzero test exits with stderr evidence through the standard Observer host path.
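One way to picture that failure path is the stand-alone sketch below. It is not the library's code: `FakeContext`, `check_equal`, and `run_target` are hypothetical, and the exit code 1, the use of `AssertionError`, and the stderr message format are assumptions made for illustration.

```python
import io


class FakeContext:
    """Hypothetical stand-in for the per-test context; not the library's class."""

    def __init__(self):
        self.out = io.StringIO()
        self.err = io.StringIO()

    def stdout(self, text):
        self.out.write(text)

    def stderr(self, text):
        self.err.write(text)


def check_equal(actual, expected):
    """Hypothetical matcher: raise when the two values differ."""
    if actual != expected:
        raise AssertionError(f"expected {expected!r}, got {actual!r}")


def run_target(test_fn):
    """Run one test function and return (exit_code, stdout, stderr)."""
    ctx = FakeContext()
    exit_code = 0
    try:
        test_fn(ctx)
    except AssertionError as exc:
        # A failed expectation becomes stderr evidence plus a nonzero exit.
        ctx.stderr(f"expectation failed: {exc}\n")
        exit_code = 1
    return exit_code, ctx.out.getvalue(), ctx.err.getvalue()


def passing(ctx):
    ctx.stdout("ok\n")
    check_equal(2 + 3, 5)


def failing(ctx):
    check_equal(2 + 3, 6)


passed = run_target(passing)
failed = run_target(failing)
```

The point of the sketch is only that the author writes an expectation, while the host decides how a failure is reported.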
The Python library owns the standard provider host transport.
That means the host surface is:
- list
- run --target <target> --timeout-ms <u32>
- observe --target <target> --timeout-ms <u32>
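To make that command shape concrete, here is a sketch of an equivalent argument parser built with the standard `argparse` module. The library owns the real transport, so this is only an illustration: the program name `provider-host`, the `parse_u32` helper, and the choice to make both flags required are assumptions.

```python
import argparse


def parse_u32(text):
    """Parse a decimal string and require it to fit in an unsigned 32-bit range."""
    value = int(text)
    if not 0 <= value <= 0xFFFFFFFF:
        raise argparse.ArgumentTypeError("value must fit in a u32")
    return value


def build_parser():
    parser = argparse.ArgumentParser(prog="provider-host")  # hypothetical name
    commands = parser.add_subparsers(dest="command", required=True)
    commands.add_parser("list")
    # run and observe share the same two flags in the surface shown above.
    for name in ("run", "observe"):
        sub = commands.add_parser(name)
        sub.add_argument("--target", required=True)
        sub.add_argument("--timeout-ms", type=parse_u32, required=True)
    return parser


listing = build_parser().parse_args(["list"])
running = build_parser().parse_args(
    ["run", "--target", "pkg::smoke", "--timeout-ms", "5000"]
)
```

Here `running.timeout_ms` holds the parsed integer, since `argparse` maps `--timeout-ms` to the attribute name `timeout_ms`.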
Direct host example:
    from observer import collect_tests, describe, expect, observer_host_main, test

    def build_tests():
        with describe("pkg"):
            @test("smoke test", id="pkg::smoke")
            def smoke(ctx):
                ctx.stdout("ok\n")
                expect(True).to_be_truthy()

    if __name__ == "__main__":
        observer_host_main("python", collect_tests(build_tests))
If a Python application already owns main(), it can expose an embedded observe namespace:
    import sys

    # Route only the observe namespace to the provider host.
    if len(sys.argv) > 1 and sys.argv[1] == "observe":
        observer_host_dispatch_embedded("python", "observe", tests)
    else:
        app_main(sys.argv)
That matches the same own-main integration pattern already used in the Rust and TypeScript surfaces.
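The routing decision itself can be isolated and tested without the library. In the sketch below, `route` and both handlers are hypothetical stand-ins, and passing the remaining arguments to the observe handler is an assumption about how a dispatcher might be fed, not a description of `observer_host_dispatch_embedded`.

```python
def route(argv, observe_handler, app_handler):
    """Send `observe ...` to the provider host and everything else to the app."""
    if len(argv) > 1 and argv[1] == "observe":
        # Drop the program name and the namespace word before dispatching.
        return observe_handler(argv[2:])
    return app_handler(argv)


def fake_observe(args):
    """Stand-in for the embedded provider host."""
    return ("observe", args)


def fake_app(argv):
    """Stand-in for the application's own main()."""
    return ("app", argv)


observed = route(["app", "observe", "list"], fake_observe, fake_app)
served = route(["app", "serve"], fake_observe, fake_app)
bare = route(["app"], fake_observe, fake_app)
```

Keeping the check on the first argument only means the application's other subcommands never reach the provider host.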
The runnable starter in starter/ shows the real adoption path:
- write ordinary Python code under test
- author Observer tests in Python against the real behavior you want to verify
- expose those tests through a small provider host script
- derive canonical inventory through observer derive-inventory
- run a suite against that inventory
- verify canonical hashes and report output against checked-in snapshots
Read that as: Python is the test-authoring surface, but the subject is the underlying behavior of the ledger example, not "does a wrapper Python script run".
The failing companion in starter-failure/ keeps the same flow but preserves one intentional failure so the deterministic failure report path stays easy to inspect.
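The last step of that flow, checking produced artifacts against checked-in snapshots, amounts to a hash comparison. The sketch below shows the idea with `hashlib`; the file name `report.jsonl`, the placeholder record bytes, and the choice of SHA-256 are assumptions for illustration, not the starter's actual artifacts or algorithm.

```python
import hashlib
import tempfile
from pathlib import Path

# Placeholder bytes standing in for a produced report; not a real record shape.
PLACEHOLDER = b'{"example": "placeholder record"}\n'


def sha256_of(path):
    """Hash the exact bytes on disk so any drift changes the digest."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def verify_snapshot(actual_path, expected_hash):
    """Return True when the produced artifact matches the checked-in hash."""
    return sha256_of(actual_path) == expected_hash


with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.jsonl"  # hypothetical artifact name
    expected = hashlib.sha256(PLACEHOLDER).hexdigest()

    report.write_bytes(PLACEHOLDER)
    matches = verify_snapshot(report, expected)

    # Any change to the report, even one byte, should fail verification.
    report.write_bytes(PLACEHOLDER + b"\n")
    drifted = verify_snapshot(report, expected)
```

Comparing raw bytes only works when the output is deterministic, which is why the flow depends on deterministic collection and canonical identities.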
The runnable starter in starter-embedded/ shows the application-owned CLI path:
- keep normal app behavior in main()
- route observe ... through observer_host_dispatch_embedded(...)
- configure Observer with args = ["observe"]
- derive canonical inventory through observer derive-inventory
- run the same suite shape against the embedded provider boundary
- verify canonical hashes and report output against checked-in snapshots
That is the preferred model when a Python application already uses its own subcommands and should not give the top-level command namespace to Observer.
The SDK now carries a stdlib-only self-test suite under tests/:
    python3 -m unittest discover -s lib/python/tests -p 'test_*.py'
It also carries minimal packaging metadata in pyproject.toml, so the module can be installed directly from the repo copy when needed:
    python3 -m pip install ./lib/python