ProductEngine: Architecture Design
This document is the definitive reference for the ProductEngine architecture. It describes what ProductEngine is, how it works, and why it works that way. Everything recorded here is a decision, not a proposal.
---
1. What ProductEngine Is
ProductEngine is a Python framework for building hybrid desktop/web applications where:
- All functionality is delivered by plugins. The framework provides infrastructure; plugins provide everything the user sees and does.
- The framework is designed from the ground up for AI observability and control. An AI agent can inspect all application state as structured JSON and dispatch any action through the same system the user uses. No screenshots, no OCR, no screen scraping.
- The same codebase runs as a desktop application (PyWebView wraps localhost) or as a cloud service (browser hits the server directly). The difference is configuration, not code.
Future north star (not a current goal): the plugin architecture could eventually underpin a desktop environment, where applications are plugin collections running in a shared runtime. This informs architectural decisions but is not a deliverable.
---
2. Runtime Stack
| Layer | Technology | Role |
|---|---|---|
| ASGI server | Granian (Rust) | HTTP/2 native, ~3x Uvicorn performance, zero-copy file serving |
| Web framework | Custom ASGI micro-framework (asgi.py) | Routing, static file serving, SPA fallback, middleware, lifespan (zero external deps beyond msgspec) |
| Serialization | msgspec | Validation, JSON/MessagePack encoding (~15-30x faster than Pydantic for JSON) |
| Desktop shell | PyWebView | Native window wrapping localhost |
| Frontend framework | Svelte 5 | Reactive UI components |
| Frontend tooling | Vite | Bundling, HMR during development |
| File watching | watchfiles | Plugin hot-reload detection |
2.1 Why Granian
Granian was chosen over Uvicorn, Hypercorn, and Daphne for three reasons:
HTTP/2 is essential for dashboard apps. Browsers enforce a 6-connection-per-origin limit under HTTP/1.1. ProductEngine apps maintain multiple SSE streams alongside API requests and WebSocket connections. Under HTTP/1.1, SSE connections consume the connection budget and starve API requests -- the user sees a live-updating dashboard that freezes when they click a button because the API call is queued behind long-lived SSE connections. HTTP/2 multiplexing replaces this with ~100 concurrent streams over a single TCP connection. Uvicorn has no HTTP/2 support at all, which is disqualifying for this use case.
| Scenario | HTTP/1.1 | HTTP/2 |
|---|---|---|
| 3 SSE streams + 10 API calls | SSE consumes half the connection budget; API calls queue | All 13 requests share one connection; no queuing |
| Dashboard with live updates | SSE connections starve API requests | Multiplexed; no interference |
| Multi-window app (3 windows) | 18 total connections possible (6 per origin per window) | 3 connections, ~300 streams total |
Performance. Granian is ~3x faster than Uvicorn in connection-heavy scenarios -- exactly the workload pattern ProductEngine targets. The gains come from Rust-based HTTP parsing and connection handling. The pathsend ASGI extension enables zero-copy file transfers directly from kernel space, bypassing Python object marshaling entirely.
Active maintenance. Granian releases frequently (used in production by paperless-ngx, Reflex, SearXNG, Weblate, and companies including Microsoft, Mozilla, and Sentry). Hypercorn, the only other HTTP/2-capable ASGI server, is classified as inactive by Snyk. Daphne is Django-specific and slower than both.
2.2 HTTP/2 and TLS for Localhost
Browsers require TLS for HTTP/2 -- no major browser supports h2c (HTTP/2 over cleartext). This is a browser policy, not a protocol limitation. For cloud deployments, TLS is already standard. For localhost, locally-trusted certificates are required.
mkcert is the solution: it creates a local Certificate Authority, installs it in the system trust store, and generates certificates trusted by all browsers on the local machine. Self-signed certificates are not viable -- they can cause garbled binary output when HTTP/2 ALPN negotiation occurs with an untrusted cert, and require re-acceptance after browser updates or profile changes. mkcert certificates are indistinguishable from real CA-signed certificates.
The framework will provide a convenience command for generating mkcert certificates and will fail with a clear error when mkcert is not installed, rather than silently falling back to HTTP/1.1.
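The mkcert workflow sketched below shows the intended setup; the mkcert commands are its documented usage, while the Granian flag names (`--http`, `--ssl-certificate`, `--ssl-keyfile`) are assumptions to verify against `granian --help` for the installed version.

```shell
# Create and trust a local CA, then mint a certificate for localhost.
mkcert -install
mkcert localhost 127.0.0.1 ::1   # writes localhost+2.pem / localhost+2-key.pem

# Serve over TLS so browsers negotiate HTTP/2; --http auto falls back
# to HTTP/1.1 when TLS is absent (flag names may vary by version).
granian --interface asgi --http auto \
  --ssl-certificate ./localhost+2.pem \
  --ssl-keyfile ./localhost+2-key.pem \
  productengine.app:app
```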
Granian configuration: --http auto negotiates HTTP/2 with TLS and falls back to HTTP/1.1 without. This is the default for ProductEngine's server module.
2.3 Why msgspec over Pydantic
msgspec is ~15-30x faster than Pydantic v2 for JSON encoding and ~6-15x faster for JSON decoding. These are not micro-benchmark artifacts -- the difference comes from immutable Structs with strict typing that enable pre-compiled optimized code paths and zero-copy optimizations for primitive types.
msgspec also generates JSON Schema 2020-12 (OpenAPI 3.1 compatible) and supports MessagePack/YAML/TOML natively. Since ProductEngine does not depend on FastAPI, Starlette, or any third-party web framework, there is no expectation of Pydantic compatibility. Users can still use Pydantic in their own application code alongside ProductEngine if they wish.
2.4 The BaseHTTPMiddleware Problem
Starlette's BaseHTTPMiddleware intercepts ASGI messages and asserts message["type"] == "http.response.body". When Granian sends other ASGI message types (like http.response.pathsend for zero-copy file transfers), BaseHTTPMiddleware crashes because it does not recognize them. This is a Starlette bug -- the ASGI spec allows extension message types, but BaseHTTPMiddleware assumes a fixed set.
Beyond the crash, BaseHTTPMiddleware also:
- Silently terminates SSE connections: it wraps streaming responses in `anyio.create_memory_object_stream(max_buffer_size=0)`. Under real-world I/O pressure, the anyio cancel scope fires `CancelledError` on the SSE generator, killing the connection. The server logs nothing -- the browser sees `readyState: 2` (CLOSED) and reconnects. This is intermittent and nearly impossible to diagnose because it only manifests under load, not in tests.
- Breaks context variable propagation: prevents `contextvars.ContextVar` changes from propagating upward through the middleware stack, disrupting any subsequent pure ASGI middleware.
- Is a hidden dependency: third-party middleware that extends BaseHTTPMiddleware reintroduces the problem even if all your own middleware is pure ASGI. One BaseHTTPMiddleware in the stack breaks SSE for all endpoints.
Starlette's maintainers have deprecated BaseHTTPMiddleware and plan to remove it in Starlette 1.0. ProductEngine uses a custom ASGI micro-framework (asgi.py) with no Starlette dependency and pure ASGI middleware throughout, making this entire class of bug structurally impossible.
2.5 Deployment Modes
Two deployment modes exist. Both use the same server, the same frontend, and the same plugins. (The original STACK.md evaluation also described a third "hybrid" mode -- local PyWebView frontend backed by both a local and a cloud ProductEngine server for offline-first sync. That mode remains a future possibility but is not a current deliverable.)
| Mode | How the user accesses it | What differs |
|---|---|---|
| Desktop | PyWebView opens a native window pointing at localhost | Window chrome is native OS; server lifecycle is tied to the window |
| Cloud | Browser navigates to the server's URL | No PyWebView; server runs independently; authentication required |
The framework detects the mode at startup and adjusts behavior accordingly (e.g., PyWebView lifecycle hooks in desktop mode, auth middleware in cloud mode). Plugin code is identical in both modes.
2.6 Module Layout
The Python backend is split into focused modules under src/productengine/:
| Module | Responsibility |
|---|---|
| `app.py` | Routes, ASGI lifespan, middleware wiring |
| `dispatch.py` | Command dispatch engine (resolution, execution, sub-command expansion, SSE broadcast) |
| `ai_api.py` | AI surface endpoints (`/ai/query`, `/ai/dispatch`, `/ai/manual`) |
| `debug.py` | Debug panel (`/_debug`) |
| `types.py` | Typed contracts (CommandResult, SubCommand, AnimateSpec, ToastSpec, etc.) |
| `config.py` | Environment-variable configuration |
| `constants.py` | Centralized constants (timeouts, buffer sizes, file patterns) |
| `errors.py` | Error ring buffer for runtime error tracking |
| `snapshots.py` | State snapshot save/restore |
| `state.py` | State store, changelog ring buffer, versioning |
| `registry.py` | Plugin and command registry |
| `metacommands.py` | Meta-command handler implementations |
| `sse.py` | SSE broadcast infrastructure |
| `auth.py` | Authentication middleware and token store |
| `tokens.py` | Design token system and themes |
| `scene.py` | Scene graph building |
| `asgi.py` | Custom ASGI micro-framework |
| `server.py` | Granian server entry point |
---
3. Plugin System
3.1 What a Plugin Is
A plugin is a directory containing:
- **`plugin.toml`** -- the manifest declaring name, version, commands, state schema, dependencies, and handler language
- Handler file(s) -- pure functions implementing commands (Starlark; Python and Lua are planned/future)
- **`ui.json`** (optional) -- SDUI component tree describing the plugin's UI
- Any other assets the plugin needs (CSS files, static images, data files)
A plugin directory is self-contained. Moving it into plugins/ installs it. Deleting it uninstalls it. Pushing it to a git remote publishes it.
3.2 Everything Is a Plugin
There is no distinction between "core" and "user" plugins. All application functionality -- including things that feel like framework features (theming, settings panels, system panels) -- is delivered by plugins. The framework itself is only:
- The plugin loader and registry
- The command dispatcher
- The state store
- The SDUI renderer
- The AI surface endpoints
- The git-managed persistence layer
Everything else is a plugin. If it has UI, state, or commands, it is a plugin.
3.3 Plugin Lifecycle
- Discovery: scan `plugins/` for directories containing `plugin.toml`
- Dependency resolution: topological sort with cycle detection; fail fast on cycles with a clear error naming the cycle
- Loading: parse manifest, load handlers via the appropriate language adapter, register all contributions (commands, state, UI, input bindings, styles)
- Hot-reload: watchfiles detects changes to any file in a plugin directory; the framework unregisters old contributions, reloads the handler module, re-parses the manifest (in case it changed), and re-registers contributions; state survives reload because state lives in the framework, not the plugin
- Runtime creation: an AI agent (or user) creates a new plugin directory with a manifest and handler files; the hot-reload watcher picks it up automatically and loads it as if it had been there at startup
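The dependency-resolution step can be sketched as Kahn's algorithm with fail-fast cycle reporting. This is illustrative, not the registry's actual code; `deps` maps each plugin name to its declared dependencies, and all referenced plugins are assumed present:

```python
from collections import deque

def resolve_load_order(deps: dict[str, list[str]]) -> list[str]:
    """Return a load order where every plugin follows its dependencies."""
    indegree = {name: len(requires) for name, requires in deps.items()}
    dependents: dict[str, list[str]] = {name: [] for name in deps}
    for name, requires in deps.items():
        for dep in requires:
            dependents[dep].append(name)

    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order: list[str] = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for dependent in dependents[name]:
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                ready.append(dependent)

    if len(order) < len(deps):
        # Everything not ordered sits on (or behind) a cycle; name it.
        cycle = sorted(n for n in deps if n not in order)
        raise ValueError(f"dependency cycle among plugins: {', '.join(cycle)}")
    return order
```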
3.4 No Special Treatment
All plugins are readable and writable. The AI can modify any plugin's files -- manifest, handlers, UI definition, styles. There is no read-only/writable distinction, no "runtime overlay," no "base vs. extension" hierarchy. Every plugin is equal. The framework does not protect any plugin from modification.
This is a deliberate choice. Protection mechanisms add complexity that conflicts with the goal of full AI control. If a plugin should not be modified, that is a policy enforced by the AI agent or the user, not by the framework.
---
4. Commands
4.1 Command Model
Every action that modifies state is a command. Commands are declared in plugin manifests and implemented as pure handler functions.
Handler contract:
`(state: dict, params: dict, ctx: DispatchContext) -> CommandResult`
Handlers return a CommandResult struct (defined in types.py) with fields: state, result, sub_commands, animate, toast, tokens_changed, broadcast_full_state. All fields are optional (defaulting to None / False). Starlark handlers may return a plain dict; the adapter layer converts it to a CommandResult.
Handlers are stateless. They receive the current state, the command parameters, and a DispatchContext carrying identity and correlation metadata (see section 18). They return the new state. All mutable state lives in the framework's state store, not in the handler module. This makes hot-reload safe (no in-module state to lose) and AI introspection complete (the framework sees all state).
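A hypothetical handler following this contract (shown in Python form; Starlark handlers look essentially the same, and all names here are illustrative):

```python
def add_node(state: dict, params: dict, ctx) -> dict:
    """Pure handler: build and return NEW state; never mutate in place."""
    node = {"id": params["id"], "x": params.get("x", 0), "y": params.get("y", 0)}
    new_state = {**state, "nodes": [*state.get("nodes", []), node]}
    # A plain dict return is allowed for Starlark handlers; the adapter
    # layer converts it into a CommandResult with other fields defaulted.
    return {"state": new_state, "toast": {"message": f"added {node['id']}"}}
```

Because the handler never touches module-level state, hot-reloading it loses nothing, and the framework's state store remains the single source of truth.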
4.2 Meta-Commands
The framework provides built-in meta-commands for modifying the system itself:
| Meta-command | What it does |
|---|---|
| `set_ui` | Replace a plugin's UI tree |
| `register_command` | Add a new command (via macro or Starlark-in-payload) |
| `unregister_command` | Remove a command |
| `set_input_bindings` | Change input bindings for a plugin |
| `set_styles` | Apply CSS rules to a plugin |
| `pe.animate` | Trigger an emphasis animation on a target element (see section 15) |
Meta-commands go through the same dispatch system as regular commands. They modify plugin files on disk, and git tracks the changes. There is no separate "admin API" -- meta-commands are commands.
All meta-command handlers signal errors by raising ValueError. The dispatch layer (dispatch.py) catches exceptions, logs them, records the error in the error ring buffer (errors.py), broadcasts an error toast via SSE, and returns {"error": str(exc)}.
4.3 Sub-Command Expansion
Handlers can return sub_commands in their CommandResult. The dispatch layer (dispatch.py) executes these sequentially through the normal pipeline, collecting results into a steps array. If any sub-command fails, expansion stops and the partial steps are returned with the error. This replaces the old macro system with handler-driven expansion.
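The expansion loop can be sketched as follows -- simplified, with SSE broadcast and git persistence elided, and names illustrative rather than the actual dispatch.py code. `registry` maps command names to handlers returning dicts with optional `state` and `sub_commands` keys:

```python
def dispatch_with_expansion(registry, state, name, params):
    """Run a command and its sub-commands sequentially, collecting steps."""
    steps = []
    queue = [(name, params)]
    while queue:
        cmd, p = queue.pop(0)
        try:
            result = registry[cmd](state, p)
        except Exception as exc:
            # Stop expansion; return partial steps alongside the error.
            steps.append({"command": cmd, "error": str(exc)})
            return {"steps": steps, "state": state}
        state = result.get("state", state)
        steps.append({"command": cmd, "ok": True})
        # Sub-commands run next, in order, through the same pipeline.
        expansions = [(sc["command"], sc.get("params", {}))
                      for sc in result.get("sub_commands", [])]
        queue = expansions + queue
    return {"steps": steps, "state": state}
```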
4.4 Runtime Command Registration (Starlark-in-Payload)
Commands can be registered at runtime without pre-existing handler files via Starlark-in-payload: the dispatch sends Starlark source code as a string in the command params. The framework validates the code for syntax errors (_validate_code in metacommands.py), writes it to the plugin's handler file on disk, and registers the resulting function as a command. This is safe because Starlark is hermetically sandboxed -- no filesystem access, no network, no imports beyond what the framework explicitly provides.
4.5 Batch Dispatch
Commands dispatch progressively (streaming model):
- Execute one at a time, in order
- Each command's result is rendered immediately via SSE to the frontend
- The user watches changes happen in real-time
- If a command fails, the AI decides how to proceed: fix the issue and continue, dispatch compensating commands to undo partial changes, or abort
- Git tracks the full sequence; `git revert` is available for manual rollback of any individual command
This is better than all-or-nothing transactions. The user sees progress. The AI can adapt to errors mid-stream. And the git history provides a complete, revertible record of every step.
---
5. State
5.1 State Store
The framework holds per-plugin state. Plugins declare their state schema in the manifest, including field names, types, and default values. The framework:
- Initializes each plugin's state from the declared defaults at load time
- Provides `get(plugin, key)`, `set(plugin, key, value)`, and `get_all(plugin)` operations
- Passes the full state dict to handlers and accepts the new state dict from their return value
State is the single source of truth for all application data. Handlers read it, transform it, return new state. The framework persists it. No plugin holds its own mutable state.
5.2 State Versioning
Every state mutation increments a monotonic version number (global, not per-plugin). This supports:
- Diff-based state queries: "what changed since version N?" -- the AI can poll efficiently without re-reading the entire state tree
- Stale-read detection: if a command dispatch includes an expected version and the current version has advanced, the framework can reject or warn
- Git history correlation: version numbers map to git commits, providing a bridge between the in-memory state timeline and the persistent history
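A minimal sketch of a store with a global monotonic version and a changelog answering "what changed since version N?" (ring-buffer eviction and persistence elided; names illustrative, not the actual state.py):

```python
class StateStore:
    def __init__(self):
        self._state: dict[str, dict] = {}          # plugin -> {key: value}
        self.version = 0                           # global, monotonic
        self._changelog: list[tuple[int, str, str]] = []  # (version, plugin, key)

    def set(self, plugin: str, key: str, value):
        self.version += 1
        self._state.setdefault(plugin, {})[key] = value
        self._changelog.append((self.version, plugin, key))

    def get(self, plugin: str, key: str):
        return self._state.get(plugin, {}).get(key)

    def changes_since(self, version: int):
        """Diff-based query: entries newer than the given version."""
        return [entry for entry in self._changelog if entry[0] > version]
```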
---
6. AI Surface
The full design lives in todo/.done/agentic-perception.md; this section is the architectural overview. CLAUDE.md carries the working summary tuned for in-session reference.
6.1 Design Principle
The AI is not a chatbot or inline assistant. It is an external agent that connects to the same backend the human uses, via dedicated HTTP endpoints. Two principles govern the surface:
- Agentic search, not push-state. The AI does not receive a 20 KB blob describing every plugin and its state. It fetches a small master prompt that maps every queryable surface, then drills agentically using whatever query language fits each surface. This mirrors how code-search agents work over a filesystem, applied to product state and rendered perception.
- One dispatch engine, two interfaces. The Svelte SDUI frontend fires commands via `POST /api/commands/{name}`; the AI fires the same commands via `POST /ai/dispatch`. Both converge at `dispatch_command()` in `dispatch.py`. Anything a human button can do, the AI can dispatch programmatically. The framework IS the agent interface; there is no separate adapter layer.
Reads and writes are different verbs. Queries are about finding; dispatch is about doing. The protocol reflects this: a queryable surface model on the read side and a command-shaped dispatch on the write side.
6.2 Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| `/ai/manual` | GET | Composed master prompt as text/markdown. One H1 per plugin, one H2 per surface, with each surface's query language, freshness strategy, byte budgets, and recipes. ETagged on the global hash; warm sessions return 304. |
| `/ai/query` | POST | Multi-surface read query. Always-array body `[{plugin, surface, query, target_hint?}, ...]`. Each entry is dispatched to the evaluator declared by its surface (jq, css, or sql). Returns one envelope per query: `{result, meta: {complete, truncated_at, total_estimate, last_updated, version_hash, target_rect}}`. `?live=1` forces a fresh compute (bypassing the cache) for browser-resident surfaces. `Accept: application/x-ndjson` opts into streaming for legitimate big queries. |
| `/ai/dispatch` | POST | Always-array body `{"commands": [...], "rendering"?, "changeset"?}`. Returns `{results, state, batch_id, commands_executed}`. The same `dispatch_command()` engine human button clicks use. |
| `/api/state/stream` | GET (SSE) | Real-time event stream. The AI subscribes for state changes, plus the perception-specific events (`perception_invalidate`, `perception_query`, `target_rect`, `perception_active`) that drive the eye + shimmer collaboration UI. |
6.3 Surfaces
Every queryable read surface is registered at the resolver definition site via the surface(...) wrapper from productengine.surface. Each declaration carries:
- Name and plugin -- queries address surfaces by `(plugin, surface)`.
- Query language -- `jq` for tree-shaped JSON, `css` for layout/DOM-shaped data, `sql` for tabular/log-shaped data.
- Freshness strategy -- required field, no default. One of `dirty_timer(window_ms)`, `every_dispatch`, `lazy`, `CustomFreshness(callable)`. (`Continuous` is reserved; the factory raises `NotImplementedError` until the scheduler branch is wired.)
- Soft and hard byte budgets -- soft truncation kicks in over `soft_budget_bytes` with `meta.complete: false`; hard rejection over `hard_budget_bytes` returns 413 with `suggested_narrowings` from the surface's recipes.
- Resolver function -- `(state, params, ctx) -> {result, target_rect}`. May return the `BROWSER_RESIDENT` sentinel for surfaces whose data lives in the human's browser; the perception cache handles the round-trip.
- Master prompt fragment -- `content_md` markdown body, optionally pulled from a sibling file via `recipes_from`. Composes into `/ai/manual`.
Plugins declare their own surfaces. The framework provides no cross-plugin aggregator; the AI fans out across plugins itself, optionally batching multiple (plugin, surface, query) entries in a single POST /ai/query array.
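The shape of a surface declaration can be sketched as a registering decorator. The field names follow the list above, but the decorator mechanics and registry structure here are assumptions for illustration, not the actual `productengine.surface` implementation:

```python
SURFACES: dict[tuple[str, str], dict] = {}

def surface(plugin, name, *, lang, freshness,
            soft_budget_bytes, hard_budget_bytes, content_md=""):
    """Register a resolver as a queryable surface (illustrative sketch)."""
    def register(resolver):
        SURFACES[(plugin, name)] = {
            "lang": lang,
            "freshness": freshness,
            "soft_budget_bytes": soft_budget_bytes,
            "hard_budget_bytes": hard_budget_bytes,
            "content_md": content_md,
            "resolver": resolver,
        }
        return resolver
    return register

# A hypothetical plugin surface: jq-queryable, warmed on every dispatch.
@surface("todo", "items", lang="jq", freshness="every_dispatch",
         soft_budget_bytes=32_768, hard_budget_bytes=262_144,
         content_md="## items\nQuery the todo list with jq.")
def items_surface(state, params, ctx):
    return {"result": state.get("items", []), "target_rect": None}
```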
6.4 Framework Surfaces (the pe plugin)
| Surface | Lang | Freshness | What |
|---|---|---|---|
| `pe.state` | jq | every_dispatch | Plugins, commands, state, ui, bindings, gesture_schemas, styles, scene, tokens, preferences, errors, client_logs, snapshots, feedback, version. The replacement for the retired `GET /ai/state` endpoint. |
| `pe.view` | css | dirty_timer(2 s) | Layout snapshot of the rendered page (browser-resident). Per-node fields: tag, id, classes, attrs, rect, text, bg, fg, font, z, overflow, position, display, visible, children. |
| `pe.a11y` | jq | dirty_timer(2 s) | Accessibility tree (browser-resident). jq, not CSS, because the records are `{role, name, states, rect, children}` -- not DOM-shaped. |
| `pe.anomalies` | jq | dirty_timer(2 s) | Heuristic UI scan (browser-resident). Five heuristics: zero-area, off-screen, parent-overflow clipping (uses the real overflow field on the snapshot), z-index conflicts, low contrast. Each anomaly carries its own target_rect so the shimmer paints every suspect at once. |
| `pe.components` | jq | dirty_timer(2 s) | Per-instance SDUI component self-reports (`{status, error, frame_count, summary}`). Mounted-outside-the-tree framework chrome (AIEye, AIShimmer) is intentionally excluded. |
| `pe.console` | sql | lazy | Browser console.error/warn ring buffer (server-resident, fed by `/api/console_log`). |
| `pe.mutations` | sql | lazy | DOM mutation diff stream tagged with the current dispatch_id (server-resident, fed by `/api/mutation_log`). |
| `pe.screenshot` | jq (passthrough) | lazy | Partial PNG screenshot (browser-resident). The query is `selector:<css>` or `rect:x,y,w,h`; the resolver receives it via `passthrough_query=True`. |
| `pe.log` | sql | lazy | Backend log tail. |
| `pe.network` | sql | lazy | Backend HTTP request log (timing, status, identity). |
| `pe.changelog` | sql | every_dispatch | Dispatch history for "what just happened" queries. |
Browser ingestion endpoints (/api/console_log, /api/mutation_log, /api/perception_response) carry browser-resident data back to the server cache; they are not AI-facing.
6.5 Freshness Model
Each surface declares its own freshness strategy at registration:
- **`dirty_timer(window_ms)`** -- timer is set to `window_ms` on every dispatch, user input, or AI query. It decrements toward 0; the warm fires at the 0-edge. 0 is the hot moment (re-compute now); the window value is the dirty/expecting-more-changes state. The default window comes from `PE_PERCEPTION_DIRTY_TIMER_MS` (2000 ms). Used for layout-class surfaces where reads should be cheap and reasonably current.
- **`every_dispatch`** -- warm only on dispatch boundaries; no idle timer. Used for surfaces that change in lockstep with command actions (`pe.state`, `pe.changelog`).
- **`lazy`** -- never warm; compute only on AI query. Used for ring-buffer-backed surfaces where reading the buffer is itself the cheapest path (`pe.console`, `pe.mutations`, `pe.log`, `pe.network`) and for opt-in expensive computations (`pe.screenshot`).
- **`CustomFreshness(fn)`** -- plugin-provided callable returning one of the other strategies; for conditional logic ("lazy while a canvas is animating, dirty_timer otherwise").
The cache is a hybrid eviction store: history ring buffer keeps last N or last T minutes of snapshots, whichever bound trips first. AI queries can read the latest cached value (paying nothing for "still fresh") or escalate with ?live=1 to force a fresh compute. Each cached entry carries a last_updated timestamp; the AI's "acceptably recent" decision is its own.
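The hybrid eviction rule -- keep the last N snapshots or the last T seconds, whichever bound trips first -- can be sketched as a small ring buffer. Names and the injectable clock are illustrative, not the actual cache implementation:

```python
import time
from collections import deque

class SnapshotCache:
    """History buffer bounded by count AND age, whichever trips first."""

    def __init__(self, max_items=50, max_age_s=300.0):
        self.max_items = max_items
        self.max_age_s = max_age_s
        self._buf: deque = deque()   # (timestamp, value), oldest first

    def put(self, value, now=None):
        now = time.monotonic() if now is None else now
        self._buf.append((now, value))
        # Evict from the old end while either bound is violated.
        while len(self._buf) > self.max_items or (
            self._buf and now - self._buf[0][0] > self.max_age_s
        ):
            self._buf.popleft()

    def latest(self):
        return self._buf[-1][1] if self._buf else None
```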
6.6 Visual Collaboration
Two SSE-driven UI elements signal AI activity to the human user:
- **Corner eye (`AIEye.svelte`)** -- a single global icon that opens when an `/ai/query` request begins and closes after a brief idle. Driven by `perception_active` SSE events. A defense-in-depth `EYE_IDLE_MS` timer in the frontend ensures the eye closes even if the server skips the trailing `perception_active(false)`.
- **Positional shimmer (`AIShimmer.svelte`)** -- an animated outline traces whatever rectangle the AI just queried. Driven by `target_rect` SSE events, with multi-rect support (one shimmer per matched element). Resolvers always return a `target_rect` (single, list, or `null`); helpers `target_from_selector`, `target_from_rect`, `target_from_plugin_root` make this a one-liner. The AI may override via `target_hint` in the query envelope.
Together they make AI activity legible: the human watches a luminous trail across the UI as the AI works.
6.7 Budget and Errors
A four-tier model keeps queries bounded without poisoning batches:
- AI self-regulation -- the master prompt declares `soft_budget_bytes` per surface; the AI narrows preemptively.
- Soft truncation -- responses larger than `soft_budget_bytes` are truncated mid-data; the envelope carries `meta: {complete: false, truncated_at, total_estimate}`.
- Hard rejection in-band -- per-query violations over `hard_budget_bytes` return `{result: null, error: {code: "payload_too_large", suggested_narrowings: [...]}}` inside the response array (HTTP 200). One bad query in a batch does not poison the others.
- Aggregate hard rejection -- only when the cumulative response exceeds `AI_QUERY_GLOBAL_HARD_BYTES` does the endpoint return a top-level 413. Opt-in NDJSON streaming (`Accept: application/x-ndjson`) bypasses both budgets for legitimate big queries.
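The per-query tiers (soft truncation and in-band hard rejection) can be sketched as a single budget check over the encoded result. The envelope shapes follow the description above; the halving heuristic and function name are assumptions for illustration:

```python
import json

def apply_budgets(result, soft_budget_bytes, hard_budget_bytes,
                  suggested_narrowings=()):
    """Return the query envelope after applying soft/hard byte budgets."""
    encoded = json.dumps(result).encode()
    if len(encoded) > hard_budget_bytes:
        # In-band rejection: HTTP stays 200; only this entry errors.
        return {"result": None,
                "error": {"code": "payload_too_large",
                          "suggested_narrowings": list(suggested_narrowings)}}
    if len(encoded) > soft_budget_bytes and isinstance(result, list):
        # Soft truncation: cut mid-data and say so in the envelope.
        keep = max(1, len(result) // 2)
        return {"result": result[:keep],
                "meta": {"complete": False, "truncated_at": keep,
                         "total_estimate": len(result)}}
    return {"result": result, "meta": {"complete": True}}
```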
6.8 Acknowledgment Loop
The dispatch path retains its full round-trip confirmation. The AI dispatches a change, the server applies it, SSE pushes the update to the frontend, the frontend renders and POSTs an acknowledgment, and the server includes that ack in the dispatch response (with a configurable timeout). The AI always knows whether its changes actually rendered. The /ai/query path does not use this loop; reads do not mutate state.
---
7. Server-Driven UI (SDUI)
7.1 Model
Plugins describe their UI as JSON component trees in ui.json. The framework's frontend has a recursive SDUI renderer that interprets these trees and produces real DOM elements. Plugins never send executable code to the frontend -- only data.
This means:
- The AI can read and modify any plugin's UI by reading and writing JSON
- UI changes are just data changes, tracked by git like everything else
- The frontend is a universal renderer, not a collection of bespoke plugin UIs
7.2 Component Vocabulary
The framework ships a fixed set of 30+ built-in component types. If a component type does not exist in the vocabulary, it cannot be used. There is no runtime extensibility of the component vocabulary. This keeps the system controllable and predictable -- the renderer knows every possible component type at build time.
| Category | Components |
|---|---|
| Layout | column, row, grid, stack, spacer, divider, tabs, accordion, split-pane |
| Display | text, heading, badge, icon, image, code-block, markdown, progress-bar, spinner |
| Input | button, text-input, number-input, checkbox, toggle, select, slider, color-picker, date-picker |
| Data | list, table, tree-view, key-value |
| Canvas | canvas (2D scene graph renderer -- see section 8) |
| Feedback | alert, toast, modal, tooltip, popover |
The specific props, behavior, and styling of each component will be defined in a component specification. The categories and component names listed here are the vocabulary.
7.3 Expression Syntax
Component props support ${expression} for data binding. Expressions are evaluated by a custom parser -- never by eval() or equivalent. The expression language is intentionally limited:
| Expression | Meaning |
|---|---|
| `${state.fieldName}` | Read from the plugin's state |
| `${item.fieldName}` | Current item in a list iteration |
| `${len(array)}` | Array length |
| `${if(condition, then, else)}` | Conditional value |
The expression language supports property access, function calls from a fixed allowlist, and basic comparison operators. It does not support assignment, loops, or arbitrary computation. It is a data-binding language, not a programming language.
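A deliberately tiny sketch of this binding model -- dotted property access plus one allowlisted function -- to make the "data-binding, not programming" boundary concrete. The real parser is richer; everything here is illustrative:

```python
import re

_EXPR = re.compile(r"\$\{([^}]+)\}")

def _lookup(path: str, scopes: dict):
    """Resolve a dotted path like 'state.nodes' against named scopes."""
    root, *rest = path.strip().split(".")
    value = scopes[root]
    for part in rest:
        value = value[part]
    return value

def bind(prop: str, scopes: dict) -> str:
    """Substitute every ${...} in a prop string. No eval(), no loops,
    no assignment -- only lookups and the fixed function allowlist."""
    def repl(match):
        expr = match.group(1).strip()
        if expr.startswith("len(") and expr.endswith(")"):
            return str(len(_lookup(expr[4:-1], scopes)))
        return str(_lookup(expr, scopes))
    return _EXPR.sub(repl, prop)
```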
7.4 Actions
Buttons and interactive components trigger commands through the same dispatch system the AI uses:
`{"type": "button", "props": {"label": "Delete", "command": "remove_node", "params": {"id": "${item.id}"}}}`
User clicks and AI dispatches are identical. The SDUI renderer translates a button click into a command dispatch to the same endpoint the AI calls. There is one command system, not two.
---
8. Scene Graph (Canvas)
8.1 Separate from SDUI
The canvas is a special SDUI component type that renders a 2D scene graph. It has its own data model separate from the layout component system because:
- Layout components are DOM elements positioned with CSS flexbox/grid
- Canvas content is SVG or WebGL with custom rendering: shapes, hit-testing, coordinate transforms, zoom, pan
The canvas component is an SDUI component (it appears in ui.json), but its children are scene graph elements, not other SDUI components.
Rendering constants (node radius, colors, stroke widths, etc.) are shared between the Python backend and the TypeScript frontend via scene_defaults.json at the project root. Both sides import from this single file to keep visual defaults in sync.
8.2 Scene Graph Schema
The scene graph describes visual elements as data in the plugin's state. It is not a separate data store -- it is regular plugin state that the canvas component knows how to render.
| Element type | Properties |
|---|---|
| circle | cx, cy, radius, fill, stroke, stroke-width, opacity |
| rect | x, y, width, height, rx, ry, fill, stroke, stroke-width, opacity |
| line | x1, y1, x2, y2, stroke, stroke-width, opacity |
| path | d (SVG path data), fill, stroke, stroke-width, opacity |
| polygon | points, fill, stroke, stroke-width, opacity |
| text-label | x, y, text, font-size, font-family, fill, text-anchor |
| group | children (list of elements), transform (translate, rotate, scale) |
| layer | children, visible, opacity, name |
Each element also supports:
- Interaction regions: hit-test areas that may differ from the visual bounds (e.g., a wider hit area for thin lines)
- Drag handles: named points on the element that can be dragged, mapped to commands via input bindings
8.3 Addressability
Every scene graph element has a stable, deterministic ID derived from the data (e.g., `node-{id}` for graph nodes, `edge-{source}-{target}` for graph edges). This means:
- The AI can reference specific visual elements by ID
- Styling rules can target specific elements or element types
- Input bindings can reference elements by ID, type, or group membership
---
9. Input Bindings
9.1 Declarative Bindings
Input handling is declared as data, not hardcoded in event handlers. Each binding maps a named gesture to a command:
`{"gesture": "click", "target": "canvas_empty", "command": "add_node", "params": {"x": "${event.x}", "y": "${event.y}"}}`
Bindings are declared in the plugin manifest or in a dedicated bindings file within the plugin directory. They are data, so they are readable and writable by the AI, tracked by git, and modifiable at runtime.
9.2 Named Gesture Primitives
The framework defines a fixed vocabulary of gestures. Each gesture type has a known phase schema defining what context variables are available in expressions.
| Category | Gestures |
|---|---|
| Pointer | click, double_click, right_click, drag (start/move/end phases), long_press, hover_enter, hover_leave |
| Keyboard | key_press (with modifiers: ctrl, shift, alt, meta), key_combo (chords like ctrl+shift+z) |
| Scroll | scroll (with delta_x, delta_y), pinch_zoom (with scale) |
| Selection | select, deselect |
Each gesture type provides a fixed context schema:
| Gesture | Context variables |
|---|---|
| click | event.x, event.y, event.target_id, event.target_type |
| drag | event.start_x, event.start_y, event.current_x, event.current_y, event.delta_x, event.delta_y, event.target_id, event.phase |
| key_press | event.key, event.ctrl, event.shift, event.alt, event.meta |
| scroll | event.delta_x, event.delta_y, event.x, event.y |
| select | event.target_id, event.target_type |
9.3 Runtime Modification
Input bindings are modifiable at runtime via meta-commands (set_input_bindings and pe.set_input_bindings). The AI can rebind what right-click does, add keyboard shortcuts, change drag behavior. Changes persist to disk like everything else.
---
10. Styling and Addressability
10.1 AI-Driven Styling
Every visual element in the UI is addressable via stable, deterministic selectors. The framework enforces addressability -- the SDUI renderer emits DOM elements with data attributes:
| Attribute | Example | Purpose |
|---|---|---|
| data-plugin | data-plugin="graphist" | Identifies which plugin owns the element |
| data-component | data-component="text" | Identifies the SDUI component type |
| data-role | data-role="title" | Semantic role within the plugin's UI |
| data-id | data-id="node-abc123" | Unique element identifier from the data |
The AI can:
- Read current computed styles of any element via the AI surface
- Generate and apply CSS rules targeting any combination of these selectors
- Respond to user descriptions ("this text is too small") by inspecting visual properties and applying fixes
- Create entirely new themes (dark mode, high contrast) by generating a complete stylesheet
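Because the data attributes are stable, generated CSS rules can be built mechanically from selector parts. A minimal sketch (function name hypothetical) of the kind of rule an AI might emit via pe.set_styles:

```python
def css_rule(selectors: dict, declarations: dict) -> str:
    """Build a CSS rule targeting the stable data attributes,
    e.g., [data-plugin="graphist"][data-role="title"]."""
    selector = "".join(f'[data-{k}="{v}"]' for k, v in selectors.items())
    body = "; ".join(f"{prop}: {val}" for prop, val in declarations.items())
    return f"{selector} {{ {body} }}"

# "This text is too small" -> target the plugin's title role and bump it:
print(css_rule({"plugin": "graphist", "role": "title"}, {"font-size": "18px"}))
# [data-plugin="graphist"][data-role="title"] { font-size: 18px }
```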
10.2 Design Token System
The framework provides a design token system (tokens.py) that maps named tokens to CSS custom property values. Tokens cover colors, spacing, typography, borders, shadows, and animation timing.
- Default tokens: a comprehensive set of --pe-<token-name> custom properties (e.g., --pe-color-primary, --pe-spacing-md, --pe-font-body)
- Themes: built-in dark (default) and light themes that override color-related tokens. The active theme is switched via the pe.set_theme meta-command.
- Per-token overrides: individual tokens can be overridden at runtime via the pe.set_tokens meta-command, merging on top of theme defaults.
- API endpoint: GET /api/tokens returns the resolved token set and active theme name. Token changes are broadcast to the frontend via tokens_changed SSE events.
The AI can also write CSS directly via pe.set_styles. The token system provides consistent defaults; direct CSS provides full expressive power when needed.
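The merge order (defaults, then theme, then per-token overrides) can be sketched as a straightforward dict merge; the token values below are illustrative, not the framework's actual defaults:

```python
def resolve_tokens(defaults: dict, theme: dict, overrides: dict) -> dict:
    """Resolve the effective token set: pe.set_tokens overrides merge on
    top of the active theme, which merges on top of the defaults."""
    return {**defaults, **theme, **overrides}

defaults = {"--pe-color-primary": "#3b82f6", "--pe-spacing-md": "12px"}
dark_theme = {"--pe-color-primary": "#60a5fa"}  # theme overrides color tokens
overrides = {"--pe-spacing-md": "16px"}         # runtime per-token override

print(resolve_tokens(defaults, dark_theme, overrides))
# {'--pe-color-primary': '#60a5fa', '--pe-spacing-md': '16px'}
```

This is the resolved set that GET /api/tokens would return and that a tokens_changed event would broadcast.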
10.3 Style Persistence
Style changes are written to CSS files in the plugin directory and tracked by git. The framework loads plugin CSS files and injects them into the page. When the AI modifies styles, the changes are files on disk, not ephemeral runtime state.
---
11. Persistence and Git
11.1 Model
Every change to any plugin's files is automatically committed to git. The framework manages git operations transparently -- neither humans nor AIs need to run git themselves; the framework handles it. (Users and AIs can still use the git CLI directly for advanced operations like branching or rebasing.)
11.2 What This Enables
- History: full undo/redo via git log; every state of the application is recoverable
- Diffing: see exactly what changed between any two points in time
- Rebasing: when an upstream plugin updates, rebase local modifications on top
- Collaboration: multiple agents' changes are tracked with attribution (commit author identifies the agent)
- Publishing: push a plugin's repo to a git host to share it
- Installation: clone a repo into plugins/ to install a plugin
- Store: GitHub (or any git host) IS the plugin store; a plugin's repository has plugin.toml as its manifest; discovery is git search
11.3 Auto-Commit Behavior
- Every successful command dispatch that modifies files triggers an auto-commit
- Commit message includes: the command name, the parameters (truncated if large), and the source (user, AI, or specific agent ID)
- Batch dispatches produce one commit per command (progressive, matching the streaming dispatch model)
- Commits are made to the current branch; the framework does not create branches automatically
11.4 Staging Safety
Auto-commits must use specific file staging, never blanket commands like git add -A or git add . -- the framework stages only files matching known patterns:
- *.toml -- plugin manifests
- *.star -- Starlark handlers
- *.json -- UI definitions, state snapshots
- *.css -- plugin stylesheets
Before any commit, the framework must verify that nothing is already staged in the index. If there are already-staged changes (from another session, a manual git add, or any other source), the auto-commit aborts with an error rather than committing someone else's work. This is a hard blocker, not a warning.
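The pattern filter can be sketched as a pure function (names hypothetical; the real implementation also performs the staged-index check described above before committing anything):

```python
from fnmatch import fnmatch

# Patterns the framework is allowed to stage (section 11.4);
# anything else is left untouched in the working tree.
STAGEABLE_PATTERNS = ("*.toml", "*.star", "*.json", "*.css")

def stageable_files(changed_files):
    """Return only files matching known patterns -- the auto-commit
    equivalent of specific staging, never git add -A."""
    return [
        path for path in changed_files
        if any(fnmatch(path, pattern) for pattern in STAGEABLE_PATTERNS)
    ]

changed = ["graphist/plugin.toml", "graphist/handlers.star",
           "graphist/__pycache__/x.pyc", "notes.txt"]
print(stageable_files(changed))
# ['graphist/plugin.toml', 'graphist/handlers.star']
```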
11.5 Repository Hygiene
The plugins git repository includes a .gitignore that excludes common junk files:
- __pycache__/
- *.pyc
- .DS_Store
- *.swp
- *.swo
- *~
- .pytest_cache/
- *.egg-info/
---
12. Handler Languages
12.1 Handler Language
Plugin handlers are written in Starlark. The loader enforces this: any plugin with a language field other than "starlark" is rejected at load time. Python and Lua support are planned for the future but not yet implemented.
| Language | Runtime | Adapter | Status |
|---|---|---|---|
| Starlark | starlark-pyo3 | Starlark evaluation, function extraction, dict conversion | Implemented |
| Python | CPython (host interpreter) | Direct import, function reference | Planned |
| Lua | LuaJIT via lupa | Lua state creation, function extraction, dict/table conversion | Planned |
12.2 Why Starlark
In brief: Starlark is a Python dialect (no new syntax for AI agents to generate), a hermetic sandbox by design, and trivially hot-reloadable because evaluation is stateless. Section 20 records the full rationale.
12.3 Starlark-in-Payload
When the AI registers commands at runtime, it can send Starlark source code in the dispatch payload. The framework:
- Receives the Starlark source string
- Writes it to the plugin's handler file on disk
- Compiles and validates it
- Registers the resulting function as a command
- Git commits the new handler file
This is safe because Starlark has no I/O, no imports, no network access, and no side effects beyond what the framework explicitly exposes through the handler contract.
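A minimal sketch of the validation step, assuming names not in this document: since Starlark is a dialect of Python, Python's ast module serves here as a rough first-pass syntax check; the real framework would use the starlark-pyo3 compiler for the authoritative check (section 14 notes this lives in metacommands.py _validate_code).

```python
import ast

def validate_handler_source(source: str) -> bool:
    """First-pass syntactic check before writing an AI-supplied handler
    to disk. Starlark shares Python's surface syntax, so ast.parse is a
    reasonable approximation for this sketch; it is NOT the framework's
    actual validator."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

good = "def add_node(state, params, ctx):\n    return state"
bad = "def add_node(state, params, ctx)\n    return state"  # missing colon
print(validate_handler_source(good), validate_handler_source(bad))  # True False
```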
---
13. Consumers
The following applications will be built on ProductEngine, validating the framework under real workloads:
| Consumer | Domain | Key challenges it validates |
|---|---|---|
| Graphist | Graph editor | Canvas/scene graph, interactive commands, real-time updates, drag interactions |
| VisualClaude | Claude Code session manager | Dashboard panels, SSE streams, system integration, complex state |
| PixelWeaver | Pixel art editor | Rich canvas with pixel-level manipulation, animation timeline, complex plugin ecosystem |
| TripPlanner | Travel planning and booking | Forms, data tables, external API integration, multi-step workflows |
| Supervisor | Multi-repo git management | DevOps tooling, background tasks, complex state machines, system-level integration |
Graphist is the first consumer and serves as the MVP test bed. It is simple enough to build fast and complex enough to exercise the full architecture: plugin loading, command dispatch, state management, canvas rendering, AI integration, and hot-reload.
---
14. Open Questions
These questions will be resolved during implementation. They are recorded here so that decisions are made deliberately, not by accident.
| Question | Context | Leading direction |
|---|---|---|
| Component vocabulary specifics | The 30+ components are categorized but specific props and behavior for each need to be designed | Design props during Graphist development; document as a component spec |
| Scene graph schema details | Shape types and visual properties are listed but interaction region model needs specification | Define during canvas implementation for Graphist |
| Expression language limits | What functions beyond len() and if() should be available? | Start minimal, add functions only when real plugins need them |
| Cross-plugin state access | Can plugin A read plugin B's state? Via command dispatch only, or directly? | Read access is direct (state is framework-owned); writes are via command dispatch only |
| Plugin dependency versioning | Name-only dependencies or semver constraints? | Start with name-only; add semver when the plugin ecosystem is large enough to need it |
| Concurrent agent coordination | When two AI agents modify the same plugin simultaneously, how are conflicts resolved? | Git merge conflicts surface the issue; the framework does not attempt automatic resolution |
| Auth system design | Token vs session vs OAuth? Single identity store or federated? | Simple token/API key for v1; design will evolve based on deployment mode requirements (see section 17) |
| Animation performance | CSS transitions vs JavaScript animation vs Web Animations API? | CSS transitions for simple property changes; Web Animations API for complex sequences; benchmark during implementation |
| Code-in-payload validation | Should Starlark sent via dispatch be compiled/validated before writing to disk? | Implemented in metacommands.py _validate_code; Starlark code is parsed before writing to disk |
| Changelog mechanism for diff-based queries | How to efficiently track state deltas for the ?since=N query approach? | Implemented as a ring-buffer changelog in state.py; get_changelog and get_state_diff are wired into the ?since=N query parameter in ai_api.py |
---
15. Animation and Transitions
15.1 Animation Categories
All animations fall into one of four categories:
| Category | Purpose | Available animations |
|---|---|---|
| Enter | Element appears in the UI | none, fade_in, scale_in, pop, slide_in (with from direction), draw (for lines/paths), grow (for bars) |
| Exit | Element leaves the UI | none, fade_out, scale_out, slide_out (with to direction), shrink, collapse |
| Transition | Smooth property changes between states | interpolate (default behavior for state-driven changes) |
| Emphasis | Attention-getters that do not change state | pulse, shake, glow, flash, bounce, ring |
Enter and exit animations play when elements are added to or removed from the component tree. Transition animations play automatically when state-driven property values change (position, size, color, opacity). Emphasis animations are triggered explicitly and do not modify state -- they draw attention to an element and then stop.
15.2 Easings
| Easing | Behavior |
|---|---|
| linear | Constant speed |
| ease_in | Slow start, fast end |
| ease_out | Fast start, slow end |
| ease_in_out | Slow start and end, fast middle |
| bounce | Overshoots and bounces back |
| elastic | Spring-like overshoot |
| step | Discrete jumps (no interpolation) |
15.3 Timing
| Property | Type | Purpose |
|---|---|---|
| delay_ms | integer | Wait before starting the animation |
| duration_ms | integer | How long the animation runs |
| stagger_ms | integer | Offset between children in a group or list (each child starts stagger_ms after the previous) |
15.4 Two-Layer Declaration
Animation behavior is declared at two layers. The second layer overrides the first.
- Layer 1 -- UI defaults: the
transitionsproperty on any component inui.jsondefines default enter, exit, and transition animations for that component. These are static defaults that apply unless overridden. - Layer 2 -- Dispatch rendering metadata: when a command is dispatched, the response can include rendering metadata that overrides the UI defaults for that specific operation. This allows the backend to control how a particular state change is presented, independent of the component's default animation.
Emphasis animations are not part of either layer. They are triggered explicitly via the pe.animate meta-command, which targets a specific element by ID and plays the named emphasis animation.
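A hypothetical ui.json fragment showing Layer 1 defaults. Only the transitions property name comes from this section; the nested schema (animation, duration_ms, easing keys) is illustrative and draws on the vocabulary from sections 15.1-15.3:

```json
{
  "type": "node",
  "transitions": {
    "enter": {"animation": "scale_in", "duration_ms": 200, "easing": "ease_out"},
    "exit": {"animation": "fade_out", "duration_ms": 150},
    "transition": {"easing": "ease_in_out", "duration_ms": 250}
  }
}
```

A dispatch response could then override just the enter animation for one specific operation via its rendering metadata, leaving these component defaults intact for every other state change.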
15.5 Batch Rendering Modes
When commands are dispatched in batches, the framework controls how results are rendered:
| Mode | Behavior |
|---|---|
| Progressive | Each command's result renders immediately as it completes. The user sees changes happen in real-time. This is the default. |
| Atomic | All results are buffered and rendered simultaneously when the entire batch completes. The UI updates once. |
| Grouped | Subsets of commands within the batch are designated as atomic groups or progressive sequences. Mixed rendering within a single batch. |
Rendering mode is controlled by the backend via dispatch rendering metadata, not by frontend settings. The backend decides how a batch should be presented. Individual commands within a batch can include per-command rendering overrides (e.g., "render this command's result atomically with the next two commands, but render the rest progressively").
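A hypothetical batch illustrating grouped rendering. All field names here are illustrative -- the document does not fix a wire schema:

```json
{
  "batch_id": "a1b2c3d4",
  "commands": [
    {"command": "add_node", "params": {"id": "a"}, "rendering": {"group": "g1", "mode": "atomic"}},
    {"command": "add_node", "params": {"id": "b"}, "rendering": {"group": "g1", "mode": "atomic"}},
    {"command": "add_edge", "params": {"source": "a", "target": "b"}}
  ]
}
```

Under this sketch, the two add_node results would render together when group g1 completes, while the add_edge result renders progressively as it finishes.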
---
16. Distribution Pipeline
Desktop mode requires packaging the Python backend, frontend assets, and runtime dependencies into a native application. The following stack handles freezing, packaging, updates, and system integration.
| Layer | Tool | Purpose |
|---|---|---|
| Freezing | PyInstaller / Nuitka | Bundle Python + dependencies into a standalone executable. Nuitka compiles Python to C for 2-4x faster runtime but longer build times. |
| Native installers | Briefcase | Produce .deb, .msi, .dmg from frozen bundle |
| Auto-updater | tufup | Cryptographically secure delta updates built on the TUF framework |
| System tray | pystray | System tray icon with menu; integrates via run_detached() |
| Notifications | desktop-notifier | Native OS notifications with interactive buttons and callbacks |
| Localhost TLS | mkcert | Locally-trusted certificates for HTTP/2 on localhost (see section 2.2) |
PyWebView already provides file dialogs, native menus, window management, clipboard access, and drag-and-drop. These do not need separate tooling.
Native packaging cannot be cross-compiled. CI/CD requires a GitHub Actions matrix build with runners for each target platform (Linux: .deb/.AppImage, macOS: .dmg with notarization, Windows: .msi with code signing).
---
17. Authentication and Identity
17.1 Why Authentication Is Needed
Authentication exists for attribution, not access control. Git commits must attribute changes to specific users and agents. Without identity, all commits appear to come from the same source, making collaboration history meaningless.
17.2 Design Principles
- Both humans and AI agents authenticate via the same system. There is no separate "agent auth" -- an agent is just another authenticated identity.
- Each dispatch includes the caller's identity. The framework uses this identity to set the git commit author on auto-commits.
- Required for meaningful commit source tracking in collaborative scenarios (multiple users, multiple agents, or mixed human/agent workflows).
17.3 Implementation
Token store: tokens are UUID4 strings stored in .productengine-auth.json at the project root. Each token maps to an identity with name and type fields. On first run, an admin token is generated and printed to stdout for bootstrapping.
AuthMiddleware: a pure ASGI middleware (no BaseHTTPMiddleware) that extracts Bearer tokens from the Authorization header and sets an identity dict on the ASGI scope. It does not reject unauthenticated requests -- it sets the identity to anonymous. In desktop mode, unauthenticated requests from loopback addresses (127.0.0.1, ::1) are automatically attributed to local-user with type human, since PyWebView requests have no token.
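The resolution order can be sketched as a pure function over the ASGI scope (function name and token table are hypothetical; the real AuthMiddleware sets the resulting identity dict on the scope):

```python
def identify(scope: dict, tokens: dict) -> dict:
    """Resolve a request's identity: Bearer token -> known identity;
    loopback without a token -> local-user (desktop mode, PyWebView
    requests carry no token); anything else -> anonymous."""
    headers = dict(scope.get("headers", []))  # list of (bytes, bytes) pairs
    auth = headers.get(b"authorization", b"").decode()
    if auth.startswith("Bearer "):
        token = auth[len("Bearer "):]
        if token in tokens:
            return tokens[token]
    client_host = (scope.get("client") or ("", 0))[0]
    if client_host in ("127.0.0.1", "::1"):
        return {"name": "local-user", "type": "human"}
    return {"name": "anonymous", "type": "unknown"}

tokens = {"tok-claude": {"name": "claude-agent", "type": "ai"}}
scope = {"headers": [(b"authorization", b"Bearer tok-claude")],
         "client": ("10.0.0.5", 50000)}
print(identify(scope, tokens))  # {'name': 'claude-agent', 'type': 'ai'}
```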
Endpoints:
| Endpoint | Method | Purpose |
|---|---|---|
| /api/auth/token | POST | Create a new token (requires existing valid token) |
| /api/auth/whoami | GET | Return the current request's identity |
| /api/auth/token/{token} | DELETE | Revoke a token |
Token creation requires authentication -- the admin token bootstraps the first session, and new tokens can only be created by authenticated callers. This prevents unauthenticated users from minting their own identities.
---
18. DispatchContext
18.1 Purpose
Every command dispatch carries a DispatchContext -- an explicit context object that threads identity and correlation metadata through the entire dispatch chain. It replaces the thread-local set_dispatch_source / get_dispatch_source mechanism that was used initially.
Thread-locals are problematic in async Python: coroutines sharing an event loop can observe each other's thread-local state, and context propagation across await boundaries is fragile. An explicit context object passed as a function argument is unambiguous -- there is no question about which identity applies to which dispatch, even when multiple dispatches are in flight concurrently.
18.2 Structure
| Field | Type | Purpose |
|---|---|---|
| identity | dict | The authenticated caller ({"name": "claude-agent", "type": "ai"}) |
| dispatch_id | int | Monotonic ID for correlating dispatch with SSE events and acknowledgments |
| batch_id | str or None | Shared ID across all commands in a batch dispatch (the "changeset" ID) |
| timestamp | datetime | When the dispatch was initiated |
Derived properties: source returns the identity type (ai, human, system, unknown), and author returns the identity name (used as git commit author).
A singleton SYSTEM_CTX exists for non-request-driven operations (file watcher triggers, startup initialization) where there is no HTTP request to extract identity from.
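The structure above can be sketched as a frozen dataclass. The field names, derived properties, and SYSTEM_CTX come from this section; the defaults and exact types are illustrative:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class DispatchContext:
    """Explicit context threaded through every dispatch (section 18.2)."""
    identity: dict
    dispatch_id: int
    batch_id: Optional[str] = None
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def source(self) -> str:
        """Identity type: ai, human, system, or unknown."""
        return self.identity.get("type", "unknown")

    @property
    def author(self) -> str:
        """Identity name, used as the git commit author."""
        return self.identity.get("name", "unknown")

# Singleton for non-request-driven operations (file watcher, startup).
SYSTEM_CTX = DispatchContext(identity={"name": "system", "type": "system"},
                             dispatch_id=0)

ctx = DispatchContext(identity={"name": "claude-agent", "type": "ai"},
                      dispatch_id=42, batch_id="a1b2c3d4")
print(ctx.source, ctx.author)  # ai claude-agent
```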
18.3 Handler Contract
Handlers receive the context as their third argument:
(state: dict, params: dict, ctx: DispatchContext) -> CommandResult
This makes identity available to any handler that needs it (e.g., for audit logging or conditional behavior based on whether the caller is a human or AI) without polluting the params dict.
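Because Starlark is a dialect of Python, a handler reads as a plain Python function. A hypothetical Graphist handler following this contract (the plain-dict return stands in for CommandResult, whose shape this section does not specify; the Ctx stand-in below exists only so the sketch is runnable outside the framework):

```python
# A hypothetical add_node handler as it would appear in handlers.star.
def add_node(state, params, ctx):
    nodes = dict(state.get("nodes", {}))
    nodes[params["id"]] = {
        "x": params["x"],
        "y": params["y"],
        "created_by": ctx.author,  # identity from DispatchContext
    }
    return {**state, "nodes": nodes}

class Ctx:  # illustrative stand-in for DispatchContext (Python-only)
    author = "claude-agent"

new_state = add_node({"nodes": {}}, {"id": "abc123", "x": 100, "y": 200}, Ctx())
print(new_state["nodes"]["abc123"]["created_by"])  # claude-agent
```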
---
19. Git Trailers
19.1 Purpose
Every auto-commit encodes structured metadata as git trailers in the commit message body. Trailers are a standard git convention (key-value pairs at the end of the commit message) that can be parsed programmatically with git log --format='%(trailers)'.
19.2 Trailer Fields
| Trailer | Example | Rationale |
|---|---|---|
| Source | Source: ai | Who initiated the change -- ai, human, system, or unknown. Enables filtering commit history by actor type (e.g., "show me all AI-generated changes"). |
| Command | Command: add_node | Which command produced the change. Enables tracing what action caused a specific file mutation, even months later. |
| Changeset | Changeset: a1b2c3d4 | Correlates commits that belong to the same batch dispatch. When an AI dispatches 5 commands in a batch, all 5 commits share the same Changeset ID, making it possible to revert or review the entire batch as a unit. |
19.3 Commit Format
cmd: add_node {'x': 100, 'y': 200}
Source: ai
Command: add_node
Changeset: a1b2c3d4
The commit author is set from DispatchContext.author (e.g., claude-agent <claude-agent@productengine>), providing a second axis of attribution alongside the Source trailer.
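Assembling this format can be sketched as follows (function name hypothetical; subject line and trailer fields match the format shown above):

```python
def commit_message(command, params, source, changeset=None):
    """Build an auto-commit message: subject line, blank line,
    then git trailers (parseable via git log --format='%(trailers)')."""
    lines = [f"cmd: {command} {params}", ""]
    lines.append(f"Source: {source}")
    lines.append(f"Command: {command}")
    if changeset:
        lines.append(f"Changeset: {changeset}")
    return "\n".join(lines)

print(commit_message("add_node", {"x": 100, "y": 200}, "ai", "a1b2c3d4"))
# cmd: add_node {'x': 100, 'y': 200}
#
# Source: ai
# Command: add_node
# Changeset: a1b2c3d4
```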
---
20. Starlark as the Canonical Handler Language
20.1 The Decision
Starlark is the only implemented handler language (see section 12). It is the canonical language that AI agents generate when creating or modifying plugins at runtime.
20.2 Why Starlark
Python syntax. Starlark is a dialect of Python. An AI that knows Python already knows Starlark. There is no new syntax to learn, no new idioms to generate. This eliminates an entire class of AI generation errors -- the AI does not need to context-switch between languages.
Hermetic sandbox by design. Starlark has no I/O, no imports beyond what the framework explicitly provides, no network access, no filesystem access, no side effects. A malicious or buggy Starlark handler cannot read files, open sockets, or modify global state. This is not a sandbox bolted on after the fact -- Starlark was designed for this from the ground up (it was created by Google for the Bazel build system, where untrusted BUILD files must be safe to evaluate).
Trivial hot-reload. Starlark evaluation is stateless -- there is no module-level state to preserve across reloads. The framework can unload and reload a handler file by simply re-evaluating it. Python handlers require careful module unloading (importlib.reload, clearing sys.modules entries) and can hold module-level state that is lost on reload.
Starlark-in-payload. The AI can send Starlark source code in a dispatch payload, and the framework writes it to disk, compiles it, and registers it as a command -- all within a single dispatch. This enables runtime command creation without pre-existing handler files. The safety guarantee makes this viable: no amount of Starlark code can escape the sandbox.
20.3 What Starlark Cannot Do
Starlark handlers cannot perform I/O, call external services, or access the filesystem. When Python handler support is added in the future, plugins that need I/O capabilities will be able to use Python handlers. The framework's handler contract ((state, params, ctx) -> CommandResult) will be the same regardless of language.