python-sdk/CLAUDE.md at b2ec7f9ab31c8c7c4a794a838092b6ef51668974

pptx704 9edde7bff5 feat(code_runner): rename module, fix __del__ + kernel name, expand tests

- Rename `wrenn.code_interpreter` → `wrenn.code_runner` (canonical).
  Keep old path as deprecation alias that emits a FutureWarning on
  import, mirroring the existing `Sandbox` → `Capsule` pattern.
  Submodule shims `code_interpreter/{capsule,async_capsule,models}.py`
  keep direct-submodule imports working.

- Fix sync/async ctor-failure-safe `__del__`: initialise `_kernel_id`,
  `_kernel_name`, `_proxy_client` before calling `super().__init__` so
  a failed creation no longer crashes the destructor with
  AttributeError.

- Send the kernel name to Jupyter. Previously `POST /api/kernels` had
  no body, so the server picked an arbitrary default kernelspec. Now
  sends `{"name": "wrenn"}` (override via `Capsule(kernel=...)`) and
  reuses an existing kernel only when its `name` matches.

- Preserve Jupyter `text/plain` verbatim in `Result.from_bundle`.
  The previous outer-quote strip was lossy (the string `'2'` became
  indistinguishable from the int `2`, and strings containing escaped
  quotes were mangled). `text` is now the `repr()` Jupyter sends.
  Updated the stale `test_capsule_features` quote-strip test.

- Validate `run_code(language=...)`. Anything other than `"python"`
  now raises `ValueError` instead of being silently ignored.

- Async `__del__` no longer touches the event loop; users must call
  `await close()` or use `async with`.

- New unit suite `tests/test_code_runner_unit.py` (46 tests): MIME
  unpacking, deprecation alias + warning, default template + kernel,
  custom kernel override, ctor-failure-safe __del__, kernel
  create/reuse/cache, retry on 5xx, 4xx propagation, request shape,
  run_code stream/result/error/foreign-parent/idle/unsupported-language,
  async variants.

- New e2e suite `tests/test_code_runner_e2e.py` (44 tests, integration
  marker): template == `code-runner-beta`, kernel == `wrenn`,
  stdout/stderr capture, state/import/function/class persistence, exceptions
  (Value/Name/Syntax), callbacks, multi-line, `text` repr preservation,
  filesystem round-trip, isolation between capsules, deprecated import
  path. MIME-type class covers html, markdown, json, latex, svg,
  javascript, png (matplotlib + seaborn), jpeg, multi-format bundles,
  and text-round-trip via numpy + requests.

- `make test-code-runner` runs unit + e2e together. `make test`
  extended to include the unit file.

- README: "Code Interpreter" section renamed to "Code Runner", all
  imports updated, `kernel=` documented, removed the incorrect
  "quotes stripped automatically" claim, replaced with the actual
  `text/plain` semantics.

- CLAUDE.md: appended a "Code Runner Module" section covering module
  path, defaults, kernel-reuse semantics, lifecycle invariant, and
  the new test files + make target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 04:29:31 +06:00

Design Context

Users

Developers across the full spectrum — solo engineers building side projects, startup teams integrating sandboxed execution into products, and platform/infra engineers at larger organizations running production workloads on Firecracker microVMs. They arrive with context: they know what a process is, what a rootfs is, what a TTY means. The interface must feel at home for all three: approachable enough not to intimidate a hacker, precise enough to earn the trust of a production ops team. Never condescend, never oversimplify. Trust the user to understand what they're looking at.

Primary job to be done: Understand what's running, act on it confidently, and get back to code.

Brand Personality

Precise. Warm. Uncompromising.

Wrenn is an engineer's favorite tool — built with visible care, not assembled from defaults. It runs real infrastructure (Firecracker microVMs), so the UI should reflect that seriousness without becoming cold or corporate. The warmth comes from the typography and color palette; the precision comes from hierarchy, density, and data fidelity.

Emotional goal: in control. Users leave a session with full confidence in what's running, what happened, and what comes next. Nothing is hidden, nothing is ambiguous.

Aesthetic Direction

Dark-only (permanently), industrial-warm, data-forward.

No light mode planned. All design decisions should optimize for dark. The near-black-green background palette (#0a0c0b through #2a302d) reads as "black with intention" — not pitch black (cold) and not charcoal (dated). The sage green accent (#5e8c58) is muted and organic, a meaningful departure from the startup-green neon that saturates the developer tool space.

Anti-references:

Supabase: avoid the friendly, approachable startup-green energy — too generic, too eager to please
AWS / GCP consoles: avoid utility-first density without craft — functional but joyless, visually dated

References that capture the right spirit:

The precision of a well-calibrated instrument
Editorial typography from technical publications
The quiet confidence of tools that don't need to explain themselves

Type System

Four fonts with strict roles — this is the design system's strongest personality trait and must be respected:

Font	CSS Class	Role	When to use
Manrope (variable, sans)	`font-sans`	UI workhorse	All body copy, nav, labels, buttons, form text
Instrument Serif	`font-serif`	Display / editorial	Page titles (h1), dialog headings, metric values, hero moments
JetBrains Mono (variable)	`font-mono`	Data / code	IDs, timestamps, key prefixes, file paths, terminal output, metrics
Alice	brand wordmark only	Brand wordmark	"Wrenn" in sidebar and login only — nowhere else

Instrument Serif at scale creates the signature editorial moments. Mono provides the precision signal for technical data. Never swap these roles.

Tracking overrides (app.css):

.font-serif — letter-spacing: 0.015em (positive tracking; Instrument Serif reads less condensed at display sizes)
.font-mono — font-variant-numeric: tabular-nums (numbers align in tables and metric displays)

Type scale (root: 87.5% = 14px base):

Token	Value	Use
`--text-display`	2.571rem (~36px)	Auth section headings
`--text-page`	2rem (~28px)	Page h1 titles
`--text-heading`	1.429rem (~20px)	Dialog headings, empty states
`--text-body`	1rem (~14px)	Primary body, buttons, inputs
`--text-ui`	0.929rem (~13px)	Nav labels, table cells
`--text-meta`	0.857rem (~12px)	Key prefixes, minor info
`--text-label`	0.786rem (~11px)	Uppercase section labels
`--text-badge`	0.714rem (~10px)	Live badges, tiny indicators
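
The pixel values in the table follow from the rem values and the 14px root. A minimal Python check of that arithmetic, assuming the 87.5% root is taken against a 16px browser default (the default is an assumption, not stated in the source):

```python
# Root font size: 87.5% of an assumed 16px browser default.
ROOT_PX = 16 * 0.875  # 14.0

# rem values copied from the type scale table above.
TYPE_SCALE_REM = {
    "--text-display": 2.571,
    "--text-page": 2.0,
    "--text-heading": 1.429,
    "--text-body": 1.0,
    "--text-ui": 0.929,
    "--text-meta": 0.857,
    "--text-label": 0.786,
    "--text-badge": 0.714,
}


def rem_to_px(rem: float, root_px: float = ROOT_PX) -> int:
    """Convert a rem value to the nearest whole pixel."""
    return round(rem * root_px)


TYPE_SCALE_PX = {token: rem_to_px(rem) for token, rem in TYPE_SCALE_REM.items()}
```

Each rem value is a whole-pixel target divided by 14 and rounded to three decimals, which is why the table lists the pixel sizes as approximate.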

Color System

All values are CSS custom properties in frontend/src/app.css.

Backgrounds (6-step near-black-green scale):

Token	Value	Use
`--color-bg-0`	`#0a0c0b`	Page base, sidebar deepest layer
`--color-bg-1`	`#0f1211`	Sidebar surface
`--color-bg-2`	`#141817`	Card backgrounds
`--color-bg-3`	`#1a1e1c`	Table headers, elevated surfaces
`--color-bg-4`	`#212624`	Hover states, inputs
`--color-bg-5`	`#2a302d`	Highlighted items, selected rows

Text (5-level hierarchy):

Token	Value	Use
`--color-text-bright`	`#eae7e2`	H1s, dialog headings
`--color-text-primary`	`#d0cdc6`	Body copy, primary labels
`--color-text-secondary`	`#9b9790`	Secondary labels, descriptions
`--color-text-tertiary`	`#6b6862`	Hints, placeholders
`--color-text-muted`	`#454340`	Dividers as text, ultra-subtle

Accent (sage green — use sparingly, must feel earned):

Token	Value	Use
`--color-accent`	`#5e8c58`	Primary CTA, live indicators, focus rings, active nav
`--color-accent-mid`	`#89a785`	Hover accent text
`--color-accent-bright`	`#a4c89f`	Accent on dark backgrounds
`--color-accent-glow`	`rgba(94,140,88,0.07)`	Subtle tinted backgrounds
`--color-accent-glow-mid`	`rgba(94,140,88,0.14)`	Hover tint on accent items
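
The two glow tokens are the accent color `#5e8c58` expressed as RGB channels with a low alpha. A small Python sketch of that relationship; the helper names `hex_to_rgb` and `glow` are hypothetical and exist only for illustration:

```python
def hex_to_rgb(hex_color: str) -> tuple:
    """Split a #rrggbb string into integer (r, g, b) channels."""
    value = hex_color.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def glow(hex_color: str, alpha: float) -> str:
    """Format a hex color as an rgba() string at the given alpha."""
    r, g, b = hex_to_rgb(hex_color)
    return f"rgba({r},{g},{b},{alpha})"


ACCENT = "#5e8c58"
ACCENT_GLOW = glow(ACCENT, 0.07)      # matches --color-accent-glow
ACCENT_GLOW_MID = glow(ACCENT, 0.14)  # matches --color-accent-glow-mid
```

If the accent hex ever changes, the glow tokens need the same channel values to stay visually tied to it.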

Status semantics:

Token	Value	Use
`--color-amber`	`#d4a73c`	Warning, paused state
`--color-red`	`#cf8172`	Error, destructive actions
`--color-blue`	`#5a9fd4`	Info, neutral system states

Borders: --color-border (#1f2321) default; --color-border-mid (#2a2f2c) for inputs/hover.

Component Patterns

Buttons:

Primary: solid sage green (--color-accent), hover brightness boost + micro-lift (-translate-y-px)
Secondary: bordered (--color-border-mid), text transitions to accent on hover
Danger: red text + subtle red background on hover
All: transition-all duration-150

Inputs:

Border --color-border, background --color-bg-2; focus transitions border and icon to accent
Group focus pattern: group wrapper + group-focus-within:text-[var(--color-accent)] on icon

Tables / data lists:

Grid layout; header bg-3 + uppercase --text-label; row hover hover:bg-[var(--color-bg-3)]
Status stripe: left border color matches sandbox state

Status indicators: Running = animated ping + sage green dot; Paused = amber dot; Stopped = muted gray. Color is never the sole differentiator.

Modals & dialogs: Border + shadow only — no accent gradient bars/strips. fadeUp 0.35s entrance.

Empty states: Large icon with glow, Instrument Serif heading, secondary body text, CTA below, iconFloat 4s animation.

Animations (always respect prefers-reduced-motion): fadeUp (entrance), status-ping (live indicator), iconFloat (empty states), spin-once (refresh), staggered animation-delay on lists.

Design Principles

Precision over friendliness. Every element earns its place. Wrenn doesn't need to tell you it's developer-friendly — that should be self-evident from the quality of the information architecture.
Density with breathing room. Data-forward doesn't mean cramped. Strategic whitespace creates calm hierarchy within dense contexts. Sections breathe; rows don't waste space.
Industrial warmth. The serif + mono + warm-black combination prevents sterility. This is a forge, not a gallery. The warmth is in the details, not the primary colors.
Legible at speed. Users scan dashboards in seconds. Strong typographic contrast (serif h1, mono IDs, sans body), consistent patterns, and predictable placement let users orientate instantly without reading everything.
Craft signals trust. For infrastructure that runs production code, the quality of the UI is a proxy for the quality of the product. Pixel-level decisions matter. Polish is not decoration — it's a trust signal.

MCP Tools: code-review-graph

IMPORTANT: This project has a knowledge graph. ALWAYS use the code-review-graph MCP tools BEFORE using Grep/Glob/Read to explore the codebase. The graph is faster, cheaper (fewer tokens), and gives you structural context (callers, dependents, test coverage) that file scanning cannot.

When to use graph tools FIRST

Exploring code: semantic_search_nodes or query_graph instead of Grep
Understanding impact: get_impact_radius instead of manually tracing imports
Code review: detect_changes + get_review_context instead of reading entire files
Finding relationships: query_graph with callers_of/callees_of/imports_of/tests_for
Architecture questions: get_architecture_overview + list_communities

Fall back to Grep/Glob/Read only when the graph doesn't cover what you need.

Key Tools

Tool	Use when
`detect_changes`	Reviewing code changes — gives risk-scored analysis
`get_review_context`	Need source snippets for review — token-efficient
`get_impact_radius`	Understanding blast radius of a change
`get_affected_flows`	Finding which execution paths are impacted
`query_graph`	Tracing callers, callees, imports, tests, dependencies
`semantic_search_nodes`	Finding functions/classes by name or keyword
`get_architecture_overview`	Understanding high-level codebase structure
`refactor_tool`	Planning renames, finding dead code

Workflow

The graph auto-updates on file changes (via hooks).
Use detect_changes for code review.
Use get_affected_flows to understand impact.
Use query_graph pattern="tests_for" to check coverage.

Code Runner Module

wrenn.code_runner — stateful code execution capsule via persistent Jupyter kernel.

Module path: wrenn.code_runner (canonical). The old path wrenn.code_interpreter is a deprecation alias that emits a FutureWarning on import; do not introduce new uses.
Defaults: template code-runner-beta, kernelspec wrenn. Both overridable via Capsule(template=..., kernel=...).
Kernel reuse: _ensure_kernel lists /api/kernels, reuses the first kernel whose name matches the configured kernelspec, else POSTs {"name": <kernel>} to create one. Matching by name (not just "any kernel") is intentional — multiple kernelspecs may coexist on the same Jupyter.
Lifecycle invariant: the constructor sets _kernel_id, _kernel_name, _proxy_client to safe defaults before calling super().__init__. __del__ must never assume construction completed. Async __del__ only drops the reference — the proxy httpx.AsyncClient must be closed via await close() or async with.
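
The lifecycle invariant can be illustrated with a minimal synchronous sketch. `BaseCapsule` and `RunnerCapsule` are hypothetical stand-ins, not the SDK's actual classes; only the attribute names `_kernel_id`, `_kernel_name`, and `_proxy_client` come from the text above.

```python
class BaseCapsule:
    """Hypothetical stand-in for the SDK base class, whose creation may fail."""

    def __init__(self, fail: bool = False) -> None:
        if fail:
            raise RuntimeError("capsule creation failed")


class RunnerCapsule(BaseCapsule):
    """Hypothetical subclass that follows the lifecycle invariant."""

    def __init__(self, kernel: str = "wrenn", fail: bool = False) -> None:
        # Safe defaults come first, so __del__ can run even if the
        # base constructor raises on the next line.
        self._kernel_id = None
        self._kernel_name = kernel
        self._proxy_client = None
        super().__init__(fail=fail)

    def close(self) -> None:
        """Release the proxy client if one was ever created."""
        client = self._proxy_client
        if client is not None:
            client.close()
        self._proxy_client = None
        self._kernel_id = None

    def __del__(self) -> None:
        # Only touches attributes that __init__ set before anything could fail.
        self.close()
```

If the three assignments came after `super().__init__`, a failed creation would leave the object without those attributes and `__del__` would hit an `AttributeError`. An async variant would, per the invariant above, only drop the reference in `__del__` and leave closing to `await close()` or `async with`.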

Tests

tests/test_code_runner_unit.py: pure unit tests (respx + mocked WebSocket). Covers Result.from_bundle, MIME unpacking, verbatim text/plain preservation (no quote-stripping), Execution.text, kernel reuse vs create, retry on 5xx, 4xx propagation, ctor-failure-safe __del__, deprecation alias.
tests/test_code_runner_e2e.py: live integration tests (marked integration, skipped without WRENN_API_KEY). Covers stateful execution, exceptions, callbacks, rich outputs (HTML, matplotlib, pandas), async variant, isolation between capsules, and the deprecated code_interpreter import path.
Run both: make test-code-runner.
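
The kernel reuse rule from the Code Runner Module section (reuse the first kernel whose name matches, otherwise create one) can be sketched as pure functions, which is also a convenient shape for unit testing. This is an illustration under stated assumptions, not the SDK's `_ensure_kernel`: `list_kernels` and `create_kernel` are hypothetical callables standing in for `GET /api/kernels` and `POST /api/kernels`, and kernel records are assumed to carry `id` and `name` keys as in Jupyter's kernel model.

```python
from typing import Any, Callable, Iterable, Mapping, Optional


def find_kernel(kernels: Iterable[Mapping[str, Any]], name: str) -> Optional[str]:
    """Return the id of the first kernel whose name matches, else None."""
    for kernel in kernels:
        if kernel.get("name") == name:
            return kernel["id"]
    return None


def ensure_kernel(
    list_kernels: Callable[[], Iterable[Mapping[str, Any]]],
    create_kernel: Callable[[Mapping[str, str]], Mapping[str, Any]],
    name: str = "wrenn",
) -> str:
    """Reuse a kernel with a matching name, or create one with that name."""
    existing = find_kernel(list_kernels(), name)
    if existing is not None:
        return existing
    # The request body names the kernelspec so the server does not pick a default.
    return create_kernel({"name": name})["id"]


# Usage with in-memory fakes instead of HTTP calls.
created = []


def fake_create(body):
    created.append(dict(body))
    return {"id": "new-kernel", "name": body["name"]}


running = [{"id": "k1", "name": "python3"}, {"id": "k2", "name": "wrenn"}]
reused_id = ensure_kernel(lambda: running, fake_create)
fresh_id = ensure_kernel(lambda: [{"id": "k1", "name": "python3"}], fake_create)
```

In the first call the `python3` kernel is skipped and `k2` is reused without any create request. In the second call no kernel matches, so exactly one create request is made with `{"name": "wrenn"}`.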