1
0
forked from wrenn/wrenn
Commit Graph

23 Commits

Author SHA1 Message Date
8f8638e6db Bump version to 0.1.2 2026-04-18 14:47:25 +06:00
92aab09104 Add daily usage metrics (CPU-minutes, RAM GB-minutes)
Introduce pre-computed daily usage rollups from sandbox_metrics_snapshots.
An hourly background worker aggregates completed days, while today's
usage is computed live from snapshots at query time for freshness.

Backend: new daily_usage table, rollup worker, UsageService, and
GET /v1/capsules/usage endpoint with date range filtering (up to 92 days).

Frontend: replace Usage page placeholder with bar charts (Chart.js),
summary total cards, and preset/custom date range controls.
2026-04-18 14:29:09 +06:00
a3f75300a9 Add email activation flow and replace is_active with status column
Email signup now creates inactive users who must activate via a 30-minute
email token before signing in. Team creation is deferred to first login
after activation, while OAuth users continue to get teams immediately.

- Replace boolean is_active with status column (inactive/active/disabled/deleted)
- Add POST /v1/auth/activate endpoint with Redis-backed token consumption
- Signup returns message instead of JWT, sends activation email
- Login differentiates error messages by user status
- Add confirm password field to signup form
- Add /activate frontend page that auto-logs in on success
- Handle inactive user cleanup on re-signup (30-min cooldown) and OAuth collision
2026-04-16 04:05:41 +06:00
f69fa8cded Add /v1/me account management endpoints
Adds self-service endpoints: GET/PATCH/DELETE /v1/me, POST /v1/me/password,
POST /v1/me/password/reset{/confirm}, GET/DELETE /v1/me/providers/{provider}.
Includes OAuth account-linking flow via cookie, hard-delete cleanup goroutine
(24h ticker, 15-day grace period), and OpenAPI spec for all new routes.
2026-04-16 03:24:55 +06:00
5f877afb9e Remove PTY inactivity timeout to keep terminal sessions alive indefinitely
Sessions now only end on process exit or explicit kill, not idle time.
The keepalive ping every 30s remains to prevent network-level disconnects.
2026-04-15 18:31:48 +06:00
516890c49a Add background process execution API
Start long-running processes (web servers, daemons) without blocking the
HTTP request. Leverages envd's existing background process support
(context.Background(), List, Connect, SendSignal RPCs) and wires it
through the host agent and control plane layers.

New API surface:
- POST /v1/capsules/{id}/exec with background:true → 202 {pid, tag}
- GET /v1/capsules/{id}/processes → list running processes
- DELETE /v1/capsules/{id}/processes/{selector} → kill by PID or tag
- WS /v1/capsules/{id}/processes/{selector}/stream → reconnect to output

The {selector} param auto-detects: numeric = PID, string = tag.
Tags are auto-generated as "proc-" + 8 hex chars if not provided.
2026-04-14 03:57:01 +06:00
565817273d Rename API routes /v1/sandboxes → /v1/capsules 2026-04-12 21:51:04 +06:00
ab3fc4a807 Add interactive PTY terminal sessions for sandboxes
Wire envd's existing PTY process capabilities through the full stack:
hostagent proto (4 new RPCs: PtyAttach, PtySendInput, PtyResize, PtyKill),
envdclient, sandbox manager, and a new WebSocket endpoint at
GET /v1/sandboxes/{id}/pty with bidirectional JSON message protocol.

Sessions use tag-based identity for disconnect/reconnect support,
base64-encoded PTY data for binary safety, and a 120s inactivity timeout.
2026-04-11 02:42:59 +06:00
c9283cac70 Add filesystem operations (list, mkdir, remove) across full stack
Plumb ListDir, MakeDir, and RemovePath through all layers:
REST API → host agent RPC → envdclient → envd. These endpoints
enable a web file browser for sandbox filesystem interaction.

New endpoints (all under requireAPIKeyOrJWT):
- POST /v1/sandboxes/{id}/files/list
- POST /v1/sandboxes/{id}/files/mkdir
- POST /v1/sandboxes/{id}/files/remove
2026-04-10 18:05:13 +06:00
84dd15d22b feat: add notification channels with provider integrations and retry
Implement a channels system for notifying teams via external providers
(Discord, Slack, Teams, Google Chat, Telegram, Matrix, webhook) when
lifecycle events occur (capsule/template/host state changes).

- Channel CRUD API under /v1/channels (JWT-only auth)
- Test endpoint to verify config before saving (POST /v1/channels/test)
- Secret rotation endpoint (PUT /v1/channels/{id}/config)
- AES-256-GCM encryption for provider secrets (WRENN_ENCRYPTION_KEY)
- Redis stream event publishing from audit logger
- Background dispatcher with consumer group and retry (10s, 30s)
- Webhook delivery with HMAC-SHA256 signing (X-WRENN-SIGNATURE)
- shoutrrr integration for chat providers
- Secrets never exposed in API responses
2026-04-09 17:06:06 +06:00
49b0b646a8 Add 5m, 1h, 6h, 12h range filters to metrics endpoint
Maps each user-facing range to the appropriate underlying ring buffer
tier and applies a time cutoff filter. No new ring buffers needed —
5m/10m read from the 10m tier, 1h/2h from the 2h tier, 6h/12h/24h
from the 24h tier.
2026-03-25 20:44:28 +06:00
9acdbb5ae9 Add per-sandbox CPU/memory/disk metrics collection
Samples /proc/{fc_pid}/stat (CPU%), /proc/{fc_pid}/status (VmRSS), and
stat() on CoW files at 500ms intervals per running sandbox. Three tiered
ring buffers downsample into 30s and 5min averages for 10min/2h/24h
retention. Metrics are flushed to DB on pause (all tiers) and destroy
(24h only). New GetSandboxMetrics and FlushSandboxMetrics RPCs on the
host agent, proxied through GET /v1/sandboxes/{id}/metrics?range= on
the control plane. Returns live data for running sandboxes, DB data for
paused, and 404 for stopped.
2026-03-25 20:10:33 +06:00
fee66bda50 Add live stats page with metrics sampling and route split
- New sandbox_metrics_snapshots table sampled every 10s (60-day retention)
- Background MetricsSampler goroutine wired into control plane startup
- GET /v1/sandboxes/stats?range=5m|1h|6h|24h|30d endpoint with adaptive
  polling intervals; reserved CPU/RAM uses ceil(paused/2) formula
- StatsPanel component: 4 stat cards + 2 Chart.js line charts (straight
  lines, integer y-axis for running count, dual-axis for CPU/RAM)
- Range filter persisted in URL query param; polls update data silently
  (no blink — loading state only shown on initial mount)
- Split /dashboard/capsules into /list and /stats sub-routes with shared
  layout; capsuleRunningCount store syncs badge across routes
- CreateCapsuleDialog extracted as reusable component
2026-03-25 14:41:05 +06:00
9bf67aa7f7 Implement host registration, JWT refresh tokens, and multi-host scheduling
Replaces the hardcoded CP_HOST_AGENT_ADDR single-agent setup with a
DB-driven registration system supporting multiple host agents (BYOC).

Key changes:
- Host agents register via one-time token, receive a 7-day JWT + 60-day
  refresh token; heartbeat loop auto-refreshes on 401/403 and pauses all
  sandboxes if refresh fails
- HostClientPool: lazy Connect RPC client cache keyed by host ID, replacing
  the single static agent client throughout the API and service layers
- RoundRobinScheduler: picks an online host for each new sandbox via
  ListActiveHosts; extensible for future scheduling strategies
- HostMonitor (replaces Reconciler): passive heartbeat staleness check marks
  hosts unreachable and sandboxes missing after 90s; active reconciliation
  per online host restores missing-but-alive sandboxes and stops orphans
- Graceful host delete: returns 409 with affected sandbox list without
  ?force=true; force-delete destroys sandboxes then evicts pool client
- Snapshot delete broadcasts to all online hosts (templates have no host_id)
- sandbox.Manager.PauseAll: pauses all running VMs on CP connectivity loss
- New migration: host_refresh_tokens table with token rotation (issue-then-
  revoke ordering to prevent lockout on mid-rotation crash)
- New sandbox status 'missing' (reversible, unlike 'stopped') and host
  status 'unreachable'; both reflected in OpenAPI spec
- Fix: refresh token auth failure now returns 401 (was 400 via generic
  'invalid' substring match in serviceErrToHTTP)
2026-03-24 18:32:05 +06:00
3932bc056e Add user names, team-scoped sandbox guard, and login robustness fixes
- Add name column to users (migration + sqlc regen); propagate through JWT
  claims, auth context, all auth/OAuth handlers, service layer, and frontend
- Sidebar and team page show name instead of email; team page splits Name/Email
  into separate columns
- Block sandbox creation in UI and API when user has no active team context
- loginTeam helper falls back to first active team when no default is set,
  fixing login for invited users with no is_default membership
- Exclude soft-deleted teams from GetDefaultTeamForUser, GetBYOCTeams queries
- Guard host creation against soft-deleted teams in service/host.go
- SwitchTeam re-fetches name from DB instead of trusting stale JWT claim
- Reset teams store on login so stale data from a previous session never persists
- Update openapi.yaml: add name to SignupRequest and AuthResponse schemas
2026-03-24 16:56:10 +06:00
8e5d426638 Add team management endpoints
- Three-role model (owner/admin/member) with owner protection invariants
- Team CRUD: create, rename (admin+), soft-delete with VM cleanup (owner only)
- Member management: add by email, remove, role updates (admin+), leave
- Switch-team endpoint re-issues JWT after DB membership verification
- User email prefix search for add-member UI autocomplete
- JWT carries role as a hint; all authorization decisions verified from DB
- Team slug: immutable 12-char hex (e.g. a1b2c3-d1e2f3), reserved on soft-delete
- Migration adds slug + deleted_at to teams; backfills existing rows
2026-03-24 13:29:54 +06:00
2c66959b92 Add host registration, heartbeat, and multi-host management
Implements the full host ↔ control plane connection flow:

- Host CRUD endpoints (POST/GET/DELETE /v1/hosts) with role-based access:
  regular hosts admin-only, BYOC hosts for admins and team owners
- One-time registration token flow: admin creates host → gets token (1hr TTL
  in Redis + Postgres audit trail) → host agent registers with specs → gets
  long-lived JWT (1yr)
- Host agent registration client with automatic spec detection (arch, CPU,
  memory, disk) and token persistence to disk
- Periodic heartbeat (30s) via POST /v1/hosts/{id}/heartbeat with X-Host-Token
  auth and host ID cross-check
- Token regeneration endpoint (POST /v1/hosts/{id}/token) for retry after
  failed registration
- Tag management (add/remove/list) with team-scoped access control
- Host JWT with typ:"host" claim, cross-use prevention in both VerifyJWT and
  VerifyHostJWT
- requireHostToken middleware for host agent authentication
- DB-level race protection: RegisterHost uses AND status='pending' with
  rows-affected check; Redis GetDel for atomic token consume
- Migration for future mTLS support (cert_fingerprint, mtls_enabled columns)
- Host agent flags: --register (one-time token), --address (required ip:port)
- serviceErrToHTTP extended with "forbidden" → 403 mapping
- OpenAPI spec, .env.example, and README updated
2026-03-17 05:51:28 +06:00
931b7d54b3 Add GitHub OAuth login with provider registry
Implement OAuth 2.0 login via GitHub as an alternative to email/password.
Uses a provider registry pattern (internal/auth/oauth/) so adding Google
or other providers later requires only a new Provider implementation.

Flow: GET /v1/auth/oauth/github redirects to GitHub, callback exchanges
the code for a user profile, upserts the user + team atomically, and
redirects to the frontend with a JWT token.

Key changes:
- Migration: make password_hash nullable, add oauth_providers table
- Provider registry with GitHubProvider (profile + email fallback)
- CSRF state cookie with HMAC-SHA256 validation
- Race-safe registration (23505 collision retries as login)
- Startup validation: CP_PUBLIC_URL required when OAuth is configured

Not fully tested — needs integration tests with a real GitHub OAuth app
and end-to-end testing with the frontend callback page.
2026-03-15 06:31:58 +06:00
477d4f8cf6 Add auto-pause TTL and ping endpoint for sandbox inactivity management
Replace the existing auto-destroy TTL behavior with auto-pause: when a
sandbox exceeds its timeout_sec of inactivity, the TTL reaper now pauses
it (snapshot + teardown) instead of destroying it, preserving the ability
to resume later.

Key changes:
- TTL reaper calls Pause instead of Destroy, with fallback to Destroy if
  pause fails (e.g. Firecracker process already gone)
- New PingSandbox RPC resets the in-memory LastActiveAt timer
- New POST /v1/sandboxes/{id}/ping REST endpoint resets both agent memory
  and DB last_active_at
- ListSandboxes RPC now includes auto_paused_sandbox_ids so the reconciler
  can distinguish auto-paused sandboxes from crashed ones in a single call
- Reconciler polls every 5s (was 30s) and marks auto-paused as "paused"
  vs orphaned as "stopped"
- Resume RPC accepts timeout_sec from DB so TTL survives pause/resume cycles
- Reaper checks every 2s (was 10s) and uses a detached context to avoid
  incomplete pauses on app shutdown
- Default timeout_sec changed from 300 to 0 (no auto-pause unless requested)
2026-03-15 05:15:18 +06:00
c92cc29b88 Add authentication, authorization, and team-scoped access control
Implement email/password auth with JWT sessions and API key auth for
sandbox lifecycle. Users get a default team on signup; sandboxes,
snapshots, and API keys are scoped to teams.

- Add user, team, users_teams, and team_api_keys tables (goose migrations)
- Add JWT middleware (Bearer token) for user management endpoints
- Add API key middleware (X-API-Key header, SHA-256 hashed) for sandbox ops
- Add signup/login handlers with transactional user+team creation
- Add API key CRUD endpoints (create/list/delete)
- Replace owner_id with team_id on sandboxes and templates
- Update all handlers to use team-scoped queries
- Add godotenv for .env file loading
- Update OpenAPI spec and test UI with auth flows
2026-03-14 03:57:06 +06:00
a1bd439c75 Add sandbox snapshot and restore with UFFD lazy memory loading
Implement full snapshot lifecycle: pause (snapshot + free resources),
resume (UFFD-based lazy restore), and named snapshot templates that
can spawn new sandboxes from frozen VM state.

Key changes:
- Snapshot header system with generational diff mapping (inspired by e2b)
- UFFD server for lazy page fault handling during snapshot restore
- Stable rootfs symlink path (/tmp/fc-vm/) for snapshot compatibility
- Templates DB table and CRUD API endpoints (POST/GET/DELETE /v1/snapshots)
- CreateSnapshot/DeleteSnapshot RPCs in hostagent proto
- Reconciler excludes paused sandboxes (expected absent from host agent)
- Snapshot templates lock vcpus/memory to baked-in values
- Proper cleanup of uffd sockets and pause snapshot files on destroy
2026-03-12 09:19:37 +06:00
b4d8edb65b Add streaming exec and file transfer endpoints
Add WebSocket-based streaming exec endpoint and streaming file
upload/download endpoints to the control plane API. Includes new
host agent RPC methods (ExecStream, StreamWriteFile, StreamReadFile),
envd client streaming support, and OpenAPI spec updates.
2026-03-11 05:42:42 +06:00
ec3360d9ad Add minimal control plane with REST API, database, and reconciler
- REST API (chi router): sandbox CRUD, exec, pause/resume, file write/read
- PostgreSQL persistence via pgx/v5 + sqlc (sandboxes table with goose migration)
- Connect RPC client to host agent for all VM operations
- Reconciler syncs host agent state with DB every 30s (detects TTL-reaped sandboxes)
- OpenAPI 3.1 spec served at /openapi.yaml, Swagger UI at /docs
- Added WriteFile/ReadFile RPCs to hostagent proto and implementations
- File upload via multipart form, download via JSON body POST
- sandbox_id propagated from control plane to host agent on create
2026-03-10 16:50:12 +06:00