1
0
forked from wrenn/wrenn

7 Commits

Author SHA1 Message Date
92aab09104 Add daily usage metrics (CPU-minutes, RAM GB-minutes)
Introduce pre-computed daily usage rollups from sandbox_metrics_snapshots.
An hourly background worker aggregates completed days, while today's
usage is computed live from snapshots at query time for freshness.

Backend: new daily_usage table, rollup worker, UsageService, and
GET /v1/capsules/usage endpoint with date range filtering (up to 92 days).

Frontend: replace Usage page placeholder with bar charts (Chart.js),
summary total cards, and preset/custom date range controls.
2026-04-18 14:29:09 +06:00
43e838c55c Fix cascading deletion gaps for user and team cleanup
- Add ON DELETE CASCADE to users_teams, oauth_providers, admin_permissions
  and ON DELETE SET NULL (with nullable columns) to team_api_keys.created_by,
  hosts.created_by, host_tokens.created_by so HardDeleteExpiredUsers no longer
  fails with FK violations
- User account deletion now cascades to sole-owned teams via DeleteTeamInternal,
  preventing orphaned teams with live sandboxes after account removal
- ListActiveSandboxesByTeam now includes hibernated sandboxes so their disk
  snapshots are cleaned up during team deletion
- Team soft-delete now hard-deletes sandbox metric points, metric snapshots,
  API keys, and channels to prevent data accumulation on deleted teams
- Extract deleteTeamCore() to deduplicate shared logic across DeleteTeam,
  AdminDeleteTeam, and DeleteTeamInternal
- Fix ListAPIKeysByTeamWithCreator to use LEFT JOIN after created_by became
  nullable, and update handler to read pgtype.Text.String for creator_email
2026-04-16 04:26:48 +06:00
27ff828e60 Push GetSandboxMetricPoints time filter into SQL
The query was fetching all rows for a (sandbox_id, tier) pair and
filtering by timestamp in Go. For repeatedly-paused sandboxes the
24h tier can accumulate up to 30 days of data, causing up to 120x
over-fetching for a 6h range request.

Add AND ts >= $3 to the query so Postgres filters on the primary key
(sandbox_id, tier, ts) directly. Drop the redundant Go-side loop.
2026-03-25 21:53:19 +06:00
9acdbb5ae9 Add per-sandbox CPU/memory/disk metrics collection
Samples /proc/{fc_pid}/stat (CPU%), /proc/{fc_pid}/status (VmRSS), and
stat() on CoW files at 500ms intervals per running sandbox. Three tiered
ring buffers downsample into 30s and 5min averages for 10min/2h/24h
retention. Metrics are flushed to DB on pause (all tiers) and destroy
(24h only). New GetSandboxMetrics and FlushSandboxMetrics RPCs on the
host agent, proxied through GET /v1/sandboxes/{id}/metrics?range= on
the control plane. Returns live data for running sandboxes, DB data for
paused, and 404 for stopped.
2026-03-25 20:10:33 +06:00
e3750f79f9 Fix metrics sampler to record zero-value snapshots when idle
SampleSandboxMetrics previously filtered WHERE status IN ('running',
'starting', 'paused'), which returned no rows when all capsules were
stopped. This caused zero snapshots to be skipped, leaving the
time-series charts with no trailing data points instead of showing
the expected zero values.

Remove the WHERE filter so the query groups by all teams that have
any sandbox row. The per-status FILTER clauses on the aggregates
already produce correct zero counts for stopped capsules.

Also includes the per-VM RAM ceiling formula change (sum(ceil(each/2))
instead of ceil(sum/2)).
2026-03-25 15:50:19 +06:00
47b0ed5b52 Fix metrics correctness, redesign stats page
- Replace stale snapshot read (GetCurrentMetrics) with live query
  (GetLiveMetrics) against sandboxes table — always returns correct
  zeros when no capsules are running
- Fix CPU reserved formula: running + starting only; paused VMs no
  longer contribute vCPUs (RAM reservation for paused unchanged)
- Merge top cards into 3 paired Now/Peak cards with colored accent
  borders (green/blue/amber matching chart colors)
- Move Live badge from Running Capsules card to page-level header
- Add colored category dots to card and chart headers
- Charts stacked vertically, flex-1 to fill remaining page height
- vCPUs chart color changed to blue (#5a9fd4), RAM stays amber
2026-03-25 15:11:46 +06:00
fee66bda50 Add live stats page with metrics sampling and route split
- New sandbox_metrics_snapshots table sampled every 10s (60-day retention)
- Background MetricsSampler goroutine wired into control plane startup
- GET /v1/sandboxes/stats?range=5m|1h|6h|24h|30d endpoint with adaptive
  polling intervals; reserved CPU/RAM uses ceil(paused/2) formula
- StatsPanel component: 4 stat cards + 2 Chart.js line charts (straight
  lines, integer y-axis for running count, dual-axis for CPU/RAM)
- Range filter persisted in URL query param; polls update data silently
  (no blink — loading state only shown on initial mount)
- Split /dashboard/capsules into /list and /stats sub-routes with shared
  layout; capsuleRunningCount store syncs badge across routes
- CreateCapsuleDialog extracted as reusable component
2026-03-25 14:41:05 +06:00