Compare commits: d98acc2764...6014fa72cf (3 commits)

| Author | SHA1 | Date |
|---|---|---|
| | 6014fa72cf | |
| | c164aadfc9 | |
| | a01796a4c3 | |
@@ -97,7 +97,7 @@ Startup (`cmd/control-plane/main.go`) is a thin wrapper: `cpserver.Run(cpserver.

 **Packages:** `internal/hostagent/`, `internal/sandbox/`, `internal/vm/`, `internal/network/`, `internal/devicemapper/`, `internal/envdclient/`, `internal/snapshot/`

-**Production deployment:** `scripts/prepare-wrenn-user.sh` creates the `wrenn` system user, sets Linux capabilities (setcap) on wrenn-agent and all child binaries (iptables, losetup, dmsetup, etc.), installs an apt hook to restore capabilities after package updates, configures udev rules for `/dev/net/tun`, loads required kernel modules, and writes systemd unit files for both services. No sudo grants — all privilege is via capabilities.
+**Production deployment:** `make setup-host` (→ `scripts/setup-host.sh`) prepares the host: creates the `wrenn` system user, sets Linux capabilities (setcap) on wrenn-agent and all child binaries (iptables, losetup, dmsetup, etc.), installs an apt hook to restore capabilities after package updates, configures udev rules for `/dev/net/tun`, and loads required kernel modules. No sudo grants — all privilege is via capabilities. `make install` then copies the binaries to `/usr/local/bin` and installs the systemd units from `deploy/systemd/`.

 Startup (`cmd/host-agent/main.go`) wires: root/capabilities check → enable IP forwarding → clean up stale dm devices → `sandbox.Manager` (containing `vm.Manager` + `network.SlotAllocator` + `devicemapper.LoopRegistry`) → `hostagent.Server` (Connect RPC handler) → HTTP server.

@@ -258,13 +258,14 @@ To add a new query: add it to the appropriate `.sql` file in `db/queries/` → `
 ## Rootfs & Guest Init

 - **wrenn-init** (`images/wrenn-init.sh`): the PID 1 init script baked into every rootfs. Mounts virtual filesystems, sets hostname, writes `/etc/resolv.conf`, then execs envd.
-- **Updating the rootfs** after changing envd or wrenn-init: `bash scripts/update-minimal-rootfs.sh`. This builds envd via `make build-envd` (Rust → static musl binary), mounts the rootfs image, copies in the new binaries, and unmounts. Defaults to `/var/lib/wrenn/images/minimal.ext4`.
-- Rootfs images are minimal debootstrap — no systemd, no coreutils beyond busybox. Use `/bin/sh -c` for shell builtins inside the guest.
+- **System base templates**: four built-in distro images — `minimal-ubuntu` (id 0, default), `minimal-alpine` (1), `minimal-arch` (2), `minimal-fedora` (3) — built via `images/build-{ubuntu,alpine,arch,fedora}.sh` (or `make images`). All platform-owned, protected from deletion (reserved IDs 0–1024). Same static envd + tini run on all four. Each has a `wrenn-user` with passwordless sudo.
+- **Updating the rootfs** after changing envd or wrenn-init: `bash scripts/update-minimal-rootfs.sh`. Builds envd via `make build-envd` (Rust → static musl binary), then re-injects envd + wrenn-init + tini into all four system base images.
+- Rootfs images are built from distro containers — no systemd (init is overridden to `wrenn-init`). Use `/bin/sh -c` for shell builtins inside the guest.

 ## Fixed Paths (on host machine)

 - Kernel: `/var/lib/wrenn/kernels/vmlinux`
-- Base rootfs images: `/var/lib/wrenn/images/{template}.ext4`
+- Base rootfs images: `/var/lib/wrenn/images/teams/{base36(teamID)}/{base36(templateID)}/rootfs.ext4` (system templates use the platform team, base36 all-zeros)
 - Sandbox clones: `/var/lib/wrenn/sandboxes/`
 - Cloud Hypervisor: `/usr/local/bin/cloud-hypervisor`

Makefile (34 changed lines)
@@ -131,32 +131,24 @@ check: fmt vet lint test
 # ═══════════════════════════════════════════════════
 # Rootfs Images
 # ═══════════════════════════════════════════════════
-.PHONY: images image-minimal image-python image-node
+.PHONY: images rootfs-ubuntu rootfs-alpine rootfs-arch rootfs-fedora

-images: build-envd image-minimal image-python image-node
+# Build all four system base rootfs images (ubuntu/alpine/arch/fedora). Each
+# spawns a distro container, installs the required packages + wrenn-user, then
+# exports to images/teams/<platform>/<id>/rootfs.ext4. Requires docker + sudo.
+images: rootfs-ubuntu rootfs-alpine rootfs-arch rootfs-fedora

-image-minimal:
-	sudo bash images/templates/minimal/build.sh
+rootfs-ubuntu:
+	bash images/build-ubuntu.sh

-image-python:
-	sudo bash images/templates/python312/build.sh
+rootfs-alpine:
+	bash images/build-alpine.sh

-image-node:
-	sudo bash images/templates/node20/build.sh
+rootfs-arch:
+	bash images/build-arch.sh

-# ═══════════════════════════════════════════════════
-# Deployment
-# ═══════════════════════════════════════════════════
-.PHONY: setup-host install

-setup-host:
-	sudo bash scripts/setup-host.sh

-install: build
-	sudo cp $(BIN_DIR)/wrenn-cp /usr/local/bin/
-	sudo cp $(BIN_DIR)/wrenn-agent /usr/local/bin/
-	sudo cp deploy/systemd/*.service /etc/systemd/system/
-	sudo systemctl daemon-reload
+rootfs-fedora:
+	bash images/build-fedora.sh

 # ═══════════════════════════════════════════════════
 # Clean

README.md (61 changed lines)
@@ -22,7 +22,7 @@ Produces three binaries: `wrenn-cp` (control plane), `wrenn-agent` (host agent),

 ## Host setup

-The host agent needs a kernel, a minimal rootfs image, and working directories on the host machine.
+The host agent needs a kernel, the system base rootfs images, and working directories on the host machine.

 ### Directory structure

@@ -31,59 +31,74 @@ The host agent needs a kernel, a minimal rootfs image, and working directories o
 ├── kernels/
 │   └── vmlinux  # uncompressed Linux kernel (not bzImage)
 ├── images/
-│   └── minimal/
-│       └── rootfs.ext4  # base rootfs (all other templates snapshot from this)
+│   └── teams/
+│       └── 0000000000000000000000000/  # platform team (base36 all-zeros)
+│           ├── 0000000000000000000000000/rootfs.ext4  # minimal-ubuntu (id 0)
+│           ├── 0000000000000000000000001/rootfs.ext4  # minimal-alpine (id 1)
+│           ├── 0000000000000000000000002/rootfs.ext4  # minimal-arch (id 2)
+│           └── 0000000000000000000000003/rootfs.ext4  # minimal-fedora (id 3)
 ├── sandboxes/  # per-sandbox CoW files (created at runtime)
 └── snapshots/  # pause/hibernate snapshot files (created at runtime)
 ```
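
The directory names in the tree above are described as the base36 form of the team and template IDs. The following is a minimal sketch of how such a path could be derived, assuming the UUID is read as one 128-bit unsigned integer, encoded with lowercase digits, and left-padded to the 25 characters seen in the tree. `uuidToBase36` and `rootfsPath` are hypothetical names for illustration; the agent's real encoder is not shown in this diff and may differ.

```typescript
// Hypothetical helpers for illustration only; not taken from the repository.
const BASE36_WIDTH = 25; // assumed from the directory names shown in the tree

function uuidToBase36(uuid: string): string {
  const hex = uuid.replace(/-/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) {
    throw new Error("not a UUID: " + uuid);
  }
  // Big-endian base-16 digits of the 128-bit value.
  let digits: number[] = hex.split("").map((c) => parseInt(c, 16));
  let out = "";
  // Repeated long division by 36; each pass yields one base36 digit.
  while (digits.some((d) => d !== 0)) {
    let rem = 0;
    const next: number[] = [];
    for (const d of digits) {
      const cur = rem * 16 + d;
      next.push(Math.floor(cur / 36));
      rem = cur % 36;
    }
    out = rem.toString(36) + out;
    digits = next;
  }
  while (out.length < BASE36_WIDTH) out = "0" + out;
  return out;
}

function rootfsPath(teamId: string, templateId: string, root: string = "/var/lib/wrenn"): string {
  return root + "/images/teams/" + uuidToBase36(teamId) + "/" + uuidToBase36(templateId) + "/rootfs.ext4";
}
```

Long division is used instead of `BigInt` only so the sketch compiles under older TypeScript targets; the two approaches should give the same digits.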

-Create the directories:
+Create the base directories (the per-template image dirs are created by the build scripts):

 ```bash
-sudo mkdir -p /var/lib/wrenn/{kernels,images/minimal,sandboxes,snapshots}
+sudo mkdir -p /var/lib/wrenn/{kernels,images,sandboxes,snapshots}
 ```

 ### Kernel

 Place an uncompressed `vmlinux` kernel at `/var/lib/wrenn/kernels/vmlinux`. Versioned kernels (`vmlinux-{semver}`) are also supported — the agent picks the latest by semver.
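
To illustrate what "latest by semver" means for the versioned kernel names, here is a small sketch of that selection. It assumes plain `MAJOR.MINOR.PATCH` versions with no pre-release suffix, and it ignores the unversioned `vmlinux`, because the text does not say how the agent ranks it against versioned files. The function names are hypothetical.

```typescript
// Hypothetical sketch: pick the newest `vmlinux-{semver}` file name.
type SemVer = [number, number, number];

function parseKernelVersion(name: string): SemVer | null {
  const m = /^vmlinux-(\d+)\.(\d+)\.(\d+)$/.exec(name);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

function compareSemVer(a: SemVer, b: SemVer): number {
  // Compare numerically, field by field, so 6.10.0 sorts after 6.9.12.
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function latestKernel(names: string[]): string | null {
  let bestName: string | null = null;
  let bestVersion: SemVer | null = null;
  for (const name of names) {
    const version = parseKernelVersion(name);
    if (version === null) continue; // skips plain `vmlinux` and unrelated files
    if (bestVersion === null || compareSemVer(version, bestVersion) > 0) {
      bestName = name;
      bestVersion = version;
    }
  }
  return bestName;
}
```

The numeric comparison is the point of the example: a plain string sort would rank `vmlinux-6.9.12` above `vmlinux-6.10.0`.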

-### Minimal rootfs
+### System base rootfs images

-The minimal rootfs is the base image that all other templates (Python, Node, etc.) are built on top of via device-mapper snapshots. It must contain:
+There are four built-in **system base templates** — one per distro — that all other
+templates snapshot from via device-mapper. They are platform-owned (visible to every
+team) and protected from deletion (reserved template IDs 0–1024):

+| Template | Distro | ID |
+|----------|--------|----|
+| `minimal-ubuntu` | `ubuntu:26.04` | 0 |
+| `minimal-alpine` | `alpine:3.22` | 1 |
+| `minimal-arch` | `archlinux:base` | 2 |
+| `minimal-fedora` | `fedora:45` | 3 |

+`minimal-ubuntu` is the default template for new sandboxes and builds. The same
+statically-linked `envd` + `tini` run on all four regardless of the distro's libc
+(glibc on Ubuntu/Arch/Fedora, musl on Alpine).
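
The reserved-ID rule above (template IDs 0–1024 are protected) can be expressed as a small check. This is only a sketch: how the backend actually computes the `protected` flag is not shown in this diff, the range is assumed to be inclusive at both ends, and `isReservedTemplateId` is a hypothetical name.

```typescript
// Hypothetical check; the inclusive upper bound is an assumption.
const RESERVED_MAX = 1024;

function isReservedTemplateId(uuid: string): boolean {
  const hex = uuid.replace(/-/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) return false;
  // Values up to 1024 fit in the low 16 bits, so the upper 112 bits must be zero.
  if (!/^0{28}$/.test(hex.slice(0, 28))) return false;
  return parseInt(hex.slice(28), 16) <= RESERVED_MAX;
}
```

A caller could use this to disable a delete action before the request is sent, while still treating the server's answer as authoritative.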

+Each image contains these packages plus a `wrenn-user` account with passwordless `sudo`:

 | Package | Why |
 |---------|-----|
 | `socat` | Bidirectional relay for port forwarding |
 | `chrony` | Time sync from KVM PTP clock (`/dev/ptp0`) |
-| `tini` | PID 1 zombie reaper (injected by build script, not apt) |
+| `iproute2` (`iproute` on Fedora) | `ip` for guest network setup in `wrenn-init` |
+| `tini` | PID 1 zombie reaper |
 | `sudo` | User privilege management inside the guest |
 | `wget` | HTTP fetching |
 | `curl` | HTTP client |
 | `ca-certificates` | TLS certificate verification |
 | `git` | Version control |

-**To build a rootfs from a Docker container:**
+**To build all four images** (each spawns a distro container, installs the packages +
+`wrenn-user`, builds `envd`, injects `wrenn-init` + `tini`, and exports to the
+team-scoped path). Requires Docker + sudo:

-1. Create and configure a container with the required packages:
-```bash
-docker run -it --name wrenn-minimal debian:bookworm bash
-# Inside the container:
-apt update && apt install -y socat chrony sudo wget curl ca-certificates
-exit
-```
+```bash
+make images
+```

-2. Export to a rootfs image (builds envd, injects wrenn-init + tini, shrinks to minimum size):
-```bash
-sudo bash scripts/rootfs-from-container.sh wrenn-minimal minimal
-```
+Or build a single distro: `make rootfs-ubuntu` / `rootfs-alpine` / `rootfs-arch` / `rootfs-fedora`.

-**To update an existing rootfs** after changing envd or `wrenn-init.sh`:
+**To update the images** after changing `envd` or `wrenn-init.sh` (rebuilds `envd` once,
+then re-injects `envd` + `wrenn-init` + `tini` into every system base image):

 ```bash
 bash scripts/update-minimal-rootfs.sh
 ```

-This rebuilds envd via `make build-envd` and copies the fresh binaries into the mounted rootfs image.

 ### IP forwarding

 ```bash

@@ -228,7 +228,7 @@ func main() {
 	// snapshotted state. User-initiated Pauses already running are
 	// awaited by PauseAll/Destroy's lifecycleMu serialization.
 	mgr.Shutdown(shutdownCtx)
-	sandbox.ShrinkMinimalImage(rootDir)
+	sandbox.ShrinkSystemImages(rootDir)
 	if err := httpServer.Shutdown(shutdownCtx); err != nil {
 		slog.Error("http server shutdown error", "error", err)
 	}

db/migrations/20260522154716_seed_system_base_templates.sql (49 changed lines, normal file)
@@ -0,0 +1,49 @@
+-- +goose Up
+
+-- Replace the old all-zeros "minimal" base template with the four system base
+-- templates (ubuntu/alpine/arch/fedora). All are platform-owned (team_id
+-- all-zeros) with reserved template IDs 0..3, default user wrenn-user.
+--
+-- Template IDs are well-known: the all-zeros UUID + low byte = {0,1,2,3}.
+-- On disk each lives at images/teams/{base36(0)}/{base36(id)}/rootfs.ext4.
+
+-- 0 → minimal-ubuntu (was "minimal").
+UPDATE templates
+SET name = 'minimal-ubuntu',
+    default_user = 'wrenn-user'
+WHERE id = '00000000-0000-0000-0000-000000000000';
+
+-- Seed the row if it did not already exist (fresh DBs).
+INSERT INTO templates (id, name, type, vcpus, memory_mb, size_bytes, team_id, default_user)
+VALUES ('00000000-0000-0000-0000-000000000000', 'minimal-ubuntu', 'base', 1, 512, 0,
+        '00000000-0000-0000-0000-000000000000', 'wrenn-user')
+ON CONFLICT (id) DO NOTHING;
+
+-- 1 → minimal-alpine, 2 → minimal-arch, 3 → minimal-fedora.
+INSERT INTO templates (id, name, type, vcpus, memory_mb, size_bytes, team_id, default_user)
+VALUES
+    ('00000000-0000-0000-0000-000000000001', 'minimal-alpine', 'base', 1, 512, 0,
+     '00000000-0000-0000-0000-000000000000', 'wrenn-user'),
+    ('00000000-0000-0000-0000-000000000002', 'minimal-arch', 'base', 1, 512, 0,
+     '00000000-0000-0000-0000-000000000000', 'wrenn-user'),
+    ('00000000-0000-0000-0000-000000000003', 'minimal-fedora', 'base', 1, 512, 0,
+     '00000000-0000-0000-0000-000000000000', 'wrenn-user')
+ON CONFLICT (id) DO NOTHING;
+
+-- Point the sandboxes.template column default at the new default base template.
+ALTER TABLE sandboxes ALTER COLUMN template SET DEFAULT 'minimal-ubuntu';
+
+-- +goose Down
+
+ALTER TABLE sandboxes ALTER COLUMN template SET DEFAULT 'minimal';
+
+DELETE FROM templates WHERE id IN (
+    '00000000-0000-0000-0000-000000000001',
+    '00000000-0000-0000-0000-000000000002',
+    '00000000-0000-0000-0000-000000000003'
+);
+
+UPDATE templates
+SET name = 'minimal',
+    default_user = 'root'
+WHERE id = '00000000-0000-0000-0000-000000000000';

@@ -2,7 +2,7 @@
 name = "envd"
 version = "0.3.0"
 edition = "2024"
-rust-version = "1.88"
+rust-version = "1.95"

 [dependencies]
 # Async runtime

@@ -128,13 +128,15 @@ src/
 After building the static binary, copy it into the rootfs:

 ```bash
-bash scripts/update-debug-rootfs.sh [rootfs_path]
+bash scripts/update-minimal-rootfs.sh [rootfs_path]
 ```

-Or manually:
+With no argument it updates all four system base images; pass a path to target one.

+Or manually (example path: the minimal-ubuntu image, platform team + template id 0):

 ```bash
-sudo mount -o loop /var/lib/wrenn/images/minimal.ext4 /mnt
-sudo cp target/x86_64-unknown-linux-musl/release/envd /mnt/usr/bin/envd
+sudo mount -o loop /var/lib/wrenn/images/teams/0000000000000000000000000/0000000000000000000000000/rootfs.ext4 /mnt
+sudo cp target/x86_64-unknown-linux-musl/release/envd /mnt/usr/local/bin/envd
 sudo umount /mnt
 ```

@@ -18,7 +18,9 @@ export async function destroyAdminCapsule(id: string): Promise<ApiResult<void>>
 	return apiFetch('DELETE', `/api/v1/admin/capsules/${id}`);
 }

-export async function snapshotAdminCapsule(id: string, name?: string): Promise<ApiResult<Snapshot>> {
+// Async: returns 202 with the capsule now in the "snapshotting" state. The
+// template lands later (watch template.snapshot.create or poll templates).
+export async function snapshotAdminCapsule(id: string, name?: string): Promise<ApiResult<Capsule>> {
 	return apiFetch('POST', `/api/v1/admin/capsules/${id}/snapshot`, { name });
 }

@@ -35,6 +37,7 @@ export async function listPlatformTemplates(): Promise<ApiResult<Snapshot[]>> {
 		size_bytes: t.size_bytes,
 		created_at: t.created_at,
 		platform: true,
+		protected: t.protected,
 	}));
 	return { ok: true, data: snapshots };
 }

@@ -97,6 +97,8 @@ export type AdminTemplate = {
 	size_bytes: number;
 	team_id: string;
 	created_at: string;
+	/** True for built-in system base templates, which cannot be deleted. */
+	protected: boolean;
 };

 export async function listAdminTemplates(): Promise<ApiResult<AdminTemplate[]>> {

@@ -8,6 +8,7 @@ export type CapsuleStatus =
 	| 'running'
 	| 'pausing'
 	| 'paused'
+	| 'snapshotting'
 	| 'resuming'
 	| 'stopping'
 	| 'hibernated'
@@ -26,6 +27,7 @@ export const TRANSIENT_STATUSES: ReadonlySet<CapsuleStatus> = new Set([
 	'pending',
 	'starting',
 	'pausing',
+	'snapshotting',
 	'resuming',
 	'stopping'
 ]);
@@ -88,9 +90,14 @@ export type Snapshot = {
 	size_bytes: number;
 	created_at: string;
 	platform: boolean;
+	/** True for built-in system base templates, which cannot be deleted. */
+	protected?: boolean;
 };

-export async function createSnapshot(capsuleId: string, name?: string): Promise<ApiResult<Snapshot>> {
+// Snapshots are async: the call returns 202 with the capsule now in the
+// "snapshotting" state. The resulting template arrives later via the
+// template.snapshot.create SSE event (or by polling listSnapshots).
+export async function createSnapshot(capsuleId: string, name?: string): Promise<ApiResult<Capsule>> {
 	return apiFetch('POST', '/api/v1/snapshots', { sandbox_id: capsuleId, name });
 }


@@ -11,7 +11,7 @@
 	};
 	let { open, onclose, oncreated, templateSource = 'team' }: Props = $props();

-	let createForm = $state<CreateCapsuleParams>({ template: 'minimal', vcpus: 1, memory_mb: 512, timeout_sec: 0 });
+	let createForm = $state<CreateCapsuleParams>({ template: 'minimal-ubuntu', vcpus: 1, memory_mb: 512, timeout_sec: 0 });
 	let creating = $state(false);
 	let createError = $state<string | null>(null);

@@ -120,8 +120,8 @@
 		const creator = templateSource === 'platform' ? createAdminCapsule : createCapsule;
 		const result = await creator(createForm);
 		if (result.ok) {
-			createForm = { template: 'minimal', vcpus: 1, memory_mb: 512, timeout_sec: 0 };
-			templateQuery = 'minimal';
+			createForm = { template: 'minimal-ubuntu', vcpus: 1, memory_mb: 512, timeout_sec: 0 };
+			templateQuery = 'minimal-ubuntu';
 			onclose();
 			oncreated?.(result.data);
 		} else {

@@ -1,13 +1,38 @@
 <script lang="ts">
-	import { createSnapshot } from '$lib/api/capsules';
+	import type { Snippet } from 'svelte';
+	import { createSnapshot, type Capsule } from '$lib/api/capsules';
+	import type { ApiResult } from '$lib/api/client';

+	type SnapshotFn = (capsuleId: string, name?: string) => Promise<ApiResult<Capsule>>;

 	type Props = {
 		open: boolean;
 		capsuleId: string;
 		onclose: () => void;
-		onsnapshot?: () => void;
+		onsnapshot?: (capsule: Capsule) => void;
+		title?: string;
+		label?: string;
+		placeholder?: string;
+		hint?: string;
+		confirmLabel?: string;
+		pendingLabel?: string;
+		snapshotFn?: SnapshotFn;
+		description?: Snippet;
 	};
-	let { open, capsuleId, onclose, onsnapshot }: Props = $props();
+	let {
+		open,
+		capsuleId,
+		onclose,
+		onsnapshot,
+		title = 'Capture snapshot',
+		label = 'Snapshot name',
+		placeholder = 'e.g. after-apt-install, pre-migration',
+		hint = 'Leave blank to use an auto-generated name.',
+		confirmLabel = 'Start snapshot',
+		pendingLabel = 'Starting...',
+		snapshotFn = createSnapshot,
+		description
+	}: Props = $props();

 	let snapshotName = $state('');
 	let snapshotting = $state(false);
@@ -21,14 +46,14 @@
 	async function handleConfirm() {
 		snapshotting = true;
 		error = null;
-		const result = await createSnapshot(capsuleId, snapshotName.trim() || undefined);
+		const result = await snapshotFn(capsuleId, snapshotName.trim() || undefined);
 		if (!result.ok) {
 			error = result.error;
 			snapshotting = false;
 			return;
 		}
 		reset();
-		onsnapshot?.();
+		onsnapshot?.(result.data);
 		onclose();
 		snapshotting = false;
 	}
@@ -41,6 +66,10 @@
 	}
 </script>

+{#snippet defaultDescription()}
+	<p class="text-ui text-[var(--color-text-tertiary)]">The capsule moves to a <span class="font-mono text-[var(--color-blue)]">snapshotting</span> state while its memory and disk are written to a new template, then returns to running. This runs in the background; you'll be notified when it completes.</p>
+{/snippet}

 {#if open}
 	<div class="fixed inset-0 z-50 flex items-center justify-center">
 		<!-- svelte-ignore a11y_no_static_element_interactions -->
@@ -59,13 +88,13 @@
 	</svg>
 </div>
 <div>
-	<h2 class="font-serif text-heading text-[var(--color-text-bright)]">Capture snapshot</h2>
+	<h2 class="font-serif text-heading text-[var(--color-text-bright)]">{title}</h2>
 	<p class="mt-0.5 text-meta text-[var(--color-text-muted)] font-mono">{capsuleId}</p>
 </div>
 </div>

 <div class="px-6 pt-5 pb-6 space-y-4">
-	<p class="text-ui text-[var(--color-text-tertiary)]">Live snapshot: the capsule briefly pauses, its memory + disk are written to a new template, then the capsule resumes — your session keeps running.</p>
+	{@render (description ?? defaultDescription)()}

 	{#if error}
 		<div class="rounded-[var(--radius-input)] border border-[var(--color-red)]/30 bg-[var(--color-red)]/5 px-3 py-2 text-meta text-[var(--color-red)]">
@@ -75,7 +104,7 @@

 <div>
 	<div class="mb-1.5 flex items-baseline justify-between">
-		<label class="text-label font-semibold uppercase tracking-[0.05em] text-[var(--color-text-tertiary)]" for="snapshot-name">Snapshot name</label>
+		<label class="text-label font-semibold uppercase tracking-[0.05em] text-[var(--color-text-tertiary)]" for="snapshot-name">{label}</label>
 		<span class="text-meta text-[var(--color-text-muted)]">optional</span>
 	</div>
 	<input
@@ -84,10 +113,10 @@
 		bind:value={snapshotName}
 		disabled={snapshotting}
 		class="w-full rounded-[var(--radius-input)] border border-[var(--color-border)] bg-[var(--color-bg-4)] px-3 py-2 font-mono text-ui text-[var(--color-text-bright)] outline-none placeholder:text-[var(--color-text-muted)] transition-colors duration-150 focus:border-[var(--color-accent)] disabled:opacity-50"
-		placeholder="e.g. after-apt-install, pre-migration"
+		placeholder={placeholder}
 		onkeydown={(e) => { if (e.key === 'Enter' && !snapshotting) handleConfirm(); }}
 	/>
-	<p class="mt-1.5 text-meta text-[var(--color-text-muted)]">Leave blank to use an auto-generated name.</p>
+	<p class="mt-1.5 text-meta text-[var(--color-text-muted)]">{hint}</p>
 </div>

 <div class="flex justify-end gap-3 pt-1">
@@ -107,9 +136,9 @@
 		<svg class="animate-spin" width="13" height="13" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
 			<path d="M21 12a9 9 0 1 1-6.219-8.56" />
 		</svg>
-		Capturing...
+		{pendingLabel}
 	{:else}
-		Capture snapshot
+		{confirmLabel}
 	{/if}
 </button>
 </div>

frontend/src/lib/lifecycle-toasts.ts (39 changed lines, normal file)
@@ -0,0 +1,39 @@
+import type { SSEEvent } from '$lib/api/events';
+import { toast } from '$lib/toast.svelte';
+
+// Terminal copy per lifecycle verb. Success and failure are paired so the two
+// can never drift apart.
+const VERBS: Record<string, { done: string; failed: string }> = {
+	'capsule.create': { done: 'Capsule created', failed: 'Capsule failed to start' },
+	'capsule.pause': { done: 'Capsule paused', failed: 'Capsule failed to pause' },
+	'capsule.resume': { done: 'Capsule resumed', failed: 'Capsule failed to resume' },
+	'capsule.destroy': { done: 'Capsule destroyed', failed: 'Capsule failed to destroy' }
+};
+
+/**
+ * Surfaces lifecycle outcomes as toasts. Only system-actor events with an
+ * outcome are terminal: the user-actor events published at request-accept time
+ * carry a premature outcome (the operation has only been accepted, not yet
+ * completed) and are skipped, so each operation toasts exactly once.
+ */
+export function lifecycleToast(event: SSEEvent): void {
+	if (event.actor?.type !== 'system' || !event.outcome) return;
+
+	if (event.event === 'template.snapshot.create') {
+		const name = event.resource?.id;
+		if (event.outcome === 'success') {
+			toast.success(name ? `Snapshot "${name}" captured` : 'Snapshot captured');
+		} else {
+			toast.error(event.error ? `Snapshot failed: ${event.error}` : 'Snapshot failed');
+		}
+		return;
+	}
+
+	const verb = VERBS[event.event];
+	if (!verb) return;
+	if (event.outcome === 'success') {
+		toast.success(verb.done);
+	} else {
+		toast.error(event.error ? `${verb.failed}: ${event.error}` : verb.failed);
+	}
+}
@@ -6,6 +6,7 @@
 	import FilesTab from '$lib/components/FilesTab.svelte';
 	import MetricsPanel from '$lib/components/MetricsPanel.svelte';
 	import DestroyDialog from '$lib/components/DestroyDialog.svelte';
+	import SnapshotDialog from '$lib/components/SnapshotDialog.svelte';
 	import CopyButton from '$lib/components/CopyButton.svelte';
 	import { toast } from '$lib/toast.svelte';
 	import {
@@ -29,9 +30,6 @@

 	// Snapshot dialog
 	let showSnapshot = $state(false);
-	let snapshotName = $state('');
-	let snapshotting = $state(false);
-	let snapshotError = $state<string | null>(null);

 	const metricsAvailable = $derived(
 		capsule?.status === 'running' || capsule?.status === 'paused'
@@ -58,28 +56,12 @@
 		capsuleLoading = false;
 	}

-	async function handleSnapshot() {
-		snapshotting = true;
-		snapshotError = null;
-		const result = await snapshotAdminCapsule(capsuleId, snapshotName.trim() || undefined);
-		if (result.ok) {
-			toast.success(`Snapshot "${result.data.name}" created`);
-			showSnapshot = false;
-			snapshotName = '';
-			// Capsule keeps running after a live snapshot; refresh local state.
-			void loadCapsule();
-		} else {
-			snapshotError = result.error;
-		}
-		snapshotting = false;
-	}

 	function statusColor(status: string): string {
 		switch (status) {
 			case 'running': return 'var(--color-accent)';
 			case 'paused': case 'hibernated': return 'var(--color-amber)';
 			case 'error': return 'var(--color-red)';
-			case 'pending': case 'starting': case 'resuming': case 'pausing': case 'stopping':
+			case 'pending': case 'starting': case 'resuming': case 'pausing': case 'snapshotting': case 'stopping':
 				return 'var(--color-blue)';
 			default: return 'var(--color-text-muted)';
 		}
@@ -90,7 +72,7 @@
 			case 'running': return 'rgba(94,140,88,0.12)';
 			case 'paused': case 'hibernated': return 'rgba(212,167,60,0.12)';
 			case 'error': return 'rgba(207,129,114,0.12)';
-			case 'pending': case 'starting': case 'resuming': case 'pausing': case 'stopping':
+			case 'pending': case 'starting': case 'resuming': case 'pausing': case 'snapshotting': case 'stopping':
 				return 'rgba(90,159,212,0.12)';
 			default: return 'rgba(255,255,255,0.05)';
 		}
@@ -101,7 +83,7 @@
 			case 'running': return 'rgba(94,140,88,0.3)';
 			case 'paused': case 'hibernated': return 'rgba(212,167,60,0.3)';
 			case 'error': return 'rgba(207,129,114,0.3)';
-			case 'pending': case 'starting': case 'resuming': case 'pausing': case 'stopping':
+			case 'pending': case 'starting': case 'resuming': case 'pausing': case 'snapshotting': case 'stopping':
 				return 'rgba(90,159,212,0.3)';
 			default: return 'rgba(255,255,255,0.08)';
 		}
@@ -211,8 +193,7 @@
 <div class="ml-auto flex items-center gap-2">
 	{#if canSnapshot}
 		<button
-			onclick={() => { showSnapshot = true; snapshotName = ''; snapshotError = null; }}
-			disabled={snapshotting}
+			onclick={() => { showSnapshot = true; }}
 			class="flex items-center gap-1.5 rounded-[var(--radius-button)] border border-[var(--color-accent)]/30 bg-[var(--color-accent)]/8 px-3 py-1.5 text-meta font-medium text-[var(--color-accent-bright)] transition-all duration-150 hover:bg-[var(--color-accent)]/15 hover:border-[var(--color-accent)]/50 disabled:opacity-50"
 		>
 			<svg width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.75" stroke-linecap="round" stroke-linejoin="round"><path d="M14.5 4h-5L7 7H2v13a2 2 0 002 2h16a2 2 0 002-2V7h-5l-2.5-3z" /><circle cx="12" cy="15" r="3" /></svg>
@@ -270,83 +251,24 @@
 </footer>
 </main>

-<!-- Snapshot dialog -->
-{#if showSnapshot}
-	<div class="fixed inset-0 z-50 flex items-center justify-center">
-		<!-- svelte-ignore a11y_no_static_element_interactions -->
-		<div
-			class="absolute inset-0 bg-black/60"
-			onclick={() => { if (!snapshotting) showSnapshot = false; }}
-			onkeydown={(e) => { if (e.key === 'Escape' && !snapshotting) showSnapshot = false; }}
-		></div>
+{#snippet adminSnapshotDescription()}
+	<p class="text-ui text-[var(--color-text-tertiary)]">The capsule moves to a <span class="font-mono text-[var(--color-blue)]">snapshotting</span> state while its memory and disk are written to a new platform template available to all teams, then returns to running. This runs in the background.</p>
+{/snippet}

-		<div class="relative w-full max-w-[420px] rounded-[var(--radius-card)] border border-[var(--color-border-mid)] bg-[var(--color-bg-2)] overflow-hidden" style="animation: fadeUp 0.2s ease both; box-shadow: var(--shadow-dialog)">
-			<div class="flex items-center gap-4 border-b border-[var(--color-border)] bg-[var(--color-bg-3)] px-6 py-5">
-				<div class="flex h-10 w-10 shrink-0 items-center justify-center rounded-[var(--radius-input)] bg-[var(--color-accent)]/15 text-[var(--color-accent)] shadow-[0_0_12px_var(--color-accent-glow)]">
-					<svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.75" stroke-linecap="round" stroke-linejoin="round">
-						<path d="M14.5 4h-5L7 7H2v13a2 2 0 002 2h16a2 2 0 002-2V7h-5l-2.5-3z" />
-						<circle cx="12" cy="15" r="3" />
-					</svg>
-				</div>
-				<div>
-					<h2 class="font-serif text-heading text-[var(--color-text-bright)]">Snapshot as platform template</h2>
-					<p class="mt-0.5 text-meta text-[var(--color-text-muted)] font-mono">{capsuleId}</p>
-				</div>
-			</div>

-			<div class="px-6 pt-5 pb-6 space-y-4">
-				<p class="text-ui text-[var(--color-text-tertiary)]">Live snapshot: the capsule briefly pauses, its memory + disk are written to a new platform template available to all teams, then the capsule resumes — your session keeps running.</p>

-				{#if snapshotError}
-					<div class="rounded-[var(--radius-input)] border border-[var(--color-red)]/30 bg-[var(--color-red)]/5 px-3 py-2 text-meta text-[var(--color-red)]">
-						{snapshotError}
-					</div>
-				{/if}

-				<div>
-					<div class="mb-1.5 flex items-baseline justify-between">
-						<label class="text-label font-semibold uppercase tracking-[0.05em] text-[var(--color-text-tertiary)]" for="admin-snapshot-name">Template name</label>
-						<span class="text-meta text-[var(--color-text-muted)]">optional</span>
-					</div>
-					<input
-						id="admin-snapshot-name"
-						type="text"
-						bind:value={snapshotName}
-						disabled={snapshotting}
-						class="w-full rounded-[var(--radius-input)] border border-[var(--color-border)] bg-[var(--color-bg-4)] px-3 py-2 font-mono text-ui text-[var(--color-text-bright)] outline-none placeholder:text-[var(--color-text-muted)] transition-colors duration-150 focus:border-[var(--color-accent)] disabled:opacity-50"
-						placeholder="e.g. python-3.12, node-22-dev"
-						onkeydown={(e) => { if (e.key === 'Enter' && !snapshotting) handleSnapshot(); }}
-					/>
-					<p class="mt-1.5 text-meta text-[var(--color-text-muted)]">Leave blank for an auto-generated name. If the name already exists, it will be overwritten.</p>
-				</div>

-				<div class="flex justify-end gap-3 pt-1">
-					<button
-						onclick={() => { showSnapshot = false; }}
-						disabled={snapshotting}
-						class="rounded-[var(--radius-button)] border border-[var(--color-border)] px-4 py-2 text-ui text-[var(--color-text-secondary)] transition-colors duration-150 hover:border-[var(--color-border-mid)] hover:text-[var(--color-text-primary)] disabled:opacity-50"
-					>
-						Cancel
-					</button>
-					<button
-						onclick={handleSnapshot}
-						disabled={snapshotting}
-						class="flex items-center gap-2 rounded-[var(--radius-button)] bg-[var(--color-accent)] px-5 py-2 text-ui font-semibold text-white transition-all duration-150 hover:brightness-115 hover:-translate-y-px active:translate-y-0 disabled:opacity-50 disabled:hover:translate-y-0"
-					>
-						{#if snapshotting}
-							<svg class="animate-spin" width="13" height="13" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
-								<path d="M21 12a9 9 0 1 1-6.219-8.56" />
-							</svg>
-							Snapshotting...
-						{:else}
-							Snapshot
-						{/if}
-					</button>
-				</div>
-			</div>
-		</div>
-	</div>
-{/if}
+<SnapshotDialog
+	open={showSnapshot}
+	{capsuleId}
+	onclose={() => { showSnapshot = false; }}
+	onsnapshot={(updated) => { toast.success('Snapshot started'); capsule = updated; }}
+	snapshotFn={snapshotAdminCapsule}
+	title="Snapshot as platform template"
+	label="Template name"
+	placeholder="e.g. python-3.12, node-22-dev"
+	hint="Leave blank for an auto-generated name. Each snapshot needs a unique name."
+	confirmLabel="Snapshot"
|
||||
pendingLabel="Snapshotting..."
|
||||
description={adminSnapshotDescription}
|
||||
/>
|
||||
|
||||
<DestroyDialog
|
||||
open={showDestroy}
|
||||
|
||||
@ -44,7 +44,7 @@
|
||||
let showCreate = $state(false);
|
||||
let createForm = $state({
|
||||
name: '',
|
||||
base_template: 'minimal',
|
||||
base_template: 'minimal-ubuntu',
|
||||
vcpus: 1,
|
||||
memory_mb: 512,
|
||||
recipe: '',
|
||||
@ -80,7 +80,7 @@
|
||||
const PLATFORM_TEAM_ID = 'team-0000000000000000000000000';
|
||||
|
||||
function canDeleteTemplate(tmpl: AdminTemplate): boolean {
|
||||
if (tmpl.name === 'minimal') return false;
|
||||
if (tmpl.protected) return false;
|
||||
return tmpl.team_id === PLATFORM_TEAM_ID;
|
||||
}
|
||||
|
||||
@ -140,7 +140,7 @@
|
||||
|
||||
const result = await createBuild({
|
||||
name: createForm.name.trim(),
|
||||
base_template: createForm.base_template.trim() || 'minimal',
|
||||
base_template: createForm.base_template.trim() || 'minimal-ubuntu',
|
||||
recipe: lines,
|
||||
healthcheck: createForm.healthcheck.trim() || undefined,
|
||||
vcpus: createForm.vcpus,
|
||||
@ -152,7 +152,7 @@
|
||||
|
||||
if (result.ok) {
|
||||
showCreate = false;
|
||||
createForm = { name: '', base_template: 'minimal', vcpus: 1, memory_mb: 512, recipe: '', healthcheck: '', skip_pre_post: false, run_as_root: false, archive: null };
|
||||
createForm = { name: '', base_template: 'minimal-ubuntu', vcpus: 1, memory_mb: 512, recipe: '', healthcheck: '', skip_pre_post: false, run_as_root: false, archive: null };
|
||||
toast.success('Build queued');
|
||||
goto(`/admin/templates/builds/${result.data.id}`);
|
||||
} else {
|
||||
@ -246,7 +246,7 @@
|
||||
</p>
|
||||
</div>
|
||||
<button
|
||||
onclick={() => { showCreate = true; createError = null; createForm = { name: '', base_template: 'minimal', vcpus: 1, memory_mb: 512, recipe: '', healthcheck: '', skip_pre_post: false, run_as_root: false, archive: null }; }}
|
||||
onclick={() => { showCreate = true; createError = null; createForm = { name: '', base_template: 'minimal-ubuntu', vcpus: 1, memory_mb: 512, recipe: '', healthcheck: '', skip_pre_post: false, run_as_root: false, archive: null }; }}
|
||||
class="group flex items-center gap-2.5 rounded-[var(--radius-button)] bg-[var(--color-accent)] px-5 py-2.5 text-ui font-semibold text-white shadow-sm transition-all duration-200 hover:shadow-[0_0_20px_var(--color-accent-glow-mid)] hover:brightness-115 hover:-translate-y-px active:translate-y-0"
|
||||
>
|
||||
<svg width="13" height="13" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round" class="transition-transform duration-200 group-hover:rotate-90"><line x1="12" y1="5" x2="12" y2="19"/><line x1="5" y1="12" x2="19" y2="12"/></svg>
|
||||
@ -397,7 +397,7 @@
|
||||
</p>
|
||||
{#if type === 'templates'}
|
||||
<button
|
||||
onclick={() => { showCreate = true; createError = null; createForm = { name: '', base_template: 'minimal', vcpus: 1, memory_mb: 512, recipe: '', healthcheck: '', skip_pre_post: false, run_as_root: false, archive: null }; }}
|
||||
onclick={() => { showCreate = true; createError = null; createForm = { name: '', base_template: 'minimal-ubuntu', vcpus: 1, memory_mb: 512, recipe: '', healthcheck: '', skip_pre_post: false, run_as_root: false, archive: null }; }}
|
||||
class="mt-6 flex items-center gap-2 rounded-[var(--radius-button)] border border-[var(--color-accent)]/30 bg-[var(--color-accent)]/10 px-4 py-2 text-ui font-medium text-[var(--color-accent-bright)] transition-all duration-200 hover:bg-[var(--color-accent)]/20 hover:border-[var(--color-accent)]/50"
|
||||
>
|
||||
<svg width="13" height="13" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round"><line x1="12" y1="5" x2="12" y2="19"/><line x1="5" y1="12" x2="19" y2="12"/></svg>
|
||||
@ -476,7 +476,7 @@
|
||||
<button
|
||||
onclick={() => { deleteTarget = tmpl; deleteError = null; }}
|
||||
disabled={!canDeleteTemplate(tmpl)}
|
||||
title={tmpl.name === 'minimal' ? 'The minimal template cannot be deleted' : !canDeleteTemplate(tmpl) ? 'Cannot delete templates owned by other teams' : undefined}
|
||||
title={tmpl.protected ? 'System base templates cannot be deleted' : !canDeleteTemplate(tmpl) ? 'Cannot delete templates owned by other teams' : undefined}
|
||||
class="rounded-[var(--radius-button)] px-3 py-1.5 text-meta transition-all duration-150 {canDeleteTemplate(tmpl)
|
||||
? 'text-[var(--color-text-tertiary)] hover:bg-[var(--color-red)]/10 hover:text-[var(--color-red)]'
|
||||
: 'text-[var(--color-text-muted)] cursor-not-allowed opacity-40'}"
|
||||
|
||||
@ -2,7 +2,8 @@
|
||||
import { onMount } from 'svelte';
|
||||
import Sidebar from '$lib/components/Sidebar.svelte';
|
||||
import Toaster from '$lib/components/Toaster.svelte';
|
||||
import { startSSE, stopSSE } from '$lib/sse.svelte';
|
||||
import { startSSE, stopSSE, subscribeSSE } from '$lib/sse.svelte';
|
||||
import { lifecycleToast } from '$lib/lifecycle-toasts';
|
||||
let { children } = $props();
|
||||
|
||||
let collapsed = $state(
|
||||
@ -13,7 +14,13 @@
|
||||
|
||||
onMount(() => {
|
||||
startSSE();
|
||||
return () => stopSSE();
|
||||
// Lifecycle toasts live at the layout so they fire regardless of which
|
||||
// dashboard page is open (and survive navigation between them).
|
||||
const unsubscribe = subscribeSSE(lifecycleToast);
|
||||
return () => {
|
||||
unsubscribe();
|
||||
stopSSE();
|
||||
};
|
||||
});
|
||||
</script>
|
||||
|
||||
|
||||
@ -395,7 +395,7 @@
|
||||
</div>
|
||||
{:else}
|
||||
{#each filteredCapsules as capsule, i (capsule.id)}
|
||||
{@const isTransient = ['starting', 'resuming', 'pausing', 'stopping'].includes(capsule.status)}
|
||||
{@const isTransient = ['starting', 'resuming', 'pausing', 'snapshotting', 'stopping'].includes(capsule.status)}
|
||||
{@const stripeColor = capsule.status === 'running' ? 'bg-[var(--color-accent)]' : (capsule.status === 'paused' || capsule.status === 'hibernated') ? 'bg-[var(--color-amber)]' : isTransient ? 'bg-[var(--color-blue)]' : 'bg-[var(--color-text-muted)]'}
|
||||
<div
|
||||
class="capsule-row relative grid grid-cols-[1.6fr_0.8fr_0.5fr_0.5fr_0.6fr_1fr_0.9fr] items-center overflow-hidden border-b border-[var(--color-border)] transition-colors duration-150 hover:bg-[var(--color-bg-3)] last:border-b-0 {newCapsuleId === capsule.id ? 'capsule-born' : ''}"
|
||||
|
||||
@ -434,7 +434,7 @@
|
||||
case 'running': return 'var(--color-accent)';
|
||||
case 'paused': return 'var(--color-amber)';
|
||||
case 'error': return 'var(--color-red)';
|
||||
case 'starting': case 'resuming': case 'pausing': case 'stopping':
|
||||
case 'starting': case 'resuming': case 'pausing': case 'snapshotting': case 'stopping':
|
||||
return 'var(--color-blue)';
|
||||
default: return 'var(--color-text-muted)';
|
||||
}
|
||||
@ -445,7 +445,7 @@
|
||||
case 'running': return 'rgba(94,140,88,0.12)';
|
||||
case 'paused': return 'rgba(212,167,60,0.12)';
|
||||
case 'error': return 'rgba(207,129,114,0.12)';
|
||||
case 'starting': case 'resuming': case 'pausing': case 'stopping':
|
||||
case 'starting': case 'resuming': case 'pausing': case 'snapshotting': case 'stopping':
|
||||
return 'rgba(90,159,212,0.12)';
|
||||
default: return 'rgba(255,255,255,0.05)';
|
||||
}
|
||||
@ -456,7 +456,7 @@
|
||||
case 'running': return 'rgba(94,140,88,0.3)';
|
||||
case 'paused': return 'rgba(212,167,60,0.3)';
|
||||
case 'error': return 'rgba(207,129,114,0.3)';
|
||||
case 'starting': case 'resuming': case 'pausing': case 'stopping':
|
||||
case 'starting': case 'resuming': case 'pausing': case 'snapshotting': case 'stopping':
|
||||
return 'rgba(90,159,212,0.3)';
|
||||
default: return 'rgba(255,255,255,0.08)';
|
||||
}
|
||||
|
||||
17
images/build-alpine.sh
Executable file
@ -0,0 +1,17 @@
#!/usr/bin/env bash
#
# build-alpine.sh — Build the minimal-alpine system base rootfs (template id 1).
#
# Usage: bash images/build-alpine.sh

set -euo pipefail
source "$(cd "$(dirname "$0")" && pwd)/build-common.sh"

# Alpine is musl-based: the static envd + static tini run fine. bash is added so
# wrenn-user has a familiar login shell; wrenn-init itself only needs /bin/sh.
PREP="set -e
apk add --no-cache socat chrony sudo wget curl ca-certificates git iproute2 tini bash
adduser -D wrenn-user
${WRENN_SUDOERS_SETUP}"

build_system_rootfs "alpine:3.22" 1 "${PREP}"
20
images/build-arch.sh
Executable file
@ -0,0 +1,20 @@
#!/usr/bin/env bash
#
# build-arch.sh — Build the minimal-arch system base rootfs (template id 2).
#
# Arch is rolling-release; archlinux:base is the minimal base group.
#
# Usage: bash images/build-arch.sh

set -euo pipefail
source "$(cd "$(dirname "$0")" && pwd)/build-common.sh"

# tini is AUR-only on Arch (not in core/extra), so it is not installed here —
# rootfs-from-container.sh injects the static tini binary instead.
PREP="set -e
pacman -Sy --noconfirm --needed socat chrony sudo wget curl ca-certificates git iproute2 inetutils
useradd -m -s /bin/bash wrenn-user
${WRENN_SUDOERS_SETUP}
pacman -Scc --noconfirm || true"

build_system_rootfs "archlinux:base" 2 "${PREP}"
59
images/build-common.sh
Executable file
@ -0,0 +1,59 @@
#!/usr/bin/env bash
#
# build-common.sh — shared helpers for building the system base rootfs images.
#
# Sourced by images/build-{ubuntu,alpine,arch,fedora}.sh. Each caller defines
# the distro base image, reserved template ID, and the in-container prep snippet
# (install packages + create wrenn-user), then calls build_system_rootfs.
#
# The same statically-linked envd + tini run on every distro; the per-OS prep
# only differs in the package manager and the user-creation command.

set -euo pipefail

# base36(all-zeros UUID) = the platform team that owns every system base
# template. Must match id.PlatformTeamID / id.UUIDToBase36 on the Go side.
PLATFORM_TEAM_B36="0000000000000000000000000"

# WRENN_SUDOERS_SETUP grants wrenn-user passwordless sudo. Identical on every
# distro; appended to each prep snippet after the user is created.
WRENN_SUDOERS_SETUP='echo "wrenn-user ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/wrenn-user && chmod 0440 /etc/sudoers.d/wrenn-user'

# build_system_rootfs <base_image> <template_id_int> <prep_snippet>
#
# Spawns a throwaway container from base_image, runs prep_snippet inside it,
# then exports it to the system base template's on-disk path
# (images/teams/<platform>/<base36(id)>/rootfs.ext4) via rootfs-from-container.sh.
build_system_rootfs() {
    local base_image="$1" template_id="$2" prep="$3"
    local script_dir project_root container dest tmpl_b36

    script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
    project_root="$(cd "${script_dir}/.." && pwd)"
    container="wrenn-build-${template_id}-$$"

    # base36(template_id). System IDs are single-digit (0-3), so base36 equals
    # the decimal digit and the 25-char zero-padded decimal matches what
    # id.UUIDToBase36 produces for these well-known IDs.
    tmpl_b36="$(printf '%025d' "${template_id}")"
    dest="teams/${PLATFORM_TEAM_B36}/${tmpl_b36}"

    echo "==> Pulling ${base_image}..."
    docker pull "${base_image}"

    echo "==> Preparing container ${container}..."
    docker rm -f "${container}" >/dev/null 2>&1 || true

    # Arm cleanup before starting the container so a failed run still removes it.
    # Expand the name into the trap now: it must survive after this function's
    # locals go out of scope (set -u would error on a stale reference otherwise).
    trap "docker rm -f '${container}' >/dev/null 2>&1 || true" EXIT

    docker run --name "${container}" "${base_image}" /bin/sh -c "${prep}"

    # Run the exporter as the normal user, NOT under sudo: it builds envd via
    # `make build-envd` (needs cargo on the user's PATH) and uses sudo itself
    # for the privileged mount/mkfs/copy steps.
    echo "==> Exporting to images/${dest}/rootfs.ext4..."
    bash "${project_root}/scripts/rootfs-from-container.sh" "${container}" "${dest}"
}
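The comment in `build_system_rootfs` relies on base36 and decimal agreeing for the reserved IDs 0-3. The following Go sketch illustrates why that holds and where it would stop holding. `padBase36` is a hypothetical stand-in for the Go-side `id.UUIDToBase36`, whose real implementation is not shown in this diff; it only assumes a 25-character, zero-padded base36 string.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
	"strings"
)

// platformTeamB36 mirrors PLATFORM_TEAM_B36: 25 zeros.
var platformTeamB36 = strings.Repeat("0", 25)

// padBase36 is a hypothetical stand-in for id.UUIDToBase36, limited to
// small non-negative integers: base36 digits, left-padded to 25 characters.
func padBase36(n int64) string {
	s := strconv.FormatInt(n, 36)
	return strings.Repeat("0", 25-len(s)) + s
}

// padDecimal reproduces the shell's printf '%025d'.
func padDecimal(n int64) string {
	return fmt.Sprintf("%025d", n)
}

// systemRootfsPath builds the on-disk path described in build-common.sh.
func systemRootfsPath(templateID int64) string {
	return path.Join("images", "teams", platformTeamB36, padBase36(templateID), "rootfs.ext4")
}

func main() {
	for id := int64(0); id <= 3; id++ {
		fmt.Println(systemRootfsPath(id), padBase36(id) == padDecimal(id))
	}
	// From 10 upward the encodings diverge ("a" vs "10"), so the printf
	// shortcut would need replacing if more system templates were reserved.
	fmt.Println(padBase36(10) == padDecimal(10))
}
```

Under these assumptions the shortcut is safe for the four reserved IDs and for any ID below 10.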
19
images/build-fedora.sh
Executable file
@ -0,0 +1,19 @@
#!/usr/bin/env bash
#
# build-fedora.sh — Build the minimal-fedora system base rootfs (template id 3).
#
# Usage: bash images/build-fedora.sh

set -euo pipefail
source "$(cd "$(dirname "$0")" && pwd)/build-common.sh"

# Fedora's iproute package provides `ip` (no "2" suffix, unlike Debian/Arch).
PREP="set -e
# install_weak_deps=False keeps the image lean. The guest never runs systemd:
# PID 1 is wrenn-init -> tini -> envd.
dnf install -y --setopt=install_weak_deps=False socat chrony sudo wget curl ca-certificates git iproute hostname tini
useradd -m -s /bin/bash wrenn-user
${WRENN_SUDOERS_SETUP}
dnf clean all"

build_system_rootfs "fedora:45" 3 "${PREP}"
25
images/build-ubuntu.sh
Executable file
@ -0,0 +1,25 @@
#!/usr/bin/env bash
#
# build-ubuntu.sh — Build the minimal-ubuntu system base rootfs (template id 0).
#
# Usage: bash images/build-ubuntu.sh

set -euo pipefail
source "$(cd "$(dirname "$0")" && pwd)/build-common.sh"

PREP="set -e
export DEBIAN_FRONTEND=noninteractive
apt-get update
# --no-install-recommends keeps the image lean (avoids pulling systemd-adjacent
# recommends). The guest never runs systemd: PID 1 is wrenn-init -> tini -> envd.
apt-get install -y --no-install-recommends socat chrony sudo wget curl ca-certificates git iproute2 hostname tini
# Remove the stock 'ubuntu' user (uid 1000) shipped by the base image; it is
# replaced by wrenn-user. Also drop its cloud-init sudoers drop-in.
userdel -r ubuntu 2>/dev/null || true
rm -f /etc/sudoers.d/90-cloud-init-users
useradd -m -s /bin/bash wrenn-user
${WRENN_SUDOERS_SETUP}
apt-get clean
rm -rf /var/lib/apt/lists/*"

build_system_rootfs "ubuntu:26.04" 0 "${PREP}"
@ -23,9 +23,11 @@ echo "+cpu +memory +io" > /sys/fs/cgroup/cgroup.subtree_control 2>/dev/null || t
|
||||
{ echo 0 > /sys/block/vda/queue/write_zeroes_max_bytes; } 2>/dev/null || true
|
||||
{ echo 0 > /sys/block/vda/queue/discard_max_bytes; } 2>/dev/null || true
|
||||
|
||||
# Set hostname and make it resolvable (sudo requires this).
|
||||
hostname capsule
|
||||
echo "127.0.0.1 capsule" >> /etc/hosts
|
||||
# Set hostname and make it resolvable (sudo requires this). Use the kernel knob
|
||||
# directly so we don't depend on the `hostname` binary, which is absent from
|
||||
# minimal Arch/Fedora images. Guard so a failure never aborts init under set -e.
|
||||
echo capsule > /proc/sys/kernel/hostname 2>/dev/null || hostname capsule 2>/dev/null || true
|
||||
echo "127.0.0.1 capsule" >> /etc/hosts 2>/dev/null || true
|
||||
|
||||
# Configure networking if the kernel ip= boot arg did not already set it up.
|
||||
if ! ip addr show eth0 2>/dev/null | grep -q "169.254.0.21"; then
|
||||
@ -35,9 +37,14 @@ if ! ip addr show eth0 2>/dev/null | grep -q "169.254.0.21"; then
|
||||
ip route add default via 169.254.0.22 2>/dev/null || true
|
||||
fi
|
||||
|
||||
# Configure DNS resolver.
|
||||
echo "nameserver 8.8.8.8" > /etc/resolv.conf
|
||||
echo "nameserver 8.8.4.4" >> /etc/resolv.conf
|
||||
# Configure DNS resolver. Drop any existing symlink first — on some distros
|
||||
# (e.g. Fedora) /etc/resolv.conf is a dangling symlink into systemd-resolved,
|
||||
# and writing through it would fail and abort init under set -e.
|
||||
rm -f /etc/resolv.conf 2>/dev/null || true
|
||||
{
|
||||
echo "nameserver 8.8.8.8"
|
||||
echo "nameserver 8.8.4.4"
|
||||
} > /etc/resolv.conf 2>/dev/null || true
|
||||
|
||||
# Set a standard PATH so envd and all child processes can find common binaries.
|
||||
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
|
||||
|
||||
@ -137,22 +137,15 @@ func (h *adminCapsuleHandler) Snapshot(w http.ResponseWriter, r *http.Request) {
|
||||
}
|
||||
}
|
||||
|
||||
tmpl, err := h.svc.CreateSnapshot(r.Context(), sandboxID, id.PlatformTeamID, req.Name)
|
||||
ac := auth.MustFromContext(r.Context())
|
||||
ac.TeamID = id.PlatformTeamID
|
||||
name := req.Name
|
||||
if err == nil {
|
||||
name = tmpl.Name
|
||||
}
|
||||
h.audit.LogSnapshotCreate(r.Context(), ac, name, err)
|
||||
sb, name, err := h.svc.CreateSnapshot(r.Context(), sandboxID, id.PlatformTeamID, req.Name)
|
||||
if err != nil {
|
||||
if name != "" {
|
||||
h.audit.LogSnapshotDeleteSystem(r.Context(), id.PlatformTeamID, name, "cleanup_after_create_error", nil)
|
||||
}
|
||||
status, code, msg := serviceErrToHTTP(err)
|
||||
writeError(w, status, code, msg)
|
||||
return
|
||||
}
|
||||
ac := auth.MustFromContext(r.Context())
|
||||
ac.TeamID = id.PlatformTeamID
|
||||
h.audit.LogSnapshotCreateRequested(r.Context(), ac, name)
|
||||
|
||||
writeJSON(w, http.StatusCreated, templateToResponse(tmpl))
|
||||
writeJSON(w, http.StatusAccepted, sandboxToResponse(sb))
|
||||
}
|
||||
|
||||
@ -246,6 +246,7 @@ func (h *buildHandler) ListTemplates(w http.ResponseWriter, r *http.Request) {
|
||||
SizeBytes int64 `json:"size_bytes"`
|
||||
TeamID string `json:"team_id"`
|
||||
CreatedAt string `json:"created_at"`
|
||||
Protected bool `json:"protected"`
|
||||
}
|
||||
|
||||
resp := make([]templateResponse, len(templates))
|
||||
@ -257,6 +258,7 @@ func (h *buildHandler) ListTemplates(w http.ResponseWriter, r *http.Request) {
|
||||
MemoryMB: t.MemoryMb,
|
||||
SizeBytes: t.SizeBytes,
|
||||
TeamID: id.FormatTeamID(t.TeamID),
|
||||
Protected: layout.IsSystemTemplate(t.TeamID, t.ID),
|
||||
}
|
||||
if t.CreatedAt.Valid {
|
||||
resp[i].CreatedAt = t.CreatedAt.Time.Format(time.RFC3339)
|
||||
@ -280,8 +282,8 @@ func (h *buildHandler) DeleteTemplate(w http.ResponseWriter, r *http.Request) {
|
||||
writeError(w, http.StatusNotFound, "not_found", "template not found")
|
||||
return
|
||||
}
|
||||
if layout.IsMinimal(tmpl.TeamID, tmpl.ID) {
|
||||
writeError(w, http.StatusForbidden, "forbidden", "the minimal template cannot be deleted")
|
||||
if layout.IsSystemTemplate(tmpl.TeamID, tmpl.ID) {
|
||||
writeError(w, http.StatusForbidden, "forbidden", "system base templates cannot be deleted")
|
||||
return
|
||||
}
|
||||
|
||||
|
||||
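The `Protected` field and the delete guard above both call `layout.IsSystemTemplate`, whose body is not part of this diff. The sketch below is one plausible reading, assuming a system template is one owned by the all-zeros platform team with an ID in the reserved range 0-3 used by the build scripts. The `uuid` type, `smallID`, `isSystemTemplate`, and `canDelete` are all hypothetical names for illustration.

```go
package main

import "fmt"

// uuid is a plain 16-byte stand-in for the database UUID type.
type uuid [16]byte

// platformTeamID is the all-zeros team assumed to own system base templates.
var platformTeamID uuid

// smallID builds a UUID whose numeric value is the single byte n.
func smallID(n byte) uuid {
	var u uuid
	u[15] = n
	return u
}

// isSystemTemplate is a hypothetical version of layout.IsSystemTemplate:
// platform-owned and with a reserved ID between 0 and 3.
func isSystemTemplate(teamID, templateID uuid) bool {
	if teamID != platformTeamID {
		return false
	}
	for _, b := range templateID[:15] {
		if b != 0 {
			return false
		}
	}
	return templateID[15] <= 3
}

// canDelete mirrors the handler guard: protected templates are refused
// with the same message the handler returns.
func canDelete(teamID, templateID uuid) (bool, string) {
	if isSystemTemplate(teamID, templateID) {
		return false, "system base templates cannot be deleted"
	}
	return true, ""
}

func main() {
	ok, msg := canDelete(platformTeamID, smallID(0))
	fmt.Println(ok, msg)
	ok, msg = canDelete(platformTeamID, smallID(7))
	fmt.Println(ok, msg)
}
```

Keeping the check in one function lets the list response and the delete handlers agree on what "protected" means, which is the design the diff appears to follow.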
@ -158,6 +158,11 @@ func (h *sandboxEventHandler) verbForFailure(ctx context.Context, sandboxID pgty
|
||||
return events.CapsuleResume
|
||||
case "pausing":
|
||||
return events.CapsulePause
|
||||
case "snapshotting":
|
||||
// A snapshot pauses then resumes the VM; a host-side failure leaves the
|
||||
// sandbox errored, not destroyed. Route through CapsuleCreate so the
|
||||
// consumer's handleFailed marks it "error" rather than removing the row.
|
||||
return events.CapsuleCreate
|
||||
default:
|
||||
return events.CapsuleDestroy
|
||||
}
|
||||
|
||||
@ -79,6 +79,7 @@ type snapshotResponse struct {
|
||||
SizeBytes int64 `json:"size_bytes"`
|
||||
CreatedAt string `json:"created_at"`
|
||||
Platform bool `json:"platform"`
|
||||
Protected bool `json:"protected"`
|
||||
Metadata map[string]string `json:"metadata,omitempty"`
|
||||
}
|
||||
|
||||
@ -88,6 +89,7 @@ func templateToResponse(t db.Template) snapshotResponse {
|
||||
Type: t.Type,
|
||||
SizeBytes: t.SizeBytes,
|
||||
Platform: t.TeamID == id.PlatformTeamID,
|
||||
Protected: layout.IsSystemTemplate(t.TeamID, t.ID),
|
||||
}
|
||||
if t.Vcpus != 0 {
|
||||
resp.VCPUs = &t.Vcpus
|
||||
@ -112,8 +114,8 @@ type createSnapshotRequest struct {
|
||||
Name string `json:"name"`
|
||||
}
|
||||
|
||||
// Create handles POST /v1/snapshots. Takes a live snapshot of a running
|
||||
// sandbox and registers the result as a new template.
|
||||
// Create handles POST /v1/snapshots. Snapshots a running or paused sandbox and
|
||||
// registers the result as a new template.
|
||||
func (h *snapshotHandler) Create(w http.ResponseWriter, r *http.Request) {
|
||||
var req createSnapshotRequest
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
@ -131,22 +133,18 @@ func (h *snapshotHandler) Create(w http.ResponseWriter, r *http.Request) {
|
||||
}
|
||||
ac := auth.MustFromContext(r.Context())
|
||||
|
||||
tmpl, err := h.sandboxSvc.CreateSnapshot(r.Context(), sandboxID, ac.TeamID, req.Name)
|
||||
name := req.Name
|
||||
if err == nil {
|
||||
name = tmpl.Name
|
||||
}
|
||||
h.audit.LogSnapshotCreate(r.Context(), ac, name, err)
|
||||
// Async: the VM briefly pauses to a "snapshotting" state, then resumes. The
|
||||
// template is registered by a background goroutine; clients learn of the
|
||||
// result via the SSE template.snapshot.create event (or by polling).
|
||||
sb, name, err := h.sandboxSvc.CreateSnapshot(r.Context(), sandboxID, ac.TeamID, req.Name)
|
||||
if err != nil {
|
||||
if name != "" {
|
||||
h.audit.LogSnapshotDeleteSystem(r.Context(), ac.TeamID, name, "cleanup_after_create_error", nil)
|
||||
}
|
||||
status, code, msg := serviceErrToHTTP(err)
|
||||
writeError(w, status, code, msg)
|
||||
return
|
||||
}
|
||||
h.audit.LogSnapshotCreateRequested(r.Context(), ac, name)
|
||||
|
||||
writeJSON(w, http.StatusCreated, templateToResponse(tmpl))
|
||||
writeJSON(w, http.StatusAccepted, sandboxToResponse(sb))
|
||||
}
|
||||
|
||||
// List handles GET /v1/snapshots.
|
||||
@ -188,8 +186,8 @@ func (h *snapshotHandler) Delete(w http.ResponseWriter, r *http.Request) {
|
||||
writeError(w, http.StatusForbidden, "forbidden", "platform templates cannot be deleted here")
|
||||
return
|
||||
}
|
||||
if layout.IsMinimal(tmpl.TeamID, tmpl.ID) {
|
||||
writeError(w, http.StatusForbidden, "forbidden", "the minimal template cannot be deleted")
|
||||
if layout.IsSystemTemplate(tmpl.TeamID, tmpl.ID) {
|
||||
writeError(w, http.StatusForbidden, "forbidden", "system base templates cannot be deleted")
|
||||
return
|
||||
}
|
||||
|
||||
|
||||
@ -32,6 +32,14 @@ const unreachableThreshold = 90 * time.Second
|
||||
// that may not have registered the sandbox on the host agent yet.
|
||||
const transientGracePeriod = 2 * time.Minute
|
||||
|
||||
// snapshotGracePeriod is the grace for a sandbox stuck in "snapshotting" while
|
||||
// the VM is still alive on the host. Snapshots dump guest RAM and flatten the
|
||||
// rootfs, which can run for minutes on large sandboxes, and the agent reports
|
||||
// the VM as alive throughout — so we must not race the in-flight operation.
|
||||
// It exceeds the background goroutine's 10-minute deadline, so reaching it
|
||||
// means the control plane crashed mid-snapshot and the sandbox needs recovery.
|
||||
const snapshotGracePeriod = 15 * time.Minute
|
||||
|
||||
// HostMonitor runs on a fixed interval and performs two duties:
|
||||
//
|
||||
// 1. Passive check: marks hosts whose last_heartbeat_at is stale as
|
||||
@ -350,7 +358,7 @@ func (m *HostMonitor) checkHost(ctx context.Context, host db.Host) {
|
||||
|
||||
transientSandboxes, err := m.db.ListSandboxesByHostAndStatus(ctx, db.ListSandboxesByHostAndStatusParams{
|
||||
HostID: host.ID,
|
||||
Column2: []string{"starting", "resuming", "pausing", "stopping"},
|
||||
Column2: []string{"starting", "resuming", "pausing", "stopping", "snapshotting"},
|
||||
})
|
||||
if err != nil {
|
||||
slog.Warn("host monitor: failed to list transient sandboxes", "host_id", id.FormatHostID(host.ID), "error", err)
|
||||
@ -359,7 +367,7 @@ func (m *HostMonitor) checkHost(ctx context.Context, host db.Host) {
|
||||
|
||||
for _, sb := range transientSandboxes {
|
||||
sbIDStr := id.FormatSandboxID(sb.ID)
|
||||
if _, ok := aliveStatus[sbIDStr]; ok {
|
||||
if agentStatus, ok := aliveStatus[sbIDStr]; ok {
|
||||
// Sandbox is alive on host — the background goroutine should
|
||||
// finalize the transition. For starting/resuming, if the sandbox
|
||||
// is alive it means creation/resume succeeded.
|
||||
@ -370,6 +378,26 @@ func (m *HostMonitor) checkHost(ctx context.Context, host db.Host) {
|
||||
slog.Info("host monitor: promoted transient sandbox to running", "sandbox_id", sbIDStr, "from", sb.Status)
|
||||
}
|
||||
}
|
||||
// A snapshot keeps the source sandbox alive throughout, so an alive
|
||||
// sandbox does NOT mean the snapshot finished. Only recover it once
|
||||
// it has been stuck past the snapshot grace period (i.e. the CP
|
||||
// crashed mid-op). Recover to the sandbox's actual host-side status:
|
||||
// a running sandbox is snapshotted live and stays running, but a
|
||||
// paused sandbox is snapshotted from disk and must return to paused.
|
||||
if sb.Status == "snapshotting" &&
|
||||
sb.LastUpdated.Valid && time.Since(sb.LastUpdated.Time) >= snapshotGracePeriod {
|
||||
recoverTo := agentStatus
|
||||
if recoverTo != "running" && recoverTo != "paused" {
|
||||
// Coerced/unknown agent label — default to running.
|
||||
recoverTo = "running"
|
||||
}
|
||||
if _, err := m.db.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
|
||||
ID: sb.ID, Status: "snapshotting", Status_2: recoverTo,
|
||||
}); err == nil {
|
||||
slog.Info("host monitor: recovered stuck snapshotting sandbox", "sandbox_id", sbIDStr, "to", recoverTo)
|
||||
m.audit.LogSnapshotCreateSystem(ctx, sb.TeamID, sb.ID, "snapshot_recovered", nil)
|
||||
}
|
||||
}
|
||||
continue
|
||||
}
|
||||
// Sandbox is not alive on host. If the transition is recent, give the
|
||||
@ -390,6 +418,9 @@ func (m *HostMonitor) checkHost(ctx context.Context, host db.Host) {
|
||||
finalStatus = "paused"
|
||||
case "stopping":
|
||||
finalStatus = "stopped"
|
||||
case "snapshotting":
|
||||
// VM is gone but DB says snapshotting → the snapshot died with the VM.
|
||||
finalStatus = "error"
|
||||
}
|
||||
fromStatus := sb.Status
|
||||
if _, err := m.db.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
|
||||
@ -405,6 +436,9 @@ func (m *HostMonitor) checkHost(ctx context.Context, host db.Host) {
|
||||
case "pausing":
|
||||
// Pause assumed to have succeeded host-side; emit success with inferred metadata.
|
||||
m.audit.LogSandboxAutoPause(ctx, sb.TeamID, sb.ID, "transient_timeout_inferred", nil)
|
||||
case "snapshotting":
|
||||
// VM gone mid-snapshot; the sandbox is errored.
|
||||
m.audit.LogSnapshotCreateSystem(ctx, sb.TeamID, sb.ID, "transient_timeout", inferredErr)
|
||||
case "stopping":
|
||||
m.audit.LogSandboxDestroySystem(ctx, sb.TeamID, sb.ID, "transient_timeout_inferred", nil)
|
||||
}
|
||||
|
||||
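The host monitor changes above boil down to a small decision table for a sandbox stuck in `snapshotting`. The following sketch restates that table as a pure function. It assumes the not-alive branch waits for `transientGracePeriod` before acting (the diff truncates that comment), and `nextSnapshottingStatus` is a hypothetical name, not a function in the codebase.

```go
package main

import (
	"fmt"
	"time"
)

const (
	transientGracePeriod = 2 * time.Minute
	snapshotGracePeriod  = 15 * time.Minute
)

// nextSnapshottingStatus returns the status a stuck "snapshotting" sandbox
// should move to, and whether any change should be made yet.
func nextSnapshottingStatus(aliveOnHost bool, agentStatus string, stuckFor time.Duration) (string, bool) {
	if aliveOnHost {
		// An alive VM does not prove the snapshot finished; wait out the
		// longer grace period before assuming the control plane crashed.
		if stuckFor < snapshotGracePeriod {
			return "", false
		}
		if agentStatus == "running" || agentStatus == "paused" {
			return agentStatus, true
		}
		// Unknown agent label: default to running, as the diff does.
		return "running", true
	}
	// VM is gone. Assumption: the usual transient grace period applies.
	if stuckFor < transientGracePeriod {
		return "", false
	}
	return "error", true
}

func main() {
	fmt.Println(nextSnapshottingStatus(true, "running", 5*time.Minute))
	fmt.Println(nextSnapshottingStatus(true, "paused", 20*time.Minute))
	fmt.Println(nextSnapshottingStatus(false, "", 3*time.Minute))
}
```

Writing the rule as a pure function makes the two grace periods easy to unit-test without a database or a host agent.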
@ -1421,10 +1421,19 @@ paths:
|
||||
- apiKeyAuth: []
|
||||
- sessionAuth: []
|
||||
description: |
|
||||
Live snapshot: briefly pauses the capsule, writes its VM state +
|
||||
memory + flattened rootfs to a new template directory, then resumes
|
||||
the capsule. The source capsule keeps running after the snapshot;
|
||||
the resulting template can be used to create new capsules.
|
||||
Snapshot a capsule, processed asynchronously. The call returns
|
||||
immediately with the capsule in the `snapshotting` state; the capsule
returns to its original state once the snapshot completes. The capsule must be
|
||||
`running` or `paused`.
|
||||
|
||||
A `running` capsule is snapshotted live: it briefly pauses while its VM
|
||||
state + memory + flattened rootfs are written to a new template, then
|
||||
resumes to `running`. A `paused` capsule is snapshotted directly from
|
||||
its on-disk state without reviving the VM, and stays `paused`.
|
||||
|
||||
Because it is async, the response does NOT contain the template. Watch
|
||||
for the `template.snapshot.create` SSE event (its `outcome` reports
|
||||
success or failure) or poll `GET /v1/snapshots` to observe completion.
|
||||
|
||||
Snapshots are immutable: each call must use a fresh name. Re-using
|
||||
an existing name returns 409 Conflict.
|
||||
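Because the endpoint now answers 202 without the template, a client has to observe completion separately. The sketch below shows the polling variant against a fake in-process server; the SSE variant is not shown. The `sandbox_id` request field, the `name` field on list items, and the fake server's payloads are assumptions for illustration, since the diff does not show those shapes.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// snapshot is the subset of a list item this sketch reads (assumed field).
type snapshot struct {
	Name string `json:"name"`
}

// requestSnapshot posts the request and expects 202 Accepted.
func requestSnapshot(c *http.Client, base, sandboxID, name string) error {
	body, err := json.Marshal(map[string]string{"sandbox_id": sandboxID, "name": name})
	if err != nil {
		return err
	}
	resp, err := c.Post(base+"/v1/snapshots", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusAccepted {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

// waitForSnapshot polls GET /v1/snapshots until name appears or attempts run out.
func waitForSnapshot(c *http.Client, base, name string, attempts int, every time.Duration) error {
	for i := 0; i < attempts; i++ {
		resp, err := c.Get(base + "/v1/snapshots")
		if err != nil {
			return err
		}
		var list []snapshot
		err = json.NewDecoder(resp.Body).Decode(&list)
		resp.Body.Close()
		if err != nil {
			return err
		}
		for _, s := range list {
			if s.Name == name {
				return nil
			}
		}
		time.Sleep(every)
	}
	return errors.New("snapshot did not appear in time")
}

// fakeAPI stands in for the control plane: the template becomes visible
// on the third list call.
func fakeAPI() *httptest.Server {
	var polls atomic.Int32
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/snapshots", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		if r.Method == http.MethodPost {
			w.WriteHeader(http.StatusAccepted)
			fmt.Fprint(w, `{"id":"cl-demo","status":"snapshotting"}`)
			return
		}
		if polls.Add(1) < 3 {
			fmt.Fprint(w, `[]`)
			return
		}
		fmt.Fprint(w, `[{"name":"python-3.12"}]`)
	})
	return httptest.NewServer(mux)
}

// runDemo wires the fake server to the two client calls.
func runDemo() error {
	srv := fakeAPI()
	defer srv.Close()
	if err := requestSnapshot(srv.Client(), srv.URL, "cl-demo", "python-3.12"); err != nil {
		return err
	}
	return waitForSnapshot(srv.Client(), srv.URL, "python-3.12", 10, 10*time.Millisecond)
}

func main() {
	fmt.Println("snapshot observed:", runDemo() == nil)
}
```

A real client would likely prefer the `template.snapshot.create` SSE event, since its `outcome` also reports failures that polling the list cannot distinguish from a slow snapshot.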
@ -1435,14 +1444,14 @@ paths:
|
||||
schema:
|
||||
$ref: "#/components/schemas/CreateSnapshotRequest"
|
||||
responses:
|
||||
"201":
|
||||
description: Snapshot created
|
||||
"202":
|
||||
description: Snapshot accepted; capsule is now snapshotting
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: "#/components/schemas/Template"
|
||||
$ref: "#/components/schemas/Capsule"
|
||||
"409":
|
||||
description: Name already exists or capsule not running
|
||||
description: Name already exists, or capsule is not running or paused
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
@ -2813,7 +2822,7 @@ paths:
|
||||
schema:
|
||||
type: array
|
||||
items:
|
||||
$ref: "#/components/schemas/Template"
|
||||
$ref: "#/components/schemas/AdminTemplate"
|
||||
|
||||
/v1/admin/templates/{name}:
|
||||
delete:
|
||||
@ -2989,6 +2998,10 @@ paths:
|
||||
summary: Create snapshot from any capsule (admin)
|
||||
operationId: adminCreateSnapshotFromCapsule
|
||||
tags: [admin]
|
||||
description: |
|
||||
Snapshots a `running` or `paused` capsule into a platform template,
|
||||
processed asynchronously (see `POST /v1/snapshots`). A running capsule
|
||||
resumes to `running`; a paused capsule stays `paused`.
|
||||
security:
|
||||
- sessionAuth: []
|
||||
parameters:
|
||||
@ -2997,21 +3010,22 @@ paths:
|
||||
required: true
|
||||
schema: {type: string}
|
||||
requestBody:
|
||||
required: true
|
||||
required: false
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
required: [name]
|
||||
properties:
|
||||
name: {type: string}
|
||||
name:
|
||||
type: string
|
||||
description: Optional; an auto-generated name is used when omitted.
|
||||
responses:
|
||||
"201":
|
||||
description: Snapshot created
|
||||
"202":
|
||||
description: Snapshot accepted; capsule is now snapshotting
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: "#/components/schemas/Template"
|
||||
$ref: "#/components/schemas/Capsule"
|
||||
|
||||
/v1/admin/capsules/{id}/exec:
|
||||
parameters:
|
||||
@ -3506,7 +3520,7 @@ components:
|
||||
properties:
|
||||
template:
|
||||
type: string
|
||||
default: minimal
|
||||
default: minimal-ubuntu
|
||||
vcpus:
|
||||
type: integer
|
||||
default: 1
|
||||
@ -3610,7 +3624,7 @@ components:
|
||||
type: string
|
||||
status:
|
||||
type: string
|
||||
enum: [pending, starting, running, pausing, paused, resuming, stopping, hibernated, stopped, missing, error]
|
||||
enum: [pending, starting, running, pausing, paused, snapshotting, resuming, stopping, hibernated, stopped, missing, error]
|
||||
template:
|
||||
type: string
|
||||
vcpus:
|
||||
@ -3684,13 +3698,51 @@ components:
|
||||
type: boolean
|
||||
description: |
|
||||
True when the template is platform-managed (visible to all teams,
|
||||
e.g. the built-in `minimal` rootfs). False for team-owned
|
||||
e.g. the built-in `minimal-ubuntu` rootfs). False for team-owned
|
||||
snapshot templates.
|
||||
protected:
|
||||
type: boolean
|
||||
description: |
|
||||
True for built-in system base templates (minimal-ubuntu,
|
||||
minimal-alpine, minimal-arch, minimal-fedora). Protected templates
|
||||
cannot be deleted.
|
||||
metadata:
|
||||
type: object
|
||||
additionalProperties: {type: string}
|
||||
nullable: true
|
||||
|
||||
AdminTemplate:
|
||||
type: object
|
||||
description: |
|
||||
Template as returned by the admin templates list. Unlike `Template`
|
||||
(the team-facing snapshot shape), this includes the owning `team_id`
|
||||
and omits `platform`/`metadata`.
|
||||
properties:
|
||||
name:
|
||||
type: string
|
||||
type:
|
||||
type: string
|
||||
enum: [base, snapshot]
|
||||
vcpus:
|
||||
type: integer
|
||||
memory_mb:
|
||||
type: integer
|
||||
size_bytes:
|
||||
type: integer
|
||||
format: int64
|
||||
team_id:
|
||||
type: string
|
||||
description: Owning team ID (formatted, e.g. `team-…`). Platform team for global templates.
|
||||
created_at:
|
||||
type: string
|
||||
format: date-time
|
||||
protected:
|
||||
type: boolean
|
||||
description: |
|
||||
True for built-in system base templates (minimal-ubuntu,
|
||||
minimal-alpine, minimal-arch, minimal-fedora). Protected templates
|
||||
cannot be deleted.
|
||||
|
||||
ExecRequest:
|
||||
type: object
|
||||
required: [cmd]
|
||||
|
||||
@ -266,7 +266,7 @@ func (c *SandboxEventConsumer) handleStopped(ctx context.Context, sandboxID pgty
|
||||
// audit.Log writes the row only — it does NOT republish an event, which would
|
||||
// loop back into this consumer. Do not switch to LogSandboxCreateSystem here.
|
||||
func (c *SandboxEventConsumer) handleFailed(ctx context.Context, sandboxID pgtype.UUID, event events.Event) {
|
||||
for _, fromStatus := range []string{"running", "starting", "pausing", "resuming"} {
|
||||
for _, fromStatus := range []string{"running", "starting", "pausing", "resuming", "snapshotting"} {
|
||||
if _, err := c.db.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
|
||||
ID: sandboxID, Status: fromStatus, Status_2: "error",
|
||||
}); err == nil {
|
||||
|
||||
@ -83,7 +83,13 @@ func New(
|
||||
sandboxSvc := &service.SandboxService{DB: queries, Pool: pool, Scheduler: sched}
|
||||
sandboxSvc.PublishEvent = func(ctx context.Context, event service.SandboxStateEvent) {
|
||||
if evt, ok := serviceEventToCanonical(event); ok {
|
||||
eventPub.Publish(ctx, evt)
|
||||
// State-change events are ephemeral UI signals — mirror them to the
|
||||
// dashboard via Pub/Sub only, never to durable channel subscribers.
|
||||
if evt.Event == events.CapsuleStateChanged {
|
||||
eventPub.PublishTransient(ctx, evt)
|
||||
} else {
|
||||
eventPub.Publish(ctx, evt)
|
||||
}
|
||||
}
|
||||
}
|
||||
apiKeySvc := &service.APIKeyService{DB: queries}
|
||||
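The routing rule in the hunk above (state-change events go to Pub/Sub only, everything else takes the durable path) can be isolated into a small testable function. The sketch below uses hypothetical stand-in types; `event`, `publisher`, and the `capsule.state_changed` string are assumptions for illustration and are not taken from the `events` package.

```go
package main

import "fmt"

// event and publisher are hypothetical stand-ins for events.Event and the
// publisher used in the diff; only the routing decision is illustrated.
type event struct{ Name string }

type publisher interface {
	Publish(e event)          // durable path: channel subscribers and Pub/Sub
	PublishTransient(e event) // Pub/Sub only, for ephemeral UI signals
}

// capsuleStateChanged is an assumed event name, not the real constant value.
const capsuleStateChanged = "capsule.state_changed"

// route sends state-change events down the transient path and everything
// else down the durable one.
func route(p publisher, e event) {
	if e.Name == capsuleStateChanged {
		p.PublishTransient(e)
		return
	}
	p.Publish(e)
}

// recorder captures which path each event took.
type recorder struct{ durable, transient []string }

func (r *recorder) Publish(e event)          { r.durable = append(r.durable, e.Name) }
func (r *recorder) PublishTransient(e event) { r.transient = append(r.transient, e.Name) }

func main() {
	r := &recorder{}
	route(r, event{Name: capsuleStateChanged})
	route(r, event{Name: "template.snapshot.create"})
	fmt.Println("durable:", r.durable, "transient:", r.transient)
}
```

Keeping the decision in one function means a new ephemeral event type only needs one extra condition, and the durable subscribers never see badge transitions.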
@ -482,6 +488,39 @@ func serviceEventToCanonical(e service.SandboxStateEvent) (events.Event, bool) {
|
||||
eventType = events.CapsuleCreate
|
||||
outcome = events.OutcomeError
|
||||
metadata = map[string]string{"reason": "create_failed"}
|
||||
case "sandbox.snapshotted":
|
||||
// Completion of an async snapshot. The resource is the template name,
|
||||
// not the sandbox, so the dashboard's snapshot list refreshes.
|
||||
return events.Event{
|
||||
Event: events.SnapshotCreate,
|
||||
Outcome: events.OutcomeSuccess,
|
||||
Timestamp: events.Now(),
|
||||
TeamID: e.TeamID,
|
||||
Actor: events.SystemActor(),
|
||||
Resource: events.Resource{ID: e.Metadata["name"], Type: "snapshot"},
|
||||
}, true
|
||||
case "sandbox.snapshot_failed":
|
||||
return events.Event{
|
||||
Event: events.SnapshotCreate,
|
||||
Outcome: events.OutcomeError,
|
||||
Timestamp: events.Now(),
|
||||
TeamID: e.TeamID,
|
||||
Actor: events.SystemActor(),
|
||||
Resource: events.Resource{ID: e.Metadata["name"], Type: "snapshot"},
|
||||
Metadata: map[string]string{"reason": "snapshot_failed"},
|
||||
Error: e.Error,
|
||||
}, true
|
||||
case "sandbox.state_changed":
|
||||
// Transient badge transition with no terminal verb of its own. Carries
|
||||
// from/to in metadata; routed via Pub/Sub only by the caller.
|
||||
return events.Event{
|
||||
Event: events.CapsuleStateChanged,
|
||||
Timestamp: events.Now(),
|
||||
TeamID: e.TeamID,
|
||||
Actor: events.SystemActor(),
|
||||
Resource: events.Resource{ID: e.SandboxID, Type: "sandbox"},
|
||||
Metadata: e.Metadata,
|
||||
}, true
|
||||
default:
|
||||
return events.Event{}, false
|
||||
}
|
||||
|
||||
@ -15,20 +15,19 @@ import (
|
||||
|
||||
func timeNowNano() int64 { return time.Now().UnixNano() }
|
||||
|
||||
// IsMinimal reports whether the given team and template IDs represent the
|
||||
// built-in "minimal" template (both all-zeros).
|
||||
func IsMinimal(teamID, templateID pgtype.UUID) bool {
|
||||
return teamID.Bytes == id.PlatformTeamID.Bytes && templateID.Bytes == id.MinimalTemplateID.Bytes
|
||||
// IsSystemTemplate reports whether the given team and template IDs represent a
|
||||
// built-in system base template (minimal-ubuntu / -alpine / -arch / -fedora):
|
||||
// platform-owned with a template ID in the reserved range. System templates are
|
||||
// protected from deletion.
|
||||
func IsSystemTemplate(teamID, templateID pgtype.UUID) bool {
|
||||
return teamID.Bytes == id.PlatformTeamID.Bytes && id.IsReservedTemplateID(templateID)
|
||||
}
|
||||
|
||||
// TemplateDir returns the on-disk directory for a template.
|
||||
// TemplateDir returns the on-disk directory for a template. Every template —
|
||||
// including the built-in system base templates — lives under the teams tree:
|
||||
//
|
||||
// minimal (zeros, zeros): {wrennDir}/images/minimal
|
||||
// all others: {wrennDir}/images/teams/{base36(teamID)}/{base36(templateID)}
|
||||
// {wrennDir}/images/teams/{base36(teamID)}/{base36(templateID)}
|
||||
func TemplateDir(wrennDir string, teamID, templateID pgtype.UUID) string {
|
||||
if IsMinimal(teamID, templateID) {
|
||||
return filepath.Join(wrennDir, "images", "minimal")
|
||||
}
|
||||
return filepath.Join(wrennDir, "images", "teams",
|
||||
id.UUIDToBase36(teamID.Bytes),
|
||||
id.UUIDToBase36(templateID.Bytes))
|
||||
|
||||
@ -9,7 +9,7 @@ import (
|
||||
"git.omukk.dev/wrenn/wrenn/pkg/id"
|
||||
)
|
||||
|
||||
func TestIsMinimal(t *testing.T) {
|
||||
func TestIsSystemTemplate(t *testing.T) {
|
||||
tests := []struct {
|
||||
name string
|
||||
teamID pgtype.UUID
|
||||
@ -17,35 +17,41 @@ func TestIsMinimal(t *testing.T) {
|
||||
want bool
|
||||
}{
|
||||
{
|
||||
name: "both zeros",
|
||||
name: "ubuntu (zeros, zeros)",
|
||||
teamID: id.PlatformTeamID,
|
||||
templateID: id.MinimalTemplateID,
|
||||
templateID: id.UbuntuTemplateID,
|
||||
want: true,
|
||||
},
|
||||
{
|
||||
name: "non-zero team",
|
||||
teamID: pgtype.UUID{Bytes: [16]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1}, Valid: true},
|
||||
templateID: id.MinimalTemplateID,
|
||||
want: false,
|
||||
},
|
||||
{
|
||||
name: "non-zero template",
|
||||
name: "fedora (platform, id 3)",
|
||||
teamID: id.PlatformTeamID,
|
||||
templateID: pgtype.UUID{Bytes: [16]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1}, Valid: true},
|
||||
templateID: id.FedoraTemplateID,
|
||||
want: true,
|
||||
},
|
||||
{
|
||||
name: "platform, max reserved id",
|
||||
teamID: id.PlatformTeamID,
|
||||
templateID: pgtype.UUID{Bytes: [16]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x04, 0x00}, Valid: true}, // 1024
|
||||
want: true,
|
||||
},
|
||||
{
|
||||
name: "platform, above reserved range",
|
||||
teamID: id.PlatformTeamID,
|
||||
templateID: pgtype.UUID{Bytes: [16]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x04, 0x01}, Valid: true}, // 1025
|
||||
want: false,
|
||||
},
|
||||
{
|
||||
name: "both non-zero",
|
||||
teamID: pgtype.UUID{Bytes: [16]byte{1}, Valid: true},
|
||||
templateID: pgtype.UUID{Bytes: [16]byte{2}, Valid: true},
|
||||
name: "non-platform team, reserved id",
|
||||
teamID: pgtype.UUID{Bytes: [16]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1}, Valid: true},
|
||||
templateID: id.UbuntuTemplateID,
|
||||
want: false,
|
||||
},
|
||||
}
|
||||
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
if got := IsMinimal(tt.teamID, tt.templateID); got != tt.want {
|
||||
t.Errorf("IsMinimal() = %v, want %v", got, tt.want)
|
||||
if got := IsSystemTemplate(tt.teamID, tt.templateID); got != tt.want {
|
||||
t.Errorf("IsSystemTemplate() = %v, want %v", got, tt.want)
|
||||
}
|
||||
})
|
||||
}
|
||||
@ -54,9 +60,11 @@ func TestIsMinimal(t *testing.T) {
|
||||
func TestTemplateDir(t *testing.T) {
|
||||
wrennDir := "/var/lib/wrenn"
|
||||
|
||||
t.Run("minimal", func(t *testing.T) {
|
||||
got := TemplateDir(wrennDir, id.PlatformTeamID, id.MinimalTemplateID)
|
||||
want := filepath.Join(wrennDir, "images", "minimal")
|
||||
t.Run("system base template (ubuntu) lives under teams", func(t *testing.T) {
|
||||
got := TemplateDir(wrennDir, id.PlatformTeamID, id.UbuntuTemplateID)
|
||||
want := filepath.Join(wrennDir, "images", "teams",
|
||||
id.UUIDToBase36(id.PlatformTeamID.Bytes),
|
||||
id.UUIDToBase36(id.UbuntuTemplateID.Bytes))
|
||||
if got != want {
|
||||
t.Errorf("TemplateDir() = %q, want %q", got, want)
|
||||
}
|
||||
@ -88,8 +96,11 @@ func TestTemplateDir(t *testing.T) {
|
||||
|
||||
func TestTemplateRootfs(t *testing.T) {
|
||||
wrennDir := "/var/lib/wrenn"
|
||||
got := TemplateRootfs(wrennDir, id.PlatformTeamID, id.MinimalTemplateID)
|
||||
want := filepath.Join(wrennDir, "images", "minimal", "rootfs.ext4")
|
||||
got := TemplateRootfs(wrennDir, id.PlatformTeamID, id.UbuntuTemplateID)
|
||||
want := filepath.Join(wrennDir, "images", "teams",
|
||||
id.UUIDToBase36(id.PlatformTeamID.Bytes),
|
||||
id.UUIDToBase36(id.UbuntuTemplateID.Bytes),
|
||||
"rootfs.ext4")
|
||||
if got != want {
|
||||
t.Errorf("TemplateRootfs() = %q, want %q", got, want)
|
||||
}
|
||||
|
||||
@ -9,12 +9,13 @@ import (
|
||||
type SandboxStatus string
|
||||
|
||||
const (
|
||||
StatusPending SandboxStatus = "pending"
|
||||
StatusRunning SandboxStatus = "running"
|
||||
StatusPausing SandboxStatus = "pausing"
|
||||
StatusPaused SandboxStatus = "paused"
|
||||
StatusStopped SandboxStatus = "stopped"
|
||||
StatusError SandboxStatus = "error"
|
||||
StatusPending SandboxStatus = "pending"
|
||||
StatusRunning SandboxStatus = "running"
|
||||
StatusPausing SandboxStatus = "pausing"
|
||||
StatusPaused SandboxStatus = "paused"
|
||||
StatusSnapshotting SandboxStatus = "snapshotting"
|
||||
StatusStopped SandboxStatus = "stopped"
|
||||
StatusError SandboxStatus = "error"
|
||||
)
|
||||
|
||||
// Sandbox holds all state for a running sandbox on this host.
|
||||
|
||||
@ -9,6 +9,8 @@ import (
|
||||
"strconv"
|
||||
"strings"
|
||||
|
||||
"github.com/jackc/pgx/v5/pgtype"
|
||||
|
||||
"git.omukk.dev/wrenn/wrenn/internal/layout"
|
||||
"git.omukk.dev/wrenn/wrenn/pkg/id"
|
||||
)
|
||||
@ -29,13 +31,9 @@ func EnsureImageSizes(wrennDir string, targetMB int) error {
|
||||
}
|
||||
targetBytes := int64(targetMB) * 1024 * 1024
|
||||
|
||||
// Expand the built-in minimal image.
|
||||
minimalRootfs := layout.TemplateRootfs(wrennDir, id.PlatformTeamID, id.MinimalTemplateID)
|
||||
if err := expandImage(minimalRootfs, targetBytes, targetMB); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
// Walk teams/{teamDir}/{templateDir}/rootfs.ext4 two levels deep.
|
||||
// Walk teams/{teamDir}/{templateDir}/rootfs.ext4 two levels deep. The
|
||||
// built-in system base templates live under teams/{base36(0)}/... so this
|
||||
// covers them too.
|
||||
teamsDir := layout.TeamsDir(wrennDir)
|
||||
teamEntries, err := os.ReadDir(teamsDir)
|
||||
if err != nil {
|
||||
@ -104,12 +102,19 @@ func ParseSizeToMB(s string) (int, error) {
|
||||
}
|
||||
}
|
||||
|
||||
// ShrinkMinimalImage shrinks the built-in minimal rootfs back to its minimum
|
||||
// size using resize2fs -M. This is the inverse of EnsureImageSizes and should
|
||||
// be called during graceful shutdown so the image is stored compactly on disk.
|
||||
func ShrinkMinimalImage(wrennDir string) {
|
||||
minimalRootfs := layout.TemplateRootfs(wrennDir, id.PlatformTeamID, id.MinimalTemplateID)
|
||||
shrinkImage(minimalRootfs)
|
||||
// ShrinkSystemImages shrinks the built-in system base rootfs images back to
|
||||
// their minimum size using resize2fs -M. This is the inverse of
|
||||
// EnsureImageSizes and should be called during graceful shutdown so the images
|
||||
// are stored compactly on disk.
|
||||
func ShrinkSystemImages(wrennDir string) {
|
||||
for _, tmplID := range []pgtype.UUID{
|
||||
id.UbuntuTemplateID,
|
||||
id.AlpineTemplateID,
|
||||
id.ArchTemplateID,
|
||||
id.FedoraTemplateID,
|
||||
} {
|
||||
shrinkImage(layout.TemplateRootfs(wrennDir, id.PlatformTeamID, tmplID))
|
||||
}
|
||||
}
|
||||
|
||||
// shrinkImage shrinks a single rootfs image to its minimum size.
|
||||
|
||||
@ -294,12 +294,12 @@ func (m *Manager) Create(
|
||||
// Snapshot template? Route to the CH-restore path; the launcher manages
|
||||
// its own resource lifecycle and registers the sandbox itself.
|
||||
//
|
||||
// The minimal base template never carries a memory snapshot; guarding
|
||||
// here prevents a stray state.json (e.g. from a failed CreateSnapshot
|
||||
// that mis-targeted minimal) from silently rerouting fresh boots into
|
||||
// System base templates never carry a memory snapshot; guarding here
|
||||
// prevents a stray state.json (e.g. from a failed CreateSnapshot that
|
||||
// mis-targeted a base template) from silently rerouting fresh boots into
|
||||
// the restore path with a confusing error downstream.
|
||||
templateDir := layout.TemplateDir(m.cfg.WrennDir, teamID, templateID)
|
||||
if !layout.IsMinimal(teamID, templateID) && layout.IsSnapshotTemplate(templateDir) {
|
||||
if !layout.IsSystemTemplate(teamID, templateID) && layout.IsSnapshotTemplate(templateDir) {
|
||||
return m.createFromSnapshotTemplate(ctx, sandboxID, teamID, templateID,
|
||||
vcpus, memoryMB, timeoutSec, diskSizeMB, defaultUser, defaultEnv)
|
||||
}
|
||||
|
||||
@ -32,6 +32,7 @@ import (
|
||||
"fmt"
|
||||
"log/slog"
|
||||
"os"
|
||||
"os/exec"
|
||||
"path/filepath"
|
||||
"strconv"
|
||||
"strings"
|
||||
@ -695,11 +696,12 @@ func (m *Manager) waitForMemoryLoader(ctx context.Context, sb *sandboxState) err
|
||||
}
|
||||
|
||||
// CreateSnapshot writes a self-contained template snapshot to
|
||||
// WRENN_DIR/images/teams/{teamID}/{templateID}/. The sandbox is briefly
|
||||
// paused, the dm-snapshot is flattened into rootfs.ext4, CH writes the
|
||||
// memory snapshot, then the sandbox is resumed.
|
||||
// WRENN_DIR/images/teams/{teamID}/{templateID}/, then returns the total size
|
||||
// (in bytes) of the artefacts written.
|
||||
//
|
||||
// Returns the total size (in bytes) of the artefacts written.
|
||||
// A running sandbox is snapshotted live (briefly paused, memory dumped, rootfs
|
||||
// flattened, then resumed). A paused sandbox is snapshotted straight from its
|
||||
// on-disk pause artefacts without reviving the VM — it stays paused.
|
||||
func (m *Manager) CreateSnapshot(ctx context.Context, sandboxID string, teamID, templateID pgtype.UUID, name string) (int64, error) {
|
||||
sb, err := m.get(sandboxID)
|
||||
if err != nil {
|
||||
@ -709,10 +711,6 @@ func (m *Manager) CreateSnapshot(ctx context.Context, sandboxID string, teamID,
|
||||
sb.lifecycleMu.Lock()
|
||||
defer sb.lifecycleMu.Unlock()
|
||||
|
||||
if sb.Status != models.StatusRunning {
|
||||
return 0, fmt.Errorf("%w: %s (status: %s)", ErrNotRunning, sandboxID, sb.Status)
|
||||
}
|
||||
|
||||
// Refuse silent overwrites: every snapshot must land in a fresh
|
||||
// templateID. Defends against caller bugs and concurrent CreateSnapshot
|
||||
// races for the same destination. User-facing snapshot-name uniqueness
|
||||
@ -722,6 +720,22 @@ func (m *Manager) CreateSnapshot(ctx context.Context, sandboxID string, teamID,
|
||||
id.UUIDString(teamID), id.UUIDString(templateID))
|
||||
}
|
||||
|
||||
switch sb.Status {
|
||||
case models.StatusRunning:
|
||||
return m.snapshotRunningToTemplate(ctx, sb, teamID, templateID, name)
|
||||
case models.StatusPaused:
|
||||
return m.snapshotPausedToTemplate(ctx, sb, teamID, templateID, name)
|
||||
default:
|
||||
return 0, fmt.Errorf("%w: %s (status: %s)", ErrNotRunning, sandboxID, sb.Status)
|
||||
}
|
||||
}
|
||||
|
||||
// snapshotRunningToTemplate takes a live snapshot of a running sandbox: pause
|
||||
// CH, dump memory + flatten the rootfs into a staging dir, resume CH, then
|
||||
// promote the staged template into place. The sandbox returns to running.
|
||||
func (m *Manager) snapshotRunningToTemplate(ctx context.Context, sb *sandboxState, teamID, templateID pgtype.UUID, name string) (int64, error) {
|
||||
sandboxID := sb.ID
|
||||
|
||||
// Same rationale as Pause: wait for the background memory loader so the
|
||||
// resulting memory-ranges is self-contained when this sandbox itself was
|
||||
// previously restored from an ondemand snapshot.
|
||||
@ -821,6 +835,152 @@ func (m *Manager) CreateSnapshot(ctx context.Context, sandboxID string, teamID,
|
||||
return size, nil
|
||||
}
|
||||
|
||||
// snapshotPausedToTemplate builds a self-contained template from a paused
|
||||
// sandbox's on-disk artefacts without reviving the VM. The pause snapshot
|
||||
// already holds a self-contained CH memory image (Pause blocks on the memory
|
||||
// loader before snapshotting), so we copy those memory files verbatim and
|
||||
// flatten the persistent CoW into rootfs.ext4. The sandbox stays Paused.
|
||||
func (m *Manager) snapshotPausedToTemplate(ctx context.Context, sb *sandboxState, teamID, templateID pgtype.UUID, name string) (int64, error) {
|
||||
snapDir := layout.PauseSnapshotDir(m.cfg.WrennDir, sb.ID)
|
||||
meta, err := readSnapshotMeta(snapDir)
|
||||
if err != nil {
|
||||
return 0, fmt.Errorf("load pause snapshot meta: %w", err)
|
||||
}
|
||||
|
||||
dstDir := layout.TemplateDir(m.cfg.WrennDir, teamID, templateID)
|
||||
stageDir := filepath.Join(layout.SandboxesDir(m.cfg.WrennDir),
|
||||
fmt.Sprintf(".stage-%s-%d", sb.ID, time.Now().UnixNano()))
|
||||
if err := os.MkdirAll(stageDir, 0o755); err != nil {
|
||||
return 0, fmt.Errorf("mkdir stage dir: %w", err)
|
||||
}
|
||||
defer os.RemoveAll(stageDir)
|
||||
|
||||
// Flatten the persistent CoW into a standalone rootfs.ext4. The VM is down,
|
||||
// so re-attach a throwaway dm-snapshot over the base image + CoW just long
|
||||
// enough to read through it; the CoW file is left intact for a later Resume.
|
||||
if err := m.flattenPausedCow(ctx, sb.ID, meta, filepath.Join(stageDir, "rootfs.ext4")); err != nil {
|
||||
return 0, err
|
||||
}
|
||||
|
||||
// Copy CH's memory snapshot files verbatim (state.json, config.json,
|
||||
// memory-ranges, …) — everything except the CoW and the pause meta, which
|
||||
// the template replaces with its own rootfs.ext4 and meta below.
|
||||
if err := copyMemorySnapshotFiles(snapDir, stageDir); err != nil {
|
||||
return 0, err
|
||||
}
|
||||
|
||||
// Template meta: no SlotIndex (a template allocates a fresh slot per launch);
|
||||
// SandboxDir + BaseTemplate carried forward so the restore path resolves the
|
||||
// tmpfs disk path baked into CH's config.json.
|
||||
tmplMeta := &snapshotMeta{
|
||||
TemplateName: name,
|
||||
TeamID: id.UUIDString(teamID),
|
||||
TemplateID: id.UUIDString(templateID),
|
||||
VCPUs: meta.VCPUs,
|
||||
MemoryMB: meta.MemoryMB,
|
||||
TimeoutSec: meta.TimeoutSec,
|
||||
BaseTemplate: meta.BaseTemplate,
|
||||
SandboxDir: meta.SandboxDir,
|
||||
CreatedAt: time.Now(),
|
||||
}
|
||||
if err := writeSnapshotMeta(stageDir, tmplMeta); err != nil {
|
||||
slog.Warn("template meta write failed", "id", sb.ID, "error", err)
|
||||
}
|
||||
|
||||
if err := promoteSnapshotDir(stageDir, dstDir); err != nil {
|
||||
return 0, fmt.Errorf("promote snapshot: %w", err)
|
||||
}
|
||||
|
||||
size, err := snapshot.DirSize(dstDir, "")
|
||||
if err != nil {
|
||||
slog.Warn("snapshot size calc failed", "id", sb.ID, "error", err)
|
||||
}
|
||||
slog.Info("paused snapshot created",
|
||||
"id", sb.ID,
|
||||
"team_id", teamID,
|
||||
"template_id", templateID,
|
||||
"dir", dstDir,
|
||||
"bytes", size,
|
||||
)
|
||||
return size, nil
|
||||
}
|
||||
|
||||
// flattenPausedCow re-attaches a temporary dm-snapshot over a paused sandbox's
|
||||
// base image + persistent CoW, flattens it into outPath, then tears the dm
|
||||
// device down. The CoW file is preserved (RemoveSnapshot never deletes it) so a
|
||||
// later Resume still works. A distinct dm name avoids colliding with the
|
||||
// "wrenn-{id}" device a concurrent Resume would create — though lifecycleMu
|
||||
// already serialises the two.
|
||||
func (m *Manager) flattenPausedCow(ctx context.Context, sandboxID string, meta *snapshotMeta, outPath string) error {
|
||||
originLoop, err := m.loops.Acquire(meta.BaseTemplate)
|
||||
if err != nil {
|
||||
return fmt.Errorf("acquire loop: %w", err)
|
||||
}
|
||||
defer m.loops.Release(meta.BaseTemplate)
|
||||
|
||||
originSize, err := devicemapper.OriginSizeBytes(originLoop)
|
||||
if err != nil {
|
||||
return fmt.Errorf("origin size: %w", err)
|
||||
}
|
||||
|
||||
dmDev, err := devicemapper.RestoreSnapshot(ctx, "wrenn-flat-"+sandboxID, originLoop, meta.CowPath, originSize)
|
||||
if err != nil {
|
||||
return fmt.Errorf("restore dm-snapshot: %w", err)
|
||||
}
|
||||
defer func() {
|
||||
if rerr := devicemapper.RemoveSnapshot(context.Background(), dmDev); rerr != nil {
|
||||
slog.Warn("dm remove after paused flatten", "id", sandboxID, "error", rerr)
|
||||
}
|
||||
}()
|
||||
|
||||
if err := devicemapper.FlattenSnapshot(dmDev.DevicePath, outPath); err != nil {
|
||||
return fmt.Errorf("flatten rootfs: %w", err)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// copyMemorySnapshotFiles copies every regular file from a pause snapshot dir
|
||||
// into dstDir except the CoW and the wrenn meta — i.e. CH's own memory snapshot
|
||||
// artefacts (state.json, config.json, memory-ranges, …). It hardlinks when the
|
||||
// dirs share a filesystem (instant, preserves sparseness) and falls back to a
|
||||
// sparse-preserving copy across filesystems. Pause never mutates these files in
|
||||
// place — the next Pause writes a fresh dir and swaps — so a hardlink stays a
|
||||
// valid, immutable view for the template.
|
||||
func copyMemorySnapshotFiles(srcDir, dstDir string) error {
|
||||
entries, err := os.ReadDir(srcDir)
|
||||
if err != nil {
|
||||
return fmt.Errorf("read pause dir: %w", err)
|
||||
}
|
||||
for _, e := range entries {
|
||||
if e.IsDir() {
|
||||
continue
|
||||
}
|
||||
name := e.Name()
|
||||
if name == layout.SandboxCowName || name == snapshotMetaFile {
|
||||
continue
|
||||
}
|
||||
if err := linkOrCopyFile(filepath.Join(srcDir, name), filepath.Join(dstDir, name)); err != nil {
|
||||
return fmt.Errorf("copy %s: %w", name, err)
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// linkOrCopyFile hardlinks from→to, falling back to a sparse-preserving copy
// when the hardlink fails for any reason (typically EXDEV, which os.Link
// returns when the two paths live on different filesystems). A plain byte copy
// would materialise the zero pages punched out of memory-ranges, inflating a
// multi-GB snapshot to its full apparent size, so the fallback uses
// `cp --sparse=always`, which re-detects and re-punches the holes.
func linkOrCopyFile(from, to string) error {
	if err := os.Link(from, to); err == nil {
		return nil
	}
	if out, err := exec.Command("cp", "--sparse=always", from, to).CombinedOutput(); err != nil {
		return fmt.Errorf("sparse copy: %s: %w", string(out), err)
	}
	return nil
}
|
||||
|
||||
// DeleteSnapshot removes a template snapshot directory. Refuses deletion
|
||||
// while any in-memory sandbox is still derived from this template — even
|
||||
// though Linux unlink lets the open loop device keep working, the agent
|
||||
@ -983,9 +1143,10 @@ func (m *Manager) PauseAll(ctx context.Context) {
|
||||
wg.Wait()
|
||||
}
|
||||
|
||||
// CleanupOrphanPauseDirs removes leftover *.staging-* and *.trash-* dirs
|
||||
// under sandboxes/ from any Pause that crashed before completing the swap.
|
||||
// Safe to call at agent startup before any sandbox is created or restored.
|
||||
// CleanupOrphanPauseDirs removes leftover *.staging-*, *.stage-*, and *.trash-*
|
||||
// dirs under sandboxes/ from any Pause/snapshot/flatten that crashed before
|
||||
// completing its swap or promote. Safe to call at agent startup before any
|
||||
// sandbox is created or restored.
|
||||
//
|
||||
// Per-sandbox cleanup happens implicitly during Destroy (which removes the
|
||||
// whole PauseSnapshotDir) — this function only handles agent-crash orphans.
|
||||
@ -1001,7 +1162,12 @@ func CleanupOrphanPauseDirs(wrennDir string) {
|
||||
continue
|
||||
}
|
||||
name := e.Name()
|
||||
if !strings.Contains(name, ".staging-") && !strings.Contains(name, ".trash-") {
|
||||
// ".stage-" is the prefix used by snapshot/flatten staging dirs;
|
||||
// ".staging-" + ".trash-" are used by Pause's swap. (".stage-" is not a
|
||||
// substring of ".staging-", so all three need an explicit check.)
|
||||
if !strings.Contains(name, ".stage-") &&
|
||||
!strings.Contains(name, ".staging-") &&
|
||||
!strings.Contains(name, ".trash-") {
|
||||
continue
|
||||
}
|
||||
path := filepath.Join(sandboxesDir, name)
|
||||
|
||||
@ -390,16 +390,25 @@ func (l *AuditLogger) LogSandboxStateChanged(ctx context.Context, teamID, sandbo
|
||||
|
||||
// --- Snapshot events (scope: team) ---
|
||||
|
||||
func (l *AuditLogger) LogSnapshotCreate(ctx context.Context, ac auth.AuthContext, name string, err error) {
|
||||
l.Log(ctx, newEntry(ac, ac.TeamID, "team", "snapshot", name, "create", auditStatusFor(err, "success"), mergeMeta(nil, err)))
|
||||
l.publish(ctx, events.Event{
|
||||
Event: events.SnapshotCreate,
|
||||
Outcome: outcomeFromErr(err),
|
||||
Timestamp: events.Now(),
|
||||
TeamID: id.FormatTeamID(ac.TeamID),
|
||||
Actor: actorToEvent(ac),
|
||||
Resource: events.Resource{ID: name, Type: "snapshot"},
|
||||
Error: errString(err),
|
||||
// LogSnapshotCreateRequested records that a user requested an async snapshot.
|
||||
// It writes the user-attributed audit row only — the terminal success/failure
|
||||
// event is published later by the background goroutine (system actor). Mirrors
|
||||
// the accept-time audit pattern used by LogSandboxPause.
|
||||
func (l *AuditLogger) LogSnapshotCreateRequested(ctx context.Context, ac auth.AuthContext, name string) {
|
||||
l.Log(ctx, newEntry(ac, ac.TeamID, "team", "snapshot", name, "create", "success", nil))
|
||||
}
|
||||
|
||||
// LogSnapshotCreateSystem records a system-actor snapshot transition inferred
|
||||
// by a reconciler (e.g. the HostMonitor recovering or failing a sandbox stuck
|
||||
// in "snapshotting"). It writes an audit row only and does NOT publish a
|
||||
// SnapshotCreate event: the reconciler has no template name, and emitting one
|
||||
// would surface a spurious "snapshot captured/failed" toast.
|
||||
func (l *AuditLogger) LogSnapshotCreateSystem(ctx context.Context, teamID, sandboxID pgtype.UUID, reason string, err error) {
|
||||
l.Log(ctx, Entry{
|
||||
TeamID: teamID, ActorType: "system",
|
||||
ResourceType: "sandbox", ResourceID: id.FormatSandboxID(sandboxID),
|
||||
Action: "snapshot", Scope: "team", Status: auditStatusFor(err, "info"),
|
||||
Metadata: mergeMeta(map[string]any{"reason": reason}, err),
|
||||
})
|
||||
}
|
||||
|
||||
|
||||
36
pkg/id/id.go
@ -2,6 +2,7 @@ package id
|
||||
|
||||
import (
|
||||
"crypto/rand"
|
||||
"encoding/binary"
|
||||
"encoding/hex"
|
||||
"fmt"
|
||||
"math/big"
|
||||
@ -156,10 +157,37 @@ func ParseChannelID(s string) (pgtype.UUID, error) { return parseUUID(PrefixCh
|
||||
// (e.g. base templates, shared infrastructure).
|
||||
var PlatformTeamID = pgtype.UUID{Bytes: [16]byte{}, Valid: true}
|
||||
|
||||
// MinimalTemplateID is the all-zeros UUID sentinel for the built-in "minimal"
|
||||
// template. When both team_id and template_id are zero, the host agent uses
|
||||
// the minimal rootfs at WRENN_DIR/images/minimal/.
|
||||
var MinimalTemplateID = pgtype.UUID{Bytes: [16]byte{}, Valid: true}
|
||||
// SystemTemplateMaxID is the highest template ID reserved for built-in system
// base templates. Template IDs in [0, SystemTemplateMaxID] under the platform
// team are protected: they cannot be deleted and live at the well-known
// teams/{base36(0)}/{base36(id)} on-disk paths.
const SystemTemplateMaxID = 1024

// templateID returns the all-zeros UUID with its low 64 bits set to n. Used to
// mint the well-known IDs for the built-in system base templates.
func templateID(n uint64) pgtype.UUID {
	var b [16]byte
	binary.BigEndian.PutUint64(b[8:], n)
	return pgtype.UUID{Bytes: b, Valid: true}
}

// Well-known system base template IDs (platform team). The on-disk rootfs for
// each lives at WRENN_DIR/images/teams/{base36(PlatformTeamID)}/{base36(id)}/.
var (
	UbuntuTemplateID = templateID(0) // minimal-ubuntu (replaces the old "minimal")
	AlpineTemplateID = templateID(1) // minimal-alpine
	ArchTemplateID   = templateID(2) // minimal-arch
	FedoraTemplateID = templateID(3) // minimal-fedora
)

// IsReservedTemplateID reports whether t falls in the reserved system template
// ID range [0, SystemTemplateMaxID] (i.e. the top 64 bits are zero and the
// bottom 64 bits are <= SystemTemplateMaxID).
func IsReservedTemplateID(t pgtype.UUID) bool {
	hi := binary.BigEndian.Uint64(t.Bytes[:8])
	lo := binary.BigEndian.Uint64(t.Bytes[8:])
	return hi == 0 && lo <= SystemTemplateMaxID
}
|
||||
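The byte layout behind the reserved range is worth seeing once, since the tests elsewhere in this diff spell IDs out as raw bytes (`0x04, 0x00` for 1024). The sketch below reproduces the check on a plain `[16]byte` instead of `pgtype.UUID`; the lowercase names are local to the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const systemTemplateMaxID = 1024

// templateID returns 16 zero bytes with the low 64 bits set to n (big-endian).
func templateID(n uint64) [16]byte {
	var b [16]byte
	binary.BigEndian.PutUint64(b[8:], n)
	return b
}

// isReserved reports whether the high 64 bits are zero and the low 64 bits
// are at most systemTemplateMaxID.
func isReserved(t [16]byte) bool {
	hi := binary.BigEndian.Uint64(t[:8])
	lo := binary.BigEndian.Uint64(t[8:])
	return hi == 0 && lo <= systemTemplateMaxID
}

func main() {
	for _, n := range []uint64{0, 3, 1024, 1025} {
		id := templateID(n)
		fmt.Printf("n=%-5d last two bytes=%#02x %#02x reserved=%v\n",
			n, id[14], id[15], isReserved(id))
	}
	// Any bit set in the high half takes the ID out of the reserved range.
	high := templateID(0)
	high[0] = 1
	fmt.Println("high bit set, reserved:", isReserved(high))
}
```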
|
||||
// UUIDString converts a pgtype.UUID to a standard hyphenated UUID string
|
||||
// (e.g., "6ba7b810-9dad-11d1-80b4-00c04fd430c8"). Used for RPC wire format.
|
||||
|
||||
@ -106,7 +106,7 @@ func (s *BuildService) takeArchive(buildID string) []byte {
|
||||
// Create inserts a new build record and enqueues it to Redis.
|
||||
func (s *BuildService) Create(ctx context.Context, p BuildCreateParams) (db.TemplateBuild, error) {
|
||||
if p.BaseTemplate == "" {
|
||||
p.BaseTemplate = "minimal"
|
||||
p.BaseTemplate = "minimal-ubuntu"
|
||||
}
|
||||
if p.VCPUs <= 0 {
|
||||
p.VCPUs = 1
|
||||
@ -447,17 +447,15 @@ func (s *BuildService) provisionBuildSandbox(
|
||||
sandboxIDStr := id.FormatSandboxID(sandboxID)
|
||||
log.Info("provisioning build sandbox", "sandbox_id", sandboxIDStr, "host_id", id.FormatHostID(host.ID))
|
||||
|
||||
baseTeamID := id.PlatformTeamID
|
||||
baseTemplateID := id.MinimalTemplateID
|
||||
if build.BaseTemplate != "minimal" {
|
||||
baseTmpl, err := s.DB.GetPlatformTemplateByName(ctx, build.BaseTemplate)
|
||||
if err != nil {
|
||||
s.failBuild(ctx, buildID, fmt.Sprintf("base template %q not found: %v", build.BaseTemplate, err))
|
||||
return nil, "", nil, err
|
||||
}
|
||||
baseTeamID = baseTmpl.TeamID
|
||||
baseTemplateID = baseTmpl.ID
|
||||
// All base templates — including the built-in system ones — are
|
||||
// platform-owned rows, so resolve the path from the DB record.
|
||||
baseTmpl, err := s.DB.GetPlatformTemplateByName(ctx, build.BaseTemplate)
|
||||
if err != nil {
|
||||
s.failBuild(ctx, buildID, fmt.Sprintf("base template %q not found: %v", build.BaseTemplate, err))
|
||||
return nil, "", nil, err
|
||||
}
|
||||
baseTeamID := baseTmpl.TeamID
|
||||
baseTemplateID := baseTmpl.ID
|
||||
|
||||
resp, err := agent.CreateSandbox(ctx, connect.NewRequest(&pb.CreateSandboxRequest{
|
||||
SandboxId: sandboxIDStr,
|
||||
@ -481,6 +479,23 @@ func (s *BuildService) provisionBuildSandbox(
|
||||
HostID: host.ID,
|
||||
})
|
||||
|
||||
if _, err := s.DB.InsertSandbox(ctx, db.InsertSandboxParams{
|
||||
ID: sandboxID,
|
||||
TeamID: id.PlatformTeamID,
|
||||
HostID: host.ID,
|
||||
Template: build.BaseTemplate,
|
||||
Status: "running",
|
||||
Vcpus: build.Vcpus,
|
||||
MemoryMb: build.MemoryMb,
|
||||
TimeoutSec: 0,
|
||||
DiskSizeMb: 5120,
|
||||
TemplateID: baseTemplateID,
|
||||
TemplateTeamID: baseTeamID,
|
||||
Metadata: []byte("{}"),
|
||||
}); err != nil {
|
||||
log.Warn("failed to insert builder sandbox record", "error", err)
|
||||
}
|
||||
|
||||
archive := s.takeArchive(buildIDStr)
|
||||
if len(archive) > 0 {
|
||||
if err := s.uploadAndExtractArchive(ctx, agent, sandboxIDStr, archive, buildIDStr); err != nil {
|
||||
@ -602,6 +617,7 @@ func (s *BuildService) finalizeBuild(
|
||||
}
|
||||
s.publishStatus(ctx, buildID, "success", build.TotalSteps, build.TotalSteps, "")
|
||||
|
||||
s.destroySandbox(ctx, agent, sandboxIDStr)
|
||||
log.Info("template build completed successfully", "name", build.Name)
|
||||
}
|
||||
|
||||
@ -796,6 +812,13 @@ func (s *BuildService) destroySandbox(_ context.Context, agent buildAgentClient,
|
||||
})); err != nil {
|
||||
slog.Warn("failed to destroy build sandbox", "sandbox_id", sandboxIDStr, "error", err)
|
||||
}
|
||||
if sbID, err := id.ParseSandboxID(sandboxIDStr); err == nil {
|
||||
if _, err := s.DB.UpdateSandboxStatus(ctx, db.UpdateSandboxStatusParams{
|
||||
ID: sbID, Status: "stopped",
|
||||
}); err != nil {
|
||||
slog.Warn("failed to mark builder sandbox stopped", "sandbox_id", sandboxIDStr, "error", err)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// fetchSandboxEnv executes the 'env' command inside the specified sandbox via
|
||||
|
||||
@ -121,7 +121,7 @@ type hostagentClient = interface {
|
||||
// sandbox event to the Redis stream when the operation completes.
|
||||
func (s *SandboxService) Create(ctx context.Context, p SandboxCreateParams) (db.Sandbox, error) {
|
||||
if p.Template == "" {
|
||||
p.Template = "minimal"
|
||||
p.Template = "minimal-ubuntu"
|
||||
}
|
||||
if err := validate.SafeName(p.Template); err != nil {
|
||||
return db.Sandbox{}, fmt.Errorf("invalid template name: %w", err)
|
||||
@ -137,26 +137,23 @@ func (s *SandboxService) Create(ctx context.Context, p SandboxCreateParams) (db.
|
||||
}
|
||||
p.TimeoutSec = clampTimeout(p.TimeoutSec)
|
||||
|
||||
// Resolve template name → (teamID, templateID).
|
||||
templateTeamID := id.PlatformTeamID
|
||||
templateID := id.MinimalTemplateID
|
||||
var templateDefaultUser string
|
||||
// Resolve template name → (teamID, templateID). System base templates are
|
||||
// platform-owned rows like any other, so the lookup handles them too (the
|
||||
// query also matches platform templates for any team).
|
||||
tmpl, err := s.DB.GetTemplateByTeam(ctx, db.GetTemplateByTeamParams{Name: p.Template, TeamID: p.TeamID})
|
||||
if err != nil {
|
||||
return db.Sandbox{}, fmt.Errorf("template %q not found: %w", p.Template, err)
|
||||
}
|
||||
templateTeamID := tmpl.TeamID
|
||||
templateID := tmpl.ID
|
||||
templateDefaultUser := tmpl.DefaultUser
|
||||
var templateDefaultEnv map[string]string
|
||||
if p.Template != "minimal" {
|
||||
tmpl, err := s.DB.GetTemplateByTeam(ctx, db.GetTemplateByTeamParams{Name: p.Template, TeamID: p.TeamID})
|
||||
if err != nil {
|
||||
return db.Sandbox{}, fmt.Errorf("template %q not found: %w", p.Template, err)
|
||||
}
|
||||
templateTeamID = tmpl.TeamID
|
||||
templateID = tmpl.ID
|
||||
templateDefaultUser = tmpl.DefaultUser
|
||||
if len(tmpl.DefaultEnv) > 0 {
|
||||
_ = json.Unmarshal(tmpl.DefaultEnv, &templateDefaultEnv)
|
||||
}
|
||||
if tmpl.Type == "snapshot" {
|
||||
p.VCPUs = tmpl.Vcpus
|
||||
p.MemoryMB = tmpl.MemoryMb
|
||||
}
|
||||
if len(tmpl.DefaultEnv) > 0 {
|
||||
_ = json.Unmarshal(tmpl.DefaultEnv, &templateDefaultEnv)
|
||||
}
|
||||
if tmpl.Type == "snapshot" {
|
||||
p.VCPUs = tmpl.Vcpus
|
||||
p.MemoryMB = tmpl.MemoryMb
|
||||
}
|
||||
|
||||
if !p.TeamID.Valid {
|
||||
@ -461,59 +458,140 @@ func (s *SandboxService) resumeInBackground(
|
||||
})
|
||||
}
|
||||
|
||||
// CreateSnapshot takes a live snapshot of a running sandbox, publishing
|
||||
// the result as a new template owned by the sandbox's team. Returns the
|
||||
// inserted template record.
|
||||
func (s *SandboxService) CreateSnapshot(ctx context.Context, sandboxID, teamID pgtype.UUID, name string) (db.Template, error) {
|
||||
// CreateSnapshot asynchronously snapshots a running or paused sandbox,
|
||||
// publishing the result as a new template owned by the sandbox's team. The DB
|
||||
// CAS from the sandbox's current status to "snapshotting" is the authoritative
|
||||
// gate against concurrent Pause/Snapshot/Destroy calls; if it loses, no agent
|
||||
// RPC fires. A running sandbox is snapshotted live (CH briefly paused, then
|
||||
// resumed); a paused sandbox is snapshotted from its on-disk artefacts without
|
||||
// reviving the VM. Either way the sandbox returns to its original status on
|
||||
// completion. Returns the sandbox (now "snapshotting") and the resolved name.
|
||||
func (s *SandboxService) CreateSnapshot(ctx context.Context, sandboxID, teamID pgtype.UUID, name string) (db.Sandbox, string, error) {
|
||||
sb, err := s.DB.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: teamID})
|
||||
if err != nil {
|
||||
return db.Template{}, fmt.Errorf("sandbox not found: %w", err)
|
||||
return db.Sandbox{}, "", fmt.Errorf("sandbox not found: %w", err)
|
||||
}
|
||||
if sb.Status != "running" {
|
||||
return db.Template{}, fmt.Errorf("sandbox is not running (status: %s)", sb.Status)
|
||||
if sb.Status != "running" && sb.Status != "paused" {
|
||||
return db.Sandbox{}, "", fmt.Errorf("sandbox is not running or paused (status: %s)", sb.Status)
|
||||
}
|
||||
origStatus := sb.Status
|
||||
|
||||
if name == "" {
|
||||
name = id.NewSnapshotName()
|
||||
}
|
||||
if err := validate.SafeName(name); err != nil {
|
||||
return db.Template{}, fmt.Errorf("invalid name: %w", err)
|
||||
return db.Sandbox{}, "", fmt.Errorf("invalid name: %w", err)
|
||||
}
|
||||
// Reject duplicate names up front so we don't pause the VM and dump memory
|
||||
// only to fail on the template insert at the very end.
|
||||
if _, err := s.DB.GetTemplateByTeam(ctx, db.GetTemplateByTeamParams{Name: name, TeamID: teamID}); err == nil {
|
||||
return db.Sandbox{}, "", fmt.Errorf("conflict: a snapshot named %q already exists", name)
|
||||
}
|
||||
|
||||
if _, err := s.DB.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
|
||||
ID: sandboxID, Status: origStatus, Status_2: "snapshotting",
|
||||
}); err != nil {
|
||||
return db.Sandbox{}, "", fmt.Errorf("sandbox is no longer in %s state", origStatus)
|
||||
}
|
||||
|
||||
agent, err := s.agentForHost(ctx, sb.HostID)
|
||||
if err != nil {
|
||||
return db.Template{}, err
|
||||
// Roll back the CAS so the sandbox isn't stuck in "snapshotting".
|
||||
if _, rerr := s.DB.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
|
||||
ID: sandboxID, Status: "snapshotting", Status_2: origStatus,
|
||||
}); rerr != nil {
|
||||
slog.Warn("failed to roll back snapshotting→"+origStatus, "id", id.FormatSandboxID(sandboxID), "error", rerr)
|
||||
}
|
||||
return db.Sandbox{}, "", err
|
||||
}
|
||||
|
||||
sandboxIDStr := id.FormatSandboxID(sandboxID)
|
||||
hostIDStr := id.FormatHostID(sb.HostID)
|
||||
teamIDStr := id.FormatTeamID(sb.TeamID)
|
||||
|
||||
// Notify other clients that the badge moved to "snapshotting".
|
||||
s.publishStateChanged(ctx, sandboxIDStr, teamIDStr, hostIDStr, origStatus, "snapshotting")
|
||||
|
||||
go s.snapshotInBackground(sandboxID, sandboxIDStr, hostIDStr, teamIDStr, teamID, agent, name, origStatus, sb.Vcpus, sb.MemoryMb)
|
||||
|
||||
sb.Status = "snapshotting"
|
||||
return sb, name, nil
|
||||
}
|
||||
|
||||
func (s *SandboxService) snapshotInBackground(
|
||||
sandboxID pgtype.UUID, sandboxIDStr, hostIDStr, teamIDStr string, teamID pgtype.UUID,
|
||||
agent hostagentClient, name, origStatus string, vcpus, memoryMB int32,
|
||||
) {
|
||||
bgCtx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
|
||||
defer cancel()
|
||||
|
||||
newTemplateID := id.NewSandboxID() // any random UUID
|
||||
templateUUID := pgtype.UUID{Bytes: newTemplateID.Bytes, Valid: true}
|
||||
|
||||
resp, err := agent.CreateSnapshot(ctx, connect.NewRequest(&pb.CreateSnapshotRequest{
|
||||
SandboxId: id.FormatSandboxID(sandboxID),
|
||||
resp, err := agent.CreateSnapshot(bgCtx, connect.NewRequest(&pb.CreateSnapshotRequest{
|
||||
SandboxId: sandboxIDStr,
|
||||
Name: name,
|
||||
TeamId: id.UUIDString(teamID),
|
||||
TemplateId: id.UUIDString(templateUUID),
|
||||
}))
|
||||
if err != nil {
|
||||
return db.Template{}, fmt.Errorf("agent snapshot: %w", err)
|
||||
|
||||
// Either way, the host-side op is done; return the badge to its original
|
||||
// status (running for a live snapshot, paused for an on-disk one). Use a CAS
|
||||
// so a concurrent Destroy (which sets "stopping") wins: if the CAS misses,
|
||||
// the sandbox is no longer ours and we must NOT announce its old status. The
|
||||
// snapshot itself is still valid and is registered below — a snapshot
|
||||
// template outlives its source sandbox.
|
||||
if _, derr := s.DB.UpdateSandboxStatusIf(bgCtx, db.UpdateSandboxStatusIfParams{
|
||||
ID: sandboxID, Status: "snapshotting", Status_2: origStatus,
|
||||
}); derr != nil {
|
||||
slog.Warn("snapshotting→"+origStatus+" CAS missed (sandbox moved on); skipping state signal", "sandbox_id", sandboxIDStr, "error", derr)
|
||||
} else {
|
||||
s.publishStateChanged(bgCtx, sandboxIDStr, teamIDStr, hostIDStr, "snapshotting", origStatus)
|
||||
}
|
||||
|
||||
tmpl, err := s.DB.InsertTemplate(ctx, db.InsertTemplateParams{
|
||||
if err != nil {
|
||||
slog.Warn("background snapshot failed", "sandbox_id", sandboxIDStr, "error", err)
|
||||
s.publishEvent(bgCtx, SandboxStateEvent{
|
||||
Event: "sandbox.snapshot_failed", SandboxID: sandboxIDStr, TeamID: teamIDStr, HostID: hostIDStr,
|
||||
Metadata: map[string]string{"name": name}, Error: err.Error(), Timestamp: time.Now().Unix(),
|
||||
})
|
||||
return
|
||||
}
|
||||
|
||||
if _, err := s.DB.InsertTemplate(bgCtx, db.InsertTemplateParams{
|
||||
ID: templateUUID,
|
||||
Name: name,
|
||||
Type: "snapshot",
|
||||
Vcpus: sb.Vcpus,
|
||||
MemoryMb: sb.MemoryMb,
|
||||
Vcpus: vcpus,
|
||||
MemoryMb: memoryMB,
|
||||
SizeBytes: resp.Msg.SizeBytes,
|
||||
TeamID: teamID,
|
||||
DefaultUser: "",
|
||||
DefaultEnv: []byte("{}"),
|
||||
Metadata: []byte("{}"),
|
||||
})
|
||||
if err != nil {
|
||||
return db.Template{}, fmt.Errorf("insert template: %w", err)
|
||||
}); err != nil {
|
||||
slog.Warn("failed to insert snapshot template", "sandbox_id", sandboxIDStr, "name", name, "error", err)
|
||||
s.publishEvent(bgCtx, SandboxStateEvent{
|
||||
Event: "sandbox.snapshot_failed", SandboxID: sandboxIDStr, TeamID: teamIDStr, HostID: hostIDStr,
|
||||
Metadata: map[string]string{"name": name}, Error: "failed to register snapshot", Timestamp: time.Now().Unix(),
|
||||
})
|
||||
return
|
||||
}
|
||||
return tmpl, nil
|
||||
|
||||
s.publishEvent(bgCtx, SandboxStateEvent{
|
||||
Event: "sandbox.snapshotted", SandboxID: sandboxIDStr, TeamID: teamIDStr, HostID: hostIDStr,
|
||||
Metadata: map[string]string{"name": name}, Timestamp: time.Now().Unix(),
|
||||
})
|
||||
}
|
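The compare-and-swap gate that `CreateSnapshot` and `snapshotInBackground` rely on can be sketched in isolation. The snippet below is an illustration under stated assumptions: `store` and `updateStatusIf` are hypothetical in-memory stand-ins for the database and `UpdateSandboxStatusIf`, and the real query's behaviour on a miss is assumed to be "return an error".

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errCASMissed is returned when the row is not in the expected status.
var errCASMissed = errors.New("status changed concurrently")

// store is a hypothetical in-memory stand-in for the sandboxes table.
type store struct {
	mu     sync.Mutex
	status map[string]string
}

// updateStatusIf sets the status to next only if it currently equals want,
// mirroring the shape of an UPDATE ... WHERE status = want query.
func (s *store) updateStatusIf(id, want, next string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.status[id] != want {
		return errCASMissed
	}
	s.status[id] = next
	return nil
}

func main() {
	s := &store{status: map[string]string{"sb-1": "running"}}

	// Claim the sandbox for a snapshot.
	fmt.Println("claim:", s.updateStatusIf("sb-1", "running", "snapshotting"))

	// A second snapshot request loses the race and must not call the agent.
	fmt.Println("second claim:", s.updateStatusIf("sb-1", "running", "snapshotting"))

	// A concurrent destroy moves the sandbox on while the snapshot runs.
	s.mu.Lock()
	s.status["sb-1"] = "stopping"
	s.mu.Unlock()

	// The restore CAS misses, so the old status is not re-announced.
	if err := s.updateStatusIf("sb-1", "snapshotting", "running"); err != nil {
		fmt.Println("restore skipped:", err)
	}
}
```

The point of the pattern is that both the claim and the restore name the status they expect to find, so whichever caller moved the sandbox last keeps ownership of it.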
||||
|
||||
// publishStateChanged emits a transient sandbox.state_changed event so the
|
||||
// dashboard flips the status badge during a transition that has no terminal
|
||||
// lifecycle verb of its own (e.g. the snapshotting round-trip).
|
||||
func (s *SandboxService) publishStateChanged(ctx context.Context, sandboxIDStr, teamIDStr, hostIDStr, from, to string) {
|
||||
s.publishEvent(ctx, SandboxStateEvent{
|
||||
Event: "sandbox.state_changed", SandboxID: sandboxIDStr, TeamID: teamIDStr, HostID: hostIDStr,
|
||||
Metadata: map[string]string{"from": from, "to": to}, Timestamp: time.Now().Unix(),
|
||||
})
|
||||
}
|
||||
|
||||
// Destroy stops a sandbox asynchronously. Pre-marks the DB status as
|
||||
|
||||
@ -1,393 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
#
|
||||
# prepare-wrenn-user.sh — Create the wrenn system user and configure minimal privileges.
|
||||
#
|
||||
# Creates a locked-down 'wrenn' system user that can run wrenn-agent and wrenn-cp
|
||||
# with only the privileges they need. The agent binary gets Linux capabilities
|
||||
# via setcap — no sudo is configured for the wrenn user at all. If an attacker
|
||||
# compromises the wrenn user, they cannot escalate via sudo.
|
||||
#
|
||||
# What this script does:
|
||||
# 1. Creates the 'wrenn' system user (bash shell for debugging, no home dir)
|
||||
# 2. Creates required directories with correct ownership
|
||||
# 3. Sets Linux capabilities on wrenn-agent and all child binaries
|
||||
# 4. Installs an apt hook to restore capabilities after package updates
|
||||
# 5. Installs a udev rule so the wrenn group can open /dev/net/tun
|
||||
# 6. Ensures required kernel modules are loaded
|
||||
# 7. Installs a sudoers drop-in (comment-only, no grants; absence is the cage)
|
||||
# 8. Writes systemd unit files for both wrenn-agent and wrenn-cp
|
||||
#
|
||||
# Usage:
|
||||
# sudo bash scripts/prepare-wrenn-user.sh
|
||||
#
|
||||
# Prerequisites:
|
||||
# - wrenn-agent binary at /usr/local/bin/wrenn-agent
|
||||
# - wrenn-cp binary at /usr/local/bin/wrenn-cp
|
||||
# - cloud-hypervisor binary at /usr/local/bin/cloud-hypervisor
|
||||
# - libcap2-bin installed (for setcap)
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
# ── Guard ────────────────────────────────────────────────────────────────────
|
||||
|
||||
if [[ $EUID -ne 0 ]]; then
|
||||
echo "ERROR: This script must be run as root."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# ── Configuration ────────────────────────────────────────────────────────────
|
||||
|
||||
WRENN_USER="wrenn"
|
||||
WRENN_GROUP="wrenn"
|
||||
WRENN_DIR="/var/lib/wrenn"
|
||||
AGENT_BIN="/usr/local/bin/wrenn-agent"
|
||||
CP_BIN="/usr/local/bin/wrenn-cp"
|
||||
CH_BIN="/usr/local/bin/cloud-hypervisor"
|
||||
RESTORE_CAPS_SCRIPT="/etc/wrenn/restore-caps.sh"
|
||||
|
||||
# ── 1. Create system user ───────────────────────────────────────────────────
|
||||
|
||||
if id "${WRENN_USER}" &>/dev/null; then
|
||||
echo "==> User '${WRENN_USER}' already exists, skipping creation."
|
||||
else
|
||||
echo "==> Creating system user '${WRENN_USER}'..."
|
||||
useradd \
|
||||
--system \
|
||||
--no-create-home \
|
||||
--home-dir "${WRENN_DIR}" \
|
||||
--shell /bin/bash \
|
||||
"${WRENN_USER}"
|
||||
fi
|
||||
|
||||
# Add wrenn to kvm group for /dev/kvm access.
|
||||
if getent group kvm &>/dev/null; then
|
||||
usermod -aG kvm "${WRENN_USER}"
|
||||
echo "==> Added '${WRENN_USER}' to 'kvm' group."
|
||||
fi
|
||||
|
||||
# ── 2. Create directories with correct ownership ────────────────────────────
|
||||
|
||||
echo "==> Setting up directories..."
|
||||
|
||||
directories=(
|
||||
"${WRENN_DIR}"
|
||||
"${WRENN_DIR}/images"
|
||||
"${WRENN_DIR}/kernels"
|
||||
"${WRENN_DIR}/sandboxes"
|
||||
"${WRENN_DIR}/snapshots"
|
||||
"${WRENN_DIR}/logs"
|
||||
"/run/netns"
|
||||
)
|
||||
|
||||
for dir in "${directories[@]}"; do
|
||||
mkdir -p "${dir}"
|
||||
done
|
||||
|
||||
# Only chown wrenn-owned dirs (not /run/netns which is system-managed).
|
||||
for dir in "${WRENN_DIR}" "${WRENN_DIR}/images" "${WRENN_DIR}/kernels" \
|
||||
"${WRENN_DIR}/sandboxes" "${WRENN_DIR}/snapshots" "${WRENN_DIR}/logs"; do
|
||||
chown "${WRENN_USER}:${WRENN_GROUP}" "${dir}"
|
||||
chmod 750 "${dir}"
|
||||
done
|
||||
|
||||
# ── 3. Set capabilities on binaries ─────────────────────────────────────────
|
||||
#
|
||||
# These capabilities replace full root access. The wrenn-agent binary gets
|
||||
# exactly the capabilities it needs for:
|
||||
#
|
||||
# CAP_SYS_ADMIN — network namespaces (netns create/enter), mount namespaces
|
||||
# (unshare -m), losetup, dmsetup, mount/umount
|
||||
# CAP_NET_ADMIN — veth/TAP creation (netlink), iptables rules, IP forwarding,
|
||||
# routing table manipulation
|
||||
# CAP_NET_RAW — raw socket access (needed by iptables internally)
|
||||
# CAP_SYS_PTRACE — reading /proc/self/ns/net (netns.Get)
|
||||
# CAP_KILL — sending SIGTERM/SIGKILL to Cloud Hypervisor processes
|
||||
# CAP_DAC_OVERRIDE — accessing /dev/loop*, /dev/mapper/*, /dev/net/tun,
|
||||
# /proc/sys/net/ipv4/ip_forward
|
||||
# CAP_MKNOD — creating device nodes (dm-snapshot)
|
||||
#
|
||||
# The 'ep' suffix means Effective + Permitted (granted at exec time).
|
||||
|
||||
echo "==> Setting capabilities on wrenn-agent..."
|
||||
|
||||
if [[ ! -f "${AGENT_BIN}" ]]; then
|
||||
echo "WARNING: ${AGENT_BIN} not found, skipping setcap. Install the binary first."
|
||||
else
|
||||
setcap \
|
||||
cap_sys_admin,cap_net_admin,cap_net_raw,cap_sys_ptrace,cap_kill,cap_dac_override,cap_mknod+ep \
|
||||
"${AGENT_BIN}"
|
||||
|
||||
echo " Capabilities set on ${AGENT_BIN}:"
|
||||
getcap "${AGENT_BIN}"
|
||||
fi
|
||||
|
||||
# Cloud Hypervisor also needs capabilities when spawned by a non-root parent.
|
||||
# CAP_NET_ADMIN is required for network device access inside the netns.
|
||||
if [[ -f "${CH_BIN}" ]]; then
|
||||
setcap cap_net_admin,cap_sys_admin,cap_dac_override+ep "${CH_BIN}"
|
||||
echo " Capabilities set on ${CH_BIN}:"
|
||||
getcap "${CH_BIN}"
|
||||
fi
|
||||
|
||||
# ── Helper: resolve binary path and apply setcap ────────────────────────────
|
||||
#
|
||||
# Uses `command -v` to find the binary in PATH (handles /usr/bin vs /usr/sbin
|
||||
# differences across distros), then `readlink -f` to resolve symlinks so that
|
||||
# setcap hits the real inode (important for iptables-nft/alternatives).
|
||||
|
||||
setcap_binary() {
|
||||
local name="$1" caps="$2"
|
||||
local bin
|
||||
bin=$(command -v "$name" 2>/dev/null) || {
|
||||
echo " WARNING: ${name} not found in PATH, skipping."
|
||||
return 0
|
||||
}
|
||||
bin=$(readlink -f "$bin")
|
||||
setcap "$caps" "$bin"
|
||||
echo " $(getcap "$bin")"
|
||||
}
|
||||
|
||||
# The child binaries invoked by wrenn-agent (iptables, losetup, dmsetup, etc.)
|
||||
# also need capabilities since they'll be exec'd by a non-root user.
|
||||
echo "==> Setting capabilities on child binaries..."
|
||||
|
||||
setcap_binary iptables "cap_net_admin,cap_net_raw+ep"
|
||||
setcap_binary iptables-save "cap_net_admin,cap_net_raw+ep"
|
||||
setcap_binary ip "cap_sys_admin,cap_net_admin+ep"
|
||||
setcap_binary sysctl "cap_net_admin+ep"
|
||||
setcap_binary losetup "cap_sys_admin,cap_dac_override+ep"
|
||||
setcap_binary blockdev "cap_sys_admin,cap_dac_override+ep"
|
||||
setcap_binary dmsetup "cap_sys_admin,cap_dac_override,cap_mknod+ep"
|
||||
setcap_binary e2fsck "cap_sys_admin,cap_dac_override+ep"
|
||||
setcap_binary resize2fs "cap_sys_admin,cap_dac_override+ep"
|
||||
setcap_binary dd "cap_dac_override+ep"
|
||||
setcap_binary unshare "cap_sys_admin+ep"
|
||||
setcap_binary mount "cap_sys_admin,cap_dac_override+ep"
|
||||
|
||||
# ── 4. Persist capabilities across package updates ──────────────────────────
|
||||
#
|
||||
# apt/dpkg overwrites binaries on package updates, which strips the xattr-based
|
||||
# capabilities set by setcap. This installs:
|
||||
# - /etc/wrenn/restore-caps.sh: re-applies setcap to all child binaries
|
||||
# - /etc/apt/apt.conf.d/99-wrenn-setcap: apt post-invoke hook that calls it
|
||||
|
||||
echo "==> Installing capability restore hook..."
|
||||
|
||||
mkdir -p /etc/wrenn
|
||||
|
||||
cat > "${RESTORE_CAPS_SCRIPT}" << 'RESTORE'
|
||||
#!/usr/bin/env bash
|
||||
#
|
||||
# restore-caps.sh — Re-apply Linux capabilities to wrenn child binaries.
|
||||
# Called automatically by apt after package updates (see /etc/apt/apt.conf.d/99-wrenn-setcap).
|
||||
# Can also be run manually: sudo /etc/wrenn/restore-caps.sh
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
setcap_binary() {
|
||||
local name="$1" caps="$2"
|
||||
local bin
|
||||
bin=$(command -v "$name" 2>/dev/null) || return 0
|
||||
bin=$(readlink -f "$bin")
|
||||
setcap "$caps" "$bin" 2>/dev/null || true
|
||||
}
|
||||
|
||||
# wrenn-agent and cloud-hypervisor (only if present — they aren't package-managed).
|
||||
[[ -f /usr/local/bin/wrenn-agent ]] && \
|
||||
setcap cap_sys_admin,cap_net_admin,cap_net_raw,cap_sys_ptrace,cap_kill,cap_dac_override,cap_mknod+ep \
|
||||
/usr/local/bin/wrenn-agent 2>/dev/null || true
|
||||
[[ -f /usr/local/bin/cloud-hypervisor ]] && \
|
||||
setcap cap_net_admin,cap_sys_admin,cap_dac_override+ep \
|
||||
/usr/local/bin/cloud-hypervisor 2>/dev/null || true
|
||||
|
||||
# Child binaries (these are the ones wiped by apt).
|
||||
setcap_binary iptables "cap_net_admin,cap_net_raw+ep"
|
||||
setcap_binary iptables-save "cap_net_admin,cap_net_raw+ep"
|
||||
setcap_binary ip "cap_sys_admin,cap_net_admin+ep"
|
||||
setcap_binary sysctl "cap_net_admin+ep"
|
||||
setcap_binary losetup "cap_sys_admin,cap_dac_override+ep"
|
||||
setcap_binary blockdev "cap_sys_admin,cap_dac_override+ep"
|
||||
setcap_binary dmsetup "cap_sys_admin,cap_dac_override,cap_mknod+ep"
|
||||
setcap_binary e2fsck "cap_sys_admin,cap_dac_override+ep"
|
||||
setcap_binary resize2fs "cap_sys_admin,cap_dac_override+ep"
|
||||
setcap_binary dd "cap_dac_override+ep"
|
||||
setcap_binary unshare "cap_sys_admin+ep"
|
||||
setcap_binary mount "cap_sys_admin,cap_dac_override+ep"
|
||||
RESTORE
|
||||
|
||||
chmod 755 "${RESTORE_CAPS_SCRIPT}"
|
||||
|
||||
cat > /etc/apt/apt.conf.d/99-wrenn-setcap << 'APT'
|
||||
// Re-apply Linux capabilities to wrenn child binaries after any package update.
|
||||
// Capabilities (xattr) are stripped when dpkg overwrites a binary.
|
||||
DPkg::Post-Invoke { "/etc/wrenn/restore-caps.sh"; };
|
||||
APT
|
||||
|
||||
echo " Installed ${RESTORE_CAPS_SCRIPT} and apt post-invoke hook."
|
||||
|
||||
# ── 5. Device access ────────────────────────────────────────────────────────
|
||||
#
|
||||
# /dev/kvm — handled by kvm group membership above
|
||||
# /dev/net/tun — needs to be accessible by wrenn user
|
||||
|
||||
echo "==> Configuring device access..."
|
||||
|
||||
# Ensure /dev/net/tun is accessible (udev rule for persistence across reboots).
|
||||
cat > /etc/udev/rules.d/99-wrenn.rules << 'UDEV'
|
||||
# Allow wrenn user access to TUN device for TAP networking.
|
||||
SUBSYSTEM=="misc", KERNEL=="tun", GROUP="wrenn", MODE="0660"
|
||||
UDEV
|
||||
|
||||
udevadm control --reload-rules 2>/dev/null || true
|
||||
echo " Installed udev rule for /dev/net/tun."
|
||||
|
||||
# ── 6. Kernel modules ───────────────────────────────────────────────────────
|
||||
|
||||
echo "==> Ensuring kernel modules are loaded..."
|
||||
|
||||
modules=(dm_snapshot dm_mod loop tun)
|
||||
for mod in "${modules[@]}"; do
|
||||
if ! lsmod | grep -q "^${mod} "; then
|
||||
modprobe "${mod}" 2>/dev/null && echo " Loaded ${mod}" || echo " WARNING: Could not load ${mod}"
|
||||
else
|
||||
echo " ${mod} already loaded."
|
||||
fi
|
||||
done
|
||||
|
||||
# Persist across reboots.
|
||||
for mod in "${modules[@]}"; do
|
||||
grep -qxF "${mod}" /etc/modules-load.d/wrenn.conf 2>/dev/null || echo "${mod}" >> /etc/modules-load.d/wrenn.conf
|
||||
done
|
||||
echo " Module persistence written to /etc/modules-load.d/wrenn.conf."
|
||||
|
||||
# ── 7. Sudoers ──────────────────────────────────────────────────────────────
|
||||
#
|
||||
# The wrenn user has no sudo grants. The absence of a grant is the cage — an
|
||||
# explicit "!ALL" deny is weaker due to known bypasses (CVE-2019-14287).
|
||||
# This file exists purely as documentation for operators running `sudo -l`.
|
||||
|
||||
echo "==> Writing sudoers drop-in..."
|
||||
|
||||
cat > /etc/sudoers.d/wrenn << 'SUDOERS'
|
||||
# Wrenn system user — no sudo access permitted.
|
||||
# All privilege is granted via Linux capabilities on specific binaries (setcap).
|
||||
# This file contains no active rules. The absence of any grant is intentional
|
||||
# and is the strongest way to deny escalation.
|
||||
#
|
||||
# Do not add rules here. If the wrenn user needs new privileges, use setcap
|
||||
# on the specific binary instead.
|
||||
SUDOERS
|
||||
|
||||
chmod 440 /etc/sudoers.d/wrenn
|
||||
visudo -c -f /etc/sudoers.d/wrenn
|
||||
echo " /etc/sudoers.d/wrenn installed and validated."
|
||||
|
||||
# ── 8. Systemd units ────────────────────────────────────────────────────────
|
||||
|
||||
echo "==> Writing systemd service files..."
|
||||
|
||||
cat > /etc/systemd/system/wrenn-agent.service << 'UNIT'
|
||||
[Unit]
|
||||
Description=Wrenn Host Agent
|
||||
After=network-online.target
|
||||
Wants=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=wrenn
|
||||
Group=wrenn
|
||||
EnvironmentFile=-/etc/wrenn/agent.env
|
||||
|
||||
# The binary has capabilities set via setcap. These systemd directives ensure
|
||||
# the capabilities are inherited into the process at exec time.
|
||||
AmbientCapabilities=CAP_SYS_ADMIN CAP_NET_ADMIN CAP_NET_RAW CAP_SYS_PTRACE CAP_KILL CAP_DAC_OVERRIDE CAP_MKNOD
|
||||
CapabilityBoundingSet=CAP_SYS_ADMIN CAP_NET_ADMIN CAP_NET_RAW CAP_SYS_PTRACE CAP_KILL CAP_DAC_OVERRIDE CAP_MKNOD
|
||||
|
||||
# IMPORTANT: must be false — child binaries (iptables, losetup, dmsetup, etc.)
|
||||
# have their own file capabilities via setcap which must be honored at exec time.
|
||||
NoNewPrivileges=false
|
||||
|
||||
# Enable IP forwarding before the agent starts. The "+" prefix runs this
|
||||
# directive as root (bypassing User=wrenn) so it can write to procfs.
|
||||
ExecStartPre=+/bin/sh -c 'sysctl -w net.ipv4.ip_forward=1'
|
||||
|
||||
ExecStart=/usr/local/bin/wrenn-agent --address ${WRENN_ADVERTISE_ADDR}
|
||||
|
||||
Restart=on-failure
|
||||
RestartSec=5
|
||||
|
||||
# File descriptor limits (Cloud Hypervisor + loop devices + sockets).
|
||||
LimitNOFILE=65536
|
||||
LimitNPROC=4096
|
||||
|
||||
# IO priority + cgroup weight. Large-VM snapshot writes (CH memfile dump,
|
||||
# zero-page hole punching, dm-snapshot flatten) can saturate a single-disk
|
||||
# host and starve sshd/journal reads. Best-effort scheduling class +
|
||||
# below-default cgroup weight lets latency-sensitive workloads keep up.
|
||||
IOSchedulingClass=best-effort
|
||||
IOSchedulingPriority=5
|
||||
IOWeight=50
|
||||
|
||||
# Protect host filesystem — only allow access to what's needed.
|
||||
ProtectHome=true
|
||||
ReadWritePaths=/var/lib/wrenn /tmp /run/netns /dev/mapper
|
||||
ReadOnlyPaths=/usr/local/bin/cloud-hypervisor
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
UNIT
|
||||
|
||||
cat > /etc/systemd/system/wrenn-cp.service << 'UNIT'
|
||||
[Unit]
|
||||
Description=Wrenn Control Plane
|
||||
After=network-online.target postgresql.service
|
||||
Wants=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=wrenn
|
||||
Group=wrenn
|
||||
EnvironmentFile=-/etc/wrenn/cp.env
|
||||
|
||||
# Control plane is fully unprivileged — no capabilities needed.
|
||||
NoNewPrivileges=true
|
||||
CapabilityBoundingSet=
|
||||
|
||||
ExecStart=/usr/local/bin/wrenn-cp
|
||||
|
||||
Restart=on-failure
|
||||
RestartSec=5
|
||||
|
||||
ProtectHome=true
|
||||
ProtectSystem=strict
|
||||
ReadWritePaths=/tmp
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
UNIT
|
||||
|
||||
mkdir -p /etc/wrenn
|
||||
touch /etc/wrenn/agent.env /etc/wrenn/cp.env
|
||||
chmod 640 /etc/wrenn/agent.env /etc/wrenn/cp.env
|
||||
chown "root:${WRENN_GROUP}" /etc/wrenn/agent.env /etc/wrenn/cp.env
|
||||
|
||||
systemctl daemon-reload
|
||||
echo " wrenn-agent.service and wrenn-cp.service installed."
|
||||
|
||||
# ── Done ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
echo ""
|
||||
echo "=== Setup complete ==="
|
||||
echo ""
|
||||
echo "Next steps:"
|
||||
echo " 1. Copy wrenn-agent and wrenn-cp binaries to /usr/local/bin/"
|
||||
echo " 2. Edit /etc/wrenn/agent.env with WRENN_CP_URL and WRENN_ADVERTISE_ADDR"
|
||||
echo " 3. Edit /etc/wrenn/cp.env with DATABASE_URL and other control plane config"
|
||||
echo " 4. systemctl enable --now wrenn-agent"
|
||||
echo " 5. systemctl enable --now wrenn-cp"
|
||||
echo ""
|
||||
echo "Security summary:"
|
||||
echo " - wrenn user: bash shell (for debugging), no home, no sudo (no grants in sudoers)"
|
||||
echo " - wrenn-agent: runs as wrenn with 7 capabilities via setcap (not root)"
|
||||
echo " - wrenn-cp: runs as wrenn with zero capabilities"
|
||||
echo " - Capabilities auto-restored after apt upgrades via /etc/wrenn/restore-caps.sh"
|
||||
echo ""
|
||||
@ -38,7 +38,9 @@ IMAGE_NAME="$2"
|
||||
OUTPUT_DIR="${WRENN_IMAGES_PATH}/${IMAGE_NAME}"
|
||||
OUTPUT_FILE="${OUTPUT_DIR}/rootfs.ext4"
|
||||
MOUNT_DIR="/tmp/wrenn-rootfs-build"
|
||||
TAR_FILE="/tmp/wrenn-rootfs-export-${IMAGE_NAME}.tar"
|
||||
# IMAGE_NAME may contain slashes (e.g. teams/<team>/<id>); flatten them so the
|
||||
# temp tar is a single file in /tmp rather than a path into a missing dir.
|
||||
TAR_FILE="/tmp/wrenn-rootfs-export-${IMAGE_NAME//\//_}.tar"
|
||||
|
||||
# Verify the container exists.
|
||||
if ! docker inspect "${CONTAINER}" > /dev/null 2>&1; then
|
||||
@ -121,16 +123,24 @@ if [ -z "${TINI_BIN}" ]; then
aarch64) TINI_ARCH="arm64" ;;
*) echo "ERROR: Unsupported architecture: ${ARCH}"; exit 1 ;;
esac
# Use the statically linked tini so the binary runs regardless of the
# guest's libc (glibc on Ubuntu/Arch/Fedora, musl on Alpine).
TINI_VERSION="v0.19.0"
TINI_URL="https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}"
TINI_TMP="/tmp/tini-${TINI_ARCH}"
echo " Downloading tini ${TINI_VERSION} (${TINI_ARCH})..."
TINI_URL="https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-static-${TINI_ARCH}"
TINI_TMP="/tmp/tini-static-${TINI_ARCH}"
echo " Downloading tini ${TINI_VERSION} static (${TINI_ARCH})..."
curl -fsSL "${TINI_URL}" -o "${TINI_TMP}"
chmod +x "${TINI_TMP}"
TINI_BIN="${TINI_TMP}"
fi
sudo mkdir -p "${MOUNT_DIR}/sbin"
sudo cp "${TINI_BIN}" "${MOUNT_DIR}/sbin/tini"
# On usr-merged distros (e.g. Fedora) /sbin is a symlink to /usr/bin, so a tini
# already at /usr/bin/tini IS /sbin/tini — copying onto itself errors. Skip then.
if [ "${TINI_BIN}" -ef "${MOUNT_DIR}/sbin/tini" ]; then
echo " tini already at /sbin/tini (usr-merged); skipping copy"
else
sudo cp "${TINI_BIN}" "${MOUNT_DIR}/sbin/tini"
fi
sudo chmod 755 "${MOUNT_DIR}/sbin/tini"

# Step 6: Verify injected binaries and required container packages.
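The `-ef` guard in this hunk can be tried in isolation. `[ A -ef B ]` is true when both names resolve to the same device and inode, which is the case when `sbin` is a symlink to `usr/bin`. GNU `cp` refuses to copy a file onto itself and returns a non-zero status, so without the guard the script would abort under `set -e`. The sketch below uses a throwaway directory; `copy_unless_same` and the layout are assumptions for illustration, not code from the script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a throwaway tree that mimics a usr-merged layout: sbin -> usr/bin.
root="$(mktemp -d)"
trap 'rm -rf "${root}"' EXIT
mkdir -p "${root}/usr/bin"
: > "${root}/usr/bin/tini"      # empty placeholder standing in for tini
ln -s usr/bin "${root}/sbin"    # relative symlink, as on merged-/usr systems

# Hypothetical helper: copy SRC to DST unless both names already refer to
# the same file, which is what the -ef test checks.
copy_unless_same() {
  local src="$1" dst="$2"
  if [ "${src}" -ef "${dst}" ]; then
    echo "skip"
  else
    cp "${src}" "${dst}"
    echo "copied"
  fi
}

copy_unless_same "${root}/usr/bin/tini" "${root}/sbin/tini"
```

When the destination does not exist yet, `-ef` is false and the copy proceeds as usual.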
@ -1,32 +1,46 @@
#!/usr/bin/env bash
#
# update-debug-rootfs.sh — Build envd and inject it (plus wrenn-init + tini) into the debug rootfs.
# update-minimal-rootfs.sh — Rebuild envd and inject it (plus wrenn-init + tini)
# into the system base rootfs images.
#
# This script:
# 1. Builds a fresh envd static binary via make
# 2. Mounts the rootfs image
# 3. Copies envd, wrenn-init, and tini into the image
# 4. Unmounts cleanly
# 1. Builds a fresh envd static binary via make (once)
# 2. For each system base rootfs (ubuntu/alpine/arch/fedora): mounts it,
# copies envd + wrenn-init + tini in, and unmounts cleanly
#
# Usage:
# bash scripts/update-debug-rootfs.sh [rootfs_path]
# bash scripts/update-minimal-rootfs.sh [rootfs_path]
#
# Defaults to /var/lib/wrenn/images/minimal/rootfs.ext4
# With no argument it updates all four system base rootfs images under
# ${WRENN_DIR}/images/teams/<platform>/<id>/rootfs.ext4
# With a path argument it updates only that single rootfs.

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
WRENN_DIR="${WRENN_DIR:-/var/lib/wrenn}"
ROOTFS="${1:-${WRENN_DIR}/images/minimal/rootfs.ext4}"
MOUNT_DIR="/tmp/wrenn-rootfs-update"

if [ ! -f "${ROOTFS}" ]; then
echo "ERROR: Rootfs not found at ${ROOTFS}"
exit 1
# base36(all-zeros UUID) = platform team that owns every system base template.
PLATFORM_TEAM_B36="0000000000000000000000000"

# System base template IDs (well-known reserved IDs 0..3). Single-digit IDs, so
# the 25-char base36 string is just the zero-padded decimal.
SYSTEM_TEMPLATE_IDS=(0 1 2 3)

# Resolve which rootfs images to update.
ROOTFS_LIST=()
if [ $# -ge 1 ]; then
ROOTFS_LIST=("$1")
else
for tid in "${SYSTEM_TEMPLATE_IDS[@]}"; do
tmpl_b36="$(printf '%025d' "${tid}")"
ROOTFS_LIST+=("${WRENN_DIR}/images/teams/${PLATFORM_TEAM_B36}/${tmpl_b36}/rootfs.ext4")
done
fi

# Step 1: Build envd.
# Step 1: Build envd (once).
echo "==> Building envd..."
cd "${PROJECT_ROOT}"
make build-envd
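The path construction in the new `else` branch can be checked on its own. The sketch below mirrors it with a hypothetical helper, `system_rootfs_path`. It generates the 25-character team ID with `printf` rather than typing it out, on the assumption that the platform team ID is all zeros as the script's comment states. As that comment also notes, zero-padded decimal equals the base36 form only for single-digit IDs; an ID of 10 would need `a` as its last digit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring the loop above: build the rootfs path for a
# system template ID under a given base directory.
system_rootfs_path() {
  local base="$1" tid="$2"
  local team_b36 tmpl_b36
  # 25 zeros, generated instead of typed to avoid miscounting (assumption:
  # the platform team ID is all zeros, as the script's comment says).
  team_b36="$(printf '%025d' 0)"
  # Zero-padded decimal; matches the base36 form only for IDs 0..9.
  tmpl_b36="$(printf '%025d' "${tid}")"
  printf '%s\n' "${base}/images/teams/${team_b36}/${tmpl_b36}/rootfs.ext4"
}

# Print the four candidate paths for the reserved IDs 0..3.
for tid in 0 1 2 3; do
  system_rootfs_path "/var/lib/wrenn" "${tid}"
done
```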
@ -42,64 +56,84 @@ if ! ldd "${ENVD_BIN}" | grep -q "statically linked"; then
exit 1
fi

# Step 2: Mount the rootfs.
echo "==> Mounting rootfs at ${MOUNT_DIR}..."
mkdir -p "${MOUNT_DIR}"
sudo mount -o loop,rw "${ROOTFS}" "${MOUNT_DIR}"

cleanup() {
echo "==> Unmounting rootfs..."
sudo umount "${MOUNT_DIR}" 2>/dev/null || true
rmdir "${MOUNT_DIR}" 2>/dev/null || true
}
trap cleanup EXIT

# Step 3: Copy files into rootfs.
echo "==> Installing envd..."
sudo mkdir -p "${MOUNT_DIR}/usr/local/bin"
sudo cp "${ENVD_BIN}" "${MOUNT_DIR}/usr/local/bin/envd"
sudo chmod 755 "${MOUNT_DIR}/usr/local/bin/envd"

echo "==> Installing wrenn-init..."
sudo cp "${PROJECT_ROOT}/images/wrenn-init.sh" "${MOUNT_DIR}/usr/local/bin/wrenn-init"
sudo chmod 755 "${MOUNT_DIR}/usr/local/bin/wrenn-init"

echo "==> Installing tini..."
TINI_BIN=""
# 1. Already in the rootfs?
for p in "${MOUNT_DIR}/usr/bin/tini" "${MOUNT_DIR}/sbin/tini" "${MOUNT_DIR}/usr/local/bin/tini"; do
if [ -f "$p" ]; then TINI_BIN="$p"; break; fi
done
# 2. Available on the host?
if [ -z "${TINI_BIN}" ]; then
for p in /usr/bin/tini /usr/local/bin/tini /sbin/tini; do
if [ -f "$p" ]; then TINI_BIN="$p"; break; fi
# resolve_tini ROOTFS_MOUNT — echo a path to a tini binary suitable for the
# mounted rootfs. Prefers one already in the image, then a static download.
resolve_tini() {
local mount_dir="$1" p tini_arch arch
for p in "${mount_dir}/usr/bin/tini" "${mount_dir}/sbin/tini" "${mount_dir}/usr/local/bin/tini"; do
if [ -f "$p" ]; then echo "$p"; return; fi
done
fi
# 3. Download from GitHub releases.
if [ -z "${TINI_BIN}" ]; then
ARCH="$(uname -m)"
case "${ARCH}" in
x86_64) TINI_ARCH="amd64" ;;
aarch64) TINI_ARCH="arm64" ;;
*) echo "ERROR: Unsupported architecture: ${ARCH}"; exit 1 ;;
arch="$(uname -m)"
case "${arch}" in
x86_64) tini_arch="amd64" ;;
aarch64) tini_arch="arm64" ;;
*) echo "ERROR: Unsupported architecture: ${arch}" >&2; exit 1 ;;
esac
TINI_VERSION="v0.19.0"
TINI_URL="https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}"
TINI_TMP="/tmp/tini-${TINI_ARCH}"
echo " Downloading tini ${TINI_VERSION} (${TINI_ARCH})..."
curl -fsSL "${TINI_URL}" -o "${TINI_TMP}"
chmod +x "${TINI_TMP}"
TINI_BIN="${TINI_TMP}"
# Static tini runs under any libc (glibc or musl).
local tmp="/tmp/tini-static-${tini_arch}"
if [ ! -f "${tmp}" ]; then
echo " Downloading tini v0.19.0 static (${tini_arch})..." >&2
curl -fsSL "https://github.com/krallin/tini/releases/download/v0.19.0/tini-static-${tini_arch}" -o "${tmp}"
chmod +x "${tmp}"
fi
echo "${tmp}"
}

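`resolve_tini` returns its result on stdout and sends progress and error messages to stderr (`>&2`), so a caller that captures the output with `$(...)` receives only the path. The sketch below shows that convention with a hypothetical stand-in, `pick_tool`; the path used is a placeholder, not a real binary.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for resolve_tini: the result goes to stdout, anything
# meant for the operator goes to stderr so $(...) does not capture it.
pick_tool() {
  local candidate="$1"
  echo "  checking ${candidate}..." >&2   # progress: stderr
  echo "${candidate}"                     # result: stdout
}

# Declare first, assign second: `local x="$(cmd)"` would mask cmd's exit
# status, whereas a plain assignment lets `set -e` see a failure.
main() {
  local tool
  tool="$(pick_tool "/tmp/example-tool")"
  echo "resolved: ${tool}"
}

main
```

This is presumably why the diff adds `>&2` to the "Unsupported architecture" and "Downloading" messages: without it, those strings would end up inside the captured path.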
# inject_rootfs ROOTFS — mount, copy guest binaries in, unmount.
inject_rootfs() {
local rootfs="$1" tini_bin
echo ""
echo "==> Updating ${rootfs}"

mkdir -p "${MOUNT_DIR}"
sudo mount -o loop,rw "${rootfs}" "${MOUNT_DIR}"

local mounted=1
cleanup_mount() {
if [ "${mounted}" = "1" ]; then
sudo umount "${MOUNT_DIR}" 2>/dev/null || true
rmdir "${MOUNT_DIR}" 2>/dev/null || true
mounted=0
fi
}
trap cleanup_mount RETURN

sudo mkdir -p "${MOUNT_DIR}/usr/local/bin"
sudo cp "${ENVD_BIN}" "${MOUNT_DIR}/usr/local/bin/envd"
sudo chmod 755 "${MOUNT_DIR}/usr/local/bin/envd"

sudo cp "${PROJECT_ROOT}/images/wrenn-init.sh" "${MOUNT_DIR}/usr/local/bin/wrenn-init"
sudo chmod 755 "${MOUNT_DIR}/usr/local/bin/wrenn-init"

tini_bin="$(resolve_tini "${MOUNT_DIR}")"
sudo mkdir -p "${MOUNT_DIR}/sbin"
# On usr-merged distros (e.g. Fedora) /sbin -> /usr/bin, so a tini already at
# /usr/bin/tini IS /sbin/tini — copying onto itself errors. Skip then.
if [ "${tini_bin}" -ef "${MOUNT_DIR}/sbin/tini" ]; then
echo " tini already at /sbin/tini (usr-merged); skipping copy"
else
sudo cp "${tini_bin}" "${MOUNT_DIR}/sbin/tini"
fi
sudo chmod 755 "${MOUNT_DIR}/sbin/tini"

ls -la "${MOUNT_DIR}/usr/local/bin/envd" "${MOUNT_DIR}/usr/local/bin/wrenn-init" "${MOUNT_DIR}/sbin/tini"
cleanup_mount
}

# Step 2: Update each rootfs that exists.
UPDATED=0
for rootfs in "${ROOTFS_LIST[@]}"; do
if [ ! -f "${rootfs}" ]; then
echo "==> Skipping (not found): ${rootfs}"
continue
fi
inject_rootfs "${rootfs}"
UPDATED=$((UPDATED + 1))
done

echo ""
if [ "${UPDATED}" -eq 0 ]; then
echo "==> No rootfs images updated. Build them first with: make images"
exit 1
fi
sudo mkdir -p "${MOUNT_DIR}/sbin"
sudo cp "${TINI_BIN}" "${MOUNT_DIR}/sbin/tini"
sudo chmod 755 "${MOUNT_DIR}/sbin/tini"

# Step 4: Verify.
echo ""
echo "==> Installed files:"
ls -la "${MOUNT_DIR}/usr/local/bin/envd" "${MOUNT_DIR}/usr/local/bin/wrenn-init" "${MOUNT_DIR}/sbin/tini"

echo ""
echo "==> Done. Rootfs updated: ${ROOTFS}"
echo "==> Done. Updated ${UPDATED} rootfs image(s)."
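The final loop follows a "skip what is missing, count what was processed, fail if nothing was" pattern. A minimal sketch of it is shown below, using throwaway files; `process_existing` is a hypothetical helper and `true` stands in for the real per-image work.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run ACTION on each existing file in the list, skip the
# rest, and print how many files were processed.
process_existing() {
  local action="$1"; shift
  local done_count=0 f
  for f in "$@"; do
    if [ ! -f "${f}" ]; then
      echo "skipping (not found): ${f}" >&2
      continue
    fi
    "${action}" "${f}"
    # Assignment form on purpose: ((done_count++)) returns a non-zero status
    # when the old value is 0, which would abort the script under `set -e`.
    done_count=$((done_count + 1))
  done
  echo "${done_count}"
}

# Demo: one existing file and one missing file.
tmp="$(mktemp -d)"
: > "${tmp}/a.ext4"
count="$(process_existing true "${tmp}/a.ext4" "${tmp}/missing.ext4")"
rm -rf "${tmp}"
echo "updated ${count} file(s)"
```

A caller can then treat a count of zero as an error, as the script does with `UPDATED`.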