Add device-mapper snapshots, test UI, fix pause ordering and lint errors
- Replace reflink rootfs copy with device-mapper snapshots (shared read-only loop device per base template, per-sandbox sparse CoW file)
- Add devicemapper package with create/restore/remove/flatten operations and a refcounted LoopRegistry for base image loop devices
- Fix pause ordering: destroy the VM before removing the dm-snapshot to avoid a "device busy" error (Firecracker must release the dm device first)
- Add test UI at GET /test for sandbox lifecycle management (create, pause, resume, destroy, exec, snapshot create/list/delete)
- Fix DirSize to report actual disk usage (stat.Blocks * 512) instead of apparent size, so sparse CoW files report correctly
- Add timing logs to the pause flow for performance diagnostics
- Fix all lint errors across the api, network, vm, uffd, and sandbox packages
- Remove the obsolete internal/filesystem package (replaced by devicemapper)
- Update CLAUDE.md with device-mapper architecture documentation
CLAUDE.md

@@ -74,15 +74,15 @@ Startup (`cmd/control-plane/main.go`) wires: config (env vars) → pgxpool → `

### Host Agent

-**Packages:** `internal/hostagent/`, `internal/sandbox/`, `internal/vm/`, `internal/network/`, `internal/filesystem/`, `internal/envdclient/`, `internal/snapshot/`
+**Packages:** `internal/hostagent/`, `internal/sandbox/`, `internal/vm/`, `internal/network/`, `internal/devicemapper/`, `internal/envdclient/`, `internal/snapshot/`

-Startup (`cmd/host-agent/main.go`) wires: root check → enable IP forwarding → `sandbox.Manager` (containing `vm.Manager` + `network.SlotAllocator`) → `hostagent.Server` (Connect RPC handler) → HTTP server.
+Startup (`cmd/host-agent/main.go`) wires: root check → enable IP forwarding → clean up stale dm devices → `sandbox.Manager` (containing `vm.Manager` + `network.SlotAllocator` + `devicemapper.LoopRegistry`) → `hostagent.Server` (Connect RPC handler) → HTTP server.

- **RPC Server** (`internal/hostagent/server.go`): implements `hostagentv1connect.HostAgentServiceHandler`. Thin wrapper — every method delegates to `sandbox.Manager`. Maps Connect error codes on return.
- **Sandbox Manager** (`internal/sandbox/manager.go`): the core orchestration layer. Maintains in-memory state in `boxes map[string]*sandboxState` (protected by `sync.RWMutex`). Each `sandboxState` holds a `models.Sandbox`, a `*network.Slot`, and an `*envdclient.Client`. Runs a TTL reaper (every 10s) that auto-destroys timed-out sandboxes.
- **VM Manager** (`internal/vm/manager.go`, `fc.go`, `config.go`): manages Firecracker processes. Uses raw HTTP API over Unix socket (`/tmp/fc-{sandboxID}.sock`), not the firecracker-go-sdk Machine type. Launches Firecracker via `unshare -m` + `ip netns exec`. Configures VM via PUT to `/boot-source`, `/drives/rootfs`, `/network-interfaces/eth0`, `/machine-config`, then starts with PUT `/actions`.
- **Network** (`internal/network/setup.go`, `allocator.go`): per-sandbox network namespace with veth pair + TAP device. See Networking section below.
-- **Filesystem** (`internal/filesystem/clone.go`): CoW rootfs clones via `cp --reflink=auto`.
+- **Device Mapper** (`internal/devicemapper/devicemapper.go`): CoW rootfs via device-mapper snapshots. Shared read-only loop devices per base template (refcounted `LoopRegistry`), per-sandbox sparse CoW files, dm-snapshot create/restore/remove/flatten operations.
- **envd Client** (`internal/envdclient/client.go`, `health.go`): dual interface to the guest agent. Connect RPC for streaming process exec (`process.Start()` bidirectional stream). Plain HTTP for file operations (POST/GET `/files?path=...&username=root`). Health check polls `GET /health` every 100ms until ready (30s timeout).

### envd (Guest Agent)
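The refcounted `LoopRegistry` described above can be sketched as follows. This is a minimal illustration under stated assumptions — the method names and the injected attach/detach hooks are made up for the sketch, not the actual package API:

```go
package main

import (
	"fmt"
	"sync"
)

// loopRegistry shares one read-only loop device per base image and
// detaches it only when the last sandbox using that image is gone.
type loopRegistry struct {
	mu      sync.Mutex
	entries map[string]*loopEntry              // keyed by base image path
	attach  func(image string) (string, error) // e.g. wraps `losetup --find --show -r`
	detach  func(dev string) error             // e.g. wraps `losetup -d`
}

type loopEntry struct {
	dev  string
	refs int
}

// Acquire returns the shared loop device for image, attaching it on
// first use and bumping the refcount otherwise.
func (r *loopRegistry) Acquire(image string) (string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if e, ok := r.entries[image]; ok {
		e.refs++
		return e.dev, nil
	}
	dev, err := r.attach(image)
	if err != nil {
		return "", err
	}
	r.entries[image] = &loopEntry{dev: dev, refs: 1}
	return dev, nil
}

// Release drops one reference and detaches the device when the count
// reaches zero.
func (r *loopRegistry) Release(image string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	e, ok := r.entries[image]
	if !ok {
		return fmt.Errorf("not acquired: %s", image)
	}
	e.refs--
	if e.refs > 0 {
		return nil
	}
	delete(r.entries, image)
	return r.detach(e.dev)
}

func main() {
	fmt.Println("loopRegistry sketch")
}
```

Injecting the attach/detach functions keeps the refcounting logic testable without root or real loop devices.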
@@ -132,7 +132,7 @@ HIBERNATED → RUNNING (cold snapshot resume, slower)

**Sandbox creation** (`POST /v1/sandboxes`):

1. API handler generates sandbox ID, inserts into DB as "pending"
2. RPC `CreateSandbox` → host agent → `sandbox.Manager.Create()`
-3. Manager: resolve base rootfs → `cp --reflink` clone → allocate network slot → `CreateNetwork()` (netns + veth + tap + NAT) → `vm.Create()` (start Firecracker, configure via HTTP API, boot) → `envdclient.WaitUntilReady()` (poll /health) → store in-memory state
+3. Manager: resolve base rootfs → acquire shared loop device → create dm-snapshot (sparse CoW file) → allocate network slot → `CreateNetwork()` (netns + veth + tap + NAT) → `vm.Create()` (start Firecracker with `/dev/mapper/wrenn-{id}`, configure via HTTP API, boot) → `envdclient.WaitUntilReady()` (poll /health) → store in-memory state
4. API handler updates DB to "running" with host_ip

**Command execution** (`POST /v1/sandboxes/{id}/exec`):
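The dm-snapshot step can be illustrated by the table string a host agent would hand to device-mapper. The standard `snapshot` target table is `<start> <length> snapshot <origin> <cow> <P|N> <chunk-sectors>`, with lengths in 512-byte sectors; the device names, size, and chunk size below are made up for illustration:

```go
package main

import "fmt"

// snapshotTable builds a device-mapper "snapshot" target table:
//   <start> <length> snapshot <origin-dev> <cow-dev> <P|N> <chunk-sectors>
// length and chunk size are in 512-byte sectors; "P" makes the
// snapshot persistent across device re-creation.
func snapshotTable(originDev, cowDev string, sizeBytes int64, chunkSectors int) string {
	sectors := sizeBytes / 512
	return fmt.Sprintf("0 %d snapshot %s %s P %d", sectors, originDev, cowDev, chunkSectors)
}

func main() {
	// Roughly what `dmsetup create wrenn-<id> --table "<this>"` would receive
	// for a 1 GiB rootfs.
	fmt.Println(snapshotTable("/dev/loop0", "/dev/loop1", 1<<30, 8))
}
```

Writes then land in the sparse CoW file behind the second device while reads of unmodified blocks fall through to the shared read-only origin.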
@@ -181,6 +181,7 @@ To add a new query: add it to the appropriate `.sql` file in `db/queries/` → `

- **Buf + protoc-gen-connect-go** for code generation (not protoc-gen-go-grpc)
- **Raw Firecracker HTTP API** via Unix socket (not firecracker-go-sdk Machine type)
- **TAP networking** (not vsock) for host-to-envd communication
+- **Device-mapper snapshots** for rootfs CoW — shared read-only loop device per base template, per-sandbox sparse CoW file, Firecracker gets `/dev/mapper/wrenn-{id}`
- **PostgreSQL** via pgx/v5 + sqlc (type-safe query generation). Goose for migrations (plain SQL, up/down)
- **Admin UI**: htmx + Go html/template + chi router. No SPA, no React, no build step
- **Lago** for billing (external service, not in this codebase)
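The pause-ordering fix in this commit — the VM must release `/dev/mapper/wrenn-{id}` before the snapshot can be removed — comes down to sequencing the teardown. A minimal sketch with stubbed steps (not the repo's actual pause code):

```go
package main

import "fmt"

// pause tears down in dependency order: Firecracker holds the dm
// device open, so destroying the VM first avoids device-mapper's
// "device busy" error on remove.
func pause(destroyVM, removeSnapshot func() error) error {
	if err := destroyVM(); err != nil { // FC process exits, releasing the dm device
		return fmt.Errorf("destroy vm: %w", err)
	}
	if err := removeSnapshot(); err != nil { // now the dm remove can succeed
		return fmt.Errorf("remove snapshot: %w", err)
	}
	return nil
}

func main() {
	var order []string
	_ = pause(
		func() error { order = append(order, "vm"); return nil },
		func() error { order = append(order, "dm"); return nil },
	)
	fmt.Println(order) // vm teardown runs before dm removal
}
```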
cmd/host-agent/main.go

@@ -9,6 +9,7 @@ import (
 	"syscall"
 	"time"

+	"git.omukk.dev/wrenn/sandbox/internal/devicemapper"
 	"git.omukk.dev/wrenn/sandbox/internal/hostagent"
 	"git.omukk.dev/wrenn/sandbox/internal/sandbox"
 	"git.omukk.dev/wrenn/sandbox/proto/hostagent/gen/hostagentv1connect"

@@ -29,6 +30,9 @@ func main() {
 		slog.Warn("failed to enable ip_forward", "error", err)
 	}

+	// Clean up any stale dm-snapshot devices from a previous crash.
+	devicemapper.CleanupStaleDevices()
+
 	listenAddr := envOrDefault("AGENT_LISTEN_ADDR", ":50051")
 	kernelPath := envOrDefault("AGENT_KERNEL_PATH", "/var/lib/wrenn/kernels/vmlinux")
 	imagesPath := envOrDefault("AGENT_IMAGES_PATH", "/var/lib/wrenn/images")
go.mod (5 lines changed)

@@ -5,20 +5,19 @@ go 1.25.0
 require (
 	connectrpc.com/connect v1.19.1
 	github.com/go-chi/chi/v5 v5.2.5
+	github.com/google/uuid v1.6.0
 	github.com/gorilla/websocket v1.5.3
 	github.com/jackc/pgx/v5 v5.8.0
 	github.com/vishvananda/netlink v1.1.1-0.20210330154013-f5de75959ad5
 	github.com/vishvananda/netns v0.0.0-20210104183010-2eb08e3e575f
+	golang.org/x/sys v0.42.0
 	google.golang.org/protobuf v1.36.11
 )

 require (
-	github.com/bits-and-blooms/bitset v1.24.4 // indirect
-	github.com/google/uuid v1.6.0 // indirect
 	github.com/jackc/pgpassfile v1.0.0 // indirect
 	github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
 	github.com/jackc/puddle/v2 v2.2.2 // indirect
 	golang.org/x/sync v0.17.0 // indirect
-	golang.org/x/sys v0.42.0 // indirect
 	golang.org/x/text v0.29.0 // indirect
 )
go.sum (4 lines changed)

@@ -1,7 +1,5 @@
 connectrpc.com/connect v1.19.1 h1:R5M57z05+90EfEvCY1b7hBxDVOUl45PrtXtAV2fOC14=
 connectrpc.com/connect v1.19.1/go.mod h1:tN20fjdGlewnSFeZxLKb0xwIZ6ozc3OQs2hTXy4du9w=
-github.com/bits-and-blooms/bitset v1.24.4 h1:95H15Og1clikBrKr/DuzMXkQzECs1M6hhoGXLwLQOZE=
-github.com/bits-and-blooms/bitset v1.24.4/go.mod h1:7hO7Gc7Pp1vODcmWvKMRA9BNmbv6a/7QIWpPxHddWR8=
 github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
 github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
 github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=

@@ -37,8 +35,6 @@ golang.org/x/sync v0.17.0 h1:l60nONMj9l5drqw6jlhIELNv9I0A4OFgRsG9k2oT9Ug=
 golang.org/x/sync v0.17.0/go.mod h1:9KTHXmSnoGruLpwFjVSX0lNNA75CykiMECbovNTZqGI=
 golang.org/x/sys v0.0.0-20200217220822-9197077df867/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
 golang.org/x/sys v0.0.0-20200728102440-3e129f6d46b1/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
-golang.org/x/sys v0.41.0 h1:Ivj+2Cp/ylzLiEU89QhWblYnOE9zerudt9Ftecq2C6k=
-golang.org/x/sys v0.41.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
 golang.org/x/sys v0.42.0 h1:omrd2nAlyT5ESRdCLYdm3+fMfNFE/+Rf4bDIQImRJeo=
 golang.org/x/sys v0.42.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
 golang.org/x/text v0.29.0 h1:1neNs90w9YzJ9BocxfsQNHKuAT4pkghyXc4nhZ6sJvk=
@@ -3,6 +3,7 @@ package api
 import (
 	"encoding/base64"
 	"encoding/json"
+	"log/slog"
 	"net/http"
 	"time"
 	"unicode/utf8"

@@ -85,13 +86,15 @@ func (h *execHandler) Exec(w http.ResponseWriter, r *http.Request) {
 	duration := time.Since(start)

 	// Update last active.
-	h.db.UpdateLastActive(ctx, db.UpdateLastActiveParams{
+	if err := h.db.UpdateLastActive(ctx, db.UpdateLastActiveParams{
 		ID: sandboxID,
 		LastActiveAt: pgtype.Timestamptz{
 			Time:  time.Now(),
 			Valid: true,
 		},
-	})
+	}); err != nil {
+		slog.Warn("failed to update last_active_at", "id", sandboxID, "error", err)
+	}

 	// Use base64 encoding if output contains non-UTF-8 bytes.
 	stdout := resp.Msg.Stdout
@@ -37,11 +37,6 @@ type wsStartMsg struct {
 	Args []string `json:"args"`
 }

-// wsStopMsg is sent by the client to stop the process.
-type wsStopMsg struct {
-	Type string `json:"type"` // "stop"
-}
-
 // wsOutMsg is sent by the server for process events.
 type wsOutMsg struct {
 	Type string `json:"type"` // "start", "stdout", "stderr", "exit", "error"
@@ -128,5 +128,5 @@ func (h *filesHandler) Download(w http.ResponseWriter, r *http.Request) {
 	}

 	w.Header().Set("Content-Type", "application/octet-stream")
-	w.Write(resp.Msg.Content)
+	_, _ = w.Write(resp.Msg.Content)
 }
@@ -2,6 +2,7 @@ package api

 import (
 	"io"
+	"log/slog"
 	"mime"
 	"mime/multipart"
 	"net/http"

@@ -189,6 +190,6 @@ func (h *filesStreamHandler) StreamDownload(w http.ResponseWriter, r *http.Reque

 	if err := stream.Err(); err != nil {
 		// Headers already sent, nothing we can do but log.
-		// The client will see a truncated response.
+		slog.Warn("file stream error after headers sent", "error", err)
 	}
 }
@@ -111,7 +111,7 @@ func (h *sandboxHandler) Create(w http.ResponseWriter, r *http.Request) {
 	sandboxID := id.NewSandboxID()

 	// Insert pending record.
-	sb, err := h.db.InsertSandbox(ctx, db.InsertSandboxParams{
+	_, err := h.db.InsertSandbox(ctx, db.InsertSandboxParams{
 		ID:      sandboxID,
 		OwnerID: "",
 		HostID:  "default",

@@ -136,9 +136,11 @@ func (h *sandboxHandler) Create(w http.ResponseWriter, r *http.Request) {
 		TimeoutSec: req.TimeoutSec,
 	}))
 	if err != nil {
-		h.db.UpdateSandboxStatus(ctx, db.UpdateSandboxStatusParams{
+		if _, dbErr := h.db.UpdateSandboxStatus(ctx, db.UpdateSandboxStatusParams{
 			ID: sandboxID, Status: "error",
-		})
+		}); dbErr != nil {
+			slog.Warn("failed to update sandbox status to error", "id", sandboxID, "error", dbErr)
+		}
 		status, code, msg := agentErrToHTTP(err)
 		writeError(w, status, code, msg)
 		return

@@ -146,7 +148,7 @@ func (h *sandboxHandler) Create(w http.ResponseWriter, r *http.Request) {

 	// Update to running.
 	now := time.Now()
-	sb, err = h.db.UpdateSandboxRunning(ctx, db.UpdateSandboxRunningParams{
+	sb, err := h.db.UpdateSandboxRunning(ctx, db.UpdateSandboxRunningParams{
 		ID:      sandboxID,
 		HostIp:  resp.Msg.HostIp,
 		GuestIp: "",
@@ -84,7 +84,9 @@ func (h *snapshotHandler) Create(w http.ResponseWriter, r *http.Request) {
 			return
 		}
 		// Delete existing template record and files.
-		h.db.DeleteTemplate(ctx, req.Name)
+		if err := h.db.DeleteTemplate(ctx, req.Name); err != nil {
+			slog.Warn("failed to delete existing template", "name", req.Name, "error", err)
+		}
 	}

 	// Verify sandbox exists and is running or paused.
internal/api/handlers_test_ui.go (new file, 435 lines)

@@ -0,0 +1,435 @@
package api

import (
	"fmt"
	"net/http"
)

func serveTestUI(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	fmt.Fprint(w, testUIHTML)
}

const testUIHTML = `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Wrenn Sandbox — Test Console</title>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body {
  font-family: 'Menlo', 'Consolas', 'JetBrains Mono', monospace;
  font-size: 13px;
  background: #0f1211;
  color: #c8c4bc;
  padding: 16px;
}
h1 { font-size: 18px; color: #e8e5df; margin-bottom: 12px; }
h2 { font-size: 14px; color: #89a785; margin: 16px 0 8px; border-bottom: 1px solid #262c2a; padding-bottom: 4px; }
.grid { display: grid; grid-template-columns: 1fr 1fr; gap: 16px; }
.panel {
  background: #151918;
  border: 1px solid #262c2a;
  border-radius: 8px;
  padding: 12px;
}
.full { grid-column: 1 / -1; }
label { display: block; color: #8a867f; margin: 6px 0 2px; font-size: 11px; text-transform: uppercase; letter-spacing: 0.05em; }
input, select {
  width: 100%;
  background: #1b201e;
  border: 1px solid #262c2a;
  color: #e8e5df;
  padding: 6px 8px;
  border-radius: 4px;
  font-family: inherit;
  font-size: 13px;
}
input:focus, select:focus { outline: none; border-color: #5e8c58; }
.btn-row { display: flex; gap: 6px; margin-top: 8px; flex-wrap: wrap; }
button {
  padding: 6px 14px;
  border: 1px solid #262c2a;
  border-radius: 4px;
  font-family: inherit;
  font-size: 12px;
  cursor: pointer;
  background: #1b201e;
  color: #c8c4bc;
  transition: all 0.15s;
}
button:hover { border-color: #5e8c58; color: #e8e5df; }
.btn-green { background: #2a3d28; border-color: #5e8c58; color: #89a785; }
.btn-green:hover { background: #3a5035; }
.btn-red { background: #3d2828; border-color: #b35544; color: #c27b6d; }
.btn-red:hover { background: #4d3030; }
.btn-amber { background: #3d3428; border-color: #9e7c2e; color: #c8a84e; }
.btn-amber:hover { background: #4d4030; }
.btn-blue { background: #28343d; border-color: #3d7aac; color: #6da0cc; }
.btn-blue:hover { background: #304050; }
table { width: 100%; border-collapse: collapse; margin-top: 8px; }
th { text-align: left; font-size: 11px; color: #8a867f; text-transform: uppercase; letter-spacing: 0.05em; padding: 4px 8px; border-bottom: 1px solid #262c2a; }
td { padding: 6px 8px; border-bottom: 1px solid #1b201e; }
tr:hover td { background: #1b201e; }
.status { display: inline-block; padding: 2px 8px; border-radius: 10px; font-size: 11px; font-weight: 600; }
.status-running { background: rgba(94,140,88,0.15); color: #89a785; }
.status-paused { background: rgba(158,124,46,0.15); color: #c8a84e; }
.status-pending { background: rgba(61,122,172,0.15); color: #6da0cc; }
.status-stopped { background: rgba(138,134,127,0.15); color: #8a867f; }
.status-error { background: rgba(179,85,68,0.15); color: #c27b6d; }
.status-hibernated { background: rgba(61,122,172,0.15); color: #6da0cc; }
.log {
  background: #0f1211;
  border: 1px solid #262c2a;
  border-radius: 4px;
  padding: 8px;
  max-height: 300px;
  overflow-y: auto;
  margin-top: 8px;
  font-size: 12px;
  white-space: pre-wrap;
  word-break: break-all;
}
.log-entry { margin-bottom: 4px; }
.log-time { color: #5f5c57; }
.log-ok { color: #89a785; }
.log-err { color: #c27b6d; }
.log-info { color: #6da0cc; }
.exec-output {
  background: #0f1211;
  border: 1px solid #262c2a;
  border-radius: 4px;
  padding: 8px;
  max-height: 200px;
  overflow-y: auto;
  margin-top: 8px;
  font-size: 12px;
  white-space: pre-wrap;
}
.clickable { cursor: pointer; color: #89a785; text-decoration: underline; }
.clickable:hover { color: #aacdaa; }
</style>
</head>
<body>

<h1>Wrenn Sandbox Test Console</h1>

<div class="grid">
  <!-- Create Sandbox -->
  <div class="panel">
    <h2>Create Sandbox</h2>
    <label>Template</label>
    <input type="text" id="create-template" value="minimal" placeholder="minimal or snapshot name">
    <label>vCPUs</label>
    <input type="number" id="create-vcpus" value="1" min="1" max="8">
    <label>Memory (MB)</label>
    <input type="number" id="create-memory" value="512" min="128" max="8192">
    <label>Timeout (sec)</label>
    <input type="number" id="create-timeout" value="300" min="30">
    <div class="btn-row">
      <button class="btn-green" onclick="createSandbox()">Create</button>
    </div>
  </div>

  <!-- Snapshot Management -->
  <div class="panel">
    <h2>Create Snapshot</h2>
    <label>Sandbox ID</label>
    <input type="text" id="snap-sandbox-id" placeholder="sb-xxxxxxxx">
    <label>Snapshot Name (optional)</label>
    <input type="text" id="snap-name" placeholder="auto-generated if empty">
    <div class="btn-row">
      <button class="btn-amber" onclick="createSnapshot()">Create Snapshot</button>
      <label style="display:inline-flex;align-items:center;margin:0;font-size:12px;text-transform:none;letter-spacing:0">
        <input type="checkbox" id="snap-overwrite" style="width:auto;margin-right:4px"> Overwrite
      </label>
    </div>

    <h2>Snapshots / Templates</h2>
    <div class="btn-row">
      <button onclick="listSnapshots()">Refresh</button>
    </div>
    <div id="snapshots-table"></div>
  </div>

  <!-- Execute Command -->
  <div class="panel">
    <h2>Execute Command</h2>
    <label>Sandbox ID</label>
    <input type="text" id="exec-sandbox-id" placeholder="sb-xxxxxxxx">
    <label>Command</label>
    <input type="text" id="exec-cmd" value="/bin/sh" placeholder="/bin/sh">
    <label>Args (comma separated)</label>
    <input type="text" id="exec-args" value="-c,uname -a" placeholder="-c,echo hello">
    <div class="btn-row">
      <button class="btn-green" onclick="execCmd()">Run</button>
    </div>
    <div id="exec-output" class="exec-output" style="display:none"></div>
  </div>

  <!-- Activity Log -->
  <div class="panel">
    <h2>Activity Log</h2>
    <div id="log" class="log"></div>
  </div>

  <!-- Sandboxes List -->
  <div class="panel full">
    <h2>Sandboxes</h2>
    <div class="btn-row">
      <button onclick="listSandboxes()">Refresh</button>
      <label style="display:inline-flex;align-items:center;margin:0;font-size:12px;text-transform:none;letter-spacing:0">
        <input type="checkbox" id="auto-refresh" style="width:auto;margin-right:4px"> Auto-refresh (5s)
      </label>
    </div>
    <div id="sandboxes-table"></div>
  </div>
</div>

<script>
const API = '';

function log(msg, level) {
  const el = document.getElementById('log');
  const t = new Date().toLocaleTimeString();
  const cls = level === 'ok' ? 'log-ok' : level === 'err' ? 'log-err' : 'log-info';
  el.innerHTML = '<div class="log-entry"><span class="log-time">' + t + '</span> <span class="' + cls + '">' + esc(msg) + '</span></div>' + el.innerHTML;
}

function esc(s) {
  const d = document.createElement('div');
  d.textContent = s;
  return d.innerHTML;
}

async function api(method, path, body) {
  const opts = { method, headers: {} };
  if (body) {
    opts.headers['Content-Type'] = 'application/json';
    opts.body = JSON.stringify(body);
  }
  const resp = await fetch(API + path, opts);
  if (resp.status === 204) return null;
  const data = await resp.json();
  if (resp.status >= 300) {
    throw new Error(data.error ? data.error.message : resp.statusText);
  }
  return data;
}

function statusBadge(s) {
  return '<span class="status status-' + s + '">' + s + '</span>';
}

// --- Sandboxes ---

async function listSandboxes() {
  try {
    const data = await api('GET', '/v1/sandboxes');
    renderSandboxes(data);
  } catch (e) {
    log('List sandboxes failed: ' + e.message, 'err');
  }
}

function renderSandboxes(sandboxes) {
  if (!sandboxes || sandboxes.length === 0) {
    document.getElementById('sandboxes-table').innerHTML = '<p style="color:#5f5c57;margin-top:8px">No sandboxes</p>';
    return;
  }
  let html = '<table><thead><tr><th>ID</th><th>Status</th><th>Template</th><th>vCPUs</th><th>Mem</th><th>Host IP</th><th>Created</th><th>Actions</th></tr></thead><tbody>';
  for (const sb of sandboxes) {
    html += '<tr>';
    html += '<td class="clickable" onclick="useSandbox(\'' + sb.id + '\')">' + sb.id + '</td>';
    html += '<td>' + statusBadge(sb.status) + '</td>';
    html += '<td>' + esc(sb.template) + '</td>';
    html += '<td>' + sb.vcpus + '</td>';
    html += '<td>' + sb.memory_mb + 'MB</td>';
    html += '<td>' + (sb.host_ip || '-') + '</td>';
    html += '<td>' + new Date(sb.created_at).toLocaleTimeString() + '</td>';
    html += '<td><div class="btn-row">';
    if (sb.status === 'running') {
      html += '<button class="btn-amber" onclick="pauseSandbox(\'' + sb.id + '\')">Pause</button>';
      html += '<button class="btn-red" onclick="destroySandbox(\'' + sb.id + '\')">Destroy</button>';
    } else if (sb.status === 'paused') {
      html += '<button class="btn-green" onclick="resumeSandbox(\'' + sb.id + '\')">Resume</button>';
      html += '<button class="btn-red" onclick="destroySandbox(\'' + sb.id + '\')">Destroy</button>';
    } else {
      html += '<button class="btn-red" onclick="destroySandbox(\'' + sb.id + '\')">Destroy</button>';
    }
    html += '</div></td>';
    html += '</tr>';
  }
  html += '</tbody></table>';
  document.getElementById('sandboxes-table').innerHTML = html;
}

function useSandbox(id) {
  document.getElementById('exec-sandbox-id').value = id;
  document.getElementById('snap-sandbox-id').value = id;
}

async function createSandbox() {
  const template = document.getElementById('create-template').value;
  const vcpus = parseInt(document.getElementById('create-vcpus').value);
  const memory_mb = parseInt(document.getElementById('create-memory').value);
  const timeout_sec = parseInt(document.getElementById('create-timeout').value);
  log('Creating sandbox (template=' + template + ', vcpus=' + vcpus + ', mem=' + memory_mb + 'MB)...', 'info');
  try {
    const data = await api('POST', '/v1/sandboxes', { template, vcpus, memory_mb, timeout_sec });
    log('Created sandbox ' + data.id + ' [' + data.status + ']', 'ok');
    listSandboxes();
  } catch (e) {
    log('Create failed: ' + e.message, 'err');
  }
}

async function pauseSandbox(id) {
  log('Pausing ' + id + '...', 'info');
  try {
    await api('POST', '/v1/sandboxes/' + id + '/pause');
    log('Paused ' + id, 'ok');
    listSandboxes();
  } catch (e) {
    log('Pause failed: ' + e.message, 'err');
  }
}

async function resumeSandbox(id) {
  log('Resuming ' + id + '...', 'info');
  try {
    await api('POST', '/v1/sandboxes/' + id + '/resume');
    log('Resumed ' + id, 'ok');
    listSandboxes();
  } catch (e) {
    log('Resume failed: ' + e.message, 'err');
  }
}

async function destroySandbox(id) {
  log('Destroying ' + id + '...', 'info');
  try {
    await api('DELETE', '/v1/sandboxes/' + id);
    log('Destroyed ' + id, 'ok');
    listSandboxes();
  } catch (e) {
    log('Destroy failed: ' + e.message, 'err');
  }
}

// --- Exec ---

async function execCmd() {
  const sandboxId = document.getElementById('exec-sandbox-id').value;
  const cmd = document.getElementById('exec-cmd').value;
  const argsStr = document.getElementById('exec-args').value;
  const args = argsStr ? argsStr.split(',').map(s => s.trim()) : [];

  if (!sandboxId) { log('No sandbox ID for exec', 'err'); return; }

  const out = document.getElementById('exec-output');
  out.style.display = 'block';
  out.textContent = 'Running...';

  log('Exec on ' + sandboxId + ': ' + cmd + ' ' + args.join(' '), 'info');
  try {
    const data = await api('POST', '/v1/sandboxes/' + sandboxId + '/exec', { cmd, args });
    let text = '';
    if (data.stdout) text += data.stdout;
    if (data.stderr) text += '\n[stderr]\n' + data.stderr;
    text += '\n[exit_code=' + data.exit_code + ', duration=' + data.duration_ms + 'ms]';
    out.textContent = text;
    log('Exec completed (exit=' + data.exit_code + ')', data.exit_code === 0 ? 'ok' : 'err');
|
||||||
|
} catch (e) {
|
||||||
|
out.textContent = 'Error: ' + e.message;
|
||||||
|
log('Exec failed: ' + e.message, 'err');
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// --- Snapshots ---
|
||||||
|
|
||||||
|
async function createSnapshot() {
|
||||||
|
const sandbox_id = document.getElementById('snap-sandbox-id').value;
|
||||||
|
const name = document.getElementById('snap-name').value;
|
||||||
|
const overwrite = document.getElementById('snap-overwrite').checked;
|
||||||
|
|
||||||
|
if (!sandbox_id) { log('No sandbox ID for snapshot', 'err'); return; }
|
||||||
|
|
||||||
|
const body = { sandbox_id };
|
||||||
|
if (name) body.name = name;
|
||||||
|
|
||||||
|
const qs = overwrite ? '?overwrite=true' : '';
|
||||||
|
log('Creating snapshot from ' + sandbox_id + (name ? ' as "' + name + '"' : '') + '...', 'info');
|
||||||
|
try {
|
||||||
|
const data = await api('POST', '/v1/snapshots' + qs, body);
|
||||||
|
log('Snapshot created: ' + data.name + ' (' + (data.size_bytes / 1024 / 1024).toFixed(1) + 'MB)', 'ok');
|
||||||
|
listSnapshots();
|
||||||
|
listSandboxes();
|
||||||
|
} catch (e) {
|
||||||
|
log('Snapshot failed: ' + e.message, 'err');
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function listSnapshots() {
|
||||||
|
try {
|
||||||
|
const data = await api('GET', '/v1/snapshots');
|
||||||
|
renderSnapshots(data);
|
||||||
|
} catch (e) {
|
||||||
|
log('List snapshots failed: ' + e.message, 'err');
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
function renderSnapshots(snapshots) {
|
||||||
|
if (!snapshots || snapshots.length === 0) {
|
||||||
|
document.getElementById('snapshots-table').innerHTML = '<p style="color:#5f5c57;margin-top:8px">No snapshots</p>';
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
let html = '<table><thead><tr><th>Name</th><th>Type</th><th>vCPUs</th><th>Mem</th><th>Size</th><th>Actions</th></tr></thead><tbody>';
|
||||||
|
for (const s of snapshots) {
|
||||||
|
html += '<tr>';
|
||||||
|
html += '<td class="clickable" onclick="useTemplate(\'' + esc(s.name) + '\')">' + esc(s.name) + '</td>';
|
||||||
|
html += '<td>' + s.type + '</td>';
|
||||||
|
html += '<td>' + (s.vcpus || '-') + '</td>';
|
||||||
|
html += '<td>' + (s.memory_mb ? s.memory_mb + 'MB' : '-') + '</td>';
|
||||||
|
html += '<td>' + (s.size_bytes / 1024 / 1024).toFixed(1) + 'MB</td>';
|
||||||
|
html += '<td><button class="btn-red" onclick="deleteSnapshot(\'' + esc(s.name) + '\')">Delete</button></td>';
|
||||||
|
html += '</tr>';
|
||||||
|
}
|
||||||
|
html += '</tbody></table>';
|
||||||
|
document.getElementById('snapshots-table').innerHTML = html;
|
||||||
|
}
|
||||||
|
|
||||||
|
function useTemplate(name) {
|
||||||
|
document.getElementById('create-template').value = name;
|
||||||
|
log('Template set to "' + name + '" — click Create to launch from this snapshot', 'info');
|
||||||
|
}
|
||||||
|
|
||||||
|
async function deleteSnapshot(name) {
|
||||||
|
log('Deleting snapshot "' + name + '"...', 'info');
|
||||||
|
try {
|
||||||
|
await api('DELETE', '/v1/snapshots/' + encodeURIComponent(name));
|
||||||
|
log('Deleted snapshot "' + name + '"', 'ok');
|
||||||
|
listSnapshots();
|
||||||
|
} catch (e) {
|
||||||
|
log('Delete snapshot failed: ' + e.message, 'err');
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// --- Auto-refresh ---
|
||||||
|
let refreshInterval = null;
|
||||||
|
document.getElementById('auto-refresh').addEventListener('change', function() {
|
||||||
|
if (this.checked) {
|
||||||
|
refreshInterval = setInterval(listSandboxes, 5000);
|
||||||
|
} else {
|
||||||
|
clearInterval(refreshInterval);
|
||||||
|
refreshInterval = null;
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// --- Init ---
|
||||||
|
listSandboxes();
|
||||||
|
listSnapshots();
|
||||||
|
</script>
|
||||||
|
</body>
|
||||||
|
</html>`
|
||||||
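The button handlers above all funnel through a small `api(method, path, body)` helper that sends JSON for mutating calls. The same requests could be built from Go, e.g. for scripting the lifecycle outside the browser; this sketch is illustrative only (the `newAPIRequest` helper and base URL are not part of the repo):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newAPIRequest mirrors the test UI's api() helper: JSON-encode the body
// for mutating calls, leave GET/DELETE bodies empty.
func newAPIRequest(base, method, path string, body any) (*http.Request, error) {
	var buf bytes.Buffer
	if body != nil {
		if err := json.NewEncoder(&buf).Encode(body); err != nil {
			return nil, err
		}
	}
	req, err := http.NewRequest(method, base+path, &buf)
	if err != nil {
		return nil, err
	}
	if body != nil {
		req.Header.Set("Content-Type", "application/json")
	}
	return req, nil
}

func main() {
	// Same payload the Create form submits.
	req, _ := newAPIRequest("http://localhost:8080", "POST", "/v1/sandboxes",
		map[string]any{"template": "base", "vcpus": 2, "memory_mb": 512, "timeout_sec": 300})
	fmt.Println(req.Method, req.URL.Path)
}
```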
@ -24,7 +24,7 @@ type errorDetail struct {
func writeJSON(w http.ResponseWriter, status int, v any) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(v)
}

func writeError(w http.ResponseWriter, status int, code, message string) {
@ -35,6 +35,9 @@ func New(queries *db.Queries, agent hostagentv1connect.HostAgentServiceClient) *
	r.Get("/openapi.yaml", serveOpenAPI)
	r.Get("/docs", serveDocs)

	// Test UI for sandbox lifecycle management.
	r.Get("/test", serveTestUI)

	// Sandbox CRUD.
	r.Route("/v1/sandboxes", func(r chi.Router) {
		r.Post("/", sandbox.Create)
@ -71,7 +74,7 @@ func (s *Server) Handler() http.Handler {

func serveOpenAPI(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/yaml")
	_, _ = w.Write(openapiYAML)
}

func serveDocs(w http.ResponseWriter, r *http.Request) {
317 internal/devicemapper/devicemapper.go Normal file
@ -0,0 +1,317 @@
// Package devicemapper provides device-mapper snapshot operations for
// copy-on-write rootfs management. Each sandbox gets a dm-snapshot backed
// by a shared read-only loop device (the base template image) and a
// per-sandbox sparse CoW file that stores only modified blocks.
package devicemapper

import (
	"fmt"
	"log/slog"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"sync"
)

const (
	// ChunkSize is the dm-snapshot chunk size in 512-byte sectors.
	// 8 sectors = 4KB, matching the standard page/block size.
	ChunkSize = 8
)

// loopEntry tracks a loop device and its reference count.
type loopEntry struct {
	device   string // e.g., /dev/loop0
	refcount int
}

// LoopRegistry manages loop devices for base template images.
// Each unique image path gets one read-only loop device, shared
// across all sandboxes using that template. Reference counting
// ensures the loop device is released when no sandboxes use it.
type LoopRegistry struct {
	mu      sync.Mutex
	entries map[string]*loopEntry // imagePath → loopEntry
}

// NewLoopRegistry creates a new loop device registry.
func NewLoopRegistry() *LoopRegistry {
	return &LoopRegistry{
		entries: make(map[string]*loopEntry),
	}
}

// Acquire returns a read-only loop device for the given image path.
// If one already exists, its refcount is incremented. Otherwise a new
// loop device is created via losetup.
func (r *LoopRegistry) Acquire(imagePath string) (string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()

	if e, ok := r.entries[imagePath]; ok {
		e.refcount++
		slog.Debug("loop device reused", "image", imagePath, "device", e.device, "refcount", e.refcount)
		return e.device, nil
	}

	dev, err := losetupCreate(imagePath)
	if err != nil {
		return "", fmt.Errorf("losetup %s: %w", imagePath, err)
	}

	r.entries[imagePath] = &loopEntry{device: dev, refcount: 1}
	slog.Info("loop device created", "image", imagePath, "device", dev)
	return dev, nil
}

// Release decrements the refcount for the given image path.
// When the refcount reaches zero, the loop device is detached.
func (r *LoopRegistry) Release(imagePath string) {
	r.mu.Lock()
	defer r.mu.Unlock()

	e, ok := r.entries[imagePath]
	if !ok {
		return
	}

	e.refcount--
	if e.refcount <= 0 {
		if err := losetupDetach(e.device); err != nil {
			slog.Warn("losetup detach failed", "device", e.device, "error", err)
		}
		delete(r.entries, imagePath)
		slog.Info("loop device released", "image", imagePath, "device", e.device)
	}
}

// ReleaseAll detaches all loop devices. Used during shutdown.
func (r *LoopRegistry) ReleaseAll() {
	r.mu.Lock()
	defer r.mu.Unlock()

	for path, e := range r.entries {
		if err := losetupDetach(e.device); err != nil {
			slog.Warn("losetup detach failed", "device", e.device, "error", err)
		}
		delete(r.entries, path)
	}
}

// SnapshotDevice holds the state for a single dm-snapshot device.
type SnapshotDevice struct {
	Name       string // dm device name, e.g., "wrenn-sb-a1b2c3d4"
	DevicePath string // /dev/mapper/<Name>
	CowPath    string // path to the sparse CoW file
	CowLoopDev string // loop device for the CoW file
}

// CreateSnapshot sets up a new dm-snapshot device.
//
// It creates a sparse CoW file, attaches it as a loop device, and creates
// a device-mapper snapshot target combining the read-only origin with the
// writable CoW layer.
//
// The origin loop device must already exist (from LoopRegistry.Acquire).
func CreateSnapshot(name, originLoopDev, cowPath string, originSizeBytes int64) (*SnapshotDevice, error) {
	// Create sparse CoW file sized to match the origin.
	if err := createSparseFile(cowPath, originSizeBytes); err != nil {
		return nil, fmt.Errorf("create cow file: %w", err)
	}

	cowLoopDev, err := losetupCreateRW(cowPath)
	if err != nil {
		os.Remove(cowPath)
		return nil, fmt.Errorf("losetup cow: %w", err)
	}

	sectors := originSizeBytes / 512
	if err := dmsetupCreate(name, originLoopDev, cowLoopDev, sectors); err != nil {
		if detachErr := losetupDetach(cowLoopDev); detachErr != nil {
			slog.Warn("cow losetup detach failed during cleanup", "device", cowLoopDev, "error", detachErr)
		}
		os.Remove(cowPath)
		return nil, fmt.Errorf("dmsetup create: %w", err)
	}

	devPath := "/dev/mapper/" + name

	slog.Info("dm-snapshot created",
		"name", name,
		"device", devPath,
		"origin", originLoopDev,
		"cow", cowPath,
	)

	return &SnapshotDevice{
		Name:       name,
		DevicePath: devPath,
		CowPath:    cowPath,
		CowLoopDev: cowLoopDev,
	}, nil
}

// RestoreSnapshot re-attaches a dm-snapshot from an existing persistent CoW file.
// The CoW file must have been created with the persistent (P) flag and still
// contain valid dm-snapshot metadata.
func RestoreSnapshot(name, originLoopDev, cowPath string, originSizeBytes int64) (*SnapshotDevice, error) {
	cowLoopDev, err := losetupCreateRW(cowPath)
	if err != nil {
		return nil, fmt.Errorf("losetup cow: %w", err)
	}

	sectors := originSizeBytes / 512
	if err := dmsetupCreate(name, originLoopDev, cowLoopDev, sectors); err != nil {
		if detachErr := losetupDetach(cowLoopDev); detachErr != nil {
			slog.Warn("cow losetup detach failed during cleanup", "device", cowLoopDev, "error", detachErr)
		}
		return nil, fmt.Errorf("dmsetup create: %w", err)
	}

	devPath := "/dev/mapper/" + name

	slog.Info("dm-snapshot restored",
		"name", name,
		"device", devPath,
		"origin", originLoopDev,
		"cow", cowPath,
	)

	return &SnapshotDevice{
		Name:       name,
		DevicePath: devPath,
		CowPath:    cowPath,
		CowLoopDev: cowLoopDev,
	}, nil
}

// RemoveSnapshot tears down a dm-snapshot device and its CoW loop device.
// The CoW file is NOT deleted — the caller decides whether to keep or remove it.
func RemoveSnapshot(dev *SnapshotDevice) error {
	if err := dmsetupRemove(dev.Name); err != nil {
		return fmt.Errorf("dmsetup remove %s: %w", dev.Name, err)
	}

	if err := losetupDetach(dev.CowLoopDev); err != nil {
		slog.Warn("cow losetup detach failed", "device", dev.CowLoopDev, "error", err)
	}

	slog.Info("dm-snapshot removed", "name", dev.Name)
	return nil
}

// FlattenSnapshot reads the full contents of a dm-snapshot device and writes
// it to a new file. This merges the base image + CoW changes into a standalone
// rootfs image suitable for use as a new template.
func FlattenSnapshot(dmDevPath, outputPath string) error {
	cmd := exec.Command("dd",
		"if="+dmDevPath,
		"of="+outputPath,
		"bs=4M",
		"status=none",
	)
	if out, err := cmd.CombinedOutput(); err != nil {
		os.Remove(outputPath)
		return fmt.Errorf("dd flatten: %s: %w", string(out), err)
	}
	return nil
}

// OriginSizeBytes returns the size in bytes of a loop device's backing file.
func OriginSizeBytes(loopDev string) (int64, error) {
	// blockdev --getsize64 returns size in bytes.
	out, err := exec.Command("blockdev", "--getsize64", loopDev).CombinedOutput()
	if err != nil {
		return 0, fmt.Errorf("blockdev --getsize64 %s: %s: %w", loopDev, strings.TrimSpace(string(out)), err)
	}
	s := strings.TrimSpace(string(out))
	return strconv.ParseInt(s, 10, 64)
}

// CleanupStaleDevices removes any device-mapper devices matching the
// "wrenn-" prefix that may have been left behind by a previous agent
// instance that crashed or was killed. Should be called at agent startup.
func CleanupStaleDevices() {
	out, err := exec.Command("dmsetup", "ls", "--target", "snapshot").CombinedOutput()
	if err != nil {
		slog.Debug("dmsetup ls failed (may be normal if no devices exist)", "error", err)
		return
	}

	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		if line == "" || line == "No devices found" {
			continue
		}
		// dmsetup ls output format: "name\t(major:minor)"
		name, _, _ := strings.Cut(line, "\t")
		if !strings.HasPrefix(name, "wrenn-") {
			continue
		}

		slog.Warn("removing stale dm-snapshot device", "name", name)
		if err := dmsetupRemove(name); err != nil {
			slog.Warn("failed to remove stale device", "name", name, "error", err)
		}
	}
}

// --- low-level helpers ---

// losetupCreate attaches a file as a read-only loop device.
func losetupCreate(imagePath string) (string, error) {
	out, err := exec.Command("losetup", "--read-only", "--find", "--show", imagePath).Output()
	if err != nil {
		return "", fmt.Errorf("losetup --read-only: %w", err)
	}
	return strings.TrimSpace(string(out)), nil
}

// losetupCreateRW attaches a file as a read-write loop device.
func losetupCreateRW(path string) (string, error) {
	out, err := exec.Command("losetup", "--find", "--show", path).Output()
	if err != nil {
		return "", fmt.Errorf("losetup: %w", err)
	}
	return strings.TrimSpace(string(out)), nil
}

// losetupDetach detaches a loop device.
func losetupDetach(dev string) error {
	return exec.Command("losetup", "-d", dev).Run()
}

// dmsetupCreate creates a dm-snapshot device with persistent metadata.
func dmsetupCreate(name, originDev, cowDev string, sectors int64) error {
	// Table format: <start> <size> snapshot <origin> <cow> P <chunk_size>
	// P = persistent — CoW metadata survives dmsetup remove.
	table := fmt.Sprintf("0 %d snapshot %s %s P %d", sectors, originDev, cowDev, ChunkSize)
	cmd := exec.Command("dmsetup", "create", name, "--table", table)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("%s: %w", strings.TrimSpace(string(out)), err)
	}
	return nil
}

// dmsetupRemove removes a device-mapper device.
func dmsetupRemove(name string) error {
	cmd := exec.Command("dmsetup", "remove", name)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("%s: %w", strings.TrimSpace(string(out)), err)
	}
	return nil
}

// createSparseFile creates a sparse file of the given size.
func createSparseFile(path string, sizeBytes int64) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if err := f.Truncate(sizeBytes); err != nil {
		f.Close()
		os.Remove(path)
		return err
	}
	return f.Close()
}
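The table string assembled by `dmsetupCreate` is the heart of the CoW scheme and can be sketched in isolation; the device paths below are hypothetical, and the arithmetic matches the package (sizes in 512-byte sectors, 4 KB chunks):

```go
package main

import "fmt"

// dmTable builds the dm-snapshot table line handed to `dmsetup create`.
// Layout: <start_sector> <length_sectors> snapshot <origin_dev> <cow_dev> P <chunk_sectors>
// "P" marks the CoW metadata as persistent, so the snapshot can be
// re-attached after a `dmsetup remove` (what RestoreSnapshot relies on).
func dmTable(originDev, cowDev string, originSizeBytes int64) string {
	const chunkSize = 8 // 8 × 512-byte sectors = 4 KB chunks
	sectors := originSizeBytes / 512
	return fmt.Sprintf("0 %d snapshot %s %s P %d", sectors, originDev, cowDev, chunkSize)
}

func main() {
	// An 8 GiB base image over two hypothetical loop devices:
	// prints "0 16777216 snapshot /dev/loop0 /dev/loop1 P 8"
	fmt.Println(dmTable("/dev/loop0", "/dev/loop1", 8<<30))
}
```

Reads of unmodified chunks go to the origin device; the first write to any 4 KB chunk copies it into the CoW file, which is why the per-sandbox file stays sparse.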
@ -1,16 +0,0 @@
package filesystem

import (
	"fmt"
	"os/exec"
)

// CloneRootfs creates a copy-on-write clone of the base rootfs image.
// Uses reflink if supported by the filesystem, falls back to regular copy.
func CloneRootfs(src, dst string) error {
	cmd := exec.Command("cp", "--reflink=auto", src, dst)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("cp --reflink=auto: %s: %w", string(out), err)
	}
	return nil
}
@ -1 +0,0 @@
package filesystem
@ -104,7 +104,7 @@ func CreateNetwork(slot *Slot) error {
		return fmt.Errorf("get host namespace: %w", err)
	}
	defer hostNS.Close()
	defer func() { _ = netns.Set(hostNS) }()

	// Create named network namespace.
	ns, err := netns.NewNamed(slot.NamespaceID)
@ -304,17 +304,17 @@ func RemoveNetwork(slot *Slot) error {

	// Remove host-side iptables rules (best effort).
	if defaultIface != "" {
		_ = iptablesHost(
			"-D", "FORWARD",
			"-i", slot.VethName, "-o", defaultIface,
			"-j", "ACCEPT",
		)
		_ = iptablesHost(
			"-D", "FORWARD",
			"-i", defaultIface, "-o", slot.VethName,
			"-j", "ACCEPT",
		)
		_ = iptablesHost(
			"-t", "nat", "-D", "POSTROUTING",
			"-s", fmt.Sprintf("%s/32", slot.VpeerIP.String()),
			"-o", defaultIface,
@ -324,18 +324,18 @@ func RemoveNetwork(slot *Slot) error {

	// Remove host route.
	_, hostNet, _ := net.ParseCIDR(fmt.Sprintf("%s/32", slot.HostIP.String()))
	_ = netlink.RouteDel(&netlink.Route{
		Dst: hostNet,
		Gw:  slot.VpeerIP,
	})

	// Delete veth (also destroys the peer in the namespace).
	if veth, err := netlink.LinkByName(slot.VethName); err == nil {
		_ = netlink.LinkDel(veth)
	}

	// Delete the named namespace.
	_ = netns.DeleteNamed(slot.NamespaceID)

	slog.Info("network removed", "ns", slot.NamespaceID)
@ -11,8 +11,8 @@ import (

	"github.com/google/uuid"

	"git.omukk.dev/wrenn/sandbox/internal/devicemapper"
	"git.omukk.dev/wrenn/sandbox/internal/envdclient"
	"git.omukk.dev/wrenn/sandbox/internal/id"
	"git.omukk.dev/wrenn/sandbox/internal/models"
	"git.omukk.dev/wrenn/sandbox/internal/network"
@ -35,6 +35,7 @@ type Manager struct {
	cfg    Config
	vm     *vm.Manager
	slots  *network.SlotAllocator
	loops  *devicemapper.LoopRegistry
	mu     sync.RWMutex
	boxes  map[string]*sandboxState
	stopCh chan struct{}
@ -46,6 +47,8 @@ type sandboxState struct {
	slot           *network.Slot
	client         *envdclient.Client
	uffdSocketPath string // non-empty for sandboxes restored from snapshot
	dmDevice       *devicemapper.SnapshotDevice
	baseImagePath  string // path to the base template rootfs (for loop registry release)
}

// New creates a new sandbox manager.
@ -57,6 +60,7 @@ func New(cfg Config) *Manager {
		cfg:    cfg,
		vm:     vm.NewManager(),
		slots:  network.NewSlotAllocator(),
		loops:  devicemapper.NewLoopRegistry(),
		boxes:  make(map[string]*sandboxState),
		stopCh: make(chan struct{}),
	}
@ -91,16 +95,33 @@ func (m *Manager) Create(ctx context.Context, sandboxID, template string, vcpus,
		return nil, fmt.Errorf("base rootfs not found at %s: %w", baseRootfs, err)
	}

	// Acquire shared read-only loop device for the base image.
	originLoop, err := m.loops.Acquire(baseRootfs)
	if err != nil {
		return nil, fmt.Errorf("acquire loop device: %w", err)
	}

	originSize, err := devicemapper.OriginSizeBytes(originLoop)
	if err != nil {
		m.loops.Release(baseRootfs)
		return nil, fmt.Errorf("get origin size: %w", err)
	}

	// Create dm-snapshot with per-sandbox CoW file.
	dmName := "wrenn-" + sandboxID
	cowPath := filepath.Join(m.cfg.SandboxesDir, fmt.Sprintf("%s.cow", sandboxID))
	dmDev, err := devicemapper.CreateSnapshot(dmName, originLoop, cowPath, originSize)
	if err != nil {
		m.loops.Release(baseRootfs)
		return nil, fmt.Errorf("create dm-snapshot: %w", err)
	}

	// Allocate network slot.
	slotIdx, err := m.slots.Allocate()
	if err != nil {
		warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(dmDev))
		os.Remove(cowPath)
		m.loops.Release(baseRootfs)
		return nil, fmt.Errorf("allocate network slot: %w", err)
	}
	slot := network.NewSlot(slotIdx)
@ -108,15 +129,17 @@ func (m *Manager) Create(ctx context.Context, sandboxID, template string, vcpus,
	// Set up network.
	if err := network.CreateNetwork(slot); err != nil {
		m.slots.Release(slotIdx)
		warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(dmDev))
		os.Remove(cowPath)
		m.loops.Release(baseRootfs)
		return nil, fmt.Errorf("create network: %w", err)
	}

	// Boot VM — Firecracker gets the dm device path.
	vmCfg := vm.VMConfig{
		SandboxID:        sandboxID,
		KernelPath:       m.cfg.KernelPath,
		RootfsPath:       dmDev.DevicePath,
		VCPUs:            vcpus,
		MemoryMB:         memoryMB,
		NetworkNamespace: slot.NamespaceID,
@ -128,9 +151,11 @@ func (m *Manager) Create(ctx context.Context, sandboxID, template string, vcpus,
	}

	if _, err := m.vm.Create(ctx, vmCfg); err != nil {
		warnErr("network cleanup error", sandboxID, network.RemoveNetwork(slot))
		m.slots.Release(slotIdx)
		warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(dmDev))
		os.Remove(cowPath)
		m.loops.Release(baseRootfs)
		return nil, fmt.Errorf("create VM: %w", err)
	}

@ -140,10 +165,12 @@ func (m *Manager) Create(ctx context.Context, sandboxID, template string, vcpus,
	defer waitCancel()

	if err := client.WaitUntilReady(waitCtx); err != nil {
		warnErr("vm destroy error", sandboxID, m.vm.Destroy(context.Background(), sandboxID))
		warnErr("network cleanup error", sandboxID, network.RemoveNetwork(slot))
		m.slots.Release(slotIdx)
		warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(dmDev))
		os.Remove(cowPath)
		m.loops.Release(baseRootfs)
		return nil, fmt.Errorf("wait for envd: %w", err)
	}

@ -158,12 +185,14 @@ func (m *Manager) Create(ctx context.Context, sandboxID, template string, vcpus,
			TimeoutSec:   timeoutSec,
			SlotIndex:    slotIdx,
			HostIP:       slot.HostIP,
			RootfsPath:   dmDev.DevicePath,
			CreatedAt:    now,
			LastActiveAt: now,
		},
		slot:          slot,
		client:        client,
		dmDevice:      dmDev,
		baseImagePath: baseRootfs,
	}

	m.mu.Lock()
@ -174,6 +203,7 @@ func (m *Manager) Create(ctx context.Context, sandboxID, template string, vcpus,
		"id", sandboxID,
		"template", template,
		"host_ip", slot.HostIP.String(),
		"dm_device", dmDev.DevicePath,
	)

	return &sb.Sandbox, nil
@ -194,7 +224,7 @@ func (m *Manager) Destroy(ctx context.Context, sandboxID string) error {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Always clean up pause snapshot files (may exist if sandbox was paused).
|
// Always clean up pause snapshot files (may exist if sandbox was paused).
|
||||||
snapshot.Remove(m.cfg.SnapshotsDir, sandboxID)
|
warnErr("snapshot cleanup error", sandboxID, snapshot.Remove(m.cfg.SnapshotsDir, sandboxID))
|
||||||
|
|
||||||
slog.Info("sandbox destroyed", "id", sandboxID)
|
slog.Info("sandbox destroyed", "id", sandboxID)
|
||||||
return nil
|
return nil
|
||||||
@@ -209,7 +239,18 @@ func (m *Manager) cleanup(ctx context.Context, sb *sandboxState) {
         slog.Warn("network cleanup error", "id", sb.ID, "error", err)
     }
     m.slots.Release(sb.SlotIndex)
-    os.Remove(sb.RootfsPath)
+
+    // Tear down dm-snapshot and release the base image loop device.
+    if sb.dmDevice != nil {
+        if err := devicemapper.RemoveSnapshot(sb.dmDevice); err != nil {
+            slog.Warn("dm-snapshot remove error", "id", sb.ID, "error", err)
+        }
+        os.Remove(sb.dmDevice.CowPath)
+    }
+    if sb.baseImagePath != "" {
+        m.loops.Release(sb.baseImagePath)
+    }
+
     if sb.uffdSocketPath != "" {
         os.Remove(sb.uffdSocketPath)
     }
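The `RemoveSnapshot` / `loops.Release` pair above tears down a stack that, per the commit message, consists of a shared read-only loop device for the base image plus a per-sandbox sparse CoW file. A rough sketch of the equivalent manual setup, assuming the standard kernel `dm-snapshot` target (the loop device numbers and the 4 KiB chunk size are illustrative; only the `wrenn-` name prefix appears in the diff). The snippet only prints the table line, since the real `losetup`/`dmsetup` calls need root:

```shell
# Manual equivalent of the per-sandbox dm-snapshot stack (illustrative):
#   losetup --find --show -r base.ext4     # shared, read-only -> e.g. /dev/loop7
#   losetup --find --show sbx-123.cow      # per-sandbox sparse CoW -> e.g. /dev/loop8
#   dmsetup create wrenn-123 --table "$TABLE"
BASE_BYTES=1073741824                 # size of the base image in bytes
SECTORS=$((BASE_BYTES / 512))         # dm tables count 512-byte sectors
TABLE="0 $SECTORS snapshot /dev/loop7 /dev/loop8 P 8"   # P = persistent CoW, 8-sector (4 KiB) chunks
echo "$TABLE"
```

Because the CoW holds only exceptions (chunks the guest wrote), the same base loop device can back any number of sandboxes, which is why the registry refcounts it instead of detaching per sandbox.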
@@ -228,12 +269,15 @@ func (m *Manager) Pause(ctx context.Context, sandboxID string) error {
         return fmt.Errorf("sandbox %s is not running (status: %s)", sandboxID, sb.Status)
     }

+    pauseStart := time.Now()
+
     // Step 1: Pause the VM (freeze vCPUs).
     if err := m.vm.Pause(ctx, sandboxID); err != nil {
         return fmt.Errorf("pause VM: %w", err)
     }
+    slog.Debug("pause: VM paused", "id", sandboxID, "elapsed", time.Since(pauseStart))

-    // Step 2: Take a full snapshot (snapfile + memfile).
+    // Step 2: Take VM state snapshot (snapfile + memfile) — CoW file is saved separately.
     if err := snapshot.EnsureDir(m.cfg.SnapshotsDir, sandboxID); err != nil {
         return fmt.Errorf("create snapshot dir: %w", err)
     }
@@ -242,39 +286,74 @@ func (m *Manager) Pause(ctx context.Context, sandboxID string) error {
     rawMemPath := filepath.Join(snapDir, "memfile.raw")
     snapPath := snapshot.SnapPath(m.cfg.SnapshotsDir, sandboxID)

+    // For UFFD-resumed sandboxes, FC must fault in ALL lazy-loaded pages to
+    // serialize memory — this is the main bottleneck on re-pause.
+    snapshotStart := time.Now()
     if err := m.vm.Snapshot(ctx, sandboxID, snapPath, rawMemPath); err != nil {
-        snapshot.Remove(m.cfg.SnapshotsDir, sandboxID)
+        warnErr("snapshot dir cleanup error", sandboxID, snapshot.Remove(m.cfg.SnapshotsDir, sandboxID))
         return fmt.Errorf("create VM snapshot: %w", err)
     }
+    slog.Debug("pause: FC snapshot created", "id", sandboxID, "elapsed", time.Since(snapshotStart))

     // Step 3: Process the raw memfile into a compact diff + header.
     buildID := uuid.New()
     diffPath := snapshot.MemDiffPath(m.cfg.SnapshotsDir, sandboxID)
     headerPath := snapshot.MemHeaderPath(m.cfg.SnapshotsDir, sandboxID)

+    processStart := time.Now()
     if _, err := snapshot.ProcessMemfile(rawMemPath, diffPath, headerPath, buildID); err != nil {
-        snapshot.Remove(m.cfg.SnapshotsDir, sandboxID)
+        warnErr("snapshot dir cleanup error", sandboxID, snapshot.Remove(m.cfg.SnapshotsDir, sandboxID))
         return fmt.Errorf("process memfile: %w", err)
     }
+    slog.Debug("pause: memfile processed", "id", sandboxID, "elapsed", time.Since(processStart))

     // Remove the raw memfile — we only keep the compact diff.
     os.Remove(rawMemPath)

-    // Step 4: Copy rootfs into snapshot dir.
-    snapshotRootfs := snapshot.RootfsPath(m.cfg.SnapshotsDir, sandboxID)
-    if err := filesystem.CloneRootfs(sb.RootfsPath, snapshotRootfs); err != nil {
-        snapshot.Remove(m.cfg.SnapshotsDir, sandboxID)
-        return fmt.Errorf("copy rootfs: %w", err)
+    // Step 4: Destroy the VM first so Firecracker releases the dm device.
+    if err := m.vm.Destroy(ctx, sb.ID); err != nil {
+        slog.Warn("vm destroy error during pause", "id", sb.ID, "error", err)
+    }
+
+    // Step 5: Now that FC is gone, safely remove the dm-snapshot and save the CoW.
+    if sb.dmDevice != nil {
+        if err := devicemapper.RemoveSnapshot(sb.dmDevice); err != nil {
+            warnErr("dm-snapshot remove error during pause", sandboxID, err)
+        }
+
+        // Move (not copy) the CoW file into the snapshot directory.
+        snapshotCow := snapshot.CowPath(m.cfg.SnapshotsDir, sandboxID)
+        if err := os.Rename(sb.dmDevice.CowPath, snapshotCow); err != nil {
+            warnErr("snapshot dir cleanup error", sandboxID, snapshot.Remove(m.cfg.SnapshotsDir, sandboxID))
+            return fmt.Errorf("move cow file: %w", err)
+        }
+
+        // Record which base template this CoW was built against.
+        if err := snapshot.WriteMeta(m.cfg.SnapshotsDir, sandboxID, &snapshot.RootfsMeta{
+            BaseTemplate: sb.baseImagePath,
+        }); err != nil {
+            warnErr("snapshot dir cleanup error", sandboxID, snapshot.Remove(m.cfg.SnapshotsDir, sandboxID))
+            return fmt.Errorf("write rootfs meta: %w", err)
+        }
+    }
+
+    // Step 6: Clean up remaining resources (network, loop device, uffd socket).
+    if err := network.RemoveNetwork(sb.slot); err != nil {
+        slog.Warn("network cleanup error during pause", "id", sb.ID, "error", err)
+    }
+    m.slots.Release(sb.SlotIndex)
+    if sb.baseImagePath != "" {
+        m.loops.Release(sb.baseImagePath)
+    }
+    if sb.uffdSocketPath != "" {
+        os.Remove(sb.uffdSocketPath)
     }

-    // Step 5: Destroy the sandbox (free VM, network, rootfs clone).
     m.mu.Lock()
     delete(m.boxes, sandboxID)
     m.mu.Unlock()

-    m.cleanup(ctx, sb)
+    slog.Info("sandbox paused", "id", sandboxID, "total_elapsed", time.Since(pauseStart))

-    slog.Info("sandbox paused (snapshot + destroy)", "id", sandboxID)
     return nil
 }
@@ -307,19 +386,62 @@ func (m *Manager) Resume(ctx context.Context, sandboxID string) (*models.Sandbox
         return nil, fmt.Errorf("create memory source: %w", err)
     }

-    // Clone snapshot rootfs for this sandbox.
-    snapshotRootfs := snapshot.RootfsPath(snapDir, sandboxID)
-    rootfsPath := filepath.Join(m.cfg.SandboxesDir, fmt.Sprintf("%s-resume.ext4", sandboxID))
-    if err := filesystem.CloneRootfs(snapshotRootfs, rootfsPath); err != nil {
+    // Read rootfs metadata to find the base template image.
+    meta, err := snapshot.ReadMeta(snapDir, sandboxID)
+    if err != nil {
         source.Close()
-        return nil, fmt.Errorf("clone snapshot rootfs: %w", err)
+        return nil, fmt.Errorf("read rootfs meta: %w", err)
+    }
+
+    // Acquire the base image loop device and restore dm-snapshot from saved CoW.
+    baseImagePath := meta.BaseTemplate
+    originLoop, err := m.loops.Acquire(baseImagePath)
+    if err != nil {
+        source.Close()
+        return nil, fmt.Errorf("acquire loop device: %w", err)
+    }
+
+    originSize, err := devicemapper.OriginSizeBytes(originLoop)
+    if err != nil {
+        source.Close()
+        m.loops.Release(baseImagePath)
+        return nil, fmt.Errorf("get origin size: %w", err)
+    }
+
+    // Move CoW file from snapshot dir to sandboxes dir for the running sandbox.
+    savedCow := snapshot.CowPath(snapDir, sandboxID)
+    cowPath := filepath.Join(m.cfg.SandboxesDir, fmt.Sprintf("%s.cow", sandboxID))
+    if err := os.Rename(savedCow, cowPath); err != nil {
+        source.Close()
+        m.loops.Release(baseImagePath)
+        return nil, fmt.Errorf("move cow file: %w", err)
+    }
+
+    // rollbackCow attempts to move the CoW file back to the snapshot dir.
+    // Best-effort — logs a warning if it fails.
+    rollbackCow := func() {
+        if err := os.Rename(cowPath, savedCow); err != nil {
+            slog.Warn("failed to rollback cow file", "src", cowPath, "dst", savedCow, "error", err)
+        }
+    }
+
+    // Restore dm-snapshot from existing persistent CoW file.
+    dmName := "wrenn-" + sandboxID
+    dmDev, err := devicemapper.RestoreSnapshot(dmName, originLoop, cowPath, originSize)
+    if err != nil {
+        source.Close()
+        m.loops.Release(baseImagePath)
+        rollbackCow()
+        return nil, fmt.Errorf("restore dm-snapshot: %w", err)
     }

     // Allocate network slot.
     slotIdx, err := m.slots.Allocate()
     if err != nil {
         source.Close()
-        os.Remove(rootfsPath)
+        warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(dmDev))
+        rollbackCow()
+        m.loops.Release(baseImagePath)
         return nil, fmt.Errorf("allocate network slot: %w", err)
     }
     slot := network.NewSlot(slotIdx)
@@ -327,7 +449,9 @@ func (m *Manager) Resume(ctx context.Context, sandboxID string) (*models.Sandbox
     if err := network.CreateNetwork(slot); err != nil {
         source.Close()
         m.slots.Release(slotIdx)
-        os.Remove(rootfsPath)
+        warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(dmDev))
+        rollbackCow()
+        m.loops.Release(baseImagePath)
         return nil, fmt.Errorf("create network: %w", err)
     }

@@ -337,9 +461,11 @@ func (m *Manager) Resume(ctx context.Context, sandboxID string) (*models.Sandbox
     uffdServer := uffd.NewServer(uffdSocketPath, source)
     if err := uffdServer.Start(ctx); err != nil {
         source.Close()
-        network.RemoveNetwork(slot)
+        warnErr("network cleanup error", sandboxID, network.RemoveNetwork(slot))
         m.slots.Release(slotIdx)
-        os.Remove(rootfsPath)
+        warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(dmDev))
+        rollbackCow()
+        m.loops.Release(baseImagePath)
         return nil, fmt.Errorf("start uffd server: %w", err)
     }

@@ -347,7 +473,7 @@ func (m *Manager) Resume(ctx context.Context, sandboxID string) (*models.Sandbox
     vmCfg := vm.VMConfig{
         SandboxID:  sandboxID,
         KernelPath: m.cfg.KernelPath,
-        RootfsPath: rootfsPath,
+        RootfsPath: dmDev.DevicePath,
         VCPUs:      int(header.Metadata.Size / (1024 * 1024)), // Will be overridden by snapshot.
         MemoryMB:   int(header.Metadata.Size / (1024 * 1024)),
         NetworkNamespace: slot.NamespaceID,
@@ -360,11 +486,13 @@ func (m *Manager) Resume(ctx context.Context, sandboxID string) (*models.Sandbox

     snapPath := snapshot.SnapPath(snapDir, sandboxID)
     if _, err := m.vm.CreateFromSnapshot(ctx, vmCfg, snapPath, uffdSocketPath); err != nil {
-        uffdServer.Stop()
+        warnErr("uffd server stop error", sandboxID, uffdServer.Stop())
         source.Close()
-        network.RemoveNetwork(slot)
+        warnErr("network cleanup error", sandboxID, network.RemoveNetwork(slot))
         m.slots.Release(slotIdx)
-        os.Remove(rootfsPath)
+        warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(dmDev))
+        rollbackCow()
+        m.loops.Release(baseImagePath)
         return nil, fmt.Errorf("restore VM from snapshot: %w", err)
     }

@@ -374,12 +502,14 @@ func (m *Manager) Resume(ctx context.Context, sandboxID string) (*models.Sandbox
     defer waitCancel()

     if err := client.WaitUntilReady(waitCtx); err != nil {
-        uffdServer.Stop()
+        warnErr("uffd server stop error", sandboxID, uffdServer.Stop())
         source.Close()
-        m.vm.Destroy(context.Background(), sandboxID)
-        network.RemoveNetwork(slot)
+        warnErr("vm destroy error", sandboxID, m.vm.Destroy(context.Background(), sandboxID))
+        warnErr("network cleanup error", sandboxID, network.RemoveNetwork(slot))
         m.slots.Release(slotIdx)
-        os.Remove(rootfsPath)
+        warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(dmDev))
+        os.Remove(cowPath)
+        m.loops.Release(baseImagePath)
         return nil, fmt.Errorf("wait for envd: %w", err)
     }

@@ -394,25 +524,29 @@ func (m *Manager) Resume(ctx context.Context, sandboxID string) (*models.Sandbox
             TimeoutSec:   0,
             SlotIndex:    slotIdx,
             HostIP:       slot.HostIP,
-            RootfsPath:   rootfsPath,
+            RootfsPath:   dmDev.DevicePath,
             CreatedAt:    now,
             LastActiveAt: now,
         },
         slot:           slot,
         client:         client,
        uffdSocketPath: uffdSocketPath,
+        dmDevice:       dmDev,
+        baseImagePath:  baseImagePath,
     }

     m.mu.Lock()
     m.boxes[sandboxID] = sb
     m.mu.Unlock()

-    // Clean up the snapshot files now that the sandbox is running.
-    snapshot.Remove(snapDir, sandboxID)
+    // Clean up remaining snapshot files (snapfile, memfile, header, meta).
+    // The CoW file was already moved out.
+    warnErr("snapshot cleanup error", sandboxID, snapshot.Remove(snapDir, sandboxID))

     slog.Info("sandbox resumed from snapshot",
         "id", sandboxID,
         "host_ip", slot.HostIP.String(),
+        "dm_device", dmDev.DevicePath,
     )

     return &sb.Sandbox, nil
@@ -421,7 +555,10 @@ func (m *Manager) Resume(ctx context.Context, sandboxID string) (*models.Sandbox
 // CreateSnapshot creates a reusable template from a sandbox. Works on both
 // running and paused sandboxes. If the sandbox is running, it is paused first.
 // The sandbox remains paused after this call (it can still be resumed).
-// The template files are copied to ImagesDir/{name}/.
+//
+// The rootfs is flattened (base + CoW merged) into a new standalone rootfs.ext4
+// so the template has no dependency on the original base image. Memory state
+// and VM snapshot files are copied as-is.
 func (m *Manager) CreateSnapshot(ctx context.Context, sandboxID, name string) (int64, error) {
     // If the sandbox is running, pause it first.
     if _, err := m.get(sandboxID); err == nil {
@@ -435,29 +572,73 @@ func (m *Manager) CreateSnapshot(ctx context.Context, sandboxID, name string) (i
         return 0, fmt.Errorf("no snapshot found for sandbox %s", sandboxID)
     }

-    // Copy snapshot files to ImagesDir/{name}/ as a reusable template.
+    // Create template directory.
     if err := snapshot.EnsureDir(m.cfg.ImagesDir, name); err != nil {
         return 0, fmt.Errorf("create template dir: %w", err)
     }

+    // Copy VM snapshot and memory files.
     srcDir := snapshot.DirPath(m.cfg.SnapshotsDir, sandboxID)
     dstDir := snapshot.DirPath(m.cfg.ImagesDir, name)

-    for _, fname := range []string{snapshot.SnapFileName, snapshot.MemDiffName, snapshot.MemHeaderName, snapshot.RootfsFileName} {
+    for _, fname := range []string{snapshot.SnapFileName, snapshot.MemDiffName, snapshot.MemHeaderName} {
         src := filepath.Join(srcDir, fname)
         dst := filepath.Join(dstDir, fname)
-        if err := filesystem.CloneRootfs(src, dst); err != nil {
-            snapshot.Remove(m.cfg.ImagesDir, name)
+        if err := copyFile(src, dst); err != nil {
+            warnErr("template dir cleanup error", name, snapshot.Remove(m.cfg.ImagesDir, name))
             return 0, fmt.Errorf("copy %s: %w", fname, err)
         }
     }

+    // Flatten rootfs: temporarily set up dm device from base + CoW, dd to new image.
+    meta, err := snapshot.ReadMeta(m.cfg.SnapshotsDir, sandboxID)
+    if err != nil {
+        warnErr("template dir cleanup error", name, snapshot.Remove(m.cfg.ImagesDir, name))
+        return 0, fmt.Errorf("read rootfs meta: %w", err)
+    }
+
+    originLoop, err := m.loops.Acquire(meta.BaseTemplate)
+    if err != nil {
+        warnErr("template dir cleanup error", name, snapshot.Remove(m.cfg.ImagesDir, name))
+        return 0, fmt.Errorf("acquire loop device for flatten: %w", err)
+    }
+
+    originSize, err := devicemapper.OriginSizeBytes(originLoop)
+    if err != nil {
+        m.loops.Release(meta.BaseTemplate)
+        warnErr("template dir cleanup error", name, snapshot.Remove(m.cfg.ImagesDir, name))
+        return 0, fmt.Errorf("get origin size: %w", err)
+    }
+
+    // Temporarily restore the dm-snapshot to read the merged view.
+    cowPath := snapshot.CowPath(m.cfg.SnapshotsDir, sandboxID)
+    tmpDmName := "wrenn-flatten-" + sandboxID
+    tmpDev, err := devicemapper.RestoreSnapshot(tmpDmName, originLoop, cowPath, originSize)
+    if err != nil {
+        m.loops.Release(meta.BaseTemplate)
+        warnErr("template dir cleanup error", name, snapshot.Remove(m.cfg.ImagesDir, name))
+        return 0, fmt.Errorf("restore dm-snapshot for flatten: %w", err)
+    }
+
+    // Flatten to new standalone rootfs.
+    flattenedPath := snapshot.RootfsPath(m.cfg.ImagesDir, name)
+    flattenErr := devicemapper.FlattenSnapshot(tmpDev.DevicePath, flattenedPath)
+
+    // Always clean up the temporary dm device.
+    warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(tmpDev))
+    m.loops.Release(meta.BaseTemplate)
+
+    if flattenErr != nil {
+        warnErr("template dir cleanup error", name, snapshot.Remove(m.cfg.ImagesDir, name))
+        return 0, fmt.Errorf("flatten rootfs: %w", flattenErr)
+    }
+
     sizeBytes, err := snapshot.DirSize(m.cfg.ImagesDir, name)
     if err != nil {
         slog.Warn("failed to calculate snapshot size", "error", err)
     }

-    slog.Info("snapshot created",
+    slog.Info("template snapshot created (rootfs flattened)",
         "sandbox", sandboxID,
         "name", name,
         "size_bytes", sizeBytes,
@@ -472,7 +653,9 @@ func (m *Manager) DeleteSnapshot(name string) error {

 // createFromSnapshot creates a new sandbox by restoring from a snapshot template
 // in ImagesDir/{snapshotName}/. Uses UFFD for lazy memory loading.
-func (m *Manager) createFromSnapshot(ctx context.Context, sandboxID, snapshotName string, vcpus, memoryMB, timeoutSec int) (*models.Sandbox, error) {
+// The template's rootfs.ext4 is a flattened standalone image — we create a
+// dm-snapshot on top of it just like a normal Create.
+func (m *Manager) createFromSnapshot(ctx context.Context, sandboxID, snapshotName string, vcpus, _, timeoutSec int) (*models.Sandbox, error) {
     imagesDir := m.cfg.ImagesDir

     // Read the header.
@@ -486,10 +669,8 @@ func (m *Manager) createFromSnapshot(ctx context.Context, sandboxID, snapshotNam
         return nil, fmt.Errorf("deserialize header: %w", err)
     }

-    // Snapshot determines memory size. VCPUs are also baked into the
-    // snapshot state — the caller should pass the correct value from
-    // the template DB record.
-    memoryMB = int(header.Metadata.Size / (1024 * 1024))
+    // Snapshot determines memory size.
+    memoryMB := int(header.Metadata.Size / (1024 * 1024))

     // Build diff file map.
     diffPaths := map[string]string{
@@ -501,19 +682,37 @@ func (m *Manager) createFromSnapshot(ctx context.Context, sandboxID, snapshotNam
         return nil, fmt.Errorf("create memory source: %w", err)
     }

-    // Clone snapshot rootfs.
-    snapshotRootfs := snapshot.RootfsPath(imagesDir, snapshotName)
-    rootfsPath := filepath.Join(m.cfg.SandboxesDir, fmt.Sprintf("%s-%s.ext4", sandboxID, snapshotName))
-    if err := filesystem.CloneRootfs(snapshotRootfs, rootfsPath); err != nil {
+    // Set up dm-snapshot on the template's flattened rootfs.
+    baseRootfs := snapshot.RootfsPath(imagesDir, snapshotName)
+    originLoop, err := m.loops.Acquire(baseRootfs)
+    if err != nil {
         source.Close()
-        return nil, fmt.Errorf("clone snapshot rootfs: %w", err)
+        return nil, fmt.Errorf("acquire loop device: %w", err)
+    }
+
+    originSize, err := devicemapper.OriginSizeBytes(originLoop)
+    if err != nil {
+        source.Close()
+        m.loops.Release(baseRootfs)
+        return nil, fmt.Errorf("get origin size: %w", err)
+    }
+
+    dmName := "wrenn-" + sandboxID
+    cowPath := filepath.Join(m.cfg.SandboxesDir, fmt.Sprintf("%s.cow", sandboxID))
+    dmDev, err := devicemapper.CreateSnapshot(dmName, originLoop, cowPath, originSize)
+    if err != nil {
+        source.Close()
+        m.loops.Release(baseRootfs)
+        return nil, fmt.Errorf("create dm-snapshot: %w", err)
     }

     // Allocate network.
     slotIdx, err := m.slots.Allocate()
     if err != nil {
         source.Close()
-        os.Remove(rootfsPath)
+        warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(dmDev))
+        os.Remove(cowPath)
+        m.loops.Release(baseRootfs)
         return nil, fmt.Errorf("allocate network slot: %w", err)
     }
     slot := network.NewSlot(slotIdx)
@@ -521,7 +720,9 @@ func (m *Manager) createFromSnapshot(ctx context.Context, sandboxID, snapshotNam
     if err := network.CreateNetwork(slot); err != nil {
         source.Close()
         m.slots.Release(slotIdx)
-        os.Remove(rootfsPath)
+        warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(dmDev))
+        os.Remove(cowPath)
+        m.loops.Release(baseRootfs)
         return nil, fmt.Errorf("create network: %w", err)
     }

@@ -531,9 +732,11 @@ func (m *Manager) createFromSnapshot(ctx context.Context, sandboxID, snapshotNam
     uffdServer := uffd.NewServer(uffdSocketPath, source)
     if err := uffdServer.Start(ctx); err != nil {
         source.Close()
-        network.RemoveNetwork(slot)
+        warnErr("network cleanup error", sandboxID, network.RemoveNetwork(slot))
         m.slots.Release(slotIdx)
-        os.Remove(rootfsPath)
+        warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(dmDev))
+        os.Remove(cowPath)
+        m.loops.Release(baseRootfs)
         return nil, fmt.Errorf("start uffd server: %w", err)
     }

@@ -541,7 +744,7 @@ func (m *Manager) createFromSnapshot(ctx context.Context, sandboxID, snapshotNam
     vmCfg := vm.VMConfig{
         SandboxID:  sandboxID,
         KernelPath: m.cfg.KernelPath,
-        RootfsPath: rootfsPath,
+        RootfsPath: dmDev.DevicePath,
         VCPUs:      vcpus,
         MemoryMB:   memoryMB,
         NetworkNamespace: slot.NamespaceID,
@@ -554,11 +757,13 @@ func (m *Manager) createFromSnapshot(ctx context.Context, sandboxID, snapshotNam

     snapPath := snapshot.SnapPath(imagesDir, snapshotName)
     if _, err := m.vm.CreateFromSnapshot(ctx, vmCfg, snapPath, uffdSocketPath); err != nil {
-        uffdServer.Stop()
+        warnErr("uffd server stop error", sandboxID, uffdServer.Stop())
         source.Close()
-        network.RemoveNetwork(slot)
+        warnErr("network cleanup error", sandboxID, network.RemoveNetwork(slot))
         m.slots.Release(slotIdx)
-        os.Remove(rootfsPath)
+        warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(dmDev))
+        os.Remove(cowPath)
+        m.loops.Release(baseRootfs)
         return nil, fmt.Errorf("restore VM from snapshot: %w", err)
     }

@@ -568,12 +773,14 @@ func (m *Manager) createFromSnapshot(ctx context.Context, sandboxID, snapshotNam
     defer waitCancel()

     if err := client.WaitUntilReady(waitCtx); err != nil {
-        uffdServer.Stop()
+        warnErr("uffd server stop error", sandboxID, uffdServer.Stop())
         source.Close()
-        m.vm.Destroy(context.Background(), sandboxID)
-        network.RemoveNetwork(slot)
+        warnErr("vm destroy error", sandboxID, m.vm.Destroy(context.Background(), sandboxID))
+        warnErr("network cleanup error", sandboxID, network.RemoveNetwork(slot))
         m.slots.Release(slotIdx)
-        os.Remove(rootfsPath)
+        warnErr("dm-snapshot remove error", sandboxID, devicemapper.RemoveSnapshot(dmDev))
+        os.Remove(cowPath)
+        m.loops.Release(baseRootfs)
         return nil, fmt.Errorf("wait for envd: %w", err)
     }

@@ -588,13 +795,15 @@ func (m *Manager) createFromSnapshot(ctx context.Context, sandboxID, snapshotNam
             TimeoutSec:   timeoutSec,
             SlotIndex:    slotIdx,
             HostIP:       slot.HostIP,
-            RootfsPath:   rootfsPath,
+            RootfsPath:   dmDev.DevicePath,
             CreatedAt:    now,
             LastActiveAt: now,
         },
         slot:           slot,
         client:         client,
         uffdSocketPath: uffdSocketPath,
+        dmDevice:       dmDev,
+        baseImagePath:  baseRootfs,
     }

     m.mu.Lock()
@@ -605,6 +814,7 @@ func (m *Manager) createFromSnapshot(ctx context.Context, sandboxID, snapshotNam
         "id", sandboxID,
         "snapshot", snapshotName,
         "host_ip", slot.HostIP.String(),
+        "dm_device", dmDev.DevicePath,
     )

     return &sb.Sandbox, nil
@ -735,7 +945,7 @@ func (m *Manager) reapExpired(ctx context.Context) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Shutdown destroys all sandboxes and stops the TTL reaper.
|
// Shutdown destroys all sandboxes, releases loop devices, and stops the TTL reaper.
|
||||||
func (m *Manager) Shutdown(ctx context.Context) {
|
func (m *Manager) Shutdown(ctx context.Context) {
|
||||||
close(m.stopCh)
|
close(m.stopCh)
|
||||||
|
|
||||||
@ -752,4 +962,35 @@ func (m *Manager) Shutdown(ctx context.Context) {
|
|||||||
slog.Warn("shutdown destroy failed", "id", sbID, "error", err)
|
slog.Warn("shutdown destroy failed", "id", sbID, "error", err)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
m.loops.ReleaseAll()
|
||||||
|
}
|
||||||
|
|
||||||
|
// warnErr logs a warning if err is non-nil. Used for best-effort cleanup
|
||||||
|
// in error paths where the primary error has already been captured.
|
||||||
|
func warnErr(msg string, id string, err error) {
|
||||||
|
if err != nil {
|
||||||
|
slog.Warn(msg, "id", id, "error", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// copyFile copies a regular file from src to dst using streaming I/O.
|
||||||
|
func copyFile(src, dst string) error {
|
||||||
|
sf, err := os.Open(src)
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("open %s: %w", src, err)
|
||||||
|
}
|
||||||
|
defer sf.Close()
|
||||||
|
|
||||||
|
df, err := os.Create(dst)
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("create %s: %w", dst, err)
|
||||||
|
}
|
||||||
|
defer df.Close()
|
||||||
|
|
||||||
|
if _, err := df.ReadFrom(sf); err != nil {
|
||||||
|
os.Remove(dst)
|
||||||
|
return fmt.Errorf("copy %s → %s: %w", src, dst, err)
|
||||||
|
}
|
||||||
|
return nil
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -1,10 +1,12 @@
 package snapshot

 import (
+	"encoding/json"
 	"fmt"
 	"io/fs"
 	"os"
 	"path/filepath"
+	"syscall"
 )

 const (
@@ -12,6 +14,8 @@ const (
 	MemDiffName    = "memfile"
 	MemHeaderName  = "memfile.header"
 	RootfsFileName = "rootfs.ext4"
+	RootfsCowName  = "rootfs.cow"
+	RootfsMetaName = "rootfs.meta"
 )

 // DirPath returns the snapshot directory for a given name.
@@ -39,15 +43,68 @@ func RootfsPath(baseDir, name string) string {
 	return filepath.Join(DirPath(baseDir, name), RootfsFileName)
 }

+// CowPath returns the path to the rootfs CoW diff file.
+func CowPath(baseDir, name string) string {
+	return filepath.Join(DirPath(baseDir, name), RootfsCowName)
+}
+
+// MetaPath returns the path to the rootfs metadata file.
+func MetaPath(baseDir, name string) string {
+	return filepath.Join(DirPath(baseDir, name), RootfsMetaName)
+}
+
+// RootfsMeta records which base template a CoW file was created against.
+type RootfsMeta struct {
+	BaseTemplate string `json:"base_template"`
+}
+
+// WriteMeta writes rootfs metadata to the snapshot directory.
+func WriteMeta(baseDir, name string, meta *RootfsMeta) error {
+	data, err := json.Marshal(meta)
+	if err != nil {
+		return fmt.Errorf("marshal rootfs meta: %w", err)
+	}
+	if err := os.WriteFile(MetaPath(baseDir, name), data, 0644); err != nil {
+		return fmt.Errorf("write rootfs meta: %w", err)
+	}
+	return nil
+}
+
+// ReadMeta reads rootfs metadata from the snapshot directory.
+func ReadMeta(baseDir, name string) (*RootfsMeta, error) {
+	data, err := os.ReadFile(MetaPath(baseDir, name))
+	if err != nil {
+		return nil, fmt.Errorf("read rootfs meta: %w", err)
+	}
+	var meta RootfsMeta
+	if err := json.Unmarshal(data, &meta); err != nil {
+		return nil, fmt.Errorf("unmarshal rootfs meta: %w", err)
+	}
+	return &meta, nil
+}
+
 // Exists reports whether a complete snapshot exists (all required files present).
+// Supports both legacy (rootfs.ext4) and CoW-based (rootfs.cow + rootfs.meta) snapshots.
 func Exists(baseDir, name string) bool {
 	dir := DirPath(baseDir, name)
-	for _, f := range []string{SnapFileName, MemDiffName, MemHeaderName, RootfsFileName} {
+	// Common files required by both formats.
+	for _, f := range []string{SnapFileName, MemDiffName, MemHeaderName} {
 		if _, err := os.Stat(filepath.Join(dir, f)); err != nil {
 			return false
 		}
 	}
-	return true
+
+	// Accept either rootfs.ext4 (legacy) or rootfs.cow + rootfs.meta (dm-snapshot).
+	if _, err := os.Stat(filepath.Join(dir, RootfsFileName)); err == nil {
+		return true
+	}
+	if _, err := os.Stat(filepath.Join(dir, RootfsCowName)); err == nil {
+		if _, err := os.Stat(filepath.Join(dir, RootfsMetaName)); err == nil {
+			return true
+		}
+	}
+	return false
 }

 // IsTemplate reports whether a template image directory exists (has rootfs.ext4).
@@ -61,6 +118,15 @@ func IsSnapshot(baseDir, name string) bool {
 	return Exists(baseDir, name)
 }

+// HasCow reports whether a snapshot uses CoW format (rootfs.cow + rootfs.meta)
+// as opposed to legacy full rootfs (rootfs.ext4).
+func HasCow(baseDir, name string) bool {
+	dir := DirPath(baseDir, name)
+	_, cowErr := os.Stat(filepath.Join(dir, RootfsCowName))
+	_, metaErr := os.Stat(filepath.Join(dir, RootfsMetaName))
+	return cowErr == nil && metaErr == nil
+}
+
 // EnsureDir creates the snapshot directory if it doesn't exist.
 func EnsureDir(baseDir, name string) error {
 	dir := DirPath(baseDir, name)
@@ -75,12 +141,14 @@ func Remove(baseDir, name string) error {
 	return os.RemoveAll(DirPath(baseDir, name))
 }

-// DirSize returns the total byte size of all files in the snapshot directory.
+// DirSize returns the actual disk usage of all files in the snapshot directory.
+// Uses block-based accounting (stat.Blocks * 512) so sparse files report only
+// the blocks that are actually allocated, not their apparent size.
 func DirSize(baseDir, name string) (int64, error) {
 	var total int64
 	dir := DirPath(baseDir, name)

-	err := filepath.WalkDir(dir, func(_ string, d fs.DirEntry, err error) error {
+	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
 		if err != nil {
 			return err
 		}
@@ -91,7 +159,13 @@ func DirSize(baseDir, name string) (int64, error) {
 		if err != nil {
 			return err
 		}
-		total += info.Size()
+		if sys, ok := info.Sys().(*syscall.Stat_t); ok {
+			// Blocks is in 512-byte units regardless of filesystem block size.
+			total += sys.Blocks * 512
+		} else {
+			// Fallback to apparent size if syscall stat is unavailable.
+			total += info.Size()
+		}
 		return nil
 	})
 	if err != nil {
@@ -115,7 +115,7 @@ func (s *Server) Ready() <-chan struct{} {
 // Stop signals the UFFD poll loop to exit and waits for it to finish.
 func (s *Server) Stop() error {
 	// Write a byte to the exit pipe to wake the poll loop.
-	s.exitW.Write([]byte{0})
+	_, _ = s.exitW.Write([]byte{0})
 	<-s.doneCh
 	return s.doneErr
 }
@@ -10,8 +10,8 @@ type VMConfig struct {
 	// KernelPath is the path to the uncompressed Linux kernel (vmlinux).
 	KernelPath string

-	// RootfsPath is the path to the ext4 rootfs image for this sandbox.
-	// This should be a per-sandbox copy (reflink clone of the base image).
+	// RootfsPath is the path to the rootfs block device for this sandbox.
+	// Typically a dm-snapshot device (e.g., /dev/mapper/wrenn-sb-a1b2c3d4).
 	RootfsPath string

 	// VCPUs is the number of virtual CPUs to allocate (default: 1).
@@ -131,15 +131,6 @@ func (c *fcClient) createSnapshot(ctx context.Context, snapPath, memPath string)
 	})
 }

-// loadSnapshot loads a VM snapshot from a file-backed memory image.
-func (c *fcClient) loadSnapshot(ctx context.Context, snapPath, memPath string) error {
-	return c.do(ctx, http.MethodPut, "/snapshot/load", map[string]any{
-		"snapshot_path": snapPath,
-		"mem_file_path": memPath,
-		"resume_vm":     false,
-	})
-}
-
 // loadSnapshotWithUffd loads a VM snapshot using a UFFD socket for
 // lazy memory loading. Firecracker will connect to the socket and
 // send the uffd fd + memory region mappings.
@@ -53,7 +53,7 @@ func (m *Manager) Create(ctx context.Context, cfg VMConfig) (*VM, error) {

 	// Step 2: Wait for the API socket to appear.
 	if err := waitForSocket(ctx, cfg.SocketPath, proc); err != nil {
-		proc.stop()
+		_ = proc.stop()
 		return nil, fmt.Errorf("wait for socket: %w", err)
 	}

@@ -61,13 +61,13 @@ func (m *Manager) Create(ctx context.Context, cfg VMConfig) (*VM, error) {
 	client := newFCClient(cfg.SocketPath)

 	if err := configureVM(ctx, client, &cfg); err != nil {
-		proc.stop()
+		_ = proc.stop()
 		return nil, fmt.Errorf("configure VM: %w", err)
 	}

 	// Step 4: Start the VM.
 	if err := client.startVM(ctx); err != nil {
-		proc.stop()
+		_ = proc.stop()
 		return nil, fmt.Errorf("start VM: %w", err)
 	}

@@ -218,7 +218,7 @@ func (m *Manager) CreateFromSnapshot(ctx context.Context, cfg VMConfig, snapPath

 	// Step 2: Wait for the API socket.
 	if err := waitForSocket(ctx, cfg.SocketPath, proc); err != nil {
-		proc.stop()
+		_ = proc.stop()
 		return nil, fmt.Errorf("wait for socket: %w", err)
 	}

@@ -228,13 +228,13 @@ func (m *Manager) CreateFromSnapshot(ctx context.Context, cfg VMConfig, snapPath
 	// No boot resources are configured — the snapshot carries kernel,
 	// drive, network, and machine config state.
 	if err := client.loadSnapshotWithUffd(ctx, snapPath, uffdSocketPath); err != nil {
-		proc.stop()
+		_ = proc.stop()
 		return nil, fmt.Errorf("load snapshot: %w", err)
 	}

 	// Step 4: Resume the VM.
 	if err := client.resumeVM(ctx); err != nil {
-		proc.stop()
+		_ = proc.stop()
 		return nil, fmt.Errorf("resume VM: %w", err)
 	}