Add auto-pause TTL and ping endpoint for sandbox inactivity management
Replace the existing auto-destroy TTL behavior with auto-pause: when a
sandbox exceeds its timeout_sec of inactivity, the TTL reaper now pauses
it (snapshot + teardown) instead of destroying it, preserving the ability
to resume later.
Key changes:
- TTL reaper calls Pause instead of Destroy, with fallback to Destroy if
pause fails (e.g. Firecracker process already gone)
- New PingSandbox RPC resets the in-memory LastActiveAt timer
- New POST /v1/sandboxes/{id}/ping REST endpoint resets both agent memory
and DB last_active_at
- ListSandboxes RPC now includes auto_paused_sandbox_ids so the reconciler
can distinguish auto-paused sandboxes from crashed ones in a single call
- Reconciler polls every 5s (was 30s) and marks auto-paused as "paused"
vs orphaned as "stopped"
- Resume RPC accepts timeout_sec from DB so TTL survives pause/resume cycles
- Reaper checks every 2s (was 10s) and uses a detached context to avoid
incomplete pauses on app shutdown
- Default timeout_sec changed from 300 to 0 (no auto-pause unless requested)
This commit is contained in:
@ -249,6 +249,40 @@ paths:
|
||||
schema:
|
||||
$ref: "#/components/schemas/Error"
|
||||
|
||||
/v1/sandboxes/{id}/ping:
|
||||
parameters:
|
||||
- name: id
|
||||
in: path
|
||||
required: true
|
||||
schema:
|
||||
type: string
|
||||
|
||||
post:
|
||||
summary: Reset sandbox inactivity timer
|
||||
operationId: pingSandbox
|
||||
tags: [sandboxes]
|
||||
security:
|
||||
- apiKeyAuth: []
|
||||
description: |
|
||||
Resets the last_active_at timestamp for a running sandbox, preventing
|
||||
the auto-pause TTL from expiring. Use this as a keepalive for sandboxes
|
||||
that are idle but should remain running.
|
||||
responses:
|
||||
"204":
|
||||
description: Ping acknowledged, inactivity timer reset
|
||||
"404":
|
||||
description: Sandbox not found
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: "#/components/schemas/Error"
|
||||
"409":
|
||||
description: Sandbox not running
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: "#/components/schemas/Error"
|
||||
|
||||
/v1/sandboxes/{id}/pause:
|
||||
parameters:
|
||||
- name: id
|
||||
@ -721,7 +755,11 @@ components:
|
||||
default: 512
|
||||
timeout_sec:
|
||||
type: integer
|
||||
default: 300
|
||||
default: 0
|
||||
description: >
|
||||
Auto-pause TTL in seconds. The sandbox is automatically paused
|
||||
after this duration of inactivity (no exec or ping). 0 means
|
||||
no auto-pause.
|
||||
|
||||
Sandbox:
|
||||
type: object
|
||||
|
||||
Reference in New Issue
Block a user