forked from wrenn/wrenn
Three root causes addressed:

1. Go page allocator corruption. Allocations between the pre-snapshot GC and the VM freeze leave the summary tree inconsistent. After restore, the GC reads the corrupted metadata and either panics (killing PID 1 and panicking the kernel) or silently fails to collect, causing unbounded heap growth until OOM. Fix: move the GC to after all HTTP allocations in PostSnapshotPrepare, then set GOMAXPROCS(1) so any remaining allocations run sequentially, with no concurrent page allocator access. GOMAXPROCS is restored on the first health check after restore.

2. PostInit timeout starvation. WaitUntilReady and PostInit shared a single 30s context. If WaitUntilReady consumed most of it, PostInit failed: RestoreAfterSnapshot never ran, leaving envd with keep-alives disabled and zombie connections. Fix: give each phase its own timeout context (see the first sketch below).

3. CP HTTP server missing timeouts. With no ReadHeaderTimeout or IdleTimeout, hung proxy connections leaked goroutines. Fix: add both, matching the host agent's values (see the second sketch below).

Also adds UFFD prefetch to proactively load all guest pages after restore, eliminating on-demand page-fault latency for subsequent RPC calls (see the third sketch below).
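A minimal sketch of fix 2, under stated assumptions: the VM interface, both method names, and the call-site shape stand in for the host agent's real code, and the 30-second budgets are illustrative.

```go
package main

import (
	"context"
	"time"
)

// VM is a hypothetical stand-in for the host agent's real VM handle.
type VM interface {
	WaitUntilReady(ctx context.Context) error
	PostInit(ctx context.Context) error
}

// bootVM gives WaitUntilReady and PostInit independent timeout contexts, so a
// slow readiness wait can no longer eat PostInit's budget and prevent
// RestoreAfterSnapshot from running.
func bootVM(vm VM) error {
	readyCtx, cancelReady := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancelReady()
	if err := vm.WaitUntilReady(readyCtx); err != nil {
		return err
	}

	// Fresh context: PostInit gets its full budget regardless of how long
	// the readiness wait took.
	initCtx, cancelInit := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancelInit()
	return vm.PostInit(initCtx)
}
```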
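Fix 3 amounts to two fields on the control-plane's http.Server. A sketch with placeholder values (the real code mirrors the host agent's; the constructor name is hypothetical):

```go
package main

import (
	"net/http"
	"time"
)

// newCPServer returns a control-plane HTTP server with the two timeouts the
// fix adds. The 10s/90s values here are illustrative.
func newCPServer(addr string, handler http.Handler) *http.Server {
	return &http.Server{
		Addr:    addr,
		Handler: handler,
		// Bound how long a peer may take to send request headers, so a hung
		// proxy connection cannot pin its goroutine forever.
		ReadHeaderTimeout: 10 * time.Second,
		// Reap keep-alive connections that go quiet between requests.
		IdleTimeout: 90 * time.Second,
	}
}
```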
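The UFFD prefetch boils down to touching every guest page once so the userfaultfd handler populates them up front instead of on the first post-restore RPC. A sketch, assuming mem is the mmapped, UFFD-registered guest region and a 4 KiB page size (both assumptions; the real implementation is not shown in this file):

```go
package main

// prefetchPages reads one byte per page, forcing the userfaultfd handler to
// serve every page now rather than on first access. The XOR accumulator keeps
// the compiler from eliding the loads.
func prefetchPages(mem []byte) byte {
	const pageSize = 4096
	var sink byte
	for off := 0; off < len(mem); off += pageSize {
		sink ^= mem[off]
	}
	return sink
}
```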
// SPDX-License-Identifier: Apache-2.0
// Modifications by M/S Omukk

package api

import (
	"net/http"
	"runtime"
	"runtime/debug"
)

// PostSnapshotPrepare quiesces continuous goroutines (port scanner, forwarder),
// closes idle HTTP connections, and forces a GC cycle before Firecracker takes
// a VM snapshot. Closing connections prevents Go runtime corruption from stale
// TCP state after snapshot restore. Keep-alives are disabled so the current
// request's connection also closes after the response.
//
// To prevent Go page allocator corruption, GOMAXPROCS is set to 1 after the
// final GC. With a single P, all goroutines (including any that allocate
// between now and the VM freeze) run sequentially. This eliminates concurrent
// page allocator access, so even if the freeze lands mid-allocation, the
// in-flight operation completes atomically on restore before any GC reads
// the summary tree. GOMAXPROCS is restored on the first health check after
// restore (see postRestoreRecovery).
//
// Called by the host agent as a best-effort signal before vm.Pause().
func (a *API) PostSnapshotPrepare(w http.ResponseWriter, r *http.Request) {
	defer r.Body.Close()

	if a.portSubsystem != nil {
		a.portSubsystem.Stop()
		a.logger.Info().Msg("snapshot/prepare: port subsystem quiesced")
	}

	if a.connTracker != nil {
		a.connTracker.PrepareForSnapshot()
		a.logger.Info().Msg("snapshot/prepare: idle connections closed, keep-alives disabled")
	}

	// Send the response before the GC so HTTP buffer allocations happen
	// while GOMAXPROCS is still at its normal value.
	w.Header().Set("Cache-Control", "no-store")
	w.WriteHeader(http.StatusNoContent)
	if f, ok := w.(http.Flusher); ok {
		f.Flush()
	}

	// Final GC pass after all major allocations (connection cleanup,
	// response write) are complete.
	runtime.GC()
	runtime.GC()
	debug.FreeOSMemory()

	// Reduce to a single P so any post-GC allocations (HTTP server
	// connection teardown) run sequentially — no concurrent page allocator
	// access that could leave the summary tree inconsistent if the VM
	// freezes mid-update.
	a.prevGOMAXPROCS = runtime.GOMAXPROCS(1)

	a.needsRestore.Store(true)
	a.logger.Info().Msg("snapshot/prepare: GOMAXPROCS=1, ready for freeze")
}
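
// ---------------------------------------------------------------------------
// Sketch of the restore side referenced above; not part of the original file.
// The CAS idiom and the zerolog field are assumptions inferred from the
// needsRestore/prevGOMAXPROCS usage in PostSnapshotPrepare; only "restore
// GOMAXPROCS on the first health check" is stated by the comments above.
// ---------------------------------------------------------------------------

// postRestoreRecovery undoes the GOMAXPROCS=1 squeeze once the VM is running
// again. Invoked from the first health check after restore; the CAS makes it
// idempotent, so it is safe to call on every health check.
func (a *API) postRestoreRecovery() {
	if !a.needsRestore.CompareAndSwap(true, false) {
		return
	}
	runtime.GOMAXPROCS(a.prevGOMAXPROCS)
	a.logger.Info().Int("gomaxprocs", a.prevGOMAXPROCS).Msg("restore: GOMAXPROCS restored")
}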