forked from wrenn/wrenn
envd crashes with "fatal error: bad summary data" after Firecracker snapshot/restore because the page allocator radix tree is inconsistent when vCPUs are frozen mid-allocation. The port scanner goroutine allocates heavily every second, making it the primary trigger.

Add POST /snapshot/prepare to envd. The host agent calls it before vm.Pause to quiesce continuous goroutines and force GC. On restore, PostInit restarts the port subsystem via the existing /init endpoint.

- New PortSubsystem abstraction with Start/Stop/Restart lifecycle
- Context-based goroutine cancellation (replaces irreversible channel close)
- Context-aware Signal to prevent scanner/forwarder deadlock
- Fix forwarder goroutine leak (was spinning forever on closed channel)
- Kill socat children on stop to prevent orphans across snapshots
- Fix double cmd.Wait panic (exec.Command instead of CommandContext)
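The lifecycle and cancellation bullets above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the struct fields, the `Running` method, the `scans` counter, and the 10 ms tick are hypothetical, and the real `PortSubsystem` may differ. The point it shows is that a cancelled context, unlike a closed channel, can be replaced with a fresh one, so the loop can be restarted after a snapshot restore.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// PortSubsystem is a hypothetical stand-in for the abstraction named in the
// commit message; its fields and methods here are assumptions.
type PortSubsystem struct {
	mu     sync.Mutex         // serializes Start/Stop/Restart
	cancel context.CancelFunc // nil while stopped
	wg     sync.WaitGroup     // tracks the running scan loop
	scans  atomic.Int64       // illustrative counter of scan iterations
}

// Start launches the scan loop unless it is already running.
func (p *PortSubsystem) Start() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.cancel != nil {
		return
	}
	ctx, cancel := context.WithCancel(context.Background())
	p.cancel = cancel
	p.wg.Add(1)
	go p.scanLoop(ctx)
}

// scanLoop stands in for the periodic port scanner and exits on cancellation.
func (p *PortSubsystem) scanLoop(ctx context.Context) {
	defer p.wg.Done()
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			p.scans.Add(1)
		}
	}
}

// Stop cancels the loop and waits for it to exit; calling it twice is a no-op.
func (p *PortSubsystem) Stop() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.cancel == nil {
		return
	}
	p.cancel()
	p.cancel = nil
	p.wg.Wait()
}

// Restart stops any running loop and starts a new one with a fresh context.
func (p *PortSubsystem) Restart() {
	p.Stop()
	p.Start()
}

// Running reports whether the scan loop is currently active.
func (p *PortSubsystem) Running() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.cancel != nil
}

func main() {
	p := &PortSubsystem{}
	p.Start()
	time.Sleep(50 * time.Millisecond)
	p.Stop() // quiesce, as /snapshot/prepare would
	fmt.Println("running after Stop:", p.Running(), "scans so far:", p.scans.Load())
	p.Restart() // resume, as /init would after restore
	fmt.Println("running after Restart:", p.Running())
	p.Stop()
}
```

Because `Stop` waits for the goroutine to exit before returning, a caller can treat its return as the point at which the loop no longer allocates.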
26 lines
730 B
Go
// SPDX-License-Identifier: Apache-2.0
// Modifications by M/S Omukk

package api

import (
	"net/http"
)

// PostSnapshotPrepare quiesces continuous goroutines (port scanner, forwarder)
// and forces a GC cycle before Firecracker takes a VM snapshot. This ensures
// the Go runtime's page allocator is in a consistent state when vCPUs are frozen.
//
// Called by the host agent as a best-effort signal before vm.Pause().
func (a *API) PostSnapshotPrepare(w http.ResponseWriter, r *http.Request) {
	defer r.Body.Close()

	if a.portSubsystem != nil {
		a.portSubsystem.Stop()
		a.logger.Info().Msg("snapshot/prepare: port subsystem quiesced")
	}

	w.Header().Set("Cache-Control", "no-store")
	w.WriteHeader(http.StatusNoContent)
}
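For context, the host-agent side of this call might look roughly like the sketch below. It is an assumption-laden illustration rather than the agent's actual code: `prepareSnapshot`, `newFakeEnvd`, and the 2-second timeout are hypothetical names and values, and the fake server only mimics the 204 response of the handler above. The call is treated as best-effort, so a failure is logged and the pause proceeds anyway.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// prepareSnapshot (hypothetical helper) POSTs to /snapshot/prepare under
// baseURL and expects 204 No Content. The 2s timeout is an assumed value.
func prepareSnapshot(client *http.Client, baseURL string) error {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/snapshot/prepare", nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("snapshot/prepare: unexpected status %d", resp.StatusCode)
	}
	return nil
}

// newFakeEnvd (hypothetical) starts a test server that imitates the
// handler's response so the sketch runs without a real VM.
func newFakeEnvd() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/snapshot/prepare", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Cache-Control", "no-store")
		w.WriteHeader(http.StatusNoContent)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newFakeEnvd()
	defer srv.Close()

	// Best-effort: a failure here is reported but does not block the pause.
	if err := prepareSnapshot(srv.Client(), srv.URL); err != nil {
		fmt.Println("snapshot/prepare failed, continuing:", err)
	}
	fmt.Println("the agent would call vm.Pause() at this point")
}
```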