forked from wrenn/wrenn
fix: sandbox network responsiveness under port-binding apps
Running port-binding applications (Jupyter, http.server, NextJS) inside
sandboxes caused severe PTY sluggishness and proxy navigation errors.
Root cause: the CP sandbox proxy and Connect RPC pool shared a single
HTTP transport. Heavy proxy traffic (Jupyter WebSocket, REST polling)
interfered with PTY RPC streams via HTTP/2 flow control contention.
Transport isolation (main fix):
- Add dedicated proxy transport on CP (NewProxyTransport) with HTTP/2
disabled, separate from the RPC pool transport
- Add dedicated proxy transport on host agent, replacing
http.DefaultTransport
- Add dedicated envdclient transport with tuned connection pooling
- Replace http.DefaultClient in file streaming RPCs with per-sandbox
envd client
Proxy path rewriting (navigation fix):
- Add ModifyResponse to rewrite Location headers with /proxy/{id}/{port}
prefix, handling both root-relative and absolute-URL redirects
- Strip prefix back out in CP subdomain proxy for correct browser
behavior
- Replace path.Join with string concat in CP Director to preserve
trailing slashes (prevents redirect loops on directory listings)
Proxy resilience:
- Add dial retry with linear backoff (3 attempts) to handle socat
startup delay when ports are first detected
- Cache ReverseProxy instances per sandbox+port+host in sync.Map
- Add EvictProxy callback wired into sandbox Manager.Destroy
Buffer and server hardening:
- Increase PTY and exec stream channel buffers from 16 to 256
- Add ReadHeaderTimeout (10s) and IdleTimeout (620s) to host agent
HTTP server
Network tuning:
- Set TAP device TxQueueLen to 5000 (up from default 1000)
- Add Firecracker tx_rate_limiter (200 MB/s sustained, 100 MB burst)
to prevent guest traffic from saturating the TAP
This commit is contained in:
@ -84,11 +84,21 @@ func (c *fcClient) setRootfsDrive(ctx context.Context, driveID, path string, rea
|
||||
}
|
||||
|
||||
// setNetworkInterface configures a network interface attached to a TAP device.
|
||||
// A tx_rate_limiter caps sustained guest→host throughput to prevent user
|
||||
// application traffic from completely saturating the TAP device and starving
|
||||
// envd control traffic (PTY, exec, file ops).
|
||||
func (c *fcClient) setNetworkInterface(ctx context.Context, ifaceID, tapName, macAddr string) error {
|
||||
return c.do(ctx, http.MethodPut, "/network-interfaces/"+ifaceID, map[string]any{
|
||||
"iface_id": ifaceID,
|
||||
"host_dev_name": tapName,
|
||||
"guest_mac": macAddr,
|
||||
"tx_rate_limiter": map[string]any{
|
||||
"bandwidth": map[string]any{
|
||||
"size": 209715200, // 200 MB/s sustained
|
||||
"refill_time": 1000, // refill period: 1 second
|
||||
"one_time_burst": 104857600, // 100 MB initial burst
|
||||
},
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
|
||||
Reference in New Issue
Block a user