forked from wrenn/wrenn

fix: sandbox network responsiveness under port-binding apps

Running port-binding applications (Jupyter, http.server, Next.js) inside
sandboxes caused severe PTY sluggishness and proxy navigation errors.

Root cause: the CP sandbox proxy and Connect RPC pool shared a single
HTTP transport. Heavy proxy traffic (Jupyter WebSocket, REST polling)
interfered with PTY RPC streams via HTTP/2 flow control contention.

Transport isolation (main fix):
- Add dedicated proxy transport on CP (NewProxyTransport) with HTTP/2
  disabled, separate from the RPC pool transport
- Add dedicated proxy transport on host agent, replacing
  http.DefaultTransport
- Add dedicated envdclient transport with tuned connection pooling
- Replace http.DefaultClient in file streaming RPCs with per-sandbox
  envd client
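
A rough sketch of the transport split described above. Only the name
NewProxyTransport comes from this commit; the signature, pool size, and
timeouts are assumptions for illustration. In net/http, a non-nil empty
TLSNextProto map (with ForceAttemptHTTP2 left false) keeps a transport on
HTTP/1.1, so proxied requests are not multiplexed onto a shared HTTP/2
connection.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"net/http"
	"time"
)

// NewProxyTransport returns a transport reserved for proxied sandbox traffic.
// The name is taken from the commit message; the body is an assumed sketch.
func NewProxyTransport() *http.Transport {
	return &http.Transport{
		DialContext: (&net.Dialer{
			Timeout:   10 * time.Second, // assumed value
			KeepAlive: 30 * time.Second, // assumed value
		}).DialContext,
		MaxIdleConnsPerHost: 32,               // assumed value
		IdleConnTimeout:     90 * time.Second, // assumed value
		// Disable HTTP/2: never attempt it, and register no ALPN upgrade handlers.
		ForceAttemptHTTP2: false,
		TLSNextProto:      map[string]func(string, *tls.Conn) http.RoundTripper{},
	}
}

func main() {
	// The RPC pool keeps its own transport; the proxy gets a separate one,
	// so neither shares connections with the other or with http.DefaultTransport.
	rpcClient := &http.Client{Transport: &http.Transport{ForceAttemptHTTP2: true}}
	proxyClient := &http.Client{Transport: NewProxyTransport()}
	fmt.Println("separate transports:", rpcClient.Transport != proxyClient.Transport)
}
```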

Proxy path rewriting (navigation fix):
- Add ModifyResponse to rewrite Location headers with /proxy/{id}/{port}
  prefix, handling both root-relative and absolute-URL redirects
- Strip prefix back out in CP subdomain proxy for correct browser
  behavior
- Replace path.Join with string concat in CP Director to preserve
  trailing slashes (prevents redirect loops on directory listings)
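
The rewriting rules above might look roughly like the following. This is a
sketch, not the code from this commit: rewriteLocation, newPrefixedProxy,
joinWithPathJoin, and joinWithConcat are hypothetical names, and treating
"absolute URL" as "absolute URL whose host is the upstream" is an assumption.
The last two helpers show why path.Join is a problem for directory listings:
it cleans its result and drops the trailing slash.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"path"
	"strings"
)

// rewriteLocation prefixes redirects that stay inside the sandbox service.
// Root-relative targets and absolute URLs pointing at upstreamHost are
// rewritten; anything else (external sites, relative paths) is left alone.
func rewriteLocation(loc, prefix, upstreamHost string) string {
	u, err := url.Parse(loc)
	if err != nil {
		return loc
	}
	switch {
	case u.Host == "" && strings.HasPrefix(u.Path, "/"):
		u.Path = prefix + u.Path
	case u.Host == upstreamHost:
		// Drop scheme and host so the browser stays on the proxy origin.
		u.Scheme, u.Host = "", ""
		u.Path = prefix + u.Path
	default:
		return loc
	}
	return u.String()
}

// newPrefixedProxy wires the rewrite into ReverseProxy.ModifyResponse.
func newPrefixedProxy(target *url.URL, prefix string) *httputil.ReverseProxy {
	rp := httputil.NewSingleHostReverseProxy(target)
	rp.ModifyResponse = func(resp *http.Response) error {
		if loc := resp.Header.Get("Location"); loc != "" {
			resp.Header.Set("Location", rewriteLocation(loc, prefix, target.Host))
		}
		return nil
	}
	return rp
}

// joinWithPathJoin cleans the result, which removes a trailing slash.
func joinWithPathJoin(base, p string) string { return path.Join(base, p) }

// joinWithConcat keeps the request path byte-for-byte, trailing slash included.
func joinWithConcat(base, p string) string { return base + p }

func main() {
	const prefix = "/proxy/sb1/8888" // hypothetical sandbox ID and port
	const upstream = "10.0.0.2:8888" // hypothetical upstream address

	fmt.Println(rewriteLocation("/tree?token=abc", prefix, upstream))
	fmt.Println(rewriteLocation("http://10.0.0.2:8888/lab/", prefix, upstream))
	fmt.Println(rewriteLocation("https://example.com/login", prefix, upstream))

	fmt.Println(joinWithPathJoin("/base", "/files/"))
	fmt.Println(joinWithConcat("/base", "/files/"))

	target, _ := url.Parse("http://" + upstream)
	_ = newPrefixedProxy(target, prefix)
}
```

With path.Join, a request for /files/ reaches the upstream as /files, the
upstream redirects back to /files/, and the cycle repeats; plain
concatenation avoids that.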

Proxy resilience:
- Add dial retry with linear backoff (3 attempts) to handle socat
  startup delay when ports are first detected
- Cache ReverseProxy instances per sandbox+port+host in sync.Map
- Add EvictProxy callback wired into sandbox Manager.Destroy
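
A minimal sketch of the two resilience pieces above, under stated
assumptions. The dial function is injected so the retry loop can be exercised
without a network; dialWithRetry, proxyCache, and the "|"-separated cache key
are hypothetical. Only the 3-attempt linear backoff, the sync.Map cache keyed
by sandbox+port+host, and the EvictProxy name come from this commit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http/httputil"
	"net/url"
	"strings"
	"sync"
	"time"
)

type dialFunc func(ctx context.Context, network, addr string) (net.Conn, error)

// dialWithRetry waits step, 2*step, ... between failed attempts (linear backoff).
func dialWithRetry(ctx context.Context, dial dialFunc, network, addr string, attempts int, step time.Duration) (net.Conn, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 1; i <= attempts; i++ {
		conn, err := dial(ctx, network, addr)
		if err == nil {
			return conn, nil
		}
		lastErr = err
		if i == attempts {
			break
		}
		select {
		case <-time.After(time.Duration(i) * step):
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
	return nil, fmt.Errorf("dial %s: %d attempts failed: %w", addr, attempts, lastErr)
}

// proxyCache holds one ReverseProxy per sandbox+port+host.
type proxyCache struct{ m sync.Map }

func (c *proxyCache) get(sandboxID, port, host, target string) (*httputil.ReverseProxy, error) {
	key := sandboxID + "|" + port + "|" + host
	if v, ok := c.m.Load(key); ok {
		return v.(*httputil.ReverseProxy), nil
	}
	u, err := url.Parse(target)
	if err != nil {
		return nil, err
	}
	v, _ := c.m.LoadOrStore(key, httputil.NewSingleHostReverseProxy(u))
	return v.(*httputil.ReverseProxy), nil
}

// EvictProxy drops every cached proxy belonging to one sandbox.
func (c *proxyCache) EvictProxy(sandboxID string) {
	prefix := sandboxID + "|"
	c.m.Range(func(k, _ any) bool {
		if strings.HasPrefix(k.(string), prefix) {
			c.m.Delete(k)
		}
		return true
	})
}

// demoRetry uses a fake dialer that fails the first `failures` calls and
// reports how many dial attempts were made.
func demoRetry(failures int) (calls int, err error) {
	dial := func(ctx context.Context, network, addr string) (net.Conn, error) {
		calls++
		if calls <= failures {
			return nil, errors.New("connection refused")
		}
		client, server := net.Pipe()
		server.Close()
		return client, nil
	}
	conn, err := dialWithRetry(context.Background(), dial, "tcp", "sandbox:8888", 3, time.Millisecond)
	if conn != nil {
		conn.Close()
	}
	return calls, err
}

func main() {
	calls, err := demoRetry(2) // port becomes reachable on the third attempt
	fmt.Println(calls, err)
}
```

Calling EvictProxy from the sandbox teardown path keeps the cache from
retaining proxies for sandboxes that no longer exist.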

Buffer and server hardening:
- Increase PTY and exec stream channel buffers from 16 to 256
- Add ReadHeaderTimeout (10s) and IdleTimeout (620s) to host agent
  HTTP server
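
For illustration, the server timeouts and the effect of the larger channel
buffer could be sketched like this. newAgentServer and burstCapacity are
hypothetical helpers; only the 10s/620s timeouts and the 16 and 256 buffer
sizes come from this commit.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newAgentServer builds an http.Server with the two timeouts named above.
func newAgentServer(addr string, h http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           h,
		ReadHeaderTimeout: 10 * time.Second,  // bound slow or stalled request headers
		IdleTimeout:       620 * time.Second, // reap idle keep-alive connections
	}
}

// burstCapacity counts how many events a producer can queue on a buffered
// channel before a send would block, with no consumer running.
func burstCapacity(buf int) int {
	ch := make(chan struct{}, buf)
	n := 0
	for {
		select {
		case ch <- struct{}{}:
			n++
		default:
			return n
		}
	}
}

func main() {
	srv := newAgentServer("127.0.0.1:0", http.NewServeMux())
	fmt.Println(srv.ReadHeaderTimeout, srv.IdleTimeout)
	fmt.Println(burstCapacity(16), burstCapacity(256))
}
```

A larger buffer does not make the consumer faster; it only lets the stream
reader absorb a longer burst before it blocks on a slow consumer.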

Network tuning:
- Set TAP device TxQueueLen to 5000 (up from default 1000)
- Add Firecracker tx_rate_limiter (200 MB/s sustained, 100 MB burst)
  to prevent guest traffic from saturating the TAP
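
As a hedged illustration, the rate limit could map onto Firecracker's
token-bucket fields (size, one_time_burst, refill_time) as below. The struct
types, the interface names, and the reading of MB as MiB are assumptions; the
commit may build this through different types. The TAP queue length is not
shown; its shell equivalent would be `ip link set dev <tap> txqueuelen 5000`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenBucket mirrors Firecracker's TokenBucket object.
type tokenBucket struct {
	Size         int64 `json:"size"`           // bytes added per refill period
	OneTimeBurst int64 `json:"one_time_burst"` // extra bytes available once
	RefillTime   int64 `json:"refill_time"`    // refill period in milliseconds
}

type rateLimiter struct {
	Bandwidth *tokenBucket `json:"bandwidth,omitempty"`
}

type networkInterface struct {
	IfaceID       string       `json:"iface_id"`
	HostDevName   string       `json:"host_dev_name"`
	TxRateLimiter *rateLimiter `json:"tx_rate_limiter,omitempty"`
}

const mib = 1 << 20 // assumption: "MB" in the commit message means MiB

// newTxLimiter expresses 200 MB/s sustained with a 100 MB one-time burst.
func newTxLimiter() *rateLimiter {
	return &rateLimiter{Bandwidth: &tokenBucket{
		Size:         200 * mib,
		OneTimeBurst: 100 * mib,
		RefillTime:   1000, // 200 MiB per 1000 ms
	}}
}

// txLimiterJSON renders just the tx_rate_limiter object.
func txLimiterJSON() (string, error) {
	b, err := json.Marshal(newTxLimiter())
	return string(b), err
}

func main() {
	iface := networkInterface{
		IfaceID:       "eth0", // hypothetical
		HostDevName:   "tap0", // hypothetical
		TxRateLimiter: newTxLimiter(),
	}
	b, err := json.MarshalIndent(iface, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```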
2026-04-25 04:21:55 +06:00
parent 5e13879954
commit bd98610153
11 changed files with 219 additions and 20 deletions


@@ -48,6 +48,13 @@ func (c *Client) BaseURL() string {
 	return c.base
 }
 
+// HTTPClient returns the underlying http.Client used for envd requests.
+// Use this instead of http.DefaultClient when making direct HTTP calls to envd
+// (e.g. file streaming) to avoid sharing the global transport with proxy traffic.
+func (c *Client) HTTPClient() *http.Client {
+	return c.httpClient
+}
+
 // ExecResult holds the output of a command execution.
 type ExecResult struct {
 	Stdout []byte
@@ -142,7 +149,7 @@ func (c *Client) ExecStream(ctx context.Context, cmd string, args ...string) (<-
 		return nil, fmt.Errorf("start process: %w", err)
 	}
 
-	ch := make(chan ExecStreamEvent, 16)
+	ch := make(chan ExecStreamEvent, 256)
 	go func() {
 		defer close(ch)
 		defer stream.Close()


@@ -2,7 +2,9 @@ package envdclient
 
 import (
 	"fmt"
+	"net"
 	"net/http"
+	"time"
 )
 
 // envdPort is the default port envd listens on inside the guest.
@@ -13,9 +15,19 @@ func baseURL(hostIP string) string {
 	return fmt.Sprintf("http://%s:%d", hostIP, envdPort)
 }
 
-// newHTTPClient returns an http.Client suitable for talking to envd.
-// No special transport is needed — envd is reachable via the host IP
-// through the veth/TAP network path.
+// newHTTPClient returns an http.Client with a dedicated transport for talking
+// to envd. The transport is intentionally separate from http.DefaultTransport
+// so that proxy traffic to user services inside the sandbox cannot interfere
+// with envd RPC connections (PTY streams, exec, file ops).
 func newHTTPClient() *http.Client {
-	return &http.Client{}
+	return &http.Client{
+		Transport: &http.Transport{
+			MaxIdleConnsPerHost: 10,
+			IdleConnTimeout:     90 * time.Second,
+			DialContext: (&net.Dialer{
+				Timeout:   10 * time.Second,
+				KeepAlive: 30 * time.Second,
+			}).DialContext,
+		},
+	}
 }


@@ -162,7 +162,7 @@ type eventProvider interface {
 // drainPtyStream reads events from either a Start or Connect stream and maps
 // them into PtyEvent values on a channel.
 func drainPtyStream(ctx context.Context, stream eventProvider, expectStart bool) <-chan PtyEvent {
-	ch := make(chan PtyEvent, 16)
+	ch := make(chan PtyEvent, 256)
 	go func() {
 		defer close(ch)
 		defer stream.Close()