Network Layer
The network layer in Lightpanda handles all HTTP communication, WebSocket connections, robots.txt compliance, and proxy configuration. Built on top of libcurl, it provides a high-performance, asynchronous networking stack purpose-built for headless browser automation.
Overview
Lightpanda’s network layer is composed of four main modules:
- HTTP Client (HttpClient.zig) – the high-level client that manages connections, request queuing, transfer lifecycle, and integration with the browser page
- HTTP Core (http.zig) – low-level libcurl bindings for connections, headers, and multi-handle management
- WebSocket (websocket.zig) – a full WebSocket implementation used by the CDP server for real-time communication
- Robots.txt (Robots.zig) – an RFC 9309-compliant parser and evaluator for robots.txt rules
HTTP Core
The HTTP core module wraps libcurl to provide connection management, header handling, and multiplexed request processing.
Connections
Each HTTP connection is represented by a Connection struct that wraps a libcurl easy handle. Connections are initialized with configuration for timeouts, redirects, proxy, TLS, and compression:
const conn = try Net.Connection.init(ca_blob, config);
try conn.setURL("https://example.com");
try conn.setMethod(.GET);
const status = try conn.request(&http_headers);
Key connection settings applied at initialization:
| Setting | Description | Source |
|---|---|---|
| timeout_ms | Total request timeout | Config.httpTimeout() |
| connect_timeout_ms | TCP connection timeout | Config.httpConnectTimeout() |
| max_redirs | Maximum redirect hops | Config.httpMaxRedirects() |
| follow_location | Automatic redirect following | Always enabled |
| accept_encoding | Compression support (gzip, etc.) | Auto-detected |
HTTP Methods
The network layer supports the following HTTP methods through the Method enum. CONNECT and TRACE are not included, while the WebDAV method PROPFIND is:
pub const Method = enum(u8) {
    GET, PUT, POST, DELETE, HEAD, OPTIONS, PATCH, PROPFIND,
};
Headers
The Headers struct manages request headers as a libcurl linked list. It supports iteration, cookie injection, and custom header addition:
var headers = try Net.Headers.init(user_agent_header);
defer headers.deinit();
try headers.add("Content-Type: application/json");
Response headers can be read through two iterator types: CurlHeaderIterator for live responses, and ListHeaderIterator for injected responses (used by CDP request interception).
Authentication
The AuthChallenge struct parses WWW-Authenticate and Proxy-Authenticate headers, supporting Basic and Digest authentication schemes:
const challenge = try AuthChallenge.parse(status, header_value);
// challenge.source: .server or .proxy
// challenge.scheme: .basic or .digest
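As a rough illustration of what such parsing involves, the sketch below classifies a challenge from the status code and the first token of the header value. The names Source, Scheme, Challenge, and classify are hypothetical and are not Lightpanda's actual AuthChallenge API; the only facts relied on are that 401 pairs with WWW-Authenticate, 407 pairs with Proxy-Authenticate, and scheme names are case-insensitive.

```zig
const std = @import("std");

// Hypothetical types for illustration; not Lightpanda's actual AuthChallenge.
const Source = enum { server, proxy };
const Scheme = enum { basic, digest };
const Challenge = struct { source: Source, scheme: Scheme };

/// Classifies a challenge from the HTTP status and the raw header value.
/// 401 pairs with WWW-Authenticate, 407 with Proxy-Authenticate.
fn classify(status: u16, header_value: []const u8) error{ NotAChallenge, UnknownScheme }!Challenge {
    const source: Source = switch (status) {
        401 => .server,
        407 => .proxy,
        else => return error.NotAChallenge,
    };
    // The scheme is the first whitespace-delimited token of the value.
    const value = std.mem.trim(u8, header_value, " \t");
    const end = std.mem.indexOfAny(u8, value, " \t") orelse value.len;
    const token = value[0..end];
    // Scheme names are compared case-insensitively.
    const scheme: Scheme = if (std.ascii.eqlIgnoreCase(token, "basic"))
        .basic
    else if (std.ascii.eqlIgnoreCase(token, "digest"))
        .digest
    else
        return error.UnknownScheme;
    return Challenge{ .source = source, .scheme = scheme };
}
```

A real parser would also extract parameters such as realm and nonce, which this sketch leaves out.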
Multi-Handle Management
For concurrent requests, the Handles struct wraps libcurl’s multi interface. It manages a pool of connections with configurable host-level concurrency limits:
var handles = try Net.Handles.init(config);
try handles.add(&conn);
const running = try handles.perform();
try handles.poll(extra_fds, timeout_ms);
The multi handle uses curl_multi_poll for efficient I/O multiplexing, and readMessage retrieves completed transfer results.
HTTP Client
The HttpClient is the high-level network client tied to a browser page. It manages the full lifecycle of HTTP requests including queuing, transfer management, robots.txt checking, and CDP network event integration.
Request Lifecycle
- Request creation – A Request is created with URL, method, headers, and optional body
- Robots.txt check – If robots enforcement is enabled, the client checks whether the URL is allowed before proceeding
- Queue or execute – If connections are available, the request starts immediately; otherwise it is queued
- Transfer – A Transfer object tracks the active request, manages response buffering, and handles callbacks
- Completion – Response data is delivered, the connection is returned to the pool, and queued requests are started
Connection Pooling
The client maintains connections through a doubly-linked list (in_use). When a transfer completes, the connection handle is recycled. The max_host_connections setting (from Config.httpMaxHostOpen()) limits concurrent connections per host, preventing resource exhaustion.
Request Queuing
When all connection handles are in use, new requests are added to a TransferQueue (a doubly-linked list). As transfers complete, queued requests are dequeued and started automatically.
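The queue-or-execute decision can be sketched as follows. This is a simplified model rather than Lightpanda's code: Scheduler, submit, and complete are hypothetical names, a counter stands in for the handle pool, and a fixed-size ring buffer replaces the linked-list TransferQueue to keep the example self-contained.

```zig
const std = @import("std");

// Hypothetical scheduler: a counter stands in for the handle pool and a
// fixed-size ring buffer stands in for the linked-list TransferQueue.
const Scheduler = struct {
    free_handles: usize,
    queue: [8]u32 = undefined, // ids of queued requests
    head: usize = 0,
    len: usize = 0,

    /// Returns true if the request starts now, false if it was queued.
    fn submit(self: *Scheduler, id: u32) error{QueueFull}!bool {
        if (self.free_handles > 0) {
            self.free_handles -= 1;
            return true;
        }
        if (self.len == self.queue.len) return error.QueueFull;
        self.queue[(self.head + self.len) % self.queue.len] = id;
        self.len += 1;
        return false;
    }

    /// Called when a transfer finishes. Returns the id of the queued
    /// request that takes over the freed handle, if there is one.
    fn complete(self: *Scheduler) ?u32 {
        if (self.len == 0) {
            self.free_handles += 1;
            return null;
        }
        const id = self.queue[self.head];
        self.head = (self.head + 1) % self.queue.len;
        self.len -= 1;
        return id;
    }
};
```

Handing the freed handle straight to the oldest queued request keeps the pool saturated without a separate scheduling pass.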
Request Interception (CDP)
The HTTP client supports CDP network request interception. When a CDP client is attached, requests can be paused, modified, or fulfilled before they reach the network:
// CDP can intercept requests at two stages:
// 1. Before the request is sent (Fetch.requestPaused)
// 2. When a response carries an authentication challenge (Fetch.authRequired)
The intercepted counter tracks paused requests so that network idle detection works correctly even when requests are held by the CDP layer.
Proxy Configuration
Proxy support is configured at initialization and can be changed at runtime through CDP:
// Set via config at startup
const http_proxy = config.httpProxy();
// Changed at runtime via CDP
client.setProxy("http://proxy.example.com:8080");
client.restoreOriginalProxy(); // Revert to config value
Both HTTP and HTTPS proxy protocols are supported. When a proxy is configured, TLS verification settings are applied to the proxy connection as well.
TLS Configuration
TLS verification can be controlled globally. When a CA certificate blob is provided, full host and peer verification is performed. Without it, verification can be disabled (useful for development):
conn.setTlsVerify(true, use_proxy);
// Sets both ssl_verify_host and ssl_verify_peer
// Also applies to proxy if use_proxy is true
WebSocket
The WebSocket module implements the WebSocket protocol (RFC 6455) for the CDP server. It handles the upgrade handshake, message framing, and bidirectional communication.
Connection Upgrade
The upgrade process validates the HTTP upgrade request, checking for required headers (Upgrade, Connection, Sec-WebSocket-Key, Sec-WebSocket-Version) and computing the Sec-WebSocket-Accept response using SHA-1:
try ws_conn.upgrade(request);
// Validates HTTP/1.1, required headers
// Responds with 101 Switching Protocols
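The Sec-WebSocket-Accept computation is fixed by RFC 6455: append a constant GUID to the client's key, hash the result with SHA-1, and base64-encode the digest. The sketch below shows that step in isolation using the Zig standard library; acceptKey is a hypothetical helper name, not necessarily what websocket.zig calls it.

```zig
const std = @import("std");

// GUID fixed by RFC 6455, section 1.3.
const ws_guid = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

/// Writes the 28-character Sec-WebSocket-Accept value for `key` into `out`.
fn acceptKey(key: []const u8, out: *[28]u8) void {
    var sha = std.crypto.hash.Sha1.init(.{});
    sha.update(key);
    sha.update(ws_guid);
    var digest: [std.crypto.hash.Sha1.digest_length]u8 = undefined;
    sha.final(&digest);
    // A 20-byte digest always encodes to 28 base64 characters.
    _ = std.base64.standard.Encoder.encode(out, &digest);
}
```

With the sample key from the RFC, dGhlIHNhbXBsZSBub25jZQ==, this produces s3pPLMBiTxaQ9kYGzzhZRbK+xOo=.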
Message Types
The WebSocket implementation supports all standard frame types:
| Type | Description |
|---|---|
| text | UTF-8 text data (primary type for CDP JSON messages) |
| binary | Binary data frames |
| close | Connection close with status code |
| ping | Keep-alive ping |
| pong | Keep-alive pong response |
Message Reading
The Reader is a streaming parser that handles WebSocket framing, including variable-length headers, masking (client-to-server), and message fragmentation:
var reader = try websocket.Reader(true).init(allocator);
// true = expect masked frames (from client)
while (try reader.next()) |msg| {
    switch (msg.type) {
        .text => handleCDPMessage(msg.data),
        .ping => try ws.sendPong(msg.data),
        .close => break,
        else => {}, // binary and pong frames are ignored in this example
    }
}
The reader uses a dynamically growing buffer (starting at 16 KB) and supports messages up to CDP_MAX_MESSAGE_SIZE. Fragmented messages are reassembled automatically.
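To make the variable-length header concrete, here is a minimal sketch of decoding one frame header as laid out in RFC 6455, section 5.2. FrameHeader and parseHeader are hypothetical names, and the real Reader does more than this (fragment reassembly, size limits, buffer growth).

```zig
const std = @import("std");

const FrameHeader = struct {
    fin: bool,
    opcode: u8,
    masked: bool,
    payload_len: u64,
    header_len: usize, // bytes taken by the header, masking key included
};

/// Parses a frame header from `buf`. Returns null when more bytes are needed.
fn parseHeader(buf: []const u8) ?FrameHeader {
    if (buf.len < 2) return null;
    const masked = (buf[1] & 0x80) != 0;
    var payload_len: u64 = buf[1] & 0x7F;
    var offset: usize = 2;
    // 126 means a 16-bit length follows, 127 means a 64-bit length follows.
    if (payload_len == 126 or payload_len == 127) {
        const ext: usize = if (payload_len == 126) 2 else 8;
        if (buf.len < offset + ext) return null;
        payload_len = 0;
        for (buf[offset .. offset + ext]) |b| payload_len = (payload_len << 8) | b;
        offset += ext;
    }
    if (masked) offset += 4; // 4-byte masking key
    if (buf.len < offset) return null;
    return FrameHeader{
        .fin = (buf[0] & 0x80) != 0,
        .opcode = buf[0] & 0x0F,
        .masked = masked,
        .payload_len = payload_len,
        .header_len = offset,
    };
}
```

Returning null on short input lets a streaming caller keep reading from the socket and retry, without treating an incomplete header as an error.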
SIMD-Optimized Masking
Client-to-server WebSocket frames are XOR-masked per the protocol specification. Lightpanda uses SIMD instructions when available for efficient unmasking:
// Uses std.simd.suggestVectorLength for platform-optimal
// vector width, falling back to scalar XOR for small payloads
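A minimal sketch of vectorized unmasking, under the assumption that the approach is "XOR whole vectors, then finish the tail byte by byte". The function name unmask and the fallback width of 16 lanes are choices made for this example; Lightpanda's implementation may differ in both.

```zig
const std = @import("std");

/// XORs `payload` in place with the 4-byte masking key (RFC 6455, section 5.3).
/// Applying it twice restores the original bytes.
fn unmask(key: [4]u8, payload: []u8) void {
    // Fall back to 16 lanes if the target reports no preferred width.
    const lanes = std.simd.suggestVectorLength(u8) orelse 16;
    comptime std.debug.assert(lanes % 4 == 0); // keeps the key phase aligned

    // Repeat the key across one vector.
    var key_bytes: [lanes]u8 = undefined;
    for (&key_bytes, 0..) |*b, k| b.* = key[k % 4];
    const key_vec: @Vector(lanes, u8) = key_bytes;

    // Vector path: one XOR per `lanes` bytes.
    var i: usize = 0;
    while (i + lanes <= payload.len) : (i += lanes) {
        const chunk: *[lanes]u8 = payload[i..][0..lanes];
        const data: @Vector(lanes, u8) = chunk.*;
        chunk.* = data ^ key_vec;
    }
    // Scalar path for the remaining tail (and for short payloads).
    while (i < payload.len) : (i += 1) payload[i] ^= key[i % 4];
}
```

Because every vector chunk starts at a multiple of four bytes, the repeated key lines up with the payload without any per-chunk rotation.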
Sending Messages
The WsConnection provides several send methods optimized for different use cases:
- sendJSON – Serializes a Zig struct to JSON and frames it as a WebSocket text message. Reserves 10 bytes for the variable-length header to avoid a separate allocation.
- sendPong – Responds to ping frames
- sendHttpError – Sends an HTTP error response before the WebSocket upgrade
The send path handles WouldBlock by temporarily switching the socket to blocking mode, avoiding the need for a write queue.
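The 10-byte reservation corresponds to the largest unmasked frame header: 2 base bytes plus an 8-byte extended length. The sketch below illustrates the idea under the assumption that the payload is serialized after the reserved bytes and the header is then written right-aligned in front of it; frameText is a hypothetical helper, not the actual sendJSON code.

```zig
const std = @import("std");

const max_header_len = 10; // 2 base bytes + 8-byte extended length

/// `buf` holds 10 reserved bytes followed by the payload. Writes an unmasked
/// text-frame header directly in front of the payload and returns the frame.
fn frameText(buf: []u8) []u8 {
    std.debug.assert(buf.len >= max_header_len);
    const payload_len: u64 = buf.len - max_header_len;
    const header_len: usize = if (payload_len < 126) 2 else if (payload_len <= 0xFFFF) 4 else 10;
    const start = max_header_len - header_len;
    const header = buf[start..max_header_len];
    header[0] = 0x81; // FIN bit + text opcode
    if (header_len == 2) {
        header[1] = @intCast(payload_len);
    } else {
        header[1] = if (header_len == 4) @as(u8, 126) else 127;
        // Extended length in network byte order, filled from the last byte.
        var remaining = payload_len;
        var i = header_len;
        while (i > 2) {
            i -= 1;
            header[i] = @truncate(remaining);
            remaining >>= 8;
        }
    }
    return buf[start..];
}
```

Since the header ends exactly where the payload begins, the returned slice is one contiguous frame that can be passed to a single write call.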
Robots.txt Handling
The robots.txt module implements RFC 9309 for web crawler access control. This is particularly important for Lightpanda’s use in scraping and automation scenarios.
Parsing
The parser processes robots.txt files line by line, extracting User-agent, Allow, and Disallow directives. It handles:
- UTF-8 BOM stripping
- Inline comments
- Case-insensitive user-agent matching
- Multiple user-agent entries per rule group
- Wildcard (*) user-agent fallback
var robots = try Robots.fromBytes(allocator, "MyBot", robots_txt_content);
defer robots.deinit(allocator);
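The per-line handling described above can be sketched as a small function that strips the BOM and any inline comment, then splits on the first colon. Directive, parseLine, and isKey are hypothetical names used for illustration, and grouping directives under user agents is left out.

```zig
const std = @import("std");

const Directive = struct { key: []const u8, value: []const u8 };

/// Splits one robots.txt line into a key/value pair, or returns null for
/// blank lines, comment-only lines, and lines without a colon.
fn parseLine(raw: []const u8) ?Directive {
    var line = raw;
    // Strip a UTF-8 byte order mark (only expected on the first line).
    if (std.mem.startsWith(u8, line, "\xEF\xBB\xBF")) line = line[3..];
    // Drop an inline comment.
    if (std.mem.indexOfScalar(u8, line, '#')) |hash| line = line[0..hash];
    const colon = std.mem.indexOfScalar(u8, line, ':') orelse return null;
    const key = std.mem.trim(u8, line[0..colon], " \t\r");
    const value = std.mem.trim(u8, line[colon + 1 ..], " \t\r");
    if (key.len == 0) return null;
    return Directive{ .key = key, .value = value };
}

/// Directive names are matched case-insensitively.
fn isKey(d: Directive, name: []const u8) bool {
    return std.ascii.eqlIgnoreCase(d.key, name);
}
```

The returned slices point into the input, so the caller has to keep the robots.txt bytes alive or copy the values it wants to retain.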
Pattern Matching
Three pattern types are supported:
| Pattern | Example | Behavior |
|---|---|---|
| Prefix | /admin/ | Matches any path starting with the pattern |
| Exact | /admin$ | Matches the path exactly (trailing $ anchor) |
| Wildcard | /*.php | * matches zero or more characters |
Rules are sorted by pattern length (longest first), with Allow winning ties. This ensures the most specific rule takes precedence.
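The matching and precedence rules can be sketched as below. This is an illustration and not the Robots.zig implementation: Rule, matches, and isAllowed are hypothetical names, the rules are scanned linearly instead of being pre-sorted, and skipping empty patterns is an assumption of this example.

```zig
const std = @import("std");

const Rule = struct { allow: bool, pattern: []const u8 };

/// Matches `path` against a pattern: `*` matches any run of characters,
/// a trailing `$` anchors the end, otherwise the pattern is a prefix.
fn matches(pattern: []const u8, path: []const u8) bool {
    var pat = pattern;
    const anchored = pat.len > 0 and pat[pat.len - 1] == '$';
    if (anchored) pat = pat[0 .. pat.len - 1];

    // Iterative wildcard match that backtracks to the most recent `*`.
    var p: usize = 0;
    var s: usize = 0;
    var star: ?usize = null;
    var star_s: usize = 0;
    while (true) {
        if (p == pat.len and (!anchored or s == path.len)) return true;
        if (p < pat.len and pat[p] == '*') {
            star = p;
            star_s = s;
            p += 1;
        } else if (p < pat.len and s < path.len and pat[p] == path[s]) {
            p += 1;
            s += 1;
        } else if (star) |sp| {
            if (star_s == path.len) return false;
            star_s += 1; // let the `*` absorb one more character
            p = sp + 1;
            s = star_s;
        } else return false;
    }
}

/// Longest matching pattern wins; on equal length, Allow wins.
/// With no matching rule the path is allowed.
fn isAllowed(rules: []const Rule, path: []const u8) bool {
    var best: ?Rule = null;
    for (rules) |rule| {
        if (rule.pattern.len == 0) continue; // assumption: empty patterns match nothing
        if (!matches(rule.pattern, path)) continue;
        if (best) |b| {
            const longer = rule.pattern.len > b.pattern.len;
            const tie_allow = rule.pattern.len == b.pattern.len and rule.allow;
            if (!longer and !tie_allow) continue;
        }
        best = rule;
    }
    return if (best) |b| b.allow else true;
}
```

Sorting the rules once at parse time, as described above, lets the evaluator stop at the first match; the linear scan here gives the same answer.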
Robot Store
The RobotStore provides thread-safe caching of parsed robots.txt files per domain. It uses a case-insensitive hash map protected by a mutex:
// Check cache first
if (store.get(domain)) |entry| {
    switch (entry) {
        .present => |robots| return robots.isAllowed(path),
        .absent => return true, // No robots.txt found
    }
}
// Fetch and cache
const robots = try store.robotsFromBytes(user_agent, bytes);
try store.put(domain, robots);
When a robots.txt file is not found (HTTP 404), the store records it as .absent to avoid repeated fetches.
Integration with HTTP Client
The HTTP client integrates robots.txt checking into the request lifecycle. When robots enforcement is enabled, the client:
- Checks the RobotStore cache for the target domain
- If not cached, queues the original request and fetches the robots.txt first
- Evaluates the rules against the request path
- Proceeds or blocks the request based on the result
Multiple requests to the same uncached domain are batched in a pending_robots_queue so that only one robots.txt fetch is made per domain.
Configuration Reference
The network layer is configured through the Config module. Key settings:
| Setting | Method | Description |
|---|---|---|
| HTTP timeout | httpTimeout() | Total request timeout in milliseconds |
| Connect timeout | httpConnectTimeout() | TCP connection timeout in milliseconds |
| Max redirects | httpMaxRedirects() | Maximum number of HTTP redirects to follow |
| Max host connections | httpMaxHostOpen() | Concurrent connections per host |
| HTTP proxy | httpProxy() | Proxy URL (HTTP/HTTPS) |
| TLS verify | tlsVerifyHost() | Enable/disable TLS certificate verification |
Related Topics
- Architecture – Overall system architecture
- Browser Engine – How the browser engine uses the network layer
- CDP Protocol – CDP protocol implementation that communicates over WebSocket