lightpanda-browser

Network Layer

The network layer in Lightpanda handles all HTTP communication, WebSocket connections, robots.txt compliance, and proxy configuration. Built on top of libcurl, it provides a high-performance, asynchronous networking stack purpose-built for headless browser automation.


Overview

Lightpanda’s network layer is composed of four main modules: the HTTP core (a libcurl wrapper), the high-level HTTP client, the WebSocket transport for the CDP server, and the robots.txt module. Each is covered in the sections below.

HTTP Core

The HTTP core module wraps libcurl to provide connection management, header handling, and multiplexed request processing.

Connections

Each HTTP connection is represented by a Connection struct that wraps a libcurl easy handle. Connections are initialized with configuration for timeouts, redirects, proxy, TLS, and compression:

const conn = try Net.Connection.init(ca_blob, config);
try conn.setURL("https://example.com");
try conn.setMethod(.GET);
const status = try conn.request(&http_headers);

Key connection settings applied at initialization:

| Setting | Description | Source |
| --- | --- | --- |
| timeout_ms | Total request timeout | Config.httpTimeout() |
| connect_timeout_ms | TCP connection timeout | Config.httpConnectTimeout() |
| max_redirs | Maximum redirect hops | Config.httpMaxRedirects() |
| follow_location | Automatic redirect following | Always enabled |
| accept_encoding | Compression support (gzip, etc.) | Auto-detected |

HTTP Methods

The network layer supports all standard HTTP methods through the Method enum:

pub const Method = enum(u8) {
    GET, PUT, POST, DELETE, HEAD, OPTIONS, PATCH, PROPFIND,
};

Headers

The Headers struct manages request headers as a libcurl linked list. It supports iteration, cookie injection, and custom header addition:

var headers = try Net.Headers.init(user_agent_header);
defer headers.deinit();
try headers.add("Content-Type: application/json");

Response headers can be read through two iterator types: CurlHeaderIterator for live responses, and ListHeaderIterator for injected responses (used by CDP request interception).

Authentication

The AuthChallenge struct parses WWW-Authenticate and Proxy-Authenticate headers, supporting Basic and Digest authentication schemes:

const challenge = try AuthChallenge.parse(status, header_value);
// challenge.source: .server or .proxy
// challenge.scheme: .basic or .digest

Multi-Handle Management

For concurrent requests, the Handles struct wraps libcurl’s multi interface. It manages a pool of connections with configurable host-level concurrency limits:

var handles = try Net.Handles.init(config);
try handles.add(&conn);
const running = try handles.perform();
try handles.poll(extra_fds, timeout_ms);

The multi handle uses curl_multi_poll for efficient I/O multiplexing, and readMessage retrieves completed transfer results.

HTTP Client

The HttpClient is the high-level network client tied to a browser page. It manages the full lifecycle of HTTP requests including queuing, transfer management, robots.txt checking, and CDP network event integration.

Request Lifecycle

  1. Request creation – A Request is created with URL, method, headers, and optional body
  2. Robots.txt check – If robots enforcement is enabled, the client checks whether the URL is allowed before proceeding
  3. Queue or execute – If connections are available, the request starts immediately; otherwise it is queued
  4. Transfer – A Transfer object tracks the active request, manages response buffering, and handles callbacks
  5. Completion – Response data is delivered, the connection is returned to the pool, and queued requests are started

Connection Pooling

The client maintains connections through a doubly-linked list (in_use). When a transfer completes, the connection handle is recycled. The max_host_connections setting (from Config.httpMaxHostOpen()) limits concurrent connections per host, preventing resource exhaustion.

Request Queuing

When all connection handles are in use, new requests are added to a TransferQueue (a doubly-linked list). As transfers complete, queued requests are dequeued and started automatically.
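The queue-or-execute decision, connection recycling, and the per-host limit described above can be sketched together in Python (hypothetical names; a simplified model, not the Zig implementation):

```python
# Sketch: a pool with a per-host connection limit. Requests start
# immediately when a slot is free; otherwise they are queued, and
# completing transfers dequeue the next runnable request.
from collections import defaultdict, deque

class TransferPool:
    def __init__(self, max_host_connections: int):
        self.max_host = max_host_connections
        self.in_use = defaultdict(int)     # host -> active transfer count
        self.queue = deque()               # pending (host, request) pairs
        self.started = []                  # order in which requests ran

    def request(self, host: str, req: str) -> None:
        if self.in_use[host] < self.max_host:
            self._start(host, req)
        else:
            self.queue.append((host, req))  # all slots busy: queue it

    def _start(self, host: str, req: str) -> None:
        self.in_use[host] += 1
        self.started.append(req)

    def complete(self, host: str) -> None:
        self.in_use[host] -= 1             # recycle the connection slot
        for i, (h, req) in enumerate(self.queue):
            if self.in_use[h] < self.max_host:
                del self.queue[i]          # dequeue first runnable request
                self._start(h, req)
                break

pool = TransferPool(max_host_connections=2)
for r in ("a1", "a2", "a3"):
    pool.request("example.com", r)
# "a1" and "a2" start immediately; "a3" waits in the queue
pool.complete("example.com")
# the freed slot starts "a3"
```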

Request Interception (CDP)

The HTTP client supports CDP network request interception. When a CDP client is attached, requests can be paused, modified, or fulfilled before they reach the network:

// CDP can intercept requests at two stages:
// 1. Before the request is sent (Fetch.requestPaused)
// 2. After response headers arrive (Fetch.authRequired)

The intercepted counter tracks paused requests so that network idle detection works correctly even when requests are held by the CDP layer.
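One possible bookkeeping for this (a Python sketch with hypothetical names, not the actual counter logic) moves a paused request from the in-flight count to an intercepted count, so idleness requires both to reach zero:

```python
# Sketch: network-idle detection must treat requests held by the CDP
# layer as still outstanding, even though no transfer is active.
class NetworkActivity:
    def __init__(self):
        self.in_flight = 0
        self.intercepted = 0

    def start(self) -> None:
        self.in_flight += 1

    def finish(self) -> None:
        self.in_flight -= 1

    def pause(self) -> None:
        # CDP holds the request (Fetch.requestPaused)
        self.in_flight -= 1
        self.intercepted += 1

    def resume(self) -> None:
        # CDP continues or fulfills the request
        self.intercepted -= 1
        self.in_flight += 1

    def is_idle(self) -> bool:
        return self.in_flight == 0 and self.intercepted == 0

activity = NetworkActivity()
activity.start()
activity.pause()
# no active transfer, but the network is NOT idle
```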

Proxy Configuration

Proxy support is configured at initialization and can be changed at runtime through CDP:

// Set via config at startup
const http_proxy = config.httpProxy();

// Changed at runtime via CDP
client.setProxy("http://proxy.example.com:8080");
client.restoreOriginalProxy(); // Revert to config value

Both HTTP and HTTPS proxy protocols are supported. When a proxy is configured, TLS verification settings are applied to the proxy connection as well.

TLS Configuration

TLS verification can be controlled globally. When a CA certificate blob is provided, full host and peer verification is performed. Without it, verification can be disabled (useful for development):

conn.setTlsVerify(true, use_proxy);
// Verifies both ssl_verify_host and ssl_verify_peer
// Also applies to proxy if use_proxy is true

WebSocket

The WebSocket module implements the WebSocket protocol (RFC 6455) for the CDP server. It handles the upgrade handshake, message framing, and bidirectional communication.

Connection Upgrade

The upgrade process validates the HTTP upgrade request, checking for required headers (Upgrade, Connection, Sec-WebSocket-Key, Sec-WebSocket-Version) and computing the Sec-WebSocket-Accept response using SHA-1:

try ws_conn.upgrade(request);
// Validates HTTP/1.1, required headers
// Responds with 101 Switching Protocols
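The accept-key computation is small enough to show in full. Here is a Python sketch of the RFC 6455 rule (the GUID constant is fixed by the spec; `accept_key` is an illustrative name):

```python
# Sec-WebSocket-Accept per RFC 6455: SHA-1 over the client key
# concatenated with a fixed GUID, then base64-encoded.
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(sec_websocket_key: str) -> str:
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()

# Example key/accept pair from RFC 6455 section 1.3:
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```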

Message Types

The WebSocket implementation supports all standard frame types:

| Type | Description |
| --- | --- |
| text | UTF-8 text data (primary type for CDP JSON messages) |
| binary | Binary data frames |
| close | Connection close with status code |
| ping | Keep-alive ping |
| pong | Keep-alive pong response |

Message Reading

The Reader is a streaming parser that handles WebSocket framing, including variable-length headers, masking (client-to-server), and message fragmentation:

var reader = try websocket.Reader(true).init(allocator);
// true = expect masked frames (from client)

while (try reader.next()) |msg| {
    switch (msg.type) {
        .text => handleCDPMessage(msg.data),
        .ping => try ws.sendPong(msg.data),
        .close => break,
        else => {}, // binary and pong frames are ignored here
    }
}

The reader uses a dynamically growing buffer (starting at 16KB) and supports messages up to CDP_MAX_MESSAGE_SIZE. Fragmented messages are reassembled automatically.
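The framing rules the reader implements can be sketched in Python. `parse_frame` below is an illustrative single-frame parser, not the streaming Zig implementation; it assumes the whole frame is already buffered and ignores fragmentation:

```python
# Sketch of RFC 6455 framing: FIN/opcode byte, mask bit, a 7-bit
# payload length with 16-bit (126) and 64-bit (127) extensions, an
# optional 4-byte masking key, then the XOR-masked payload.
import struct

def parse_frame(buf: bytes):
    fin = bool(buf[0] & 0x80)
    opcode = buf[0] & 0x0F
    masked = bool(buf[1] & 0x80)
    length = buf[1] & 0x7F
    pos = 2
    if length == 126:                     # 16-bit extended length
        (length,) = struct.unpack_from(">H", buf, pos); pos += 2
    elif length == 127:                   # 64-bit extended length
        (length,) = struct.unpack_from(">Q", buf, pos); pos += 8
    key = buf[pos:pos + 4] if masked else b""
    pos += 4 if masked else 0
    payload = buf[pos:pos + length]
    if masked:                            # client-to-server unmasking
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return fin, opcode, payload

# Build a masked text frame carrying "hi" and parse it back
key = b"\x01\x02\x03\x04"
masked_payload = bytes(b ^ key[i % 4] for i, b in enumerate(b"hi"))
frame = bytes([0x81, 0x80 | 2]) + key + masked_payload
fin, opcode, payload = parse_frame(frame)
# fin=True, opcode=1 (text), payload=b"hi"
```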

SIMD-Optimized Masking

Client-to-server WebSocket frames are XOR-masked per the protocol specification. Lightpanda uses SIMD instructions when available for efficient unmasking:

// Uses std.simd.suggestVectorLength for platform-optimal
// vector width, falling back to scalar XOR for small payloads
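The masking rule itself is a repeating 4-byte XOR. The Python sketch below shows the scalar rule and a widened variant that XORs 8-byte chunks at a time, mimicking what the SIMD path does in hardware (function names are illustrative):

```python
# Scalar masking rule: payload[i] ^= key[i % 4]. Applying the mask
# twice restores the original bytes (XOR is involutive).
def xor_mask(payload: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))

# Widened variant: repeat the 4-byte key across the chunk width and
# XOR whole chunks at once, with a scalar fallback for the tail.
def xor_mask_wide(payload: bytes, key: bytes, width: int = 8) -> bytes:
    rep = (key * (width // 4 + 1))[:width]
    rep_int = int.from_bytes(rep, "big")
    out = bytearray()
    i = 0
    while i + width <= len(payload):
        chunk = int.from_bytes(payload[i:i + width], "big")
        out += (chunk ^ rep_int).to_bytes(width, "big")
        i += width
    # scalar tail, keeping the absolute index for key alignment
    out += bytes(b ^ key[j % 4] for j, b in enumerate(payload[i:], start=i))
    return bytes(out)

key = b"\xde\xad\xbe\xef"
data = b"CDP message payload"
assert xor_mask_wide(data, key) == xor_mask(data, key)
assert xor_mask_wide(xor_mask_wide(data, key), key) == data
```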

Sending Messages

The WsConnection provides several send methods optimized for different use cases. The send path handles WouldBlock by temporarily switching the socket to blocking mode, avoiding the need for a write queue.
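The WouldBlock strategy can be sketched with a Python socket (`send_all` is a hypothetical helper, not Lightpanda's API; the real code is Zig):

```python
# Sketch: if a non-blocking send cannot complete, switch the socket
# to blocking mode, drain the rest of the write, then restore
# non-blocking mode. A brief stall replaces a write queue.
import socket

def send_all(sock: socket.socket, data: bytes) -> None:
    try:
        sent = sock.send(data)
    except BlockingIOError:           # WouldBlock: nothing was written
        sent = 0
    if sent < len(data):
        sock.setblocking(True)        # temporarily block to finish
        try:
            sock.sendall(data[sent:])
        finally:
            sock.setblocking(False)   # restore event-loop mode

a, b = socket.socketpair()
a.setblocking(False)
send_all(a, b"hello")
received = b.recv(5)
# received == b"hello"
```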

Robots.txt Handling

The robots.txt module implements RFC 9309 for web crawler access control. This is particularly important for Lightpanda’s use in scraping and automation scenarios.

Parsing

The parser processes robots.txt files line by line, extracting User-agent, Allow, and Disallow directives:

var robots = try Robots.fromBytes(allocator, "MyBot", robots_txt_content);
defer robots.deinit(allocator);

Pattern Matching

Three pattern types are supported:

| Pattern | Example | Behavior |
| --- | --- | --- |
| Prefix | /admin/ | Matches any path starting with the pattern |
| Exact | /admin$ | Matches the path exactly (trailing $ anchor) |
| Wildcard | /*.php | * matches zero or more characters |

Rules are sorted by pattern length (longest first), with Allow winning ties. This ensures the most specific rule takes precedence.
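The matching and precedence rules above can be sketched in Python (`pattern_matches` and `is_allowed` are illustrative names, not the Zig API):

```python
# Sketch of RFC 9309 rule evaluation: '*' wildcards, a '$' end
# anchor, longest pattern first, and Allow winning a length tie.
import re

def pattern_matches(pattern: str, path: str) -> bool:
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    # translate '*' to '.*', escaping everything else
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    regex = "^" + regex + ("$" if anchored else "")
    return re.match(regex, path) is not None

def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    # rules: (directive, pattern). Sort longest pattern first, with
    # "allow" ahead of "disallow" on equal length; no match = allowed.
    ordered = sorted(rules, key=lambda r: (-len(r[1]), r[0] != "allow"))
    for directive, pattern in ordered:
        if pattern_matches(pattern, path):
            return directive == "allow"
    return True

rules = [("disallow", "/admin/"),
         ("allow", "/admin/public/"),
         ("disallow", "/*.php")]
# is_allowed(rules, "/admin/public/x") -> True  (longer Allow wins)
# is_allowed(rules, "/admin/secret")   -> False
# is_allowed(rules, "/index.php")      -> False (wildcard)
```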

Robot Store

The RobotStore provides thread-safe caching of parsed robots.txt files per domain. It uses a case-insensitive hash map protected by a mutex:

// Check cache first
if (store.get(domain)) |entry| {
    switch (entry) {
        .present => |robots| return robots.isAllowed(path),
        .absent => return true, // No robots.txt found
    }
}

// Fetch and cache
const robots = try store.robotsFromBytes(user_agent, bytes);
try store.put(domain, robots);

When a robots.txt file is not found (HTTP 404), the store records it as .absent to avoid repeated fetches.

Integration with HTTP Client

The HTTP client integrates robots.txt checking into the request lifecycle. When robots enforcement is enabled, the client:

  1. Checks the RobotStore cache for the target domain
  2. If not cached, queues the original request and fetches the robots.txt first
  3. Evaluates the rules against the request path
  4. Proceeds or blocks the request based on the result

Multiple requests to the same uncached domain are batched in a pending_robots_queue so that only one robots.txt fetch is made per domain.
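The one-fetch-per-domain batching can be sketched as follows (a Python sketch with illustrative names):

```python
# Sketch: the first request for an uncached domain triggers the
# robots.txt fetch; later requests for the same domain just join its
# pending list and are released together when the rules arrive.
from collections import defaultdict

class PendingRobots:
    def __init__(self):
        self.pending = defaultdict(list)   # domain -> waiting requests
        self.fetches = []                  # robots.txt fetches issued

    def request(self, domain: str, req: str) -> None:
        if domain not in self.pending:
            self.fetches.append(domain)    # first request: start fetch
        self.pending[domain].append(req)

    def robots_arrived(self, domain: str) -> list:
        return self.pending.pop(domain, [])  # release all waiters

q = PendingRobots()
q.request("a.com", "r1")
q.request("a.com", "r2")
q.request("b.com", "r3")
# q.fetches == ["a.com", "b.com"]  -- one fetch per domain
# q.robots_arrived("a.com") == ["r1", "r2"]
```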

Configuration Reference

The network layer is configured through the Config module. Key settings:

| Setting | Method | Description |
| --- | --- | --- |
| HTTP timeout | httpTimeout() | Total request timeout in milliseconds |
| Connect timeout | httpConnectTimeout() | TCP connection timeout in milliseconds |
| Max redirects | httpMaxRedirects() | Maximum number of HTTP redirects to follow |
| Max host connections | httpMaxHostOpen() | Concurrent connections per host |
| HTTP proxy | httpProxy() | Proxy URL (HTTP/HTTPS) |
| TLS verify | tlsVerifyHost() | Enable/disable TLS certificate verification |