Architecture
Lightpanda is a headless browser written in Zig, designed for high-performance web automation, scraping, and AI agent workflows. This page provides an overview of its major architectural components and how they fit together. For in-depth coverage, see the dedicated pages linked in each section.
Prerequisites
Before reading this document, you should be familiar with:
- Basic browser concepts (DOM, JavaScript execution, HTTP)
- The Chrome DevTools Protocol (CDP) at a high level
- The Overview page for project context
High-Level Design
Lightpanda follows a layered architecture with three main subsystems:
- Browser Engine — DOM management, HTML parsing, JavaScript execution, and page lifecycle
- Network Layer — HTTP client, WebSocket support, robots.txt handling, and proxy configuration
- CDP Protocol — Chrome DevTools Protocol server that exposes browser functionality to automation clients
These layers are coordinated by two central components: the App struct (application-level state) and the Server struct (connection management).
Application Lifecycle
The entry point (main.zig) initializes the allocator, parses command-line arguments, and creates the App instance. The application supports three operational modes:
- serve — Start a CDP-compatible WebSocket server for automation clients
- fetch — Load a single URL and dump the rendered output
- mcp — Start a Model Context Protocol server for AI agent integration
// Simplified startup flow
var app = try App.init(allocator, &args);
defer app.deinit();
switch (args.mode) {
    .serve => { /* start CDP server */ },
    .fetch => { /* single-page fetch */ },
    .mcp => { /* MCP server */ },
}
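The mode selection above can be sketched as a small, self-contained example. This is an illustration only: the `Mode` enum, `parseMode`, and `describe` are hypothetical names chosen for this sketch, and Lightpanda's actual argument parsing may be structured differently.

```zig
const std = @import("std");

// Hypothetical mirror of the three modes described above; the real
// definition in Lightpanda may differ.
const Mode = enum { serve, fetch, mcp };

/// Maps a command-line word to a Mode, or null if it is not recognized.
fn parseMode(arg: []const u8) ?Mode {
    return std.meta.stringToEnum(Mode, arg);
}

/// Returns a short description of what each mode would start.
fn describe(mode: Mode) []const u8 {
    return switch (mode) {
        .serve => "CDP WebSocket server",
        .fetch => "single-page fetch",
        .mcp => "MCP server",
    };
}
```

Because the `switch` is exhaustive over the enum, adding a fourth mode would be a compile error until every switch handles it.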
The App Struct
App is the central coordinator that owns all major subsystems:
| Field | Purpose |
|---|---|
| network | The I/O event loop (epoll/kqueue runtime) |
| platform | JavaScript engine platform (V8) |
| snapshot | V8 startup snapshot for fast JS context creation |
| telemetry | Anonymous usage telemetry |
| arena_pool | Pooled arena allocators for per-request memory |
| config | Parsed command-line configuration |
This design keeps global state minimal and makes the dependency graph explicit.
Browser Engine
The browser engine is responsible for loading web pages, parsing HTML, building the DOM tree, and executing JavaScript. It is the largest subsystem in the codebase.
Key components include:
- Browser — Top-level browser instance. Each Browser holds a JavaScript environment and manages a single session at a time.
- Session — Represents a browsing session with its own set of pages.
- Page — Represents an individual web page with its DOM, JavaScript context, event handling, and navigation state.
- ScriptManager — Handles JavaScript evaluation and script element processing.
- Parser — An HTML5-compliant parser (based on html5ever via Zig bindings) that builds the DOM tree.
// Creating a browser and loading a page
var browser = try Browser.init(app, .{ .http_client = http_client });
defer browser.deinit();
var session = try browser.newSession(notification);
try session.navigate(url);
The Page module implements a comprehensive set of Web APIs including Window, Document, Element, Event, MutationObserver, IntersectionObserver, Location, Performance, and more. This allows Lightpanda to faithfully execute client-side JavaScript that interacts with the DOM.
For full details on DOM implementation, JavaScript integration, HTML parsing, and the page lifecycle, see Browser Engine.
Network Layer
The network layer provides the I/O foundation that all HTTP requests and WebSocket connections run on. It uses a platform-native event loop (epoll on Linux, kqueue on macOS) for efficient non-blocking I/O.
Key components:
- Runtime — The core event loop that manages socket polling and callback dispatch.
- http.zig — HTTP/HTTPS client implementation.
- websocket.zig — WebSocket protocol support, used for both CDP communication and page-level WebSocket APIs.
- Robots.zig — robots.txt fetching and compliance checking.
The HttpClient sits between the browser engine and the raw network layer, handling cookies, redirects, and content-type detection.
// Network initialization is handled by App
app.network = try Network.init(allocator, config);
// The event loop is started when serving
app.network.run();
For details on HTTP internals, WebSocket handling, robots.txt support, and proxy configuration, see Network Layer.
CDP Protocol
Lightpanda exposes a Chrome DevTools Protocol server over WebSocket, allowing standard automation tools like Puppeteer, Playwright, and chromedp to control the browser.
The CDP implementation includes:
- Server — Accepts incoming WebSocket connections and manages client threads. Uses a thread-per-connection model with configurable connection limits.
- CDP dispatcher — Routes incoming JSON-RPC messages to the appropriate domain handler.
- Domain handlers — Implement CDP domains such as Page, Runtime, DOM, Network, Target, and more.
// Server initialization binds to address and starts accepting connections
var server = lp.Server.init(app, address);
defer server.deinit();
The server handles the WebSocket upgrade handshake, parses CDP messages, and dispatches them to domain-specific handlers. Each client connection gets its own thread with a dedicated Browser instance, ensuring isolation between concurrent automation sessions.
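The routing step can be illustrated with a minimal sketch. CDP method names have the form `Domain.action` (for example `Page.navigate`), so a dispatcher can split on the first dot and look up the domain. The `Domain` enum, `Route` struct, and `route` function below are hypothetical names for this sketch, not Lightpanda's actual dispatcher.

```zig
const std = @import("std");

// Hypothetical subset of CDP domains, matching the ones listed above.
const Domain = enum { Page, Runtime, DOM, Network, Target };

const Route = struct {
    domain: Domain,
    action: []const u8,
};

/// Splits a CDP method such as "Page.navigate" into its domain and action.
/// Returns null when there is no dot or the domain is not in the enum.
fn route(method: []const u8) ?Route {
    const dot = std.mem.indexOfScalar(u8, method, '.') orelse return null;
    const domain = std.meta.stringToEnum(Domain, method[0..dot]) orelse return null;
    return Route{ .domain = domain, .action = method[dot + 1 ..] };
}
```

A real dispatcher would then switch on `domain` and hand the message to that domain's handler; an unknown domain would typically be answered with a CDP error response rather than silently dropped.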
For the full list of supported CDP domains and implementation details, see CDP Protocol.
Memory Management
Lightpanda uses Zig’s explicit memory management model with several strategies:
- Arena allocators — Per-request memory is allocated from pooled arenas (ArenaPool), enabling fast bulk deallocation when a page or session ends.
- Debug allocator — In debug mode, a general-purpose allocator with leak detection is used to catch memory issues during development.
- C allocator — In release mode, the standard C allocator is used for performance.
- Memory pools — The server uses MemoryPool for client objects to reduce allocation overhead under concurrent load.
This approach minimizes allocation overhead in hot paths while maintaining safety during development.
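The arena pattern can be sketched with the standard library's `std.heap.ArenaAllocator`. This example does not use Lightpanda's ArenaPool, whose interface is not shown on this page; the function name `handleRequests` and the buffer sizes are assumptions made for illustration.

```zig
const std = @import("std");

/// Simulates handling `requests` requests with one reusable arena.
/// Returns the total number of bytes handed out across all requests.
fn handleRequests(backing: std.mem.Allocator, requests: usize) !usize {
    var arena = std.heap.ArenaAllocator.init(backing);
    defer arena.deinit(); // releases everything the arena still holds

    var total: usize = 0;
    var i: usize = 0;
    while (i < requests) : (i += 1) {
        const a = arena.allocator();
        // Per-request allocations; none of them is freed individually.
        const headers = try a.alloc(u8, 256);
        const body = try a.alloc(u8, 1024);
        total += headers.len + body.len;
        // Bulk-free this request's memory, keeping capacity for the next one.
        _ = arena.reset(.retain_capacity);
    }
    return total;
}
```

Resetting with `.retain_capacity` means later requests can usually reuse memory the arena already obtained instead of going back to the backing allocator, which is the main benefit of pooling arenas.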
Concurrency Model
Lightpanda uses a hybrid concurrency model:
- Event loop — The network runtime uses non-blocking I/O with epoll/kqueue for efficient socket management.
- Thread-per-connection — Each CDP client connection spawns a dedicated OS thread. Thread count is bounded by a configurable maximum.
- Atomic operations — Thread-safe counters and compare-and-swap loops manage the active thread count without lock contention.
// CAS loop for thread-safe connection counting.
// cmpxchgWeak returns null on success, or the freshly observed value on failure.
var current = self.active_threads.load(.monotonic);
while (current < max_connections) {
    current = self.active_threads.cmpxchgWeak(
        current, current + 1, .monotonic, .monotonic
    ) orelse break;
}
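A fuller version of this pattern, reporting whether a slot was actually reserved, might look like the sketch below. `ConnLimiter`, `tryAcquire`, and `release` are hypothetical names, not Lightpanda's Server API, and the example assumes a Zig version that provides `std.atomic.Value` (0.12 or later).

```zig
const std = @import("std");

// Hypothetical limiter; field and method names are illustrative only.
const ConnLimiter = struct {
    active: std.atomic.Value(u32) = std.atomic.Value(u32).init(0),
    max: u32,

    /// Tries to reserve a slot. Returns false when the limit is reached.
    fn tryAcquire(self: *ConnLimiter) bool {
        var current = self.active.load(.monotonic);
        while (current < self.max) {
            // null means the swap succeeded; otherwise retry with the
            // value another thread wrote in the meantime.
            current = self.active.cmpxchgWeak(
                current,
                current + 1,
                .monotonic,
                .monotonic,
            ) orelse return true;
        }
        return false;
    }

    /// Gives a slot back when a connection closes.
    fn release(self: *ConnLimiter) void {
        _ = self.active.fetchSub(1, .monotonic);
    }
};
```

A caller would invoke `tryAcquire` before spawning a client thread and `release` when that thread exits. The weak compare-and-swap may fail spuriously, which is why it sits inside a retry loop.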
Next Steps
Dive deeper into each subsystem:
- Browser Engine — DOM, JavaScript, HTML parsing, and page lifecycle
- Network Layer — HTTP client, WebSocket, robots.txt, and proxy support
- CDP Protocol — Chrome DevTools Protocol domains and automation
Related Topics
- Overview — Project introduction and key features
- Quick Start — Get up and running with Lightpanda
- Building from Source — Build prerequisites and instructions