Browser Engine
The browser engine is the core of Lightpanda, responsible for parsing HTML, managing the DOM, executing JavaScript, and orchestrating the page lifecycle. It is written in Zig to keep CPU and memory overhead per page low, which suits headless automation workloads where thousands of pages may be processed in rapid succession.
Prerequisites: Familiarity with browser internals (DOM, JavaScript runtimes) is helpful but not required. For a high-level view of how the engine fits into the overall system, see Architecture.
Component Overview
The browser engine is organized around four primary subsystems:
| Subsystem | Source | Responsibility |
|---|---|---|
| Browser | Browser.zig | Top-level instance, owns the JS environment and HTTP client |
| Session | Session.zig | Manages page lifetime, cookie jar, navigation history, and storage |
| Page | Page.zig | DOM tree, script execution context, event loop, and Web API bindings |
| ScriptManager | ScriptManager.zig | Loads, orders, and executes inline, async, deferred, and module scripts |
Additional subsystems handle HTML parsing (parser/), JavaScript integration (js/), DOM events (EventManager.zig), and memory-efficient object allocation (Factory.zig).
Browser Instance
Browser is the entry point. Each instance creates an isolated JavaScript environment (V8 isolate) and holds a reference to the shared HTTP client. You can create multiple Browser instances, but each holds at most one active session at a time.
// Simplified initialization
var browser = try Browser.init(app, .{
    .http_client = http_client,
});
defer browser.deinit();
Key operations on a Browser:
- newSession – closes any existing session and creates a fresh one with its own page, cookie jar, and navigation history.
- closeSession – tears down the current session and issues a critical memory pressure notification to V8 so it can reclaim garbage-collected memory.
- runMicrotasks / runMacrotasks – drive the JavaScript event loop. Microtasks (promises) run first, then macrotasks (timers, I/O callbacks). This two-phase approach mirrors the browser event loop specification.
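The two-phase ordering can be illustrated with a toy loop. This is a minimal sketch and not Lightpanda's implementation: `ToyLoop`, `queueMicrotask`, `queueMacrotask`, and `tick` are hypothetical names, and only the "microtasks first, then macrotasks" rule is taken from the description above.

```javascript
// Toy event loop. All names are illustrative, not Lightpanda's API.
class ToyLoop {
  constructor() {
    this.microtasks = [];
    this.macrotasks = [];
  }
  queueMicrotask(fn) { this.microtasks.push(fn); }
  queueMacrotask(fn) { this.macrotasks.push(fn); }

  // Drain the microtask queue, including microtasks queued while draining.
  runMicrotasks() {
    while (this.microtasks.length > 0) {
      this.microtasks.shift()();
    }
  }

  // Run the macrotasks that were queued when the call started,
  // draining microtasks after each one.
  runMacrotasks() {
    const batch = this.macrotasks.splice(0);
    for (const task of batch) {
      task();
      this.runMicrotasks();
    }
  }

  // One turn of the loop: microtasks first, then macrotasks.
  tick() {
    this.runMicrotasks();
    this.runMacrotasks();
  }
}

const order = [];
const loop = new ToyLoop();
loop.queueMacrotask(() => order.push("timer"));
loop.queueMicrotask(() => {
  order.push("promise 1");
  loop.queueMicrotask(() => order.push("promise 2"));
});
loop.tick();
console.log(order.join(", ")); // promise 1, promise 2, timer
```

Although the timer callback was queued first, both promise-style callbacks run before it, including the one queued from inside another microtask.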
Session Management
A Session represents a browsing context that persists across navigations. It owns:
- Cookie jar – stores cookies across page loads within the session.
- Navigation history – tracks back/forward navigation via the History Web API.
- Storage shed – implements localStorage and sessionStorage.
- Origin map – enables same-origin context sharing for V8 (so scripts from the same origin share global objects as expected).
Sessions also manage the Factory, which is the arena-based allocator for all DOM nodes and Web API objects on the current page. When the root page navigates, the factory is reset, releasing all memory from the previous page in a single operation rather than freeing individual objects.
// Create a new session with a notification callback
var session = try browser.newSession(&notification);
// The session manages a single root page at a time
// Navigating replaces the page but preserves cookies and history
Page Lifecycle
Page is the largest and most complex component. It represents a single loaded document and manages:
DOM Tree
The page owns the full DOM tree, including:
- Element lookups – lazily-created style, classList, and dataset objects that save 24 bytes per element by only allocating when JavaScript actually accesses these properties.
- Attribute identity – a lookup table ensures that repeated access to the same DOM attribute returns the same Attribute object, preserving identity semantics without bloating every element.
- Shadow DOM – shadow roots are stored in a separate lookup, enabling Web Components support.
- Custom Elements – tracks undefined custom elements and upgrades them when their definitions are registered.
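The side-table idea behind the lazy lookups and attribute identity can be sketched as follows. This is an illustration only: the engine keeps these tables in Zig, and `PageLookups`, `classListFor`, and `attributeFor` are hypothetical names chosen for the example.

```javascript
// Hypothetical page-level lookups. Elements carry no per-element slots
// for these objects; the page allocates them on first access.
class PageLookups {
  constructor() {
    this.classLists = new Map(); // element -> class list, created on demand
    this.attributes = new Map(); // element -> Map(name -> attribute object)
  }

  // Allocate the class list only when something asks for it.
  classListFor(element) {
    let list = this.classLists.get(element);
    if (list === undefined) {
      list = new Set(element.className.split(/\s+/).filter(Boolean));
      this.classLists.set(element, list);
    }
    return list;
  }

  // Return the same attribute object for repeated lookups of one name.
  attributeFor(element, name) {
    let byName = this.attributes.get(element);
    if (byName === undefined) {
      byName = new Map();
      this.attributes.set(element, byName);
    }
    let attr = byName.get(name);
    if (attr === undefined) {
      attr = { name, ownerElement: element };
      byName.set(name, attr);
    }
    return attr;
  }
}

const lookups = new PageLookups();
const div = { tagName: "div", className: "card  wide" };
const untouched = { tagName: "span", className: "" };

const sameAttr = lookups.attributeFor(div, "id") === lookups.attributeFor(div, "id");
const classes = [...lookups.classListFor(div)];
console.log(sameAttr, classes, lookups.classLists.has(untouched));
```

Elements that scripts never touch, like `untouched` here, cost nothing in either table.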
Event System
The page integrates several observer mechanisms, along with live objects that track DOM mutations:
- MutationObserver – monitors DOM changes and batches delivery to avoid re-entrancy issues.
- IntersectionObserver – tracks element visibility (useful for lazy-loading detection in scrapers).
- PerformanceObserver – collects performance timing entries.
- Live Ranges – maintains active Range objects that update automatically when the DOM mutates.
Load State Machine
Page loading follows a state machine with the following phases:
- pre – initial state before parsing begins.
- Parsing – HTML is fed to the parser; the DOM tree is constructed incrementally.
- Scripts pending – deferred and async scripts are loaded and executed in the correct order.
- complete – all scripts, iframes, and resources have loaded; the load event fires on the window.
The _pending_loads counter tracks outstanding loads (scripts, iframes) and triggers the load event only when it reaches zero.
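A counter of this kind can be sketched in a few lines. The sketch below is hypothetical (`LoadTracker`, `begin`, and `end` are illustrative names, not the engine's). It also assumes the document itself holds one count until parsing ends, so the counter cannot reach zero while the parser may still discover more resources.

```javascript
// Hypothetical load tracker. Only the "fire at zero" rule comes from the text.
class LoadTracker {
  constructor(onLoad) {
    this.pendingLoads = 0;
    this.loaded = false;
    this.onLoad = onLoad;
  }
  begin() {
    this.pendingLoads += 1;
  }
  end() {
    this.pendingLoads -= 1;
    if (this.pendingLoads === 0 && !this.loaded) {
      this.loaded = true;
      this.onLoad();
    }
  }
}

const events = [];
const tracker = new LoadTracker(() => events.push("load"));
tracker.begin();             // the document itself (assumption: held until parsing ends)
tracker.begin();             // a script
tracker.begin();             // an iframe
tracker.end();               // script finished
events.push("parsing done");
tracker.end();               // parsing finished
tracker.end();               // iframe finished, counter reaches zero
console.log(events, tracker.pendingLoads);
```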
Navigation
Navigation can be triggered by:
- CDP commands – Page.navigate called from the CDP layer.
- JavaScript – window.location changes are queued and executed on the next tick to avoid re-entrancy during script evaluation.
- Link clicks – handled via the event system.
// From a CDP client perspective (Puppeteer example)
await page.goto('https://example.com');
await page.waitForSelector('#content');
const html = await page.content();
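The deferral of JavaScript-triggered navigation can be pictured with a small model. Everything here is hypothetical (`ToyPage`, `requestNavigation`, `processQueuedNavigation`); the only behavior taken from the description above is that a navigation requested during script evaluation is recorded and carried out later. That a later request overwrites an earlier one is an assumption of this sketch.

```javascript
// Hypothetical model of queued navigation. Names are illustrative only.
class ToyPage {
  constructor(url) {
    this.url = url;
    this.queuedNavigation = null;
    this.log = [];
  }

  // Called when a script assigns window.location: record it, do not navigate.
  requestNavigation(target) {
    this.queuedNavigation = new URL(target, this.url).href;
  }

  runScript(script) {
    script(this);
    this.log.push(`script finished on ${this.url}`);
  }

  // Called on the next tick, once no script is on the stack.
  processQueuedNavigation() {
    if (this.queuedNavigation === null) return false;
    this.url = this.queuedNavigation;
    this.queuedNavigation = null;
    this.log.push(`navigated to ${this.url}`);
    return true;
  }
}

const toyPage = new ToyPage("https://example.com/a");
toyPage.runScript((p) => p.requestNavigation("/b"));
toyPage.processQueuedNavigation();
console.log(toyPage.log);
```

The script runs to completion against the original document before the page is replaced, which is the re-entrancy hazard the queue avoids.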
HTML Parser
Lightpanda uses html5ever, a Rust-based HTML parser that implements the WHATWG HTML specification. The Zig integration works through a C FFI layer with a callback-driven architecture.
Parsing Modes
The parser supports three modes:
| Mode | Use Case |
|---|---|
| document | Full page parsing from network response |
| fragment | innerHTML and insertAdjacentHTML operations |
| document_write | Dynamic content injection via document.write() |
Callback Architecture
Rather than returning a tree, html5ever calls back into Zig for each DOM operation:
- createElement – allocates a new element in the page’s arena.
- append – attaches a node or text to its parent.
- pop – signals that an element’s children are complete.
- removeFromParent / reparentChildren – handle tree mutations required by the HTML parsing algorithm (e.g., foster parenting, adoption agency).
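To show what a callback-driven tree builder looks like from the receiving side, here is a minimal sketch. It mirrors the callback names listed above but not the real FFI signatures: `TreeSink`, the node shape, and the hand-written call sequence are assumptions made for the example.

```javascript
// Hypothetical sink that receives parser callbacks and builds a tree.
class TreeSink {
  constructor() {
    this.document = { name: "#document", children: [], parent: null };
    this.closed = [];
  }
  createElement(name) {
    return { name, children: [], parent: null };
  }
  append(parent, child) {
    // Text arrives as a string; elements arrive as nodes.
    const node = typeof child === "string"
      ? { name: "#text", text: child, children: [], parent: null }
      : child;
    node.parent = parent;
    parent.children.push(node);
  }
  pop(node) {
    this.closed.push(node.name);
  }
  removeFromParent(node) {
    if (node.parent === null) return;
    const siblings = node.parent.children;
    siblings.splice(siblings.indexOf(node), 1);
    node.parent = null;
  }
  reparentChildren(from, to) {
    for (const child of from.children.splice(0)) {
      child.parent = to;
      to.children.push(child);
    }
  }
}

// Replay the calls a parser might make for <p>Hi<b>!</b></p>.
const sink = new TreeSink();
const p = sink.createElement("p");
sink.append(sink.document, p);
sink.append(p, "Hi");
const b = sink.createElement("b");
sink.append(p, b);
sink.append(b, "!");
sink.pop(b);
sink.pop(p);

const outline = (n) => n.name === "#text"
  ? JSON.stringify(n.text)
  : `${n.name}(${n.children.map(outline).join(" ")})`;
console.log(outline(sink.document)); // #document(p("Hi" b("!")))
```

Because the parser never hands back a finished tree, the sink is free to allocate nodes wherever it likes, which is what lets the engine place them in the page's arena.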
Streaming Parser
For large documents, a streaming API allows feeding HTML in chunks:
// Streaming parse API (simplified)
const parser = html5ever_streaming_parser_create(doc, ctx, /* callbacks... */);
html5ever_streaming_parser_feed(parser, chunk.ptr, chunk.len);
// ... feed more chunks ...
html5ever_streaming_parser_finish(parser);
This is important for network-streamed pages where the full HTML is not available upfront.
JavaScript Integration
Lightpanda embeds V8 as its JavaScript engine. The integration is structured in layers:
Env (V8 Isolate)
Env wraps a V8 isolate – an isolated sandbox for executing JavaScript. It manages:
- Isolate lifecycle – creation, configuration, and teardown of the V8 isolate.
- Context pool – up to 64 execution contexts (one per frame/origin combination).
- Origin sharing – V8 contexts from the same origin share globals, matching standard browser behavior.
- Task scheduling – microtasks, macrotasks, and idle tasks are routed through V8’s platform API.
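Origin sharing depends on deciding when two URLs belong to the same origin. The sketch below shows one way to key shared state by origin using the standard `URL` class. `OriginMap` and `sharedFor` are hypothetical names, and the plain object stands in for whatever per-origin state the engine actually shares.

```javascript
// Hypothetical origin map. The shared object is a stand-in for per-origin state.
class OriginMap {
  constructor() {
    this.byOrigin = new Map();
  }
  sharedFor(url) {
    const origin = new URL(url).origin; // scheme + host + port
    let shared = this.byOrigin.get(origin);
    if (shared === undefined) {
      shared = { origin, contexts: 0 };
      this.byOrigin.set(origin, shared);
    }
    shared.contexts += 1;
    return shared;
  }
}

const originMap = new OriginMap();
const first = originMap.sharedFor("https://example.com/a");
const second = originMap.sharedFor("https://example.com:443/b"); // default port, same origin
const third = originMap.sharedFor("http://example.com/");        // different scheme
console.log(first === second, first === third, first.contexts);  // true false 2
```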
Bridge (Zig-to-V8 Bindings)
The bridge.zig module provides automatic binding generation between Zig structs and V8 JavaScript objects. This means Web API implementations in Zig are automatically exposed to JavaScript without manual binding code.
// The bridge maps Zig types to JavaScript automatically
// A Zig struct like Window becomes accessible as window.* in JS
pub fn Bridge(comptime T: type) type {
    return bridge.Builder(T);
}
Key JS Types
The js/ directory provides Zig wrappers for V8 types:
| Zig Type | V8 Equivalent | Purpose |
|---|---|---|
| Value | v8::Value | Base type for all JS values |
| Object | v8::Object | JavaScript object |
| Function | v8::Function | Callable function |
| Promise | v8::Promise | Async result |
| Module | v8::Module | ES module |
| Context | v8::Context | Execution context with its own global object |
| TryCatch | v8::TryCatch | Exception handling scope |
Snapshot Support
V8 snapshots allow serializing the initialized state of the JavaScript heap. Lightpanda uses this to speed up startup by loading pre-built snapshots of the Web API bindings rather than re-initializing them for every page.
Script Manager
The ScriptManager handles script loading and execution ordering, which is one of the most complex parts of a browser engine. It manages four categories of scripts:
Script Categories
| Category | HTML Example | Behavior |
|---|---|---|
| Inline | <script>code</script> | Executes immediately, blocks parsing |
| Async | <script async src="..."> | Downloads in parallel, executes when ready (any order) |
| Deferred | <script defer src="..."> | Downloads in parallel, executes in order after parsing |
| Module | <script type="module"> | Treated as deferred by default, supports import |
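The ordering rules in the table can be checked with a small simulation. This is a sketch under simplifying assumptions, not the ScriptManager's algorithm: `executionOrder` is a hypothetical function, times are abstract ticks, `seenAt` is when the parser reaches an inline script, `readyAt` is when a download completes, and execution itself is treated as instantaneous.

```javascript
// Simulated scheduling. All timing values are made-up inputs.
function executionOrder(scripts, parseEnd) {
  const events = [];
  let lastDeferred = parseEnd;
  scripts.forEach((s, index) => {
    let at;
    if (s.kind === "inline") {
      at = s.seenAt;                          // runs where the parser meets it
    } else if (s.kind === "async") {
      at = s.readyAt;                         // runs whenever its download ends
    } else {                                  // "defer" or "module"
      at = Math.max(lastDeferred, s.readyAt); // document order, after parsing
      lastDeferred = at;
    }
    events.push({ name: s.name, at, index });
  });
  // Sort by time; ties fall back to document order.
  events.sort((a, b) => a.at - b.at || a.index - b.index);
  return events.map((e) => e.name);
}

const order = executionOrder([
  { name: "inline-1", kind: "inline", seenAt: 1 },
  { name: "defer-slow", kind: "defer", readyAt: 9 },
  { name: "async-late", kind: "async", readyAt: 7 },
  { name: "async-early", kind: "async", readyAt: 2 },
  { name: "defer-fast", kind: "defer", readyAt: 3 },
  { name: "module-1", kind: "module", readyAt: 4 },
], 5);
console.log(order.join(", "));
// inline-1, async-early, async-late, defer-slow, defer-fast, module-1
```

The two async scripts run in completion order rather than document order, while `defer-fast` and `module-1` wait behind `defer-slow` even though they finished downloading first.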
Import Maps
The ScriptManager supports import maps, which allow remapping module specifiers:
<script type="importmap">
{
  "imports": {
    "lodash": "https://cdn.example.com/lodash/4.17.21/lodash.min.js"
  }
}
</script>
<script type="module">
import _ from "lodash"; // resolved via import map
</script>
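Specifier remapping can be sketched as a small lookup function. This is a deliberately reduced illustration, not the ScriptManager's resolver: `resolveSpecifier` is a hypothetical name, only exact matches in `imports` are handled, and the `scopes` and trailing-slash prefix features of import maps are left out.

```javascript
// Minimal resolver: exact-match "imports" entries only (assumption).
function resolveSpecifier(specifier, importMap, baseUrl) {
  const mapped = importMap.imports?.[specifier];
  if (typeof mapped === "string") {
    return new URL(mapped, baseUrl).href;
  }
  // Path-like specifiers resolve against the importing document or module.
  const isPathLike = ["./", "../", "/"].some((prefix) => specifier.startsWith(prefix));
  if (isPathLike) {
    return new URL(specifier, baseUrl).href;
  }
  try {
    return new URL(specifier).href; // already an absolute URL
  } catch {
    throw new TypeError(`Unmapped bare specifier: ${specifier}`);
  }
}

const importMap = {
  imports: { lodash: "https://cdn.example.com/lodash/4.17.21/lodash.min.js" },
};
const base = "https://example.com/app/index.html";
console.log(resolveSpecifier("lodash", importMap, base));
// https://cdn.example.com/lodash/4.17.21/lodash.min.js
console.log(resolveSpecifier("./util.js", importMap, base));
// https://example.com/app/util.js
```

A bare specifier with no entry in the map, such as `"react"` here, throws instead of being guessed at.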
Module Resolution
For ES modules, the ScriptManager maintains:
- imported_modules – a cache of downloaded modules keyed by URL. Modules may complete out of order but are processed in dependency order by V8.
- importmap – a mapping from module specifiers to resolved URLs.
- Buffer pool – reusable buffers for script content to minimize allocations during page loads.
Completion Tracking
The ScriptManager tracks when all scripts have finished loading and notifies the page, which is necessary before the load event can fire. The page_notified_of_completion flag ensures this notification happens exactly once.
Web API Implementation
Lightpanda implements a substantial subset of Web APIs as Zig structs. Each struct is automatically bound to JavaScript through the bridge system. The major APIs include:
DOM APIs
- Document, Element, Node, ShadowRoot
- DOMParser, XMLSerializer
- Range, Selection, TreeWalker, NodeIterator
Events
- Event, EventTarget, CustomEvent
- MutationObserver, IntersectionObserver, ResizeObserver
- KeyboardEvent, PageTransitionEvent
Navigation and History
- Window, Location, History, Navigator
- URL, URLSearchParams
Storage
- localStorage, sessionStorage, Cookie
Other APIs
- Console, Crypto, SubtleCrypto
- AbortController, AbortSignal
- Blob, File, FileReader, FileList
- Performance, PerformanceObserver
- MessageChannel, MessagePort
Memory Management
Lightpanda uses arena-based allocation throughout the engine for predictable, efficient memory management:
Arena Hierarchy
- App arena – lives for the entire application lifetime.
- Session arena – lives for the duration of a browsing session.
- Page arena – lives for a single page load. When the page navigates or is closed, the entire arena is freed at once.
- Call arena – lives for a single Zig function invocation from JavaScript. This is the preferred arena when possible, as it has the shortest lifetime.
Arena Pool
The ArenaPool allows reusing arena allocators across pages to avoid repeated OS-level memory allocation. In debug mode, a leak tracker verifies that all borrowed arenas are returned to the pool.
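The reuse-and-track pattern can be modeled as follows. JavaScript has no manual memory management, so this is only an analogy: an "arena" is an object holding an array of allocations, and `ArenaPool`, `acquire`, `release`, and `assertNoLeaks` are hypothetical names rather than the engine's API.

```javascript
// Hypothetical pool. An "arena" here is just a reusable container.
class ArenaPool {
  constructor({ debug = false } = {}) {
    this.free = [];
    this.debug = debug;
    this.borrowed = new Set(); // used only when debug is on
    this.created = 0;
  }
  acquire() {
    let arena = this.free.pop();
    if (arena === undefined) {
      arena = { items: [] };
      this.created += 1;
    }
    if (this.debug) this.borrowed.add(arena);
    return arena;
  }
  release(arena) {
    arena.items.length = 0; // drop everything in one step, keep the arena
    if (this.debug) this.borrowed.delete(arena);
    this.free.push(arena);
  }
  assertNoLeaks() {
    if (this.debug && this.borrowed.size > 0) {
      throw new Error(`${this.borrowed.size} arena(s) not returned`);
    }
  }
}

const pool = new ArenaPool({ debug: true });
const firstArena = pool.acquire();
firstArena.items.push("node", "attr");
pool.release(firstArena);
const secondArena = pool.acquire(); // reuses the arena released above
console.log(secondArena === firstArena, secondArena.items.length, pool.created); // true 0 1
```

Calling `pool.assertNoLeaks()` at this point would throw, because `secondArena` has not been returned yet. That is the kind of check the debug-mode leak tracker performs.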
Lazy Allocation
Many per-element data structures (style objects, class lists, datasets, shadow roots) are allocated lazily – only when JavaScript code actually accesses them. This saves significant memory on pages with thousands of elements where only a few are interacted with programmatically.
Related Topics
- Architecture Overview – high-level view of all Lightpanda components
- Network Layer – HTTP client, WebSocket, and proxy support
- CDP Protocol – Chrome DevTools Protocol implementation that drives the engine
- Building from Source – compile and develop the engine locally
- Testing – run unit tests and Web Platform Tests against the engine