Skip to content

feat: initial pass at a single base binary#1786

Draft
danyalprout wants to merge 2 commits intomainfrom
danyal/combined-binary
Draft

feat: initial pass at a single base binary#1786
danyalprout wants to merge 2 commits intomainfrom
danyal/combined-binary

Conversation

@danyalprout
Copy link
Copy Markdown
Collaborator

@danyalprout danyalprout commented Mar 30, 2026

Description

Setup a prototype single base binary that starts both an EL and CL client in process. Initially this binary just connects them over IPC.

This is a very rough first pass to get something going. There are two major things to iterate on:

  • UX of the CLI
  • Removing the engine API and having the CL/EL communicate completely in process.
~/c/e/base > just devnet status
./etc/scripts/devnet/status.sh

Component          | Unsafe     | Safe
-------------------+------------+-----------
L1                 | 25         | -
L2 Builder         | 18         | 10
Client: current    | 18         | 10
Client: combined   | 18         | 10
L2 Ingress         | not running | -

@cb-heimdall
Copy link
Copy Markdown
Collaborator

cb-heimdall commented Mar 30, 2026

🟡 Heimdall Review Status

Requirement Status More Info
Reviews 🟡 0/1
Denominator calculation
Show calculation
1 if user is bot 0
1 if user is external 0
2 if repo is sensitive 0
From .codeflow.yml 1
Additional review requirements
Show calculation
Max 0
0
From CODEOWNERS 0
Global minimum 0
Max 1
1
1 if commit is unverified 0
Sum 1

@vercel
Copy link
Copy Markdown

vercel bot commented Mar 30, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
base Ignored Ignored Preview Mar 31, 2026 4:38am

Request Review

let engine_config = EngineConfig {
config: Arc::new(rollup_config.clone()),
l2_url: auth_http_url,
l2_jwt_secret: auth_jwt.secret().clone(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JwtSecret is Copy (it's a [u8; 32] wrapper), so .clone() is misleading — but more importantly, this line moves auth_jwt.secret() by value (via .clone()) even though JwtSecret is Copy. The real issue is that auth_jwt is consumed here conceptually, but the InMemoryAuthJwt struct (and its _file: NamedTempFile) must remain alive for the entire lifetime of the process to keep the temp file on disk. If auth_jwt were accidentally dropped (e.g. by a future refactor), the OS could reclaim the temp file and the auth handshake would fail silently. Consider adding a comment documenting this invariant, or hold the InMemoryAuthJwt explicitly until shutdown.

@danyalprout danyalprout force-pushed the danyal/combined-binary branch from 5206771 to 8cf0548 Compare March 30, 2026 02:37

fn create_in_memory_auth_jwt() -> eyre::Result<InMemoryAuthJwt> {
let secret = JwtSecret::random();
let mut file = NamedTempFile::new().wrap_err("failed to create auth JWT temp file")?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tempfile is a production dependency used to write the JWT secret to a temporary file for the auth IPC handshake. If the OS reclaims the temp file (e.g. via /tmp reaping on long-running nodes), the auth path becomes stale and reconnection will fail silently.

Consider writing the JWT file into the node's datadir instead (which is already under the operator's control) rather than relying on the OS temp directory. This also avoids the file permission concern flagged separately.

@github-actions github-actions bot added the M-prevent-stale Meta: Not Stale label Mar 31, 2026
@danyalprout danyalprout force-pushed the danyal/combined-binary branch from 8cf0548 to c8b8808 Compare March 31, 2026 04:38
Comment on lines +145 to +147
let rollup_config = resolved.rollup_config;
let engine_config = EngineConfig {
config: Arc::new(rollup_config.clone()),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rollup_config is cloned at line 147 to wrap in Arc for EngineConfig, then the original is moved into RollupNodeBuilder::new() at line 155, which wraps it in another Arc inside build(). Consider wrapping in Arc once and sharing:

let rollup_config = Arc::new(resolved.rollup_config);
let engine_config = EngineConfig {
    config: Arc::clone(&rollup_config),
    // ...
};

This requires RollupNodeBuilder::new to accept Arc<RollupConfig> instead of RollupConfig (or add a builder method for it), but avoids cloning the full config struct.

Comment on lines +240 to +251
match request_execution_shutdown(node_exit_future.as_mut(), execution_shutdown)
.await
{
Ok(el_exit) => {
if let Err(ref e) = el_exit {
warn!(target: "base", error = %e, "Execution layer exited with error during shutdown");
}
}
Err(e) => {
warn!(target: "base", error = %e, "Execution layer shutdown mechanism failed");
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the ExitReason::Consensus branch, the EL shutdown result is logged but never propagated. If the EL exits with an error during shutdown (line 244) or the shutdown mechanism itself fails (line 248), only a warn! is emitted. The function then proceeds to check consensus_exit_is_unexpected and potentially returns Ok(()) even though the EL failed.

This means operators won't get a non-zero exit code when the EL has a critical failure during CL-triggered shutdown. Consider at least propagating the EL error when the consensus exited cleanly:

Ok(el_exit) => {
    el_exit?; // propagate EL errors
}

Or store it and return whichever error is most actionable.

let filter = tracing_subscriber::EnvFilter::builder()
.with_default_directive(log_config.global_level.into())
.from_env_lossy()
.add_directive("discv5=error".parse().expect("valid directive"));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"discv5=error" is added as a directive here AND also hardcoded in LogConfig::init_tracing_subscriber() at crates/utilities/cli/src/tracing.rs:84. Since init_logging calls init_tracing_subscriber_with_layers (which skips the default init_tracing_subscriber), the duplicate here is correct — but it's fragile. If someone later switches to init_tracing_subscriber(), the directive gets doubled. Consider extracting the common noise-suppression directives into a shared helper to avoid drift.

Comment on lines +174 to +178
async move {
rollup_node
.start_with_shutdown(consensus_shutdown)
.await
.map_err(|e| eyre::eyre!("failed to start rollup node service: {e}"))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start_with_shutdown returns Result<(), String>, and the {e} in the format string just re-embeds the string error message into a new eyre::Report. This works, but the resulting error report will say "failed to start rollup node service: <actual error>" as a single flat string with no causal chain.

This is a consequence of the upstream Result<(), String> API — not something this PR can easily fix. But it's worth noting that when the consensus service migrates to a proper error type, this can be improved to use .wrap_err() to preserve the chain.

@github-actions
Copy link
Copy Markdown
Contributor

Review Summary

Solid prototype for the unified EL+CL binary. The CLI design is clean, the shutdown orchestration is well-structured, and the IPC-based EL/CL link avoids network overhead.

Findings

1. rollup_config cloned to create two separate Arcs (bin/base/src/node.rs:145-155) - The config is cloned at line 147 for EngineConfig Arc, then the original is moved into RollupNodeBuilder which wraps it in another Arc. Wrap in Arc once and share.

2. EL shutdown errors swallowed in Consensus exit branch (bin/base/src/node.rs:240-251) - When the consensus layer exits first, the EL shutdown result is only logged but never propagated. Operators will not get a non-zero exit code if EL fails during CL-triggered shutdown.

3. Upstream String error API (bin/base/src/node.rs:174-178) - start_with_shutdown returns a flat String error, so map_err wrapping produces a single-layer error with no causal chain. Worth improving when the consensus service migrates to a proper error type.

4. Duplicated discv5=error noise suppression (bin/base/src/cli.rs:278) - This directive is added manually here and also hardcoded in LogConfig::init_tracing_subscriber(). The duplication is correct but fragile. Consider extracting into a shared helper.

Notes (no action needed)

  • JWT temp file permissions (0o600) and lifetime management look correct.
  • The ExitReason-based shutdown handles all three cases (signal, EL exit, CL exit) properly.
  • IPC auth model is correct: reth auth IPC relies on filesystem permissions rather than JWT tokens.
  • Test coverage is good across CLI parsing, config resolution, shutdown, JWT lifecycle, and ports.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

M-prevent-stale Meta: Not Stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants