feat: add MCP server example for sandboxed JavaScript execution #35
simongdavies wants to merge 2 commits into main
Add an MCP (Model Context Protocol) server that exposes an execute_javascript tool, allowing AI agents to run arbitrary JavaScript inside an isolated Hyperlight micro-VM sandbox with strict CPU time limits and automatic snapshot/restore recovery after timeouts. Includes server implementation, demo scripts (PowerShell and Bash), vitest test suite, and documentation.
Signed-off-by: Simon Davies <simongdavies@users.noreply.github.com>
Force-pushed from 8e32b4b to 62d98d0.
Pull request overview
Adds a new example MCP (Model Context Protocol) server under src/js-host-api/examples/mcp-server that lets MCP clients execute JavaScript inside a Hyperlight sandbox with configurable resource limits, plus demo scripts, documentation, and a Vitest-based integration test suite.
Changes:
- Introduces an MCP stdio server (`execute_javascript`) that compiles/runs JS inside a reusable Hyperlight sandbox with CPU + wall-clock timeouts, snapshot/restore recovery, and optional timing/code logs.
- Adds Vitest config + multiple integration-style test suites covering tool behavior, timeouts/recovery, env-var configurability, and timing log output.
- Adds end-to-end demo scripts (bash + PowerShell) and a README describing setup and client configuration.
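For readers unfamiliar with the stdio transport mentioned above, the sketch below frames one MCP `tools/call` request as a single NDJSON line. The `tools/call` method and the `name`/`arguments` params follow the MCP specification; the argument key `code` and the sample snippet are assumptions, since the tool's input schema is not shown on this page.

```javascript
// Sketch: frame a JSON-RPC 2.0 request for an MCP stdio server as one NDJSON line.
// ASSUMPTION: the tool accepts its source text under an argument named `code`.
function frameToolCall(id, code) {
  const request = {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: {
      name: 'execute_javascript',
      arguments: { code }, // hypothetical argument name
    },
  };
  // JSON.stringify escapes embedded newlines, so the message stays on one line.
  return JSON.stringify(request) + '\n';
}

// A multi-line snippet still produces exactly one line on the wire.
const line = frameToolCall(1, 'return 1 +\n 2;');
```

Because the payload is serialized rather than concatenated, multi-line code does not break the one-message-per-line framing.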
Reviewed changes
Copilot reviewed 11 out of 13 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/js-host-api/examples/mcp-server/server.js | MCP server implementation; sandbox lifecycle, limits, logging, and tool registration. |
| src/js-host-api/examples/mcp-server/package.json | Example package definition with MCP SDK, Zod, and Vitest. |
| src/js-host-api/examples/mcp-server/vitest.config.js | Vitest configuration for the example’s tests and timeouts. |
| src/js-host-api/examples/mcp-server/tests/mcp-server.test.js | End-to-end MCP protocol/tool integration tests via stdio NDJSON. |
| src/js-host-api/examples/mcp-server/tests/config.test.js | Tests for env-configurable limits, defaults, and stderr warnings. |
| src/js-host-api/examples/mcp-server/tests/timing.test.js | Tests for HYPERLIGHT_TIMING_LOG JSONL output and timing fields. |
| src/js-host-api/examples/mcp-server/tests/prompt-examples.test.js | Large suite validating outputs for “README prompt” examples. |
| src/js-host-api/examples/mcp-server/demo-copilot-cli.sh | Bash demo script to run prompts via Copilot CLI with MCP config. |
| src/js-host-api/examples/mcp-server/demo-copilot-cli.ps1 | PowerShell demo script to run prompts via Copilot CLI with MCP config. |
| src/js-host-api/examples/mcp-server/README.md | End-user documentation for the example server and demos. |
| src/js-host-api/eslint.config.mjs | Adds performance as an allowed global (used by the new server). |
Copilot reviewed 11 out of 13 changed files in this pull request and generated 6 comments.
Force-pushed from 1b8889b to 8980f35.
Copilot reviewed 11 out of 13 changed files in this pull request and generated 5 comments.
Copilot reviewed 11 out of 13 changed files in this pull request and generated 6 comments.
Force-pushed from 8980f35 to 5d3985b.
Copilot reviewed 11 out of 13 changed files in this pull request and generated 3 comments.
Force-pushed from 5d3985b to 90267e4.
Copilot reviewed 11 out of 13 changed files in this pull request and generated 4 comments.
Force-pushed from 90267e4 to e0a8f26.
Copilot reviewed 11 out of 13 changed files in this pull request and generated 4 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Simon Davies <simongdavies@users.noreply.github.com>
Force-pushed from e0a8f26 to dcc0a5c.
Copilot reviewed 11 out of 13 changed files in this pull request and generated 8 comments.
    it('should have all timing values as non-negative integers', async () => {
        const records = readTimingRecords();
        expect(records.length).toBeGreaterThanOrEqual(1);

        const record = records[records.length - 1];
This test (and several below) assumes a previous test already executed the tool and wrote at least one timing record. That makes the suite order-dependent and can fail when running a single test or if the runner ever shuffles tests. Consider ensuring each test arranges its own tool invocation (or add a beforeEach that performs one call and/or clears the timing log).
    let inside = 0;
    const N = 100000;
    for (let i = 0; i < N; i++) {
        const x = Math.random();
        const y = Math.random();
This prompt-implementation uses Math.random() inside the sandbox, which makes the test nondeterministic and potentially flaky (even with wide bounds) and also increases runtime (100k samples). For CI stability, consider swapping in a small deterministic PRNG with a fixed seed (or otherwise removing randomness) so outputs and performance are reproducible.
Suggested change:
    - let inside = 0;
    - const N = 100000;
    - for (let i = 0; i < N; i++) {
    -     const x = Math.random();
    -     const y = Math.random();
    + // Deterministic PRNG (xorshift32) for reproducible tests
    + let seed = 123456789;
    + function rand() {
    +     seed ^= seed << 13;
    +     seed ^= seed >>> 17;
    +     seed ^= seed << 5;
    +     return (seed >>> 0) / 0x100000000;
    + }
    + let inside = 0;
    + const N = 100000;
    + for (let i = 0; i < N; i++) {
    +     const x = rand();
    +     const y = rand();
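To make the reproducibility argument concrete, here is a minimal sketch that wraps the suggested xorshift32 generator in a factory and runs the same Monte Carlo estimate twice. The function names `makeRand` and `estimatePi` are hypothetical and are not taken from the test file.

```javascript
// Sketch: seeded xorshift32 generator so repeated runs draw identical samples.
function makeRand(seed) {
  let state = seed | 0 || 1; // xorshift32 must not start from 0
  return function rand() {
    state ^= state << 13;
    state ^= state >>> 17;
    state ^= state << 5;
    return (state >>> 0) / 0x100000000; // map the 32-bit state to [0, 1)
  };
}

// Monte Carlo estimate of pi: fraction of points inside the unit quarter circle.
function estimatePi(samples, seed) {
  const rand = makeRand(seed);
  let inside = 0;
  for (let i = 0; i < samples; i++) {
    const x = rand();
    const y = rand();
    if (x * x + y * y <= 1) inside++;
  }
  return (4 * inside) / samples;
}

const first = estimatePi(100000, 123456789);
const second = estimatePi(100000, 123456789);
console.log(first === second); // prints true: same seed, same samples
```

With a fixed seed a test could assert an exact value once it has been observed, instead of relying on wide tolerance bounds.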
    // 100 particles with random velocities over 1000 steps will
    // always produce bounces — the probability of zero bounces
    // is vanishingly small (each particle has ~50% chance of
    // bouncing per axis per step).
    expect(result.totalBounces).toBeGreaterThan(0);
expect(result.totalBounces).toBeGreaterThan(0) is probabilistic because initial positions/velocities are random; it can fail (rarely) if no particle crosses a boundary. To avoid flaky CI, make the simulation deterministic (seeded PRNG) or assert only deterministic invariants (e.g., allInBounds, array sizes, numeric types/ranges).
Suggested change:
    - // 100 particles with random velocities over 1000 steps will
    - // always produce bounces — the probability of zero bounces
    - // is vanishingly small (each particle has ~50% chance of
    - // bouncing per axis per step).
    - expect(result.totalBounces).toBeGreaterThan(0);
    + // totalBounces should be a non-negative integer; the exact
    + // value depends on random initial conditions and velocities.
    + expect(result.totalBounces).toBeGreaterThanOrEqual(0);
    + expect(Number.isInteger(result.totalBounces)).toBe(true);
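An alternative to weakening the assertion is to make the bounce a guaranteed outcome rather than a likely one. The sketch below is an assumption-laden stand-in for the prompt's simulation (box size, speed range, and all names are hypothetical): every particle moves at least 1 unit per step on each axis inside a box of width 100, so it must reach a wall within about 100 steps, and a seeded generator keeps the run reproducible.

```javascript
// Seeded xorshift32 generator (same construction as in the review suggestion).
function makeRand(seed) {
  let state = seed | 0 || 1;
  return () => {
    state ^= state << 13;
    state ^= state >>> 17;
    state ^= state << 5;
    return (state >>> 0) / 0x100000000;
  };
}

// Hypothetical stand-in for the particle simulation in the prompt example.
function simulate({ particles = 100, steps = 1000, size = 100, seed = 42 } = {}) {
  const rand = makeRand(seed);
  const sign = () => (rand() < 0.5 ? -1 : 1);
  // Speeds lie in [1, 5): never zero, and always smaller than the box.
  const ps = Array.from({ length: particles }, () => ({
    x: rand() * size,
    y: rand() * size,
    vx: (1 + rand() * 4) * sign(),
    vy: (1 + rand() * 4) * sign(),
  }));
  let totalBounces = 0;
  const advance = (p, pos, vel) => {
    p[pos] += p[vel];
    if (p[pos] < 0 || p[pos] > size) {
      // Reflect the overshoot back into the box and reverse direction.
      p[pos] = p[pos] < 0 ? -p[pos] : 2 * size - p[pos];
      p[vel] = -p[vel];
      totalBounces++;
    }
  };
  for (let s = 0; s < steps; s++) {
    for (const p of ps) {
      advance(p, 'x', 'vx');
      advance(p, 'y', 'vy');
    }
  }
  const allInBounds = ps.every(
    (p) => p.x >= 0 && p.x <= size && p.y >= 0 && p.y <= size
  );
  return { totalBounces, allInBounds };
}

const runA = simulate();
const runB = simulate();
```

Under these parameters `totalBounces > 0` holds by construction, so the stricter original assertion could be kept without flakiness.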
    # --available-tools  Restrict model to ONLY our MCP tool plus
    #                    internal tools the agent needs to function
    #                    (task_complete, report_intent). The model
    #                    cannot call shell, file write, web fetch,
    #                    or any other tool. This is the security
    #                    layer — even though --allow-all-tools is
    #                    set, only whitelisted tools are visible.
The comments describe using --available-tools to restrict the model’s visible toolset, but the actual copilot invocation doesn’t include --available-tools. Either add the flag (if still supported) or update the comments, since this currently overstates the security restrictions being applied.
Suggested change:
    - # --available-tools  Restrict model to ONLY our MCP tool plus
    - #                    internal tools the agent needs to function
    - #                    (task_complete, report_intent). The model
    - #                    cannot call shell, file write, web fetch,
    - #                    or any other tool. This is the security
    - #                    layer — even though --allow-all-tools is
    - #                    set, only whitelisted tools are visible.
    + # --available-tools  (Optional) Can be used to restrict the model
    + #                    to ONLY specific tools plus internal tools
    + #                    the agent needs to function (for example,
    + #                    task_complete, report_intent). When set,
    + #                    the model cannot call shell, file write,
    + #                    web fetch, or any other non-whitelisted
    + #                    tool. NOTE: this demo script does NOT
    + #                    currently pass --available-tools; do not
    + #                    assume such a restriction is in effect.
        type: 'stdio',
        command: 'node',
        args: ['${SERVER_JS}'],
        env,
    };
This embedded node -e snippet interpolates ${SERVER_JS} (and other values) directly into a JavaScript string. Paths containing spaces, quotes, or backslashes can break the generated JS/JSON. Prefer passing values via environment variables/stdin, or JSON-escaping them (e.g., via JSON.stringify) before embedding.
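A minimal sketch of the JSON-escaping approach the comment recommends: build the config as a plain object and serialize it once, so awkward characters in the path are escaped by `JSON.stringify` instead of being pasted into JavaScript source. The `mcpServers` wrapper and the `hyperlight-sandbox` key are assumptions for illustration; only `type`, `command`, `args`, and `env` appear in the snippet above. In the demo script the path could be read from `process.env` rather than interpolated by the shell.

```javascript
// Sketch: serialize the MCP config from data rather than string interpolation.
// ASSUMPTION: the `mcpServers` wrapper and server key are hypothetical names.
function buildMcpConfig(serverJsPath, env = {}) {
  const config = {
    mcpServers: {
      'hyperlight-sandbox': {
        type: 'stdio',
        command: 'node',
        args: [serverJsPath],
        env,
      },
    },
  };
  return JSON.stringify(config, null, 2);
}

// A path with backslashes, spaces, and both quote styles survives the round trip.
const trickyPath = 'C:\\Users\\O\'Brien\\my "demo" dir\\server.js';
const json = buildMcpConfig(trickyPath, {
  HYPERLIGHT_TIMING_LOG: '/tmp/timing log.jsonl',
});
const roundTripped = JSON.parse(json).mcpServers['hyperlight-sandbox'].args[0];
```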
    try {
        # Use --% (stop-parsing token) to prevent PS from mangling
        # the native command arguments. Pass prompt via temp file.
        $rawOutput = & $script:CopilotBin `
            -p $fullPrompt `
            -s `
            --additional-mcp-config "@$mcpTmp" `
The comment says to use PowerShell’s --% stop-parsing token to prevent argument mangling, but the command invocation below doesn’t use --%. Either adjust the comment or actually use the recommended mechanism (or a temp file via @file) so the rationale matches the implementation and avoids confusion when debugging.
    // Track objects during traversal to detect true circular
    // references. We use a replacer that adds objects on entry
    // and removes them on exit (post-order), so DAG-shared refs
    // (e.g. { a: obj, b: obj }) are correctly duplicated rather
    // than replaced with "[Circular]".
    const ancestors = new Set();
    return JSON.stringify(
        value,
        function (key, val) {
            if (typeof val === 'bigint') {
                return val.toString();
            }
            if (typeof val === 'object' && val !== null) {
                if (ancestors.has(val)) {
                    return '[Circular]';
                }
                ancestors.add(val);
                // Schedule removal after this subtree is fully traversed.
                // JSON.stringify calls the replacer depth-first, so by the
                // time we return from this key the children are already
                // processed. We use a finally-scheduled microtask to
                // remove after the current synchronous stringify pass.
                // Actually — JSON.stringify is synchronous, so we can
                // lean on the fact that the replacer is called in-order
                // and use a post-processing cleanup. For simplicity,
                // just leave the Set as-is — true cycles will be caught,
                // and shared non-cyclic refs in practice don't occur in
                // sandbox return values (they're freshly JSON-parsed).
            }
            return val;
        },
        2
    );
safeStringifyResult marks any repeated object reference as "[Circular]" because ancestors is never cleared during traversal. This will incorrectly replace shared (non-cyclic) references and can also grow memory for large results. Consider using a stack-based cycle detector (add on entry + remove on exit) or a proven circular-safe serializer so only true cycles are replaced.
Suggested change:
    - // Track objects during traversal to detect true circular
    - // references. We use a replacer that adds objects on entry
    - // and removes them on exit (post-order), so DAG-shared refs
    - // (e.g. { a: obj, b: obj }) are correctly duplicated rather
    - // than replaced with "[Circular]".
    - const ancestors = new Set();
    - return JSON.stringify(
    -     value,
    -     function (key, val) {
    -         if (typeof val === 'bigint') {
    -             return val.toString();
    -         }
    -         if (typeof val === 'object' && val !== null) {
    -             if (ancestors.has(val)) {
    -                 return '[Circular]';
    -             }
    -             ancestors.add(val);
    -             // Schedule removal after this subtree is fully traversed.
    -             // JSON.stringify calls the replacer depth-first, so by the
    -             // time we return from this key the children are already
    -             // processed. We use a finally-scheduled microtask to
    -             // remove after the current synchronous stringify pass.
    -             // Actually — JSON.stringify is synchronous, so we can
    -             // lean on the fact that the replacer is called in-order
    -             // and use a post-processing cleanup. For simplicity,
    -             // just leave the Set as-is — true cycles will be caught,
    -             // and shared non-cyclic refs in practice don't occur in
    -             // sandbox return values (they're freshly JSON-parsed).
    -         }
    -         return val;
    -     },
    -     2
    - );
    + // Track the current traversal path to detect true circular
    + // references. Objects are added on entry and removed on exit,
    + // so only objects that appear again on the active stack are
    + // treated as "[Circular]". Shared, non-cyclic references are
    + // serialized normally.
    + const ancestors = [];
    + const serialize = (val) => {
    +     if (typeof val === 'bigint') {
    +         return val.toString();
    +     }
    +     if (val === null || typeof val !== 'object') {
    +         return val;
    +     }
    +     if (ancestors.includes(val)) {
    +         return '[Circular]';
    +     }
    +     ancestors.push(val);
    +     try {
    +         if (Array.isArray(val)) {
    +             return val.map((item) => serialize(item));
    +         }
    +         const resultObj = {};
    +         for (const [k, v] of Object.entries(val)) {
    +             resultObj[k] = serialize(v);
    +         }
    +         return resultObj;
    +     } finally {
    +         ancestors.pop();
    +     }
    + };
    + const processed = serialize(value);
    + return JSON.stringify(processed, null, 2);
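The sketch below is a self-contained variant of the stack-based traversal, useful for checking the two cases the comment distinguishes (shared references versus true cycles). One behaviour difference is worth noting: a manual walk over `Object.entries` bypasses `toJSON`, so a `Date` would come out as `{}`. The `toJSON` handling here is an addition for illustration, not part of the suggestion, and the name `safeStringify` is hypothetical.

```javascript
// Sketch: cycle-safe stringify that tracks only the active traversal path.
function safeStringify(value, indent = 2) {
  const path = []; // objects currently being traversed (ancestors of `val`)
  const walk = (val) => {
    if (typeof val === 'bigint') return val.toString();
    if (val === null || typeof val !== 'object') return val;
    // Addition: honour toJSON (e.g. Date) as JSON.stringify would.
    if (typeof val.toJSON === 'function') {
      const converted = val.toJSON();
      if (converted !== val) return walk(converted);
    }
    if (path.includes(val)) return '[Circular]';
    path.push(val);
    try {
      if (Array.isArray(val)) return val.map((item) => walk(item));
      const out = {};
      for (const [k, v] of Object.entries(val)) out[k] = walk(v);
      return out;
    } finally {
      path.pop(); // leaving this subtree: it is no longer an ancestor
    }
  };
  return JSON.stringify(walk(value), null, indent);
}

// Shared (non-cyclic) reference: both keys keep the full object.
const shared = { n: 1 };
const sharedOut = JSON.parse(safeStringify({ a: shared, b: shared }));

// True cycle: only the back-reference is replaced.
const node = { name: 'root' };
node.self = node;
const cyclicOut = JSON.parse(safeStringify(node));
```

The linear `includes` scan is proportional to nesting depth, not result size, which is usually small for JSON-like return values.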
    /** Guest heap size in bytes. Override with HYPERLIGHT_HEAP_SIZE_MB (megabytes). */
    const HEAP_SIZE_BYTES = Math.min(
        parsePositiveInt(process.env.HYPERLIGHT_HEAP_SIZE_MB, DEFAULT_HEAP_SIZE_MB) * 1024 * 1024,
        MAX_BYTES
    );

    /** Guest scratch size in bytes. Override with HYPERLIGHT_SCRATCH_SIZE_MB (megabytes).
     * Maps to setScratchSize() on the SandboxBuilder API. */
    const SCRATCH_SIZE_BYTES = Math.min(
        parsePositiveInt(process.env.HYPERLIGHT_SCRATCH_SIZE_MB, DEFAULT_SCRATCH_SIZE_MB) * 1024 * 1024,
        MAX_BYTES
    );
Clamping heap/scratch bytes with Math.min(..., 0xffffffff) can produce non-integer MiB values (e.g., 4096MB becomes 4095.999...MB) and may pass a byte size that isn’t aligned to MiB. Prefer clamping at the MB level (or rounding down to a MiB boundary) before converting to bytes so the configured/printed sizes stay consistent and predictable.
Suggested change:
    - /** Guest heap size in bytes. Override with HYPERLIGHT_HEAP_SIZE_MB (megabytes). */
    - const HEAP_SIZE_BYTES = Math.min(
    -     parsePositiveInt(process.env.HYPERLIGHT_HEAP_SIZE_MB, DEFAULT_HEAP_SIZE_MB) * 1024 * 1024,
    -     MAX_BYTES
    - );
    - /** Guest scratch size in bytes. Override with HYPERLIGHT_SCRATCH_SIZE_MB (megabytes).
    -  * Maps to setScratchSize() on the SandboxBuilder API. */
    - const SCRATCH_SIZE_BYTES = Math.min(
    -     parsePositiveInt(process.env.HYPERLIGHT_SCRATCH_SIZE_MB, DEFAULT_SCRATCH_SIZE_MB) * 1024 * 1024,
    -     MAX_BYTES
    - );
    + /** Maximum heap/scratch size in mebibytes that still fits within MAX_BYTES. */
    + const MAX_MIB = Math.floor(MAX_BYTES / (1024 * 1024));
    + /** Guest heap size in bytes. Override with HYPERLIGHT_HEAP_SIZE_MB (megabytes). */
    + const HEAP_SIZE_BYTES =
    +     Math.min(
    +         parsePositiveInt(process.env.HYPERLIGHT_HEAP_SIZE_MB, DEFAULT_HEAP_SIZE_MB),
    +         MAX_MIB
    +     ) *
    +     1024 *
    +     1024;
    + /** Guest scratch size in bytes. Override with HYPERLIGHT_SCRATCH_SIZE_MB (megabytes).
    +  * Maps to setScratchSize() on the SandboxBuilder API. */
    + const SCRATCH_SIZE_BYTES =
    +     Math.min(
    +         parsePositiveInt(process.env.HYPERLIGHT_SCRATCH_SIZE_MB, DEFAULT_SCRATCH_SIZE_MB),
    +         MAX_MIB
    +     ) *
    +     1024 *
    +     1024;
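The arithmetic behind this comment can be checked with a short sketch. `MAX_BYTES` is taken as `0xffffffff` from the review comment; `parsePositiveInt` is a hypothetical stand-in for the server's helper (its real implementation is not shown here), and the default of 16 MiB is likewise an assumption.

```javascript
const MIB = 1024 * 1024;
const MAX_BYTES = 0xffffffff; // limit quoted in the review comment

// Hypothetical stand-in for the server's parsePositiveInt helper.
function parsePositiveInt(raw, fallback) {
  const n = Number.parseInt(raw, 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Original approach: convert to bytes first, then clamp.
function clampAtBytes(rawMb, defaultMb) {
  return Math.min(parsePositiveInt(rawMb, defaultMb) * MIB, MAX_BYTES);
}

// Suggested approach: clamp in MiB, then convert, so the result stays aligned.
function clampAtMiB(rawMb, defaultMb) {
  const maxMiB = Math.floor(MAX_BYTES / MIB); // 4095
  return Math.min(parsePositiveInt(rawMb, defaultMb), maxMiB) * MIB;
}

const byteClamped = clampAtBytes('4096', 16); // 4294967295, not a MiB multiple
const mibClamped = clampAtMiB('4096', 16); // 4293918720, exactly 4095 MiB

console.log(byteClamped / MIB); // fractional MiB value
console.log(mibClamped / MIB); // 4095
```

Requesting 4096 MB overflows the limit by one byte, which is why the byte-level clamp lands one byte short of a MiB boundary.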