
feat: implements base optimization logic in the optimization sdk#116

Open
andrewklatzke wants to merge 6 commits into main from aklatzke/AIC-1990/optimize-method

Conversation


@andrewklatzke andrewklatzke commented Mar 31, 2026

Requirements

  • I have added test coverage for new or changed functionality
  • I have followed the repository's pull request submission guidelines
  • I have validated my changes against all supported platform versions

Related issues

This specifically only affects the currently empty and 0.0.0 version of the optimization package.

Pulls in the optimize method from the moonshot branch, updates it to be more production-ready, adds tests, and logically splits the code into more manageable chunks.

Describe the solution you've provided

This is the initial implementation of the optimization method that we're pulling into this SDK. Right now it covers the same surface area as the moonshot branch: it implements the optimize_from_options() method while leaving the optimize_from_config() method unimplemented. It also does not yet handle additional features we'll be adding, such as comparing against ground-truth responses or posting results back to LaunchDarkly.

The logs are set to debug level, so enabling them will allow you to trace along with the progress.
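
Assuming the SDK uses Python's standard logging module (the logger name below is a guess, not taken from the PR), enabling debug tracing might look like:

```python
import logging

# Hypothetical logger name; check the package for the actual logger it registers.
logging.basicConfig(level=logging.WARNING)
logging.getLogger("ldai.optimization").setLevel(logging.DEBUG)
```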

Using the manual/passing options implementation of this looks like:

    tool_handlers = {
        "user_preferences_lookup": get_preferences,
    }

    def resolve_tools(
        config: AIAgentConfig,
        provided_tool_handlers: dict[str, Callable[[Any], Any]], 
    ) -> Sequence[Tool]:
        tools: list[Tool] = []
        # ... user implementation detail ...
        return tools

    async def handle_agent_call(
        key: str,
        config: AIAgentConfig,
        context: OptimizationContext,
        provided_tool_handlers: dict[str, Callable[[Any], Any]], # tools we provide, such as the required structured output tool, or the tool to validate a newly created variation generation
    ) -> str:
        model = config.model.get_parameter("name") if config.model else None
        root = Agent(
            name=key,
            instructions=config.instructions,
            handoffs=[],
            tools=resolve_tools(config, provided_tool_handlers),
            model=model,
        )
        response = await Runner.run(root, context.user_input or "")
        return response.final_output

    async def handle_judge_call( # separate method, as this can also be run as a completion by accessing `config.messages` and doesn't need to be distinctly agent. A user may also want to capture or log intermediary data here.
        judge_key: str,
        config: AIAgentConfig,
        context: OptimizationJudgeContext,
        provided_tool_handlers: dict[str, Callable[[Any], Any]],
    ) -> str:
        model = config.model.get_parameter("name") if config.model else None
        root = Agent(
            name=judge_key,
            instructions=config.instructions,
            handoffs=[],
            tools=resolve_tools(config, provided_tool_handlers),
            model=model,
        )
        response = await Runner.run(root, context.user_input or "")
        return response.final_output
        
    # Everything below this is the actual optimization setup

    options = OptimizationOptions(
        judges={
            "acceptance": OptimizationJudge(
                acceptance_statement="The orchestrator should appropriately fetch the user preferences and route to the correct sub-agent, carrying through any relevant information from the users' query. The orchestrator should not provide any answers itself, just pass to the correct sub-agent. Inability to fetch user preferences or mentions of missing data should be automatic failures. If preferences are not included, that should be an automatic failure.",
                threshold=0.95,
            ),
        },
        context_choices=[
            context_builder("user-123"),
        ],
        max_attempts=5,
        model_choices=["gpt-5", "gpt-5.1", "gpt-5.4", "gpt-5.4-mini"],
        judge_model="gpt-5.4-mini",
        variable_choices=[
            {
                "user_id": "user-123",
                "trip_purpose": "business",
            },
            {
                "user_id": "user-125",
                "trip_purpose": "personal",
            },
        ],
        user_input_options=[
            "I'm going to austin next week, where should I stay?"
        ],
        handle_agent_call=handle_agent_call,
        handle_judge_call=handle_judge_call,
    )

    client = OptimizationClient(ld_ai_client) 
    result = await client.optimize_from_options("travel-agent-orchestrator", options) # distinct step so that optimization options can be re-used
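
The example above references a `context_builder` helper that isn't shown. A minimal sketch, assuming it returns a plain context object seeded with the user's id (the return shape is illustrative, not taken from the PR):

```python
def context_builder(user_id: str) -> dict:
    # Illustrative only: a plain dict standing in for an OptimizationContext
    # seed; the real type lives in the SDK.
    return {
        "user_id": user_id,
        "user_input": None,  # filled in from user_input_options by the client
    }
```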

Note

Medium Risk
Introduces a large amount of new orchestration logic around LLM prompting, structured JSON parsing, and iterative control flow; errors or edge cases could cause incorrect scoring loops or brittle parsing despite added tests.

Overview
Implements a real OptimizationClient to iteratively optimize an agent via optimize_from_options, executing agent turns, scoring responses with either LD-configured judges (judge_key) or inline acceptance-statement judges, and generating new configuration variations until thresholds pass or max_attempts is reached.
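
The loop described above can be sketched synchronously with all collaborators injected. This is illustrative control flow only, not the actual OptimizationClient implementation:

```python
from typing import Any, Callable

def optimization_loop_sketch(
    max_attempts: int,
    threshold: float,
    run_turn: Callable[[dict], str],    # executes one agent turn
    score: Callable[[str], float],      # judge scoring of the response
    vary: Callable[[dict, str], dict],  # generates a new config variation
) -> tuple[dict, float, int]:
    """Run, score, and vary a config until the threshold passes or attempts run out."""
    config: dict[str, Any] = {}
    last_score = 0.0
    for attempt in range(1, max_attempts + 1):
        response = run_turn(config)
        last_score = score(response)
        if last_score >= threshold:
            return config, last_score, attempt
        config = vary(config, response)
    return config, last_score, max_attempts
```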

Adds first-class types (OptimizationOptions, OptimizationContext, OptimizationJudge, AIJudgeCallConfig, ToolDefinition) plus utilities for structured tool-based outputs (evaluation + variation tools), JSON extraction from LLM responses, and runtime interpolation of {{variables}} while preserving raw instruction templates.
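
The {{variables}} interpolation mentioned above might look like the following sketch; the regex and the leave-unknown-placeholders-intact fallback are assumptions, not the PR's exact implementation:

```python
import re

def interpolate(template: str, variables: dict[str, str]) -> str:
    # Replace {{name}} placeholders; unknown names are left intact so the
    # raw template is preserved for later attempts with different variables.
    def replace(match: re.Match) -> str:
        return variables.get(match.group(1), match.group(0))
    return re.sub(r"\{\{\s*([\w.-]+)\s*\}\}", replace, template)
```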

Re-exports the new API from __init__.py, updates smoke tests accordingly, and adds extensive unit tests covering judge evaluation, variation prompting/application, and full optimization loop behavior.

Written by Cursor Bugbot for commit f8e5509.

@andrewklatzke andrewklatzke requested a review from a team as a code owner March 31, 2026 01:22
@andrewklatzke andrewklatzke changed the title Aklatzke/aic 1990/optimize method feat: implements base optimization logic in the optimization sdk Mar 31, 2026

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.


    start_idx = response_str.find('{', start_idx + 1)
    if start_idx == -1:
        break
    brace_count = 0


Balanced-brace scanner retry uses stale start index

Medium Severity

The balanced-brace scanning fallback in extract_json_from_response has broken retry logic. When a balanced {…} block fails json.loads, start_idx is updated to the next { after the original start (which is before i), and brace_count is reset to 0, but the for loop continues from i + 1 — already past the new start_idx. This means the scanner never re-processes characters from the updated start position. On the next time brace_count hits 0, response_str[start_idx:i + 1] spans from the stale inner start_idx to the new closing brace, producing a garbage substring that won't parse. The retry path effectively never works.
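
One way to fix the retry path is to restart the scan from each subsequent `{` in an outer loop rather than mutating the start index mid-iteration. A sketch of that approach, not the PR's actual extract_json_from_response:

```python
import json
from typing import Any, Optional

def extract_first_json_object(text: str) -> Optional[Any]:
    """Return the first balanced {...} block that parses as JSON, else None."""
    start = text.find('{')
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == '{':
                depth += 1
            elif text[i] == '}':
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # restart the scan from the next '{'
        start = text.find('{', start + 1)
    return None
```

Because the outer `while` re-runs the depth scan from the new start position, a failed candidate never leaves the scanner pointing at a stale substring.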


        },
        "required": ["passed", "rationale"],
    },
)


Unused create_boolean_tool function is dead code

Low Severity

create_boolean_tool is defined but never called, imported, or exported anywhere in the codebase. It appears to be leftover scaffolding that was superseded by create_evaluation_tool, which is the tool actually used for judge evaluations.


    if self.judges is None and self.on_turn is None:
        raise ValueError("Either judges or on_turn must be provided")
    if self.judge_model is None:
        raise ValueError("judge_model must be provided")


Missing validation for empty variable_choices list

Medium Severity

__post_init__ validates that context_choices and model_choices each have at least one element, but no equivalent check exists for variable_choices. An empty list passes validation but causes random.choice() to raise an IndexError at runtime in _run_optimization and _create_optimization_context.
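
A sketch of the missing check, using a simplified stand-in dataclass; the field names mirror the PR, the rest is illustrative:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class OptionsSketch:
    # Simplified stand-in for OptimizationOptions, not the real class.
    context_choices: list[Any] = field(default_factory=list)
    model_choices: list[str] = field(default_factory=list)
    variable_choices: list[dict[str, Any]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Validate all three choice lists uniformly so random.choice()
        # can never see an empty sequence at runtime.
        for name in ("context_choices", "model_choices", "variable_choices"):
            if not getattr(self, name):
                raise ValueError(f"{name} must contain at least one element")
```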

