HUD Documentation — Evaluations and RL Environments.

HUD environments work with any agent framework. The Environment class provides format converters for the providers and frameworks listed under Format Reference, and hud.eval() handles setup, evaluation, and tracing automatically. Every example on this page uses the task defined below, and most route inference through the HUD gateway (https://inference.hud.ai).

The Example Environment

import hud

CEOS = {"hud": "Jay Ram", "openai": "Sam Altman", "anthropic": "Dario Amodei"}

env = hud.Environment("trivia")

@env.tool()
def lookup_ceo(company: str) -> str:
    """Look up the CEO of a company."""
    return CEOS.get(company.lower(), "Unknown")

@env.scenario("initials")
async def find_initials(company: str):
    answer = yield f"What are the initials of the CEO of {company}?"
    ceo = CEOS.get(company.lower())
    correct = "".join(word[0] for word in ceo.split()) if ceo else None
    yield 1.0 if answer and correct and correct in answer.upper() else 0.0

task = env("initials", company="HUD")
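The scenario above is an async generator with two yields: the first produces the prompt and receives the agent's answer, and the second produces the reward. The sketch below drives that generator by hand with plain asyncio, which may help when reasoning about what a scenario sees. It is an illustration only: the `drive` helper is hypothetical, the scenario is copied without the `@env.scenario` decorator, and how HUD runs scenarios internally is not shown on this page.

```python
import asyncio

CEOS = {"hud": "Jay Ram", "openai": "Sam Altman", "anthropic": "Dario Amodei"}


async def find_initials(company: str):
    # First yield: hand out the prompt, then wait for the agent's answer.
    answer = yield f"What are the initials of the CEO of {company}?"
    ceo = CEOS.get(company.lower())
    correct = "".join(word[0] for word in ceo.split()) if ceo else None
    # Second yield: the reward (1.0 if the initials appear in the answer).
    yield 1.0 if answer and correct and correct in answer.upper() else 0.0


async def drive(company: str, answer: str):
    """Hypothetical helper: step through the two yields manually."""
    gen = find_initials(company)
    prompt = await gen.__anext__()     # run up to the first yield
    reward = await gen.asend(answer)   # send the answer, run to the second yield
    await gen.aclose()                 # clean up the suspended generator
    return prompt, reward


prompt, reward = asyncio.run(drive("HUD", "The initials are JR."))
_, wrong_reward = asyncio.run(drive("HUD", "I don't know."))
_, unknown_reward = asyncio.run(drive("Acme", "AB"))
print(prompt, reward, wrong_reward, unknown_reward)
```

Note that the check is a case-insensitive substring match, so any answer containing the letters "JR" scores 1.0.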

OpenAI

HUD works with three OpenAI interfaces: the Chat Completions API, the Responses API, and the Agents SDK (a separate package, openai-agents).

Chat Completions

import os
from openai import AsyncOpenAI
import hud

client = AsyncOpenAI(
    base_url="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"]
)

async with hud.eval(task) as ctx:
    messages = [{"role": "user", "content": ctx.prompt}]
    
    while True:
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=ctx.as_openai_chat_tools()
        )
        
        msg = response.choices[0].message
        messages.append(msg)
        
        if not msg.tool_calls:
            break
            
        for tool_call in msg.tool_calls:
            result = await ctx.call_tool(tool_call)
            messages.append(result)
    
    await ctx.submit(msg.content or "")

Chat Completions (Single-Call Runner)

If you want HUD to handle the chat tool loop for a scenario task, use hud.run_scenario_chat(...):

import os
from openai import AsyncOpenAI
import hud

env = hud.Environment("trivia")
task = env("initials", company="HUD")

client = AsyncOpenAI(
    base_url="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"]
)

result = await hud.run_scenario_chat(
    client=client,
    model="gpt-4o",
    task=task,
    api="chat_completions",  # or "responses" / "auto"
)

print(result.answer)
print(result.reward)
print(result.trace_id)

Interactive Scenario Chat (Turn-by-Turn)

Use hud.run_scenario_chat_interactive(...) when you want to send multiple user turns before final evaluation:

import os
from openai import AsyncOpenAI
import hud

env = hud.Environment("trivia")

client = AsyncOpenAI(
    base_url="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"]
)

async with hud.run_scenario_chat_interactive(
    client=client,
    model="gpt-4o",
    env=env,
    scenario="initials",
    args={"company": "HUD"},
) as chat:
    first = await chat.send("Start with your initial investigation.")
    follow_up = await chat.send("Now provide a concise final answer.")
    result = await chat.finish()  # submits + evaluates

print(first.answer)
print(follow_up.answer)
print(result.reward)
print(result.trace_id)

Responses API

# Reuses the AsyncOpenAI `client` defined in the Chat Completions example.
async with hud.eval(task) as ctx:
    response = await client.responses.create(
        model="gpt-4o",
        input=ctx.prompt,
        tools=ctx.as_openai_responses_tools()
    )
    
    for item in response.output:
        if item.type == "function_call":
            await ctx.call_tool(item)
    
    await ctx.submit(response.output_text)

Agents SDK

from agents import Agent, Runner
import hud

async with hud.eval(task) as ctx:
    agent = Agent(
        name="trivia-agent",
        instructions="Answer trivia questions. Use tools to look up information.",
        tools=ctx.as_openai_agent_tools()
    )
    
    result = await Runner.run(agent, ctx.prompt)
    await ctx.submit(result.final_output)

Requires: pip install openai-agents

Serve Scenarios as an HTTP Endpoint

If you want external agents to run your scenarios without the HUD SDK, use env.serve_as_agent(). It starts a local OpenAI-compatible server — any OpenAI client in any language can connect.

Server (`04_scenario_server.py`)

import os
import hud
from openai import AsyncOpenAI

env = hud.Environment(os.environ["HUD_ENV_NAME"])
env.connect_hub(os.environ["HUD_ENV_NAME"])

env.serve_as_agent(
    client=AsyncOpenAI(
        base_url="https://inference.hud.ai",
        api_key=os.environ["HUD_API_KEY"],
    ),
    model="gpt-4o",
    port=8321,
)

The server exposes:

Endpoint	Purpose
`GET /scenarios`	List available scenarios and their required args
`GET /v1/lifecycle-tools`	List scenario lifecycle tool schemas
`POST /v1/lifecycle-tools/call`	Call lifecycle tools (`scenario_list/start/send/finish`)
`POST /v1/chat/completions`	Start or continue a session
`POST /v1/sessions/{id}/finish`	Submit and evaluate
`GET /v1/sessions`	List active sessions
`GET /mcp/tools`	MCP-native lifecycle tool list
`POST /mcp/tools/call`	MCP-native lifecycle tool execution

Client (`05_scenario_client.py`)

No HUD SDK needed. Use any standard OpenAI client:

import httpx
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-needed")

# 1. Discover scenarios
scenarios = httpx.get("http://localhost:8321/scenarios").json()["scenarios"]
selected = scenarios[0]

# 2. First turn — pass scenario name and args in the request body
#    (both fields are required for session bootstrap)
first = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Begin."}],
    extra_body={
        "scenario": selected["short_name"],
        "scenario_args": {"arg": "value"},
    },
)
session_id = first.hud["session_id"]  # returned in every response

# 3. Follow-up turns — pass session ID in the header
follow_up = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What are the root causes?"}],
    extra_headers={"X-HUD-Session-Id": session_id},
)

#    You can also pass `thread_id` / `conversation_id` in `extra_body`.
follow_up_alt = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Any remaining risks?"}],
    extra_body={"thread_id": session_id},
)

# 4. Finish — submits the answer and returns reward + trace URL
result = httpx.post(f"http://localhost:8321/v1/sessions/{session_id}/finish").json()
print(result["reward"], result["trace_url"])

Streaming works the same way — just pass stream=True. The server sends standard SSE chunks, with a final chunk carrying hud.session_id and hud.trace_url.
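As a rough sketch of how a client without the OpenAI SDK might pick the session metadata out of a stream, the helper below scans raw SSE lines and keeps the last `hud` object it sees. The payload shape (a top-level `hud` key on the chunk JSON), the `[DONE]` sentinel, and the sample values are assumptions made for illustration, and `extract_hud_metadata` is a hypothetical name, not part of HUD.

```python
import json


def extract_hud_metadata(sse_lines):
    """Return the last `hud` object found in a sequence of SSE lines, or None."""
    hud_meta = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        if isinstance(chunk.get("hud"), dict):
            hud_meta = chunk["hud"]
    return hud_meta


# Made-up sample stream; real chunks will carry more fields.
sample_stream = [
    'data: {"choices": [{"delta": {"content": "J"}}]}',
    "",
    'data: {"choices": [{"delta": {}}], "hud": {"session_id": "abc123", '
    '"trace_url": "https://example.invalid/trace/abc123"}}',
    "",
    "data: [DONE]",
]

meta = extract_hud_metadata(sample_stream)
print(meta["session_id"], meta["trace_url"])
```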

Lifecycle Tools (Agent-native Helpers)

If your orchestrator prefers explicit lifecycle calls, use:

GET /v1/lifecycle-tools + POST /v1/lifecycle-tools/call
or the MCP-native aliases: GET /mcp/tools + POST /mcp/tools/call

Available tool names:

scenario_list
scenario_start (requires scenario + scenario_args)
scenario_send
scenario_finish

Requires: pip install "hud-python[server]" (installs fastapi and uvicorn; the quotes keep shells such as zsh from expanding the brackets)

Anthropic

Claude’s Messages API with tool use.

import os
from anthropic import AsyncAnthropic
import hud

client = AsyncAnthropic(
    base_url="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"]
)

async with hud.eval(task) as ctx:
    messages = [{"role": "user", "content": ctx.prompt}]
    
    while True:
        response = await client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=messages,
            tools=ctx.as_claude_tools()
        )
        
        tool_uses = [b for b in response.content if b.type == "tool_use"]
        if not tool_uses:
            break
        
        tool_results = [await ctx.call_tool(block) for block in tool_uses]
        
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
    
    text = next((b.text for b in response.content if b.type == "text"), "")
    await ctx.submit(text)

Requires: pip install anthropic

Gemini

Google’s Gemini API with function calling.

import os
import google.generativeai as genai
import hud

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-2.0-flash")

async with hud.eval(task) as ctx:
    chat = model.start_chat()
    
    response = chat.send_message(
        ctx.prompt,
        tools=ctx.as_gemini_tools(),
        tool_config=ctx.as_gemini_tool_config()
    )
    
    while True:
        part = response.candidates[0].content.parts[0]
        if not hasattr(part, "function_call") or not part.function_call:
            break
        
        result = await ctx.call_tool(part)
        response = chat.send_message(result)
    
    await ctx.submit(response.text)

Requires: pip install google-generativeai

browser-use

Browser automation for web agents.

import os
from browser_use import Agent
from langchain_openai import ChatOpenAI
import hud

llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"]
)

async with hud.eval(task) as ctx:
    agent = Agent(task=ctx.prompt, llm=llm)
    result = await agent.run()
    await ctx.submit(str(result))

Requires: pip install browser-use playwright && playwright install

LangChain

LangChain’s agent framework with tool calling.

import os
from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate
import hud

llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"]
)

async with hud.eval(task) as ctx:
    tools = ctx.as_langchain_tools()
    
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant."),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ])
    
    agent = create_tool_calling_agent(llm, tools, prompt)
    executor = AgentExecutor(agent=agent, tools=tools)
    
    result = await executor.ainvoke({"input": ctx.prompt})
    await ctx.submit(result["output"])

Requires: pip install langchain langchain-openai langchain-core

LlamaIndex

LlamaIndex’s ReAct agent with tool integration.

import os
from llama_index.llms.openai import OpenAI
from llama_index.core.agent import ReActAgent
import hud

llm = OpenAI(
    model="gpt-4o",
    api_base="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"]
)

async with hud.eval(task) as ctx:
    tools = ctx.as_llamaindex_tools()
    
    agent = ReActAgent.from_tools(tools, llm=llm, verbose=True)
    response = await agent.achat(ctx.prompt)
    
    await ctx.submit(str(response))

Requires: pip install llama-index-core llama-index-llms-openai

Google ADK

Google’s Agent Development Kit for Gemini-powered agents.

import os
from google.adk.agents import Agent
from google.adk.runners import Runner
import hud

async with hud.eval(task) as ctx:
    agent = Agent(
        name="trivia-agent",
        model="gemini-2.0-flash",
        instruction="Answer trivia questions. Use tools to look up information.",
        tools=ctx.as_adk_tools()
    )
    
    runner = Runner(agent=agent)
    result = await runner.run(ctx.prompt)
    
    await ctx.submit(result.output)

Requires: pip install google-adk

CrewAI

Multi-agent orchestration with roles and tasks.

import os
from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI
import hud

llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"]
)

async with hud.eval(task) as ctx:
    tools = ctx.as_langchain_tools()
    
    researcher = Agent(
        role="Researcher",
        goal="Find accurate information",
        backstory="Expert at finding information",
        tools=tools,
        llm=llm
    )
    
    task = Task(
        description=ctx.prompt,
        expected_output="The initials of the CEO",
        agent=researcher
    )
    
    crew = Crew(agents=[researcher], tasks=[task])
    result = crew.kickoff()
    await ctx.submit(str(result))

Requires: pip install crewai langchain-openai

AutoGen

Microsoft’s multi-agent conversation framework.

import os
from autogen import AssistantAgent, UserProxyAgent
import hud

async with hud.eval(task) as ctx:
    config_list = [{
        "model": "gpt-4o",
        "base_url": "https://inference.hud.ai",
        "api_key": os.environ["HUD_API_KEY"]
    }]
    
    assistant = AssistantAgent(
        name="assistant",
        llm_config={"config_list": config_list}
    )
    
    for tool in ctx.as_tools():
        @assistant.register_for_execution()
        async def tool_fn(name=tool.name, **kwargs):
            return await ctx.call_tool(name, **kwargs)
    
    user = UserProxyAgent(
        name="user",
        human_input_mode="NEVER",
        code_execution_config=False
    )
    
    result = await user.a_initiate_chat(assistant, message=ctx.prompt)
    await ctx.submit(result.summary)

Requires: pip install pyautogen

Format Reference

Method	Returns	Use With
`as_openai_chat_tools()`	OpenAI Chat format	OpenAI Chat Completions
`as_openai_responses_tools()`	OpenAI Responses format	OpenAI Responses API
`as_openai_agent_tools()`	FunctionTool objects	OpenAI Agents SDK
`as_claude_tools()`	Anthropic format	Claude API
`as_gemini_tools()`	Gemini format	Google AI
`as_adk_tools()`	ADK FunctionTool objects	Google ADK
`as_langchain_tools()`	StructuredTool objects	LangChain, CrewAI
`as_llamaindex_tools()`	FunctionTool objects	LlamaIndex
`as_tools()`	MCP Tool objects	Raw MCP, AutoGen
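To make the differences between the first few formats concrete, the sketch below converts one MCP-style tool description into the shapes that the OpenAI Chat Completions, OpenAI Responses, and Anthropic APIs accept. The three `to_*` functions are hypothetical stand-ins written for this example; they are not HUD's converters, whose exact output may include additional fields.

```python
# An MCP-style tool description (MCP names the schema field `inputSchema`).
mcp_tool = {
    "name": "lookup_ceo",
    "description": "Look up the CEO of a company.",
    "inputSchema": {
        "type": "object",
        "properties": {"company": {"type": "string"}},
        "required": ["company"],
    },
}


def to_openai_chat(tool: dict) -> dict:
    """Chat Completions nests the definition under a `function` key."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["inputSchema"],
        },
    }


def to_openai_responses(tool: dict) -> dict:
    """The Responses API uses the same fields, flattened to the top level."""
    return {
        "type": "function",
        "name": tool["name"],
        "description": tool["description"],
        "parameters": tool["inputSchema"],
    }


def to_claude(tool: dict) -> dict:
    """Anthropic's Messages API calls the schema `input_schema`."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["inputSchema"],
    }


chat_tool = to_openai_chat(mcp_tool)
responses_tool = to_openai_responses(mcp_tool)
claude_tool = to_claude(mcp_tool)
print(chat_tool["function"]["name"], responses_tool["name"], claude_tool["name"])
```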

Every call_tool() call auto-detects the input format and returns the result in the matching output format.
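One way such auto-detection could work is to dispatch on the shape of the incoming tool call. The sketch below handles two shapes, an OpenAI Chat Completions tool call and an Anthropic `tool_use` block, both represented as plain dicts, and answers each in its provider's result format. `call_tool_sketch` and the `TOOLS` registry are hypothetical; this is not HUD's implementation, which also accepts SDK objects and other formats.

```python
import json

CEOS = {"hud": "Jay Ram", "openai": "Sam Altman", "anthropic": "Dario Amodei"}


def lookup_ceo(company: str) -> str:
    return CEOS.get(company.lower(), "Unknown")


TOOLS = {"lookup_ceo": lookup_ceo}  # hypothetical registry for this sketch


def call_tool_sketch(call: dict) -> dict:
    """Run the tool named in `call` and reply in the caller's own format."""
    if call.get("type") == "tool_use":
        # Anthropic block: arguments arrive as a dict under `input`.
        output = TOOLS[call["name"]](**call["input"])
        return {"type": "tool_result", "tool_use_id": call["id"], "content": output}
    if "function" in call:
        # OpenAI Chat tool call: arguments arrive as a JSON string.
        fn = call["function"]
        output = TOOLS[fn["name"]](**json.loads(fn["arguments"]))
        return {"role": "tool", "tool_call_id": call["id"], "content": output}
    raise ValueError("unrecognized tool-call format")


openai_result = call_tool_sketch({
    "id": "call_1",
    "type": "function",
    "function": {"name": "lookup_ceo", "arguments": '{"company": "HUD"}'},
})
claude_result = call_tool_sketch({
    "type": "tool_use",
    "id": "toolu_1",
    "name": "lookup_ceo",
    "input": {"company": "Anthropic"},
})
print(openai_result)
print(claude_result)
```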

Bring Your Own

Don’t see your framework? The pattern is simple:

Get tools in your framework’s format (or use as_tools() for raw MCP)
Run your agent loop
Call ctx.call_tool() for each tool invocation
Call ctx.submit() with the final answer

async with hud.eval(task) as ctx:
    tools = ctx.as_tools()  # Raw MCP format
    
    result = await my_custom_agent(ctx.prompt, tools, ctx.call_tool)
    
    await ctx.submit(result)

The environment handles setup, evaluation, and tracing. You handle the agent logic.
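For illustration, here is a toy `my_custom_agent` that follows the four steps without any LLM: it checks that a `lookup_ceo` tool is available, calls it, and returns the initials. The stand-ins below replace `ctx.as_tools()` and `ctx.call_tool` so the sketch runs on its own. Two assumptions are baked in: tools expose a `.name` attribute, and `call_tool(name, **kwargs)` returns something that converts cleanly to a string (the real return value may be a richer result object).

```python
import asyncio
import re
from types import SimpleNamespace


async def my_custom_agent(prompt, tools, call_tool):
    """Toy rule-based agent: find the company in the prompt, look up its CEO."""
    tool_names = {tool.name for tool in tools}
    match = re.search(r"CEO of (\w+)", prompt)
    if "lookup_ceo" not in tool_names or match is None:
        return ""  # nothing this agent knows how to do
    ceo = await call_tool("lookup_ceo", company=match.group(1))
    # Build the initials from the first letter of each word in the name.
    return "".join(word[0] for word in str(ceo).split())


# Stand-ins for ctx.as_tools() and ctx.call_tool (assumptions, see above).
fake_tools = [SimpleNamespace(name="lookup_ceo")]


async def fake_call_tool(name, **kwargs):
    ceos = {"hud": "Jay Ram", "openai": "Sam Altman", "anthropic": "Dario Amodei"}
    return ceos.get(kwargs["company"].lower(), "Unknown")


answer = asyncio.run(
    my_custom_agent("What are the initials of the CEO of HUD?", fake_tools, fake_call_tool)
)
no_tool_answer = asyncio.run(
    my_custom_agent("What are the initials of the CEO of HUD?", [], fake_call_tool)
)
print(answer)
```

In a real run, the returned string would be passed to ctx.submit() as in the snippet above.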
